eLife. 2024 Aug 14;13:RP93260. doi: 10.7554/eLife.93260

Maternal smoking DNA methylation risk score associated with health outcomes in offspring of European and South Asian ancestry

Wei Q Deng 1,2,3, Nathan Cawte 4, Natalie Campbell 1, Sandi M Azab 1,5, Russell J de Souza 1,5, Amel Lamri 1,4, Katherine M Morrison 6, Stephanie A Atkinson 6, Padmaja Subbarao 7, Stuart E Turvey 8, Theo J Moraes 7,9, Koon K Teo 1,4,5, Piush J Mandhane 10, Meghan B Azad 11, Elinor Simons 12, Guillaume Paré 4,5,13,14, Sonia S Anand 1,4,5
Editors: Joris Deelen15, Carlos Isales16
PMCID: PMC11324234  PMID: 39141540

Abstract

Background:

Maternal smoking has been linked to adverse health outcomes in newborns but the extent to which it impacts newborn health has not been quantified through an aggregated cord blood DNA methylation (DNAm) score. Here, we examine the feasibility of using cord blood DNAm scores leveraging large external studies as discovery samples to capture the epigenetic signature of maternal smoking and its influence on newborns in White European and South Asian populations.

Methods:

We first examined the association between individual CpGs and cigarette smoking during pregnancy, and smoking exposure in two White European birth cohorts (n=744). Leveraging established CpGs for maternal smoking, we constructed a cord blood epigenetic score of maternal smoking that was validated in one of the European-origin cohorts (n=347). This score was then tested for association with smoking status, secondary smoking exposure during pregnancy, and health outcomes in offspring measured after birth in an independent White European (n=397) and a South Asian birth cohort (n=504).

Results:

Several previously reported genes for maternal smoking were supported, with the strongest and most consistent association signal from the GFI1 gene (6 CpGs with p<5 × 10⁻⁵). The epigenetic maternal smoking score was strongly associated with smoking status during pregnancy (OR = 1.09 [1.07, 1.10], p=5.5 × 10⁻³³) and more hours of self-reported smoking exposure per week (1.93 [1.27, 2.58], p=7.8 × 10⁻⁹) in White Europeans. However, it was not associated with self-reported exposure (p>0.05) among South Asians, likely due to a lack of smoking in this group. The same score was consistently associated with a smaller birth size (–0.37±0.12 cm, p=0.0023) in the South Asian cohort and a lower birth weight (–0.043±0.013 kg, p=0.0011) in the combined cohorts.

Conclusions:

This cord blood epigenetic score can help identify babies exposed to maternal smoking and assess its long-term impact on growth. Notably, these results indicate a consistent association between the DNAm signature of maternal smoking and a small body size and low birth weight in newborns, in both White European mothers who exhibited some amount of smoking and in South Asian mothers who themselves were not active smokers.

Funding:

This study was funded by the Canadian Institutes of Health Research Metabolomics Team Grant: MWG-146332.

Research organism: Human

Introduction

Maternal smoking has adverse effects on offspring health including pre-term delivery (Stock and Bauld, 2020; Liu et al., 2020), stillbirth (Marufu et al., 2015), and low birth weight (Ventura et al., 2003), and is associated with pregnancy complications such as higher maternal blood pressure and gestational diabetes (National Center for Chronic Disease Prevention and Health Promotion (US) Office on Smoking and Health, 2014). Consistent with the Developmental Origins of Health and Disease (DOHaD) hypothesis, maternal smoking exposes the developing fetus to harmful chemicals in tobacco that negatively impact the health of newborns, resulting in early-onset metabolic diseases, such as childhood obesity (Montgomery and Ekbom, 2002; Toschke et al., 2002; Oken et al., 2008; Philips et al., 2020). Yet self-reported smoking status is subject to underreporting among pregnant women (England et al., 2007; Shipton et al., 2009; Salmasi et al., 2010). This could subsequently impact the effectiveness of interventions aimed at reducing smoking during pregnancy and may skew data on the risks associated with maternal smoking.

DNA methylation is one of the most commonly studied epigenetic mechanisms by which cells regulate gene expression, and is increasingly recognized for its potential as a biomarker (Yousefi et al., 2022). Differential DNA methylation has been established as a reliable biochemical response to cigarette smoking and was shown to capture the long-lasting effects of persistent smoking in ex-smokers (Shenker et al., 2013; Joehanes et al., 2016; Guida et al., 2015). Recent large epigenome-wide association studies (EWAS) have robustly identified differentially methylated cytosine–phosphate–guanine (CpG) sites associated with adult smoking (Joehanes et al., 2016; Sikdar et al., 2019; Zeilinger et al., 2013) and maternal smoking (Joubert et al., 2016; Hannon et al., 2019). Our recent systematic review of 17 cord blood EWAS found that out of the 290 CpG sites reported to be associated with at least one of the following: maternal diabetes, pre-pregnancy body mass index (BMI), diet during pregnancy, smoking, and gestational age, 19 sites were identified in more than one study and all of them associated with maternal smoking (Akhabir et al., 2022). Furthermore, these findings have led to a more thorough investigation of the epigenetic mechanisms underlying associations between well-established epidemiological exposures and outcomes, such as the relationship between maternal smoking and birth weight in Europeans (Hannon et al., 2019; Witt et al., 2018; Küpers et al., 2015; Xu et al., 2021; Cardenas et al., 2019) and the less studied African American populations (Xu et al., 2021) as well as between maternal diet and cardiovascular health (Murray et al., 2021).

Only a handful of cohort studies were designed to assess the influence of maternal exposures on DNA methylation changes in non-Europeans (Xu et al., 2021; Raynor and Born in Bradford Collaborative Group, 2008). It has been suggested that systematic patterns of methylation (Elliott et al., 2022), such as cell composition, could differ between individuals of different ancestral backgrounds, which could in turn confound the association between differential DNAm and smoking behaviors (Choquet et al., 2021). These systematic differences also contribute to different smoking-related methylation signals at individual CpGs (Elliott et al., 2014). Thus, a comparative study of maternal smoking exposure is a first step towards generalizing existing EWAS results to other populations and a necessary step towards addressing health disparities that exist between populations due to societal privilege, including race or ethnicity and socioeconomic factors.

A promising direction in epigenetic studies of adult smoking is the application of a methylation score (Bollepalli et al., 2019); this strategy can also be applied to disseminate current knowledge on differential DNA methylation studies of maternal smoking. A methylation score is usually tissue-specific and combines information from multiple CpGs using statistical models (Yousefi et al., 2022). Reducing the number of predictors and measurement noise in the data can lead to better statistical power and a more parsimonious instrument for subsequent analyses. It is also of interest to determine whether methylation scores demonstrate the capacity to predict outcomes in diverse human populations, given the presence of systematic differences in methylation patterns due to ancestral backgrounds (Elliott et al., 2022).

In this paper, we investigated the epigenetic signature of maternal smoking on cord blood DNA methylation in newborns, as well as its association with newborn and later life outcomes, in one South Asian birth cohort (South Asian referring to people who originate from the Indian subcontinent) and two predominantly European-origin birth cohorts. Similar to the Born in Bradford study (Wright et al., 2013), we observed several differentiating epidemiological characteristics between South Asian and European-origin mothers. Notably, almost none of the South Asian mothers were current smokers, and their pre-pregnancy smoking rates were low compared to those of European mothers, which is consistent with the broader trends of lower smoking rates in South Asian females (Reitsma et al., 2021). Another relevant observation is the small birth size and low birth weight in the South Asian newborns. These differences in newborn size and weight may be influenced by various factors, including maternal nutrition, genetics, and socioeconomic status. Keeping these differences in mind, we first conducted cohort-specific epigenetic association studies between available CpGs and maternal smoking in the predominantly European-origin cohorts, benchmarking with previously identified CpGs for maternal smoking and adult smoking. Second, we leveraged the reported summary statistics from existing large EWASs to construct a methylation risk score (MRS) for maternal smoking. The MRS was first internally validated in one of the European-origin cohorts and then tested in a second independent European-origin cohort. Third, we examined the association between maternal smoking MRS and newborn health outcomes, including length, weight, BMI, ponderal index, and early-life anthropometrics in both European and South Asian cohorts.

Materials and methods

Study population

The NutriGen Alliance is a consortium consisting of four prospective, population-based birth cohorts that enrolled birthing mother and newborn pairs in Canada. Details of these cohorts have been described elsewhere (de Souza et al., 2016). The current investigation focused on (i) European-origin offspring from the population-based CHILD study who were selected for methylation analysis, (ii) the Family Atherosclerosis Monitoring In early life (FAMILY) study, which is predominantly of European origin, and (iii) the SouTh Asian biRth cohorT (START) study, which is composed exclusively of people who originated from the Indian subcontinent, known as South Asians. The ethnicity of the parents was self-reported and recorded at baseline in all three cohorts. Biological samples, clinical assessments, and questionnaires were used to derive health phenotypes and an array of genetic, epigenetic, and metabolomic data. The superordinate goal of the NutriGen study is to understand how nutrition, environmental exposures, and physical health of mothers impact the health and early development of their offspring using a multi-omics approach.

Methylation data processing and quality controls

Newborn cord blood samples were processed using two methylation array technologies. About half of the START samples and selected samples from CHILD were hybridized to the Illumina Human-Methylation450K BeadChip (HM450K) array, which covers CpGs in the entire genome (Bibikova et al., 2011). The raw methylation data were generated by the Illumina iScan software and separately pre-processed for START and CHILD using the ‘sesame’ R package following pipelines designed for HM450K BeadChip (Zhou et al., 2018). The FAMILY samples were profiled using a targeted array based on the Infinium Methylation EPIC designed by the Genetic and Molecular Epidemiology Laboratory (GMEL; Hamilton, Canada). The GMEL customized array includes ~3000 CpG sites that were previously reported to associate with complex traits or exposures and was designed to maximize discovery while keeping the costs of profiling epigenome-wide DNA methylation down. The targeted methylation data were pre-processed using a customized quality control pipeline and functions from the ‘sesame’ R package (Zhou et al., 2018) recommended for EPIC.

Pre-processed data were then used to derive the β-value matrix, where each column gives the methylation level at a CpG site as the ratio of the methylated probe intensity to the overall probe intensity. Additional quality control filters were applied to the final β-value matrices to remove samples with >10% missing probes and CpG probes with >10% samples missing. Cross-reactive probes and SNP probes were removed as recommended for HM450K (Chen et al., 2013) and EPIC arrays (Zhou et al., 2017; Pidsley et al., 2016). For CpG probes with a missing rate <10%, mean imputation was used to fill in the missing values. We further excluded samples that either showed a mismatch between reported sex and methylation-inferred sex or were duplicates. Finally, considering the low prevalence of smokers, we sought to reduce spurious associations by removing non-informative probes that were either all hypomethylated (β-value<0.1) or all hypermethylated (β-value>0.9), which have been shown to have less optimal performance (Hillary et al., 2022). A summary of the sample and probe inclusion/exclusion is shown in Supplementary file 1a.
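The missingness filters, mean imputation, and removal of non-informative probes described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline (which was run in R): the function name `qc_filter`, the list-of-lists layout (rows are samples, columns are CpG probes, `None` marks a missing value), and the order in which the filters are applied are assumptions made for illustration only.

```python
from statistics import mean

def qc_filter(beta, max_missing=0.10, low=0.1, high=0.9):
    """Filter a beta-value matrix (rows = samples, columns = CpG probes).

    Hypothetical helper: missing values are None; thresholds default to the
    10% missingness and 0.1/0.9 beta-value cut-offs quoted in the text.
    """
    if not beta:
        return [], [], []
    n_probes = len(beta[0])
    # 1. Keep samples with at most max_missing of their probes missing.
    keep_rows = [i for i, row in enumerate(beta)
                 if sum(v is None for v in row) <= max_missing * n_probes]
    rows = [beta[i] for i in keep_rows]
    keep_cols, col_means = [], {}
    for j in range(n_probes):
        observed = [row[j] for row in rows if row[j] is not None]
        n_missing = len(rows) - len(observed)
        # 2. Drop probes missing in more than max_missing of retained samples.
        if not observed or n_missing > max_missing * len(rows):
            continue
        # 3. Drop probes that are hypo- or hypermethylated in every sample.
        if all(v < low for v in observed) or all(v > high for v in observed):
            continue
        keep_cols.append(j)
        col_means[j] = mean(observed)
    # 4. Mean-impute the remaining missing values, probe by probe.
    filtered = [[row[j] if row[j] is not None else col_means[j]
                 for j in keep_cols] for row in rows]
    return keep_rows, keep_cols, filtered

# Toy data; max_missing is loosened because the matrix is tiny.
toy = [
    [0.05, 0.50, 0.95, 0.20],
    [0.02, None, 0.97, 0.40],
    [0.08, 0.70, 0.92, 0.60],
    [None, None, None, 0.30],
]
rows_kept, probes_kept, clean = qc_filter(toy, max_missing=0.5)
```

In the toy matrix the last sample is dropped for missingness, the first and third probes are dropped as uniformly hypo- and hypermethylated, and the single remaining gap is filled with the probe mean.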

Cell-type proportions (CD8T, CD4T, Natural Killer cells, B cells, monocytes, granulocytes, and nucleated red blood cells) were estimated following a reference-based approach developed for cord blood (Gervin et al., 2019) and using R packages ‘FlowSorted.CordBloodCombined.450k’ and ‘FlowSorted.Blood.EPIC.’ All data processing and subsequent analyses were conducted in R v.4.1.0 (R Development Core Team, 2021).

Phenotype data processing and quality controls

At the time of enrollment, all pregnant women completed a comprehensive questionnaire that collected information on prenatal diet, smoking, education, socioeconomic factors, physical activities, and health as detailed previously (Morrison et al., 2009; Anand et al., 2013). Maternal smoking history (0=never smoked, 1=quit before this pregnancy, 2=quit during this pregnancy, or 3=current smoker) was assessed during the second trimester (at baseline). Smoke exposure was measured as the ‘number of hours exposed per week.’ Gestational diabetes mellitus (GDM) was determined based on a combination of oral glucose tolerance test (OGTT), self-report, and reported diabetic treatments (insulin, pills, or restricted diet). For South Asian mothers in START, the same OGTT threshold as Born in Bradford (Raynor and Born in Bradford Collaborative Group, 2008; Wright et al., 2013) was used, while the International Association of Diabetes and Pregnancy Study Groups (IADPSG) criteria (Metzger et al., 2010) for OGTT were used in CHILD and FAMILY cohorts. Mode of delivery (emergency c-section vs. other) was collected at the time of delivery.

Newborn length and weight were collected immediately after birth and extracted from the medical chart. The newborns were then followed up at 1, 2, 3, and 5 years of age and provided basic anthropometric measurements, including height, weight, hip and waist circumference, BMI, and sum of the skinfolds (triceps skinfold and subscapular skinfold). Additional phenotypes included smoking exposures (hours per week) at home, potential allergy based on the mother reporting any of eczema, hay fever, wheezing, asthma, or food allergy (egg, cow milk, soy, other) for her child in FAMILY and START, and asthma based on mother’s opinion in CHILD (‘In your opinion, does the child have any of the following? Asthma’).

Phenotype and methylation data consolidation

The current investigation examines the impact of maternal smoking or smoke exposure on DNA methylation derived from newborn cord blood in START and the two predominantly European cohorts (CHILD and FAMILY). To maximize sample size in FAMILY and CHILD, we retained either self-identified or genetically confirmed Europeans based on available genetic data (Supplementary file 1a). The cohorts consist of representative population samples without enrichment for any clinical conditions, though only mothers with singleton pregnancies were invited to participate. For continuous phenotypes, an analysis of variance (ANOVA) using the F-statistic or a two-sample t-test was used to compare the mean difference across the three cohorts or two groups, respectively. For categorical phenotypes, a chi-square test of independence was used to compare the differences in frequencies of observed categories. Note that three of the categories under smoking history in the START cohort had expected cell counts less than 5; START was thus excluded from this comparison, and the reported p-value is for CHILD and FAMILY only.

The final analytical datasets, after combining the quality-controlled methylation data and phenotypic data, included 352, 411, and 504 mother-newborn pairs from CHILD, FAMILY, and START, respectively. Demographic characteristics and relevant covariates of the epigenetic subsample and the overall sample are summarized in Table 1 and Supplementary file 1b, respectively.

Table 1. Characteristics of the epigenetic subsample (1267 mother–newborn pairs) from the CHILD, FAMILY, START cohorts.

| Phenotypes | CHILD (n=352) | FAMILY (n=411) | START (n=504) | p-value for differences (ANOVA F-test or chi-squared test) |
|---|---|---|---|---|
| Mother Smoking History | | | | |
| never smoked | 247 (70.2%) | 253 (61.6%) | 501 (99.4%) | <0.001* |
| quit before this pregnancy | 72 (20.5%) | 58 (14.1%) | 1 (0.2%) | |
| quit during this pregnancy | 17 (4.8%) | 57 (13.9%) | 1 (0.2%) | |
| currently smoking | 11 (3.1%) | 29 (7.1%) | 0 (0%) | |
| Missing | 5 (1.4%) | 14 (3.4%) | 1 (0.2%) | |
| Smoking Exposure (hr/week) | | | | |
| Mean (SD) | 0.97 (±7.64) | 2.52 (±12.83) | 0.33 (±2.67) | <0.001 |
| Missing | 12 (3.4%) | 5 (1.2%) | 42 (8.3%) | |
| Gestational Diabetes Mellitus | | | | |
| YES | 16 (4.5%) | 66 (16.1%) | 183 (36.3%) | <0.001 |
| NO | 336 (95.5%) | 345 (83.9%) | 320 (63.5%) | |
| Missing | 0 (0%) | 0 (0%) | 1 (0.2%) | |
| Years of Education | | | | <0.001 |
| Mean (SD) | 16.96 (±3.08) | 16.85 (±3.39) | 15.81 (±2.41) | |
| Missing | 7 (2.0%) | 3 (0.7%) | 0 (0%) | |
| Mother’s Age | | | | |
| Mean (SD) | 32.69 (±4.45) | 31.86 (±5.42) | 30.12 (±3.91) | <0.001 |
| Missing | 4 (1.1%) | 0 (0%) | 0 (0%) | |
| Parity | | | | |
| Mean (SD) | 0.72 (±0.88) | 0.80 (±1.02) | 0.80 (±0.81) | 0.098 |
| Missing | 2 (0.6%) | 0 (0%) | 13 (2.6%) | |
| Pre-pregnancy BMI (kg/m²) | | | | |
| Mean (SD) | 24.78 (±5.42) | 26.46 (±6.38) | 23.71 (±4.45) | <0.001 |
| Missing | 132 (37.5%) | 16 (3.9%) | 2 (0.4%) | |
| Newborn Sex | | | | |
| Male | 194 (55.1%) | 211 (51.3%) | 239 (47.4%) | 0.083 |
| Female | 158 (44.9%) | 200 (48.7%) | 265 (52.6%) | |
| Plant-Based Diet | | | | |
| Mean (SD) | –0.48 (±0.46) | 0.19 (±0.67) | 1.56 (±1.14) | <0.001 |
| Missing | 23 (6.5%) | 36 (8.8%) | 16 (3.2%) | |
| Health Conscious Diet | | | | |
| Mean (SD) | 0.21 (±0.81) | –0.73 (±0.73) | –0.42 (±0.79) | <0.001 |
| Missing | 23 (6.5%) | 36 (8.8%) | 16 (3.2%) | |
| Western Diet | | | | |
| Mean (SD) | –0.15 (±0.63) | 1.06 (±1.20) | –0.51 (±0.65) | <0.001 |
| Missing | 23 (6.5%) | 36 (8.8%) | 16 (3.2%) | |
| Newborn Gestational Age (weeks) | | | | |
| Mean (SD) | 39.53 (±1.38) | 39.44 (±1.47) | 39.20 (±1.32) | <0.001 |
| Missing | 4 (1.1%) | 0 (0%) | 0 (0%) | |
| Birth Length (cm) | | | | |
| Mean (SD) | 51.68 (±2.52) | 50.20 (±2.16) | 51.44 (±2.69) | <0.001 |
| Missing | 71 (20.2%) | 10 (2.4%) | 7 (1.4%) | |
| Birth Weight (kg) | | | | |
| Mean (SD) | 3.50 (±0.49) | 3.53 (±0.50) | 3.26 (±0.46) | <0.001 |
| Missing | 6 (1.7%) | 0 (0%) | 1 (0.2%) | |
| Newborn BMI (kg/m²) | | | | |
| Mean (SD) | 13.11 (±1.41) | 13.94 (±1.29) | 12.31 (±1.39) | <0.001 |
| Missing | 72 (20.5%) | 10 (2.4%) | 7 (1.4%) | |
| Newborn Ponderal Index (kg/m³) | | | | |
| Mean (SD) | 25.45 (±3.14) | 27.79 (±2.55) | 24.02 (±3.17) | <0.001 |
| Missing | 72 (20.5%) | 10 (2.4%) | 7 (1.4%) | |
| Estimated cell proportions | | | | |
| CD8T, Mean (SD) | 0.01 (±0.01) | 0.04 (±0.03) | 0.02 (±0.02) | <0.001 |
| CD4T, Mean (SD) | 0.11 (±0.06) | 0.13 (±0.06) | 0.16 (±0.07) | <0.001 |
| NK, Mean (SD) | 0.02 (±0.02) | 0.03 (±0.03) | 0.02 (±0.03) | <0.001 |
| Bcell, Mean (SD) | 0.02 (±0.02) | 0.04 (±0.03) | 0.04 (±0.03) | <0.001 |
| Mono, Mean (SD) | 0.01 (±0.02) | 0.04 (±0.03) | 0.03 (±0.03) | <0.001 |
| Gran, Mean (SD) | 0.80 (±0.10) | 0.60 (±0.13) | 0.72 (±0.14) | <0.001 |
| nRBC, Mean (SD) | 0.08 (±0.08) | 0.12 (±0.11) | 0.07 (±0.11) | <0.001 |
| MNLR, Mean (SD) | 6.59 (±6.00) | 3.30 (±3.14) | 3.98 (±3.08) | <0.001 |
| Missing | 6 (1.7%) | 0 (0%) | 3 (0.6%) | |
* comparison for CHILD and FAMILY only

Epigenome-wide association of maternal smoking in European cohorts

Since there were no current smokers in START (Table 1), we tested the association between maternal smoking and differentially methylated sites in FAMILY (# of CpG = 2544) and CHILD (# of CpG = 200,050). The primary outcome variable was ‘current smoker,’ defined as mothers who self-identified as currently smoking during the pregnancy vs. those who never smoked or quit either before or during pregnancy. We also included a secondary outcome variable ‘ever smoker,’ defined as mothers who were current smokers or had quit smoking vs. those who never smoked. A tertiary outcome was smoking exposure, measured by the number of hours per week reported by the expectant mothers, and was available in all cohorts. We summarized the type of analyses for different outcomes in Supplementary file 1c.

We first conducted a separate epigenetic association study in each cohort, testing the association between methylation β-values at individual CpGs and the smoking phenotype using either a logistic regression model for smoking status or a linear regression for smoking exposure as the outcome. The model adjusted for additional covariates including the estimated cord blood cell proportions, maternal age, social disadvantage index, which is a continuous composite measure of social and economic exposures (Anand et al., 2006), mother’s years of education, GDM, and parity. The smoking exposure variable was skewed, and a rank-based transformation was applied to mimic a standard normal distribution.
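A rank-based transformation of this kind can be sketched as follows. The text does not state which variant was used, so this example assumes a common form: ranks are converted to quantiles with a 0.5 offset and mapped through the standard normal inverse CDF, with tied values (frequent in zero-inflated exposure data) given their average rank. The function name `rank_inverse_normal` is hypothetical.

```python
from statistics import NormalDist

def rank_inverse_normal(values, offset=0.5):
    """Map values to normal scores via (average rank - offset) / n.

    The offset and the average-rank handling of ties are assumptions;
    the paper only states that a rank-based transformation was applied.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        # Extend the block while the next sorted value is tied.
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # 1-based average rank
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((r - offset) / n) for r in ranks]

# Hours of exposure per week for five hypothetical mothers.
z = rank_inverse_normal([0, 0, 0, 2, 10])
```

Because ties share a rank, the many zero-exposure mothers all receive the same transformed value, so the result is only approximately normal when the variable is heavily zero-inflated.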

We then meta-analyzed association results for maternal smoking status in the European cohorts using an inverse variance-weighted fixed-effect model. The meta-analysis was conducted for 2,112 CpGs that were available in both CHILD (profiled using HM450K) and FAMILY (profiled using the targeted array). For the tertiary outcome, we conducted an inverse variance meta-analysis including START using a fixed-effect model. For each EWAS or meta-analysis, the false discovery rate (FDR) adjustment was used to control for multiple testing, and we considered CpGs that passed an FDR-adjusted p-value <0.05 to be relevant for maternal smoking.
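As an illustration of the two steps named above, the sketch below pools per-cohort effect estimates at one CpG with inverse variance weights and then applies a Benjamini-Hochberg adjustment across CpGs. The text says only "FDR adjustment", so the Benjamini-Hochberg procedure is an assumption, as are the function names and the two-sided normal approximation for the pooled p-value.

```python
from math import sqrt
from statistics import NormalDist

def fixed_effect_meta(estimates, std_errors):
    """Inverse variance-weighted fixed-effect pooling for a single CpG."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    pooled_se = sqrt(1.0 / sum(weights))
    z = pooled / pooled_se
    # Two-sided p-value from a normal approximation (loses precision
    # for extremely small p-values).
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return pooled, pooled_se, p_value

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical effect estimates and standard errors from two cohorts.
pooled, pooled_se, p = fixed_effect_meta([-1.0, -1.2], [0.30, 0.35])
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
```

CpGs whose adjusted value falls below 0.05 would then be treated as relevant, mirroring the threshold stated in the text.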

Using DNA methylation to construct predictive models for maternal smoking

We sought to construct a predictive model in the form of a methylation risk score (MRS) using reported associations of maternal smoking. The proposed solution adapted the existing lassosum method (Mak et al., 2017) that was originally designed for polygenic risk scores, where the matrix of SNP genotypes (X) can be conveniently replaced by the β-value matrix. We hope to establish a linear regression model that can explain the variation in y using linear combinations of X:

y=Xγ+ε, (1)

where X ∈ R^{n×p} denotes a column-standardized β-matrix of the p CpGs measured on n individuals. Multi-collinearity arises as many of the CpGs in physical proximity are highly correlated, causing instability in the model converging to a solution and/or leading to variance inflation in the resulting coefficients when estimated simultaneously. A lasso solution was designed to alleviate the multi-collinearity of this estimation problem and can be obtained by minimizing an objective function that includes an L-1 penalty term that regularizes γ, forcing some of the coefficients to be exactly zero:

$$\hat{\gamma} = \min_{\gamma}\left\{ n\,(y - X\gamma)^{T}(y - X\gamma) + 2\lambda \sum_{j=1}^{p} |\gamma_j| \right\} \qquad (2)$$

Briefly, an objective function under elastic-net constraint was minimized to obtain the elastic-net solution γ, where only summary statistics (b) and a scalar of the covariance between the β-values of the CpGs (XᵀX) are needed. This was done by modifying the elastic net solution (https://github.com/tshmak/lassosum/blob/master/R/elnetR.R; Mak, 2017) for the lassosum method (Mak et al., 2017) that depended on two tuning parameters, along with additional inputs, namely the summary statistics and a reference CpG data covariance matrix. The elastic net function using summary statistics contained hyperparameters for the L-1 and L-2 penalties, namely λ1 and λ2, which needed to be selected. To select the optimal tuning parameters, we examined a range of λ1 values, from no penalty to a penalty that forces all weights to be zero, in 50 incremental increases, and λ2 was taken to be α(1 − λ1), where α ranged from 0 to 1 in increments of 0.1. These together gave a grid of 10×50 choices for the two tuning parameter values. The tuning parameter pair that produced a score that was most significantly associated with the smoking history variable (as a continuous outcome) in CHILD, without any data transformation, was chosen as the final elastic net solution. The optimized λ1 and λ2 were then used to create a final model that entails a list of CpGs and their corresponding weights, which were then used to calculate an MRS for maternal smoking in the FAMILY and START samples.
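Once the CpG weights are fixed, computing the score in a new cohort reduces to a weighted sum. The sketch below assumes that each CpG is standardized within the target cohort before weighting, in line with the column-standardized X in equation (1); whether the authors standardized within each cohort is not stated. The function name and the weights are hypothetical, and the CpG identifiers are borrowed from the Results purely as labels.

```python
from statistics import mean, pstdev

def methylation_risk_score(beta, cpg_ids, weights):
    """Weighted sum of column-standardized beta-values per sample.

    beta: rows = samples, columns = CpGs in the order given by cpg_ids.
    weights: dict mapping CpG id to its weight; CpGs with zero or missing
    weight, or with no variation in this cohort, do not contribute.
    """
    used = [j for j, cpg in enumerate(cpg_ids) if weights.get(cpg, 0.0) != 0.0]
    column_stats = {}
    for j in used:
        column = [row[j] for row in beta]
        column_stats[j] = (mean(column), pstdev(column))
    scores = []
    for row in beta:
        score = 0.0
        for j in used:
            mu, sd = column_stats[j]
            if sd > 0:
                score += weights[cpg_ids[j]] * (row[j] - mu) / sd
        scores.append(score)
    return scores

# Made-up weights for illustration; these are NOT the published weights.
example_weights = {"cg12876356": -1.0, "cg09935388": 0.0}
example_scores = methylation_risk_score(
    [[0.2, 0.5], [0.4, 0.6], [0.6, 0.7]],
    ["cg12876356", "cg09935388"],
    example_weights,
)
```

With a negative weight, samples with lower methylation at the CpG receive a higher score, which matches the intuition of a score that increases with smoking-related hypomethylation.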

The summary statistics of the discovery EWAS were obtained from the EWAS catalog (http://www.ewascatalog.org/) reported under ‘PubMed ID 27040690’ by Joubert and colleagues (Joubert et al., 2016). The summary statistics were restricted to the analysis of ‘sustained maternal smoking in pregnancy effect on newborns adjusted for cell composition.’ Of the 2620 maternal smoking CpGs that passed the initial screening, 2107 were available in CHILD but only 128 were common to CHILD, FAMILY, and START. To evaluate whether the targeted GMEL-EPIC array design performs comparably to the epigenome-wide array in capturing the epigenetic signature of maternal smoking, a total of three MRSs were constructed: two using the 128 CpGs available in all cohorts (across the HM450K and targeted GMEL-EPIC arrays), with either CHILD (n=347 with non-missing smoking history) or FAMILY (n=397 with non-missing smoking history) as the validation cohort, and another using the 2107 CpGs that were only available in CHILD and START samples, with CHILD as the validation cohort. The validation model considered the continuous smoking history without modification as the outcome, while accounting for covariates, which included the estimated cord blood cell proportions, maternal age, social disadvantage index, mother’s years of education, GDM, and parity. Henceforth, we refer to these derived maternal smoking scores as the FAMILY-targeted MRS, CHILD-targeted MRS, and the HM450K MRS, respectively. To benchmark and compare with existing maternal smoking MRSs, we calculated the Reese score using 28 CpGs (Reese et al., 2017; Richmond et al., 2018), the Richmond score using 568 CpGs (Richmond et al., 2018), the Rauschert score using 204 CpGs (Rauschert et al., 2019), the Joubert score using all 2,620 CpGs with evidence of association for maternal smoking (Joubert et al., 2016), and finally a three-CpG score for air pollution (Gondalia et al., 2019). The details of these scores and score weights can be found in Supplementary file 1d.

Statistical analysis

For each cohort, we contrasted the three versions of the derived scores using an analysis of variance (ANOVA) along with pairwise comparisons using a two-sample t-test to examine how much information might be lost due to the more than 10-fold reduction in CpGs at the validation stage, in all samples and in non-smokers. We also examined the correlation structure between all derived and external MRSs using a heatmap summarizing their pairwise Pearson’s correlation coefficients. Then, we compared the mean difference of each MRS across smoking history categories using an ANOVA F-test and two-sample t-test to understand whether there was a dosage dependence in the cord blood DNAm signature of maternal smoking. Additionally, each score was tested against a binary outcome for current smoker vs. not, and two continuous measures for smoking history and weekly smoking exposure. The binary outcome was tested using a logistic regression model and the predictive performance was assessed using the area under the receiver operating characteristic curve (AUC). The reported 95% confidence interval for each estimated AUC was derived using 2000 bootstrap samples. The continuous outcome was examined using a linear regression model and its performance was quantified using the adjusted R2.
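The AUC and its bootstrap interval can be sketched as below. The AUC is computed through its rank interpretation (the probability that a randomly chosen smoker scores higher than a randomly chosen non-smoker, with ties counted as one half). The text does not specify the type of bootstrap interval, so a simple percentile interval over resampled individuals is assumed here; the function names and the toy data are hypothetical.

```python
import random

def auc(scores, labels):
    """AUC as the fraction of (case, control) pairs ranked correctly."""
    cases = [s for s, l in zip(scores, labels) if l == 1]
    controls = [s for s, l in zip(scores, labels) if l == 0]
    if not cases or not controls:
        raise ValueError("both classes are required to compute the AUC")
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))

def bootstrap_auc_ci(scores, labels, n_boot=2000, level=0.95, seed=1):
    """Percentile bootstrap confidence interval for the AUC (assumed variant)."""
    if len(set(labels)) < 2:
        raise ValueError("both classes are required to compute the AUC")
    rng = random.Random(seed)
    n = len(scores)
    replicates = []
    while len(replicates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        resampled_labels = [labels[i] for i in idx]
        # Skip resamples that contain only one class.
        if 0 < sum(resampled_labels) < n:
            replicates.append(auc([scores[i] for i in idx], resampled_labels))
    replicates.sort()
    lower = replicates[int((1 - level) / 2 * n_boot)]
    upper = replicates[max(int((1 + level) / 2 * n_boot) - 1, 0)]
    return lower, upper

# Toy scores with 1 = current smoker, 0 = not a current smoker.
toy_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
toy_labels = [0, 0, 1, 1, 0, 1]
point_estimate = auc(toy_scores, toy_labels)
interval = bootstrap_auc_ci(toy_scores, toy_labels, n_boot=2000)
```

With as few current smokers as reported here, a stratified bootstrap that resamples smokers and non-smokers separately may be preferable, since it keeps the number of cases fixed in every replicate.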

For the derived MRS, we empirically assessed whether a systematic difference existed in the resulting score with respect to all other derived scores. This was examined via pairwise mean differences between the HM450K MRS and the other scores using a two-sample t-test and an overall test of mean difference using an ANOVA F-test, among all samples and the subset of never-smokers. Finally, we tested the association between each maternal smoking MRS and smoking phenotypes in mothers, as well as offspring phenotypes using a linear regression model, when applicable, adjusting for the child’s age at each visit. The association results were meta-analyzed for phenotypes with homogeneous effects across the cohorts using a fixed-effect model. An FDR adjustment was used to control the multiple testing of meta-analyzed associations between MRS and 25 (or 23, depending on the number of phenotypes available in the cohort) outcomes, and we considered the association that passed an FDR-adjusted p-value <0.05 to be relevant.

Results

Cohort sample characteristics

The analyses included 763 European mother-child pairs with cord blood DNAm data from the CHILD study (CHILD; n=352)(Subbarao et al., 2015) and The Family Atherosclerosis Monitoring In earLY life (FAMILY; n=411) study (Morrison et al., 2009), and 503 South Asian mother-child pairs from The SouTh Asian biRth cohorT (START) study (Anand et al., 2013). A schematic overview of the analytical flow of the study can be found in Figure 1.

Figure 1. Schematic overview of the analytical pipeline for the cord blood DNA methylation (DNAm) maternal smoking score and association study.

(A) shows the epigenome-wide association studies conducted in the European cohorts (CHILD and FAMILY); (B) illustrates the workflow for methylation risk score (MRS) construction using an external epigenome-wide association study (EWAS) (Joubert et al., 2016) as the discovery sample and the Canadian Healthy Infant Longitudinal Development (CHILD) study as the external validation study, while (C) demonstrates the evaluation of the MRS in two independent cohorts of White European (i.e. FAMILY) and South Asian (i.e. START) origin. The validated MRS was then tested for association with smoking-specific, maternal, and children phenotypes in CHILD, FAMILY, and START, as shown in (D). *indicates cohort sample size including those with missing smoking history.

We observed lower past smoking and missingness on smoking history among pregnant women in START as compared to CHILD or FAMILY using the epigenetic subsample (Table 1) and the overall sample (Supplementary file 1b). Pregnant women in START were significantly different from CHILD or FAMILY in that they were on average younger at delivery, had a lower BMI, and a higher rate of GDM, in line with other cohort studies in South Asian populations (Brydon et al., 2000; Farrar et al., 2015). As compared to START, newborn infants from CHILD and FAMILY had a longer gestational period, a higher birth weight, and a higher BMI at birth (Table 1; Supplementary file 1b). We observed no difference between cohorts in terms of parity or newborn sex in the epigenetic subsample (Table 1). However, self-reported smoking exposure, measured by the number of hours exposed to cigarette smoking per week, was highly skewed and zero-inflated across the three cohorts (Figure 2—source data 1).

Within the European epigenetic subsample, of the 744 mother–newborn pairs with complete smoking history data, 40 (5.3%) newborns were exposed to current maternal smoking, which is below the reported range for the prevalence of smoking during pregnancy (9.2–32.5%) among Canadians (Lange et al., 2018). In addition, mothers who smoked during pregnancy were on average younger, had fewer years of education, and had higher household exposure to smoking (Supplementary file 1e). However, there was no statistically significant difference between newborns exposed to current maternal smoking and those exposed to no or previous smoking in terms of birth weight, birth length, gestational age, or estimated cord blood cell proportions.

Epigenetic association of maternal smoking in White Europeans

The two predominantly White European cohorts, FAMILY (n=397) and CHILD (n=347), contributed to the meta-analysis of maternal smoking for both the primary outcome of current smoking (Figure 1A; Figure 2A) and the secondary outcome of ever smoking (Figure 2—figure supplement 1). The top associated CpGs with current maternal smoking were mapped to the growth factor independent 1 (GFI1) gene on chromosome 1, with cg12876356 as the lead (meta-analyzed effect = –1.11±0.22; meta-analyzed p=2.6 × 10⁻⁶; FDR adjusted p=0.006; Table 2). There were no CpGs associated with the ever-smoker status at an FDR of 0.05, though the top signal (cg09935388) was also mapped to the GFI1 gene (Pearson’s r² correlation with cg12876356=0.75 and 0.68 in CHILD and FAMILY, respectively; Figure 2—figure supplement 1). The top associated CpG from the meta-analysis of smoking exposure (hours per week) in the European-origin cohorts (Figure 2B) was cg01798813 on chromosome 17, which was also associated with maternal smoking and was consistent in the direction of association (meta-analyzed effect = –0.18±0.04; meta-analyzed p=1.4 × 10⁻⁵; FDR adjusted p=0.04; Table 2). There was no noticeable inflation of empirical type I error in the association p-values from the meta-analysis, with the median of the observed association test statistic roughly equal to the expected median (Figure 2—figure supplement 2).

Figure 2. Manhattan plots of the meta-analyzed association between cord blood DNA methylation (DNAm) and maternal smoking in Europeans.

Manhattan plots summarized the meta-analyzed association p-values between cord blood DNA methylation levels and current maternal smoking (A; n = 744) or smoking exposure (B; n = 735) at a common set of 2114 cytosine–phosphate–guanine (CpG) sites. The red line denotes the smallest -log10(p-value) that is below the false discovery rate (FDR) correction threshold of 0.05. The red dots represent established associations with maternal smoking reported by Joubert and colleagues (Joubert et al., 2016).

Figure 2—source data 1. Histogram of the smoking exposure across the three cohorts.

Figure 2—figure supplement 1. Manhattan plots of the meta-analyzed association between cord blood DNA methylation and ever maternal smoking in the combined European cohorts.

The meta-analyzed association p-values for ever maternal smoking (n = 744) and methylation levels at 2114 cytosine–phosphate–guanine (CpG) sites were summarized in the Manhattan plot. Ever maternal smoking compared those who were currently smoking or had quit before or during this pregnancy vs. those who never smoked. The red line denotes the smallest -log10(p-value) that is below the false discovery rate (FDR) correction threshold of 0.05. The red dots represent established associations with maternal smoking reported in Joubert and colleagues (Joubert et al., 2016).
Figure 2—figure supplement 2. Quantile-quantile plots of the meta-analyzed association between cord blood DNA methylation and maternal smoking history and smoking exposure in the combined European cohorts.

Quantile-quantile plots summarized the association p-values between cord blood DNA methylation levels and current maternal smoking (A; n = 744) or ever maternal smoking (B; n = 744) or weekly smoking exposure (C; n = 735) at 2114 cytosine–phosphate–guanine (CpG) sites. The red line (y=x) is the line of reference and the genomic inflation factor, calculated as the ratio between the observed median and the theoretical median of the association test statistics, was annotated for each outcome. The horizontal lines (in A and B only) correspond to the smallest -log10(p-value) that is below the false discovery rate (FDR) correction threshold of 0.05.
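The genomic inflation factor defined in this caption can be sketched as follows. This is a minimal illustration that assumes the association test statistics are available as z-scores, so that their squares follow a chi-square distribution with 1 degree of freedom under the null; the function name is hypothetical and the example input is simulated from exact normal quantiles rather than taken from the study.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364231195724


def genomic_inflation(z_scores):
    """Ratio of the observed median chi-square statistic to its null median."""
    chi2 = [z * z for z in z_scores]
    return statistics.median(chi2) / CHI2_1DF_MEDIAN


# Illustration: z-scores placed at exact normal quantiles give a value near 1.
n = 2001
null_z = [statistics.NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
print(genomic_inflation(null_z))
```

A value close to 1 indicates that the bulk of the test statistics behaves as expected under the null, which is the pattern described for the meta-analysis here.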
Figure 2—figure supplement 3. Regression diagnostic for association between the top cytosine–phosphate–guanine (CpG) (cg09935388) and smoking exposure (n=339) without data transformation in The Canadian Healthy Infant Longitudinal Development (CHILD).

The Residuals vs. Fitted plot shows the residuals on the y-axis and the fitted values on the x-axis, indicating that the departure from linearity (measured by distance from the blue line to each point) was quite severe. The Q-Q plot compares the standardized residuals with the theoretical quantiles from a standard normal distribution, showing that non-normality was largely driven by the three extreme points. The Scale-Location plot shows the square root of the standardized residuals vs. the fitted values and indicates considerable heteroskedasticity. The Residuals vs. Leverage plot compares the residuals against the leverage of each observation, showing that the main outlying points corresponded to the tail of the smoking exposure phenotype (>25 hr/week).
Figure 2—figure supplement 4. Regression diagnostic for association between the top cytosine–phosphate–guanine (CpG) (cg09935388) and smoking exposure (n=396) without data transformation in Family Atherosclerosis Monitoring In early life (FAMILY).

The Residuals vs. Fitted plot shows the residuals on the y-axis and the fitted values on the x-axis, indicating that the departure from linearity (measured by distance from the blue line to each point) was severe. The Q-Q plot compares the standardized residuals with the theoretical quantiles from a standard normal distribution, showing a large number of data points driving the departure from normality. The Scale-Location plot shows the square root of the standardized residuals vs. the fitted values, which suggests considerable heteroskedasticity. The Residuals vs. Leverage plot compares the residuals against the leverage of each observation, showing points with varying levels of leverage.
Figure 2—figure supplement 5. Regression diagnostic for association between the top cytosine–phosphate–guanine (CpG) (cg09935388) and smoking exposure (n=339) under an inverse normal rank transformation in Canadian Healthy Infant Longitudinal Development (CHILD).

The Residuals vs. Fitted plot shows the residuals on the y-axis and the fitted values on the x-axis, indicating some level of departure from linearity (measured by distance from the blue line to each point), which was improved as compared to Figure 2—figure supplement 3. The Q-Q plot compares the standardized residuals with the theoretical quantiles from a standard normal distribution, showing some departure from normality. The Scale-Location plot shows the square root of the standardized residuals vs. the fitted values, which suggests some heteroskedasticity still remained. The Residuals vs. Leverage plot compares the residuals against the leverage of each observation, suggesting influential observations remained but with reduced influence on the model.
Figure 2—figure supplement 6. Regression diagnostic for association between the top cytosine–phosphate–guanine (CpG) (cg09935388) and smoking exposure (n=396) under an inverse rank transformation in Family Atherosclerosis Monitoring In early life (FAMILY).

The Residuals vs. Fitted plot shows the residuals on the y-axis and the fitted values on the x-axis, indicating some level of departure from linearity (measured by distance from the blue line to each point), which was improved as compared to Figure 2—figure supplement 4. The Q-Q plot compares the standardized residuals with the theoretical quantiles from a standard normal distribution, showing some departure from normality. The Scale-Location plot shows the square root of the standardized residuals vs. the fitted values, which suggests some heteroskedasticity still remained. The Residuals vs. Leverage plot compares the residuals against the leverage of each observation, suggesting influential observations remained but with reduced influence on the model.
Figure 2—figure supplement 7. Scatterplots of meta-analyzed association effects for maternal smoking history or smoking exposure and reported effects of maternal smoking.

(A) shows the scatterplot of meta-analyzed effects for maternal smoking (n=744) in the combined Canadian Healthy Infant Longitudinal Development (CHILD) and Family Atherosclerosis Monitoring In early life (FAMILY) cohorts (x-axis) vs. reported effects for maternal smoking in Joubert et al., 2016 (y-axis) for all cytosine–phosphate–guanines (CpGs) present in CHILD, FAMILY, and Joubert et al., 2016 (# CpGs = 128); (B) is the scatterplot of meta-analyzed effects for weekly smoking exposure (n=735) in the combined CHILD and FAMILY cohorts (x-axis) vs. reported effects for maternal smoking in Joubert et al., 2016 (y-axis) for all CpGs present in CHILD, FAMILY, and Joubert et al., 2016 (# CpGs = 128). The solid gray line is the best fitted line using the ordinary least square method (95% confidence interval shown as the shaded area) for the linear relationship between the effect sizes and the dashed gray line represents the reference of y=x.
Figure 2—figure supplement 8. Manhattan plots of the epigenome-wide associations between cord blood DNA methylation (DNAm) and maternal smoking history and smoking exposure in Canadian Healthy Infant Longitudinal Development (CHILD).

Manhattan plots summarized the association p-values between cord blood DNA methylation levels and current maternal smoking (A; n=347) or ever maternal smoking (B; n=347) or weekly smoking exposure (C; n=339) at 200,050 cytosine–phosphate–guanine (CpG) sites. The red line denotes the smallest -log10(p-value) that is below the false discovery rate (FDR) correction threshold of 0.05. The red dots represent established associations with maternal smoking reported in Joubert and colleagues (Joubert et al., 2016).

Table 2. Meta-analysis results of the association between cytosine–phosphate–guanines (CpGs) and maternal smoking and smoking exposure that passed a marginal p<0.05 threshold after the false discovery rate correction in European cohorts.

| Outcome | CHR | Position | CpG | UCSC reference gene | Meta-analysis (CHILD and FAMILY) fixed effect | Meta-analysis standard error | Meta-analysis association p-value | Meta-analysis p-value for effect heterogeneity | Meta-analysis FDR-adjusted association p-value | CHILD association p-value | FAMILY association p-value | Reported association (EWAS catalog) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Maternal smoking | 1 | 92481269 | cg12876356 | GFI1 | –1.11 | 0.22 | 7.33E-07 | 0.51 | 0.0019 | 0.02 | 9.45E-06 | MS; S; AC; BW |
| Maternal smoking | 1 | 92482032 | cg09935388 | GFI1 | –1.15 | 0.24 | 2.26E-06 | 0.52 | 0.0029 | 0.02 | 2.71E-05 | MS; GA; S; AC; BMI; BW |
| Maternal smoking | 1 | 92482405 | cg14179389 | GFI1 | –1.48 | 0.32 | 5.03E-06 | 0.73 | 0.0035 | 0.01 | 1.12E-04 | MS; S |
| Maternal smoking | 1 | 92481144 | cg18146737 | GFI1 | –0.92 | 0.20 | 5.58E-06 | 0.50 | 0.0035 | 0.04 | 3.95E-05 | MS; S; AC; BW |
| Maternal smoking | 1 | 92480576 | cg09662411 | GFI1 | –0.94 | 0.22 | 1.64E-05 | 0.29 | 0.0083 | 0.10 | 3.85E-05 | MS; S |
| Maternal smoking | 1 | 92481479 | cg18316974 | GFI1 | –0.74 | 0.18 | 3.58E-05 | 0.33 | 0.0152 | 0.13 | 7.34E-05 | MS; S; AC; BW |
| Maternal smoking | 17 | 2494783943 | cg01798813 | | –0.83 | 0.21 | 1.09E-04 | 0.34 | 0.0395 | 0.02 | 0.0016 | A; GA; BMI |
| Smoking exposure | 1 | 92482032 | cg09935388 | GFI1 | –0.18 | 0.04 | 1.39E-05 | 0.23 | 0.04 | 0.15 | 2.45E-05 | MS; GA; S; AC; BMI; BW |
| Smoking exposure | 17 | 2494783943 | cg01798813 | | –0.18 | 0.04 | 3.30E-05 | 0.13 | 0.04 | 0.00035 | 0.013 | A; GA; BMI |

MS: maternal smoking; GA: gestational age; AC: alcohol consumption; BMI: body mass index; T2D: type 2 diabetes; A: age; BW: birth weight.

As a sensitivity analysis, we repeated the analysis for the continuous smoking exposure under rank transformation vs. raw phenotype for the associated CpG in GFI1 and examined the regression diagnostics (Figure 2—figure supplements 3–6), and found that the model under rank transformation deviated less from assumptions. Furthermore, we observed consistency in the direction of association for the 128 CpGs that overlapped between our meta-analysis and the 2620 CpGs with evidence of association for maternal smoking (Joubert et al., 2016; Figure 2—figure supplement 7). Specifically, the Pearson’s correlation coefficients for maternal smoking and weekly smoking exposure were 0.72 and 0.60, respectively. The maternal smoking and smoking exposure EWASs in CHILD alone did not yield any CpGs after FDR correction (Figure 2—figure supplement 8).
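The rank transformation used in this sensitivity analysis can be illustrated with a rank-based inverse normal transform. The sketch below is an assumption about the general technique rather than the exact variant used in the study: ties share their average rank, and the `offset` parameter (hypothetical name) selects the rank-to-quantile convention, with 0.5 giving (r − 0.5)/n and 3/8 giving the Blom formula. The example exposure values are made up.

```python
from statistics import NormalDist


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def inverse_normal_rank_transform(values, offset=0.5):
    """Map ranks to standard normal quantiles: (r - offset) / (n - 2*offset + 1)."""
    n = len(values)
    normal = NormalDist()
    return [normal.inv_cdf((r - offset) / (n - 2.0 * offset + 1.0))
            for r in average_ranks(values)]


# Hypothetical weekly exposure hours: many zeros and a long right tail.
hours = [0, 0, 0, 5, 40]
print(inverse_normal_rank_transform(hours))
```

Because only the ordering of the observations is retained, extreme values such as the >25 hr/week tail no longer dominate the fit, which is consistent with the improved diagnostics described above. Heavily tied values (for example, many zeros) still map to a single transformed value.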

MRS captures maternal smoking and smoking exposure

The final MRSs, validated using CHILD European samples (n=347), included 15 and 143 CpG markers (Supplementary file 1g) from the targeted array and the epigenome-wide HM450 array (Figure 1B), respectively. Both produced methylation scores that were significantly associated with maternal smoking history (ANOVA F-test p-values = 1.0 × 10–6 and 2.4×10–14 in CHILD and 3.6×10–16 and <2.2 × 10–16 in FAMILY; Figure 3, Figure 3—figure supplement 1), and the best among alternative scores for CHILD and FAMILY (Supplementary file 1f). With the exception of the air pollution MRS, which only contained 3 CpGs (Supplementary file 1f), all remaining scores were marginally associated with smoking history in both CHILD and FAMILY (Figure 3—figure supplement 1) and correlated with each other (Figure 3—figure supplement 2). In particular, scores that were derived using the Joubert EWAS as the discovery sample, including ours, had higher pairwise correlation coefficients across the birth cohorts, with many of the CpGs mapping to the same genes, such as AHRR, MYO1G, GFI1, CYP1A1, and RUNX3. There was no statistically significant difference in mean between the two scores in any of the three cohorts (two-sample t-test ps >0.6) or among non-smokers (two-sample t-test ps >0.6; Figure 3—figure supplement 3). Since the HM450 score provides statistically more significant results in both CHILD and FAMILY with smoking history, despite the reduction in CpGs included (only 26 out of 143 CpGs present in FAMILY; Supplementary file 1f), we proceeded with the HM450 MRS model constructed using the 143 CpGs in subsequent analyses.
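As an illustration of how a methylation risk score of this kind is applied, the sketch below computes a weighted sum of methylation values over the CpGs shared between a weight set and a sample, and reports how many CpGs contributed (relevant when, as in FAMILY, only a subset of the score CpGs is measured). The weights shown are hypothetical placeholders, not the study's weights (see Supplementary file 1g for those), and the function names are assumptions for this example.

```python
import statistics


def methylation_risk_score(weights, methylation):
    """Weighted sum over CpGs present in both the weight set and the sample.

    weights: dict mapping CpG ID -> weight
    methylation: dict mapping CpG ID -> methylation value for one sample
    Returns (score, number of CpGs used).
    """
    shared = [cpg for cpg in weights if cpg in methylation]
    score = sum(weights[cpg] * methylation[cpg] for cpg in shared)
    return score, len(shared)


def standardize(scores):
    """Center and scale scores to mean 0 and sample standard deviation 1."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return [(s - mean) / sd for s in scores]


# Hypothetical weights; the third CpG ID is a placeholder absent from the sample.
weights = {"cg12876356": -0.5, "cg09935388": -0.25, "cg_placeholder": 1.0}
sample = {"cg12876356": 0.4, "cg09935388": 0.8}
print(methylation_risk_score(weights, sample))
```

Reporting the number of contributing CpGs alongside the score makes it explicit when a score is computed from a reduced marker set.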

Figure 3. Relationships between maternal smoking methylation risk score (MRS) and maternal smoking history categories for Canadian Healthy Infant Longitudinal Development (CHILD) and Family Atherosclerosis Monitoring In early life (FAMILY).

Maternal smoking methylation score (y-axis) was shown as a function of maternal smoking history (x-axis) in levels of severity for prenatal exposure for CHILD (A; n=347), and FAMILY (B; n=397). Each severity level was compared to the never-smoking group and the corresponding two-sample t-test p-value was reported. The analysis of variance via an F-test p-value was used to indicate whether a mean difference in methylation score was present among all smoking history categories. The area under the receiver operating characteristic curve (AUC) for each study was shown in the lower panel.

Figure 3—figure supplement 1. A comparison of results for derived and external maternal smoking methylation risk scores (MRSs).

Maternal smoking methylation score (y-axis) was shown as a function of maternal smoking history (x-axis) in levels of severity ([0]=never smoked; [1]=quit before this pregnancy; [2]=quit during this pregnancy; [3]=currently smoking) for prenatal exposure for each study. The scores shown were validated in (1) Canadian Healthy Infant Longitudinal Development (CHILD; n=347), (2) CHILD but restricted to cytosine–phosphate–guanines (CpGs) that were also present on the targeted array, (3) Family Atherosclerosis Monitoring In early life (FAMILY; n=397) using CpGs on the targeted array. Each severity level was compared to the never-smoking group and the corresponding two-sample t-test p-value was reported. An omnibus test p-value was used to indicate whether a mean difference in methylation score was present among all smoking history categories.
Figure 3—figure supplement 2. A heatmap of correlation between derived and external maternal smoking methylation risk scores (MRSs).

This heatmap illustrates the pairwise correlation between MRSs calculated in (A) CHILD (n=352), (B) FAMILY (n=411), and (C) START (n=504). Each cell represents the correlation coefficient, ranging from –1 to 1, indicating the strength and direction of the association. A value of 1 signifies a perfect positive correlation, while –1 indicates a perfect negative correlation. Values closer to 0 suggest no correlation. The color gradient from deep blue (strong negative correlation), through white (no correlation), to deep red (strong positive correlation), visually encodes the strength of these relationships. The scores in the black box were derived using lassosum and internally validated. Note that these sample sizes included those with missing smoking history.
Figure 3—figure supplement 3. Comparison of all methylation scores stratified by study.

The boxplots captured the standardized maternal smoking methylation scores (y-axis) stratified by study. The top panels summarized results for all samples in Canadian Healthy Infant Longitudinal Development (CHILD; n=352), Family Atherosclerosis Monitoring In early life (FAMILY; n=411), and SouTh Asian biRth cohorT (START; n=504), while the bottom panels summarized results for only those in CHILD, FAMILY, and START that never smoked. The p-values indicate the significance for a mean difference for each pairwise comparison between the HM450K score validated in CHILD with other scores using two-sample t-tests.

The HM450 MRS was significantly associated with maternal smoking history in CHILD (n=347) and FAMILY (n=397), but, not surprisingly, we failed to meaningfully validate the association in START (n=503) due to the low number of ever-smokers (n=2). A weak dose-dependent relationship between the MRS and the four categories of maternal smoking status in the severity of exposure ([0]=never smoked; [1]=quit before this pregnancy; [2]=quit during this pregnancy; [3]=currently smoking) was present in CHILD but was not replicated in FAMILY (Figure 3). The AUCs for detecting current smokers were 0.95 (95% CI: 0.89–1) and 0.89 (95% CI: 0.83–0.94) in CHILD and FAMILY (Figure 3), respectively, while the AUCs for detecting ever-smokers were 0.61 (95% CI: 0.54–0.67), 0.60 (95% CI: [0.55,0.69]; Supplementary file 1f), and 0.82 (95% CI: [0.55,1]; Figure 3), respectively. As a result, the epigenetic maternal smoking score was strongly associated with smoking status during pregnancy (OR = 1.09, 95% CI: [1.07,1.10], p=1.96 × 10–32) in the combined European cohorts. Meanwhile, the maternal smoking MRS was significantly associated with an increased number of hours exposed to smoking per week in the two White European cohorts (1.93±0.33 hr per 1 unit of increase in MRS, FDR adjusted p=1.2 × 10–7; Supplementary file 1h; cohort-specific p=5.4 × 10–5 in CHILD and p=2.3 × 10–5 in FAMILY; Table 3), but not in the South Asian birth cohort (p=0.58; Table 3).
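The AUC values above can be read as the probability that a randomly chosen smoker-exposed newborn has a higher score than a randomly chosen unexposed newborn. A minimal sketch of this rank-based (Mann-Whitney) definition follows; the function name and the example scores are hypothetical, and the study's own AUC and confidence interval computations may have relied on dedicated ROC software.

```python
def auc(scores_cases, scores_controls):
    """Probability that a random case outscores a random control.

    Ties between a case and a control count as one half.
    """
    wins = 0.0
    for case in scores_cases:
        for control in scores_controls:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(scores_cases) * len(scores_controls))


# Hypothetical standardized scores (illustration only).
current_smokers = [1.8, 2.4, 0.9]
non_smokers = [-0.3, 0.1, 1.0, -1.2, 0.4]
print(auc(current_smokers, non_smokers))
```

With very few cases, as for ever-smokers in START (n=2), the estimate rests on a small number of pairwise comparisons, which is reflected in the wide confidence interval reported above.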

Table 3. Significant associations between maternal smoking methylation risk score and phenotypes in CHILD, FAMILY, and START.

| Phenotype | CHILD fixed effect | CHILD standard error | CHILD association p-value | FAMILY fixed effect | FAMILY standard error | FAMILY association p-value | START fixed effect | START standard error | START association p-value |
|---|---|---|---|---|---|---|---|---|---|
| Smoking exposure (hr/week) | 1.64 | 0.40 | 5.40E-05 | 2.58 | 0.60 | 2.34E-05 | 0.07 | 0.12 | 0.58 |
| 1 year Smoking exposure (hr/week) | 0.44 | 0.15 | 0.0044 | | | | | | |
| 3 year Smoking exposure (hr/week) | | | | 1.15 | 0.39 | 0.0033 | | | |
| Gestational weight gain (kg) | –0.36 | 0.38 | 0.35 | –0.62 | 0.26 | 0.017 | –0.14 | 0.34 | 0.69 |
| Gestational age (weeks) | 1.64 | 0.40 | 6.32E-05 | 2.84 | 0.62 | 5.52E-06 | 0.07 | 0.12 | 0.59 |
| Birth weight (kg) | –0.06 | 0.03 | 0.016 | –0.04 | 0.02 | 0.096 | –0.03 | 0.02 | 0.094 |
| Birth length (cm) | –0.14 | 0.15 | 0.35 | –0.10 | 0.10 | 0.33 | –0.37 | 0.12 | 0.0023 |
| 1 year Height (cm) | –0.32 | 0.16 | 0.047 | –0.34 | 0.14 | 0.019 | –0.42 | 0.16 | 0.0079 |
| 2 year Height (cm) | –0.13 | 0.35 | 0.72 | –0.26 | 0.17 | 0.14 | –0.57 | 0.21 | 0.0067 |
| 5 year Height (cm) | –0.36 | 0.26 | 0.16 | –0.43 | 0.26 | 0.095 | –0.47 | 0.37 | 0.21 |
| 3 year Skinfold thickness | 0.48 | 0.19 | 0.014 | 0.94 | 0.26 | 3.46E-04 | 0.24 | 0.27 | 0.38 |
| 5 year Skinfold thickness | 0.56 | 0.24 | 0.019 | 0.68 | 0.37 | 0.068 | 0.12 | 0.42 | 0.77 |

Among individuals who had never smoked, no statistically significant mean difference was observed in the distribution of the combined methylation score between South Asian and European cohorts (Supplementary file 1i). These results provided empirical support for the portability of a European-derived maternal smoking methylation score to South Asian populations.

Association between MRS and other phenotypes

We observed several notable associations with child outcomes (Figure 1C). The maternal smoking MRS was consistently associated with increasing weekly smoking exposure in children reported by mothers at the 1-year visit (0.44±0.15, p=0.0044; Table 3) in CHILD, and at the 3-year visit (0.86±0.26, p=0.0037; Table 3) in FAMILY, but not in START as all mothers reported non-exposure to smoking in children. A higher maternal smoking MRS was significantly associated with smaller birth size (–0.37±0.12, p=0.0023; Table 3) and height at 1-, 2-, and 5-year visits in the South Asian cohort (Table 3). We observed similar associations with body size in the White European cohorts (heterogeneity p-values >0.2); collectively, the MRS was associated with a smaller birth size (–0.22±0.07, p=0.0016; FDR adjusted p=0.019; Supplementary file 1h) in the combined European and South Asian cohorts. Meanwhile, a higher maternal smoking MRS was also associated with a lower birth weight (–0.043±0.013, p=0.001; FDR adjusted p=0.011; Supplementary file 1h) in the combined sample, though the effect was weaker in START (–0.03±0.02; p=0.094; Table 3) as compared to the White European cohorts.

The meta-analysis revealed no heterogeneity in either the direction or the effect size of associations for body size and weight between populations at birth or at later visits (heterogeneity p-values = 0.16–1; Supplementary file 1h). The association between the MRS and several child phenotypes, including height or length, weight, and skinfolds, appeared to persist with similar estimated effects throughout early developmental years (Supplementary file 1h), although the most significant effects were at birth and the significance attenuated at later visits. We did not find any association with self-reported allergy or asthma in children at later visits (Supplementary file 1h). Furthermore, there was no evidence of an association between the MRS and any maternal outcomes (Supplementary file 1h).

Discussion

We examined the epigenetic signature of maternal smoking and smoking exposure using newborn cord blood samples from predominantly European-origin and South Asian cohorts via two strategies: an individual CpG-level EWAS approach, and a multivariate approach in the form of a methylation score. The EWAS results replicated the association between maternal smoking and CpGs in the GFI1 gene that is well described in the literature with respect to smoking (Joehanes et al., 2016; Sikdar et al., 2019), maternal smoking (Joubert et al., 2016; Hannon et al., 2019; Markunas et al., 2014; Richmond et al., 2015), and birth weight (Küpers et al., 2015). Using the latter approach, the methylation score, we observed a significant association with maternal smoking history and smoking exposure in European-origin newborns. Furthermore, we noted a weak dose-dependent relationship between maternal smoking history and the methylation score in one European cohort (CHILD) but this was not replicated in the other (FAMILY). Since the timing and duration of maternal smoking during pregnancy were not directly available, these differences could play a role in the magnitude and specificity of DNA methylation changes in cord blood. Finally, the significant association of the MRS with the newborn health metrics in START, in the absence of mothers’ active smoking, could be the result of underreporting of smoking, poor recall of the time of quitting, and/or due to air pollution exposure (Rider and Carlsten, 2019), leading to oxidative stress. This suggests that our cord blood DNAm signature of maternal smoking is perhaps not unique to cigarette smoking, but captures similar biochemical responses, for example, via the aryl hydrocarbon receptor (Vogel et al., 2020; Reynolds et al., 2015). Our observation that a higher MRS was associated with lower birth weight and smaller birth length in both ethnic populations is thus consistent with the established link between oxidative stress and metabolic syndrome (Roberts and Sindhu, 2009).

Contrary to DNA methylation studies of smoking in adults, where whole blood is often used as a proxy tissue, there are multiple relevant tissues for maternal smoking during pregnancy, including the placenta of the mother, newborn cord blood, and children’s whole blood. However, methylation changes measured in whole blood or placenta of the mother, or cord blood of infants showed substantially different patterns of association signals (Everson et al., 2021). There are several advantages of using a cord blood-based biomarker from the DoHaD perspective. Firstly, cord blood provides a direct reflection of the in-utero environment and fetal exposure to maternal smoking. Additionally, since cord blood is collected at birth, it eliminates potential confounding factors such as postnatal exposures that may affect maternal blood samples. Furthermore, studying cord blood DNAm allows for the assessment of epigenetic changes specifically relevant to the newborn, offering valuable information on the potential long-term health implications. Meanwhile, methylation signals are known to be tissue-specific, thus it would be of interest for future research to combine differential methylation patterns from all relevant tissues to assess the immediate and long-term effects of maternal smoking. Another direction to further this line of research is to explore postnatal factors that mitigate prenatal exposures, for example, breastfeeding, which has been shown to have a protective effect against maternal tobacco smoking (Moshammer and Hutter, 2019). Indeed, more research is necessary to understand the critical periods of exposure and the dose-response relationship between maternal smoking and cord blood DNA methylation changes. Ongoing efforts to monitor the offspring and collect data in the next decade are in progress to establish the long-term association between maternal smoking and cardio-metabolic health (Morrison et al., 2009; Anand et al., 2013). As such, the constructed MRS can facilitate future research in child health and will be included as part of the generated data for others to access.

The strengths of this report include ethnic diversity, and fine phenotyping in a prospective and harmonized way with follow-up at multiple early childhood stages. This work is the first major multi-ancestry study that utilizes methylation scores to study maternal smoking and examines their portability from European-origin populations to South Asians. The use of MRS, as compared to individual CpGs, is a powerful tool to systematically investigate the influence of DNA methylation changes and whether they have lasting functional consequences on health outcomes. Our results converge with previous findings that epigenetic associations of maternal smoking are associated with newborn health, and add to the small body of evidence that these relationships extend to non-European populations and that different ancestral populations can experience the early developmental periods differently.

A few limitations should be mentioned. In the context of existing epigenetic studies of maternal smoking, we were not able to replicate signals in other well-reported genes such as AHRR, CYP1A1, and MYO1G; however, the MRS was able to pick up signals from these genes (Supplementary file 1g). This could be due to several reasons. First, the customized array with a limited number of CpGs (<3000) was designed in 2016 and many large EWASs on smoking and maternal smoking conducted more recently had not been included. Nonetheless, we have shown that from a multivariate perspective, the MRS constructed using a targeted approach that was carefully designed can be equally powerful with the advantage of being cost-effective. Second, contrary to existing EWASs where the methylation values are typically treated as the outcome, and the exposure, such as smoking, as the predictor, we reversed the regression such that the methylation levels were the predictors and smoking exposure was the outcome. This reverse regression approach is robust and our choice to reverse the regression was motivated by the goal of constructing a smoking score that combines the additive effects at multiple CpGs, which would otherwise be unfeasible. Third, systematic ancestral differences in DNA methylation patterns have been shown to vary at individual CpGs in terms of their association with smoking (Elliott et al., 2014). Converging with this conclusion, we also found the association with GFI1 to be most consistent after adjusting for cell composition. Fourth, while it would be of interest to examine a broader range of health outcomes in children, such as lung health and allergies, we were unable to acquire and standardize this information across different cohorts. This aspect should be considered in future study designs. Fifth, maternal smoking is often associated with other confounding factors, such as socioeconomic status, other lifestyle behaviors, and environmental exposures. While we have done our best to control for well-known confounders that were available by study design, as in all observational studies, we could not account for unknown confounding effects. Finally, in recent years, maternal smoking has been on a decline as a result of changes in social norms and public health policies (Martin et al., 2023). This is also consistent with the lower smoking rates observed in our European cohorts (CHILD and FAMILY). Given the proportion of current smokers, the effective sample sizes for a direct comparison between CHILD and FAMILY, i.e., the equivalently-powered sample sizes of a balanced (50% cases, 50% controls) design, were 41.7 and 104.7, respectively. While CHILD had a lower effective sample size, we ultimately chose it for validating the methylation score to better cover the CpGs that were significant in the discovery EWAS. A larger validation study will likely further boost the performance of the methylation score and should be considered in future research.
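For readers unfamiliar with the notion of an effective sample size for an unbalanced case-control comparison, a commonly used convention is N_eff = 4 / (1/N_cases + 1/N_controls), which equals the total sample size when cases and controls are balanced. The sketch below illustrates that convention only; the text does not state the exact formula or case counts behind the values of 41.7 and 104.7, so the function name and the example counts are hypothetical and are not expected to reproduce those figures.

```python
def effective_sample_size(n_cases, n_controls):
    """Size of a balanced (50% cases, 50% controls) design with equivalent power.

    Uses the common convention N_eff = 4 / (1/n_cases + 1/n_controls).
    """
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


# Hypothetical counts: a balanced design and a heavily unbalanced one.
print(effective_sample_size(50, 50))
print(effective_sample_size(10, 90))
```

Under this convention, a cohort with few current smokers contributes far less information than its total size suggests, which is the point made above about the relative power of CHILD and FAMILY.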

In conclusion, the epigenetic maternal smoking score we constructed was strongly associated with smoking status during pregnancy and self-reported smoking exposure in White Europeans, and with smaller birth size and lower birth weight in the combined South Asian and White European cohorts. The proposed cord blood epigenetic signature of maternal smoking has the potential to identify newborns who were exposed to maternal smoking in utero and to assess the long-term impact of smoking exposure on offspring health. In South Asian mothers with minimal smoking behavior, the relationship between the methylation score and negative health outcomes in newborns is still apparent, indicating that DNA methylation response is sensitive to smoking exposure, even in the absence of active smoking.

Acknowledgements

We express our sincere gratitude to all the participating families and the START, FAMILY, and CHILD study teams, including interviewers, nurses, computer and laboratory technicians, clerical workers, research scientists, volunteers, managers, and receptionists.

We would like to acknowledge the Genetic and Molecular Epidemiology Laboratory (GMEL), an associate of Hamilton Health Sciences and McMaster University, for their indispensable contributions to this work. The technical staff of GMEL conducted all epigenetic profiling, including sample processing and other technical operations.

We thank the members of the Nutrigen Alliance for providing the data: Sonia S Anand; Stephanie A Atkinson; Meghan Azad; Allan B Becker; Jeffrey Brook; Judah A Denburg; Dipika Desai; Russell J de Souza; Milan K Gupta; Michael Kobor; Diana L Lefebvre; Wendy Lou; Piushkumar J Mandhane; Sarah McDonald; Andrew Mente; David Meyre; Theo J Moraes; Katherine M Morrison; Guillaume Paré; Malcolm R Sears; Padmaja Subbarao; Koon K Teo; Stuart E Turvey; Julie Wilson; Salim Yusuf; Gita Wahi; Michael A Zulyniak.

This study was funded by the Canadian Institutes of Health Research Metabolomics Team Grant: MWG-146332. Dr. Anand is supported by a Tier 1 Canada Research Chair in Ethnicity and CVD and Heart, Stroke Foundation Chair in Population Health, a grant from the Canadian Partnership Against Cancer, Heart and Stroke Foundation of Canada and Canadian Institutes of Health Research. Dr. Azad is supported by a Tier 2 Canada Research Chair in the Developmental Origins of Chronic Disease.

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Wei Q Deng, Email: dengwq@mcmaster.ca.

Sonia S Anand, Email: anands@mcmaster.ca.

Joris Deelen, Max Planck Institute for Biology of Ageing, Germany.

Carlos Isales, Augusta University, United States.

Funding Information

This paper was supported by the following grants:

  • Canadian Institutes of Health Research MWG-146332 to Russell J de Souza, Katherine M Morrison, Stephanie A Atkinson, Padmaja Subbarao, Koon K Teo, Guillaume Paré, Sonia S Anand.

  • Canadian Institutes of Health Research Tier 1 Canada Research Chair in Ethnicity and CVD and Heart, Stroke Foundation Chair in Population H to Sonia S Anand.

Additional information

Competing interests

No competing interests declared.

Author contributions

Conceptualization, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing – original draft, Writing – review and editing.

Data curation, Validation, Methodology, Writing – review and editing.

Data curation, Validation, Investigation, Project administration, Writing – review and editing.

Validation, Investigation, Project administration, Writing – review and editing.

Funding acquisition, Validation, Investigation, Writing – review and editing.

Validation, Investigation, Writing – review and editing.

Data curation, Funding acquisition, Validation, Investigation, Writing – review and editing.

Data curation, Funding acquisition, Validation, Investigation, Writing – review and editing.

Data curation, Funding acquisition, Validation, Investigation, Writing – review and editing.

Data curation, Validation, Investigation, Writing – review and editing.

Data curation, Validation, Investigation, Writing – review and editing.

Data curation, Funding acquisition, Validation, Investigation, Writing – review and editing.

Data curation, Validation, Investigation, Writing – review and editing.

Data curation, Validation, Investigation, Writing – review and editing.

Data curation, Validation, Investigation, Writing – review and editing.

Conceptualization, Resources, Data curation, Funding acquisition, Validation, Investigation, Methodology, Writing – review and editing.

Conceptualization, Resources, Data curation, Funding acquisition, Validation, Investigation, Project administration, Writing – review and editing.

Ethics

Human subjects: Ethical approval was obtained independently for each study from the Hamilton Integrated Research Ethics Board (HiREB): CHILD (REB 07-2929), FAMILY (REB 02-060), and START (REB 10-640). CHILD was additionally approved by the respective Human Research Ethics Boards at McMaster University, the Universities of Manitoba, Alberta, and British Columbia, and the Hospital for Sick Children. Written informed consent was obtained from the legal guardian of each participant (the participating mother) for each study separately. We have also obtained additional ethics board approval from HiREB (REB 16592) to use the data from the three cohorts together without additional consent from the participants.

Additional files

Supplementary file 1. Additional tables and summaries of results.

(A) Quality controls for the inclusion/exclusion of samples and methylation probes. (B) Characteristics of the overall sample, which includes 5176 mother–newborn pairs from the Canadian Healthy Infant Longitudinal Development (CHILD), Family Atherosclerosis Monitoring In early life (FAMILY), and SouTh Asian biRth cohorT (START) cohorts. (C) A summary of available analyses and outcome variables in each cohort. (D) A summary of the DNA methylation (DNAm) maternal smoking score derivation design and results. (E) Characteristics of the epigenetic subsample from CHILD and FAMILY cohorts stratified by smoking status. (F) Score weights for external DNAm maternal smoking scores. (G) Summary of cytosine–phosphate–guanines (CpGs) that contribute to the DNAm maternal smoking scores and their weights. (H) Association between maternal smoking methylation risk score and phenotypes in CHILD, FAMILY, and START. (I) Summary of mean difference in methylation risk scores between studies in overall samples and in those who never smoked.

elife-93260-supp1.xlsx (80.9KB, xlsx)

Data availability

The summary statistics used to construct methylation risk scores are available from the EWAS Catalog at http://www.ewascatalog.org/?trait=maternal%20smoking%20in%20pregnancy with additional filters of PMID 27040690 and the analysis “Sustained maternal smoking in pregnancy effect on newborns adjusted for cell composition”. Summary statistics generated in the current study, comprising seven primary association studies (three smoking phenotypes in the two European cohorts and smoking exposure in the South Asian cohort) and three sets of meta-analyzed results in Europeans, are available from the Zenodo repository (10.5281/zenodo.13286433). All scripts to reproduce and validate the predictive model can be found at https://github.com/WeiAkaneDeng/EpigeneticResearch/tree/WeiAkaneDeng-patch-1/MaternalSmoking (copy archived at Deng, 2024).
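As a minimal sketch of the filtering step described above, the following Python snippet selects rows from a tab-separated export by PMID and analysis label. The column names (`cpg`, `pmid`, `analysis`, `beta`, `p`) and the example rows are assumptions for illustration only; they should be checked against the header of the file actually downloaded, and this is not the authors' script.

```python
import csv
import io

# Hypothetical column names ("cpg", "pmid", "analysis", "beta", "p"); check them
# against the header of the file actually downloaded from the EWAS Catalog.
TARGET_PMID = "27040690"
TARGET_ANALYSIS = ("Sustained maternal smoking in pregnancy effect on "
                   "newborns adjusted for cell composition")


def filter_summary_stats(handle, pmid=TARGET_PMID, analysis=TARGET_ANALYSIS):
    """Return rows of a tab-separated export matching one study and analysis."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        row for row in reader
        if row["pmid"].strip() == pmid and row["analysis"].strip() == analysis
    ]


# Made-up rows for illustration only; these are not real summary statistics.
example = io.StringIO(
    "cpg\tpmid\tanalysis\tbeta\tp\n"
    "cg_example_1\t27040690\t" + TARGET_ANALYSIS + "\t-0.01\t1e-10\n"
    "cg_example_2\t27040690\tSome other analysis\t0.02\t1e-08\n"
    "cg_example_3\t12345678\t" + TARGET_ANALYSIS + "\t0.03\t1e-09\n"
)
kept = filter_summary_stats(example)
print([row["cpg"] for row in kept])
```

With a real download, the in-memory `example` would be replaced by an opened file handle; the exact-match comparison on the analysis label is deliberately strict so that similarly named analyses are not mixed in.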

The following dataset was generated:

Deng W, Anand S. 2024. Maternal smoking DNA methylation risk score associated with health outcomes in offspring of European and South Asian ancestry. Zenodo.

References

  1. Akhabir L, Stringer R, Desai D, Mandhane PJ, Azad MB, Moraes TJ, Subbarao P, Turvey SE, Paré G, Anand SS, NutriGen Alliance DNA methylation changes in cord blood and the developmental origins of health and disease - a systematic review and replication study. BMC Genomics. 2022;23:221. doi: 10.1186/s12864-022-08451-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Anand SS, Razak F, Davis AD, Jacobs R, Vuksan V, Teo K, Yusuf S. Social disadvantage and cardiovascular disease: development of an index and analysis of age, sex, and ethnicity effects. International Journal of Epidemiology. 2006;35:1239–1245. doi: 10.1093/ije/dyl163. [DOI] [PubMed] [Google Scholar]
  3. Anand SS, Vasudevan A, Gupta M, Morrison K, Kurpad A, Teo KK, Srinivasan K, START Cohort Study Investigators. Rationale and design of South Asian Birth Cohort (START): A Canada-India collaborative study. BMC Public Health. 2013;13:79. doi: 10.1186/1471-2458-13-79. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Bibikova M, Barnes B, Tsan C, Ho V, Klotzle B, Le JM, Delano D, Zhang L, Schroth GP, Gunderson KL, Fan JB, Shen R. High density DNA methylation array with single CpG site resolution. Genomics. 2011;98:288–295. doi: 10.1016/j.ygeno.2011.07.007. [DOI] [PubMed] [Google Scholar]
  5. Bollepalli S, Korhonen T, Kaprio J, Anders S, Ollikainen M. EpiSmokEr: A robust classifier to determine smoking status from DNA methylation data. Epigenomics. 2019;11:1469–1486. doi: 10.2217/epi-2019-0206. [DOI] [PubMed] [Google Scholar]
  6. Brydon P, Smith T, Proffitt M, Gee H, Holder R, Dunne F. Pregnancy outcome in women with type 2 diabetes mellitus needs to be addressed. International Journal of Clinical Practice. 2000;54:418–419. [PubMed] [Google Scholar]
  7. Cardenas A, Lutz SM, Everson TM, Perron P, Bouchard L, Hivert MF. Mediation by placental DNA methylation of the association of prenatal maternal smoking and birth weight. American Journal of Epidemiology. 2019;188:1878–1886. doi: 10.1093/aje/kwz184. [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Chen Y, Lemire M, Choufani S, Butcher DT, Grafodatskaya D, Zanke BW, Gallinger S, Hudson TJ, Weksberg R. Discovery of cross-reactive probes and polymorphic CpGs in the Illumina Infinium HumanMethylation450 microarray. Epigenetics. 2013;8:203–209. doi: 10.4161/epi.23470. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Choquet H, Yin J, Jorgenson E. Cigarette smoking behaviors and the importance of ethnicity and genetic ancestry. Translational Psychiatry. 2021;11:120. doi: 10.1038/s41398-021-01244-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Deng WQ. EpigeneticResearch/maternalsmoking. swh:1:rev:3c5cd0fad5e42e72a599a1926da37063176a0745. Software Heritage. 2024. https://archive.softwareheritage.org/swh:1:dir:6b71e97fc9bcadb070f1f18b3297d1e57bcd957d;origin=https://github.com/WeiAkaneDeng/EpigeneticResearch;visit=swh:1:snp:57fc7320b0eba85c97016857ff09f48304a30e42;anchor=swh:1:rev:3c5cd0fad5e42e72a599a1926da37063176a0745
  11. de Souza RJ, Zulyniak MA, Desai D, Shaikh MR, Campbell NC, Lefebvre DL, Gupta M, Wilson J, Wahi G, Atkinson SA, Teo KK, Subbarao P, Becker AB, Mandhane PJ, Turvey SE, Sears MR, Anand SS, NutriGen Alliance Investigators Harmonization of food-frequency questionnaires and dietary pattern analysis in 4 ethnically diverse birth cohorts. The Journal of Nutrition. 2016;146:2343–2350. doi: 10.3945/jn.116.236729. [DOI] [PubMed] [Google Scholar]
  12. Elliott HR, Tillin T, McArdle WL, Ho K, Duggirala A, Frayling TM, Davey Smith G, Hughes AD, Chaturvedi N, Relton CL. Differences in smoking associated DNA methylation patterns in South Asians and Europeans. Clinical Epigenetics. 2014;6:4. doi: 10.1186/1868-7083-6-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  13. Elliott HR, Burrows K, Min JL, Tillin T, Mason D, Wright J, Santorelli G, Davey Smith G, Lawlor DA, Hughes AD, Chaturvedi N, Relton CL. Characterisation of ethnic differences in DNA methylation between UK-resident South Asians and Europeans. Clinical Epigenetics. 2022;14:130. doi: 10.1186/s13148-022-01351-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  14. England LJ, Grauman A, Qian C, Wilkins DG, Schisterman EF, Yu KF, Levine RJ. Misclassification of maternal smoking status and its effects on an epidemiologic study of pregnancy outcomes. Nicotine & Tobacco Research. 2007;9:1005–1013. doi: 10.1080/14622200701491255. [DOI] [PubMed] [Google Scholar]
  15. Everson TM, Vives-Usano M, Seyve E, Cardenas A, Lacasaña M, Craig JM, Lesseur C, Baker ER, Fernandez-Jimenez N, Heude B, Perron P, Gónzalez-Alzaga B, Halliday J, Deyssenroth MA, Karagas MR, Íñiguez C, Bouchard L, Carmona-Sáez P, Loke YJ, Hao K, Belmonte T, Charles MA, Martorell-Marugán J, Muggli E, Chen J, Fernández MF, Tost J, Gómez-Martín A, London SJ, Sunyer J, Marsit CJ, Lepeule J, Hivert MF, Bustamante M. Placental DNA methylation signatures of maternal smoking during pregnancy and potential impacts on fetal growth. Nature Communications. 2021;12:5095. doi: 10.1038/s41467-021-24558-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Farrar D, Fairley L, Santorelli G, Tuffnell D, Sheldon TA, Wright J, van Overveld L, Lawlor DA. Association between hyperglycaemia and adverse perinatal outcomes in South Asian and white British women: analysis of data from the Born in Bradford cohort. The Lancet Diabetes & Endocrinology. 2015;3:795–804. doi: 10.1016/S2213-8587(15)00255-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  17. Gervin K, Salas LA, Bakulski KM, van Zelm MC, Koestler DC, Wiencke JK, Duijts L, Moll HA, Kelsey KT, Kobor MS, Lyle R, Christensen BC, Felix JF, Jones MJ. Systematic evaluation and validation of reference and library selection methods for deconvolution of cord blood DNA methylation data. Clinical Epigenetics. 2019;11:125. doi: 10.1186/s13148-019-0717-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  18. Gondalia R, Baldassari A, Holliday KM, Justice AE, Méndez-Giráldez R, Stewart JD, Liao D, Yanosky JD, Brennan KJM, Engel SM, Jordahl KM, Kennedy E, Ward-Caviness CK, Wolf K, Waldenberger M, Cyrys J, Peters A, Bhatti P, Horvath S, Assimes TL, Pankow JS, Demerath EW, Guan W, Fornage M, Bressler J, North KE, Conneely KN, Li Y, Hou L, Baccarelli AA, Whitsel EA. Methylome-wide association study provides evidence of particulate matter air pollution-associated DNA methylation. Environment International. 2019;132:104723. doi: 10.1016/j.envint.2019.03.071. [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Guida F, Sandanger TM, Castagné R, Campanella G, Polidoro S, Palli D, Krogh V, Tumino R, Sacerdote C, Panico S, Severi G, Kyrtopoulos SA, Georgiadis P, Vermeulen RCH, Lund E, Vineis P, Chadeau-Hyam M. Dynamics of smoking-induced genome-wide methylation changes with time since smoking cessation. Human Molecular Genetics. 2015;24:2349–2359. doi: 10.1093/hmg/ddu751. [DOI] [PMC free article] [PubMed] [Google Scholar]
  20. Hannon E, Schendel D, Ladd-Acosta C, Grove J, Hansen CS, Hougaard DM, Bresnahan M, Mors O, Hollegaard MV, Bækvad-Hansen M, Hornig M, Mortensen PB, Børglum AD, Werge T, Pedersen MG, Nordentoft M, Buxbaum JD, Daniele Fallin M, Bybjerg-Grauholm J, Reichenberg A, Mill J, iPSYCH-Broad ASD Group Variable DNA methylation in neonates mediates the association between prenatal smoking and birth weight. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences. 2019;374:20180120. doi: 10.1098/rstb.2018.0120. [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Hillary RF, McCartney DL, McRae AF, Campbell A, Walker RM, Hayward C, Horvath S, Porteous DJ, Evans KL, Marioni RE. Identification of influential probe types in epigenetic predictions of human traits: implications for microarray design. Clinical Epigenetics. 2022;14:100. doi: 10.1186/s13148-022-01320-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. Joehanes R, Just AC, Marioni RE, Pilling LC, Reynolds LM, Mandaviya PR, Guan W, Xu T, Elks CE, Aslibekyan S, Moreno-Macias H, Smith JA, Brody JA, Dhingra R, Yousefi P, Pankow JS, Kunze S, Shah SH, McRae AF, Lohman K, Sha J, Absher DM, Ferrucci L, Zhao W, Demerath EW, Bressler J, Grove ML, Huan T, Liu C, Mendelson MM, Yao C, Kiel DP, Peters A, Wang-Sattler R, Visscher PM, Wray NR, Starr JM, Ding J, Rodriguez CJ, Wareham NJ, Irvin MR, Zhi D, Barrdahl M, Vineis P, Ambatipudi S, Uitterlinden AG, Hofman A, Schwartz J, Colicino E, Hou L, Vokonas PS, Hernandez DG, Singleton AB, Bandinelli S, Turner ST, Ware EB, Smith AK, Klengel T, Binder EB, Psaty BM, Taylor KD, Gharib SA, Swenson BR, Liang L, DeMeo DL, O’Connor GT, Herceg Z, Ressler KJ, Conneely KN, Sotoodehnia N, Kardia SLR, Melzer D, Baccarelli AA, van Meurs JBJ, Romieu I, Arnett DK, Ong KK, Liu Y, Waldenberger M, Deary IJ, Fornage M, Levy D, London SJ. Epigenetic signatures of cigarette smoking. Circulation. Cardiovascular Genetics. 2016;9:436–447. doi: 10.1161/CIRCGENETICS.116.001506. [DOI] [PMC free article] [PubMed] [Google Scholar]
  23. Joubert BR, Felix JF, Yousefi P, Bakulski KM, Just AC, Breton C, Reese SE, Markunas CA, Richmond RC, Xu CJ, Küpers LK, Oh SS, Hoyo C, Gruzieva O, Söderhäll C, Salas LA, Baïz N, Zhang H, Lepeule J, Ruiz C, Ligthart S, Wang T, Taylor JA, Duijts L, Sharp GC, Jankipersadsing SA, Nilsen RM, Vaez A, Fallin MD, Hu D, Litonjua AA, Fuemmeler BF, Huen K, Kere J, Kull I, Munthe-Kaas MC, Gehring U, Bustamante M, Saurel-Coubizolles MJ, Quraishi BM, Ren J, Tost J, Gonzalez JR, Peters MJ, Håberg SE, Xu Z, van Meurs JB, Gaunt TR, Kerkhof M, Corpeleijn E, Feinberg AP, Eng C, Baccarelli AA, Benjamin Neelon SE, Bradman A, Merid SK, Bergström A, Herceg Z, Hernandez-Vargas H, Brunekreef B, Pinart M, Heude B, Ewart S, Yao J, Lemonnier N, Franco OH, Wu MC, Hofman A, McArdle W, Van der Vlies P, Falahi F, Gillman MW, Barcellos LF, Kumar A, Wickman M, Guerra S, Charles MA, Holloway J, Auffray C, Tiemeier HW, Smith GD, Postma D, Hivert MF, Eskenazi B, Vrijheid M, Arshad H, Antó JM, Dehghan A, Karmaus W, Annesi-Maesano I, Sunyer J, Ghantous A, Pershagen G, Holland N, Murphy SK, DeMeo DL, Burchard EG, Ladd-Acosta C, Snieder H, Nystad W, Koppelman GH, Relton CL, Jaddoe VWV, Wilcox A, Melén E, London SJ. DNA methylation in newborns and maternal smoking in pregnancy: genome-wide consortium meta-analysis. American Journal of Human Genetics. 2016;98:680–696. doi: 10.1016/j.ajhg.2016.02.019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  24. Küpers LK, Xu X, Jankipersadsing SA, Vaez A, la Bastide-van Gemert S, Scholtens S, Nolte IM, Richmond RC, Relton CL, Felix JF, Duijts L, van Meurs JB, Tiemeier H, Jaddoe VW, Wang X, Corpeleijn E, Snieder H. DNA methylation mediates the effect of maternal smoking during pregnancy on birthweight of the offspring. International Journal of Epidemiology. 2015;44:1224–1237. doi: 10.1093/ije/dyv048. [DOI] [PMC free article] [PubMed] [Google Scholar]
  25. Lange S, Probst C, Rehm J, Popova S. National, regional, and global prevalence of smoking during pregnancy in the general population: a systematic review and meta-analysis. The Lancet. Global Health. 2018;6:e769–e776. doi: 10.1016/S2214-109X(18)30223-7. [DOI] [PubMed] [Google Scholar]
  26. Liu B, Xu G, Sun Y, Qiu X, Ryckman KK, Yu Y, Snetselaar LG, Bao W. Maternal cigarette smoking before and during pregnancy and the risk of preterm birth: A dose-response analysis of 25 million mother-infant pairs. PLOS Medicine. 2020;17:e1003158. doi: 10.1371/journal.pmed.1003158. [DOI] [PMC free article] [PubMed] [Google Scholar]
  27. Mak TSH. GitHub repository. Lassosum. 2017 https://github.com/tshmak/lassosum
  28. Mak TSH, Porsch RM, Choi SW, Zhou X, Sham PC. Polygenic scores via penalized regression on summary statistics. Genetic Epidemiology. 2017;41:469–480. doi: 10.1002/gepi.22050. [DOI] [PubMed] [Google Scholar]
  29. Markunas CA, Xu Z, Harlid S, Wade PA, Lie RT, Taylor JA, Wilcox AJ. Identification of DNA methylation changes in newborns related to maternal smoking during pregnancy. Environmental Health Perspectives. 2014;122:1147–1153. doi: 10.1289/ehp.1307892. [DOI] [PMC free article] [PubMed] [Google Scholar]
  30. Martin JA, Osterman MJK, Driscoll AK. Declines in cigarette smoking during pregnancy in the United States, 2016-2021. NCHS Data Brief. 2023;458:1–8. [PubMed] [Google Scholar]
  31. Marufu TC, Ahankari A, Coleman T, Lewis S. Maternal smoking and the risk of still birth: systematic review and meta-analysis. BMC Public Health. 2015;15:239. doi: 10.1186/s12889-015-1552-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  32. Metzger BE, Gabbe SG, Persson B, Buchanan TA, Catalano PA, Damm P, Dyer AR, Leiva A, Hod M, Kitzmiler JL, Lowe LP, McIntyre HD, Oats JJN, Omori Y, Schmidt MI, International Association of Diabetes and Pregnancy Study Groups Consensus Panel International association of diabetes and pregnancy study groups recommendations on the diagnosis and classification of hyperglycemia in pregnancy. Diabetes Care. 2010;33:676–682. doi: 10.2337/dc09-1848. [DOI] [PMC free article] [PubMed] [Google Scholar]
  33. Montgomery SM, Ekbom A. Smoking during pregnancy and diabetes mellitus in a British longitudinal birth cohort. BMJ. 2002;324:26–27. doi: 10.1136/bmj.324.7328.26. [DOI] [PMC free article] [PubMed] [Google Scholar]
  34. Morrison KM, Atkinson SA, Yusuf S, Bourgeois J, McDonald S, McQueen MJ, Persadie R, Hunter B, Pogue J, Teo K, FAMILY investigators The FAMILY Atherosclerosis Monitoring In earLY life (FAMILY) study: rationale, design, and baseline data of a study examining the early determinants of atherosclerosis. American Heart Journal. 2009;158:533–539. doi: 10.1016/j.ahj.2009.07.005. [DOI] [PubMed] [Google Scholar]
  35. Moshammer H, Hutter HP. Breast-feeding protects children from adverse effects of environmental tobacco smoke. International Journal of Environmental Research and Public Health. 2019;16:304. doi: 10.3390/ijerph16030304. [DOI] [PMC free article] [PubMed] [Google Scholar]
  36. Murray R, Kitaba N, Antoun E, Titcombe P, Barton S, Cooper C, Inskip HM, Burdge GC, Mahon PA, Deanfield J, Halcox JP, Ellins EA, Bryant J, Peebles C, Lillycrop K, Godfrey KM, Hanson MA, EpiGen Consortium Influence of maternal lifestyle and diet on perinatal DNA methylation signatures associated with childhood arterial stiffness at 8 to 9 years. Hypertension. 2021;78:787–800. doi: 10.1161/HYPERTENSIONAHA.121.17396. [DOI] [PMC free article] [PubMed] [Google Scholar]
  37. National Center for Chronic Disease Prevention and Health Promotion (US) Office on Smoking and Health. The Health Consequences of Smoking—50 Years of Progress: A Report of the Surgeon General. Centers for Disease Control and Prevention; 2014. [PubMed] [Google Scholar]
  38. Oken E, Levitan EB, Gillman MW. Maternal smoking during pregnancy and child overweight: systematic review and meta-analysis. International Journal of Obesity. 2008;32:201–210. doi: 10.1038/sj.ijo.0803760. [DOI] [PMC free article] [PubMed] [Google Scholar]
  39. Philips EM, Santos S, Trasande L, Aurrekoetxea JJ, Barros H, von Berg A, Bergström A, Bird PK, Brescianini S, Ní Chaoimh C, Charles MA, Chatzi L, Chevrier C, Chrousos GP, Costet N, Criswell R, Crozier S, Eggesbø M, Fantini MP, Farchi S, Forastiere F, van Gelder M, Georgiu V, Godfrey KM, Gori D, Hanke W, Heude B, Hryhorczuk D, Iñiguez C, Inskip H, Karvonen AM, Kenny LC, Kull I, Lawlor DA, Lehmann I, Magnus P, Manios Y, Melén E, Mommers M, Morgen CS, Moschonis G, Murray D, Nohr EA, Nybo Andersen AM, Oken E, Oostvogels A, Papadopoulou E, Pekkanen J, Pizzi C, Polanska K, Porta D, Richiardi L, Rifas-Shiman SL, Roeleveld N, Rusconi F, Santos AC, Sørensen TIA, Standl M, Stoltenberg C, Sunyer J, Thiering E, Thijs C, Torrent M, Vrijkotte TGM, Wright J, Zvinchuk O, Gaillard R, Jaddoe VWV. Changes in parental smoking during pregnancy and risks of adverse birth outcomes and childhood overweight in Europe and North America: An individual participant data meta-analysis of 229,000 singleton births. PLOS Medicine. 2020;17:e1003182. doi: 10.1371/journal.pmed.1003182. [DOI] [PMC free article] [PubMed] [Google Scholar]
  40. Pidsley R, Zotenko E, Peters TJ, Lawrence MG, Risbridger GP, Molloy P, Van Djik S, Muhlhausler B, Stirzaker C, Clark SJ. Critical evaluation of the Illumina MethylationEPIC BeadChip microarray for whole-genome DNA methylation profiling. Genome Biology. 2016;17:208. doi: 10.1186/s13059-016-1066-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  41. Rauschert S, Melton PE, Burdge G, Craig JM, Godfrey KM, Holbrook JD, Lillycrop K, Mori TA, Beilin LJ, Oddy WH, Pennell C, Huang RC. Maternal smoking during pregnancy induces persistent epigenetic changes into adolescence, independent of postnatal smoke exposure and is associated with cardiometabolic risk. Frontiers in Genetics. 2019;10:770. doi: 10.3389/fgene.2019.00770. [DOI] [PMC free article] [PubMed] [Google Scholar]
  42. Raynor P, Born in Bradford Collaborative Group Born in Bradford, a cohort study of babies born in Bradford, and their parents: protocol for the recruitment phase. BMC Public Health. 2008;8:327. doi: 10.1186/1471-2458-8-327. [DOI] [PMC free article] [PubMed] [Google Scholar]
  43. R Development Core Team. Vienna, Austria: R Foundation for Statistical Computing; 2021. https://www.r-project.org [Google Scholar]
  44. Reese SE, Zhao S, Wu MC, Joubert BR, Parr CL, Håberg SE, Ueland PM, Nilsen RM, Midttun Ø, Vollset SE, Peddada SD, Nystad W, London SJ. DNA methylation score as a biomarker in newborns for sustained maternal smoking during pregnancy. Environmental Health Perspectives. 2017;125:760–766. doi: 10.1289/EHP333. [DOI] [PMC free article] [PubMed] [Google Scholar]
  45. Reitsma MB, Flor LS, Mullany EC, Gupta V, Hay SI, Gakidou E. Spatial, temporal, and demographic patterns in prevalence of smoking tobacco use and initiation among young people in 204 countries and territories, 1990-2019. The Lancet. Public Health. 2021;6:e472–e481. doi: 10.1016/S2468-2667(21)00102-X. [DOI] [PMC free article] [PubMed] [Google Scholar]
  46. Reynolds LM, Wan M, Ding J, Taylor JR, Lohman K, Su D, Bennett BD, Porter DK, Gimple R, Pittman GS, Wang X, Howard TD, Siscovick D, Psaty BM, Shea S, Burke GL, Jacobs DR, Jr, Rich SS, Hixson JE, Stein JH, Stunnenberg H, Barr RG, Kaufman JD, Post WS, Hoeschele I, Herrington DM, Bell DA, Liu Y. DNA methylation of the aryl hydrocarbon receptor repressor associations with cigarette smoking and subclinical atherosclerosis. Circulation. Cardiovascular Genetics. 2015;8:707–716. doi: 10.1161/CIRCGENETICS.115.001097. [DOI] [PMC free article] [PubMed] [Google Scholar]
  47. Richmond RC, Simpkin AJ, Woodward G, Gaunt TR, Lyttleton O, McArdle WL, Ring SM, Smith ADAC, Timpson NJ, Tilling K, Davey Smith G, Relton CL. Prenatal exposure to maternal smoking and offspring DNA methylation across the lifecourse: findings from the Avon Longitudinal Study of Parents and Children (ALSPAC). Human Molecular Genetics. 2015;24:2201–2217. doi: 10.1093/hmg/ddu739. [DOI] [PMC free article] [PubMed] [Google Scholar]
  48. Richmond RC, Suderman M, Langdon R, Relton CL, Davey Smith G. DNA methylation as a marker for prenatal smoke exposure in adults. International Journal of Epidemiology. 2018;47:1120–1130. doi: 10.1093/ije/dyy091. [DOI] [PMC free article] [PubMed] [Google Scholar]
  49. Rider CF, Carlsten C. Air pollution and DNA methylation: effects of exposure in humans. Clinical Epigenetics. 2019;11:131. doi: 10.1186/s13148-019-0713-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  50. Roberts CK, Sindhu KK. Oxidative stress and metabolic syndrome. Life Sciences. 2009;84:705–712. doi: 10.1016/j.lfs.2009.02.026. [DOI] [PubMed] [Google Scholar]
  51. Salmasi G, Grady R, Jones J, McDonald SD, Knowledge Synthesis Group* Environmental tobacco smoke exposure and perinatal outcomes: a systematic review and meta-analyses. Acta Obstetricia et Gynecologica Scandinavica. 2010;89:423–441. doi: 10.3109/00016340903505748. [DOI] [PubMed] [Google Scholar]
  52. Shenker NS, Ueland PM, Polidoro S, van Veldhoven K, Ricceri F, Brown R, Flanagan JM, Vineis P. DNA methylation as a long-term biomarker of exposure to tobacco smoke. Epidemiology. 2013;24:712–716. doi: 10.1097/EDE.0b013e31829d5cb3. [DOI] [PubMed] [Google Scholar]
  53. Shipton D, Tappin DM, Vadiveloo T, Crossley JA, Aitken DA, Chalmers J. Reliability of self reported smoking status by pregnant women for estimating smoking prevalence: A retrospective, cross sectional study. BMJ. 2009;339:b4347. doi: 10.1136/bmj.b4347. [DOI] [PMC free article] [PubMed] [Google Scholar]
  54. Sikdar S, Joehanes R, Joubert BR, Xu CJ, Vives-Usano M, Rezwan FI, Felix JF, Ward JM, Guan W, Richmond RC, Brody JA, Küpers LK, Baïz N, Håberg SE, Smith JA, Reese SE, Aslibekyan S, Hoyo C, Dhingra R, Markunas CA, Xu T, Reynolds LM, Just AC, Mandaviya PR, Ghantous A, Bennett BD, Wang T, Consortium TB, Bakulski KM, Melen E, Zhao S, Jin J, Herceg Z, Meurs J, Taylor JA, Baccarelli AA, Murphy SK, Liu Y, Munthe-Kaas MC, Deary IJ, Nystad W, Waldenberger M, Annesi-Maesano I, Conneely K, Jaddoe VW, Arnett D, Snieder H, Kardia SL, Relton CL, Ong KK, Ewart S, Moreno-Macias H, Romieu I, Sotoodehnia N, Fornage M, Motsinger-Reif A, Koppelman GH, Bustamante M, Levy D, London SJ. Comparison of smoking-related DNA methylation between newborns from prenatal exposure and adults from personal smoking. Epigenomics. 2019;11:1487–1500. doi: 10.2217/epi-2019-0066. [DOI] [PMC free article] [PubMed] [Google Scholar]
  55. Stock SJ, Bauld L. Maternal smoking and preterm birth: An unresolved health challenge. PLOS Medicine. 2020;17:e1003386. doi: 10.1371/journal.pmed.1003386. [DOI] [PMC free article] [PubMed] [Google Scholar]
  56. Subbarao P, Anand SS, Becker AB, Befus AD, Brauer M, Brook JR, Denburg JA, HayGlass KT, Kobor MS, Kollmann TR, Kozyrskyj AL, Lou WYW, Mandhane PJ, Miller GE, Moraes TJ, Pare PD, Scott JA, Takaro TK, Turvey SE, Duncan JM, Lefebvre DL, Sears MR, CHILD Study investigators. The Canadian Healthy Infant Longitudinal Development (CHILD) study: examining developmental origins of allergy and asthma. Thorax. 2015;70:998–1000. doi: 10.1136/thoraxjnl-2015-207246. [DOI] [PubMed] [Google Scholar]
  57. Toschke AM, Koletzko B, Slikker W, Jr, Hermann M, von Kries R. Childhood obesity is associated with maternal smoking in pregnancy. European Journal of Pediatrics. 2002;161:445–448. doi: 10.1007/s00431-002-0983-z. [DOI] [PubMed] [Google Scholar]
  58. Ventura SJ, Hamilton BE, Mathews TJ, Chandra A. Trends and variations in smoking during pregnancy and low birth weight: evidence from the birth certificate, 1990-2000. Pediatrics. 2003;111:1176–1180. [PubMed] [Google Scholar]
  59. Vogel CFA, Van Winkle LS, Esser C, Haarmann-Stemmann T. The aryl hydrocarbon receptor as a target of environmental stressors - Implications for pollution mediated stress and inflammatory responses. Redox Biology. 2020;34:101530. doi: 10.1016/j.redox.2020.101530. [DOI] [PMC free article] [PubMed] [Google Scholar]
  60. Wanding Zhou HS. Bioconductor; 2018. [DOI] [Google Scholar]
  61. Witt SH, Frank J, Gilles M, Lang M, Treutlein J, Streit F, Wolf IAC, Peus V, Scharnholz B, Send TS, Heilmann-Heimbach S, Sivalingam S, Dukal H, Strohmaier J, Sütterlin M, Arloth J, Laucht M, Nöthen MM, Deuschle M, Rietschel M. Impact on birthweight of maternal smoking throughout pregnancy mediated by DNA methylation. BMC Genomics. 2018;19:290. doi: 10.1186/s12864-018-4652-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  62. Wright J, Small N, Raynor P, Tuffnell D, Bhopal R, Cameron N, Fairley L, Lawlor DA, Parslow R, Petherick ES, Pickett KE, Waiblinger D, West J, Born in Bradford Scientific Collaborators Group Cohort Profile: the Born in Bradford multi-ethnic family cohort study. International Journal of Epidemiology. 2013;42:978–991. doi: 10.1093/ije/dys112. [DOI] [PubMed] [Google Scholar]
  63. Xu R, Hong X, Zhang B, Huang W, Hou W, Wang G, Wang X, Igusa T, Liang L, Ji H. DNA methylation mediates the effect of maternal smoking on offspring birthweight: a birth cohort study of multi-ethnic US mother-newborn pairs. Clinical Epigenetics. 2021;13:47. doi: 10.1186/s13148-021-01032-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  64. Yousefi PD, Suderman M, Langdon R, Whitehurst O, Davey Smith G, Relton CL. DNA methylation-based predictors of health: applications and statistical considerations. Nature Reviews. Genetics. 2022;23:369–383. doi: 10.1038/s41576-022-00465-w. [DOI] [PubMed] [Google Scholar]
  65. Zeilinger S, Kühnel B, Klopp N, Baurecht H, Kleinschmidt A, Gieger C, Weidinger S, Lattka E, Adamski J, Peters A, Strauch K, Waldenberger M, Illig T. Tobacco smoking leads to extensive genome-wide changes in DNA methylation. PLOS ONE. 2013;8:e63812. doi: 10.1371/journal.pone.0063812. [DOI] [PMC free article] [PubMed] [Google Scholar]
  66. Zhou W, Laird PW, Shen H. Comprehensive characterization, annotation and innovative use of Infinium DNA methylation BeadChip probes. Nucleic Acids Research. 2017;45:e22. doi: 10.1093/nar/gkw967. [DOI] [PMC free article] [PubMed] [Google Scholar]
  67. Zhou W, Triche TJ, Laird PW, Shen H. SeSAMe: reducing artifactual detection of DNA methylation by Infinium BeadChips in genomic deletions. Nucleic Acids Research. 2018;46:e123. doi: 10.1093/nar/gky691. [DOI] [PMC free article] [PubMed] [Google Scholar]

eLife assessment

Joris Deelen

This study offers a useful advance by introducing a cord blood DNA methylation score for maternal smoking effects, with the inclusion of cohorts from diverse backgrounds. However, the overall strength of evidence is deemed incomplete, due to concerns regarding low exposure levels and low statistical power, which hampers the generalisability of their findings. The study provides an interesting basis for future studies, but would benefit from the addition of more cohorts to validate the findings and a focus on more diverse health outcomes.

Reviewer #2 (Public Review):

Anonymous

Summary:

The authors generated a DNA methylation score in cord blood for detecting exposure to cigarette smoke during pregnancy. They then asked if it could be used to predict height, weight, BMI, adiposity and WHR throughout early childhood.

Strengths:

The study included two cohorts of European ancestry and one of South Asian ancestry.

Weaknesses:

(1) The number of mothers who self-reported any smoking was very low, likely resulting in underpowered analyses.

(2) Although it was likely that some mothers were exposed to second-hand smoke and/or pollution, data on this was not available.

(3) One of the European cohorts and half of the South Asian cohort had DNA methylation measured on only 2500 CpG sites including only 125 sites previously linked to prenatal smoking.

Reviewer #3 (Public Review):

Anonymous

Summary:

Deng et al. assess neonatal cord blood methylation profiles and the association with (self-reported) maternal smoking in multiple populations, including two European (CHILD, FAMILY) and one South Asian (START), via two approaches: (1) they perform an independent epigenome-wide association study (EWAS) and meta-analysis across the CHILD and FAMILY cohort, during which they also benchmark previously reported maternal-smoking associated sites, and (2) they generate new composite methylation risk scores for maternal smoking, and assess their performance and association with phenotypic characteristics in the three populations, in addition to previously described maternal smoking methylation risk scores.

Strengths and weaknesses:

Their meta-analysis across multiple cohorts and comparison with previous findings represents a strength. In particular the inclusion of a South Asian birth cohort is commendable as it may help to bolster generalizability. However, their conclusions are limited by several important weaknesses:

(1) the low number of (self-reported) maternal smokers, in particular in their South Asian population, resulting in an inability to conduct benchmarking of maternal smoking sites in this cohort. As such, the inclusion of the START cohort in certain figures is not warranted (e.g., Figure 3) and the overall statement that smoking-associated MRS are portable across populations is not fully supported;

(2) different methylation profiling tools were used: START and CHILD methylation profiles were generated using the more comprehensive 450K array while the FAMILY cohort blood samples were profiled using a targeted array covering only 3,000, as opposed to 450,000 sites, resulting in different coverage of certain sites which affects downstream analyses and MRS, and importantly, omission of potentially relevant sites as the array was designed in 2016 and substantial additional work into epigenetic traits has been conducted since then;

(3) the authors train methylation risk scores (MRS) in the CHILD or FAMILY populations based on sites that are associated with maternal smoking in both cohorts and internally validate them in the other cohort, respectively. Because the START cohort had insufficient numbers of self-reported maternal smokers, the authors cannot fully independently validate their MRS, which limits the strength of their results.

Overall strength of evidence and conclusions:

Despite these limitations, the study overall does explore the feasibility of using neonatal cord blood for the assessment of maternal smoking. However, their conclusion on generalizability of the maternal smoking risk score is currently not supported by their data as they were not able to validate their score in a sufficiently large number of maternal smokers and never smokers of South Asian populations.

While their generalizability remains limited due to small sample numbers, and previous studies with methylation risk scores exist, their findings may nonetheless provide the basis for future work into prenatal exposures which will be of interest to the research community. In particular, their finding that the maternal smoking-associated MRS was associated with small birth sizes and weights across birth cohorts, including the South Asian birth cohort that had very few self-reported smokers, is interesting, and the authors suggest these findings could be associated with factors other than smoking alone (e.g., pollution), which warrant further investigation and would be highly novel.

Future exploration should also include a strong focus on more diverse health outcomes, including respiratory conditions that may have long-lasting health consequences.

eLife. 2024 Aug 14;13:RP93260. doi: 10.7554/eLife.93260.4.sa3

Author response

Wei Deng, Nathan Cawte, Natalie Williams, Sandi M Azab, Russell J de Souza, Amel Lamri, Katherine M Morrison, Stephanie A Atkinson, Padmaja Subbarao, Stuart E Turvey, Theo J Moraes, Koon K Teo, Piush J Mandhane, Meghan B Azad, Elinor Simons, Guillaume Paré, Sonia S Anand

The following is the authors’ response to the previous reviews.

Reviewer #2:

(1) P-values should be reported adjusted for multiple tests or, at the very least, note that they are unadjusted to alert the reader that they may be biased by winner's curse.

Throughout the manuscript, we applied the false discovery rate threshold to declare results that were statistically relevant for discussion. However, for reporting in the abstract, we believe raw p-values are most straightforward, as we only reported the most important and robust results, and considering that (1) multiple testing correction does not change the ranking of the p-values; (2) p-value adjustment depends on both the method and the number of hypotheses tested; and (3) all reports of the most significant discovery results are prone to winner’s curse, but in the context of our study the GFI1 finding was confirmatory in nature, so the raw p-value allows for a direct comparison with existing studies.

We have taken the suggestion to quote the FDR-adjusted p-values throughout the manuscript for meta-analyzed results, and discussed how the impact of FDR correction differed between the EWAS and the MRS associations as a result of the number of hypotheses in each context:

“For each EWAS or meta-analysis, the false discovery rate (FDR) adjustment was used to control multiple testing and we considered CpGs that passed an FDR-adjusted p-value < 0.05 to be relevant for maternal smoking.”

“An FDR adjustment was used to control the multiple testing of meta-analyzed association between MRS and 25 (or 23, depending on the number of phenotypes available in the cohort) outcomes, and we considered association that passed an FDR-adjusted p-value < 0.05 to be relevant.”
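
As an illustration of the adjustment described in the quoted passages, the following is a minimal sketch assuming the Benjamini-Hochberg procedure, which is one common way to compute FDR-adjusted p-values. The response does not state which FDR method was used, and the function name `bh_adjust` and the input p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four tests (not taken from the study).
raw = [0.01, 0.04, 0.03, 0.005]
adj = bh_adjust(raw)
# Apply the 0.05 threshold on the adjusted scale, as in the quoted text.
relevant = [p < 0.05 for p in adj]
```

Because the adjustment scales with the number of tests, the same raw p-value can pass the threshold among 25 outcomes and fail among hundreds of thousands of CpGs.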

(2) The odds ratios and p-values reported in the abstract for associations of the MRS with smoking status and smoking exposure per week appear to be missing from the results section of the manuscript or (supplementary) tables.

The results for smoking status during pregnancy were added to the results:

“As a result, the epigenetic maternal smoking score was strongly associated with smoking status during pregnancy (OR=1.09 [1.07,1.10], p=5.5×10⁻³³) in the combined European cohorts.”

The exposure association was reported in the result section and Supplementary Table 8. We do note the typo in the cohort specific p-values, which now has been corrected.

(3) It is misleading to report a lack of MRS associations with maternal smoking in South Asians without also stating that there were only two smokers.

We agree with the reviewer that an association test would not be justified given the lack of smoking in the present South Asian cohort. We also removed the p-value of association for the START cohort in Figure 3, based on this and comment #4 from reviewer #3. The relevant results have been revised as follows:

“The HM450 MRS was significantly associated with maternal smoking history in CHILD and FAMILY (n = 397), but we failed to meaningfully validate the association in START (n = 503; Figure 3) – not surprisingly – due to the low number of ever-smokers (n = 2).”

(4) It is potentially confusing to report MRS associations with maternal smoking by ethnicity but then report associations with birth size and length combined without any explanation. The most novel result of this study is that there is virtually no maternal smoking among the South Asians and yet the MRS is associated with birth weight and size and with height at age 2. This result is buried in the combined analysis. I would suggest reporting the MRS associations with height and weight separately as has been done for maternal smoking behavior.

We thank the reviewer for this suggestion, which has now been addressed in the new Table 3, showing the cohort-specific and meta-analyzed effect sizes. In the revision, we highlighted the ethnicity-specific MRS associations, such as those with smoking exposure at various ages (1 and 3 years) and with skinfold thickness in the European cohorts but not the South Asian cohort, as well as associations that were more homogeneous, such as the birth weight and body size associations in the combined cohorts. In particular, the MRS in the South Asian cohort exhibited a consistent association with body size at various time points (at birth and at 1, 2, and 5 years) with similar effect sizes. The following was added to the results:

“A higher maternal smoking MRS was significantly associated with smaller birth size (-0.37±0.12, p = 0.0023; Table 3) and height at 1, 2, and 5 year visits in the South Asian cohort (Table 3). We observed similar associations with body size in the white European cohorts (heterogeneity p-values> 0.2), collectively, the MRS was associated with a smaller birth size (-0.22±0.07, p=0.0016; FDR adjusted p = 0.019) in the combined European and South Asian cohorts (Table 3). Meanwhile, a higher maternal smoking MRS was also associated with a lower birth weight (-0.043±0.013, p = 0.001; FDR adjusted p = 0.011) in the combined sample, though the effect was weaker in START (-0.03±0.02; p = 0.094) as compared to the white European cohorts.

The meta-analysis revealed no heterogeneity in the direction nor the effect size of associations for body size and weight between populations at birth or at later visits (heterogeneity p-values = 0.16–1; Supplementary Table 8).”
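
To make the pooled estimates and heterogeneity p-values in the quoted passage concrete, here is a minimal sketch assuming an inverse-variance fixed-effect meta-analysis with Cochran's Q as the heterogeneity test. The response does not specify the meta-analysis model, so this is an assumption. The function names and input values are hypothetical, and the p-value helper covers only the two-study case (1 degree of freedom).

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance pooled estimate, its standard error, and Cochran's Q."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = math.sqrt(1.0 / total)
    # Q sums the weighted squared deviations of each study from the pooled value.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    return pooled, pooled_se, q


def heterogeneity_p_two_studies(q):
    """Upper-tail chi-square p-value for Q with 1 degree of freedom."""
    return math.erfc(math.sqrt(q / 2.0))


# Hypothetical cohort estimates (effect size, standard error); not study values.
pooled, pooled_se, q = fixed_effect_meta([1.0, 3.0], [1.0, 1.0])
p_het = heterogeneity_p_two_studies(q)
```

A large heterogeneity p-value, as reported for the body size associations, indicates that the cohort-specific estimates are compatible with a single shared effect.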

Reviewer #3:

(1) You mention that the 450K Score performs best even though only 10/143 are included for some populations. Did you explore recalibration of the MRS using only those 10 CpGs?

We thank the reviewer for this comment – due to an error in transferring results, the number of overlapping CpGs between the 450K score and the targeted array was in fact 26. This error only impacted results relevant to the FAMILY study using the HM450K score and did not materially change our results or conclusions. We have accordingly updated Table 3, Suppl. Tables 5, 8, 9, Figure 3-B, and Suppl. Figures 5, 6-B, 7-B and 7-D, as well as the meta-analyzed MRS associations throughout the manuscript.

The subset of 26 CpGs using the originally derived weights was expected to perform worse than the original HM450K score using the full 143 CpGs. When we did restrict the methylation score construction to these 26 CpGs, the performance in CHILD was worse than the original score, but comparable to FAMILY (updated Suppl. Table 5). These 26 CpGs did overlap with the targeted score derived in CHILD (13 out of 15 present) and in FAMILY (19 out of 63 present), suggesting moderate agreement between the array platforms as well as across studies.

In other words, while the subset of 26 CpGs had reasonable performance in both CHILD and FAMILY, both studies could benefit by inclusion of the additional CpGs in the original score. We have included a sentence to discuss the choice of validation study and the trade-off between sample size and # of CpGs under response to Reviewer 3 comment # 2.
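
The effect of restricting a score to the CpGs available on a smaller array can be sketched as follows, assuming the score is a weighted sum of methylation values with weights fixed in advance. The function name, CpG identifiers, weights, and methylation values are all hypothetical placeholders rather than the study's data.

```python
def methylation_score(sample, weights):
    """Weighted sum of methylation values over CpGs present in both inputs.

    Returns the score and the number of CpGs that contributed to it.
    """
    shared = [cpg for cpg in weights if cpg in sample]
    score = sum(weights[cpg] * sample[cpg] for cpg in shared)
    return score, len(shared)


# Hypothetical weights and methylation values; CpG IDs are placeholders.
weights = {"cpg_a": -1.5, "cpg_b": 0.8, "cpg_c": 2.0}
full_array_sample = {"cpg_a": 0.60, "cpg_b": 0.25, "cpg_c": 0.10}
targeted_array_sample = {"cpg_a": 0.60, "cpg_c": 0.10}  # cpg_b not measured

full_score, n_full = methylation_score(full_array_sample, weights)
subset_score, n_subset = methylation_score(targeted_array_sample, weights)
```

With the weights held fixed, dropping unmeasured CpGs changes the score, which is why the same weights can perform differently on the two platforms.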

(2) Could the internal validation performance be driven by sample size of the training, providing support for the need for larger training sizes? Should this be discussed in the study?

The validation study, CHILD, has the smaller sample size of the two European cohorts. While both candidate validation datasets had relatively small sample sizes, we chose CHILD (n=347) rather than FAMILY (n=397), as it had better coverage with respect to the discovery EWAS or the training data (# of associated CpGs = 3,092, n = 5,647). Beyond the signals of association, the validation performance also depends on a mix of overall sample size and the proportion of current smokers. Given the proportion of current smokers, the effective sample sizes for a direct comparison, i.e. the equivalently-powered sample sizes of a balanced (50% cases, 50% controls) design, are 41.7 and 104.7 for CHILD and FAMILY, respectively. While we are unable to directly compare whether a larger effective sample size produced a better performing score, we believe this to be the case, and thus a larger validation study would boost the performance of the methylation score. We have added the following to the discussion:

“Given the proportion of current smokers, the effective sample size for a direct comparison between CHILD and FAMILY, i.e. equivalently-powered sample size of a balanced (50% cases, 50% controls) design, were 41.7 and 104.7, respectively. While CHILD had a lower effective sample size, we ultimately chose it for validating the methylation score to better cover the CpGs that were significant in the discovery EWAS. A larger validation study will likely further boost the performance of the methylation score and be considered in future research.”
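
For readers unfamiliar with the notion of an effective sample size for an unbalanced case-control design, a minimal sketch follows. It assumes the commonly used formula 4 / (1/n_cases + 1/n_controls). The response does not state which formula was used, so this may not reproduce the reported values exactly, and the counts below are hypothetical.

```python
def effective_sample_size(n_cases, n_controls):
    """Size of a balanced case-control design with comparable precision."""
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


# Hypothetical counts, not the study's smoker and non-smoker numbers.
n_eff_balanced = effective_sample_size(50, 50)    # balanced design of 100
n_eff_unbalanced = effective_sample_size(10, 90)  # same total, few cases
```

Under this formula, a cohort of 100 with only 10 cases carries about as much information as a balanced study of 36, which illustrates how a low proportion of smokers shrinks the effective size.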

(3) Figure 1: It is very helpful to have an overview diagram, but this should then follow the flow of the manuscript to aid the reader. Currently, the diagram does not follow the flow of the manuscript and thus is rather confusing - for instance, the figure starts with the MRS but initially an EWAS is conducted in the manuscript itself. I suggest to adapt the overview figure accordingly. Moreover, a description for (A), (B), (C) is not provided in the figure legends. Figure 1 could thus be improved further.

We thank the reviewer for the suggestion to improve the key figure that summarizes the manuscript. The EWAS workflow for the primary, secondary and tertiary outcomes, as well as the European cohorts meta-analysis, has been added to the updated sub-figure A. A description of each subfigure has also been added to the figure legends as follows:

“Figure 1-A shows the epigenome-wide association studies conducted in the European cohorts (CHILD and FAMILY); Figure 1-B illustrates the workflow for methylation risk score (MRS) construction using an external EWAS (Joubert et al., 2016) as the discovery sample and CHILD study as the external validation study, while Figure 1-C demonstrates the evaluation of the MRS in two independent cohorts of white European (i.e. FAMILY) and South Asian (i.e. START). The validated MRS was then tested for association with smoking specific, maternal, and children phenotypes in CHILD, FAMILY, and START, as shown in Figure 1-D.”

(4) Figure 3: The readability and information content in this figure, and other figures containing boxplots (e.g., Supplementary Figure 5), could be improved. I would suggest to justify X axis labels to the axis rather than overlapping, and importantly, show individual data points wherever possible (e.g., overlaying the box plots). In (c), the ANOVA is not justified given the sample size in START. In general, it is worth excluding the START cohorts from this analysis on the justification of a too small sample size for maternal smokers.

We thank the reviewer for their thoughtful points for improvement. The axis labels have been wrapped to avoid overlapping, and the data points have been added to the boxplots. The ANOVA p-value for START was removed from the figure and throughout the manuscript due to the low count of smokers. However, we retained START in Figure 3 and other boxplots to show the distribution of the score for non-smokers as a benchmark against the European cohorts.

(5) In addition to boxplots, it may be helpful to show AUC diagrams for ROC curves (e.g. Figure 3). AUCs are reported in the Tables but not shown. Additionally, all AUC results should include 95% Confidence intervals.

This is a great suggestion, and we have added the corresponding ROC curves, annotated with AUC (95% CI), to Figure 3. The 95% CIs for all AUC results were added to the Tables and main text. The following was added to Methods:

“The reported 95% confidence interval for each estimated AUC was derived using 2,000 bootstrap samples.”
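
A bootstrap interval of this kind can be sketched as below, assuming a percentile bootstrap that resamples individuals with replacement. The quoted sentence gives only the number of bootstrap samples, so the percentile method is an assumption. The function names, the seed, and the scores and labels are hypothetical.

```python
import random


def auc(scores, labels):
    """Probability that a random case scores higher than a random control."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5  # ties count as half a win
    return wins / (len(cases) * len(controls))


def bootstrap_auc_ci(scores, labels, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap confidence interval for the AUC."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # keep only resamples containing both classes
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical scores and smoking labels (1 = smoker); not study data.
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.5]
labels = [0, 0, 1, 1, 1, 0, 1, 0]
point = auc(scores, labels)
ci_low, ci_high = bootstrap_auc_ci(scores, labels)
```

With very few smokers, many resamples contain only a handful of cases, so the resulting interval is wide. This is one practical reason the AUC was not meaningful in START.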

(6) Supplementary Figure 6: It could be helpful to discuss the amount of overlap between the different MRS.

Most of the scores were derived using the Joubert et al., (2016) EWAS as the discovery sample, including ours, and thus there will be overlap between the scores. The exception was the GondaliaScore, which contained only 3 CpGs that do not overlap with any other scores.

While different scores might not have selected completely identical sets of CpGs, the mapped genes are highly consistent across the scores. We have added to the discussion and results the extent of overlap between the top scores:

“In particular, scores that were derived using the Joubert EWAS as the discovery sample, including ours, had higher pairwise correlation coefficients across the birth cohorts, with many of the CpGs mapping to the same genes, such as AHRR, MYO1G, GFI1, CYP1A1, and RUNX3.”

(7) Supplementary Figure 7: This figure is never referenced in the text and from the legend itself it is not too clear what it is trying to show. Please refer to it in the main text with some additional context.

Supplementary Figure 7 was referenced in the Results under subsection “Methylation Risk Score (MRS) Captures Maternal Smoking and Smoking Exposure”, following the Methods subsection “Statistical analysis” where we wanted to examine a systematic difference. We revised the main text to clarify the analysis:

“For the derived MRS, we empirically assessed whether a systematic difference existed in the resulting score with respect to all other derived scores. This was examined via pairwise mean differences between the HM450 and other score using a two-sample t-test and an overall test of mean difference using an ANOVA F-test, among all samples and the subset of never smokers.”

(8) Tables: Tables are currently challenging to read and perhaps more formatting could be done to improve readability.

We thank the reviewer for the suggestion. Main tables have been reformatted to a landscape layout and each numeric cell moved to the centre to improve readability.

Associated Data

    Data Citations

    1. Deng W, Anand S. 2024. Maternal smoking DNA methylation risk score associated with health outcomes in offspring of European and South Asian ancestry. Zenodo.

    Supplementary Materials

    Figure 2—source data 1. Histogram of the smoking exposure across the three cohorts.
    Supplementary file 1. Additional tables and summaries of results.

    (A) Quality controls for the inclusion/exclusion of samples and methylation probes. (B) Characteristics of the overall sample include 5176 mother–newborn pairs from the Canadian Healthy Infant Longitudinal Development (CHILD), Family Atherosclerosis Monitoring In early life (FAMILY), and SouTh Asian biRth cohorT (START) cohorts. (C) A summary of available analyses and outcome variables in each cohort. (D) A summary of the DNA methylation (DNAm) maternal smoking score derivation design and results. (E) Characteristics of the epigenetic subsample from CHILD and FAMILY cohorts stratified by smoking status. (F) Score weights for external DNAm maternal smoking scores. (G) Summary of cytosine–phosphate–guanines (CpGs) that contribute to the DNAm maternal smoking scores and their weights. (H) Association between maternal smoking methylation risk score and phenotypes in CHILD, FAMILY, and START. (I) Summary of mean difference in methylation risk scores between studies in overall samples and in those who never smoked.

    elife-93260-supp1.xlsx (80.9KB, xlsx)

    Data Availability Statement

    The summary statistics used to construct methylation risk scores are available from EWAS catalog at http://www.ewascatalog.org/?trait=maternal%20smoking%20in%20pregnancy with additional filters of PubMID 27040690 and analysis on “Sustained maternal smoking in pregnancy effect on newborns adjusted for cell composition”. Summary statistics generated in the current study, including a total of 7 primary association studies (three smoking phenotypes in the two European cohorts and smoking exposure in the South Asian cohort) and 3 sets of meta-analyzed results in Europeans are available from the Zenodo repository (10.5281/zenodo.13286433). All scripts to reproduce and validate the predictive model can be found at https://github.com/WeiAkaneDeng/EpigeneticResearch/tree/WeiAkaneDeng-patch-1/MaternalSmoking (copy archived at Deng, 2024).

    The following dataset was generated:

    Deng W, Anand S. 2024. Maternal smoking DNA methylation risk score associated with health outcomes in offspring of European and South Asian ancestry. Zenodo.


    Articles from eLife are provided here courtesy of eLife Sciences Publications, Ltd