Skip to main content
Human Molecular Genetics logoLink to Human Molecular Genetics
. 2023 May 17;32(14):2399–2407. doi: 10.1093/hmg/ddad086

Time-varying effects are common in genetic control of gestational duration

Julius Juodakis 1,, Karin Ytterberg 2, Christopher Flatley 3, Pol Sole-Navais 4, Bo Jacobsson 5,6
PMCID: PMC10321382  PMID: 37195282

Abstract

Preterm birth is a major burden to neonatal health worldwide, determined in part by genetics. Recently, studies discovered several genes associated with this trait or its continuous equivalent—gestational duration. However, their effect timing, and thus clinical importance, is still unclear. Here, we use genotyping data of 31 000 births from the Norwegian Mother, Father and Child cohort (MoBa) to investigate different models of the genetic pregnancy ‘clock’. We conduct genome-wide association studies using gestational duration or preterm birth, replicating known maternal associations and finding one new fetal variant. We illustrate how the interpretation of these results is complicated by the loss of power when dichotomizing. Using flexible survival models, we resolve this complexity and find that many of the known loci have time-varying effects, often stronger early in pregnancy. The overall polygenic control of birth timing appears to be shared in the term and preterm, but not very preterm, periods and exploratory results suggest involvement of the major histocompatibility complex genes in the latter. These findings show that the known gestational duration loci are clinically relevant and should help design further experimental studies.

Introduction

Preterm birth—birth before completing 37 weeks of gestation—is a major cause of neonatal mortality and morbidity, affecting about 10% of deliveries worldwide (1). Various environmental factors associated with preterm birth have been reported, but the biological mechanisms leading to it are still largely unknown. As a result, there is no robust strategy for prevention of preterm birth, and its rates continue to increase in many countries (1–3).

Genetics is clearly important for this trait: around 20–40% of the variation in pregnancy duration is attributed to maternal genetic factors and another 10–30% to fetal genetics (4). Although the specific variants or genes underlying these effects are still mostly unknown, several loci were recently identified in large genome-wide association studies (GWASs) (5–7). Understanding the biological effects of variants at these loci could lead to broader understanding of pregnancy maintenance and suggest possible interventions for preterm birth.

Various statistical models have been used for discovering the variants, and their comparison could provide some clues about the mechanism of the pregnancy ‘clock’. Typically, the variants are tested for association with pregnancy (gestational) duration in days as a continuous outcome using linear models, or preterm birth as a binary outcome, using e.g. logistic regression. Other dichotomizations of pregnancy duration have also been used, such as early (5), extreme (8) or very preterm birth (9) or post-term birth (5). So far, almost all of the associated variants have been identified using the continuous outcome and have weaker or no associations with preterm birth (5–7).

This contrast between results for pregnancy duration and preterm birth raises questions about the action and clinical relevance of the discovered genes. On one hand, if their variants have uniform effects throughout pregnancy (as modeled in the standard linear regression), such results are expected, as the continuous model then has much higher detection power (10). On the other hand, pathways initiating preterm birth are considered to be distinct from those acting near term (11). This is supported by epidemiological associations that differ by gestational duration groups (12), and registry data also suggest that genetic effects change throughout pregnancy (13). It is possible that the loci discovered for gestational duration have little relevance for preterm birth—or they might act uniformly throughout pregnancy and contradict this view.

In this study, we aim to resolve this uncertainty. We analyze a genotyping cohort of more than 30 000 parent–offspring trios using models explicitly incorporating the timing of genetic effects. With these, we aim to answer the following: (1) Are the pregnancy duration genes discovered so far clinically relevant for preterm birth? (2) Do the genes support the view that preterm and term birth timing is etiologically different? (3) Is there a period of gestation where the gene effects are particularly strong, weak or otherwise unusual?

We start by conducting GWASs of gestational duration and preterm birth in this cohort and directly showing the challenges in comparing the results of the two outcomes. We then select the top variants known for these traits and re-analyze them in time-to-event models, specifically designed to investigate their effect dynamics. We also repeat this analysis on a polygenic score (PGS) of gestational duration and further explore possibly different mechanisms acting in the very preterm range.

Results

GWAS results for gestational duration and preterm delivery

We ran maternal and fetal GWASs of gestational duration and preterm delivery on 22 247 mothers and 21 262 fetuses (610 and 529 preterm cases, respectively) from the Norwegian MoBa cohort, using standard association methods (linear and logistic regression). In the maternal GWAS of gestational duration, we identified two loci associated with gestational duration at genome-wide significance (Fig. 1A), located in or near the KCNAB1 and ADCY5 genes. Both of these loci have been reported previously in (6); however, KCNAB1 was not identified in another large GWAS (7). In the fetal GWAS of gestational duration, one single genome-wide significant variant was identified, rs34207099, nearest to the gene LRATD2 (FAM84B) (beta = −2.4 days, SE = 0.44 days, P = 3.0e-8, imputation INFO score = 0.75, Fig. 1C). We note, however, that the variant has a low frequency (MAF = 1.1%), is close to the significance threshold and has no linked proxies, and this association is not observed in the previous, larger fetal meta-analysis (P = 0.20) (5). For preterm delivery, no genome-wide significant associations were detected either in the maternal (Fig. 1B) or fetal GWAS (Supplementary Material, Fig. S1). Genomic inflation was small in all four GWASs, with lambda < 1.06 (Supplementary Material, Fig. S2).

Figure 1.

Figure 1

(A, B) Manhattan plots illustrating the maternal GWAS results for gestational duration (A) and preterm delivery (B). The dashed line marks the significance threshold. The variants marked in red, and the corresponding loci, were reported as significant for the respective phenotype in the largest previous GWAS meta-analysis (6). Under the liability model, the GWAS in (B) has lower power: for a variant that can be detected genome-wide with 80% power in (A), the significance thresholds in (B) should be reduced to the dotted white lines to achieve 50% or 80% detection power. (C) LocusZoom plot showing the region around the only genome-wide significant association identified in the fetal GWAS. Shown P-values are for gestational duration. The significant variant is the purple diamond while all other variants are colored based on their linkage disequilibrium with it. Variants not available in the reference panel are colored in gray. (D) Effect size comparison for the suggestive variants in the maternal GWASs (P-value < 10−5 for either outcome). Shown are the log odds ratios (ORs) and linear regression betas in the same position. The identity line is marked.

The null results of the preterm-delivery GWAS at first glance suggest differences from the continuous phenotype. Of the 165 preterm delivery loci at suggestive association (P < 10−5) none were (genome-wide) significant in the GWAS of gestational duration. However, the regression coefficients of the suggestive variants across the two maternal GWASs were concordant in direction and size, i.e. variants that shorten gestational duration also increase preterm delivery risk (Fig. 1D). If the genetic effects are indeed shared between term and preterm periods, then dichotomization incurs a great loss of power: we estimate that such analysis would then require a 6.7× larger cohort to reach equal power with the continuous analysis (see also Fig. 1B; calculations based on 2.7% prevalence and the liability model, see Materials and Methods). For reference, the current maternal GWAS had 80% power to detect effects of at least 0.74 days per allele (linear model) or OR > 1.53 (logistic model), assuming MAF = 0.2 and 5e-8 significance threshold. In light of this power difference, comparison between the two GWAS is complicated, and we could not make definite conclusions about the effect dynamics or clinical relevance of the findings.

Top gestational duration variants have time-varying effects

From previous meta-analyses, we identified 25 maternal and 3 fetal genetic variants involved in delivery timing. One additional fetal locus was identified in the present study. Most (26) of these were initially discovered in analyses of gestational duration, although one (maternal) variant was detected only for preterm delivery and two (fetal) only for early preterm birth.

We then re-analyzed these 29 variants with a flexible survival model (PAMM) and found that many of them have time-varying effects. PAMM (see Materials and Methods for detailed presentation) explicitly models the time to event, i.e. delivery. The effect of a variant is the change in instantaneous risk to deliver, relative to the major allele, and is allowed to change during gestation. Figure 2 shows the estimated effect dynamics for 20 variants that were nominally associated with gestational duration in this cohort, and 14 of these were significantly changing over time (P < 0.05 for the non-linear term in PAMM). Notably, even though all of the 20 variants were discovered in linear models and cohorts with few preterm births, their effects generally do not appear to be limited to the term period. In fact, for 10 loci (KCNAB1, FBXO32, LRATD2, ADCY5 and others), the estimated effect was largest early in the pregnancy and decreased toward the term. In three other cases, the effect peaked in the preterm birth range, around week 34 (TET3, EBF1 and COL27A1). For the remaining loci, the confidence intervals are wide and we are cautious about interpreting the estimated shapes, but most still show effects around or before 37 weeks (Fig. 2). One exception is the IL1A locus, which only had a significantly non-zero effect during weeks 38–41, consistently with the analyses performed in (5), which also concluded that its effects are mostly seen around the term. We emphasize that all of these variants were initially identified from other cohorts, except for the LRATD2 hit, which was discovered here and hence may show inflated effect size.

Figure 2.

Figure 2

Effects of 20 genetic variants on delivery timing throughout gestation, estimated by a flexible PAMM. For each week of gestation, the line is the estimated instantaneous log hazard ratio (positive = risk of birth is currently increased) and its 95% confidence interval is shaded. The preterm limit of 37 weeks is highlighted. Note that the minor allele count was used as a continuous covariate in the analysis, so the effects of 1 and 2 minor alleles are constrained to be proportional.

The remaining 9 out of 29 variants are shown in Supplementary Material, Figure S3. Eight of these variants did not have any detectable effects on delivery timing in this cohort (P > 0.05 both in the gestational duration GWAS and for the non-linear effects in PAMM). This includes all three loci discovered in dichotomized analyses (LRP5, SPATA6 and LPP). The EEFSEC locus had borderline associations (P = 0.06 for the linear effects in GWAS, P = 0.02 for the non-linear effects in PAMM) and is also shown in Supplementary Material, Figure S3. Details of all 29 variants are listed in Supplementary Material, Table S1.

As diagnostics of our PAMM setup, we verified that it maintains the type I error rate in this setting by simulations (P < 0.05 observed in 4% of simulations) and that if only constant effects are included, the estimates are very similar between PAMM and standard Cox regression (Supplementary Material, Fig. S4).

As a sensitivity analysis, we applied two other methods for testing time-varying effects. These methods assume different (linear) shapes of the effects and produce larger P-values for the non-linear effect loci (Supplementary Material, Table S1). The ranking of most other loci and the overall number of significant results are similar across the methods.

We also confirmed that almost all of the 28 top loci genes were actively transcribed in the placenta during early gestation (note that there were 29 loci, but 2 map to the EBF1 gene). In the data from (14), only AGTR2 and LEKR1 had few detections, while all other genes had at least five counts in 95% of the samples. Overall, the expression level of our top loci genes (median count = 326, taken over samples and then over genes) was at least as high as of other protein-coding genes (median count = 139; see also Supplementary Material, Fig. S5). This shows that the genes discovered in term-centric analyses are already present in the transcriptome and could plausibly have effects, much earlier in gestation.

Testing for enrichment of gene sets among the 28 genes mostly showed generic terms, with the top three being (all ‘positive regulation of’): biosynthetic, macromolecule biosynthetic and nucleobase-containing compound metabolic processes (Supplementary Material, Table S2). Results were almost identical using only the 21 genes with significant time-varying effects detected here. However, restricting the set to 17 genes with effects strongest in early pregnancy (i.e. excluding TET3, EBF1, COL27A1 and IL1A), primary ovarian insufficiency appeared as one of top three enriched pathways (WikiPathways WP5316; Supplementary Material, Table S2). Notably, this pathway also includes two of the three strongest GWAS hits overall—WNT4 and AGTR2 genes. Robust analysis would require determining the effect dynamics of each hit more precisely, but it could thus highlight particular pathways to birth, which has been difficult using all associated genes unselectively.

Polygenic effects are shared at term and early, but not very early, gestation

We constructed a maternal PGS for gestational duration based on the summary statistics of a previous large meta-analysis. The score includes ~ 1.1 million variants and, on its own, explains 2.0% of the variance of gestational duration in the present cohort (or 2.8% in combination with the clinical covariates). For comparison, if this score is limited to 50-kbp regions surrounding the 25 top hits, the variance explained drops to 0.6%, or 1.5% with clinical covariates.

Firstly, we confirmed that the estimated time-varying effects of each variant remain after adjusting for this score; they were also robust to adding or removing the clinical covariates (Supplementary Material, Fig. S6).

We then analyzed the effects of this score in time-to-event models (Fig. 3A). Similar to the individual variants, the PGS shows effects extending well into the preterm range, even though it was derived based on analyses of continuous gestational age. Surprisingly, the ‘protective’ PGS groups appear to have neutral or even slightly harmful effects in the very early gestation. While there is large uncertainty in the estimates, the pattern is still seen with different groupings of the score (Supplementary Material, Fig. S7) or even by just directly observing the risks of different outcomes in each group (Fig. 3B): the risk for preterm delivery decreases smoothly with the PGS, but for delivery before 32 weeks (or very preterm delivery, vPTD), the risk is similar across four PGS groups and only elevated in one.

Figure 3.

Figure 3

(A) Effects of the gestational duration PGS throughout gestation, estimated by a flexible PAMM. For each week of gestation, the line is the estimated instantaneous log hazard ratio (positive = risk of birth is currently increased), and its 95% confidence interval is shaded. (B) Unadjusted risks of preterm delivery (left) and vPTD (right) in each PGS quintile. Lines show the 95% confidence intervals.

We tested some possible explanations of this pattern (although we caution that the data contain only 59 vPTD cases). For example, it is biologically plausible that vPTD risk is determined mostly by rare alleles with strong deleterious effects, but the frequency of such alleles should decrease smoothly across all PGS groups, and indeed, we observed that for rare deleterious alleles in the PGS (Fig. 4A). Similarly, the number of homozygous deleterious genotypes decreased smoothly with PGS, not enriched in any particular group (Fig. 4A). Neither of these counts was predictive of vPTD (i.e. no association after adjusting for PGS, P = 0.37 and 0.41). We then tested the top principal components, seeking to identify any broader genetic features related to vPTD, and found that only the third PC was significantly associated with it (P = 0.016). Also, this PC was not associated with all PTDs (P = 0.16). Inspecting it revealed that its loadings were higher in chromosome 6 and specifically in the major histocompatibility complex (MHC) region, 6p22.1–6p21.3 (Fig. 4B). This suggests that immune pathways may be particularly relevant to the genetic risk of vPTD, possibly through gene interactions or other complex mechanisms producing the observed non-linear relation with PGS.

Figure 4.

Figure 4

(A) Boxplots showing counts of rare alleles (left) or minor homozygous genotypes (right) across groups of gestational duration PGS. Only variants with estimated negative effects are shown here. Also shown are the correlation r between the count and the PGS and the P-value for the count term in a model of vPTD. (B) Variant loadings of the vPTD-associated principal component 3. Boxplots show the loadings across chromosomes, excluding any with magnitudes smaller than 1. Chromosome 6 is higshlighted, and the inset shows all its loadings. The MHC region 6p22.1–6p21.3 is highlighted in the inset.

Discussion

In summary, we conducted GWASs of gestational duration, noting three associated loci, and of preterm birth, which did not find any significant associations. Together with existing genome-wide studies, this brings the total number of loci known to be involved in pregnancy length to 29. Re-analyzing these loci, as well as a gestational duration PGS, in time-to-event models showed that time-varying effects are common and are often strongest in the preterm period. The PGS results and exploratory analyses also suggest that very preterm birth may be controlled differently, possibly involving the major histocompatibility gene region.

Our maternal GWAS results, in terms of the top variants, are largely similar to previous ones obtained from a direct-to-consumer genotyping cohort (7). This cohort substantially differed from ours in phenotype ascertainment, exclusion criteria and preterm delivery rates (which in Norway are relatively low (1)), so the agreement between results is encouraging for the generalizability of GWASs. The weak fetal genetic associations are also consistent with literature (5). Still, cohorts focusing on non-European ancestries, and especially high-case-rate populations, are largely missing and will be crucial to further progress in the genetics of pregnancy.

Our primary finding of practical importance is that many of the variants discovered for gestational duration also have strong effects in the preterm period. In other words, the lack of associations with preterm delivery does not mean lack of effects—rather, the power loss when dichotomizing is just too great (this is difficult to quantify directly under time-varying effects, but we showed a large loss under the simple liability model). Researchers thus should be encouraged to use gestational duration as a continuous phenotype that leads to clinically relevant findings. Stronger early effects in survival models can also be a technical artifact of frailty—unaccounted heterogeneity between the individuals, typically owing to genetics or other covariates (15). However, as we observed several different effect shapes across the loci, and the effects were unchanged after accounting for known clinical covariates or the genetic background (PGS), we do not believe that frailty was a major factor in this study. Alternatively, such data could be analyzed with accelerated failure time models, which are not impacted by frailty (16), but they still require some development of significance tests and computational tools before practical use.

We also find that, while the gene effects are often present at some level throughout pregnancy, their strength greatly varies over time. This suggests that the biological clock controlling pregnancy length does not ‘tick’ evenly but rather has distinct phases at which different genes become relevant. A similar theory has been proposed for miscarriage timing—namely, that there are distinct checkpoint times at which fetal development is evaluated and the pregnancy continued or not (17). In relation to preterm birth, different, simpler clock models have been discussed in (18), but here, we present the first data from human genomics directly showing the complexity of this regulation.

Further functional work will still be needed to clarify the mechanisms behind the associations observed here. In particular, it is possible that the relevant genes directly act very early in pregnancy, e.g. in the trophoblast invasion (9), but the impact of that only becomes apparent later. This is compatible with the decreasing effects that we observe for many genes (19). On the other hand, we do show that our loci are actively expressed in early-to-mid-gestation placenta, which supports a more persistent role throughout pregnancy and is also notable as many transcripts are missing in the placenta (20). In any case, we expect that follow-up studies can use our results to prioritize the most relevant genes and periods for each gene.

Finally, the analysis framework presented here can be applied to other exposures and outcomes as well. Similar questions about the timing of effects have been investigated with other factors of preterm birth as well, but the studies so far used ad hoc techniques, such as separate logistic regression models within subgroups (12). Moreover, there is great interest in applying time-to-event models to the age of onset of various adulthood diseases (19,21,22). In both cases, adopting the procedures presented here could provide easier interpretation, likely more precise effect estimates (owing to the use of continuous data across the full range of the phenotype) and more direct testing of clinically and biologically relevant questions.

Materials and Methods

Data source

The Norwegian Mother, Father and Child Cohort Study (MoBa) is a population-based pregnancy cohort study conducted by the Norwegian Institute of Public Health (23). Participants were recruited from all over Norway in 1999–2008. The women consented to participation in 41% of the pregnancies. The cohort includes ~114 500 children, 95 200 mothers and 75 200 fathers. The data used in this study are version 12 of the quality-assured data files released by MoBa and include genotype data, questionnaires filled out by the parents and linked records from Medical Birth Registry (MBRN), a national health registry containing information about all births in Norway.

The establishment of MoBa and initial data collection were based on a license from the Norwegian Data Protection Agency and approval from The Regional Committees for Medical and Health Research Ethics. The MoBa cohort is currently regulated by the Norwegian Health Registry Act. The current study was approved by the Norwegian Regional Committee for Medical and Health Research Ethics South-East (2015/2425) and Swedish Ethical Review Authority (Dnr 2022–03248-01).

Study population and genotyping

The present study uses a subset of this cohort: pregnancy outcomes, maternal and fetal genotypes from 31 496 parent-offspring trios that were genotyped over 2012–2018. We note that these samples have been used as a small part of a previous meta-analysis of gestational duration (6). Genotyping included only singleton, live-birth pregnancies with complete birth registry data and at least the first MoBa questionnaire answered and individuals alive at time of genotyping. Further, we excluded pregnancies that used IVF, where the mother had diabetes, the child’s APGAR score was 0 at both 1 and 5 min, recorded gestational duration was an extreme outlier and kept one random pregnancy for each mother that had repeated pregnancies in the cohort.

Genotyping was performed in nine batches on different Illumina arrays (HumanCoreExome-12 v1.1, HumanCoreExome-24 v1.0, Global Screening Array v1.0, InfiniumOmniExpress-24 v.2, HumanOmniExpress-24 v1.0). Genotypes were called in Illumina GenomeStudio software (v.2011.1 and v.2.0.3). Cluster positions were identified from samples with a call rate ≥ 0.98 and a GenCall score ≥ 0.15. Variants were excluded if they matched any of GenomeStudio parameters of cluster separation < 0.4, 10% GC-score < 0.3, AA T Dev > 0.025, or call rate < 98% or HWE P-value <1e-6. Samples were excluded if they showed < 98% markers called, heterozygosity excess > 4 SD, genotyped sex mismatch, or were related to another sample in the cohort with estimated identity proportion (PI_HAT) > 0.1. Finally, individuals of primarily non-European ancestry were excluded based on PCA with reference samples from the HapMap project, version 3.

Pre-phasing was conducted locally using Shapeit v2.790 (24). Imputation was performed at the Sanger Imputation Server with Positional Burrows–Wheeler Transform and HRC version 1.1 reference panel (25). All coordinates are provided in reference to the human genome build GRCh37.

Genome-wide association studies

Four GWASs were performed in this data, using maternal or fetal genomes and gestational duration (in days, determined by ultrasound) or preterm delivery (gestational duration < 259 days) as the outcome. The analyses were performed in PLINK v2.00 alpha 3.3 (26) using an additive regression model (linear or logistic) with minor allele dosage and batch covariates. Non-spontaneous deliveries (initiated by induction or cesarean) and genetic variants with minor allele frequency < 1% were excluded here. Standard genome-wide significance threshold of 5e-8 was used. Lambda and intercept parameters for genomic inflation were calculated using LD score regression (27) and pre-computed LD scores obtained in subjects of recent European ancestry from the 1000 Genomes Project. The regional scatter plot for the fetal locus was done using the LocusZoom’s web application (28). The LD reference panel was the 1000 Genomes Mar 2012 EUR cohort.

For comparing the effect directions, variants at suggestive association in both of the maternal GWASs (P < 1e-5) were selected and pruned by distance (i.e. removing other variants within 1.5 Mbp).

For comparing the GWAS power for the two phenotypes, we used formulas from (29). Specifically, we assume that the gestational age distribution is normal, and variants only have additive, constant effects on its mean, not in any way specific to preterm or other time periods. This is the liability model, and then, the sample size required to achieve equal power in continuous and dichotomized analysis (namely, Wald tests of simple linear or logistic regression coefficients) is related by a factor of Inline graphic, where Inline graphic, K is the prevalence of cases, and Inline graphic—the pdf and quantile function of the standard normal distribution, respectively. The same factor relates the continuous and dichotomized effect sizes and SEs: Inline graphic. The significance level allowing power f in the dichotomous analysis is Inline graphic; we calculated this for effect size corresponding to the minimum standardized effect detectable in the continuous analysis at genome-wide significance and 80% power, i.e. Inline graphic, converted as before. Power for the conducted GWASs was also calculated using the genpwr R package, with observed sample size, prevalence and SD.

Time-to-event analysis of top variants

For detailed analysis of timing, we selected all loci identified as genome-wide significant for pregnancy duration or preterm birth in the largest available maternal (6) and fetal (5) meta-analyses (all loci discovered in the third large study by (7) are represented by at least one variant in these results). From each locus, the top variant as initially reported was chosen. One new fetal variant identified in the present study was added to this set. The name of each locus here is the primary gene assigned to it in the source study.

The variants were re-analyzed in time-to-event models based on the Cox proportional hazards framework (see (30) for a general introduction to such models, or (13) about their use with pregnancy duration). Our event of interest is spontaneous delivery. The hazard Inline graphic then is defined as the instantaneous risk of delivery at time Inline graphic for a mother who has not delivered before Inline graphic. The covariates x are modeled to change this risk log-linearly:

graphic file with name DmEquation1.gif (1)

where Inline graphic is a baseline hazard, and Inline graphic are the effects of covariates. Note that this effect may vary in time, e.g. a covariate may be only relevant during some particular periods of pregnancy. Inspecting the shape of Inline graphic for each selected variant and testing whether it is constant are the main aims of our analysis.

Our primary approach for computing this uses an approximation of (1) the piecewise exponential additive mixed model, PAMM (31). Its idea is to partition the range of observed event times into short intervals, within which the hazard is assumed to be constant and the number of events is Poisson-distributed. Given sufficiently short intervals, the likelihood of this model is proportional to that of (1), so this way, existing tools for Poisson regression and generalized additive mixed models can be used to fit model (1). In particular, the baseline hazard and the effect Inline graphic can be expanded using a spline basis to obtain smooth non-linear forms (31,32).

Specifically, for each variant, we fit the following hazard rate model:

graphic file with name DmEquation2.gif (2)

where Inline graphic is the end time of the interval containing t, Inline graphic is the basis functions of a cubic regression spline and g is the minor allele dosage and other covariates (with constant effects) z. The model is fit using R packages mgcv (32) and pammtools (31). Our null hypothesis is that the variant has only constant effects, i.e. all Inline graphic. We test its significance and extract the estimated effects over time and their confidence intervals using standard mgcv functionality (32). We also checked that this test maintains the type I error rate in our setting: we simulated the phenotype by bootstrap from the observed distribution in the sample, simulated a locus with no effect and minor allele frequency of 0.3 and fitted the model, in 2000 replicates.

The outcome and covariates were prepared as follows. The pregnancy durations were shifted to start at 1 (by subtracting 169 days), so that the period where no live deliveries are expected would not affect the estimation. Non-spontaneous deliveries were treated as censored. The PAMM interval endpoints were set to 0, 20 and then every 7 days. Minor (within this dataset) allele dosage was used as retrieved from the imputation, i.e. a number between 0 and 2, with occasional non-integer values representing uncertain imputation. Maternal dosage data were used for variants discovered in maternal GWASs and fetal dosages for fetal variants. The other covariates were genotyping batch, maternal age (in years) as a pair of degree-2 orthogonal polynomials, the year of the delivery, maternal height (in centimeters), fetal sex (reference level male), presence of congenital malformations (reference level no) and parity (reference 1, other categories 0, 2 or ≥3 previous deliveries). For maternal height, missing or extreme outlier values were mean-imputed.

To verify that our PAMM setup approximates the Cox model (1), at least when the effects are constant, we ran a simplified PAMM,

graphic file with name DmEquation3.gif (3)

and a Cox regression with the same covariates, as implemented in the R survival package (33), and compared their Inline graphic estimates. As a sensitivity analysis, two other tests for non-proportionality of hazards were applied. First, a PAMM that includes a time × genetic effect interaction as a specified form (linear), instead of the spline, i.e.

graphic file with name DmEquation4.gif (4)

The significance of the interaction term, as calculated by mgcv, is reported. The second test was the cox.zph test implemented in the survival package; it again reports the significance of the interaction term, but here the times are replaced with the corresponding Kaplan–Meier survival estimates Inline graphic to reduce the effect of outliers, i.e. the term is Inline graphic (33).

Among the top loci, gene pathway enrichment was checked using the gprofiler2 R package (34). The package tests pathways from multiple sources, including GeneOntology and WikiPathways, and reports a P-value for overrepresentation (the package applies a custom ‘g:SCS’ multiple testing adjustment, determined empirically).

As a functional support for the observed effects of the top loci, we also inspected the expression levels of the corresponding genes in RNA sequencing data from (14). The data were obtained from 125 placenta samples, collected at pregnancy termination between 6 and 23 weeks of gestation. The read counts for each gene were retrieved from GEO (GSE150830), and we removed non-protein-coding genes based on Ensembl annotations. We report the median of sample counts for each gene and the median of these over genes in each set (top loci or all other).

Construction and analysis of the gestational-duration PGS

We calculated a gestational-duration PGS for each mother included in the study. The score weights were previously trained in a maternal GWAS meta-analysis on gestational duration (6). The MoBa cohort of the present study was excluded, resulting in n = 173 162 mothers for the GWAS, and the score weights were trained using LDpred2. Each individual’s PGS is the product of the weighted effects of the variants from the training and the number of corresponding effect alleles. In this section, we derived hardcall genotypes for the study population and calculated the individual scores using PLINK 1.9. Note that fetal scores were not used here owing to low overall heritability. The phenotype was prepared as described in the previous sections, with induced births censored in the PAMMs and excluded in all other models.

The variance explained by the PGS was reported as the R-squared from a linear regression model of gestational duration with PGS as a covariate (with or without the batch and clinical covariates described before). For comparison, we reran this model but excluding all variants further than 50 kb away from any of the top maternal hits (weights were unchanged for the retained variants). We also refitted the PAMM model for each variant with PGS as an additional constant-effect covariate.

The PGS was then categorized into quintiles and a PAMM fitted with a smooth term for each category (except the middle one as the reference). This was repeated with tertile or septile categories of PGS instead. The genotyping batch and the clinical covariates were kept as constant-effect covariates. We also show the risks of preterm delivery and vPTD (fraction of spontaneous births in this sample < 259 or < 224 days of gestation, respectively) and their confidence intervals obtained with R prop.test.

In the exploratory analyses, we first inspected the total per-individual counts of rare deleterious alleles, which we defined as having a negative effect of at least 0.001 days in the PGS and frequency < 10%. We similarly plotted the counts of deleterious recessive genotypes—minor homozygous genotypes for alleles with < −0.001 days effect in the PGS. We then ran a principal component (PC) analysis on the maternal genetic data, using only autosomal PGS variants, pruned to reduce dependency with PLINK’s —indep-pairwise 100 10 0.2 command. Top 10 PCs were extracted. All PCs and the two counts were tested for association with preterm delivery or vPTD in logistic regression models, adjusting for the PGS and genotyping batch. Variant loadings on PC3 were taken as reported by PLINK, and the Genome Reference Consortium boundaries of the MHC region were retrieved from https://www.ncbi.nlm.nih.gov/grc/human/regions/MHC?asm=GRCh37.

Supplementary Material

suppl_material_ddad086

Acknowledgements

The Norwegian Mother, Father and Child Cohort Study is supported by the Norwegian Ministry of Health and Care Services and the Ministry of Education and Research. We are grateful to all the participating families in Norway who take part in this on-going cohort study. We thank the Norwegian Institute of Public Health (NIPH) for generating high-quality genomic data. This research is part of the HARVEST collaboration, supported by the Research Council of Norway (#229624). We also thank the NORMENT Centre for providing genotype data, funded by the Research Council of Norway (#223273), South East Norway Health Authorities and Stiftelsen Kristian Gerhard Jebsen. We further thank the Center for Diabetes Research, the University of Bergen for providing genotype data and performing quality control and imputation of the data funded by the ERC AdG project SELECTionPREDISPOSED, Stiftelsen Kristian Gerhard Jebsen, Trond Mohn Foundation, the Research Council of Norway, the Novo Nordisk Foundation, the University of Bergen and the Western Norway Health Authorities. We thank the research participants and employees of 23andMe for making this work possible.

Conflict of Interest statement. None declared.

Contributor Information

Julius Juodakis, Department of Obstetrics and Gynecology, Institute of Clinical Sciences, University of Gothenburg, Gothenburg 416 50, Sweden.

Karin Ytterberg, Department of Obstetrics and Gynecology, Institute of Clinical Sciences, University of Gothenburg, Gothenburg 416 50, Sweden.

Christopher Flatley, Department of Obstetrics and Gynecology, Institute of Clinical Sciences, University of Gothenburg, Gothenburg 416 50, Sweden.

Pol Sole-Navais, Department of Obstetrics and Gynecology, Institute of Clinical Sciences, University of Gothenburg, Gothenburg 416 50, Sweden.

Bo Jacobsson, Department of Obstetrics and Gynecology, Institute of Clinical Sciences, University of Gothenburg, Gothenburg 416 50, Sweden; Department of Genetics and Bioinformatics, Division of Health Data and Digitalisation, Norwegian Institute of Public Health, Oslo 0456, Norway.

Funding

B.J. received funding from The Swedish Research Council, Stockholm, Sweden (2019-01004), The Research Council of Norway, Oslo, Norway (FRIMEDBIO #547711), March of Dimes (#21-FY16-121), Agreement concerning research and education of doctors (ALFGBG-965353). J.J. received funding from Lilla Barnets Foundation. Research by B.J. was also supported by the Eunice Kennedy Shriver National Institute of Child Health & Human Development of the National Institutes of Health under Award Number R01HD101669. The funders played no role in the study design, data collection, analysis, interpretation or the writing of this manuscript. The content is solely the responsibility of the authors and does not necessarily represent the official views of the funders.

Data and code availability

All codes for replicating the analyses presented here are available at https://github.com/PerinatalLab/time-varying. Summary statistics from the GWASs performed as part of this project (fetal and maternal GWASs of gestational duration and preterm delivery) are available from the GWAS Catalogue, accession IDs: GCST90268419, GCST90268420, GCST90268421 and GCST90268422. The full GWAS summary statistics for the 23andMe data used in the construction of PGS are available through 23andMe to qualified researchers under an agreement with 23andMe that protects the privacy of the 23andMe participants. Please visit https://research.23andme.com/collaborate/#dataset-access/ for more information and to apply to access the data. A simplified set of weights for the polygenic score of gestational duration, trained excluding 23andMe, Inc. data, is available in the PGS Catalog. The consent given by the participants does not permit public access of data at the individual level. Researchers can apply for access to data through helsedata.no. Access requires approval from the Regional Committees for Medical and Health Research Ethics in Norway and an agreement with MoBa.

References

  • 1. Vogel, J.P., Chawanpaiboon, S., Moller, A.-B., Watananirun, K., Bonet, M. and Lumbiganon, P. (2018) The global epidemiology of preterm birth. Best Pract. Res. Clin. Obstet. Gynaecol., 52, 3–12. [DOI] [PubMed] [Google Scholar]
  • 2. Blencowe, H., Cousens, S., Oestergaard, M.Z., Chou, D., Moller, A.-B., Narwal, R., Adler, A., Vera Garcia, C., Rohde, S., Say, L. and Lawn, J.E. (2012) National, regional, and worldwide estimates of preterm birth rates in the year 2010 with time trends since 1990 for selected countries: a systematic analysis and implications. Lancet, 379, 2162–2172. [DOI] [PubMed] [Google Scholar]
  • 3. Jing, S., Chen, C., Gan, Y., Vogel, J. and Zhang, J. (2020) Incidence and trend of preterm birth in China, 1990-2016: a systematic review and meta-analysis. BMJ Open, 10, e039303. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4. Wadon, M., Modi, N., Wong, H.S., Thapar, A. and O’Donovan, M.C. (2020) Recent advances in the genetics of preterm birth. Ann. Hum. Genet., 84, 205–213. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5. Liu, X., Helenius, D., Skotte, L., Beaumont, R.N., Wielscher, M., Geller, F., Juodakis, J., Mahajan, A., Bradfield, J.P., Lin, F.T.J.  et al. (2019) Variants in the fetal genome near pro-inflammatory cytokine genes on 2q13 associate with gestational duration. Nat. Commun., 10, 1–13. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6. Solé-Navais, P., Flatley, C., Steinthorsdottir, V., Vaudel, M., Juodakis, J., Chen, J., Laisk, T., LaBella, A.L., Westergaard, D., Bacelis, J.  et al. (2023) Genetic effects on the timing of parturition and links to fetal birth weight. Nat. Genet., 55, 559–567. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7. Zhang, G., Feenstra, B., Bacelis, J., Liu, X., Muglia, L.M., Juodakis, J., Miller, D.E., Litterman, N., Jiang, P.-P., Russell, L.  et al. (2017) Genetic associations with gestational duration and spontaneous preterm birth. N. Engl. J. Med., 377, 1156–1167. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8. Rappoport, N., Toung, J., Hadley, D., Wong, R.J., Fujioka, K., Reuter, J., Abbott, C.W., Oh, S., Hu, D., Eng, C.  et al. (2018) A genome-wide association study identifies only two ancestry specific variants associated with spontaneous preterm birth. Sci. Rep., 8, 1–11. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9. Tiensuu, H., Haapalainen, A.M., Karjalainen, M.K., Pasanen, A., Huusko, J.M., Marttila, R., Ojaniemi, M., Muglia, L.J., Hallman, M. and Rämet, M. (2019) Risk of spontaneous preterm birth and fetal growth associates with fetal SLIT2. PLoS Genet., 15, e1008107. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10. Ragland, D.R. (1992) Dichotomizing continuous outcome variables: dependence of the magnitude of association and statistical power on the cutpoint. Epidemiology, 3, 434–440. [DOI] [PubMed] [Google Scholar]
  • 11. Romero, R., Dey, S.K. and Fisher, S.J. (2014) Preterm labor: one syndrome, many causes. Science, 345, 760–765. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12. Auger, N., Le, T.U.N., Park, A.L. and Luo, Z.-C. (2011) Association between maternal comorbidity and preterm birth by severity and clinical subtype: retrospective cohort study. BMC Pregnancy Childbirth, 11, 1–10. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13. Juodakis, J., Bacelis, J., Zhang, G., Muglia, L.J. and Jacobsson, B. (2017) Time-variant genetic effects as a cause for preterm birth: insights from a population of maternal cousins in Sweden. G3, 7, 1349–1356. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14. Breen, J., McAninch, D., Jankovic-Karasoulos, T., McCullough, D., Smith, M.D., Bogias, K.J., Wan, Q., Choudhry, A., Hin, N., Pederson, S.M., Bianco-Miotto, T. and Roberts, C.T. (2020) Temporal placental genome wide expression profiles reflect three phases of utero-placental blood flow during early to mid human gestation  In medRxiv. 10.1101/2020.06.25.20139709. [DOI]
  • 15. Balan, T.A. and Putter, H. (2020) A tutorial on frailty models. Stat. Methods Med. Res., 29, 3424–3454. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16. Crowther, M.J., Royston, P. and Clements, M. (2022) A flexible parametric accelerated failure time model and the extension to time-dependent acceleration factors. Biostatistics. 10.1093/biostatistics/kxac009. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17. Baird, D.D. (2009) The gestational timing of pregnancy loss: adaptive strategy?  American Journal of Human Biology: The Official Journal of the Human Biology Council, 21, 725–727. [DOI] [PubMed] [Google Scholar]
  • 18. Rokas, A., Mesiano, S., Tamam, O., LaBella, A., Zhang, G. and Muglia, L. (2020) Developing a theoretical evolutionary framework to solve the mystery of parturition initiation. eLife, 9. 10.7554/eLife.58343. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19. Jiang, X., Holmes, C. and McVean, G. (2021) The impact of age on genetic risk for common diseases. PLoS Genet., 17, e1009723. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20. Gong, S., Gaccioli, F., Aye, I.L.M., Avellino, G., Cook, E., Lawson, A.R.J., Harvey, L.M.R., Smith, G.C.S. and Stephen Charnock-Jones, D. (2022) The human placenta exhibits a unique transcriptomic void  In bioRxiv. 10.1101/2022.07.01.498408. [DOI] [PubMed]
  • 21. He, L. and Kulminski, A.M. (2020) Fast algorithms for conducting large-scale GWAS of age-at-onset traits using cox mixed-effects models. Genetics, 215, 41–58. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22. Pedersen, E.M., Agerbo, E., Plana-Ripoll, O., Steinbach, J., Krebs, M.D., Hougaard, D.M., Werge, T., Nordentoft, M., Børglum, A.D., Musliner, K.L.  et al. (2022) ADuLT: an efficient and robust time-to-event GWAS  In bioRxiv. 10.1101/2022.08.11.22278618. [DOI] [PMC free article] [PubMed]
  • 23. Magnus, P., Birke, C., Vejrup, K., Haugan, A., Alsaker, E., Daltveit, A.K., Handal, M., Haugen, M., Høiseth, G., Knudsen, G.P.  et al. (2016) Cohort profile update: the Norwegian mother and child cohort study (MoBa). Int. J. Epidemiol., 45, 382–388. [DOI] [PubMed] [Google Scholar]
  • 24. O’Connell, J., Gurdasani, D., Delaneau, O., Pirastu, N., Ulivi, S., Cocca, M., Traglia, M., Huang, J., Huffman, J.E., Rudan, I.  et al. (2014) A general approach for haplotype phasing across the full Spectrum of relatedness. PLoS Genet., 10, e1004234. 10.1371/journal.pgen.1004234. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25. McCarthy, S., Das, S., Kretzschmar, W., Delaneau, O., Wood, A.R., Teumer, A., Kang, H.M., Fuchsberger, C., Danecek, P. and Sharp, K. (2016) A reference panel of 64,976 haplotypes for genotype imputation. Nat. Genet., 48, 1279–1283. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26. Chang, C.C., Chow, C.C., Tellier, L.C., Vattikuti, S., Purcell, S.M. and Lee, J.J. (2015) Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience, 4, 7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27. Bulik-Sullivan, B.K., Loh, P.-R., Finucane, H.K., Ripke, S., Yang, J., Schizophrenia Working Group of the Psychiatric Genomics Consortium, Patterson, N, Daly, M.J., Price, A.L. and Neale, B.M. (2015) LD score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet., 47, 291–295. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28. Pruim, R.J., Welch, R.P., Sanna, S., Teslovich, T.M., Chines, P.S., Gliedt, T.P., Boehnke, M., Abecasis, G.R. and Willer, C.J. (2010) LocusZoom: regional visualization of genome-wide association scan results. Bioinformatics, 26, 2336–2337. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29. Wu, T. and Sham, P.C. (2021) On the transformation of genetic effect size from logit to liability scale. Behav. Genet., 51, 215–222. [DOI] [PubMed] [Google Scholar]
  • 30. Kleinbaum, D.G. and Klein, M. (2006) Survival Analysis: A Self-Learning Text. Springer, New York, NY. [Google Scholar]
  • 31. Bender, A., Groll, A. and Scheipl, F. (2018) A generalized additive model approach to time-to-event analysis. Stat. Modell., 18, 299–321. [Google Scholar]
  • 32. Wood, S.N. (2017) Generalized Additive Models: An Introduction with R, Second edn. CRC Press, Boca Raton, FL. [Google Scholar]
  • 33. Therneau, T.M. and Grambsch, P.M (2000) Modeling survival data: extending the Cox Model. Springer, New York. ISBN 0-387-98784-3. [Google Scholar]
  • 34. Raudvere, U., Kolberg, L., Kuzmin, I., Arak, T., Adler, P., Peterson, H. and Vilo, J. (2019) G:profiler: a web server for functional enrichment analysis and conversions of gene lists (2019 update). Nucleic Acids Res., 47, W191–W198. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

suppl_material_ddad086

Data Availability Statement

All codes for replicating the analyses presented here are available at https://github.com/PerinatalLab/time-varying. Summary statistics from the GWASs performed as part of this project (fetal and maternal GWASs of gestational duration and preterm delivery) are available from the GWAS Catalogue, accession IDs: GCST90268419, GCST90268420, GCST90268421 and GCST90268422. The full GWAS summary statistics for the 23andMe data used in the construction of PGS are available through 23andMe to qualified researchers under an agreement with 23andMe that protects the privacy of the 23andMe participants. Please visit https://research.23andme.com/collaborate/#dataset-access/ for more information and to apply to access the data. A simplified set of weights for the polygenic score of gestational duration, trained excluding 23andMe, Inc. data, is available in the PGS Catalog. The consent given by the participants does not permit public access of data at the individual level. Researchers can apply for access to data through helsedata.no. Access requires approval from the Regional Committees for Medical and Health Research Ethics in Norway and an agreement with MoBa.


Articles from Human Molecular Genetics are provided here courtesy of Oxford University Press

RESOURCES