Author manuscript; available in PMC: 2016 May 25.
Published in final edited form as: J R Stat Soc Ser C Appl Stat. 2015 Apr 3;64(5):711–730. doi: 10.1111/rssc.12100

Modelling the type and timing of consecutive events: application to predicting preterm birth in repeated pregnancies

Joanna H Shih 1, Paul S Albert 2, Pauline Mendola 3, Katherine L Grantz 4
PMCID: PMC4879837  NIHMSID: NIHMS784878  PMID: 27239073

Summary

Predicting the occurrence and timing of adverse pregnancy events such as preterm birth is an important analytical challenge in obstetrical practice. Developing statistical approaches that can be used to assess the risk and timing of these adverse events will provide clinicians with tools for individualized risk assessment that account for a woman’s prior pregnancy history. Often adverse pregnancy outcomes are subject to competing events; for example, interest may focus on the occurrence of pre-eclampsia-related preterm birth, where preterm birth for other reasons may serve as a competing event. We propose modelling the type and timing of adverse outcomes in repeated pregnancies. We formulate a joint model, where types of adverse outcomes across repeated pregnancies are modelled by using a polychotomous logistic regression model with random effects, and gestational ages at delivery are modelled conditionally on the types of adverse outcome. The correlation between gestational ages conditional on the adverse pregnancies is modelled by the semiparametric normal copula function. We present a two-stage estimation method and develop the asymptotic theory for the estimators proposed. The model and estimation procedure proposed are applied to the National Institute of Child Health and Human Development consecutive pregnancies study data and evaluated by simulations.

Keywords: Adverse pregnancy outcome, Normal copula, Polychotomous random-effects logistic model, Pre-eclampsia, Preterm birth, Repeated pregnancies

1. Introduction

Adverse pregnancy outcomes affect the health of both the mother and the fetus. Women tend to repeat adverse outcomes in subsequent pregnancies, but our ability to predict recurrence for an individual woman remains limited (Louis et al., 2006). Developing individualized assessment of risk in consecutive pregnancies could result in risk stratification that both improves the predictive value of early pregnancy screening tests and facilitates development of individualized management plans. The National Institute of Child Health and Human Development (NICHD) consecutive pregnancies study was designed to assess the association of adverse pregnancy outcomes and their timings during gestation across repeated pregnancies. Specifically, the primary objectives of this study were to estimate the incidence of adverse pregnancy outcomes, to identify risk factors that are associated with this incidence and to estimate individual risk incidence curves as a function of gestational age that account for both risk factors and prior pregnancy history.

The study collected retrospective data on 114630 pregnancies from 51066 women delivering later than 20 weeks of gestation from 20 hospitals in the state of Utah from 2002 to 2010. The data set captured all consecutive births (later than 20 weeks of gestation) within this 9-year time period.

One of the current challenges in evaluating subsequent risk of adverse pregnancy outcomes is that prior history of one outcome increases the risk not only of that particular outcome but also of a number of different outcomes in a subsequent pregnancy, potentially due to similar biologic pathways. For example, history of medically indicated preterm birth is associated not only with an increased risk of subsequent medically indicated preterm birth but also spontaneous preterm birth (Ananth et al., 2006). Therefore, at each pregnancy, women may be at risk of multiple types of adverse pregnancy outcomes where the occurrence of one event precludes the occurrence of another. In this paper, we study two different indications for preterm birth under 37 weeks of gestation that are types of competing adverse pregnancy outcomes:

  1. preterm birth complicated by pre-eclampsia designated as pre-eclampsia-related (PR) preterm birth and

  2. the remaining preterm births complicated by other medications or spontaneous labour, designated as pre-eclampsia-unrelated (PU) preterm birth.

Pre-eclampsia is a syndrome in which a pregnant woman develops new onset high blood pressure and protein in the urine after 20 weeks of gestational age. The only treatment for pre-eclampsia is delivery, which may be required preterm (so-called PR preterm birth). PR preterm birth may be associated with different risk factors from other preterm births, and studying these associations may be important for managing pregnant women who have had prior PR preterm births. Recent research in obstetrics has focused on predicting cause-specific preterm birth from prior pregnancy history and other aetiological factors (Laughon et al., 2013). However, because of the lack of a statistical approach, predicting the timing as well as the occurrence has not been studied. This paper proposes a competing risk formulation for modelling the dual outcome of adverse pregnancy occurrence and gestational age. Specifically, our focus will be on modelling the type and gestational age of preterm birth, where PR and PU preterm birth are two competing events. The adverse preterm births and their timing differ from conventional competing risk data in two respects. First, the risk of a preterm birth is capped at 37 weeks, after which the birth is classified as a term birth. In other words, the risk of having a preterm birth after 37 weeks is 0 and, as such, term birth is not a competing event. Second, gestational age is observed for each pregnancy and thus not subject to censoring. Because of censoring, much of the work in the competing risk literature (Kalbfleisch and Prentice (2002) and references therein) has been focused on modelling the cause-specific hazard function, even though it is a quantity that is difficult to interpret. In the current application, it is more desirable to correlate the gestational age of preterm birth with risk factors directly as compared with modelling the cause-specific hazard.
Also, the investigators are interested in assessing the event (preterm birth) process alone as well as together with the timing of the consecutive events. For this, we develop a joint model to establish the interrelationship between the two types of preterm births and their gestational ages at delivery in repeated pregnancies. In this formulation, types of preterm birth are modelled with a polychotomous logistic model by using random effects, and repeated gestational ages at delivery are modelled conditionally on the types of preterm births. The correlation between types of preterm birth in repeated pregnancies is induced by the type-specific random effects, and the correlation between gestational ages conditional on the types of preterm birth is modelled by the normal copula function. With the joint model proposed, the aforementioned quantities of interest such as the incidence of recurrent adverse pregnancy outcome and its relationship to the occurrence and timing of adverse outcomes on previous pregnancy are readily derived.

The paper is organized as follows. In Section 2, we give an overview of the NICHD consecutive pregnancies study. In Section 3, we describe the model proposed and estimation method and present the asymptotic theory. We present the NICHD consecutive pregnancies study analysis results in Section 4 and evaluate the performance of the proposed estimators with simulations in Section 5. We conclude the paper with a brief discussion in Section 6.

The data that are analysed in the paper and the programs that were used to analyse them can be obtained from http://wileyonlinelibrary.com/journal/rss-datasets

2. National Institute of Child Health and Human Development consecutive pregnancies study

An important objective of the study was to characterize the association in adverse pregnancy outcomes and their timing across consecutive pregnancies. As such, pregnancy, labour and delivery medical records for women in the state of Utah who had at least two pregnancies during 2002–2010 were retrospectively retrieved. Gestational age at delivery and type of pregnancy outcome were collected at each pregnancy during the 9 years of the study period. For the analyses, we consider a woman to have pre-eclampsia during her pregnancy if she has the more serious conditions of eclampsia or superimposed pre-eclampsia. Eclampsia is defined as pre-eclampsia followed by seizure and superimposed pre-eclampsia is pre-eclampsia among women with chronic hypertension. A large number of clinical variables and pregnancy history before 2002 were collected as well.

Information on a total of 114 630 pregnancies from 51 066 women was recorded in the study. Table 1 tabulates the number of pregnancies and preterm births that were contributed by the women. Of the 114 630 pregnancies, 9552 were preterm births delivered by 7794 women. Fig. 1(a) displays the histogram of gestational age of preterm birth. As the gestational age of preterm birth is capped at 37 weeks, the distribution is skewed to the left. Of these women, 1571 (20.2%) had at least two preterm births, which together accounted for 3329 (34.9%) preterm births, suggesting that preterm births tend to recur. Of the 9552 preterm births, 1043 (10.9%) were PR. Table 2 tabulates the 1571 women with multiple preterm births by PR preterm birth and PU preterm birth. For example, 143 women had one PR and one PU preterm birth. As the two types of preterm birth occurred in the same woman, we explored whether the two types of preterm birth were correlated or not. Fig. 1(b) displays the normal rank score of gestational age of the first pregnancy versus that of the second pregnancy among women who delivered preterm births in both pregnancies, where the squares and full LOWESS smooth curve correspond to PR preterm births in both pregnancies, the crosses and dotted LOWESS smooth curve to PU preterm births in both pregnancies, and the triangles and chain LOWESS smooth curve to one of each type. It appears that the strength of correlation between gestational ages of preterm births might depend on the types of preterm births. In addition, a quick assessment of the effect of adverse pregnancy outcome on the subsequent pregnancy can also be illustrated by the conditional cumulative incidence. Fig. 2(a) displays the cumulative incidence curve of the second pregnancy, where the full curve corresponds to the marginal cumulative incidence of PR preterm birth, and the broken and dotted curves correspond to the conditional counterpart given that the women had PR preterm birth and PU preterm birth at first pregnancy respectively. Fig. 2(b) displays the similar cumulative incidence plot for PU preterm birth. In both cases, there is a larger increase in the incidence of the same type of preterm birth than for different types. For example, the marginal chance of having a PR preterm delivery by week 32 at the second pregnancy was 0.14%, but such a chance was increased to 0.21% and 4.4% if the woman had PU and PR preterm birth at first pregnancy respectively.

Table 1.

Tabulation of woman participants by the number of pregnancies and preterm births

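Results for the following numbers of preterm births (columns 0 to 5):

| Number of pregnancies | 0 | 1 | 2 | 3 | 4 | 5 | Total |
|---|---|---|---|---|---|---|---|
| 2 | 34255 | 4711 | 995 | | | | 39961 |
| 3 | 8012 | 1313 | 337 | 125 | | | 9787 |
| 4 | 950 | 190 | 60 | 34 | 11 | | 1245 |
| 5 | 53 | 9 | 6 | 1 | 1 | 1 | 71 |
| 6 | 2 | 0 | 0 | 0 | 0 | 0 | 2 |
| Total | 43272 | 6223 | 1398 | 160 | 12 | 1 | 51066 |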
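
The recurrence figures quoted in the text can be recomputed from the column totals of Table 1. The short Python sketch below is only an illustrative check; the dictionary layout and variable names are assumptions made for this example and are not part of the study's programs.

```python
# Column totals of Table 1: number of women by number of preterm births.
women_by_preterm_births = {0: 43272, 1: 6223, 2: 1398, 3: 160, 4: 12, 5: 1}

# Women with at least one, and with at least two, preterm births.
women_any = sum(n for k, n in women_by_preterm_births.items() if k >= 1)
women_repeat = sum(n for k, n in women_by_preterm_births.items() if k >= 2)

# Preterm births overall, and those contributed by women with repeats.
births_any = sum(k * n for k, n in women_by_preterm_births.items())
births_repeat = sum(k * n for k, n in women_by_preterm_births.items() if k >= 2)

pct_women_repeat = 100.0 * women_repeat / women_any
pct_births_repeat = 100.0 * births_repeat / births_any

print(women_any, women_repeat, round(pct_women_repeat, 1))
print(births_any, births_repeat, round(pct_births_repeat, 1))
```

The counts reproduce the 7794 women, 9552 preterm births, 1571 women with repeats and 3329 repeat-related preterm births stated in the text.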

Fig. 1.

Histogram and scatter plot of gestational age: (a) histogram of gestational age of preterm birth; (b) scatter plot of the normal rank score of gestational age of the first pregnancy versus normal rank score of gestational age of the second pregnancy among women who delivered preterm births in both pregnancies (□,——, PR preterm births in both pregnancies; ×,⋯⋯, PU preterm births in both pregnancies; △, ·–·–·, one of each type)

Table 2.

Tabulation of women with multiple adverse pregnancy outcomes by PR and PU birth

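Results for the following numbers of PU preterm births (columns 0 to 4):

| Number of PR births | 0 | 1 | 2 | 3 | 4 | Total |
|---|---|---|---|---|---|---|
| 0 | | | 1165 | 140 | 10 | 1315 |
| 1 | | 143 | 12 | 0 | 0 | 155 |
| 2 | 90 | 2 | 1 | 1 | 0 | 94 |
| 3 | 6 | 1 | 0 | 0 | 0 | 7 |
| Total | 96 | 146 | 1178 | 141 | 10 | 1571 |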

Fig. 2.

Cumulative incidence curve of the second preterm birth (——, marginal cumulative incidence; – – –, cumulative incidence of PR or PU preterm birth given PR preterm birth at first pregnancy; ⋯⋯, cumulative incidence of PR or PU preterm birth given PU preterm birth at first pregnancy): (a) PR preterm birth; (b) PU preterm birth

This brief summary demonstrates several features of the consecutive pregnancy data including

  1. the correlation of same type as well as different type of repeated pregnancy outcomes,

  2. the distribution of gestational age of preterm birth being skewed,

  3. the correlation of gestational ages of multiple pregnancies depending on the types of pregnancy outcomes and

  4. the association between multiple pregnancy outcomes and gestational ages.

Developing personalized cumulative incidence curves is an important goal for clinical management of pregnant women. In the next section, we present a statistical model that will allow us to incorporate clinical history at each consecutive pregnancy to develop individualized risk prediction.

3. Method

3.1. Model

Consider a sample of $n$ women for whom pregnancy history data were collected. For the $i$th woman, let $Y_i = (Y_{i1}, \ldots, Y_{ik_i})$ and $T_i = (T_{i1}, \ldots, T_{ik_i})$ denote the multinomial pregnancy outcomes and gestational ages of her $k_i$ pregnancies, where $Y_{ij} = 0, 1, 2$ corresponds to term, PR preterm and PU preterm birth respectively. Let $Z_i^Y = (Z_{i1}^Y, \ldots, Z_{ik_i}^Y)^{\mathrm T}$ and $Z_i^T = (Z_{i1}^T, \ldots, Z_{ik_i}^T)^{\mathrm T}$ be $k_i \times p_1$ and $k_i \times p_2$ covariate matrices associated with $Y_i$ and $T_i$ respectively (the italic superscripts $Y$ and $T$ label the outcome; the roman superscript $\mathrm T$ denotes transpose). The two sets of covariates may overlap. We model the joint distribution of $(Y_i, T_i)$ via $Y_i$ and $T_i \mid Y_i$. The joint distribution of $Y_i$ is specified through the polychotomous logistic regression model with random effects

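$$p_{ijc}(b_{ic}) = p(Y_{ij} = c \mid b_{ic}) = \frac{\exp(\alpha_{0c} + \alpha_c Z_{ij}^Y + b_{ic})}{1 + \sum_{c'=1}^{2} \exp(\alpha_{0c'} + \alpha_{c'} Z_{ij}^Y + b_{ic'})}, \tag{1}$$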

where $b_{ic}$, $c = 1, 2$, are the random intercepts indexing person-specific sensitivity to PR preterm birth ($c = 1$) and PU preterm birth ($c = 2$) respectively. It is assumed that $b_i = (b_{i1}, b_{i2})$ follows a bivariate zero-mean normal distribution with variances $\sigma_c^2$ and correlation coefficient $\nu$. Conditionally on $b_i = (b_{i1}, b_{i2})$, $(Y_{i1}, \ldots, Y_{ik_i})$ are assumed to be independent with the conditional probability function given by

$$p(y_i \mid b_i) = p(Y_{i1} = y_{i1}, \ldots, Y_{ik_i} = y_{ik_i} \mid b_i) = \prod_{j=1}^{k_i} \prod_{c=0}^{2} p_{ijc}(b_{ic})^{I(y_{ij} = c)}, \tag{2}$$

where $p_{ij0} = 1 - \sum_{c=1}^{2} p_{ijc}$, $i = 1, \ldots, n$. The correlated random intercepts are used to induce correlation between multiple adverse pregnancies of the same type as well as of different types in repeated pregnancies. For example, if $\nu$ is positive, then a woman with a large value of $b_{i1}$ has a higher chance of having PR preterm birth and also tends to have a large value of $b_{i2}$, which in turn increases the chance of having PU preterm birth in subsequent pregnancies. The joint distribution function of $Y_i$ is obtained by integrating out $b$ in equation (2) and equals

$$p(y_i) = \int p(Y_{i1} = y_{i1}, \ldots, Y_{ik_i} = y_{ik_i} \mid b)\, g(b)\, \mathrm{d}b,$$

where g(·) is the density function of the bivariate normal distribution with mean 0 and variance–covariance matrix

$$\begin{pmatrix} \sigma_1^2 & \sigma_1 \sigma_2 \nu \\ \sigma_1 \sigma_2 \nu & \sigma_2^2 \end{pmatrix}.$$

Next, we model the joint distribution of $T_i = (T_{i1}, \ldots, T_{ik_i})$ conditionally on $Y_i$. Since our interest is in inference related to PR and PU preterm birth, the distribution of gestational age beyond 37 weeks is not of interest and is not modelled. Correspondingly, modelling the joint distribution of gestational ages in the first 37 weeks given the pregnancy outcomes is equivalent to modelling the joint distribution of gestational ages among the pregnancies with $Y_{ij} > 0$. Since the conditional distribution of $T_{ij}$ given $Y_{ij} > 0$ is skewed, as seen in Fig. 1(a), no distributional assumption is made for $T_{ij}$. Rather, we assume that $T_{ij}$ (or a suitable transformation of $T_{ij}$) given $Y_{ij} > 0$ follows the linear model

$$T_{ij} = \beta_0 + \beta Z_{ij}^T + \theta I(Y_{ij} = 1) + \varepsilon_{ij}, \qquad j = 1, \ldots, l_i, \tag{3}$$

where $l_i = \sum_{j=1}^{k_i} I(Y_{ij} > 0)$ is the number of preterm births for woman $i$, and $\varepsilon_{ij}$ is the error term, of which the distribution function is left unspecified. For model parsimony, we assumed that gestational age depends on the type of preterm birth, $Y$, of the same pregnancy only, not of different pregnancies. This is the reproducibility assumption that is often used in modelling the marginal distribution in multivariate outcomes analysis (Whittemore, 1995).

The correlation of repeated gestational ages of preterm births is specified by the joint distribution of the $\varepsilon_{ij}$, which is assumed to follow a normal copula model given by

$$\big(G(\varepsilon_{i1}), \ldots, G(\varepsilon_{il_i})\big)^{\mathrm T} \sim \Phi_\rho \tag{4}$$

for an unspecified monotone increasing transformation $G$, where $\Phi_\rho$ is the standard multivariate normal distribution with common correlation coefficient $\rho$. Define the marginal distribution of $\varepsilon_{ij}$ by $F(e) = \Pr(\varepsilon_{ij} \le e)$. As implied by the normal copula model (4), $G(\cdot) = \Phi^{-1}\{F(\cdot)\}$, where $\Phi$ is the cumulative standard normal distribution function. One major advantage of modelling the multivariate distribution of gestational ages by the normal copula function is that, with the monotonic transformation $G$, the joint distribution can be fully described by arbitrary continuous univariate distributions and a correlation matrix which fully specifies the dependence of a multivariate normal distribution. A normal copula has been used in other settings. For example, Huang and Berry (2006) used a normal copula to estimate the association between the mark time and survival time; Song et al. (2009) used a normal copula to model the correlation of mixed (continuous and/or discrete) correlated data where the marginal data follow a parametric exponential dispersion family distribution; and Othus and Li (2011) assumed a proportional hazards model for the marginal distribution and used a normal copula to model the correlation of multivariate survival data.
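
To make the role of $G = \Phi^{-1} \circ F$ concrete, the following Python sketch simulates pairs of errors from a normal copula with a left-skewed marginal and then maps them back to the normal scale. It is an illustration only: the correlation value, the choice of marginal (the negative of a unit exponential, for which $F(e) = \exp(e)$ for $e \le 0$) and all variable names are assumptions made for this example, not quantities taken from the study.

```python
import numpy as np
from scipy.stats import norm, skew

rng = np.random.default_rng(0)
rho = 0.5   # assumed copula correlation, for illustration only
n = 5000

# Step 1: correlated standard normal pairs, i.e. the latent G(eps) scale.
cov = np.array([[1.0, rho], [rho, 1.0]])
z = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)

# Step 2: push through Phi to get uniforms, then through F^{-1}.
# Assumed marginal: eps = -Exp(1), so F(e) = exp(e) and F^{-1}(u) = log(u).
u = norm.cdf(z)
eps = np.log(u)

# Step 3: G = Phi^{-1}{F(.)} recovers the latent normal variables.
g = norm.ppf(np.exp(eps))

print("skewness of eps:", round(float(skew(eps.ravel())), 2))
print("correlation on the eps scale:", round(float(np.corrcoef(eps.T)[0, 1]), 2))
print("correlation on the G(eps) scale:", round(float(np.corrcoef(g.T)[0, 1]), 2))
```

The marginal is visibly skewed, yet after the transformation the pairs are bivariate normal again, so a single correlation coefficient describes their dependence.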

In model (4), all the pairs of gestational ages of preterm births share a common correlation. The scatter plot that is displayed in Fig. 1(b), however, suggests that the correlation may vary with the types of preterm birth, i.e. the correlation between pairs of gestational ages with PR preterm in both members may be different from that with the other two pair types (PU, PU) and (PR, PU). To accommodate this possibility, model (4) can be extended to model (5):

$$\big(G\{\varepsilon_{ik}(y_{ik})\}, G\{\varepsilon_{ij}(y_{ij})\}\big) \sim \Phi_{\rho_{y_{ij}, y_{ik}}}, \tag{5}$$

where ρ11, ρ22 and ρ12 correspond to the correlation between gestational ages of PR preterm births in both pregnancies, PU preterm births in both pregnancies and one type each respectively.

Models (1)–(5) together determine the joint distribution of pregnancy outcome types and gestational ages of preterm births of all the repeated pregnancies in each woman. They can be used to make inference on a variety of quantities of interest. Two such quantities are

  1. the cumulative incidence of PR (or PU) preterm birth of the first pregnancy recorded in the study for a set of baseline covariates and

  2. the cumulative incidence of PR (or PU) preterm birth of a pregnancy given the previous adverse outcome, gestational age and covariates.

For brevity of notation, the subscript indexing subject is omitted in the formulae displaying these two quantities. The first quantity is formulated as $p(T_1 \le t, Y_1 = y \mid Z_1^Y, Z_1^T)$, which equals $F\{t - E(T_1 \mid Z_1^T, y)\}\, p(Y_1 = y \mid Z_1^Y)$, where $p(Y_1 = y \mid Z_1^Y) = \int p(Y_1 = y \mid Z_1^Y, b)\, g(b)\, \mathrm{d}b$.

The second quantity for the second pregnancy can be formulated as $p(T_2 \le t, Y_2 = y_2 \mid a < T_1 \le b, Y_1 = y_1, Z_1^T, Z_1^Y, Z_2^T, Z_2^Y)$, $a < b < 37$ weeks, which equals

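$$\begin{aligned}
&\frac{p(a < T_1 \le b, T_2 \le t \mid Y_1 = y_1, Y_2 = y_2, Z_1^T, Z_2^T)\; p(Y_1 = y_1, Y_2 = y_2 \mid Z_1^Y, Z_2^Y)}{p(a < T_1 \le b \mid Y_1 = y_1, Z_1^T)\; p(Y_1 = y_1 \mid Z_1^Y)} \\
&\quad = \frac{\{p(T_1 \le b, T_2 \le t \mid Y_1 = y_1, Y_2 = y_2, Z_1^T, Z_2^T) - p(T_1 \le a, T_2 \le t \mid Y_1 = y_1, Y_2 = y_2, Z_1^T, Z_2^T)\}\; p_{12}^{y_1, y_2}(Z_1^Y, Z_2^Y)}{\{p(T_1 \le b \mid Y_1 = y_1, Z_1^T) - p(T_1 \le a \mid Y_1 = y_1, Z_1^T)\}\; p_1^{y_1}(Z_1^Y)},
\end{aligned}$$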

where

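$$p(T_1 \le t_1, T_2 \le t_2 \mid Y_1 = y_1, Y_2 = y_2, Z_1^T, Z_2^T) = \Phi\big(\Phi^{-1}[F\{t_1 - E(T_1 \mid Z_1^T, y_1)\}],\; \Phi^{-1}[F\{t_2 - E(T_2 \mid Z_2^T, y_2)\}]\big),$$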

$p(T_1 \le t \mid Y_1 = y_1, Z_1^T) = F\{t - E(T_1 \mid Z_1^T, y_1)\}$, $\Phi(\cdot, \cdot)$ is the bivariate normal cumulative distribution function, and $p_1^{y_1}(Z_1^Y)$ and $p_{12}^{y_1, y_2}(Z_1^Y, Z_2^Y)$ are the probability of $Y_1 = y_1$ and the joint probability of $Y_1 = y_1$ and $Y_2 = y_2$ given the corresponding covariates. Even though $T_1$ would be known when it is used in predicting the cumulative incidence of $T_2$, we use a range for gestational age to allow flexibility in relating a range of $T_1$ to $T_2$. This is important for our scientific problem. From a population perspective, it is of interest to the obstetrics community to estimate the cumulative incidence of PR preterm birth in a second pregnancy for women who had a very early preterm birth in the first pregnancy (34 weeks or earlier). If one is interested in relating a specific gestational age of the first pregnancy to that of the second pregnancy (i.e. individual prediction), the value of $a$ can be chosen such that its distance to $b$ is infinitesimal. In the joint model, outcomes of all the repeated pregnancies in each woman are included, and thus, for those with two or more previous pregnancies, we could estimate the cumulative incidence of the preterm events by using the entire pregnancy history and not just the last pregnancy. For example, for a woman who had two preterm births, it can be useful to predict her risk of PR (or PU) preterm birth in her third pregnancy.
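
As a rough illustration of how the conditional cumulative incidence above could be evaluated once its ingredients are available, the Python sketch below plugs marginal probabilities into a bivariate normal copula. It is a sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the inputs (values of $F$, the copula correlation and the outcome probabilities) are assumed to have been computed elsewhere and to lie strictly between 0 and 1, and the numbers in the usage lines are made up.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def bivariate_copula_cdf(u1, u2, rho):
    """P(T1 <= t1, T2 <= t2 | types) under a normal copula, where u1 and u2
    are the marginal probabilities F{t - E(T | Z, y)} (strictly in (0, 1))."""
    z = np.array([norm.ppf(u1), norm.ppf(u2)])
    cov = [[1.0, rho], [rho, 1.0]]
    return float(multivariate_normal.cdf(z, mean=[0.0, 0.0], cov=cov))

def conditional_cumulative_incidence(F_t2, F_a, F_b, rho, p_joint, p_first):
    """P(T2 <= t, Y2 = y2 | a < T1 <= b, Y1 = y1, covariates).

    F_t2     : marginal probability that T2 <= t given type y2
    F_a, F_b : marginal probabilities that T1 <= a and T1 <= b given type y1
    rho      : copula correlation for the pair of types (y1, y2)
    p_joint  : P(Y1 = y1, Y2 = y2 | covariates)
    p_first  : P(Y1 = y1 | covariates)
    """
    numerator = (bivariate_copula_cdf(F_b, F_t2, rho)
                 - bivariate_copula_cdf(F_a, F_t2, rho)) * p_joint
    denominator = (F_b - F_a) * p_first
    return numerator / denominator

# Made-up inputs, for illustration only.
ci_independent = conditional_cumulative_incidence(0.3, 0.05, 0.3, 0.0, 0.02, 0.1)
ci_correlated = conditional_cumulative_incidence(0.3, 0.05, 0.3, 0.5, 0.02, 0.1)
print(ci_independent, ci_correlated)
```

With zero copula correlation the expression reduces to the marginal probability for $T_2$ times $p_{12}^{y_1,y_2}/p_1^{y_1}$; a positive correlation raises the incidence when the first preterm birth was early.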

3.2. Estimation

In this section, we present an estimation procedure to estimate the parameters in models (1)–(4). We start with the estimation of parameters Ω= (α, σ1, σ2, ν) in the polychotomous logistic model with random effects for the repeated adverse pregnancy outcomes by maximizing its likelihood function given by

$$L(\Omega) = \prod_i \int \prod_j \prod_{c=0}^{2} p_{ijc}^{I(y_{ij} = c)}\, g(b)\, \mathrm{d}b. \tag{6}$$

Numerical integration such as Gauss–Hermite quadrature can be used to perform the integration over the bivariate normal distribution of the correlated random effects specified in model (1). Alternatively the EM algorithm may be used to compute the maximum likelihood estimate of Ω. On the basis of the standard asymptotic properties of maximum likelihood estimators, under the correct specification of the random-effects polychotomous logistic model, the maximum likelihood estimator Ω̂ is consistent and asymptotically normal with mean equal to the true parameter values Ω and variance matrix Σ, which can be consistently estimated by the inverse of the observed information matrix.
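
The quadrature step can be sketched as follows for a single woman's contribution to likelihood (6). This is a minimal Python illustration assuming a product Gauss–Hermite rule and a Cholesky factor to correlate the two random intercepts; the function names and the array layout are hypothetical, and the parameter values in the usage lines are borrowed from Table 3 purely as plausible numbers, with covariates ignored.

```python
import numpy as np

def outcome_probs(eta):
    """Model (1): probabilities of term (0), PR (1) and PU (2) birth from an
    array eta of shape (k, 2) holding the linear predictors for c = 1, 2."""
    e = np.exp(eta)
    denom = 1.0 + e.sum(axis=1, keepdims=True)
    return np.hstack([1.0 / denom, e / denom])      # shape (k, 3)

def marginal_likelihood(y, fixed_eta, sigma1, sigma2, nu, n_nodes=10):
    """One woman's integral in (6), approximated by a product Gauss-Hermite rule.

    y         : outcomes coded 0, 1, 2 for her k pregnancies
    fixed_eta : (k, 2) array of alpha_0c + alpha_c Z_ij^Y for c = 1, 2
    """
    x, w = np.polynomial.hermite.hermgauss(n_nodes)  # nodes for weight exp(-x^2)
    cov = np.array([[sigma1 ** 2, nu * sigma1 * sigma2],
                    [nu * sigma1 * sigma2, sigma2 ** 2]])
    chol = np.linalg.cholesky(cov)
    y = np.asarray(y)
    rows = np.arange(len(y))
    total = 0.0
    for xa, wa in zip(x, w):
        for xb, wb in zip(x, w):
            # sqrt(2) * node is a standard normal abscissa; chol correlates it.
            b = chol @ (np.sqrt(2.0) * np.array([xa, xb]))
            probs = outcome_probs(fixed_eta + b)     # b is shared by all pregnancies
            total += wa * wb * np.prod(probs[rows, y])   # conditional product (2)
    return total / np.pi   # each dimension contributes a factor 1 / sqrt(pi)

# Illustration: two pregnancies, intercepts only (assumed values).
fixed_eta = np.array([[-7.871, -2.205], [-7.871, -2.205]])
lik_two_pu = marginal_likelihood([2, 2], fixed_eta, 1.801, 0.882, -0.087)
total_prob = sum(marginal_likelihood([a, b], fixed_eta, 1.801, 0.882, -0.087)
                 for a in range(3) for b in range(3))
print(lik_two_pu, total_prob)
```

Summing the approximation over all nine outcome patterns for two pregnancies returns 1 up to rounding, which is a convenient sanity check on the weights and scaling.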

Parameters in models (3) and (4) for gestational age of preterm birth include $\gamma = (\beta_0, \beta, \theta)$, $\rho$ and the infinite dimensional $F$. Joint estimation of $(\gamma, \rho)$ and $F$ is complex. Instead, we extend the work of Klaassen and Wellner (1997) by estimating $\gamma$, $\rho$ and $F$ in different stages. First, we used generalized estimating equations under working independence to estimate $\gamma$. Since the gestational ages that are used to fit model (3) are all inside the range of preterm birth (i.e. within 37 weeks), if the model is correctly specified, it is unlikely that the estimated mean gestational age would be outside the range. Hence it is important to check that this condition is satisfied in the estimated model. In the NICHD consecutive pregnancies study data analysis, this scenario (an estimated mean outside the range) did not occur. We then estimated the distribution function $F$ of $\varepsilon$ by the empirical distribution of the estimated residuals, $\hat\varepsilon_{ij} = T_{ij} - \hat\beta_0 - \hat\beta Z_{ij}^T - \hat\theta I(Y_{ij} = 1)$, given by

$$\hat F(u) = \sum_{i: l_i > 0} \sum_{j} I(\hat\varepsilon_{ij} \le u) / N,$$

where $\hat\gamma = (\hat\beta_0, \hat\beta, \hat\theta)$ are the generalized estimating equation estimators, and $N = \sum_i l_i$ is the total number of preterm births. Finally the correlation coefficient $\rho$ was estimated by the normal scores rank correlation. Let $\tilde F = \{N/(N + 1)\}\hat F$ denote the rescaled empirical distribution function. The normal scores rank correlation coefficient $\hat\rho$ is given by

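$$\hat\rho = \frac{\{1/(N_2 - q - 1)\} \sum_{i: l_i > 1} \sum_{j < k} \Phi^{-1}\{\tilde F(\hat\varepsilon_{ij})\}\, \Phi^{-1}\{\tilde F(\hat\varepsilon_{ik})\}}{\{1/(N - q - 1)\} \sum_{i: l_i > 0} \sum_{j=1}^{l_i} \big[\Phi^{-1}\{\tilde F(\hat\varepsilon_{ij})\}\big]^2}, \tag{7}$$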

where $N_2 = \sum_{i: l_i > 0} l_i(l_i - 1)/2$ and $q = p_2 + 2$.
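
Estimator (7) can be sketched in a few lines of Python. The function below is an illustrative reading of the formula rather than the authors' program: its name and the list-of-arrays input format are assumptions, and the toy residuals are simulated from a normal copula with an arbitrary skewed marginal, not taken from the study.

```python
import numpy as np
from scipy.stats import norm, rankdata

def normal_scores_rank_correlation(residuals_by_woman, q):
    """Normal scores rank correlation in the spirit of equation (7).

    residuals_by_woman : list of 1-D arrays, one per woman with at least one
                         preterm birth, holding her estimated residuals
    q                  : number of estimated mean parameters (p2 + 2 in the text)
    """
    pooled = np.concatenate(residuals_by_woman)
    N = pooled.size
    # Rescaled empirical distribution: {N/(N+1)} * F_hat(e_ij) = rank / (N + 1).
    scores = norm.ppf(rankdata(pooled, method="max") / (N + 1.0))

    # Split the pooled scores back into per-woman groups.
    sizes = [len(r) for r in residuals_by_woman]
    groups = np.split(scores, np.cumsum(sizes)[:-1])

    cross, n_pairs = 0.0, 0
    for g in groups:
        l = len(g)
        if l > 1:
            # sum over j < k of g_j * g_k = {(sum g)^2 - sum g^2} / 2
            cross += (g.sum() ** 2 - (g ** 2).sum()) / 2.0
            n_pairs += l * (l - 1) // 2

    numerator = cross / (n_pairs - q - 1)
    denominator = (scores ** 2).sum() / (N - q - 1)
    return numerator / denominator

# Toy check on simulated residuals (assumed values, not study data).
rng = np.random.default_rng(1)
true_rho = 0.4
z = rng.multivariate_normal([0.0, 0.0], [[1.0, true_rho], [true_rho, 1.0]], size=5000)
pairs = [np.log(norm.cdf(row)) for row in z]                  # two preterm births
singles = [np.log(norm.cdf(rng.standard_normal(1))) for _ in range(2000)]
rho_hat = normal_scores_rank_correlation(pairs + singles, q=3)
print(round(float(rho_hat), 2))
```

Women with a single preterm birth contribute to the pooled ranking and to the denominator but not to the cross-products, mirroring the two index sets in (7).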

On the basis of the generalized estimating equation asymptotic theory, under the correct specification of the mean function for the gestational age of preterm birth, $\hat\gamma$ is consistent for the true parameter values $\gamma$ and, as $n_1 \to \infty$, $\sqrt{n_1}(\hat\gamma - \gamma)$ converges to a zero-mean normal distribution with sandwich-type variance–covariance matrix, where $n_1$ is the number of women with at least one preterm birth. The consistency and asymptotic normality of $\hat F$ have been established (see Houseman et al. (2004) and references therein). The asymptotic normality of the semiparametric estimator $\hat\rho$ is presented in the following theorem. The proof follows that of Klaassen and Wellner (1997) for bivariate data without covariates and is sketched in Appendix A.

Theorem 1

Under regularity conditions and given that $(\hat\gamma, \hat F)$ are consistent and asymptotically normal for $(\gamma, F)$ and the normal copula model (4) holds for $\varepsilon_{ij}$, the estimator $\hat\rho$ of $\rho$ is consistent and, as $n_2 \to \infty$, $\sqrt{n_2}(\hat\rho - \rho)$ converges weakly to a zero-mean normal distribution with variance $\tau^2$, where $n_2$ is the number of subjects with more than one preterm birth.

A derivation for $\tau^2$ is supplied in Appendix A. The variance $\tau^2$ is a function of $(\beta, \theta, F)$ and the density $f$ of $F$. An estimator of $\tau^2$ can be obtained by inserting $(\hat\beta, \hat\theta, \hat F)$ for $(\beta, \theta, F)$ and a non-parametric estimate, e.g. a kernel density estimate, for $f$.

The above estimator of ρ can be adjusted to estimate the three types of correlation coefficient in model (5). Specifically,

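$$\hat\rho_{lm} = \frac{\{1/(N_{lm} - q - 1)\} \sum_{i: l_i > 1} \sum_{(y_{ij}, y_{ik}) = (l, m) \vee (m, l)} \Phi^{-1}\{\tilde F(\hat\varepsilon_{ij})\}\, \Phi^{-1}\{\tilde F(\hat\varepsilon_{ik})\}}{\{1/(N - q - 1)\} \sum_{i: l_i > 0} \sum_{j=1}^{l_i} \big[\Phi^{-1}\{\tilde F(\hat\varepsilon_{ij})\}\big]^2}, \tag{8}$$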

where $N_{lm} = \sum_{i: l_i > 1} \sum_{(y_{ij}, y_{ik}) = (l, m) \vee (m, l)} 1$ is the number of pairs with the adverse pregnancy outcomes equal to $(l, m)$ or $(m, l)$. The asymptotic property of estimator (8) follows from theorem 1.

4. Analysis of National Institute of Child Health and Human Development consecutive pregnancies study data

We began by fitting the random-effects polychotomous logistic model, where the covariates included history of preterm birth, parity (greater than 0 versus equal to 0), number of fetuses (greater than 1 versus equal to 1), body mass index (BMI) at the beginning of each pregnancy, maternal age, chronic hypertension (yes versus no) and smoking (yes versus no). Of these covariates, chronic hypertension and smoking were subject specific and collected at the beginning of the study, and the rest of the covariates were pregnancy specific and collected at each pregnancy. We treated smoking as a subject-specific covariate assessed at entry into the cohort, because we were interested in long-term smoking status rather than short-term status for women who suddenly stopped smoking, possibly because of an abnormal prior pregnancy. In addition to these covariates, some interaction terms were also considered in the model. In particular, a woman with a history of preterm birth must have positive parity, and hence the effect of history of preterm birth must be estimated through the interaction term of history of preterm birth and parity status (greater than 0 versus 0). In addition, the covariate effects on the pregnancy outcomes of single-fetus pregnancies (singleton) may be different from those of multiple-fetus pregnancies (twins, triplets, etc.). It is not feasible to do a stratified analysis on singleton versus multiple births, since each woman can potentially have both singleton and multiple births in her repeated pregnancies. To deal with this issue, we added the interaction terms between all other covariates and number of fetuses (greater than 1) to the model. The estimates were obtained by maximizing expression (6), where the double integrals were approximated by the product of 10-point one-dimensional Gauss–Hermite quadrature rules.
With the exception of the interactions between a history of preterm birth and parity and between BMI and number of fetuses, no other interaction terms were significant, and the parameter estimates in the final model are listed in Table 3. For PR preterm birth, with the exception of maternal age, all the other covariates were significant (p < 0.05). Among them, number of fetuses (greater than 1 versus equal to 1) had the largest effect on the probability of having PR preterm birth. For a woman with BMI = 24 kg m−2, the odds of a PR preterm birth compared with a term birth were 51 times as high (exp(−0.086 × 24 + 5.996)) with two or more fetuses as with a single fetus. Increasing BMI, chronic hypertension, being a smoker and history of preterm birth all increased the likelihood of having a PR preterm birth, whereas parity was negatively associated with PR preterm birth: the odds of a PR preterm birth for parous women (parity > 0) were 25% (exp(−1.395)) of those for nulliparous women (parity = 0). For PU preterm birth, all the covariates that are listed in Table 3 were significant except the interaction between number of fetuses and BMI. Number of fetuses was also the strongest predictor. Although chronic hypertension was also a significant predictor of PU preterm birth, the odds ratio was 56% (exp(0.79)/exp(1.37) = 0.56) of that for PR preterm birth. Parity was also negatively associated with PU preterm birth, but its effect on PU preterm birth was much smaller than on PR preterm birth. The standard deviations of the random intercepts for both PR and PU preterm birth were large, implying high correlations of developing the same type of preterm birth in repeated pregnancies. In contrast, the correlation of the two random intercepts was negative and nearly 0, implying that the correlation of these two types of preterm birth was negligible.

Table 3.

Parameter estimates of model (1)

| Covariate | PR preterm births: estimate | PR preterm births: standard error | PR preterm births: 95% confidence interval | PU births: estimate | PU births: standard error | PU births: 95% confidence interval |
|---|---|---|---|---|---|---|
| intercept | −7.871 | 0.311 | (−8.481, −7.26) | −2.205 | 0.098 | (−2.397, −2.014) |
| mothers’ age | 0.004 | 0.009 | (−0.013, 0.021) | −0.027 | 0.003 | (−0.033, −0.021) |
| BMI | 0.083 | 0.006 | (0.072, 0.095) | −0.008 | 0.002 | (−0.013, −0.003) |
| parity (> 0) | −1.395 | 0.088 | (−1.568, −1.222) | −0.139 | 0.034 | (−0.205, −0.073) |
| number of fetuses (> 1) | 5.996 | 0.682 | (4.658, 7.333) | 3.813 | 0.317 | (3.192, 4.434) |
| chronic hypertension (yes) | 1.37 | 0.194 | (0.989, 1.75) | 0.79 | 0.107 | (0.58, 0.999) |
| smoker (yes) | 0.664 | 0.185 | (0.303, 1.026) | 0.928 | 0.061 | (0.809, 1.048) |
| parity × history of preterm birth | 1.578 | 0.114 | (1.354, 1.802) | 1.555 | 0.042 | (1.474, 1.637) |
| number of fetuses × BMI | −0.086 | 0.026 | (−0.137, −0.036) | −0.004 | 0.012 | (−0.028, 0.02) |
| σ1 | 1.801 | 0.096 | (1.613, 1.988) | | | |
| σ2 | 0.882 | 0.042 | (0.8, 0.965) | | | |
| ν | −0.087 | 0.088 | (−0.259, 0.085) | | | |
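
The odds ratios quoted in the text follow directly from the Table 3 coefficients. The Python lines below simply redo that arithmetic; the variable names are chosen for this example, and the ratios are conditional on the random intercepts.

```python
import math

# Coefficients copied from Table 3.
beta_fetuses_pr = 5.996        # number of fetuses (> 1), PR preterm birth
beta_fetuses_bmi_pr = -0.086   # number of fetuses x BMI, PR preterm birth
beta_parity_pr = -1.395        # parity (> 0), PR preterm birth
beta_hypertension_pr = 1.37    # chronic hypertension, PR preterm birth
beta_hypertension_pu = 0.79    # chronic hypertension, PU preterm birth

bmi = 24.0  # kg m^-2, the value used in the text

# Odds of PR preterm birth, multiple versus single fetus, at the given BMI.
or_multiple_vs_single = math.exp(beta_fetuses_pr + beta_fetuses_bmi_pr * bmi)
# Odds of PR preterm birth, parous versus nulliparous.
or_parous_vs_nulliparous = math.exp(beta_parity_pr)
# Chronic hypertension odds ratio for PU relative to that for PR.
ratio_hypertension_pu_to_pr = math.exp(beta_hypertension_pu) / math.exp(beta_hypertension_pr)

print(round(or_multiple_vs_single, 1),
      round(or_parous_vs_nulliparous, 2),
      round(ratio_hypertension_pu_to_pr, 2))
```

The three values come out at about 51, 0.25 and 0.56, in agreement with the figures given in the text.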

In the next step, we estimated models (3) and (5) for repeated gestational ages of preterm births. Model (3) was estimated by generalized estimating equations under a working independence assumption. Maternal age and BMI were not significant and were excluded from model (3). The interaction between history of preterm birth and number of fetuses was negatively associated with gestational age. The estimates are listed in Table 4. If a woman had a history of preterm birth and had more than one fetus, the mean gestational age was shortened by almost 2 weeks compared with not having these conditions. The gestational age for PU preterm birth on average was 0.35 weeks longer than for PR preterm birth. The normal scores rank correlation equalled (0.47, 0.18, 0.07) for gestational ages of PR preterm birth in both pregnancies, PU preterm birth in both pregnancies and one of each type respectively, confirming our initial observation (Fig. 1(b)). This indicates that the correlation between gestational ages of PR preterm births is much stronger than that between gestational ages of PU preterm births or of different types of preterm birth.

Table 4.

Parameter estimates of model (5)

Estimate Standard error 95% confidence interval
intercept 34.06 0.105 (33.85, 34.27)
PU birth (yes) 0.348 0.100 (0.152, 0.544)
parity (> 0) 0.682 0.086 (0.413, 0.751)
number of fetuses (> 1) −0.374 0.110 (−0.590, −0.158)
smoker (yes) −0.542 0.134 (−0.805, −0.279)
parity × history of preterm birth (yes) −0.189 0.075 (−0.336, −0.042)
parity × history of preterm birth (yes) × number of fetuses −1.330 0.361 (−2.038, −0.622)
ρ11 0.469 0.179 (0.118, 0.820)
ρ22 0.182 0.032 (0.119, 0.245)
ρ12 0.071 0.205 (−0.331, 0.473)
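
The statement that the mean gestational age is shortened by almost 2 weeks can be checked by adding the relevant Table 4 coefficients. The contrast below is a worked illustration under one stated assumption: it compares a parous woman with a history of preterm birth and more than one fetus against a parous woman with neither condition, so that the parity main effect cancels.

```latex
\underbrace{-0.374}_{\text{number of fetuses}}
+ \underbrace{(-0.189)}_{\text{parity} \times \text{history}}
+ \underbrace{(-1.330)}_{\text{parity} \times \text{history} \times \text{number of fetuses}}
= -1.893 \ \text{weeks}
```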

Individualized estimation of the cumulative incidence function is one of the primary interests. The cumulative incidence of PR preterm birth with respect to parity and history of preterm birth is plotted in Fig. 3(a), where the other covariates that were used to compute the cumulative incidence were set at their median values (maternal age = 27.1 years, BMI = 24.8 kg m−2, number of fetuses = 1, chronic hypertension = no, smoker = no). Fig. 3(a) shows that the cumulative incidence is highest for nulliparous women and lowest for parous women with no history of preterm birth. The risk of PR preterm birth for women with a history of preterm birth was almost identical to that for nulliparous women. The cumulative incidence of PU preterm birth for the same set of covariate values is plotted in Fig. 3(b). It is highest for women with a history of preterm birth and lowest for parous women with no history of preterm birth.
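
As a reading aid, the cumulative incidence plotted here can be written in terms of the two components of the joint model: the probability of each type of preterm birth and the distribution of gestational age given that type. The decomposition below is the standard identity for a type-specific cumulative incidence. The symbols F_k, π_k, G_k and Σ are notation introduced here for illustration and are not taken from the paper's numbered equations, which are not reproduced in this section.

```latex
F_k(t \mid x) = \Pr(T \le t,\; Y = k \mid x) = \pi_k(x)\, G_k(t \mid x),
\qquad k \in \{\mathrm{PR}, \mathrm{PU}\},

\pi_k(x) = \int \Pr(Y = k \mid x, b)\, \phi_2(b;\, 0, \Sigma)\, \mathrm{d}b,
\qquad
G_k(t \mid x) = \Pr(T \le t \mid Y = k,\; x).
```

Here b denotes the pair of random intercepts and Σ their covariance matrix (built from σ1, σ2 and ν), so that π_k(x) averages the type probability over the random-effects distribution.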

Fig. 3.

Individualized cumulative incidence curve and 95% pointwise confidence interval of (a) PR and (b) PU preterm birth (——, nulliparous woman; – – –, parous woman with no history of preterm birth; ⋯⋯, parous woman with history of preterm birth)

The cumulative incidence of any recurrent preterm (PR and PU) birth for parous women with history of preterm birth at the previous pregnancy is plotted in Fig. 4, where the full curve is the marginal cumulative incidence of PR (or PU) preterm birth for women with a history of preterm birth. In Fig. 4, the broken and dotted curves correspond to the conditional cumulative incidence given the PR (or PU) preterm birth before 32 weeks and between 32 and less than 37 weeks of gestational age at the first pregnancy respectively. The covariate values that were used to compute the conditional cumulative incidence were set to the same values as before except maternal age = 29 years. A few patterns are commonly observed in the four parts of Fig. 4. First, the conditional cumulative incidence is higher than the marginal cumulative incidence if the type of recurrent preterm birth is the same as that in the previous pregnancy. If the preterm births of the two pregnancies are of different types (Figs 4(b) and 4(c)), the conditional cumulative incidence is slightly lower than the marginal counterpart. This is due to the negative correlation coefficient estimate of the random intercepts. However, because the correlation coefficient estimate is not significant, the 95% confidence intervals of these cumulative incidences overlap. Second, the risk of recurrent PR (or PU) preterm birth is higher if the gestational age of the previous preterm birth is less than 32 weeks than if the gestational age is between 32 and less than 37 weeks. The 95% pointwise confidence intervals that are plotted in Figs 3 and 4 were obtained from the 2.5- and 97.5-percentiles of 250 bootstrap samples, where the sampling unit is the woman participant.
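
The percentile bootstrap with the woman as the sampling unit can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `cluster_bootstrap_ci` and the toy data are hypothetical, and the statistic here is a simple proportion, whereas in the analysis each bootstrap sample would require refitting the joint model and recomputing the cumulative incidence.

```python
import numpy as np

def cluster_bootstrap_ci(values_by_woman, statistic, n_boot=250, level=0.95, seed=0):
    """Percentile bootstrap interval that resamples whole women (clusters)."""
    rng = np.random.default_rng(seed)
    n_women = len(values_by_woman)
    boot_stats = np.empty(n_boot)
    for b in range(n_boot):
        # Draw women with replacement; all pregnancies of a drawn woman stay together.
        idx = rng.integers(0, n_women, size=n_women)
        sample = np.concatenate([values_by_woman[i] for i in idx])
        boot_stats[b] = statistic(sample)
    alpha = 1.0 - level
    lower, upper = np.percentile(boot_stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lower, upper

# Toy data (hypothetical): 500 women, 2 to 4 pregnancies each, 0/1 preterm indicator.
toy_rng = np.random.default_rng(1)
toy = [toy_rng.binomial(1, 0.1, size=toy_rng.integers(2, 5)).astype(float)
       for _ in range(500)]
ci_low, ci_high = cluster_bootstrap_ci(toy, np.mean)
```

Resampling women rather than individual pregnancies keeps the within-woman correlation intact in every bootstrap sample, which is why the sampling unit matters.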

Fig. 4.

Individualized cumulative incidence curve and 95% confidence interval of any recurrent preterm birth (——, marginal cumulative incidence; – – –, conditional cumulative incidence given the PR or PU preterm birth before 32 weeks of gestational age at the previous pregnancy; ⋯⋯, conditional cumulative incidence given the PU or PR preterm birth between 32 and less than 37 weeks of gestational age at the previous pregnancy): (a), (b) PR preterm birth; (c), (d) PU preterm birth

We checked the modelling assumptions by comparing the observed types and timing of preterm birth with their predicted counterparts. In model (1), after dichotomizing the continuous covariates age and BMI at their respective medians, the data set was grouped according to the unique combination of the eight binary covariates. In each group, the mean observed proportion of PR (or PU) preterm birth was calculated. Since the incidence of preterm birth is low, the observed mean proportions for groups with fewer than 200 observations were highly variable and were excluded from the comparison. The observed mean versus predicted proportions of PR and PU preterm birth are displayed in Figs 5(a) and 5(b) respectively. Overall, there is high agreement between the observed and predicted proportions, indicating that model (1) fits the data well. We also compared the number of women having (PR, PR), (PR, PU), (PU, PR) and (PU, PU) in their first two pregnancies against the predicted counterparts. The discrepancy is within 10%, indicating that model (1) describes the dependence structure in outcome types across pregnancies well. For model (5), all the covariates are binary. We repeated the same procedure described above and plotted the observed mean gestational age against the predicted gestational age of preterm birth (Fig. 5(c)), where groups with fewer than 15 observations were excluded. The scatter plot fluctuating around the 45° line indicates that model (5) fits the distribution of gestational age of preterm birth well. Since the correlation between gestational ages of repeated preterm births was modelled through a semiparametric normal copula model, and the transformed normal rank scores by definition are multivariate normal, model checking is not necessary for the correlation.
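
The grouped comparison described above can be sketched in a few lines. This is an illustrative version under stated assumptions: the function name `grouped_calibration`, the three toy covariates and the toy coefficients are all hypothetical, and the "predicted" values in the toy example are the true probabilities rather than fitted values from model (1).

```python
import numpy as np
from collections import defaultdict

def grouped_calibration(x_binary, observed, predicted, min_size=200):
    """Mean observed versus mean predicted outcome within each covariate pattern."""
    groups = defaultdict(list)
    for row_index, pattern in enumerate(map(tuple, x_binary)):
        groups[pattern].append(row_index)
    rows = []
    for pattern, members in groups.items():
        if len(members) < min_size:
            continue  # small groups give highly variable proportions, so skip them
        members = np.asarray(members)
        rows.append((pattern, len(members),
                     observed[members].mean(), predicted[members].mean()))
    return rows

# Toy example (hypothetical): 3 binary covariates, outcome drawn from known probabilities.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(20000, 3))
p_true = 1.0 / (1.0 + np.exp(-(-2.5 + 0.8 * X[:, 0] + 0.5 * X[:, 1] - 0.3 * X[:, 2])))
y = rng.binomial(1, p_true).astype(float)
table = grouped_calibration(X, y, p_true, min_size=200)
```

Plotting the third column of `table` against the fourth gives the kind of observed-versus-predicted scatter shown in Fig. 5; points near the 45° line indicate agreement.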

Fig. 5.

Mean observed versus predicted type and timing of preterm birth (for (a) and (b): •, 5000 pregnancies; ⚫, 10000 pregnancies; ●, 15000 pregnancies; ⬤, 20000 pregnancies): (a) mean observed versus predicted proportion of PR preterm birth; (b) mean observed versus predicted proportion of PU preterm birth; (c) mean observed versus predicted gestational age of preterm birth (•, 1000 preterm births; ⚫, 2000 preterm births; ●, 3000 preterm births)

5. Simulation study

We conducted a simulation study to evaluate the performance of the estimation procedure that was presented in Section 3. The parameter settings in the simulation study were chosen to mimic the NICHD study characteristics, including the number of pregnancies, the low baseline rates of PR and PU preterm birth and higher correlation between repeated PR preterm births than repeated PU preterm births. We first generated the correlated event types Yi = (Yi1, …, Yiki) from the polychotomous logistic regression with random-effects model (1), where the number of repeated pregnancies ki was random, ranging from 2 to 6 with probabilities (0.7, 0.15, 0.1, 0.045, 0.005), the fixed effect intercepts were (α01, α02) = (−3.81, −2.42), corresponding to baseline event rates of 0.02 and 0.08 for PR and PU preterm birth respectively, the slopes for one fixed pregnancy-specific binary covariate Xij were (α1, α2) = (−1, −0.5) and the random intercepts (bi1, bi2) followed a bivariate zero-mean normal distribution with standard deviations σ1 = 2 and σ2 = 1.5, and correlation coefficient ν = 0.5. Gestational age was assumed to follow a normal distribution with mean equal to μij = β0 + β1Xij + θI(Yij = 2) and standard deviation SD = 4. Correspondingly, the gestational age of preterm birth was conditionally normal given Tij ≤ 37, and the correlation of the gestational ages of preterm birth from the same woman was specified through the multivariate normal copula function. Specifically, Tij | Tij ≤ 37, or equivalently Tij | Yij > 0, was generated by Φ−1[uij Φ{(37 − μij)/SD}]SD + μij, where uij = Φ(zij) and the zijs are multivariate standard normal with common correlation coefficient ρ = 0.3. Since the mean of the conditional normal distribution of the gestational age of preterm birth was no longer equal to μij, a saturated linear model was fitted with the mean function μij = β0 + β1Xij + θI(Yij = 2) + γXijI(Yij = 2). The sample size was set at 20 000 and the simulations were repeated 500 times.
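
The data-generating steps described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' simulation code: the function names are hypothetical, the values of `beta0`, `beta1` and `theta` are placeholders because the generating values are not stated in this section, the covariate Xij is drawn as Bernoulli(0.5), and uij is taken to be Φ of exchangeable standard normal scores, which is what the inverse-distribution formula requires.

```python
import numpy as np
from scipy.stats import norm

def baseline_rates(alpha0):
    """Type probabilities at X = 0 and zero random intercepts (reference: term birth)."""
    e = np.exp(np.asarray(alpha0, dtype=float))
    return e / (1.0 + e.sum())

def simulate_women(n_women, seed=0, alpha0=(-3.81, -2.42), alpha=(-1.0, -0.5),
                   sigma=(2.0, 1.5), nu=0.5, rho=0.3, sd=4.0,
                   beta0=39.0, beta1=0.5, theta=-0.5):  # beta0, beta1, theta: hypothetical
    rng = np.random.default_rng(seed)
    cov_b = np.array([[sigma[0] ** 2, nu * sigma[0] * sigma[1]],
                      [nu * sigma[0] * sigma[1], sigma[1] ** 2]])
    records = []
    for i in range(n_women):
        k = rng.choice([2, 3, 4, 5, 6], p=[0.7, 0.15, 0.1, 0.045, 0.005])
        b = rng.multivariate_normal([0.0, 0.0], cov_b)   # correlated random intercepts
        x = rng.integers(0, 2, size=k)                   # assumed Bernoulli(0.5) covariate
        # Polychotomous logit: column 0 is term birth, 1 is PR, 2 is PU.
        eta = np.column_stack([np.zeros(k),
                               alpha0[0] + alpha[0] * x + b[0],
                               alpha0[1] + alpha[1] * x + b[1]])
        prob = np.exp(eta)
        prob /= prob.sum(axis=1, keepdims=True)
        y = np.array([rng.choice(3, p=p) for p in prob])
        # Exchangeable normal scores with correlation rho, mapped to uniforms.
        z = np.sqrt(rho) * rng.standard_normal() + np.sqrt(1 - rho) * rng.standard_normal(k)
        u = norm.cdf(z)
        mu = beta0 + beta1 * x + theta * (y == 2)
        # Inverse-distribution draw from the normal truncated above at 37 weeks.
        t = mu + sd * norm.ppf(u * norm.cdf((37.0 - mu) / sd))
        for j in range(k):
            records.append((i, j, x[j], y[j], t[j] if y[j] > 0 else np.nan))
    return np.array(records, dtype=float)  # columns: woman, pregnancy, X, Y, T

rates = baseline_rates((-3.81, -2.42))
data = simulate_women(2000, seed=1)
```

The helper `baseline_rates` reproduces the link between the intercepts and the stated baseline event rates; gestational age is recorded only for preterm births and left missing otherwise.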

Owing to the low preterm birth rate, one of the standard deviations of the random intercepts converged to the boundary value of 0 in 7% of the simulations. This problem is alleviated when the preterm birth rate is set at a higher value (the data are not shown). The simulation results with positive standard deviation of the random intercept are summarized in Table 5. As the Monte Carlo means of all the parameter estimates were close to the true values, there was little bias in the estimation. The Monte Carlo standard errors are close to the square root of the Monte Carlo means of variances which were obtained by inverting the numerically approximated Hessian matrix, and the coverage probabilities for most of the parameters were close to the 95% nominal level. The correlation coefficient ρ in the normal copula model was estimated by the normal score rank correlation coefficient given in equation (7). If the correlation coefficients were estimated by using the untransformed residuals ε̂ij, the estimator would have negative bias with the Monte Carlo mean equal to 0.28.
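
The idea behind the normal score rank correlation can be illustrated with a simplified sketch. Equation (7) is not reproduced in this section, so the version below is an assumption-laden stand-in: it handles pairs of residuals only, pools both members of each pair to estimate a common marginal distribution by ranks, and omits the degrees-of-freedom correction in the denominator. The function name and the toy data are hypothetical, and the toy margins are more skewed than in the simulation study.

```python
import numpy as np
from scipy.stats import norm, rankdata

def normal_scores_correlation(e1, e2):
    """Correlation of normal scores for paired residuals with a common margin."""
    pooled = np.concatenate([e1, e2])
    ranks = rankdata(pooled)                       # average ranks in case of ties
    scores = norm.ppf(ranks / (len(pooled) + 1))   # rescaled ranks avoid infinite scores
    z1, z2 = scores[:len(e1)], scores[len(e1):]
    return np.mean(z1 * z2) / np.mean(scores ** 2)

# Toy check (hypothetical): normal copula with rho = 0.3 and skewed margins.
rng = np.random.default_rng(0)
n_pairs, rho_true = 20000, 0.3
z_a = rng.standard_normal(n_pairs)
z_b = rho_true * z_a + np.sqrt(1 - rho_true ** 2) * rng.standard_normal(n_pairs)
e_a, e_b = np.exp(z_a), np.exp(z_b)                # monotone transform keeps the copula
rho_ns = normal_scores_correlation(e_a, e_b)
rho_raw = np.corrcoef(e_a, e_b)[0, 1]              # Pearson correlation on the raw scale
```

Because the normal scores depend on the data only through ranks, `rho_ns` targets the copula parameter regardless of the marginal distribution, whereas the raw-scale Pearson correlation is generally attenuated when the margins are not normal.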

Table 5.

Results of the simulation study

Parameter Value Monte Carlo mean Monte Carlo standard error Square root of Monte Carlo mean of variance 95% coverage probability
α01 −3.81 −3.81 0.060 0.054 96.5
α02 −2.42 −2.42 0.030 0.032 96.3
α1 −1 −1 0.074 0.073 95.4
α2 −0.5 −0.50 0.043 0.044 95.6
σ1 2 2.00 0.053 0.055 94.8
σ2 1.5 1.50 0.034 0.035 94.6
ν 0.5 0.50 0.030 0.032 95.9
β0 34.68 34.68 0.027 0.028 96.3
β1 0.11 0.11 0.048 0.048 93.7
θ* −0.12 −0.12 0.067 0.069 95.9
γ −0.0070 −0.0070 0.14 0.13 91.9
ρ 0.30 0.30 0.022 0.022 93.2

6. Discussion

We proposed a joint model for individualized risk prediction of adverse outcomes in repeated pregnancies when the outcomes are subject to other competing events. In this paper we develop a novel statistical methodology and apply it to a unique data set to solve an important problem in obstetrics. The estimates appear to be obtained by a two-step procedure, but the estimates so obtained for model (1) would be identical to the estimates that are obtained from the joint likelihood. This is because the parameters in model (1) do not appear in model (3), and the likelihood specified in model (6) is proportional to the likelihood of the joint model. The estimation of parameters in model (3) involves the infinite dimensional distribution function of residual gestational age, and the likelihood-based approach would be computationally challenging. For this reason, we proposed a plug-in-based approach for the estimation of model (3) parameters. Since model (3) does not involve parameters in model (1), as long as the model is correctly specified, according to theorem 1, the parameter estimates are consistent.

In the analysis, we focus on the examination of risk factors for PR preterm birth, where preterm birth for other reasons is a competing event. The results will have important implications for managing women with a past history of pre-eclampsia. There are many other important adverse outcomes that are competing risks, such as preterm birth due to spontaneous labour where medically indicated preterm births would be a competing event. We plan to use this methodology to estimate incidence curves for this type of preterm birth in a subsequent publication in the medical literature.

There is a limited literature on competing risks for correlated time-to-event data. Bandeen-Roche and Liang (2002) considered a frailty model for correlated failure times subject to competing risks and introduced a non-parametric cause-specific hazard ratio association measure for bivariate competing risk data. Shih and Albert (2010) proposed a bivariate model for correlated event times subject to competing risk by incorporating association between times to first events and associations between failure types given the first event times. However, extending their proposed model to multivariate competing risk survival data is complex and yet to be developed. Gorfine and Hsu (2011) proposed a frailty-based proportional hazards competing risks model for multivariate survival data, where cause-specific frailty processes are used to induce the association between cause-specific failure times. However, these approaches do not directly address many scientific questions in the NICHD consecutive pregnancies study. In the approach proposed, we model the dual outcome of the occurrence and type of preterm birth as well as the gestational ages for those pregnancies that are preterm. Specifically, the approach proposed directly

  1. estimates regression relationships between important subject- and pregnancy-specific covariates and the risk of different types of preterm birth (PR and PU preterm birth) and

  2. estimates the effect of these covariates on gestational age for births that are preterm.

Further, the approach proposed allows us to estimate and to interpret directly the correlation in these two components (the occurrence of preterm birth and timing of preterm birth) across consecutive pregnancies. Furthermore, similarly to both Shih and Albert (2010) and Gorfine and Hsu (2011), we can estimate cumulative incidence functions, which is also an important objective.

In both model (1) and model (3), we assume exchangeable correlation between repeated pregnancy outcomes (for example the occurrence of preterm birth type and the timing of gestational age for a preterm birth pregnancy each have an exchangeable correlation structure across repeated pregnancies). This means that the correlation in the outcomes from different pregnancies does not depend on the distance between these pregnancies (e.g. first and second or first and third). To evaluate whether consecutive and non-consecutive preterm births have similar correlation, we collapsed the two types of preterm birth and examined the frequencies of preterm births in the first three pregnancies. In total, 98 women had a preterm birth, followed by a term birth, followed by another preterm birth, and 99 women had a preterm birth, followed by another preterm birth, followed by a term birth. The fact that the two preterm birth patterns had an almost identical frequency indicates that the exchangeable correlation structure for model (1) is reasonable. The correlations of the gestational ages of the 98 non-consecutive preterm births and the 99 consecutive preterm births are 0.173 and 0.167 respectively. The similarity in the two correlations indicates that the exchangeability assumption for model (3) is reasonable as well. However, for some adverse birth outcomes, a serial correlation may be more appropriate; in this case, the correlation may be stronger for pregnancies that occurred closer either in time or in order. Serial correlation can be incorporated in various ways including the introduction of a shared random process rather than a shared random effect. We would need to employ Monte Carlo EM or other numerically intensive methods for estimation in this case.

The modelling framework also assumes that the number of pregnancies is not related to the occurrence of adverse outcomes or gestational age at the repeated births. We examined this assumption with some simple data analysis. For example, a simple plot of the proportion of preterm births versus number of pregnancies showed no overall pattern (the data are not shown). In addition, a similar plot of the average patient-specific gestational age versus number of pregnancies also showed no pattern. Accounting for an informative number of pregnancies (those with an adverse outcome are more likely to have a larger number of pregnancies over this fixed 9-year interval) is an area of future research.

Appendix A: Derivation of the asymptotic distribution of ρ̂

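Since $(\hat\gamma, \hat F)$ are consistent for $(\gamma, F)$, the denominator of $\hat\rho$ is asymptotically equivalent to

$$\frac{1}{N-q-1}\sum_{i:l_i>0}\sum_{j=1}^{l_i}\Phi^{-1}\{F(\varepsilon_{ij})\}^2,$$

which converges to 1 as $\Phi^{-1}\{F(\varepsilon_{ij})\}$ is a standard normal random variable. Hence $\sqrt{n_2}(\hat\rho-\rho)$ is asymptotically equivalent to $\sqrt{n_2}[(1/N_2)\sum_{i:l_i>1}\sum_{j<k}\Phi^{-1}\{\hat F(\hat\varepsilon_{ij})\}\Phi^{-1}\{\hat F(\hat\varepsilon_{ik})\}-\rho]$, which can be rewritten as

$$\begin{aligned}
\sqrt{n_2}(\hat\rho-\rho) &\approx \frac{\sqrt{n_2}}{N_2}\sum_{i:l_i>1}\sum_{j<k}\Phi^{-1}\{\hat F(\hat\varepsilon_{ij})\}\Phi^{-1}\{\hat F(\hat\varepsilon_{ik})\}-\sqrt{n_2}\,\rho \\
&= \frac{\sqrt{n_2}}{N_2}\sum_{i:l_i>1}\sum_{j<k}\left[\Phi^{-1}\{\hat F(\hat\varepsilon_{ij})\}-\Phi^{-1}\{F(\varepsilon_{ij})\}\right]\Phi^{-1}\{F(\varepsilon_{ik})\} \\
&\quad + \frac{\sqrt{n_2}}{N_2}\sum_{i:l_i>1}\sum_{j<k}\left[\Phi^{-1}\{\hat F(\hat\varepsilon_{ik})\}-\Phi^{-1}\{F(\varepsilon_{ik})\}\right]\Phi^{-1}\{F(\varepsilon_{ij})\} \\
&\quad + \frac{\sqrt{n_2}}{N_2}\sum_{i:l_i>1}\sum_{j<k}\left[\Phi^{-1}\{F(\varepsilon_{ij})\}\Phi^{-1}\{F(\varepsilon_{ik})\}-\rho\right]+o_p(1).
\end{aligned} \tag{9}$$

The first term in this equation by Taylor’s expansion and integration by parts can be rewritten as

$$\begin{aligned}
&\frac{\sqrt{n_2}}{N_2}\sum_{i:l_i>1}\sum_{j<k}\left[\Phi^{-1}\{\hat F(\hat\varepsilon_{ij})\}-\Phi^{-1}\{F(\varepsilon_{ij})\}\right]\Phi^{-1}\{F(\varepsilon_{ik})\} \\
&\quad= \frac{\sqrt{n_2}}{N_2}\sum_{i}\sum_{j<k}\left(\frac{1}{\phi[\Phi^{-1}\{F(\varepsilon_{ij})\}]}(\hat F-F)(\varepsilon_{ij})-\frac{f(\varepsilon_{ij})}{\phi[\Phi^{-1}\{F(\varepsilon_{ij})\}]}X_{ij}^{T}(\hat\gamma-\gamma)\right)\Phi^{-1}\{F(\varepsilon_{ik})\}+o_p(1) \\
&\quad= \sqrt{n_2}\int\left\{\frac{1}{\phi[\Phi^{-1}\{F(u)\}]}(\hat F-F)(u)-\frac{f(u)}{\phi[\Phi^{-1}\{F(u)\}]}X^{T}(\hat\gamma-\gamma)\right\}\Phi^{-1}\{F(v)\}\,dP_n\{(u,X),v\}+o_p(1) \\
&\quad= \sqrt{n_2}\left(\int\frac{\Phi^{-1}\{F(v)\}}{\phi[\Phi^{-1}\{F(u)\}]}(\hat F-F)(u)\,dP_{\rho,F}(u,v)-\int X^{T}\frac{f(u)\Phi^{-1}\{F(v)\}}{\phi[\Phi^{-1}\{F(u)\}]}\,dP\{(u,X),v\}\,(\hat\gamma-\gamma)\right)+o_p(1) \\
&\quad= \sqrt{n_2}\left[\int(\hat F-F)(u)\,\rho\,\Phi^{-1}\{F(u)\}\,d\Phi^{-1}\{F(u)\}-\xi^{T}(\hat\gamma-\gamma)\right]+o_p(1) \\
&\quad= -\frac{\rho}{2\sqrt{n_2}}\,\frac{n_2}{N}\sum_{i}\sum_{j}\left[\Phi^{-1}\{F(\varepsilon_{ij})\}^2-1\right]-\sqrt{n_2}\,\xi^{T}\Big(\sum_{i:l_i>0}X_i^{T}X_i\Big)^{-1}\Big\{\sum_{i}X_i^{T}(Y_i-X_i\gamma)\Big\}+o_p(1) \\
&\quad\equiv \frac{1}{\sqrt{n_2}}\sum_{i:l_i>0}\psi_i+o_p(1),
\end{aligned}$$

where $X_{ij}=\{1, Z_{ij}^{T}, I(Y_{ij}=1)\}$, $X_i=(X_{i1}^{T},\ldots,X_{il_i}^{T})^{T}$, $P_{\rho,F}$ is the bivariate normal copula model with the common marginal distribution $F$ and correlation coefficient $\rho$, $P_n\{(u,X),v\}$ is the empirical cumulative distribution function with mass points $((\varepsilon_{ij}, X_{ij}), \varepsilon_{ik})$ of equal mass $1/N_2$, $P$ is the converging distribution function of $P_n\{(u,X),v\}$,

$$\xi=\int X\,\frac{f(u)}{\phi[\Phi^{-1}\{F(u)\}]}\,\Phi^{-1}\{F(v)\}\,dP\{(u,X),v\}$$

and

$$\psi_i=-\frac{\rho\,n_2}{2N}\sum_{j}\left[\Phi^{-1}\{F(\varepsilon_{ij})\}^2-1\right]-\xi^{T}\Big(n_2^{-1}\sum_{i:l_i>0}X_i^{T}X_i\Big)^{-1}\left\{X_i^{T}(Y_i-X_i\gamma)\right\}.$$

The $\psi_i$s are $n_2$ independent and identically distributed random variables, and thus the first term in equation (9) converges to a normal distribution with mean 0 and variance $E(\psi_1^2)$. The second term in equation (9) has the same asymptotic distribution as the first term, and the third term is asymptotically equivalent to $n_2^{-1/2}\sum_{i:l_i>0}\varphi_i$, which is the sum of $n_2$ independent and identically distributed random variables with $\varphi_i=(n_2/N_2)\sum_{j<k}\Phi^{-1}\{F(\varepsilon_{ij})\}\Phi^{-1}\{F(\varepsilon_{ik})\}-\rho$. It follows that $\sqrt{n_2}(\hat\rho-\rho)$ converges to a normal distribution with mean 0 and variance equal to $\tau^2=E\{(2\psi_1+\varphi_1)^2\}$.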

Contributor Information

Joanna H. Shih, National Cancer Institute, Bethesda, USA

Paul S. Albert, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Rockville, USA

Pauline Mendola, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Rockville, USA

Katherine L. Grantz, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Rockville, USA

References

  1. Ananth CV, Getahun D, Peltier MR, Salihu HM, Vintzileos AM. Recurrence of spontaneous versus medically indicated preterm birth. Am J Obstet Gyn. 2006;195:643–650. doi: 10.1016/j.ajog.2006.05.022.
  2. Bandeen-Roche K, Liang K. Modelling multivariate failure time associations in the presence of a competing risk. Biometrika. 2002;89:299–314. doi: 10.1093/biomet/asm091.
  3. Gorfine M, Hsu L. Frailty-based competing risks model for multivariate survival data. Biometrics. 2011;67:415–426. doi: 10.1111/j.1541-0420.2010.01470.x.
  4. Houseman EA, Ryan LM, Coull BA. Cholesky residuals for assessing normal errors in a linear model with correlated outcomes. J Am Statist Ass. 2004;99:383–394.
  5. Huang Y, Berry K. Semiparametric estimation of marginal mark distribution. Biometrika. 2006;93:895–910.
  6. Kalbfleisch J, Prentice R. The Statistical Analysis of Failure Time Data. 2nd edn. New York: Wiley; 2002.
  7. Klaassen CA, Wellner JA. Efficient estimation in the bivariate normal copula model: normal margins are least favourable. Bernoulli. 1997;3:55–77.
  8. Laughon K, Albert P, Leishear K, Mendola P. The NICHD consecutive pregnancies study: recurrent preterm delivery by subtype. Am J Obstet Gyn. 2013. doi: 10.1016/j.ajog.2013.09.014. To be published.
  9. Louis GM, Dukic V, Heagerty PJ, Louis TA, Lynch CD, Ryan LM, Schisterman EF, Trumble A; Pregnancy Modeling Working Group. Analysis of repeated pregnancy outcomes. Statist Meth Med Res. 2006;15:103–126. doi: 10.1191/0962280206sm434oa.
  10. Othus M, Li Y. A gaussian copula model for multivariate survival data. Statist Biosci. 2011;2:154–179. doi: 10.1007/s12561-010-9026-x.
  11. Shih J, Albert P. Modeling familial association of ages at onset of disease in the presence of competing risk. Biometrics. 2010;66:1012–1023. doi: 10.1111/j.1541-0420.2009.01372.x.
  12. Song PXK, Li M, Ying Y. Joint regression analysis of correlated data using gaussian copulas. Biometrics. 2009;65:60–68. doi: 10.1111/j.1541-0420.2008.01058.x.
  13. Whittemore A. Logistic regression of family data from case-control studies. Biometrika. 1995;82:57–67.