Author manuscript; available in PMC: 2019 Jan 1.
Published in final edited form as: J R Stat Soc Ser C Appl Stat. 2017 Jun 15;67(1):255–273. doi: 10.1111/rssc.12226

Pattern–mixture models with incomplete informative cluster size: Application to a repeated pregnancy study

Ashok Chaurasia 1, Danping Liu 2, Paul S Albert 3
PMCID: PMC5844500  NIHMSID: NIHMS862012  PMID: 29531406

Abstract

The incomplete informative cluster size problem is motivated by the NICHD Consecutive Pregnancies Study, aiming to study the relationship between pregnancy outcomes and parity. These pregnancy outcomes are potentially associated with the number of births over a woman’s lifetime, resulting in an incomplete informative cluster size (censored at the end of the study window). We develop a pattern mixture model for informative cluster size by treating the lifetime number of births as a latent variable. We compare this approach with a simple alternative method that approximates the pattern mixture model. We show that the latent variable approach possesses good statistical properties for estimating both the mean trajectory of birthweight and the proportion of gestational hypertension with increasing parity.

Keywords: Generalized linear mixed model, Incomplete informative cluster size, Latent variable, Pattern mixture model, Repeated Pregnancy, Sensitivity analysis

1 Introduction

In longitudinal studies, informative cluster size arises when the number of observations per subject is correlated with the outcome of interest. For example, in repeated pregnancy studies, women with adverse pregnancy outcomes (e.g., low birthweight, preterm birth, or preeclampsia) may have fewer future pregnancies. The informative cluster size mechanism postulates that the pregnancy outcome and the total number of pregnancies (cluster size) are linked through an underlying process, such as a woman’s underlying health or fertility, which is not accounted for by the observed covariates. If not properly accounted for, informative cluster size may lead to biased and misleading inference (Seaman et al., 2014).

Modeling clustered data with informative cluster size is an active area of statistical research. Ryzin and Rai (1987) first proposed to adjust for the cluster size directly as a covariate in the regression model and directly interpreted the cluster size adjusted regression model. Hoffman et al. (2001) proposed a nonparametric within-cluster resampling technique that does not make model assumptions on the cluster size distribution. Williamson et al. (2003) considered a weighted generalized estimating equation (WGEE) approach, with the inverse of cluster size being the weight for each subject. Huang and Leroux (2011) extended the WGEE approach and proposed two doubly weighting schemes that account for both the cluster size and unbalanced nature of observation-level covariates across clusters. Gueorguieva and Agresti (2001) and Dunson et al. (2003) were the first papers to develop a shared random parameter model (SRPM) that jointly models the cluster size and the outcome. Su et al. (2009) and Neuhaus and McCulloch (2011) found that substantial bias could result from using ordinary mixed effects models ignoring the informative cluster size, especially when estimating the covariate effects associated with the random effects. Chen et al. (2011) further studied the robustness of the SRPM under model mis-specification. Seaman et al. (2014) provided extensive reviews of informative cluster size methodology.

To our knowledge, the existing methods require the cluster size to be fully observed. However, this is not true in some applications. For instance, our motivating example is a repeated pregnancy study where the appropriate cluster size is a woman’s lifetime number of births. In this study, we only know that the lifetime number of births is larger than the number of births we observed at the end of the study. The Consecutive Pregnancies Study (CPS) was a retrospective cohort study that included 51,086 women with two or more deliveries after twenty weeks of gestation from 2002 to 2010 at twenty Utah hospitals (Laughon et al., 2014). The study outcomes are either continuous (as in birthweight) or discrete (as in the occurrence of pregnancy abnormality), and our interest lies in estimating the relationship between such outcomes and parity.

In a recent paper, Hinkle et al. (2014) analysed the CPS data and found that birthweight increased with parity, with the largest increase occurring between the first and second births. However, they did not specifically account for the potentially informative cluster size. Let w be the number of births observed at the end of the study, and u be the number of births before the start of the study (i.e., onset parity). In order to visualize the informative cluster size, we plot the mean trajectory of birthweight versus parity for different combinations of (u,w). For example, in Figure 1a, the line corresponding to u = 0 and w = 3 is the average birthweight z-score for all the subjects who were nulliparous at the beginning of the study and had 3 births in the study window. We show profiles for w up to 4, which constitute 88.82% of the subjects in the CPS data. The figure suggests that mean trajectories tend to increase with w, which is consistent with an informative clustering mechanism. The observed distribution of w in the CPS data is as follows: w = 2 (42%), w = 3 (29%), w = 4 (18%), w = 5 (7%), w = 6 (3%), w = 7 (1%), and w = 8 (1%). Stratifying by w may be problematic since women may have additional births after the end of the study. Ideally, we may conjecture that the informative mechanism manifests itself through the lifetime number of births, denoted by z, rather than through w. In Figure 1b, we examined the binary outcome hypertensive disorder and observed a similar phenomenon, suggesting the possibility of informative cluster size when studying the relationship between a hypertensive disorder and parity. If each participant were followed throughout her fertile period, the cluster size would naturally be the number of lifetime births z, and we could adopt a pattern mixture model (PMM) framework for modeling the longitudinal trajectory by adjusting for z as a covariate.
PMM was first proposed to deal with non-ignorable missing data problems in clustered or longitudinal data (Little, 1995; Little and Wang, 1996). A similar model, with cluster size as a covariate, was used in Ryzin and Rai (1987); however, there are two major differences. First, Ryzin and Rai directly interpreted the cluster size adjusted model, whereas we treat the model as a PMM and propose marginalized inference for the mean outcome trajectory (i.e. averaging outcome profiles over the cluster size). As others have emphasized with PMMs (Little, 1995), inference on the conditional model is not in itself meaningful since it conditions on a future outcome. However, marginalizing the conditional model over cluster size provides a natural way to incorporate informative cluster size. To our knowledge, the marginalized inference for informative cluster size has not been investigated previously. Second, in our study, the cluster size is only observed up to the end of the study window, resulting in a censored cluster size. We propose to model z using a latent variable approach, and then incorporate z as a covariate in the PMM framework. In a different context, Roy and Daniels (2008) adopted a similar idea of latent class PMM to handle nonignorable dropout in longitudinal data. In their setting, the dropout time is fully observed, and the latent class is used to cluster several dropout times into one pattern; in our setting, the cluster size is never fully observed and we model it as a latent variable. Conceptually, our proposed latent variable model is conditional on incomplete z that requires extrapolation from the observed data. Like all latent class models, this model relies on assumptions that are difficult to verify in practice. Therefore, we further consider a simple alternative of using w as the cluster size, resulting in the ordinary PMM. We investigated the two model frameworks in theory, in simulation studies, and in data analyses. 
We also recommend strategies for performing sensitivity analyses with regard to the latent class model.

In Section 2, we start with the linear mixed model framework and propose the latent variable approach, followed by a discussion of parameter estimation. In Section 3, we provide details on using a linear mixed model based on the observed number of births as an alternative to the latent variable approach. We discuss the relationship between the two modeling approaches in Section 4. Section 5 discusses the extension to generalized linear mixed models to handle binary outcomes. In Section 6, we present the results of simulation studies that assess the performance of both approaches and discuss our findings. We apply the proposed latent variable and alternative approaches to the CPS data and re-examine the relationship between birthweight and parity in Section 7. A discussion follows in Section 8. For all analyses presented here, we used the statistical computing language R (http://www.R-project.org).

Figure 1. Outcomes of (a) birthweight z-score (continuous) and (b) hypertension (binary) versus parity, by combination of onset parity (u) and observed number of births (w), for w = 2, 3, 4 in the CPS data.

2 Latent Variable Approach

In this section, we begin with a continuous response; the extension to the binary case is given in Section 5. Let yi = (yi1, …, yiJi)′, where yij denotes the birthweight (continuous response) of the i-th subject’s j-th birth in the study window, for i = 1, …, n and j = 1, …, Ji. Let aij denote the parity of the j-th birth in the study window, and let Iij = I(aij > 0) be an indicator variable for nulliparity (0 if nulliparous and 1 otherwise), where nulliparous means that it is the woman’s first child. Let Δi denote the unobserved number of future births for the i-th subject after the end of the study window, so that the lifetime number of births (the cluster size) is zi = wi + Δi, where wi is the number of births observed by the end of the study. If we had observed the cluster size zi for every subject, we could fit a linear mixed model adjusting for zi as follows:

$$y_{ij}=\beta_0^{(z)}+\beta_1^{(z)}I_{ij}+\beta_2^{(z)}a_{ij}+\beta_3^{(z)}z_i+\beta_4^{(z)}z_iI_{ij}+\beta_5^{(z)}z_ia_{ij}+b_i^{(z)}+\varepsilon_{ij}^{(z)}, \tag{1}$$

where $b_i^{(z)}\sim N(0,\sigma_b^{(z)2})$ denotes the random intercept and $\varepsilon_{ij}^{(z)}\sim N(0,\sigma^{(z)2})$ is the error term, with $b^{(z)}$ independent of $\varepsilon^{(z)}$; the superscript “(z)” indicates that the parameters belong to what we will refer to from here on as the z-model. Other predictors can be included in equation (1). For simplicity, we do not include covariates, because in our application the outcome is the standardized birthweight z-score, which already accounts for gestational age and gender of the infant. Other important confounders could be considered in a similar manner in (1). In the CPS study, there is left censoring since the number of births prior to the study window is known, but the birth outcomes are unknown. Implicitly, we are making the assumption that women with the same number of children prior to the study follow the same model irrespective of the unknown birth outcomes.

Underlying model (1) is a PMM formulation in which we model the longitudinal trajectories conditional on patterns (zi), and our final target of inference is the marginal mean over all patterns. Analogous to conventional missing data problems, the PMM incorporates non-ignorable missingness by conditioning on the missingness patterns. Although an SRPM might be a more natural model framework, the PMM approximates an SRPM structure (Wu and Bailey, 1989) and is computationally convenient for parameter estimation. Later in this section, we discuss the details of marginal mean estimation under the PMM.

To complete the formulation of model (1), we need to explicitly model the unobserved zi or, equivalently, the future number of births Δi = zi − wi. Naturally, Δi should follow a bounded count distribution, for which we propose a truncated Poisson distribution, $\Delta_i\sim\text{TruncatedPoisson}_{(0,C)}(\lambda_i)$, with support {0, 1, 2, …, C}:

$$\Pr(\Delta_i=k\mid w_i)=\left(\sum_{t=0}^{C}\frac{e^{-\lambda_i}\lambda_i^{t}}{t!}\right)^{-1}\frac{e^{-\lambda_i}\lambda_i^{k}}{k!};\quad i=1,\dots,n; \tag{2}$$

where

$$\log(\lambda_i)=\alpha_0+\alpha_1w_i+x_i\alpha_2;\quad i=1,\dots,n; \tag{3}$$

with xi being subject-specific covariate(s). In our application and simulation studies, we explore how different choices of C and distribution of Δi|wi affect the inference of the mean trajectory.
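The truncated Poisson model in (2) and (3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, a single scalar covariate x is assumed, and the α values are borrowed from one configuration reported later in the paper. The loop shows how the conditional mean of Δ changes with the truncation value C.

```python
import math

def truncated_poisson_pmf(lam, C):
    """Probabilities Pr(Delta = k), k = 0..C, as in equation (2)."""
    # Unnormalised Poisson terms, computed on the log scale for stability.
    terms = [math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
             for k in range(C + 1)]
    total = sum(terms)
    return [t / total for t in terms]

def rate(w, x, alpha):
    """Rate lambda_i from the log-linear model in equation (3)."""
    a0, a1, a2 = alpha
    return math.exp(a0 + a1 * w + a2 * x)

def truncated_mean(lam, C):
    """Mean of the truncated Poisson distribution with support {0, ..., C}."""
    return sum(k * p for k, p in enumerate(truncated_poisson_pmf(lam, C)))

# Illustrative values only (alpha taken from a configuration used later on).
alpha = (-3.71, 0.8, 0.0)
lam = rate(w=3, x=0, alpha=alpha)
for C in (2, 8, 32):
    print(C, round(truncated_mean(lam, C), 4))
```

For small rates the truncation point barely matters, while for larger rates a small C pulls the mean of Δ below λ, which is one reason the choice of C is worth varying.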

The likelihood for the z-model can be written by integration over the latent variable zi. Let $\eta=(\beta_0^{(z)},\beta_1^{(z)},\dots,\beta_5^{(z)},\sigma_b^{(z)2},\sigma^{(z)2},\alpha_0,\alpha_1)$ denote the parameters in the z-model. The log-likelihood contribution for subject i is

$$\ell_i(\eta)\equiv\log f(y_i\mid w_i)=\log\left\{\sum_{k=0}^{C}f(y_i\mid z_i=w_i+k)\Pr(\Delta_i=k\mid w_i)\right\},$$

where $f(y_i\mid z_i=w_i+\Delta_i)=\int_{-\infty}^{\infty}f(y_i\mid z_i,b_i)\,dF(b_i)$ is the multivariate normal density (since yi and bi are jointly normal) as in (1), and Pr(Δi = k|wi) is given by (2). The model parameters η are then estimated by maximizing the observed data log-likelihood $\sum_{i=1}^{n}\ell_i(\eta)$.
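As a rough illustration of this log-likelihood contribution, the sketch below sums over the possible values of Δi and uses the closed-form density of a random-intercept model (covariance σ²I + σb²11′). It is not the authors' implementation: all function names are hypothetical, a scalar covariate x is assumed, and the log-sum-exp step is a numerical convenience added here.

```python
import math

def truncated_poisson_pmf(lam, C):
    """Pr(Delta = k), k = 0..C, for a Poisson(lam) truncated to {0, ..., C}."""
    terms = [math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
             for k in range(C + 1)]
    total = sum(terms)
    return [t / total for t in terms]

def log_density_random_intercept(resid, var_b, var_e):
    """Log N(resid; 0, var_e * I + var_b * 11'), using the closed-form
    inverse and determinant of a compound-symmetric covariance matrix."""
    J = len(resid)
    s1 = sum(resid)
    s2 = sum(r * r for r in resid)
    tot = var_e + J * var_b
    log_det = (J - 1) * math.log(var_e) + math.log(tot)
    quad = (s2 - var_b * s1 * s1 / tot) / var_e
    return -0.5 * (J * math.log(2 * math.pi) + log_det + quad)

def subject_loglik(y, parity, w, x, beta, var_b, var_e, alpha, C=8):
    """Log-likelihood contribution of one subject, summing over Delta = 0..C."""
    pmf = truncated_poisson_pmf(math.exp(alpha[0] + alpha[1] * w + alpha[2] * x), C)
    log_terms = []
    for k, p in enumerate(pmf):
        z = w + k  # candidate lifetime number of births
        resid = []
        for yij, a in zip(y, parity):
            ind = 1.0 if a > 0 else 0.0
            mean = (beta[0] + beta[1] * ind + beta[2] * a
                    + beta[3] * z + beta[4] * z * ind + beta[5] * z * a)
            resid.append(yij - mean)
        log_terms.append(math.log(p)
                         + log_density_random_intercept(resid, var_b, var_e))
    m = max(log_terms)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(t - m) for t in log_terms))

# Illustrative call with made-up outcomes for a subject with w = 2 births.
beta = (0.0, 1.0, 0.2, 0.4, 0.1, 0.05)
print(subject_loglik([0.9, 2.6], [0, 1], w=2, x=0, beta=beta,
                     var_b=0.25, var_e=0.25, alpha=(-0.69, 0.0, 0.0)))
```

Summing this quantity over subjects gives an objective that a general-purpose optimizer could maximize, ideally from several starting values.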

As mentioned earlier, the parameter of interest is the mean trajectory of birthweights as a function of parity, E(yij). Consistent with the PMM framework, it is not of interest to interpret the mean birthweight trajectory conditional on zi. Instead, the inference is made on the marginal mean trajectories by averaging over all patterns zi.

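It is shown in the appendix (see section S1) that the marginal mean response across all cluster sizes is

$$E(y_{ij})=\beta_0^{*(z)}+\beta_1^{*(z)}I_{ij}+\beta_2^{*(z)}a_{ij}, \tag{4}$$

where the parameters of interest are

$$\beta_0^{*(z)}=\beta_0^{(z)}+\beta_3^{(z)}(\mu_w+\mu_\Delta),\quad \beta_1^{*(z)}=\beta_1^{(z)}+\beta_4^{(z)}(\mu_w+\mu_\Delta),\quad \beta_2^{*(z)}=\beta_2^{(z)}+\beta_5^{(z)}(\mu_w+\mu_\Delta), \tag{5}$$

and

$$\mu_w=E(w_i), \tag{6}$$

$$\mu_\Delta=E(\Delta_i)=E_{w_i}\left[\left(\sum_{t=0}^{C}\frac{e^{-\lambda_i}\lambda_i^{t}}{t!}\right)^{-1}\sum_{k=0}^{C}k\,\frac{e^{-\lambda_i}\lambda_i^{k}}{k!}\right]. \tag{7}$$

The parameters $\beta^{*(z)}=(\beta_0^{*(z)},\beta_1^{*(z)},\beta_2^{*(z)})$ are estimated by replacing all the z-model parameters by their maximum likelihood estimates (MLEs), and replacing all the expectations by their sample means. With $\hat\beta^{*(z)}$ denoting the estimated $\beta^{*(z)}$, we show in the appendix that

$$\sqrt{n}\left(\hat\beta^{*(z)}-\beta^{*(z)}\right)\xrightarrow{d}N_3(0,\Upsilon_2), \tag{8}$$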

as n → ∞. The formula of the asymptotic variance is also provided in the appendix.
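The marginalization in (4) and (5) amounts to evaluating the cluster-size terms at E(z) = μw + μΔ. The sketch below does this arithmetic with hypothetical function names, plugging in the conditional coefficients, E(w) = 3.041 and E(Δ) = 0.64 that the paper reports for one of its simulation settings.

```python
def marginal_coefficients(beta, mu_w, mu_delta):
    """Equation (5): fold the cluster-size terms into the marginal
    coefficients by evaluating them at E(z) = mu_w + mu_delta."""
    mean_z = mu_w + mu_delta
    return tuple(beta[k] + beta[k + 3] * mean_z for k in range(3))

def marginal_mean(parity, beta_star):
    """Equation (4): marginal mean outcome at a given parity."""
    indicator = 1.0 if parity > 0 else 0.0
    return beta_star[0] + beta_star[1] * indicator + beta_star[2] * parity

# Conditional coefficients (beta_0, ..., beta_5) from the simulation design.
beta = (0.0, 1.0, 0.2, 0.4, 0.1, 0.05)
beta_star = marginal_coefficients(beta, mu_w=3.041, mu_delta=0.64)
print([round(b, 3) for b in beta_star])
print([round(marginal_mean(a, beta_star), 3) for a in range(4)])
```

The resulting coefficients agree, up to rounding, with the true marginal values of about 1.47, 1.37 and 0.38 listed for that setting in the simulation tables.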

3 Alternative Modeling Scheme via Observed Number of Births

A simple alternative to the z-model is to treat the observed number of births wi as the cluster size and fit a PMM in terms of wi; we will refer to this model as the w-model. Specifically, we propose a linear mixed model conditional on wi as

$$y_{ij}=\beta_0^{(w)}+\beta_1^{(w)}I_{ij}+\beta_2^{(w)}a_{ij}+\beta_3^{(w)}w_i+\beta_4^{(w)}I_{ij}w_i+\beta_5^{(w)}a_{ij}w_i+b_i^{(w)}+\varepsilon_{ij}^{(w)}, \tag{9}$$

where $b_i^{(w)}\sim N(0,\sigma_b^{(w)2})$ denotes the random intercept and $\varepsilon_{ij}^{(w)}\sim N(0,\sigma^{(w)2})$ denotes the random noise, with $b^{(w)}$ independent of $\varepsilon^{(w)}$, and $\beta^{(w)}=(\beta_0^{(w)},\dots,\beta_5^{(w)})$. In (9) the superscript “(w)” indicates that the parameters correspond to the w-model. Unlike the z-model, the w-model can be easily estimated by standard software packages.

Similar to Section 2, the marginal mean response under the w-model is given by

$$E(y_{ij})=\beta_0^{*(w)}+\beta_1^{*(w)}I_{ij}+\beta_2^{*(w)}a_{ij}, \tag{10}$$

where the parameters of interest are

$$\beta_0^{*(w)}=\beta_0^{(w)}+\beta_3^{(w)}\mu_w,\quad \beta_1^{*(w)}=\beta_1^{(w)}+\beta_4^{(w)}\mu_w,\quad \beta_2^{*(w)}=\beta_2^{(w)}+\beta_5^{(w)}\mu_w, \tag{11}$$

with μw = E(wi). The parameters $\beta^{*(w)}=(\beta_0^{*(w)},\beta_1^{*(w)},\beta_2^{*(w)})$ are estimated by replacing the w-model parameters with their respective MLEs. The estimated w-adjusted regression coefficients are jointly asymptotically normal:

$$\sqrt{n}\left(\hat\beta^{*(w)}-\beta^{*(w)}\right)\xrightarrow{d}N_3(0,\Upsilon_1), \tag{12}$$

where $\Upsilon_1$ is the variance-covariance matrix (see appendix for details). Lastly, we consider the “naïve-model” where the cluster size is ignored. In this case, the model is

$$y_{ij}=\beta_0+\beta_1I_{ij}+\beta_2a_{ij}+b_i+\varepsilon_{ij}, \tag{13}$$

where $b_i\sim N(0,\sigma_b^{2})$ denotes the random intercept and $\varepsilon_{ij}\sim N(0,\sigma^{2})$ denotes the random noise, with b independent of ε, and β = (β0, β1, β2)′. Similar to the w-model, the naïve-model can also be easily estimated by standard software packages.

4 Relationships between z-model and w-model

In Sections 2 and 3, we proposed a complex latent variable PMM where the pattern indicator zi is unobserved along with a simpler alternative where patterns are characterized by the observed quantity wi. In this section, we comment on the strengths and weaknesses of the two proposed models, and then explore their relationship by the asymptotic bias calculation.

The model conditioning on z has several advantages over w. First, w is dependent on the study design (i.e., the length of follow-up of the subjects), while z is not. In other words, z is intrinsic to the subject and hence a better measure of fertility. Thus, the w-model leads to an unattractive feature that inference for the marginal mean may be sensitive to the length of the study window. Second, if we had complete follow-up (fertility life course), then z would be observed, and it would be natural to consider a PMM conditional on z. Third, as we will show later in this section, the z-model is a more general class of models than the w-model. Asymptotic bias calculations suggest that the w-model can only approximate the z-model when w is not predictive of the future number of births. Otherwise, the w-model can be severely biased.

We acknowledge that since z is a model-based extrapolation of w, it relies on assumptions that are unverifiable without knowledge of the complete data or specification of the population z-model. This is similar to the problem of modeling non-ignorable missingness, where researchers assess the sensitivity of the inference under different specifications of the model assumptions. In this paper, we conducted sensitivity analyses in two ways. First, we fit the proposed z-model using different choices of the truncation value C. Second, we conduct simulations to show that the z-model is relatively robust to minor misspecification of the cluster size distribution; specifically, we consider settings in which Δi|wi is simulated under a Poisson distribution, a truncated negative binomial distribution, and a shared random parameter model. We present these sensitivity analyses in Sections 6.2 and 7.2.

Now, we perform the asymptotic bias calculation to investigate the performance of the w-model and naïve-model when the z-model is true. Suppose that, for any i, the response yi = (yi1, yi2, …, yiJi)′ follows the z-model, ui denotes the onset parity (number of live births prior to entry into the study), and vi denotes the number of births observed within the study window, so that wi = ui + vi. Furthermore, let $\mu_y^{(z)}$ and $\Sigma_y^{(z)}$ denote the mean and variance of yi under the z-model, respectively. Now, suppose we fit the misspecified w-model (with mean $\mu_y^{(w)}$ and variance $\Sigma_y^{(w)}$) to yi and denote this likelihood as Lw(yi). Our interest lies in the value of the marginal mean parameter β (denoted as β̃) that maximizes the expected log-likelihood $E_{(u,v,x)}[E_{z\mid w}\log L_w(y_i)]$, which is proportional to

$$E_{(u,v,x)}\left[k\log\left(\left|\Sigma_y^{(w)}\right|\right)+\operatorname{trace}\left(\left(\Sigma_y^{(w)}\right)^{-1}\Sigma_y^{(z)}\right)+\left(\mu_y^{(z)}-\mu_y^{(w)}\right)'\left(\Sigma_y^{(w)}\right)^{-1}\left(\mu_y^{(z)}-\mu_y^{(w)}\right)\right]. \tag{14}$$

The resulting β̃ is interpreted as the estimator that minimizes the Kullback-Leibler distance between the misspecified w-model and the true z-model (White, 1982; Heagerty and Kurland, 2001). In a similar manner, we can obtain β̆, which maximizes the expected log-likelihood for the naïve-model (denoted as L):

$$\breve\beta=\arg\max_{\beta}E_{(u,v)}\left[E_{z\mid w}\left(\log L(y_i)\right)\right]. \tag{15}$$

Since there are no closed form solutions for maximizing the functions in (14) and (15), we maximize these expressions using numerical techniques. Table 1 shows the asymptotic bias of the marginal regression coefficients (i.e., β̆* − β* for the naïve-model, and β̃* − β* for the w-model) for several combinations of α = (α0, α1, α2)′ in equation (3) of the z-model. Table 1 first shows that the naïve-model is substantially biased across all the settings. Furthermore, the w-model corrects for a large amount of the bias, but still may not be satisfactory in some configurations of α. For example, when w and the subject-specific binary covariate X are strong predictors of future births (Δ), as in the setting α = (−3.71, 0.8, 0.5)′ with E(Δ) = 0.64, the bias of the w-model is still substantial. In configurations where w is either independent of or weakly associated with Δ, the bias of the w-model reduces to negligible values. This suggests that under certain conditions there is an equivalence between the marginal means of the z-model and w-model. Specifically, when E(z|w) is approximately linear in w, i.e. E(zi|wi) ≈ γ0 + γ1 wi, we have from (1) that

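$$E(y_{ij}\mid w_i)\approx\left(\beta_0^{(z)}+\beta_3^{(z)}\gamma_0\right)+\left(\beta_1^{(z)}+\beta_4^{(z)}\gamma_0\right)I_{ij}+\left(\beta_2^{(z)}+\beta_5^{(z)}\gamma_0\right)a_{ij}+\beta_3^{(z)}\gamma_1w_i+\beta_4^{(z)}\gamma_1I_{ij}w_i+\beta_5^{(z)}\gamma_1a_{ij}w_i, \tag{16}$$

where $\beta_0^{(z)}+\beta_3^{(z)}\gamma_0\equiv\beta_0^{(w)}$, $\beta_1^{(z)}+\beta_4^{(z)}\gamma_0\equiv\beta_1^{(w)}$, $\beta_2^{(z)}+\beta_5^{(z)}\gamma_0\equiv\beta_2^{(w)}$, $\beta_3^{(z)}\gamma_1\equiv\beta_3^{(w)}$, $\beta_4^{(z)}\gamma_1\equiv\beta_4^{(w)}$, and $\beta_5^{(z)}\gamma_1\equiv\beta_5^{(w)}$. Therefore, the w-model is compatible with the z-model when E(z|w) is approximately linear in w.

Table 1. Asymptotic bias of marginal regression coefficients of the naïve- and w-models with respect to the true z-model, over a reasonably wide range of values (with respect to the rate of future number of births) for α in model (3), using the empirical distribution of w with E(w) = 3.041 as observed in CPS.

| Config. | α0 | α1 | α2 | E(Δ) | E(z∣w=2) | E(z∣w=3) | E(z∣w=4) | E(z) | True β0* | True β1* | True β2* | Naïve bias β0* | Naïve bias β1* | Naïve bias β2* | w-model bias β0* | w-model bias β1* | w-model bias β2* |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ➀ | −3.71 | 0.8 | 0.5 | 0.64 | 2.16 | 3.36 | 4.80 | 3.68 | 1.47 | 1.37 | 0.38 | −0.02 | −0.58 | 0.43 | −0.06 | 0.18 | −0.16 |
| | −2.90 | 0.8 | 0.5 | 1.25 | 2.36 | 3.80 | 5.79 | 4.29 | 1.72 | 1.43 | 0.41 | 0.07 | −0.62 | 0.41 | −0.03 | 0.09 | −0.10 |
| | −2.40 | 0.8 | 0.5 | 1.84 | 2.60 | 4.33 | 6.92 | 4.88 | 1.95 | 1.49 | 0.44 | 0.12 | −0.66 | 0.42 | 0.01 | 0.00 | −0.03 |
| | −2.19 | 0.8 | 0.5 | 2.13 | 2.73 | 4.63 | 7.50 | 5.17 | 2.07 | 1.52 | 0.46 | 0.13 | −0.68 | 0.42 | 0.02 | −0.04 | 0.01 |
| | −3.71 | 0.8 | 0 | 0.50 | 2.12 | 3.27 | 4.60 | 3.54 | 1.42 | 1.35 | 0.38 | −0.05 | −0.58 | 0.44 | −0.06 | 0.19 | −0.17 |
| | −2.90 | 0.8 | 0 | 1.00 | 2.27 | 3.61 | 5.35 | 4.04 | 1.62 | 1.40 | 0.40 | 0.04 | −0.61 | 0.42 | −0.05 | 0.14 | −0.14 |
| | −2.40 | 0.8 | 0 | 1.50 | 2.45 | 4.00 | 6.23 | 4.54 | 1.82 | 1.45 | 0.43 | 0.09 | −0.65 | 0.42 | −0.01 | 0.05 | −0.07 |
| | −2.02 | 0.8 | 0 | 2.00 | 2.66 | 4.47 | 7.23 | 5.04 | 2.02 | 1.50 | 0.45 | 0.13 | −0.68 | 0.42 | 0.02 | −0.03 | −0.01 |
| ➁ | −2.05 | 0.4 | 0.5 | 0.66 | 2.38 | 3.57 | 4.85 | 3.70 | 1.48 | 1.37 | 0.39 | −0.08 | −0.54 | 0.43 | −0.02 | 0.06 | −0.06 |
| | −1.35 | 0.4 | 0.5 | 1.31 | 2.77 | 4.14 | 5.70 | 4.35 | 1.74 | 1.43 | 0.42 | −0.01 | −0.55 | 0.40 | −0.02 | 0.07 | −0.06 |
| | −0.93 | 0.4 | 0.5 | 1.94 | 3.16 | 4.74 | 6.57 | 4.98 | 1.99 | 1.50 | 0.45 | 0.04 | −0.57 | 0.39 | −0.01 | 0.04 | −0.04 |
| | −0.62 | 0.4 | 0.5 | 2.55 | 3.58 | 5.35 | 7.42 | 5.59 | 2.24 | 1.56 | 0.48 | 0.06 | −0.58 | 0.39 | 0.00 | 0.00 | −0.01 |
| | −2.05 | 0.4 | 0 | 0.50 | 2.29 | 3.43 | 4.64 | 3.54 | 1.42 | 1.35 | 0.38 | −0.11 | −0.54 | 0.44 | −0.02 | 0.05 | −0.05 |
| | −1.35 | 0.4 | 0 | 1.00 | 2.58 | 3.86 | 5.29 | 4.04 | 1.62 | 1.40 | 0.40 | −0.05 | −0.55 | 0.42 | −0.03 | 0.08 | −0.07 |
| | −0.93 | 0.4 | 0 | 1.50 | 2.88 | 4.31 | 5.96 | 4.54 | 1.82 | 1.45 | 0.43 | 0.00 | −0.56 | 0.40 | −0.02 | 0.07 | −0.07 |
| | −0.62 | 0.4 | 0 | 2.00 | 3.20 | 4.78 | 6.65 | 5.04 | 2.02 | 1.50 | 0.45 | 0.04 | −0.58 | 0.40 | −0.01 | 0.04 | −0.05 |
| ➂ | −0.69 | 0 | 0.5 | 0.66 | 2.66 | 3.66 | 4.66 | 3.70 | 1.48 | 1.37 | 0.39 | −0.10 | −0.43 | 0.36 | 0.00 | 0.00 | 0.00 |
| | 0.00 | 0 | 0.5 | 1.32 | 3.32 | 4.32 | 5.32 | 4.36 | 1.75 | 1.44 | 0.42 | −0.06 | −0.39 | 0.31 | 0.00 | 0.00 | 0.00 |
| | 0.22 | 0 | 0.5 | 1.65 | 3.65 | 4.65 | 5.65 | 4.70 | 1.88 | 1.47 | 0.43 | −0.04 | −0.37 | 0.29 | 0.00 | 0.00 | 0.00 |
| | 0.56 | 0 | 0.5 | 2.31 | 4.31 | 5.31 | 6.31 | 5.35 | 2.14 | 1.53 | 0.47 | −0.02 | −0.35 | 0.26 | 0.00 | 0.00 | 0.00 |
| ➃ | −0.69 | 0 | 0 | 0.50 | 2.50 | 3.50 | 4.50 | 3.54 | 1.42 | 1.35 | 0.38 | −0.11 | −0.45 | 0.38 | 0.00 | 0.00 | 0.00 |
| | 0.00 | 0 | 0 | 1.00 | 3.00 | 4.00 | 5.00 | 4.04 | 1.62 | 1.40 | 0.40 | −0.08 | −0.41 | 0.33 | 0.00 | 0.00 | 0.00 |
| | 0.41 | 0 | 0 | 1.50 | 3.50 | 4.50 | 5.50 | 4.54 | 1.82 | 1.45 | 0.43 | −0.05 | −0.38 | 0.30 | 0.00 | 0.00 | 0.00 |
| | 0.69 | 0 | 0 | 2.00 | 4.00 | 5.00 | 6.00 | 5.04 | 2.02 | 1.50 | 0.45 | −0.04 | −0.36 | 0.28 | 0.00 | 0.00 | 0.00 |

➀ – ➃ represent α-configurations that are replicated in the simulation study, with their results shown in Table 2.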

To examine the linearity of E(zi|wi), we present plots of E(z|w) versus w in Figure 2 for selected configurations of (α0, α1, α2) shown in Table 1. The graphs in Figure 2 illustrate that there are settings where E(zi|wi) is not well approximated by a linear function in w. These correspond to configurations in Table 1 where the w-model shows substantial bias.

Figure 2. E(z|w) based on model (2) for selected configurations of α0, α1, α2 (from Table 1) in model (3).

In general, when w is not predictive of the future number of births (Δ), the linear approximation holds exactly, and the w-model can be seen as a special case of the z-model in terms of having the same marginal mean structure. When w is strongly predictive of Δ, the linear approximation weakens, leading to a biased w-model. In Section 6, we explore the finite sample properties of these models through a simulation study.
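The linearity argument can be checked numerically. The sketch below, with hypothetical function names and restricted to configurations with α2 = 0 so that no covariate distribution has to be assumed, computes E(z|w) = w + E(Δ|w) under the truncated Poisson model with C = 8 and uses the second difference over w = 2, 3, 4 as a crude measure of departure from linearity.

```python
import math

def truncated_poisson_mean(lam, C=8):
    """Mean of a Poisson(lam) variable truncated to {0, ..., C}."""
    terms = [math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
             for k in range(C + 1)]
    return sum(k * t for k, t in enumerate(terms)) / sum(terms)

def expected_z(w, alpha0, alpha1, C=8):
    """E(z | w) = w + E(Delta | w), with log(lambda) = alpha0 + alpha1 * w."""
    return w + truncated_poisson_mean(math.exp(alpha0 + alpha1 * w), C)

def second_difference(alpha0, alpha1):
    """E(z|4) - 2 E(z|3) + E(z|2); zero when E(z|w) is linear on w = 2, 3, 4."""
    e2, e3, e4 = (expected_z(w, alpha0, alpha1) for w in (2, 3, 4))
    return e4 - 2 * e3 + e2

for a0, a1 in [(-3.71, 0.8), (-2.05, 0.4), (-0.69, 0.0)]:
    values = [round(expected_z(w, a0, a1), 2) for w in (2, 3, 4)]
    print((a0, a1), values, round(second_difference(a0, a1), 3))
```

With α1 = 0 the second difference vanishes, while with α1 = 0.8 it is clearly positive, in line with the pattern of w-model bias across configurations.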

5 Extension to the generalized linear mixed model (GLMM) for binary outcomes

Thus far, the model formulation provided in Sections 2 and 3 considers a continuous outcome. However, the proposed modeling approaches can be extended to discrete outcomes. Here, we provide an extension to the case with a binary outcome. In our application, which will be discussed later in Section 7, hypertensive disorder is the binary outcome of interest.

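For the generalized linear mixed model under the z-model framework, the log-likelihood contribution for subject i is

$$\ell_i(\eta)\equiv\log\Pr(y_i\mid w_i)=\log\left\{\sum_{k=0}^{C}\left(\int_{-\infty}^{\infty}\Pr(y_i\mid b_i,z_i)\frac{1}{\sigma_b}\phi\left(\frac{b_i}{\sigma_b}\right)db_i\right)\Pr(\Delta_i=k\mid w_i)\right\}, \tag{17}$$

where, for a binary outcome, we have

$$\Pr(y_i\mid b_i,z_i)=\prod_{j=1}^{J_i}\mu_{ij}^{y_{ij}}(1-\mu_{ij})^{1-y_{ij}},$$

  • $\mu_{ij}=\operatorname{expit}(x_{ij}^{(z)}\beta^{(z)}+b_i^{(z)})$ with $\operatorname{expit}(t)\equiv(1+e^{-t})^{-1}$, and $x_{ij}^{(z)}$ is the design vector of the fixed effects,

  • zi = wi + Δi, and

  • ϕ(·) is the standard normal density.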

The integral in equation (17) can be evaluated using Gaussian quadrature. Note that in our simulation and data example, the design vector xij(z) has the same form as that in equation (1), including piecewise linear terms of parity, unobserved cluster size zi, and their interactions. Our interest lies in inference of the population average trajectory of the response, i.e. the probability of hypertensive disorder as a function of parity. This probability will be computed in two steps in our subsequent simulations. First, for given values of zi, wi, we compute Pr (yij = 1|zi, wi) by numerically integrating out the random effect bi. Second, we marginalize the probability Pr (yij = 1|zi, wi) by summing it over the joint discrete probability distribution of Δi and wi, given by Pr(Δi = k|wi = l) × Pr(wi = l).
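The two-step computation can be sketched as follows. This is an illustration only: it swaps Gaussian quadrature for a plain trapezoid rule on a fine grid, omits the covariate x from the rate model, and uses made-up regression coefficients; the distribution of w is taken from the rounded percentages reported in Section 1 and renormalized. All function names are hypothetical.

```python
import math

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

def truncated_poisson_pmf(lam, C):
    terms = [math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
             for k in range(C + 1)]
    total = sum(terms)
    return [t / total for t in terms]

def prob_given_z(eta, sigma_b, n_grid=2001, width=8.0):
    """Step 1: Pr(y = 1 | z, w), integrating expit(eta + b) against the
    N(0, sigma_b^2) density with the trapezoid rule (a stand-in for quadrature)."""
    lo = -width * sigma_b
    h = 2.0 * width * sigma_b / (n_grid - 1)
    total = 0.0
    for i in range(n_grid):
        b = lo + i * h
        dens = math.exp(-0.5 * (b / sigma_b) ** 2) / (sigma_b * math.sqrt(2 * math.pi))
        weight = 0.5 if i in (0, n_grid - 1) else 1.0
        total += weight * expit(eta + b) * dens
    return total * h

def marginal_prob(parity, beta, sigma_b, alpha, w_dist, C=8):
    """Step 2: sum Pr(y = 1 | z, w) over the joint distribution of (Delta, w)."""
    ind = 1.0 if parity > 0 else 0.0
    total = 0.0
    for w, pw in w_dist.items():
        pmf = truncated_poisson_pmf(math.exp(alpha[0] + alpha[1] * w), C)
        for k, pk in enumerate(pmf):
            z = w + k
            eta = (beta[0] + beta[1] * ind + beta[2] * parity
                   + beta[3] * z + beta[4] * z * ind + beta[5] * z * parity)
            total += pw * pk * prob_given_z(eta, sigma_b)
    return total

# Rounded distribution of w from Section 1, renormalized to sum to one.
raw = {2: 42, 3: 29, 4: 18, 5: 7, 6: 3, 7: 1, 8: 1}
W_DIST = {w: c / sum(raw.values()) for w, c in raw.items()}

beta = (-2.0, -0.5, 0.1, 0.05, 0.0, 0.0)  # hypothetical coefficients
for parity in range(3):
    print(parity, round(marginal_prob(parity, beta, 1.0, (-0.69, 0.0), W_DIST), 4))
```

A Gauss-Hermite rule would need far fewer evaluation points; the trapezoid rule is used here only because it requires no tabulated nodes.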

Analogous to Section 3, the w-model with a binary outcome can be estimated using standard GLMM software. Lastly, regarding variance estimation, though the asymptotic derivation is theoretically possible, there is no closed form expression because it requires derivatives of complex integrals. Thus, we use the bootstrap to estimate the variance of the parameters of interest.

6 Simulation Studies

6.1 Continuous outcome

In this section, we conduct simulation studies to assess the finite sample performance of the proposed methods for a continuous outcome. We generate data under the following settings, where the z-model is the true model, with (i) Δ strongly associated with w and a subject-specific binary covariate X, (ii) Δ weakly associated with w and X, (iii) Δ independent of w but not X, and (iv) Δ independent of w and X. In each setting we use sample sizes of n = 2000, 500, 125 subjects and simulate the data with a continuous outcome as follows:

  1. Subject specific components: Onset parity values (ui’s) and number of in-window births (vi’s) are created by sampling (with replacement) from CPS with the total number of births observed by the end of the study, wi = ui + vi.

  2. The longitudinal parity value for each subject is given by aij = ui, ui + 1, …, (wi − 1).

  3. For scenarios (i)–(iv), we generate the response yij from model (1) with $\beta_0^{(z)}=0$, $\beta_1^{(z)}=1$, $\beta_2^{(z)}=0.2$, $\beta_3^{(z)}=0.4$, $\beta_4^{(z)}=0.1$, $\beta_5^{(z)}=0.05$, and $\sigma_b^{(z)2}=\sigma^{(z)2}=0.25$. Then, we generate zi via model (3) where (α0, α1, α2) = (−3.71, 0.8, 0.5) for scenario (i), (α0, α1, α2) = (−2.05, 0.4, 0.5) for scenario (ii), (α0, α1, α2) = (−0.69, 0, 0.5) for scenario (iii), and (α0, α1, α2) = (−0.69, 0, 0) for scenario (iv).

  4. Then, Δi is simulated from a truncated Poisson distribution as a multinomial distribution with cell probabilities given by (2) for k = 0 to 8.
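The data-generating steps above can be sketched as follows. Several pieces are assumptions made purely for illustration, since the CPS data are not available here: w is drawn from the rounded percentages reported in Section 1 rather than resampled from CPS, onset parity is set to u = 0 for everyone, and the binary covariate X is drawn with probability 0.5. The function names are hypothetical.

```python
import math
import random

BETA = (0.0, 1.0, 0.2, 0.4, 0.1, 0.05)  # coefficients of model (1), from step 3
SD_B = SD_E = 0.5                        # standard deviations (variances 0.25)
W_VALUES = [2, 3, 4, 5, 6, 7, 8]
W_WEIGHTS = [42, 29, 18, 7, 3, 1, 1]     # rounded CPS percentages (assumption)

def truncated_poisson_pmf(lam, C):
    terms = [math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
             for k in range(C + 1)]
    total = sum(terms)
    return [t / total for t in terms]

def simulate(n, alpha, C=8, seed=0):
    """Simulate n subjects under the z-model with truncated Poisson Delta."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        w = rng.choices(W_VALUES, weights=W_WEIGHTS)[0]
        x = 1 if rng.random() < 0.5 else 0  # assumed prevalence of X
        lam = math.exp(alpha[0] + alpha[1] * w + alpha[2] * x)
        # Multinomial draw with cell probabilities from equation (2).
        delta = rng.choices(list(range(C + 1)),
                            weights=truncated_poisson_pmf(lam, C))[0]
        z = w + delta
        b = rng.gauss(0.0, SD_B)  # subject-level random intercept
        y = []
        for a in range(w):  # parities 0, ..., w - 1 under the u = 0 assumption
            ind = 1.0 if a > 0 else 0.0
            mean = (BETA[0] + BETA[1] * ind + BETA[2] * a
                    + BETA[3] * z + BETA[4] * z * ind + BETA[5] * z * a)
            y.append(mean + b + rng.gauss(0.0, SD_E))
        data.append({"w": w, "x": x, "delta": delta, "z": z, "y": y})
    return data

sample = simulate(2000, alpha=(-0.69, 0.0, 0.0))
print(sum(d["delta"] for d in sample) / len(sample))
```

Under scenario (iv) the average simulated Δ should be close to 0.5, the expected number of future births reported for that configuration.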

In each of the aforementioned settings, we fit the following three models: (a) the naïve-model, (b) the w-model, (c) the z-model. Additionally, we fit the "ideal" model which assumes that zi is observed. This model is not feasible in practice because zi is never observed; however, it serves as the benchmark for comparative purposes. In each setting, β^(z) and β^(w) were computed using equations (5) and (11), respectively, with the parameters being replaced by their MLEs. The simulation was repeated 1000 times. It should be noted that the z-model is computationally intensive, and could yield boundary solutions if the initial values are not carefully chosen in situations with a complex latent model structure. In our simulations, when we used the true parameter values or all zeros as the initial values, the simulation results were identical; however, if we started from randomly generated initial values, boundary solutions may occur with α1 → ∞, especially when the initial value of α1 is a large negative number. Therefore, in practice, it is important to start with multiple initial values to ensure the global maximum likelihood is achieved.

The simulation results for marginal regression coefficients for n = 2000 are shown in Table 2. The results for n = 500, 125 are provided in the web-based supplementary material section S3 (see Tables S1 and S2).

Table 2. Simulation results where the z-model is the true model, for n = 2000, under varying degrees of association between Δ and w (and X) as determined by settings (i)–(iv), with C = 8 for Δ ~ Truncated Poisson.

α configurations:
➀: α = (−3.71, 0.8, 0.5)′; Δ strongly associated with w (and X); E(Δ) = 0.64
➁: α = (−2.05, 0.4, 0.5)′; Δ weakly associated with w (and X); E(Δ) = 0.66
➂: α = (−0.69, 0, 0.5)′; Δ independent of w only; E(Δ) = 0.66
➃: α = (−0.69, 0, 0)′; Δ independent of w and X; E(Δ) = 0.50

| Model | Statistic | ➀ β0* | ➀ β1* | ➀ β2* | ➁ β0* | ➁ β1* | ➁ β2* | ➂ β0* | ➂ β1* | ➂ β2* | ➃ β0* | ➃ β1* | ➃ β2* |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | True value | 1.470 | 1.368 | 0.384 | 1.481 | 1.370 | 0.385 | 1.481 | 1.370 | 0.385 | 1.417 | 1.354 | 0.377 |
| naïve-model | PE | 1.453 | 0.785 | 0.809 | 1.398 | 0.832 | 0.812 | 1.383 | 0.940 | 0.746 | 1.303 | 0.908 | 0.757 |
| | SD | 0.037 | 0.037 | 0.027 | 0.030 | 0.037 | 0.027 | 0.026 | 0.033 | 0.021 | 0.025 | 0.033 | 0.021 |
| | SE | 0.035 | 0.029 | 0.015 | 0.029 | 0.029 | 0.014 | 0.026 | 0.028 | 0.013 | 0.025 | 0.028 | 0.013 |
| | Bias | −0.018 | −0.583 | 0.425 | −0.083 | −0.538 | 0.427 | −0.098 | −0.430 | 0.361 | −0.114 | −0.446 | 0.380 |
| | Coverage | 0.899 | 0 | 0 | 0.196 | 0 | 0 | 0.035 | 0 | 0 | 0.010 | 0 | 0 |
| w-model | PE | 1.411 | 1.547 | 0.222 | 1.460 | 1.433 | 0.329 | 1.483 | 1.371 | 0.384 | 1.418 | 1.354 | 0.376 |
| | SD | 0.039 | 0.048 | 0.022 | 0.038 | 0.048 | 0.023 | 0.037 | 0.046 | 0.020 | 0.037 | 0.046 | 0.019 |
| | SE | 0.038 | 0.045 | 0.022 | 0.037 | 0.046 | 0.023 | 0.036 | 0.044 | 0.020 | 0.036 | 0.044 | 0.020 |
| | Bias | −0.060 | 0.179 | −0.161 | −0.021 | 0.063 | −0.056 | 0.001 | 0 | −0.001 | 0.001 | 0 | −0.001 |
| | Coverage | 0.640 | 0.028 | 0 | 0.914 | 0.716 | 0.294 | 0.954 | 0.952 | 0.955 | 0.950 | 0.951 | 0.956 |
| z-model | PE | 1.471 | 1.367 | 0.383 | 1.482 | 1.371 | 0.384 | 1.482 | 1.371 | 0.385 | 1.417 | 1.354 | 0.377 |
| | SD | 0.043 | 0.047 | 0.017 | 0.037 | 0.045 | 0.020 | 0.032 | 0.041 | 0.019 | 0.034 | 0.043 | 0.019 |
| | SE | 0.042 | 0.044 | 0.018 | 0.037 | 0.042 | 0.020 | 0.032 | 0.039 | 0.020 | 0.033 | 0.041 | 0.020 |
| | Bias | 0.001 | −0.001 | −0.001 | 0.001 | 0 | −0.001 | 0 | 0.001 | 0 | 0 | 0 | 0 |
| | Coverage | 0.946 | 0.932 | 0.955 | 0.951 | 0.932 | 0.946 | 0.950 | 0.948 | 0.951 | 0.952 | 0.944 | 0.949 |
| ideal (z-known) model | PE | 1.472 | 1.368 | 0.383 | 1.481 | 1.371 | 0.385 | 1.482 | 1.370 | 0.385 | 1.417 | 1.354 | 0.377 |
| | SD | 0.040 | 0.042 | 0.015 | 0.033 | 0.037 | 0.016 | 0.029 | 0.034 | 0.016 | 0.029 | 0.035 | 0.017 |
| | SE | 0.039 | 0.040 | 0.016 | 0.033 | 0.036 | 0.016 | 0.028 | 0.033 | 0.017 | 0.028 | 0.034 | 0.017 |
| | Bias | 0.001 | 0 | 0 | 0 | 0 | 0 | 0.001 | 0 | 0 | 0.001 | 0 | 0 |
| | Coverage | 0.943 | 0.950 | 0.957 | 0.942 | 0.950 | 0.950 | 0.942 | 0.942 | 0.949 | 0.945 | 0.946 | 0.955 |

PE: Point Estimate, SD: Standard Deviation, SE: Standard Error, Bias = PE – True value.

In all scenarios, the naïve-model is severely biased with poor coverage rates. The ideal model (with known z) and the z-model show negligible bias with nominal coverage rates. Compared to the ideal model, the z-model has about 15–30% efficiency loss, due to the estimation of additional parameters of the latent class model. The w-model performs well only in scenarios (iii) and (iv), where w is not predictive of Δ. When the association between w and Δ gets stronger, the w-model becomes more biased, as shown in scenarios (i) and (ii). Overall, for estimating the marginal mean trajectories, the naïve-model is biased, the z-model is unbiased, and the w-model can be biased in practical scenarios where the expected future number of births is as small as 0.50 (see row 5 in Table 1). The results for n = 500 and n = 125 yielded similar findings, with standard deviation (SD) and standard error (SE) values larger than the corresponding values in Table 2 by a factor of approximately $\sqrt{2000/n}$.
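The efficiency comparison can be reproduced approximately from the reported standard deviations. The sketch below assumes that efficiency loss is measured as one minus the ratio of sampling variances (ideal model over z-model); the paper does not state the formula it used, so this definition is an assumption, as are the function names. The SD values are those reported in Table 2 for configuration ➀.

```python
import math

def efficiency_loss(sd_ideal, sd_z):
    """1 - Var(ideal) / Var(z-model), an assumed definition of efficiency loss."""
    return 1.0 - (sd_ideal / sd_z) ** 2

def sd_inflation(n, n_ref=2000):
    """Approximate factor by which SDs grow when the sample size drops to n."""
    return math.sqrt(n_ref / n)

# Empirical SDs for configuration 1 (beta0*, beta1*, beta2*) from Table 2.
sd_ideal = [0.040, 0.042, 0.015]
sd_z = [0.043, 0.047, 0.017]
print([round(efficiency_loss(a, b), 3) for a, b in zip(sd_ideal, sd_z)])
print(sd_inflation(500), sd_inflation(125))
```

Under this definition the losses for configuration ➀ fall roughly between 13% and 22%, which is broadly in line with the 15–30% range quoted above.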

6.2 Misspecified informative cluster size model

For the continuous outcome, we conducted additional simulation studies to evaluate the performance of the z-model when it is misspecified. Specifically, we examine sensitivity of the z-model (under the truncated Poisson distribution) when informative cluster size is simulated under a (i) Poisson distribution, (ii) truncated negative binomial distribution, and (iii) shared random parameter model. These results are provided in the web-based supporting material (see section S4).

In the case of Poisson, simulation results suggest that both the naïve- and w-models yield biased results with less than nominal coverage rates when there is an association between Δ and w. In comparison, the z-model yields less biased estimates with better coverage rates. Specifically, we found that the z-model at low truncation values (C) yielded less than nominal coverage rates (for example, see results for Strong assoc. and C = 8, 16 in Table S3, p. 11 of web-based supporting material). However, when using C = 32, the z-model corrects for most of the bias and provides nominal coverage rates. This simulation illustrates that our proposed z-model is flexible in incorporating different truncation values (C) in the misspecified truncated-Poisson model, and has coverage rates better than (or at least comparable to) those of the naïve- and w-models for all degrees of association between Δ and w. Hence, this simulation suggests that even under severe model misspecification the z-model performs better than the naïve- and w-models. In the case of the truncated negative binomial, the bias of the z-model is smaller than that of the w-model, and the coverage of the z-model is higher than the nominal level, whereas the w-model gives less than nominal coverage. In other words, though the z-model estimates are not the best, they are 10 times less biased than those of the w-model. In the case of the shared random parameter model (SRPM), the z-model yields practically unbiased estimates with nominal coverage rates. Overall, this set of sensitivity analyses suggests that our proposed PMM z-model is fairly robust to misspecification of the informative cluster size model, since it provides less biased estimates and coverage values better than (or at least comparable to) those from the w-model at different degrees of association between Δ and w.
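To see why a low truncation value C can matter when the future number of births Δ actually follows an untruncated Poisson distribution, the sketch below compares a Poisson distribution with its version renormalized on 0, 1, ..., C. It is only an illustration under stated assumptions: the truncation is taken to be right-truncation at C, the rate of 4.0 is a hypothetical value that does not come from the simulation settings, and the function names are invented for this example.

```python
import math


def poisson_pmf(k, rate):
    """Untruncated Poisson probability P(Delta = k)."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def truncated_poisson_pmf(rate, c):
    """Poisson probabilities renormalized to the support 0, 1, ..., c."""
    weights = [poisson_pmf(k, rate) for k in range(c + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def mass_beyond(rate, c):
    """Probability that an untruncated Poisson count exceeds c."""
    return 1.0 - sum(poisson_pmf(k, rate) for k in range(c + 1))


rate = 4.0  # hypothetical rate, chosen only for illustration
for c in (8, 16, 32):
    pmf = truncated_poisson_pmf(rate, c)
    mean = sum(k * p for k, p in enumerate(pmf))
    # Report the truncated mean and the Poisson mass the truncation discards.
    print(f"C={c:2d}  truncated mean={mean:.4f}  mass beyond C={mass_beyond(rate, c):.2e}")
```

As C grows, the discarded tail mass shrinks and the truncated distribution approaches the untruncated one, which is consistent with the improved coverage reported for C = 32.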

6.3 Binary outcome

In this section we generate a binary outcome to illustrate the properties of the proposed methods in the GLMM setting. The components ui’s, vi’s, and aij’s are created exactly as in the continuous outcome simulation. To generate the binary response yij of model (17) with mean μij, we fix β0(z) = −0.4, β1(z) = −2.31, β2(z) = 0.19, β3(z) = 0.32, β4(z) = −0.08, and β5(z) = 0.09. To generate zi under a setting of strong association between Δ and w, we fix α0 = −3.71 and α1 = 0.8. We generate data with a sample size of 2000 subjects. We use 200 bootstrap samples to calculate the standard errors of the marginal probability trajectory. To these simulated data, we fit the naïve-, w-, z-, and ideal (z-known) models. The simulation was repeated 1000 times.
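A bootstrap standard error of this kind can be sketched as below. The text does not say which bootstrap scheme was used, so this example assumes a nonparametric bootstrap that resamples whole subjects (clusters) with replacement. The helper names are hypothetical, and the toy statistic is a raw observed proportion rather than the model-based marginal probability, which would require refitting the model in each bootstrap sample.

```python
import random
import statistics


def cluster_bootstrap_se(clusters, statistic, n_boot=200, seed=0):
    """Bootstrap standard error that resamples whole subjects.

    clusters  : list of per-subject records (all repeated outcomes of a subject)
    statistic : function mapping a list of clusters to a single number
    n_boot    : number of bootstrap samples (200 as in the text)
    """
    rng = random.Random(seed)
    n = len(clusters)
    replicates = []
    for _ in range(n_boot):
        # Draw n subjects with replacement, keeping each subject's data intact.
        resample = [clusters[rng.randrange(n)] for _ in range(n)]
        replicates.append(statistic(resample))
    return statistics.stdev(replicates)


def proportion_at_parity(clusters, parity=1):
    """Toy statistic: observed proportion of events at a given parity."""
    outcomes = [subject[parity] for subject in clusters if len(subject) > parity]
    return sum(outcomes) / len(outcomes)


# Toy data: 500 subjects with 2 to 4 binary outcomes each (purely illustrative).
data_rng = random.Random(42)
toy = [
    [int(data_rng.random() < 0.3) for _ in range(data_rng.randint(2, 4))]
    for _ in range(500)
]
print(cluster_bootstrap_se(toy, proportion_at_parity))
```

Resampling subjects rather than individual pregnancies keeps the within-subject correlation intact in every bootstrap sample.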

In Table 3, we show the simulation summary for the estimated marginal probabilities from the naïve-, w-, z-, and ideal (z-known) models at parity 0 to 7. The low bias and nominal coverage rates of the z-model suggest that this model reliably estimates the true marginal probability trajectory. The similarity of the results between the z-model and the ideal model suggests that inference from the proposed z-model is similar to that from the ideal model. The naïve model is severely biased, as expected. Although the w-model is not as biased as the naïve model, it has lower than nominal coverage at parity 1 and 2, and slightly higher coverage at parity ≥ 4. This simulation demonstrates the importance of properly accounting for informative cluster size with a binary outcome.

Table 3.

Marginal means from (i) naïve-model, (ii) w-model, (iii) z-model and (iv) ideal (z-known) model for proportion of simulated cases with hypertension.

Marginal probability at parity 0 to 7:

| Fitted model | Measure | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|---|
| True value | | 0.659 | 0.222 | 0.313 | 0.412 | 0.513 | 0.608 | 0.694 | 0.767 |
| naïve-model | PE | 0.593 | 0.161 | 0.368 | 0.637 | 0.841 | 0.941 | 0.980 | 0.993 |
| | Bias | −0.067 | −0.061 | 0.055 | 0.225 | 0.328 | 0.333 | 0.286 | 0.227 |
| | SD | 0.015 | 0.008 | 0.009 | 0.014 | 0.013 | 0.008 | 0.004 | 0.001 |
| | SE | 0.015 | 0.008 | 0.010 | 0.014 | 0.013 | 0.008 | 0.004 | 0.002 |
| | Coverage | 0.005 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| w-model | PE | 0.653 | 0.208 | 0.302 | 0.404 | 0.504 | 0.594 | 0.672 | 0.737 |
| | Bias | −0.007 | −0.013 | −0.011 | −0.008 | −0.009 | −0.014 | −0.022 | −0.030 |
| | SD | 0.023 | 0.010 | 0.009 | 0.013 | 0.021 | 0.027 | 0.032 | 0.033 |
| | SE | 0.022 | 0.011 | 0.010 | 0.020 | 0.033 | 0.044 | 0.052 | 0.057 |
| | Coverage | 0.933 | 0.761 | 0.811 | 0.969 | 0.984 | 0.986 | 0.989 | 0.988 |
| z-model | PE | 0.667 | 0.221 | 0.313 | 0.417 | 0.523 | 0.621 | 0.705 | 0.773 |
| | Bias | 0.008 | −0.001 | 0.001 | 0.005 | 0.011 | 0.013 | 0.011 | 0.006 |
| | SD | 0.012 | 0.006 | 0.006 | 0.017 | 0.032 | 0.044 | 0.053 | 0.058 |
| | SE | 0.015 | 0.009 | 0.008 | 0.022 | 0.040 | 0.055 | 0.066 | 0.071 |
| | Coverage | 0.936 | 0.929 | 0.940 | 0.963 | 0.960 | 0.953 | 0.948 | 0.934 |
| ideal (z-known) model | PE | 0.658 | 0.219 | 0.313 | 0.418 | 0.523 | 0.621 | 0.706 | 0.775 |
| | Bias | −0.001 | −0.002 | 0.001 | 0.006 | 0.010 | 0.013 | 0.012 | 0.009 |
| | SD | 0.019 | 0.011 | 0.009 | 0.018 | 0.030 | 0.040 | 0.046 | 0.047 |
| | SE | 0.020 | 0.011 | 0.010 | 0.020 | 0.034 | 0.046 | 0.053 | 0.056 |
| | Coverage | 0.962 | 0.944 | 0.957 | 0.958 | 0.955 | 0.947 | 0.939 | 0.931 |

PE: Point Estimate, Bias = PE – True value, SD: Standard Deviation, SE: Standard Error.

7 Application: CPS Data

For the application, we revisit the CPS data with a focus on two outcome variables as a function of parity: (i) (continuous) birthweight z-score, and (ii) (binary) hypertensive disorder, while addressing the impact of incomplete informative cluster size. To account for informative cluster size, we fit the proposed z- and w-models to the CPS data. For comparison, we also fit the naïve-model, which ignores the cluster size.

7.1 Continuous outcome: Birthweight

In this analysis, yij denotes the birthweight at parity aij. As in the simulation studies, we found the estimation of the z-model to be sensitive to the choice of initial values, since several attempts led to boundary solutions. To mitigate this issue, we performed an exhaustive grid search to pick initial values. This resulted in a subset of initial values that gave similar (maximum) likelihood values. Using each initial value from this subset, we obtained similar parameter estimates and likelihood values.
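One way to organize such a grid search is sketched below. The exact procedure is not described in the text, so this is an assumption-laden illustration: it scores every grid point with a toy normal log-likelihood (standing in for the z-model likelihood) and keeps the points whose log-likelihood is within a tolerance of the best one as candidate starting values. The function names, the grid, the tolerance, and the data are all hypothetical.

```python
import itertools
import math


def normal_loglik(params, data):
    """Toy log-likelihood: i.i.d. normal data with params = (mean, log_sd)."""
    mean, log_sd = params
    sd = math.exp(log_sd)
    return sum(
        -0.5 * math.log(2 * math.pi) - log_sd - 0.5 * ((y - mean) / sd) ** 2
        for y in data
    )


def screen_initial_values(loglik, grid_axes, data, tolerance=1.0):
    """Score every grid point; keep those within `tolerance` of the best."""
    scored = [(loglik(point, data), point) for point in itertools.product(*grid_axes)]
    best = max(score for score, _ in scored)
    keep = [point for score, point in scored if best - score <= tolerance]
    return best, keep


data = [0.1, 0.4, -0.2, 0.3, 0.0, 0.5]   # hypothetical outcomes
axes = [
    [-1.0, -0.5, 0.0, 0.5, 1.0],         # candidate means
    [-2.0, -1.0, 0.0, 1.0],              # candidate log standard deviations
]
best, starts = screen_initial_values(normal_loglik, axes, data)
print(best, starts)
```

Each retained point would then be passed to the optimizer as a starting value, and the resulting fits compared to check that they agree.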

In Figure 3, we show the estimated marginal means from the three fitted models and, as an illustration, their respective confidence intervals at a parity value of 5. Although there are apparent differences in the point estimates of the marginal means from the three models, the overlap between their confidence intervals suggests similar inferences.

Figure 3.

Marginal means for Birthweight from the naïve-, w-, and z- models for the CPS data with corresponding confidence intervals at parity=5.

7.2 Continuous outcome: Sensitivity Analyses

In our analysis we fitted the z-model with different truncation values (C = 6, 8, 10, 12, and 12 with Age) to examine the sensitivity of the parameter estimates. The z-model with C = 12 and Age included the dichotomous variable Age (≥ 38 at the end of the study window) in expression (3) to examine the effect of a subject’s age at the end of the study window on her rate of births after the study window. The marginal regression coefficient estimates in Table S7 indicate that in each of the fitted models, the coefficients β1 and β2 are significantly different from zero (at the 5% significance level), thus suggesting that birthweight is positively associated with parity. For the z-model, although there are apparent differences in the point estimates across truncation values, the overlap between their confidence intervals suggests similar inferences. Since the marginal means are of primary interest, in Table 4 we show the point estimates and standard errors (SE) of the marginal means for C = 6, 8, 10, 12, and 12 with Age. These results suggest that the marginal mean trajectories from different C-values are similar, since each trajectory is within the bounds of variation.

Table 4.

Point estimates and standard errors of the marginal means from the z-model for C = 6, 8, 10, 12, and 12 with Age, when applied to the CPS data for the continuous outcome Birthweight.

Marginal means: Estimate (SE)

| Parity | C = 6 | C = 8 | C = 10 | C = 12 | C = 12 with Age |
|---|---|---|---|---|---|
| 0 | 0.021 (0.008) | 0.018 (0.007) | 0.019 (0.007) | 0.019 (0.007) | 0.020 (0.007) |
| 1 | 0.219 (0.006) | 0.221 (0.005) | 0.222 (0.005) | 0.223 (0.006) | 0.225 (0.006) |
| 2 | 0.274 (0.005) | 0.274 (0.005) | 0.273 (0.005) | 0.272 (0.005) | 0.271 (0.005) |
| 3 | 0.330 (0.008) | 0.327 (0.007) | 0.323 (0.008) | 0.320 (0.009) | 0.318 (0.009) |
| 4 | 0.385 (0.013) | 0.380 (0.012) | 0.373 (0.012) | 0.369 (0.013) | 0.365 (0.013) |
| 5 | 0.440 (0.018) | 0.432 (0.016) | 0.423 (0.017) | 0.418 (0.018) | 0.412 (0.018) |
| 6 | 0.496 (0.023) | 0.485 (0.020) | 0.473 (0.022) | 0.466 (0.023) | 0.459 (0.023) |
| 7 | 0.551 (0.028) | 0.538 (0.025) | 0.523 (0.026) | 0.515 (0.028) | 0.506 (0.028) |
| 8 | 0.607 (0.034) | 0.591 (0.029) | 0.573 (0.031) | 0.563 (0.033) | 0.553 (0.034) |

7.3 Binary outcome: Hypertensive disorder

In this analysis, yij denotes the status of hypertensive disorder (binary, with 1 indicating the condition) at parity aij. We used Gaussian quadrature with 10 quadrature points to evaluate the likelihood. With the identity link (for continuous outcomes), the piecewise linear relationship specified in the mixed model is preserved for the population-averaged trajectory; this linearity is not preserved with the logit link function (for binary outcomes). However, we can evaluate the trajectory by computing the marginal probabilities of hypertensive disorder. In Figure 4, we show the estimated marginal probabilities from the three fitted models and their respective confidence intervals at a parity value of 5. Unlike the birthweight analysis, where the inferences from all three models were similar, the results of the hypertensive disorder analysis differ across models: both cluster size adjusted models show an increasing trajectory after a parity of 2, whereas the naïve model shows no apparent pattern. The hypertensive disorder analysis demonstrates the importance of properly accounting for informative cluster size.
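The loss of linearity under the logit link can be illustrated by averaging the conditional probability over a normal random intercept with Gauss-Hermite quadrature. This is a minimal sketch under stated assumptions: it integrates over a single random intercept only (it does not average over the latent class z), the function name is hypothetical, and the linear predictor and random-intercept standard deviation are made-up values rather than CPS estimates.

```python
import numpy as np


def marginal_probability(eta, sigma, n_points=10):
    """Population-averaged P(y = 1) for a random-intercept logistic model.

    eta      : fixed-effect linear predictor on the logit scale
    sigma    : standard deviation of the normal random intercept
    n_points : number of Gauss-Hermite quadrature points
    """
    nodes, weights = np.polynomial.hermite.hermgauss(n_points)
    # hermgauss targets the weight exp(-x^2); rescale to a standard normal.
    u = np.sqrt(2.0) * nodes
    w = weights / np.sqrt(np.pi)
    conditional = 1.0 / (1.0 + np.exp(-(eta + sigma * u)))
    return float(np.sum(w * conditional))


sigma = 1.5  # hypothetical random-intercept standard deviation
for parity in range(6):
    eta = -1.0 + 0.3 * parity  # hypothetical linear predictor, linear in parity
    print(parity, round(marginal_probability(eta, sigma), 3))
```

Although the linear predictor is linear in parity, the averaged probabilities are pulled toward 0.5 relative to the subject-specific ones, so the population-averaged trajectory has to be computed rather than read off the coefficients.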

Figure 4.

Marginal probabilities for Hypertension from the naïve-, w-, and z- models for the CPS data with corresponding confidence intervals at parity=5.

8 Discussion

In this article we address the issue of modeling longitudinal data with informative cluster size, where the cluster size is censored by the study window. We propose a modeling scheme based on a latent variable approach (z-model), and also provide a simple alternative modeling scheme (w-model).

Latent class models are often used to characterize important individual-specific features of a biological process (e.g., model-based clustering, latent growth trajectory models, etc.). Here, we use this construct to model the mixture distribution of the birthweight outcomes across repeated pregnancies. This idea has been proposed in other settings. For example, Roy and Daniels (2008) proposed a latent class pattern mixture model, where the latent variable is used to cluster many dropout times. We recognize that since z is a model-based extrapolation of w, the model relies on non-verifiable assumptions related to this extrapolation. However, with additional follow-up on a subset of subjects, it is possible to verify this extrapolation model. Unfortunately, such follow-up information was not available in the CPS data. The sensitivity analyses suggest that the proposed z-model is robust to minor departures from the key distributional assumptions, such as the distribution of Δi. It is important to examine the robustness of our inferences relative to scientifically sensible alternative distributions. We found that the z-model inferences were robust to different truncation values (C) in the CPS analyses. One would expect more bias if the Δi distribution were markedly different from the one specified; for example, a uniform distribution on [0, C] rather than a truncated Poisson. However, a uniform distribution for the future number of births is very unlikely in reproductive epidemiology studies. In other applications, we recommend that the distribution of Δi be based as much as possible on the best available scientific information. Sensitivity analyses should then be conducted with respect to reasonable departures from this assumption.

The mixed effects models proposed for the conditional distribution of yi given zi (expression (1)) and of yi given wi (expression (9)) introduce only a single random intercept to characterize the between-subject variation. This was reasonable for the CPS analyses, where the heterogeneity was well described by a single random intercept. In other applications, a more complex random effects structure could be introduced.

This paper proposed a PMM framework to adjust for informative cluster size. As recognized in the missing data literature, the PMM provides a good approximation to the SRPM (Wu and Bailey, 1989). In our simulation studies, we confirm the validity of this approximation in our setting. Directly estimating the SRPM is an alternative approach, but it is more complex to implement, and we leave it for future exploration.

Supplementary Material

Acknowledgments

This research was supported by the Intramural Research Programs of the National Institutes of Health (NIH), Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD), and National Cancer Institute. This work utilized the computational resources of the NIH HPC Biowulf cluster (http://hpc.nih.gov). Special thanks to Drs. Stefanie Hinkle and Katherine Laughon Grantz for the helpful discussions. The authors wish to thank the editor, associate editor, and referee for their insightful comments and suggestions that helped improve the presentation of the paper.

References

  1. Chen Z, Zhang B, Albert PS. A joint modeling approach to data with informative cluster size: Robustness to the cluster size model. Statistics in Medicine. 2011;30:1825–1836. doi: 10.1002/sim.4239.
  2. Dunson DB, Chen Z, Harry J. A Bayesian approach for joint modeling of cluster size and subunit-specific outcomes. Biometrics. 2003;59:521–530. doi: 10.1111/1541-0420.00062.
  3. Gueorguieva RV, Agresti A. A correlated probit model for joint modeling of clustered binary and continuous responses. Journal of the American Statistical Association. 2001;96:1102–1112.
  4. Heagerty PJ, Kurland BF. Misspecified maximum likelihood estimates and generalised linear mixed models. Biometrika. 2001;88:973–985.
  5. Hinkle SN, Albert PS, Mendola P, Sjaarda LA, Yeung E, Boghossian NS, Laughon SK. The association between parity and birthweight in a longitudinal consecutive pregnancy cohort. Paediatric and Perinatal Epidemiology. 2014 doi: 10.1111/ppe.12099.
  6. Hoffman EB, Sen PK, Weinberg CR. Within-cluster resampling. Biometrika. 2001;88:1121–1134.
  7. Huang Y, Leroux B. Informative cluster sizes for subcluster-level covariates and weighted generalized estimating equations. Biometrics. 2011;67:843–851. doi: 10.1111/j.1541-0420.2010.01542.x.
  8. Laughon SK, Albert PS, Leishear K, Mendola P. The NICHD Consecutive Pregnancies Study: recurrent preterm delivery by sub-type. American Journal of Obstetrics and Gynecology. 2014;210:131–e1. doi: 10.1016/j.ajog.2013.09.014.
  9. Little RJ. Modeling the drop-out mechanism in repeated-measures studies. Journal of the American Statistical Association. 1995;90:1112–1121.
  10. Little RJ, Wang Y. Pattern-mixture models for multivariate incomplete data with covariates. Biometrics. 1996:98–111.
  11. Neuhaus JM, McCulloch CE. Estimation of covariate effects in generalized linear mixed models with informative cluster sizes. Biometrika. 2011:asq066. doi: 10.1093/biomet/asq066.
  12. Roy J, Daniels MJ. A general class of pattern mixture models for nonignorable dropout with many possible dropout times. Biometrics. 2008;64:538–545. doi: 10.1111/j.1541-0420.2007.00884.x.
  13. Ryzin JV, Rai K. A dose-response model incorporating non-linear kinetics. Biometrics. 1987:95–105.
  14. Seaman SR, Pavlou M, Copas AJ. Methods for observed-cluster inference when cluster size is informative: A review and clarifications. Biometrics. 2014 doi: 10.1111/biom.12151.
  15. Su L, Tom BD, Farewell VT. Bias in 2-part mixed models for longitudinal semicontinuous data. Biostatistics. 2009;10:374–389. doi: 10.1093/biostatistics/kxn044.
  16. White H. Maximum likelihood estimation of misspecified models. Econometrica: Journal of the Econometric Society. 1982:1–25.
  17. Williamson JM, Datta S, Satten GA. Marginal analyses of clustered data when cluster size is informative. Biometrics. 2003;59:36–42. doi: 10.1111/1541-0420.00005.
  18. Wu MC, Bailey KR. Estimation and comparison of changes in the presence of informative right censoring: conditional linear model. Biometrics. 1989:939–955.
