Author manuscript; available in PMC: 2012 Dec 18.
Published in final edited form as: Stat Biosci. 2012 Nov 1;4(2):213–234. doi: 10.1007/s12561-011-9053-2

A Seminonparametric Approach to Joint Modeling of A Primary Binary Outcome and Longitudinal Data Measured at Discrete Informative Times

Song Yan 1, Daowen Zhang 2, Wenbin Lu 3, James A Grifo 4, Mengling Liu 5
PMCID: PMC3524596  NIHMSID: NIHMS406168  PMID: 23259008

Abstract

In a study conducted at the New York University Fertility Center, one of the scientific objectives is to investigate the relationship between the final pregnancy outcomes of participants receiving an in vitro fertilization (IVF) treatment and their β-human chorionic gonadotrophin (β-hCG) profiles. A common joint modeling approach to this objective is to use subject-specific normal random effects in a linear mixed model for longitudinal β-hCG data as predictors in a model (e.g., a logistic model) for the final pregnancy outcome. Empirical data exploration indicates that the observation times for the longitudinal β-hCG data may be informative and that the random effects for the longitudinal β-hCG data may not be normally distributed. We propose to introduce a third submodel into the joint model for the informative β-hCG observation times, and to relax the normality assumption on the random effects using the semi-nonparametric (SNP) approach of Gallant and Nychka (1987) [8]. An EM algorithm is developed for parameter estimation. Extensive simulations designed to evaluate the proposed method indicate that ignoring either the informative observation times or the non-normality of the random effects can lead to invalid and/or inefficient inference. Applying our new approach to the data reveals some interesting findings that the traditional approach failed to discover.

Keywords: EM algorithm, Informative observation times, IVF, Joint model, Longitudinal data, Maximum likelihood estimation, SNP density

1 Introduction

For patients undergoing in vitro fertilization and embryo transfer (IVF-ET), the early determination of pregnancy prognosis is of critical importance [9]. From a medical standpoint, there are increased risks of adverse outcomes in IVF pregnancies compared with natural conceptions, including ectopic pregnancies and spontaneous abortions [9]. The incidence of ectopic pregnancy after IVF is nearly 2 to 5 times higher than that in natural pregnancies [17]. In naturally conceived cycles, among a variety of markers, the β subunit of human chorionic gonadotropin (β-hCG) has been found to be highly predictive of normal pregnancy [5]. In fact, because implantation can be timed more accurately in IVF pregnancies, β-hCG curves for IVF pregnancies should be even more accurate [5]. In the fertility literature, various cutoff levels of initial β-hCG measured some specific number of days after ET have been proposed to predict viability (e.g., Lambers et al. (2006) [12]). However, a significant number of normal pregnancies may have β-hCG levels below the established cut-offs [16]. Others have suggested that the rate of rise of early β-hCG after ET may also be strongly positively correlated with the pregnancy outcome [4, 16]. Shamonki et al. (2009) [16] depicted logarithmic curves of the initial β-hCG level and the rise of early β-hCG against live delivery outcomes with IVF, and suggested that using the β-hCG curves could give both the clinician and the patient a more accurate assessment of the pregnancy, since both the initial β-hCG value and the rate of rise of the β-hCG curve seem highly correlated with the final pregnancy outcome. Furthermore, Shamonki et al. (2009) [16] demonstrated that there is a strong correlation among age, early β-hCG values and pregnancy outcome.

To understand the relationship between final pregnancy outcomes and early antenatal hormonal characteristics among IVF patients, a study conducted at the New York University Fertility Center collected data from patients who underwent IVF treatment between 2001 and 2003 (this retrospective study was approved by the Institutional Review Board at the NYU School of Medicine). For each patient, besides the baseline covariates and the primary binary pregnancy outcome (viable or nonviable) at the end of the study, β-hCG values were repeatedly measured at six potential follow-up time intervals between day 7 and day 23 after ET. Fig. 1 shows the observed log β-hCG profiles of 30 randomly selected patients with viable pregnancies and another 30 randomly selected patients with non-viable pregnancies. From Fig. 1, we can observe that the log β-hCG profiles of viable pregnancies show a clearly increasing trend, while those of nonviable pregnancies have a very diverse pattern, a mixture of increasing and decreasing trends, which suggests that patients’ β-hCG profiles may be correlated with their final pregnancy outcomes. In this paper, we take a joint modeling approach to studying the relationship between the pregnancy outcome and repeated measures of early β-hCG values after ET. In the statistical literature, there is an abundant body of work on joint models for a primary outcome (discrete or continuous) and longitudinal covariates. A popular approach is the latent variable approach, which uses subject-specific random effects characterizing longitudinal covariate profiles as covariates in the model for the primary outcome ([11,22,10,21,19,13,20], among many others).

Fig. 1.

Log transformed β–hCG profiles for 30 randomly selected patients from the non-viable pregnancy group (left) and viable pregnancy group (right), respectively.

One particular challenge in the analysis of NYU-IVF data is that the measurement mechanism of β-hCG profiles might also be informative for the primary pregnancy outcome. Table 1 shows the proportion of women having β-hCG measurements at each of the six intervals in the viable and non-viable pregnancy groups separately. From Table 1 and Fig. 1, we observe that women with viable pregnancies tend to have fewer β-hCG measurements at the intermittent time intervals 2 (days 10–11), 3 (days 12–13), 5 (days 17–19) and 6 (days 20–23) than those with non-viable pregnancies. To take into account the possibly informative pattern of β-hCG measurements, we introduce extra subject-specific logistic submodels, one at each potential observation time (the middle point of a time interval), to model the probability that a β-hCG value will be measured at that time point for a study subject. The correlation among the β-hCG profile, the binary pregnancy outcome and the discrete observation times is then characterized via latent subject-specific random variables.

Table 1.

Proportion of women having β-hCG measures at each time interval.

Interval 1 2 3 4 5 6
Viable 91.7% 15.9% 7.3% 87.0% 15.7% 5.1%
Non-viable 91.7% 50.8% 25.0% 73.5% 37.9% 36.4%

In addition, most joint modeling strategies usually assume a normal or other parametric distributions for unobserved subject-specific random effects shared by submodels. This assumption can be restrictive and easily violated in many applications. To informally explore the distributions of subject-specific random effects in the linear mixed model for the β-hCG measurements from NYU-IVF data, we fit individual β-hCG profiles by simple linear regression over observation times (here we define the first observation time as time 0 and the unit is week). Fig. 2 presents the Q-Q plots of subject-specific intercept and slope estimates from individual least squares fits. Both plots suggest some discrepancy from the normal distribution, while the Q-Q plot of subject-specific slopes shows a larger deviation from the straight line. To take into account this departure from normality, we adopt the seminonparametric (SNP) approach of Gallant and Nychka (1987) [8] to model the distributions of subject-specific random effects. Similar approaches were also used by Zhang and Davidian (2001) [24] and Song, Davidian, and Tsiatis (2002) [18] in other contexts. We develop a maximum likelihood estimation method for the parameters in the joint model via an expectation maximization (EM) algorithm [7], and use an EM-aided numerical differentiation method to compute the variance-covariance matrix of the estimators of interest.

Fig. 2.

(a) Q-Q plot of subject-specific intercept estimates from individual least squares fits. (b) Q-Q plot of subject-specific slope estimates from individual least squares fits.
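The informal check behind Fig. 2 can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `subject_ls_fits` and `normal_qq` are hypothetical, and the input layout (one array of times in weeks and one array of log β-hCG values per subject) is an assumption.

```python
import numpy as np
from statistics import NormalDist


def subject_ls_fits(times, values):
    """Fit z = intercept + slope * t separately for each subject.

    times, values: sequences of 1-D arrays, one pair per subject
    (hypothetical layout). Subjects with fewer than two distinct
    time points cannot identify a slope and are skipped.
    Returns an array with columns (intercept, slope).
    """
    fits = []
    for t, z in zip(times, values):
        t = np.asarray(t, dtype=float)
        z = np.asarray(z, dtype=float)
        if np.unique(t).size < 2:
            continue
        slope, intercept = np.polyfit(t, z, deg=1)
        fits.append((intercept, slope))
    return np.array(fits)


def normal_qq(sample):
    """Return (theoretical, standardized empirical) quantiles for a normal Q-Q plot."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    probs = (np.arange(1, n + 1) - 0.5) / n
    theoretical = np.array([NormalDist().inv_cdf(p) for p in probs])
    empirical = (x - x.mean()) / x.std(ddof=1)
    return theoretical, empirical
```

Plotting the two arrays returned by `normal_qq` against each other, once for the intercepts and once for the slopes, gives the kind of display described in the caption; points far from the 45-degree line indicate departure from normality.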

In this paper we propose a joint model to study the association between a primary binary outcome and longitudinal covariates that are measured at informative discrete occasions. In this joint model, the random effects can follow a flexible distribution as well. To the best of our knowledge, this has not been studied in the literature. The rest of this article is organized as follows. Section 2 introduces notation and describes the proposed joint model. Section 3 presents the inference procedure using an EM algorithm. Section 4 applies our method to NYU-IVF data, in which we also investigate the consequences of ignoring the violation of normality assumption and the informative observation times. Section 5 provides simulation studies to further justify our method, followed by some discussion in Section 6.

2 Joint Model

2.1 Data Notation

For NYU-IVF data, let Yi denote the binary pregnancy outcome of subject i, i = 1, · · ·, n, obtained at the end of the study, taking the value of 1 if a viable pregnancy is achieved and 0 otherwise. Let Xi denote the baseline covariates, including age and BMI (appropriately standardized for computational stability). Denote the potential measurement occasions of β-hCG profiles by t = (t1, · · ·, tm)′, where t1 = 0 is the first observation time point, m is the maximum number of measurements and m = 6 for NYU-IVF data. Moreover, denote the complete log β-hCG profile by Zi = (Zi1, · · ·, Zim)′. Define Ri = (Ri1, · · ·, Rim)′, where Rij indicates whether a measurement of the β-hCG value occurs at the jth time point: Rij = 1 if a measurement is taken and Rij = 0 otherwise. Therefore, Zij is observed only if Rij = 1. The observed data consist of {Yi, Xi, Rij, ZijRij: i = 1, · · ·, n; j = 1, · · ·, m}.

2.2 Model Specifications

Fig. 1 indicates that there might be a quadratic trend over time in the longitudinal log β-hCG profile. To allow for the possible quadratic trend, we consider the following linear mixed model for the longitudinal log β-hCG profile Zij in the ideal situation of no missingness,

$$Z_{ij} = \gamma_0 + \gamma_1 t_j + \gamma_2 t_j^2 + \gamma_3' X_i + u_{i0} + u_{i1} t_j + \varepsilon_{ij}, \quad j = 1, \ldots, m, \tag{1}$$

where ui ≡ (ui0, ui1)′ are mean-zero random effects, measuring the deviation in the intercept and in the rate of change of the subject-specific log β-hCG profile at t1 = 0 from the population profile. It is assumed that the error vector εi ≡ (εi1, · · ·, εim)′ is independent of ui and follows a multivariate normal distribution N(0, Σ), where the variance-covariance matrix has the form $\Sigma = \sigma_\varepsilon^2 \Gamma(\rho)$, with $\sigma_\varepsilon^2$ a positive scalar and Γ(ρ) an m × m Markov-structured correlation matrix indexed by ρ. Define $\gamma = (\gamma_0, \gamma_1, \gamma_2, \gamma_3')'$.

Note that we could add a term $u_{i2} t_j^2$ to (1) to model a subject-specific random effect ui2 of $t_j^2$. A preliminary analysis of the NYU-IVF data indicates that ui2 may not be necessary (the P-value from the correct likelihood ratio test assuming a normal distribution for ui is 0.29). Furthermore, adding ui2 would make the computation challenging for the SNP approach for ui. Therefore, we restrict model (1) to two random effects.

For the binary pregnancy outcome Yi, we assume a logistic model,

$$\mathrm{logit}\{P(Y_i = 1 \mid X_i, u_i)\} = \beta_0 + \beta_1' X_i + \alpha_1 u_{i0} + \alpha_2 u_{i1}, \tag{2}$$

where α1 and α2 are the effects of ui0 and ui1 on the binary outcome Yi, respectively. Define $\beta = (\beta_0, \beta_1')'$. Finally, to account for the informative measurement pattern of β-hCG profiles as discussed before, we introduce extra logistic models for the measurement indicators Rij,

$$\mathrm{logit}\{P(R_{ij} = 1 \mid X_i, u_i)\} = \eta_{0j} + \eta_{1j}' X_i + \alpha_{3j} u_{i0} + \alpha_{4j} u_{i1}, \quad j = 1, \ldots, m, \tag{3}$$

where α3j and α4j are the time-specific effects of ui0 and ui1 on the measurement indicator. Define $\eta_j = (\eta_{0j}, \eta_{1j}')'$ and $\lambda_j = (\eta_j', \alpha_{3j}, \alpha_{4j})'$, j = 1, · · ·, m.

Note that the three submodels (1), (2) and (3) are correlated through shared subject-specific random effects ui, and the magnitude of the association is controlled by the parameters α1, α2, α3j and α4j. Besides ui, extra randomness may exist. For example, the Markov structured variance-covariance matrix of the error terms in submodel (1) describes the additional association in repeated β-hCG measurements that can not be explained by the random effects ui. Following the convention in joint modeling, we assume that Zi, Ri and Yi are mutually independent given Xi and ui.

To flexibly model the distribution of the random effects ui, we adopt the SNP approach proposed by Gallant and Nychka (1987) [8]. This approach assumes that the density of the random effects ui belongs to a class of densities that are sufficiently smooth that they do not exhibit unusual behavior such as kinks, jumps, or oscillation. However, the densities are very flexible and are allowed to be skewed, multi-modal, and fat- or thin-tailed (relative to the normal distribution); furthermore, the class includes the normal distribution as a special case [24]. Specifically, the SNP approach re-formulates the random effects ui as

$$u_i = \mu + D b_i, \tag{4}$$

where μ = (μ0, μ1)′, D is a 2 × 2 lower triangular matrix with ξ ≡ vech(D) = (d00, d10, d11)′ and the “transformed” random effects bi = (bi1, bi2)′ have a smooth density function

$$h_K(b) = P_K^2(b)\,\phi(b) = \Big( \sum_{0 \le i_1 + i_2 \le K} a_{i_1 i_2}\, b_1^{i_1} b_2^{i_2} \Big)^2 \phi(b), \tag{5}$$

with b = (b1, b2)′, where ϕ(·) is the standard bivariate normal density, K is a non-negative integer, and PK(b) is a bivariate polynomial of order K with coefficients $a_{i_1 i_2}$, $i_1, i_2 = 0, 1, \ldots, K$, $0 \le i_1 + i_2 \le K$. To ensure that hK(b) is a proper bivariate density function, the coefficients $\{a_{i_1 i_2}\}$ of PK(b) must be chosen to satisfy ∫hK(b)db = 1. Zhang and Davidian (2001) [24] showed that this condition is equivalent to $E\{P_K^2(U)\} = a'Aa = 1$, where U = (U1, U2)′ is a standard bivariate normal random vector, a is a d × 1 vector containing all the coefficients $a_{i_1 i_2}$, and A is the corresponding d × d matrix with elements $E(U_1^{i_1 + j_1} U_2^{i_2 + j_2})$, $0 \le i_1 + i_2 \le K$ and $0 \le j_1 + j_2 \le K$. In Appendix B we give more details on the computation of the matrix A. Note that P0(·) ≡ 1, which corresponds to the standard bivariate normal density for bi. As demonstrated in Zhang and Davidian (2001) [24], in many cases the SNP approach with K as small as one or two can adequately approximate complicated shapes, including multimodality and skewness. Since A is a symmetric positive definite matrix, it has a decomposition A = BB′ for some square matrix B. Writing c = B′a, we require $a'Aa = c'c = 1$. As in [24], we consider a polar coordinate transformation of c = (c1, · · ·, cd)′, i.e. $c_1 = \sin(\varphi_1)$, $c_2 = \cos(\varphi_1)\sin(\varphi_2)$, …, $c_{d-1} = \cos(\varphi_1)\cos(\varphi_2)\cdots\cos(\varphi_{d-2})\sin(\varphi_{d-1})$, and $c_d = \cos(\varphi_1)\cos(\varphi_2)\cdots\cos(\varphi_{d-2})\cos(\varphi_{d-1})$, where $-\pi/2 < \varphi_l \le \pi/2$ for l = 1, 2, · · ·, d − 1. Then $c'c = 1$, and thus ∫hK(b)db = 1 is automatically satisfied by parameterizing hK(b) in terms of φ = (φ1, · · ·, φd−1)′.
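The reparameterization above can be sketched numerically. The code below is an illustration under stated assumptions, not the computation of Appendix B: the function names are hypothetical, the ordering of the exponent pairs (i1, i2) is an arbitrary choice, and B is taken to be the lower-triangular Cholesky factor of A, so that A = BB′ and c = B′a.

```python
import math
import numpy as np


def normal_moment(p):
    """E(U^p) for U ~ N(0, 1): zero for odd p, (p - 1)!! for even p."""
    if p % 2 == 1:
        return 0.0
    return float(math.prod(range(1, p, 2)))


def exponents(K):
    """All pairs (i1, i2) with 0 <= i1 + i2 <= K (ordering is an assumption)."""
    return [(i1, i2) for i1 in range(K + 1) for i2 in range(K + 1 - i1)]


def moment_matrix(K):
    """Matrix A with entries E(U1^(i1+j1) U2^(i2+j2)) for independent N(0, 1) U1, U2."""
    idx = exponents(K)
    return np.array([[normal_moment(i1 + j1) * normal_moment(i2 + j2)
                      for (j1, j2) in idx] for (i1, i2) in idx])


def polar_to_unit(phi):
    """Map angles (phi_1, ..., phi_{d-1}) to a unit vector c of length d."""
    c = np.empty(len(phi) + 1)
    running = 1.0  # product of cosines accumulated so far
    for l, angle in enumerate(phi):
        c[l] = running * math.sin(angle)
        running *= math.cos(angle)
    c[-1] = running
    return c


def snp_coefficients(phi, K):
    """Polynomial coefficients a satisfying a'Aa = 1 for the given angles."""
    A = moment_matrix(K)
    B = np.linalg.cholesky(A)          # A = B B', B lower triangular
    return np.linalg.solve(B.T, polar_to_unit(phi))  # solves B'a = c
```

For K = 2 there are d = 6 coefficients and five free angles, and any choice of angles yields coefficients with a′Aa = 1 up to floating-point error, so the density constraint never has to be imposed explicitly during optimization.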

For illustration, let us consider a univariate SNP density with K = 2. Using the results in [3], we have $a_0 = \cos(\varphi_1) - \sin(\varphi_1)\sin(\varphi_2)/\sqrt{2}$, $a_1 = \sin(\varphi_1)\cos(\varphi_2)$ and $a_2 = \sin(\varphi_1)\sin(\varphi_2)/\sqrt{2}$. Fig. 3 plots the resulting SNP densities for some selected values of (φ1, φ2). It is clearly seen that the SNP density can approximate many different densities with distinct features (such as multimodality and skewness).

Fig. 3.

Univariate SNP densities with some selected φ’s for K= 2: (a) φ1 = π/2, φ2 = 3π/10; (b) φ1 = π/2, φ2 = π/10; (c) φ1 = π/10, φ2 = 3π/10; (d) φ1 = π/10, φ2 = π/2.
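As a rough numerical companion to Fig. 3, the sketch below evaluates the univariate K = 2 SNP density at the four angle pairs of the caption and checks that each integrates to one. The function name `snp_density_k2` and the integration grid are assumptions; the coefficients use the √2 normalization, which is what makes $a_0^2 + a_1^2 + 3a_2^2 + 2a_0a_2 = 1$ hold.

```python
import numpy as np


def snp_density_k2(b, phi1, phi2):
    """Univariate SNP density with K = 2: (a0 + a1*b + a2*b^2)^2 * N(0,1) pdf."""
    a0 = np.cos(phi1) - np.sin(phi1) * np.sin(phi2) / np.sqrt(2)
    a1 = np.sin(phi1) * np.cos(phi2)
    a2 = np.sin(phi1) * np.sin(phi2) / np.sqrt(2)
    poly = a0 + a1 * b + a2 * b**2
    return poly**2 * np.exp(-0.5 * b**2) / np.sqrt(2 * np.pi)


# Angle pairs taken from the caption of Fig. 3, panels (a) to (d).
panels = {
    "a": (np.pi / 2, 3 * np.pi / 10),
    "b": (np.pi / 2, np.pi / 10),
    "c": (np.pi / 10, 3 * np.pi / 10),
    "d": (np.pi / 10, np.pi / 2),
}

grid = np.linspace(-10.0, 10.0, 20001)   # assumed grid, wide enough for the tails
dx = grid[1] - grid[0]
areas = {name: float(np.sum(snp_density_k2(grid, p1, p2)) * dx)
         for name, (p1, p2) in panels.items()}
```

Setting φ1 = 0 gives a0 = 1 and a1 = a2 = 0, which recovers the standard normal density as the special case K = 0.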

3 Maximum Likelihood Estimation and Inference

For subject i, let $m_i = \sum_{j=1}^{m} R_{ij}$, let $t_i = (t_{i1}, \ldots, t_{im_i})'$ be the time points at which β-hCG values are measured, and denote by $Z_i = (Z_{i1}, \ldots, Z_{im_i})'$ the observed log β-hCG values. The observed data for subject i are denoted by Oi, and we write O = (Oi; i = 1, …, n). Let $\Theta = \{\gamma, \sigma_\varepsilon^2, \rho, \beta, \alpha_1, \alpha_2, \lambda_1, \ldots, \lambda_m, \mu, \xi, \varphi\}$ denote the unknown parameters in the proposed joint model. Then the likelihood of the observed data is given by

$$L(\Theta) = \prod_{i=1}^{n} \left[ \int f_L(Z_i \mid u_i)\, f_P(Y_i \mid u_i)\, f_R(R_i \mid u_i)\, f_u(u_i)\, du_i \right], \tag{6}$$

where $f_L(Z_i \mid u_i)$, $f_P(Y_i \mid u_i)$ and $f_R(R_i \mid u_i)$ are the conditional density functions (given the random effects ui) of the observed longitudinal log β-hCG values, the primary binary outcome and the measurement indicators for subject i, respectively, and $f_u(\cdot)$ is the density function of the random effects ui derived from the SNP representation (4). Here, for simplicity, we omit the dependence of these density functions on the observed covariates. Specifically, we have

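$$
\begin{cases}
f_L(Z_i \mid u_i; \gamma, \Sigma_i) = (2\pi)^{-m_i/2}\, |\Sigma_i|^{-1/2} \exp\left\{ -\tfrac{1}{2} (Z_i - \mu_i)' \Sigma_i^{-1} (Z_i - \mu_i) \right\}, \\[4pt]
f_P(Y_i \mid u_i; \beta, \alpha_1, \alpha_2) = \dfrac{\exp\{ Y_i (\beta_0 + \beta_1' X_i + \alpha_1 u_{i0} + \alpha_2 u_{i1}) \}}{1 + \exp\{ \beta_0 + \beta_1' X_i + \alpha_1 u_{i0} + \alpha_2 u_{i1} \}}, \\[4pt]
f_R(R_i \mid u_i; \lambda_1, \ldots, \lambda_m) = \displaystyle\prod_{j=1}^{m} \dfrac{\exp\{ R_{ij} (\eta_{0j} + \eta_{1j}' X_i + \alpha_{3j} u_{i0} + \alpha_{4j} u_{i1}) \}}{1 + \exp\{ \eta_{0j} + \eta_{1j}' X_i + \alpha_{3j} u_{i0} + \alpha_{4j} u_{i1} \}}, \\[4pt]
f_u(u_i; \mu, \xi, \varphi) = P_K^2\{ D^{-1}(u_i - \mu) \}\, \phi\{ D^{-1}(u_i - \mu) \}\, |D|^{-1},
\end{cases} \tag{7}
$$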

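where $\mu_i = \gamma_0 + \gamma_1 t_i + \gamma_2 t_i^2 + \gamma_3' X_i + u_{i0} + u_{i1} t_i = E(Z_i \mid u_i, X_i)$, $t_i^2$ is the vector formed by squaring each element of $t_i$, and $\Sigma_i = \sigma_\varepsilon^2 \Gamma_i(\rho)$ is the associated Markov-structured variance-covariance matrix for the observed Zi. Note that Σi differs from the matrix Σ for εi in submodel (1), because it involves only the mi time points at which subject i is observed.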

Direct maximization of the likelihood function (6) is very challenging, for two main reasons: (a) the integral in (6) does not have an analytical form; (b) the number of model parameters is large, which may cause numerical instability. This motivates us to develop an EM algorithm to maximize the likelihood function (for a given K). An attractive property of the EM algorithm is that the observed data likelihood does not decrease during the parameter update process. In addition, the parameter updates can be carried out separately for different, smaller subsets of parameters, and some parameter updates may even have closed forms.

The details of the E-step and M-step of the EM algorithm can be found in Appendices A and B. The proposed EM algorithm requires initial values for the parameters. Here we use the naive regression calibration (RC) method to produce the initial values for Θ. More specifically, we first fit the linear mixed submodel for the longitudinal covariates without considering the informative observation times, to obtain the initial estimators $(\hat\gamma^{(0)}, \hat\sigma_\varepsilon^{2(0)}, \hat\rho^{(0)})$ and the best linear unbiased predictors (BLUPs) $\hat u_{i0}^{(0)}$ and $\hat u_{i1}^{(0)}$, i = 1, · · ·, n. This can be easily done using standard software packages, e.g., PROC MIXED in SAS or lme in R. Then we fit the logistic models for the binary outcome and the measurement indicators with Xi and the BLUPs $\hat u_{i0}^{(0)}$ and $\hat u_{i1}^{(0)}$ as covariates to obtain the initial estimators of the parameters in (2) and (3), which can also be done directly with standard software packages. To obtain initial values for the parameters (μ, ξ, φ) in the SNP distribution, similar to Song, Davidian, and Tsiatis (2002) [18], we calculate the pseudo-maximum-likelihood estimates of (μ, ξ, φ) by treating the BLUPs $\hat u_i^{(0)} = (\hat u_{i0}^{(0)}, \hat u_{i1}^{(0)})'$ as the true underlying random effects, and denote the resulting estimators by $(\hat\mu^{(0)}, \hat\xi^{(0)}, \hat\varphi^{(0)})$. As the convergence criterion of the proposed EM algorithm, we use $\max|\hat\Theta^{(k+1)} - \hat\Theta^{(k)}| < \delta$ with δ = 0.0001 in our numerical studies. We use Gaussian quadrature with 25 quadrature knots to approximate the two-dimensional integrals in the E-step.
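The two numerical ingredients named above, quadrature for two-dimensional integrals and the stopping rule, can be sketched as follows. This is a minimal illustration rather than the authors' E-step: it assumes a tensor-product Gauss-Hermite rule with 25 nodes in each dimension (the text does not say whether 25 is per dimension) and integrates a generic function against the standard bivariate normal density; in an actual E-step the integrand would also carry the $P_K^2$ factor and the conditional likelihood terms. The names `bivariate_normal_expectation` and `has_converged` are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def bivariate_normal_expectation(g, n_nodes=25):
    """Approximate E[g(b1, b2)] for (b1, b2) standard bivariate normal.

    Uses a tensor product of probabilists' Gauss-Hermite rules,
    whose weight function is exp(-x^2 / 2).
    """
    x, w = hermegauss(n_nodes)
    w = w / np.sqrt(2.0 * np.pi)            # rescale so the weights sum to one
    b1, b2 = np.meshgrid(x, x, indexing="ij")
    weights = np.outer(w, w)
    return float(np.sum(weights * g(b1, b2)))


def has_converged(theta_new, theta_old, delta=1e-4):
    """Stopping rule: largest absolute parameter change below delta."""
    diff = np.asarray(theta_new, dtype=float) - np.asarray(theta_old, dtype=float)
    return bool(np.max(np.abs(diff)) < delta)
```

Because bi enters the model through ui = μ + Dbi, integrals over ui can be rewritten as integrals over bi, which is why a rule built for the standard normal weight is a natural fit here.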

In the SNP representation, K is a tuning parameter that controls the flexibility of the random effects distribution and must be chosen based on the data. Here we select K based on various information criteria that all take the form $-l(\Theta)/N + C(N)\, p_{net}/N$, where l(Θ) is the log likelihood, $N = \sum_{i=1}^{n} m_i$ and $p_{net}$ is the number of free parameters in the joint model. For example, C(N) = 1 gives the Akaike Information Criterion (AIC), C(N) = 0.5 log N the Schwarz Information Criterion (BIC), and C(N) = log log N the Hannan-Quinn criterion (HQ). As discussed by Zhang and Davidian (2001) [24], the HQ criterion is often preferred in the selection of K. We will evaluate the empirical performance of these information criteria via simulations in Section 5.

For parametric models, Louis’s formula [14] can be used to compute the variance of the estimates obtained by the EM algorithm. However, for our proposed SNP likelihood approach, the number of parameters in the joint model can be large and our interest mainly focuses on the regression parameters for the effects of the baseline predictors and random effects. The direct calculation of the variance for the EM estimates based on Louis’s formula may be unstable. Here we calculate the variance for the estimates of the regression parameters by inverting the observed information matrix based on the corresponding profile log likelihood function. In general, the observed information matrix does not have a closed analytical form. We compute it using an EM-aided numerical differentiation method, which was studied by Meilijson (1989) [15] for parametric models and then extended to the proportional hazards model with missing covariates by Chen and Little (1999) [2]. Zeng and Cai (2005) [23] used a similar approach to variance estimation based on a profile likelihood function. The details of the EM-aided numerical differentiation method are given in Appendix C.

4 Application to NYU-IVF Data

The NYU-IVF data consist of 540 pregnancies obtained after IVF treatment at the New York University Fertility Center from 2001 to 2003. In this study, viable pregnancies are defined as pregnancies reaching the second trimester, including singleton, twins, higher order multiples and stillbirths; non-viable pregnancies include biochemical pregnancies, ectopic pregnancies and first trimester abortions [1]. The average age of participants was 35.27 years (SD = 4.40 years) and the average body mass index (BMI) was 23.71 (SD = 5.05). For computational stability and ease of interpretation, age is centered at 35 and divided by 10, and BMI is centered at 23 and divided by 10. The possible observation times of β-hCG are rounded to the median day of each of the six potential time intervals from day 7 to day 23 after ET: interval 1 is days 7–9, interval 2 is days 10–11, interval 3 is days 12–13, interval 4 is days 14–16, interval 5 is days 17–19, and interval 6 is days 20–23. We define the first observation time as time 0 and convert days to weeks. The total number of β-hCG observations is N = 1325.

We fit the NYU-IVF data by our proposed joint model using the SNP likelihood-based approach with K = 0, 1 and 2 (K = 0 corresponds to the normality assumption for ui). For comparison, we also analyzed the NYU-IVF data by a reduced joint model in which the informative measurement mechanism of the β-hCG profiles is ignored. To choose K in the SNP representation, we use the AIC, BIC and HQ criteria. The log likelihood values for the full model with K = 0, 1 and 2 are −1732.86, −1641.57 and −1635.49; the AIC values are 1.3380, 1.2706 and 1.2683; the HQ values are 1.3674, 1.3014 and 1.3013; the BIC values are 1.4163, 1.3528 and 1.3564. The log likelihood, AIC, HQ and BIC values for the reduced models are omitted here. Two of the three criteria choose K = 2 for the full joint model (AIC and HQ, by the values above) and K = 1 for the reduced joint model, which again supports our earlier finding that the subject-specific random effects are not normally distributed.

The parameter estimates, with their estimated standard errors computed by the proposed EM-aided numerical differentiation method, are summarized for the full and reduced joint models in Tables 2 and 3: Table 2 gives the results for the linear mixed model and the logistic model for the primary binary outcome, and Table 3 those for the logistic models for the measurement indicators.
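The criteria values quoted above can be checked with a few lines of arithmetic. The sketch below applies the general form −l(Θ)/N + C(N)p_net/N to the reported log likelihoods with N = 1325. The parameter counts 40, 42 and 45 are not stated in the text; they are an assumption, chosen here because they reproduce the reported AIC values, and should be treated as such.

```python
import math


def info_criteria(loglik, n_obs, n_params):
    """AIC, HQ and BIC in the scaled form -l/N + C(N) * p / N."""
    base = -loglik / n_obs
    return {
        "AIC": base + 1.0 * n_params / n_obs,
        "HQ": base + math.log(math.log(n_obs)) * n_params / n_obs,
        "BIC": base + 0.5 * math.log(n_obs) * n_params / n_obs,
    }


N = 1325                                           # total number of observations
logliks = {0: -1732.86, 1: -1641.57, 2: -1635.49}  # reported for the full model
assumed_params = {0: 40, 1: 42, 2: 45}             # ASSUMPTION: back-solved, not reported

table = {K: info_criteria(logliks[K], N, assumed_params[K]) for K in logliks}

# For each criterion, the K with the smallest value is selected.
best = {name: min(table, key=lambda K: table[K][name])
        for name in ("AIC", "HQ", "BIC")}
```

Under these assumed counts the computed values agree with the reported ones to within rounding in the fourth decimal place, and the selected K is 2 for AIC and HQ and 1 for BIC. Note how close the HQ values for K = 1 and K = 2 are, so the choice between them rests on a very small difference.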

Table 2.

Estimation of parameters in the β-hCG longitudinal submodel and binary pregnancy outcome submodel in the proposed full model and reduced model for NYU-IVF data

Columns: (A) full model, K = 0; (B) full model, K = 2; (C) reduced model, K = 0; (D) reduced model, K = 1. Each group of three columns gives Est, SE and p-value.
Para Est(A) SE(A) p(A) Est(B) SE(B) p(B) Est(C) SE(C) p(C) Est(D) SE(D) p(D)
longitudinal submodel for β-hCG
γ0(intercept) 4.605 0.071 <0.0001 4.606 0.066 <0.0001 4.613 0.057 <0.0001 4.729 0.048 <0.0001
γ1 (slope) 2.952 0.170 <0.0001 2.881 0.144 <0.0001 2.593 0.153 <0.0001 2.610 0.126 <0.0001
γ2 (quadratic) −0.432 0.122 0.0003 −0.481 0.111 <0.0001 −0.357 0.101 0.0004 −0.349 0.093 0.0001
γ3,1(age) −0.185 0.191 0.33 −0.190 0.155 0.22 −0.189 0.167 0.25 −0.183 0.141 0.19
γ3,2(BMI) −0.385 0.156 0.01 −0.371 0.124 0.003 −0.396 0.132 0.002 −0.378 0.115 0.001
σε² 0.395 – – 0.385 – – 0.392 – – 0.367 – –
ρ 0.276 0.061 <0.0001 0.303 0.066 <0.0001 0.288 0.063 <0.0001 0.305 0.058 <0.0001
binary outcome submodel for viable pregnancy
β0 1.633 0.216 <0.0001 1.583 0.199 <0.0001 1.609 0.211 <0.0001 1.642 0.209 <0.0001
β1,1(age) −1.180 0.403 0.003 −1.104 0.361 0.002 −1.283 0.398 0.001 −1.083 0.342 0.001
β1,2(BMI) −0.352 0.260 0.17 −0.359 0.229 0.12 −0.347 0.253 0.17 −0.360 0.223 0.10
α1 1.523 0.447 0.0006 1.610 0.324 <0.0001 1.415 0.747 0.058 1.278 0.692 0.064
α2 2.059 0.720 0.004 1.527 0.415 0.0002 1.762 0.639 0.005 1.609 0.627 0.010

SE is the estimated standard error.

Table 3.

Estimation of parameters in β-hCG observation times submodel of the proposed full model for NYU-IVF data

Columns: (A) full model, K = 0; (B) full model, K = 2. Each group of three columns gives Est, SE and p-value.
Para Est(A) SE(A) p(A) Est(B) SE(B) p(B)
observation times model
η02 −2.233 0.745 0.002 −2.412 0.415 <0.0001
η12,1(age) 0.110 0.429 0.79 0.175 0.436 0.68
η12,2(BMI) 0.258 0.350 0.46 0.367 0.327 0.26
α32 −5.066 1.470 0.0005 −4.684 1.022 <0.0001
α42 1.320 0.513 0.01 0.636 0.345 0.07
η03 −2.668 0.259 <0.0001 −2.674 0.221 <0.0001
η13,1(age) 0.384 0.380 0.31 0.397 0.361 0.27
η13,2(BMI) −0.259 0.318 0.41 −0.249 0.308 0.41
α33 −2.649 1.059 0.012 −2.321 0.476 <0.0001
α43 0.771 0.610 0.20 0.383 0.305 0.20
η04 2.697 0.921 0.003 2.178 0.217 <0.0001
η14,1(age) −0.401 0.409 0.32 −0.366 0.310 0.24
η14,2(BMI) −0.532 0.338 0.11 −0.492 0.275 0.073
α34 4.083 1.434 0.004 2.137 0.460 <0.0001
α44 −1.413 1.763 0.42 −0.331 0.211 0.11
η05 −2.053 0.553 0.0002 −1.894 0.190 <0.0001
η15,1(age) 0.837 0.399 0.035 0.698 0.318 0.028
η15,2(BMI) 0.562 0.286 0.049 0.531 0.207 0.01
α35 −3.539 1.675 0.03 −2.068 0.513 <0.0001
α45 1.180 1.402 0.40 0.664 0.441 0.13
η06 −2.438 0.228 <0.0001 −2.425 0.213 <0.0001
η16,1(age) 0.148 0.345 0.66 0.190 0.317 0.55
η16,2(BMI) 0.237 0.255 0.35 0.286 0.243 0.23
α36 −2.059 0.491 <0.0001 −1.994 0.527 0.0001
α46 0.242 0.273 0.37 0.188 0.187 0.31

SE is the estimated standard error.

Columns (A) and (B) in Table 2 are the results for our proposed joint model with K = 0 and 2, respectively. We make the following observations: (1) age is significantly negatively associated with the final pregnancy outcome, while BMI is not; (2) BMI is significantly negatively associated with the β-hCG profile, while age is not; (3) greater baseline value and stronger increasing trend of the β-hCG profile at baseline are associated with higher chance of viable pregnancy (e.g., α̂1 = 1.610 and α̂2 = 1.527 when K = 2, with the p-values of both < 0.05); (4) the point estimates for K = 0 and 2 are very comparable except for estimates of α2, while the estimates for K = 2 show certain efficiency gain for some parameters. We will further investigate the efficiency of estimates for K = 0 and 2 by simulations in Section 5.

Columns (C) and (D) in Table 2 are the results for the reduced model ignoring the informative measurement times with K = 0 and 1, respectively. Comparing column (D) to column (B), we observe some discrepancy in the estimated population curves of the log β-hCG profiles. This is further supported by the simulation results in Section 5, which demonstrate that ignoring the informative measurement pattern of longitudinal data may bias the estimate of the mean slope parameter in the linear mixed model. More importantly, we find that the estimate of α1 (the effect of the random intercept on the final pregnancy outcome) becomes insignificant in the reduced model, due to a smaller point estimate and a larger standard error estimate.

From the results in Table 3 we observe that the estimated effects of the random intercept (the α3j) are significantly negative at the intermediate times t2, t3, t5 and t6, while those of the random slope (the α4j) are not significant. This suggests that doctors may require more intermediate measurements of β-hCG values for women with lower β-hCG values at t1. Furthermore, age and BMI do not have significant effects on most of the measurement indicators.

The estimated joint and marginal density functions of the random effects ui for the full model with K = 2 are given in Fig. 4: (a) plots the estimated bivariate density function, which is bimodal; (b) gives the contour plot corresponding to the bivariate density in (a) together with the posterior estimates of the ui, which also clearly shows two clusters; (c) and (d) are the estimated marginal densities of ui0 and ui1, respectively, together with their histograms. The estimated density of ui1 is also clearly bimodal, indicating a deviation from the normal distribution. The bimodal distribution of the subject-specific random effects indicates that the underlying study population may be a mixture of two populations, separated mainly by the latent variable ui1. According to the results for the primary pregnancy outcome, we infer that one sub-population is more likely to have a viable pregnancy, while the other is less likely.

Fig. 4.

(a) Estimated bivariate density of ui; (b) Contour plots of the density in (a) with posterior estimates of ui superimposed (contours are 5, 50 and 95%); (c) and (d) Estimated marginal densities for the random intercept and slope, superimposed by the histograms for posterior estimates of ui, respectively.

5 Simulation Studies

Simulations were conducted to evaluate the performance of the proposed method under practical settings. We used the same linear mixed model for the longitudinal covariates (without the quadratic term in tj, for simplicity) and the same logistic model for the primary binary outcome. For the informative measurement indicators, we consider the following simpler logistic models:

$$\mathrm{logit}\{P(R_{ij} = 1 \mid X_i, u_i)\} = \eta_{0j} + \eta_1' X_i + \alpha_3 u_{i0} + \alpha_4 u_{i1}, \quad j = 1, \ldots, m, \tag{8}$$

where the effects of covariates and subject-specific random variables on the measurement indicators are assumed to be the same over all time points. We consider two baseline covariates: a binary covariate Xi1 sampled from the Bernoulli distribution with a success probability of 0.5 and a continuous covariate Xi2 sampled from the standard normal distribution. Longitudinal covariates Zi were generated at six possible observation times (t1, · · ·, t6) = (1, 2, 3, 4, 5, 6) in weeks. The subject-specific random effects ui were generated from a mixture of two bivariate normal distributions: F(ui) = 0.7Φ(ui; μ1, V) + 0.3Φ(ui; μ2, V) with μ1 = (1.56, 0)′, μ2 = (−3.64, 0)′ and the elements of the covariance matrix V given by v00 = 0.81, v01 = v10 = −0.0456 and v11 = 1.96. The true values of the regression parameters are shown in Table 4. We ran 100 simulations with a sample size of n = 300.
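The data-generating design described above can be sketched as follows, using the true parameter values listed in Table 4. This is an illustrative reconstruction, not the authors' simulation code: the function name `simulate_joint_data` is hypothetical, and the Markov correlation structure is assumed to reduce to ρ^|tj − tk| for these equally spaced times.

```python
import numpy as np


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-x))


def simulate_joint_data(n=300, seed=0):
    rng = np.random.default_rng(seed)
    t = np.arange(1.0, 7.0)                       # observation times in weeks
    m = t.size

    # Random effects: 0.7 N(mu1, V) + 0.3 N(mu2, V); the mixture has mean zero.
    mu1, mu2 = np.array([1.56, 0.0]), np.array([-3.64, 0.0])
    V = np.array([[0.81, -0.0456], [-0.0456, 1.96]])
    first = rng.random(n) < 0.7
    u = np.where(first[:, None], mu1, mu2) + rng.multivariate_normal(np.zeros(2), V, size=n)

    # Baseline covariates: Bernoulli(0.5) and N(0, 1).
    X = np.column_stack([rng.binomial(1, 0.5, n), rng.standard_normal(n)])

    # Longitudinal submodel (no quadratic term), sigma^2 = 1, rho = 0.3.
    Gamma = 0.3 ** np.abs(t[:, None] - t[None, :])   # ASSUMED Markov/AR(1) form
    eps = rng.multivariate_normal(np.zeros(m), 1.0 * Gamma, size=n)
    Z = (1.0 + 0.5 * t[None, :] + (X @ np.array([0.5, -0.5]))[:, None]
         + u[:, [0]] + u[:, [1]] * t[None, :] + eps)

    # Binary outcome submodel (2).
    Y = rng.binomial(1, expit(0.8 + X @ np.array([-1.0, -1.5]) - 1.0 * u[:, 0] + 1.0 * u[:, 1]))

    # Measurement indicators, model (8).
    eta0 = np.array([1.6, 1.0, 1.5, 1.0, 1.0, 1.4])
    lin_r = eta0[None, :] + (X @ np.array([1.0, -0.5]))[:, None] + 0.5 * u[:, [0]] - 0.5 * u[:, [1]]
    R = rng.binomial(1, expit(lin_r))

    Z_obs = np.where(R == 1, Z, np.nan)           # Z_ij is observed only if R_ij = 1
    return {"t": t, "X": X, "u": u, "Z": Z_obs, "R": R, "Y": Y}
```

A quick check on the design: 0.7 × 1.56 + 0.3 × (−3.64) = 0, so the random intercept has mean zero as model (1) requires, while its distribution is clearly bimodal.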

Table 4.

Simulation results: K = 0 denotes the estimation by assuming normal random effects. HQ is the estimation when K is selected by the HQ criterion.

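Columns: estimates under K = 0 (normal random effects) and under K selected by HQ. Each group of four columns gives Est, SD, SE and CP.
Para True Est(K=0) SD(K=0) SE(K=0) CP(K=0) Est(HQ) SD(HQ) SE(HQ) CP(HQ)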
(a)Proposed full joint model:
longitudinal submodel
γ0 1 0.987 0.237 0.220 93 % 0.986 0.175 0.177 96 %
γ1 0.5 0.498 0.080 0.087 97 % 0.502 0.080 0.085 96 %
γ2,1(X1) 0.5 0.483 0.156 0.157 94 % 0.502 0.079 0.077 93 %
γ2,2(X2) −0.5 −0.490 0.327 0.312 94 % −0.504 0.160 0.161 96 %
σε² 1 0.993 – – – 1.005 – – –
ρ 0.3 0.296 0.304
binary outcome submodel
β0 0.8 0.783 0.396 0.375 93 % 0.814 0.373 0.361 96 %
β1,1(X1) −1 −1.048 0.277 0.286 98 % −1.050 0.237 0.245 97 %
β1,2(X2) −1.5 −1.568 0.522 0.531 95 % −1.550 0.434 0.442 93 %
α1 −1 −1.071 0.173 0.171 96 % −1.041 0.152 0.156 98 %
α2 1 1.061 0.222 0.209 97 % 1.048 0.202 0.204 92 %
informative times submodel
η01 1.6 1.637 0.213 0.216 96 % 1.617 0.199 0.205 96 %
η02 1 1.037 0.218 0.206 93 % 1.015 0.212 0.201 93 %
η03 1.5 1.520 0.209 0.213 95 % 1.498 0.195 0.189 94 %
η04 1 1.006 0.222 0.205 95 % 0.984 0.199 0.193 94 %
η05 1 1.040 0.219 0.206 94 % 1.019 0.189 0.190 96 %
η06 1.4 1.402 0.220 0.211 93 % 1.385 0.192 0.183 94 %
η1,1(X1) 1 0.991 0.104 0.110 95 % 1.002 0.082 0.087 96 %
η1,2(X2) −0.5 −0.502 0.228 0.204 94 % −0.513 0.157 0.149 94 %
α3 0.5 0.526 0.043 0.041 94 % 0.510 0.040 0.040 96 %
α4 −0.5 −0.504 0.052 0.058 98 % −0.502 0.051 0.059 98 %
%K (2, 84, 14)
(b)Reduced model:
longitudinal submodel
γ0 1 1.014 0.237 0.218 91% 0.982 0.178 0.169 90%
γ1 0.5 0.401 0.079 0.085 84% 0.408 0.078 0.086 87%
γ2,1(X1) 0.5 0.479 0.157 0.156 94% 0.501 0.080 0.085 96%
γ2,2(X2) −0.5 −0.488 0.326 0.310 94% −0.504 0.164 0.175 97%
σe2
1 0.992 1.022
ρ 0.3 0.290 0.295
binary outcome submodel
β0 0.8 0.724 0.395 0.383 93 % 0.819 0.381 0.330 88%
β1,1(X1) −1 −1.096 0.312 0.304 99 % −1.102 0.249 0.256 99%
β1,2(X2) −1.5 −1.564 0.535 0.551 96 % −1.527 0.448 0.457 94%
α1 −1 −1.124 0.212 0.197 100 % −1.073 0.157 0.169 98%
α2 1 1.088 0.260 0.234 96 % 1.074 0.210 0.210 96%
%K (1, 77, 22)

SD is the sample standard deviation; SE is the mean of estimated standard errors; CP is the empirical coverage probability of 95% Wald-type confidence intervals; %K represents the proportions of K = 0, 1 or 2 out of 100 runs selected by the HQ criterion.
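The summary columns defined in this footnote can be computed from the replicate-level output of a simulation study as in the following sketch. The function name and the toy inputs are hypothetical; only the definitions of Est, SD, SE and CP follow the text.

```python
import numpy as np

def summarize_replicates(estimates, std_errors, true_value, z=1.96):
    """Summarize one parameter across simulation replicates.

    Est: mean of the point estimates
    SD:  sample standard deviation of the point estimates
    SE:  mean of the estimated standard errors
    CP:  fraction of 95% Wald intervals (estimate +/- z * SE) covering the truth
    """
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    lower = estimates - z * std_errors
    upper = estimates + z * std_errors
    covered = (lower <= true_value) & (true_value <= upper)
    return {
        "Est": estimates.mean(),
        "SD": estimates.std(ddof=1),
        "SE": std_errors.mean(),
        "CP": covered.mean(),
    }

# Toy replicate output (made-up numbers, not from the paper).
summary = summarize_replicates(
    estimates=[0.9, 1.1, 1.0, 1.5],
    std_errors=[0.1, 0.1, 0.1, 0.1],
    true_value=1.0,
)
print(summary)
```

A well-calibrated procedure should show SE close to SD and CP close to 0.95, which is the pattern the table is meant to check.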

We consider the SNP representation with K = 0, 1 and 2, and choose the optimal K by the HQ criterion (the AIC and BIC criteria give similar results). Simulation results are summarized in Table 4. We report the results for K = 0, which corresponds to a bivariate normal distribution for the random effects, and the estimates obtained with K chosen by the HQ information criterion. For comparison, we also report the estimation results based on the reduced model that ignores the informative measurement process. For our proposed method (part (a) of the table), the estimates of all the parameters are nearly unbiased in all cases, the SE’s are close to the SD’s, and all the CP’s are close to the nominal level. This implies that the HQ information criterion performs well in selecting the tuning parameter K. Similar observations have been reported in the literature [18]. However, the estimators based on the incorrect normality assumption for the random effects (K = 0) may be less efficient than those obtained by SNP with K chosen by the HQ criterion (for example, the SD of γ̂2,1 is 0.156 for K = 0, but 0.079 for HQ; the SD of γ̂2,2 is 0.327 for K = 0, but 0.160 for HQ). As for the selection of the tuning parameter K by HQ, the values 0, 1 and 2 were chosen in 2%, 84% and 14% of the 100 runs, respectively, indicating the ability of the HQ criterion to detect the departure of the random effects distribution from normality. The corresponding selection proportions are 1%, 78% and 21% for AIC and 2%, 88% and 10% for BIC.

The results for the reduced model are shown in part (b) of Table 4. The estimates of the mean slope parameter (γ1) in the linear mixed model for longitudinal covariates show larger biases than those of the full model, and the associated empirical coverage probability (CP) of the 95% Wald-type confidence interval is well below the nominal level (at most 87%). The parameter estimates in the primary binary outcome model also exhibit some degree of bias. In particular, there is about 10% bias in the regression parameter estimate β̂1,1, and 7% to 12% bias in the parameter estimate α̂1. In addition, these parameter estimates tend to have greater variability than those from our proposed method. This suggests that ignoring the informative measurement process may cause invalid or less efficient inference for some parameters in the joint model.
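The relative biases quoted above can be recomputed directly from the reduced-model rows of Table 4. The helper below is a hypothetical convenience function; the numbers are the true values and mean estimates reported in the table.

```python
def relative_bias(estimate, true_value):
    """Relative bias (estimate - truth) / truth."""
    return (estimate - true_value) / true_value

# (true value, mean estimate for K = 0, mean estimate for HQ), reduced model, Table 4
reduced_model = {
    "gamma1": (0.5, 0.401, 0.408),
    "beta1,1": (-1.0, -1.096, -1.102),
    "alpha1": (-1.0, -1.124, -1.073),
}

for name, (truth, est_k0, est_hq) in reduced_model.items():
    bias_k0 = 100 * relative_bias(est_k0, truth)
    bias_hq = 100 * relative_bias(est_hq, truth)
    print(f"{name}: K=0 {bias_k0:+.1f}%, HQ {bias_hq:+.1f}%")
```

For the mean slope γ1 the relative bias is close to 20% in magnitude under both choices of K, which is consistent with the poor coverage reported for that parameter.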

In Fig. 5, we plot the true and estimated (based on the means of 100 simulations) joint and marginal densities of the subject-specific random effects ui: (a) and (b) are for the true and estimated bivariate densities, respectively; (c) and (d) are for the marginal densities of the random intercept and baseline slope, respectively. It can be seen that the SNP approach can estimate the true bivariate and marginal densities of random effects very well, demonstrating the flexibility of the SNP representation to capture the complex distribution of random effects that are different from the normal.

Fig. 5.

(a) and (b) True and estimated bivariate density of ui by HQ; (c) and (d) True marginal density (solid line) and estimated marginal density chosen by HQ (dash-dotted line) for the random intercept and slope, respectively.

6 Discussion

In this article, we propose a joint model for studying the association between a primary binary outcome and longitudinal covariates that are possibly measured at discrete informative observation times. The association is described by latent subject-specific random variables whose distribution is flexibly modeled by the SNP representation, allowing departures from the normality assumption. Our approach provides insight into the relationship between early β-hCG profiles and the final pregnancy outcomes after IVF treatment. The results demonstrate the power of using the latent characteristics of early-stage β-hCG profiles for the prediction of viable pregnancies achieved by IVF treatment, and the importance of taking into account the informative measurement process and non-normally distributed random effects in data analysis.

Other modeling strategies proposed in the literature may also be used to analyze the NYU-IVF data. One such strategy uses the β-hCG value measured at some specific time point as a covariate in the logistic model for the primary pregnancy outcome. This approach may be problematic for the NYU-IVF data for the following reasons. First, the observation time points differ across women, so it is not possible to select a common time point at which all women had β-hCG measured. Second, even if we could find a common time point at which β-hCG measurements were available for all women, using the β-hCG value at this particular time point as a covariate may not have a scientific and/or clinical justification. Third, even if we had β-hCG data at a clinically meaningful time point, there is usually some degree of measurement error in this hormone data. It is well known that using an error-prone variable as a covariate in a logistic model leads to biased results.

If we follow the convention in the mixed model literature and interpret $Z_i^*(t) = \gamma_0 + \gamma_1 t + \gamma_2 t^2 + \gamma_3 X_i + u_{i0} + u_{i1} t$ as the true log β-hCG of subject i (with covariates $X_i$) at time point t, with rate of change $Z_i^{*\prime}(t) = \gamma_1 + 2\gamma_2 t + u_{i1}$, our joint model then implies that the viable pregnancy probability follows the logistic regression model

$$\operatorname{logit}\{P(Y_i = 1 \mid X_i, t, u_i)\} = \beta_0^*(t) + \beta_1^* X_i + \alpha_1 Z_i^*(t) + (\alpha_2 - \alpha_1 t)\, Z_i^{*\prime}(t),$$

where $\beta_0^*(t) = \beta_0 - \alpha_1\gamma_0 - \alpha_2\gamma_1 - 2\alpha_2\gamma_2 t + \alpha_1\gamma_2 t^2$ and $\beta_1^* = \beta_1 - \alpha_1\gamma_3$. That is, given the results from our joint model, we can simultaneously investigate the effects of the true log β-hCG and its rate of change at any time on the primary pregnancy outcome. Since both α̂1 and α̂2 are positive, the coefficient $\alpha_2 - \alpha_1 t$ of the rate of change is largest at t = 0, so the rate of change of the true log β-hCG at baseline is most predictive of the viable pregnancy outcome. Similarly, we can show from our joint model that the logit of a viable pregnancy probability is (linearly) related to the average of the true log β-hCG and its rate of change in any specific interval [0, T].
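The reparametrization can be checked numerically. Writing Z(t) = γ0 + γ1 t + γ2 t² + γ3 X + u0 + u1 t for the true trajectory and Z'(t) = γ1 + 2γ2 t + u1 for its rate of change, the sketch below confirms that β0 + β1 X + α1 u0 + α2 u1 equals β0(t) + (β1 − α1γ3) X + α1 Z(t) + (α2 − α1 t) Z'(t) with β0(t) = β0 − α1γ0 − α2γ1 − 2α2γ2 t + α1γ2 t². All parameter values are arbitrary illustrative numbers, not estimates from the paper, and the function names are hypothetical.

```python
import numpy as np

def predictor_random_effects(beta0, beta1, alpha1, alpha2, x, u0, u1):
    """Linear predictor written in terms of the random effects."""
    return beta0 + beta1 * x + alpha1 * u0 + alpha2 * u1

def predictor_trajectory(beta0, beta1, alpha1, alpha2, gamma, x, u0, u1, t):
    """Same predictor written in terms of Z(t) and its rate of change at time t."""
    g0, g1, g2, g3 = gamma
    z = g0 + g1 * t + g2 * t**2 + g3 * x + u0 + u1 * t
    dz = g1 + 2 * g2 * t + u1
    beta0_t = beta0 - alpha1 * g0 - alpha2 * g1 - 2 * alpha2 * g2 * t + alpha1 * g2 * t**2
    beta1_star = beta1 - alpha1 * g3
    return beta0_t + beta1_star * x + alpha1 * z + (alpha2 - alpha1 * t) * dz

rng = np.random.default_rng(0)
beta0, beta1, alpha1, alpha2 = rng.normal(size=4)   # arbitrary values
gamma = rng.normal(size=4)                          # (gamma0, gamma1, gamma2, gamma3)
x, u0, u1 = rng.normal(size=3)

direct = predictor_random_effects(beta0, beta1, alpha1, alpha2, x, u0, u1)
for t in (0.0, 1.5, 4.0):
    via_z = predictor_trajectory(beta0, beta1, alpha1, alpha2, gamma, x, u0, u1, t)
    print(t, np.isclose(direct, via_z))
```

The identity holds for every t because the time-dependent terms cancel, which is why the choice of t only changes how the same predictor is split between level and rate of change.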

Acknowledgments

The research of Zhang was partly supported by NIH grants R01-CA85848-08, R01-MH84022-01 and R37-AI031789-20. The research of Lu was partly supported by R01-CA140632. The authors gratefully acknowledge the comments and suggestions of the editor, associate editor, and two reviewers, which have greatly improved the paper.

Appendix A: Details of the EM Algorithm for Our Joint Model

The log complete-data likelihood is given by

$$l_C(\Theta) = \sum_{i=1}^{n}\left[\log\{f_L(Z_i \mid u_i)\} + \log\{f_P(Y_i \mid u_i)\} + \log\{f_R(R_i \mid u_i)\} + \log\{f_u(u_i)\}\right]. \tag{9}$$

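In the kth E-step, we need to calculate the conditional expectation of $l_C(\Theta)$ with respect to the random effects $u_i$ given the observed data $O$ and the current parameter estimates $\hat\Theta^{(k)}$, commonly called the Q function and denoted by $Q(\Theta \mid \hat\Theta^{(k)})$. From equation (9), the conditional expectation $Q(\Theta \mid \hat\Theta^{(k)})$ can be expressed as

$$Q(\Theta \mid \hat\Theta^{(k)}) = Q_1(\gamma, \sigma_\varepsilon^2, \rho; O, \hat\Theta^{(k)}) + Q_2(\beta, \alpha_1, \alpha_2; O, \hat\Theta^{(k)}) + Q_3(\lambda_1, \ldots, \lambda_m; O, \hat\Theta^{(k)}) + Q_4(\mu, \xi, \phi; O, \hat\Theta^{(k)}),$$

where

$$\begin{aligned}
Q_1(\gamma, \sigma_\varepsilon^2, \rho; O, \hat\Theta^{(k)}) &= \sum_{i=1}^{n} E\left[\log\{f_L(Z_i \mid u_i)\} \mid O_i, \hat\Theta^{(k)}\right],\\
Q_2(\beta, \alpha_1, \alpha_2; O, \hat\Theta^{(k)}) &= \sum_{i=1}^{n} E\left[\log\{f_P(Y_i \mid u_i)\} \mid O_i, \hat\Theta^{(k)}\right],\\
Q_3(\lambda_1, \ldots, \lambda_m; O, \hat\Theta^{(k)}) &= \sum_{i=1}^{n} E\left[\log\{f_R(R_i \mid u_i)\} \mid O_i, \hat\Theta^{(k)}\right],\\
Q_4(\mu, \xi, \phi; O, \hat\Theta^{(k)}) &= \sum_{i=1}^{n} E\left[\log\{f_u(u_i)\} \mid O_i, \hat\Theta^{(k)}\right],
\end{aligned} \tag{10}$$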

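and the expectation $E(\cdot \mid O_i, \hat\Theta^{(k)})$ is taken with respect to the conditional density function $f(u_i \mid O_i, \hat\Theta^{(k)})$ of $u_i$ given $O_i$ evaluated at $\hat\Theta^{(k)}$. By simple calculation, we have

$$f(u_i \mid O_i, \hat\Theta^{(k)}) = \frac{f(O_i, u_i \mid \hat\Theta^{(k)})}{\int f(O_i, u_i \mid \hat\Theta^{(k)})\,du_i} = \frac{f_P(Y_i \mid u_i, \hat\Theta^{(k)})\, f_R(R_i \mid u_i, \hat\Theta^{(k)})\, P_K^2\{D^{-1}(u_i - \mu)\}\, g(u_i \mid Z_i, \hat\Theta^{(k)})}{\int f_P(Y_i \mid u_i, \hat\Theta^{(k)})\, f_R(R_i \mid u_i, \hat\Theta^{(k)})\, P_K^2\{D^{-1}(u_i - \mu)\}\, g(u_i \mid Z_i, \hat\Theta^{(k)})\,du_i}, \tag{11}$$

where $g(u_i \mid Z_i, \hat\Theta^{(k)})$ is the density of a bivariate normal distribution with mean

$$\mu_i^{(k)} = W_i^{(k)}\left\{G_i^{T}\Sigma_i^{-1}(Z_i - \gamma_2 X_i) + (DD^{T})^{-1}\mu\right\}$$

and variance-covariance matrix

$$W_i^{(k)} = \left\{(DD^{T})^{-1} + G_i^{T}\Sigma_i^{-1}G_i\right\}^{-1},$$

with $G_i = (1, t_i)$ being an $m_i \times 2$ matrix. This distribution can be viewed as the “working” conditional distribution of $u_i$ given $Z_i$ and $\hat\Theta^{(k)}$. The integrals in (10) and (11) can then be computed using numerical integration techniques. Here we use the Gaussian quadrature method.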
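As an illustration of how such conditional expectations can be evaluated, the sketch below approximates E[h(u) | data] = ∫ h(u) w(u) g(u) du / ∫ w(u) g(u) du by a tensor-product Gauss-Hermite rule, where g is a bivariate normal "working" density and w collects the remaining factors of the integrand. This is a minimal sketch under stated assumptions rather than the authors' implementation: the function name, the number of nodes, and the toy weight function are hypothetical, and in the joint model w would be the product of the outcome density, the measurement-indicator density and the squared SNP polynomial.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def posterior_expectation(h, weight, mean, cov, n_nodes=20):
    """Approximate E[h(u)] under the density proportional to weight(u) * N(u; mean, cov)."""
    nodes, wts = hermegauss(n_nodes)           # rule for the weight exp(-x^2 / 2)
    wts = wts / np.sqrt(2.0 * np.pi)           # normalize to the N(0, 1) density
    chol = np.linalg.cholesky(cov)             # cov = chol @ chol.T
    z1, z2 = np.meshgrid(nodes, nodes, indexing="ij")
    z = np.column_stack([z1.ravel(), z2.ravel()])
    grid_w = np.outer(wts, wts).ravel()        # product-rule weights, same ordering as z
    u = np.asarray(mean, dtype=float) + z @ chol.T   # map standard nodes to N(mean, cov)
    tilt = np.array([weight(point) for point in u])
    values = np.array([h(point) for point in u])
    return (grid_w * tilt) @ values / np.sum(grid_w * tilt)

# Toy check with a known answer: tilting N(mean, cov) by exp(b'u) gives N(mean + cov b, cov).
mean = np.array([0.5, -1.0])
cov = np.array([[1.0, 0.3], [0.3, 0.5]])
b = np.array([0.3, -0.2])
approx = posterior_expectation(lambda u: u, lambda u: np.exp(b @ u), mean, cov)
print(approx, mean + cov @ b)
```

Centering the quadrature nodes at the working normal density keeps the number of nodes small, since the remaining factors vary slowly relative to g in this toy case; this may not hold for every data configuration.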

Next, in the M-step, it is clear that the maximization of Q(Θ| Θ̂(k)) can be conducted separately for each component in Q(Θ| Θ̂(k)).

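Specifically, $Q_1(\gamma, \sigma_\varepsilon^2, \rho; O, \hat\Theta^{(k)})$, the conditional expectation of the log-likelihood of $(\gamma, \sigma_\varepsilon^2, \rho)$, takes the following form:

\begin{align}
&-\frac{1}{2}\sum_{i=1}^{n}\log|\Sigma_i| - \frac{1}{2}\sum_{i=1}^{n}\left(Z_i - \gamma_0 - \gamma_1 t_i - \gamma_2 X_i - u_{i0}^{(k)} - u_{i1}^{(k)} t_i\right)^{T}\Sigma_i^{-1} \tag{12}\\
&\qquad\times\left(Z_i - \gamma_0 - \gamma_1 t_i - \gamma_2 X_i - u_{i0}^{(k)} - u_{i1}^{(k)} t_i\right), \tag{13}
\end{align}

where $u_{i0}^{(k)} = E(u_{i0} \mid O_i, \hat\Theta^{(k)})$, $u_{i1}^{(k)} = E(u_{i1} \mid O_i, \hat\Theta^{(k)})$, and $\Sigma_i = \sigma_\varepsilon^2\,\Gamma_i(\rho)$. It can be shown by simple algebra that

$$\hat\gamma^{(k+1)} = \left(\sum_{i=1}^{n} X_i^{T}(\hat\Sigma_i^{(k)})^{-1}X_i\right)^{-1}\sum_{i=1}^{n} X_i^{T}(\hat\Sigma_i^{(k)})^{-1}\left(Z_i - u_{i0}^{(k)} - u_{i1}^{(k)} t_i\right), \tag{14}$$

where $\hat\Sigma_i^{(k)} = \hat\sigma_\varepsilon^{2(k)}\,\Gamma_i(\hat\rho^{(k)})$ and $X_i$ in (14) is understood as the design matrix of subject i for the fixed effects $\gamma$.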

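The updated estimators $(\hat\sigma_\varepsilon^{2(k+1)}, \hat\rho^{(k+1)})$ at the (k + 1)th step can be obtained by solving the “score equation” of $Q_1(\gamma, \sigma_\varepsilon^2, \rho; O, \hat\Theta^{(k)})$ with respect to $\sigma_\varepsilon^2$ and $\rho$. For example, if subject i has $m_i = 4$ observations, let $d_{jk} = |t_{ij} - t_{ik}|$ be the length of time between times $t_{ij}$ and $t_{ik}$ for all $j, k = 1, \ldots, m_i$; then

$$\Gamma_i(\rho) = \begin{pmatrix}
1 & \rho^{d_{12}} & \rho^{d_{13}} & \rho^{d_{14}}\\
\rho^{d_{12}} & 1 & \rho^{d_{23}} & \rho^{d_{24}}\\
\rho^{d_{13}} & \rho^{d_{23}} & 1 & \rho^{d_{34}}\\
\rho^{d_{14}} & \rho^{d_{24}} & \rho^{d_{34}} & 1
\end{pmatrix}.$$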
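For irregular observation times, the correlation matrix with entries ρ raised to the time lag |t_ij − t_ik| can be built for any number of observations as sketched below. The function name and the example times are hypothetical.

```python
import numpy as np

def correlation_matrix(times, rho):
    """Return the matrix with (j, k) entry rho ** |t_j - t_k|."""
    times = np.asarray(times, dtype=float)
    lags = np.abs(times[:, None] - times[None, :])
    return rho ** lags

# A subject observed at weeks 1, 2, 4 and 6 (illustrative times only).
gamma_i = correlation_matrix([1, 2, 4, 6], rho=0.3)
print(np.round(gamma_i, 4))
```

Because the entries depend only on the time lags, subjects with different observation schedules share the single parameter ρ.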

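The derivatives of $Q_1(\gamma, \sigma_\varepsilon^2, \rho; O, \hat\Theta^{(k)})$ with respect to $\sigma_\varepsilon^2$ and $\rho$ have the following expressions:

$$\begin{aligned}
S^{(k)}(\sigma_\varepsilon^2) = {} & -\frac{1}{2}\sum_{i=1}^{n}\operatorname{tr}\{\Sigma_i^{-1}\Gamma_i(\rho)\} + \frac{1}{2}\sum_{i=1}^{n}\left(Z_i - \gamma_0 - \gamma_1 t_i - \gamma_2 X_i - u_{i0}^{(k)} - u_{i1}^{(k)} t_i\right)^{T}\Sigma_i^{-1}\Gamma_i(\rho)\Sigma_i^{-1}\\
& \times\left(Z_i - \gamma_0 - \gamma_1 t_i - \gamma_2 X_i - u_{i0}^{(k)} - u_{i1}^{(k)} t_i\right),
\end{aligned} \tag{15}$$

$$\begin{aligned}
S^{(k)}(\rho) = {} & -\frac{1}{2}\sum_{i=1}^{n}\operatorname{tr}\left(\Sigma_i^{-1}\sigma_\varepsilon^2\frac{\partial\Gamma_i(\rho)}{\partial\rho}\right) + \frac{1}{2}\sum_{i=1}^{n}\left(Z_i - \gamma_0 - \gamma_1 t_i - \gamma_2 X_i - u_{i0}^{(k)} - u_{i1}^{(k)} t_i\right)^{T}\\
& \times\Sigma_i^{-1}\sigma_\varepsilon^2\frac{\partial\Gamma_i(\rho)}{\partial\rho}\Sigma_i^{-1}\left(Z_i - \gamma_0 - \gamma_1 t_i - \gamma_2 X_i - u_{i0}^{(k)} - u_{i1}^{(k)} t_i\right).
\end{aligned} \tag{16}$$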

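Let $S^{(k)}(\sigma_\varepsilon^2, \rho) = (S^{(k)}(\sigma_\varepsilon^2), S^{(k)}(\rho))^{T}$. We then solve the following equation

$$S^{(k)}(\sigma_\varepsilon^2, \rho) = 0 \tag{17}$$

to obtain the update for $\psi = (\sigma_\varepsilon^2, \rho)$. However, the “score equation” (17) does not have a closed-form solution. We therefore propose to update $\psi = (\sigma_\varepsilon^2, \rho)$ by a one-step Newton-Raphson algorithm

$$\hat\psi^{(k+1)} = \hat\psi^{(k)} + \{I^{(k)}(\hat\psi^{(k)})\}^{-1} S^{(k)}(\hat\psi^{(k)}), \tag{18}$$

where $S^{(k)}(\hat\psi^{(k)}) = S^{(k)}(\hat\sigma_\varepsilon^{2(k)}, \hat\rho^{(k)})$ is $S^{(k)}(\sigma_\varepsilon^2, \rho)$ evaluated at $\hat\psi^{(k)} = (\hat\sigma_\varepsilon^{2(k)}, \hat\rho^{(k)})$, and $I^{(k)}(\psi) = I^{(k)}(\sigma_\varepsilon^2, \rho)$ is the pseudo Fisher information matrix of $\psi$, defined by

$$I^{(k)}(\psi) = \begin{pmatrix}
\dfrac{\partial S^{(k)}(\sigma_\varepsilon^2)}{\partial\sigma_\varepsilon^2} & \dfrac{\partial S^{(k)}(\sigma_\varepsilon^2)}{\partial\rho}\\[2ex]
\dfrac{\partial S^{(k)}(\sigma_\varepsilon^2)}{\partial\rho} & \dfrac{\partial S^{(k)}(\rho)}{\partial\rho}
\end{pmatrix}, \tag{19}$$

where

$$\frac{\partial S^{(k)}(\sigma_\varepsilon^2)}{\partial\sigma_\varepsilon^2} = \frac{1}{2}\operatorname{tr}\left(\Sigma_i^{-1}\frac{\partial\Sigma_i}{\partial\sigma_\varepsilon^2}\Sigma_i^{-1}\frac{\partial\Sigma_i}{\partial\sigma_\varepsilon^2}\right),\quad
\frac{\partial S^{(k)}(\sigma_\varepsilon^2)}{\partial\rho} = \frac{1}{2}\operatorname{tr}\left(\Sigma_i^{-1}\frac{\partial\Sigma_i}{\partial\sigma_\varepsilon^2}\Sigma_i^{-1}\frac{\partial\Sigma_i}{\partial\rho}\right),\quad
\frac{\partial S^{(k)}(\rho)}{\partial\rho} = \frac{1}{2}\operatorname{tr}\left(\Sigma_i^{-1}\frac{\partial\Sigma_i}{\partial\rho}\Sigma_i^{-1}\frac{\partial\Sigma_i}{\partial\rho}\right).$$

The updated estimators $\hat\beta^{(k+1)}$, $\hat\alpha_1^{(k+1)}$ and $\hat\alpha_2^{(k+1)}$ can also be obtained by the Newton-Raphson (or one-step Newton-Raphson) method based on the “score function” and “information matrix” of $Q_2(\beta, \alpha_1, \alpha_2; O, \hat\Theta^{(k)})$. The updated estimators $(\hat\lambda_1^{(k+1)}, \ldots, \hat\lambda_m^{(k+1)})$ can be obtained in a similar way based on the “score function” and “information matrix” of $Q_3(\lambda_1, \ldots, \lambda_m; O, \hat\Theta^{(k)})$. The (k + 1)th update of $\mu$, $\xi$ and $\phi$ can be found by using any optimization software, e.g., nlm in R, to maximize $Q_4(\mu, \xi, \phi; O, \hat\Theta^{(k)})$ under the constraint $E(u_i) = 0$. The details can be found in Appendix B. We iterate the E-step and M-step until a pre-specified convergence criterion is met. Let $\hat\Theta$ denote the final estimator at convergence.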
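The one-step update of ψ = (σε², ρ) can be sketched as follows. The sketch takes the residual vectors (observed responses minus fixed effects and posterior means of the random effects) as given, evaluates the scores (15) and (16), and uses an information matrix whose entries are the trace expressions above summed over subjects; summing over subjects is an assumption of this sketch, as are all function names and the toy data. No safeguard is included to keep the updated values inside the parameter space.

```python
import numpy as np

def gamma_and_derivative(times, rho):
    """Gamma_i(rho) with entries rho**|t_j - t_k| and its derivative with respect to rho."""
    lags = np.abs(np.subtract.outer(times, times))
    gamma = rho ** lags
    dgamma = np.where(lags > 0, lags * rho ** (lags - 1.0), 0.0)
    return gamma, dgamma

def q1_value(resids, times_list, sigma2, rho):
    """Value of expression (12)-(13) for given residual vectors."""
    total = 0.0
    for r, t in zip(resids, times_list):
        sigma = sigma2 * gamma_and_derivative(t, rho)[0]
        total += -0.5 * np.linalg.slogdet(sigma)[1] - 0.5 * r @ np.linalg.solve(sigma, r)
    return total

def score_and_information(resids, times_list, sigma2, rho):
    """Score vector (15)-(16) and trace-form information, accumulated over subjects."""
    score = np.zeros(2)
    info = np.zeros((2, 2))
    for r, t in zip(resids, times_list):
        gamma, dgamma = gamma_and_derivative(t, rho)
        sigma_inv = np.linalg.inv(sigma2 * gamma)
        derivs = [gamma, sigma2 * dgamma]      # dSigma/dsigma2 and dSigma/drho
        a = sigma_inv @ r
        for p, dp in enumerate(derivs):
            score[p] += -0.5 * np.trace(sigma_inv @ dp) + 0.5 * a @ dp @ a
            for q, dq in enumerate(derivs):
                info[p, q] += 0.5 * np.trace(sigma_inv @ dp @ sigma_inv @ dq)
    return score, info

def one_step_update(resids, times_list, sigma2, rho):
    """psi_new = psi + I^{-1} S, as in equation (18)."""
    score, info = score_and_information(resids, times_list, sigma2, rho)
    return np.array([sigma2, rho]) + np.linalg.solve(info, score)

# Toy residuals at irregular times (made-up data for illustration).
rng = np.random.default_rng(0)
times_list = [np.array([1.0, 2.0, 3.0, 4.0]), np.array([1.0, 3.0, 6.0]), np.array([2.0, 5.0])]
resids = [rng.standard_normal(len(t)) for t in times_list]
print(one_step_update(resids, times_list, sigma2=1.2, rho=0.4))
```

Using the trace-form matrix instead of the exact second derivatives keeps the matrix positive definite, which is the usual motivation for Fisher-scoring-type updates.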

Appendix B: Updating Parameters μ, D and φ in the EM Algorithm

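At the kth step in the EM algorithm, $Q_4(\mu, \xi, \phi; O, \hat\Theta^{(k)})$ takes the following form:

$$\sum_{i=1}^{n} E\left[\log\{P_K^2\{D^{-1}(u_i - \mu)\}\}\right] - \frac{1}{2}\sum_{i=1}^{n} E\left[(u_i - \mu)^{T}(DD^{T})^{-1}(u_i - \mu)\right] - n\log|D|. \tag{20}$$

To satisfy $E(u_i) = 0$, let $E(u_i) = \mu + D \cdot E(b_i) = 0$; then

$$\hat\mu(D, \phi) = -D \cdot E(b_i). \tag{21}$$

Plugging the above $\hat\mu(D, \phi)$ into (20), we have

$$Q_4(\hat\mu(D, \phi), \xi, \phi; O, \hat\Theta^{(k)}) = \sum_{i=1}^{n} E\left[\log\{P_K^2\{D^{-1}(u_i - \hat\mu(D, \phi))\}\}\right] - \frac{1}{2}\sum_{i=1}^{n} E\left[(u_i - \hat\mu(D, \phi))^{T}(DD^{T})^{-1}(u_i - \hat\mu(D, \phi))\right] - n\log|D|. \tag{22}$$

The updated estimators $(\hat D^{(k+1)}, \hat\phi^{(k+1)})$ at the (k + 1)th step can then be obtained by using standard optimization software to maximize $Q_4(\hat\mu(D, \phi), \xi, \phi; O, \hat\Theta^{(k)})$. The updated estimator for $\mu$ is then $\hat\mu^{(k+1)} = \hat\mu(\hat D^{(k+1)}, \hat\phi^{(k+1)})$.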

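$E(b_i)$ varies for different K values. To ensure $\int g_K(b)\,db = 1$, we need the quadratic constraint $a^{T}Aa = 1$. Because A is positive definite, there exists a positive definite B such that $A = BB^{T}$. Let $c = Ba$; then $c^{T}c = 1$ and $a = (B^{T})^{-1}c$.

For K = 1,

$$A = \begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix}, \tag{23}$$

and $P_1^2(b) = (a_{00} + a_{10}b_1 + a_{01}b_2)^2$. Let $a = (a_{00}, a_{10}, a_{01})^{T}$ and define $c = (c_1, c_2, c_3)^{T}$, where $c_1 = \sin\phi_1$, $c_2 = \cos\phi_1\sin\phi_2$ and $c_3 = \cos\phi_1\cos\phi_2$. Since A is the identity matrix, $a = c$ and

$$E(b_i) = E\begin{pmatrix} b_1\\ b_2 \end{pmatrix} = \begin{pmatrix} 2a_{00}a_{10}\\ 2a_{00}a_{01} \end{pmatrix}. \tag{24}$$

For K = 2,

$$A = \begin{pmatrix}
1 & 0 & 0 & 1 & 0 & 1\\
0 & 1 & 0 & 0 & 0 & 0\\
0 & 0 & 1 & 0 & 0 & 0\\
1 & 0 & 0 & 3 & 0 & 1\\
0 & 0 & 0 & 0 & 1 & 0\\
1 & 0 & 0 & 1 & 0 & 3
\end{pmatrix}, \tag{25}$$

and $P_2^2(b) = (a_{00} + a_{10}b_1 + a_{01}b_2 + a_{20}b_1^2 + a_{11}b_1b_2 + a_{02}b_2^2)^2$. Let $a = (a_{00}, a_{10}, a_{01}, a_{20}, a_{11}, a_{02})^{T}$ and define $c = (c_1, c_2, c_3, c_4, c_5, c_6)^{T}$, where $c_1 = \sin\phi_1$, $c_2 = \cos\phi_1\sin\phi_2$, $c_3 = \cos\phi_1\cos\phi_2\sin\phi_3$, $\ldots$, $c_6 = \cos\phi_1\cos\phi_2\cdots\cos\phi_5$. Let $c = Ba$; then $a = (B^{T})^{-1}c$ and

$$E(b_i) = E\begin{pmatrix} b_1\\ b_2 \end{pmatrix} = \begin{pmatrix}
2a_{00}a_{10} + 2a_{10}a_{02} + 2a_{01}a_{11} + 6a_{10}a_{20}\\
2a_{00}a_{01} + 2a_{10}a_{11} + 2a_{01}a_{20} + 6a_{01}a_{02}
\end{pmatrix}. \tag{26}$$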
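The constraint handling for K = 2 can be illustrated numerically. The sketch below maps angles to a unit vector c, converts c to polynomial coefficients a satisfying the quadratic constraint, and compares formula (26) with a Gauss-Hermite evaluation of the mean of b under the density proportional to the squared polynomial times the standard bivariate normal density. One assumption to note: the sketch uses the lower-triangular Cholesky factor L of A (A = LLᵀ) together with c = Lᵀa, which also satisfies the constraint but is not necessarily the factor B used by the authors; the function names and angle values are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Moment matrix of (1, b1, b2, b1^2, b1*b2, b2^2) under N(0, I), equation (25).
A_K2 = np.array([
    [1, 0, 0, 1, 0, 1],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 0, 0, 3, 0, 1],
    [0, 0, 0, 0, 1, 0],
    [1, 0, 0, 1, 0, 3],
], dtype=float)

def angles_to_unit_vector(phi):
    """c1 = sin(phi1), c2 = cos(phi1) sin(phi2), ..., last = product of all cosines."""
    phi = np.asarray(phi, dtype=float)
    c = np.empty(phi.size + 1)
    cos_prod = 1.0
    for j, angle in enumerate(phi):
        c[j] = cos_prod * np.sin(angle)
        cos_prod *= np.cos(angle)
    c[-1] = cos_prod
    return c

def coefficients_from_angles(phi, A):
    """Solve L^T a = c with A = L L^T (Cholesky), so that a^T A a = c^T c = 1."""
    chol = np.linalg.cholesky(A)
    return np.linalg.solve(chol.T, angles_to_unit_vector(phi))

def mean_b_formula(a):
    """Equation (26)."""
    a00, a10, a01, a20, a11, a02 = a
    return np.array([
        2 * a00 * a10 + 2 * a10 * a02 + 2 * a01 * a11 + 6 * a10 * a20,
        2 * a00 * a01 + 2 * a10 * a11 + 2 * a01 * a20 + 6 * a01 * a02,
    ])

def mean_b_quadrature(a, n_nodes=10):
    """Total mass and mean of b under P_2(b)^2 * N(b; 0, I), by Gauss-Hermite quadrature."""
    x, w = hermegauss(n_nodes)
    w = w / np.sqrt(2.0 * np.pi)
    b1, b2 = np.meshgrid(x, x, indexing="ij")
    poly = a[0] + a[1] * b1 + a[2] * b2 + a[3] * b1**2 + a[4] * b1 * b2 + a[5] * b2**2
    dens = np.outer(w, w) * poly**2
    return dens.sum(), np.array([(dens * b1).sum(), (dens * b2).sum()])

phi = [0.3, 1.1, -0.4, 0.8, 2.0]            # arbitrary angles
a = coefficients_from_angles(phi, A_K2)
total_mass, mean_quad = mean_b_quadrature(a)
print(a @ A_K2 @ a, total_mass, mean_b_formula(a), mean_quad)
```

Working with unconstrained angles lets a general-purpose optimizer search over the coefficients without handling the quadratic constraint explicitly.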

Appendix C: EM-Aided Algorithm for Variance Estimates

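Define $\theta = (\gamma, \beta, \alpha_1, \alpha_2, \lambda_1, \ldots, \lambda_m)$, the regression parameters of main interest, and let $\zeta$ denote the remaining nuisance parameters in $\Theta$. Let $\hat\theta$ denote the MLE of $\theta$ obtained by the proposed EM algorithm. Moreover, let $\hat\zeta(\theta) = (\hat\sigma_\varepsilon^2(\theta), \hat\rho(\theta), \hat\mu(\theta), \hat\xi(\theta), \hat\phi(\theta))$ denote the maximizer of the observed likelihood function over $\zeta$ with $\theta$ fixed, which can be obtained by an EM algorithm similar to the one proposed. We then calculate the information matrix for $\theta$ as follows:

  1. Perturb the jth component of $\hat\theta$ by a small amount h in both directions, i.e., $\hat\theta_j + h$ and $\hat\theta_j - h$, while keeping the other components of $\hat\theta$ unchanged; denote the resulting parameter vectors by $\hat\theta_{j+}$ and $\hat\theta_{j-}$ respectively, and run the EM algorithm to obtain $\hat\zeta(\hat\theta_{j+})$ and $\hat\zeta(\hat\theta_{j-})$ accordingly.

  2. Compute the jth row of the information matrix of $\theta$ by
    $$\frac{1}{2h}\left[E\left\{l_{C\theta}(\hat\theta_{j+}, \hat\zeta(\hat\theta_{j+})) \,\middle|\, O, \hat\theta_{j+}, \hat\zeta(\hat\theta_{j+})\right\} - E\left\{l_{C\theta}(\hat\theta_{j-}, \hat\zeta(\hat\theta_{j-})) \,\middle|\, O, \hat\theta_{j-}, \hat\zeta(\hat\theta_{j-})\right\}\right],$$

    where $l_{C\theta}(\hat\theta_{j+}, \hat\zeta(\hat\theta_{j+}))$ and $l_{C\theta}(\hat\theta_{j-}, \hat\zeta(\hat\theta_{j-}))$ denote the derivatives of the complete-data log likelihood $l_C(\theta, \zeta)$ with respect to $\theta$ evaluated at $(\hat\theta_{j+}, \hat\zeta(\hat\theta_{j+}))$ and $(\hat\theta_{j-}, \hat\zeta(\hat\theta_{j-}))$ respectively.

The derivation of $l_{C\theta}(\theta, \zeta)$ is straightforward and hence omitted here. For the perturbation size h, [2] suggested using h = c/n, where c is a positive constant. We found that c = 0.15 produced satisfactory results in all of our numerical studies.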
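The perturbation scheme can be illustrated on a toy problem in which the score is available in closed form. In the sketch below, `score_fn` stands in for the conditional expectation of the complete-data score evaluated at a perturbed θ with the nuisance parameters re-maximized; here it is simply the score of an ordinary logistic regression, which is an assumption made to keep the example self-contained. The central difference of the score approximates the Hessian of the log-likelihood, and the sketch reports its negative as the information matrix; that sign convention is the usual one and is an assumption about how the formula in step 2 is meant to be read. The perturbation size h = c/n with c = 0.15 follows the text.

```python
import numpy as np

def information_by_perturbation(score_fn, theta_hat, n, c=0.15):
    """Central-difference information matrix with perturbation size h = c / n."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    h = c / n
    p = theta_hat.size
    info = np.empty((p, p))
    for j in range(p):
        step = np.zeros(p)
        step[j] = h
        diff = score_fn(theta_hat + step) - score_fn(theta_hat - step)
        info[j] = -diff / (2.0 * h)       # row j: minus the derivative of the score
    return 0.5 * (info + info.T)          # symmetrize away small numerical asymmetries

def logistic_score(theta, X, y):
    """Score of a plain logistic regression (toy stand-in for the EM-based score)."""
    prob = 1.0 / (1.0 + np.exp(-X @ theta))
    return X.T @ (y - prob)

# Toy data (made up for illustration).
rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
theta = np.array([0.8, -1.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ theta)))

info = information_by_perturbation(lambda th: logistic_score(th, X, y), theta, n)
std_errors = np.sqrt(np.diag(np.linalg.inv(info)))
print(info, std_errors)
```

In the joint model each call to the score function would require rerunning the EM algorithm over the nuisance parameters, so the cost grows with the dimension of θ.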

Contributor Information

Song Yan, Email: syan@ncsu.edu, Department of Statistics, North Carolina State University, Raleigh, NC 27695, U.S.A.

Daowen Zhang, Department of Statistics, North Carolina State University, Raleigh, NC 27695, U.S.A.

Wenbin Lu, Department of Statistics, North Carolina State University, Raleigh, NC 27695, U.S.A.

James A. Grifo, NYU Fertility Center, NYU Langone Medical Center, New York, NY 10016, U.S.A

Mengling Liu, Division of Biostatistics, School of Medicine, New York University, New York, NY 10016, U.S.A.

References

  1. Bjercke S, Tanbo T, Dale PO, Morkrid L, Abyholm T. Human chorionic gonadotrophin concentrations in early pregnancy after in-vitro fertilization. Human Reproduction. 1999;14:1642–1646. doi: 10.1093/humrep/14.6.1642.
  2. Chen HY, Little RJA. Proportional hazards regression with missing covariates. Journal of the American Statistical Association. 1999;94:896–908.
  3. Chen J, Zhang D, Davidian M. A Monte Carlo EM algorithm for generalized linear mixed models with flexible random effects distribution. Biostatistics. 2001;1:1–27. doi: 10.1093/biostatistics/3.3.347.
  4. Chung K, Sammel MD, Coutifaris C, Chalian R, Lin K, Castelbaum AJ, Freedman MF, Barnhart KT. Defining the rise of serum hCG in viable pregnancies achieved through use of IVF. Human Reproduction. 2006;21:823–828. doi: 10.1093/humrep/dei389.
  5. Confino E, Demir RH, Friberg J, Gleicher N. The predictive value of hCG β subunit levels in pregnancies achieved by in vitro fertilization and embryo transfer: an international collaborative study. Fertility and Sterility. 1986;45:526–531.
  6. Davidian M, Gallant AR. The nonlinear mixed effects model with a smooth random effects density. Biometrika. 1993;80:475–488.
  7. Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B. 1977;39:1–38.
  8. Gallant AR, Nychka DW. Seminonparametric maximum likelihood estimation. Econometrica. 1987;55:363–390.
  9. Glatstein IZ, Hornstein MD, Kahana MJ, Jackson KV, Friedman AJ. The predictive value of discriminatory human chorionic gonadotropin levels in the diagnosis of implantation outcome in vitro fertilization cycles. Fertility and Sterility. 1995;63:350–356. doi: 10.1016/s0015-0282(16)57367-1.
  10. Henderson R, Diggle P, Dobson A. Joint modelling of longitudinal measurements and event time data. Biostatistics. 2000;1:465–480. doi: 10.1093/biostatistics/1.4.465.
  11. Hogan J, Laird N. Model-based approaches to analysing incomplete longitudinal and failure time data. Statistics in Medicine. 1997;16:259–272. doi: 10.1002/(sici)1097-0258(19970215)16:3<259::aid-sim484>3.0.co;2-s.
  12. Lambers MJ, Weering HGIV, Grunewold MDV, Lambalk CB, Homburg R, Schats R, Hopes PGA. Optimizing hCG cut-off values: A single determination on day 14 or 15 is sufficient for a reliable prediction of pregnancy outcome. European Journal of Obstetrics and Gynecology and Reproductive Biology. 2006;127:94–98. doi: 10.1016/j.ejogrb.2005.12.023.
  13. Li E, Zhang D, Davidian M. Conditional estimation for generalized linear models when covariates are subject-specific parameters in a mixed model for longitudinal measurements. Biometrics. 2004;60:1–7. doi: 10.1111/j.0006-341X.2004.00170.x.
  14. Louis TA. Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society, Series B. 1982;44:226–233.
  15. Meilijson E. A fast improvement to the EM algorithm on its own terms. Journal of the Royal Statistical Society, Series B. 1989;51:127–138.
  16. Shamonki M, Frattarelli JL, Bergh PA, Scott RT. Logarithmic curves depicting initial level and rise of serum beta human chorionic gonadotropin and live delivery outcomes with in vitro fertilization: an analysis of 6021 pregnancies. Fertility and Sterility. 2009;91:1760–1764. doi: 10.1016/j.fertnstert.2008.02.171.
  17. Strandell A, Thorburn J, Hamberger L. Risk factors for ectopic pregnancy in assisted reproduction. Fertility and Sterility. 1999;71:282–286. doi: 10.1016/s0015-0282(98)00441-5.
  18. Song X, Davidian M, Tsiatis AA. A semiparametric likelihood approach to joint modeling of longitudinal and time-to-event data. Biometrics. 2002;58:742–753. doi: 10.1111/j.0006-341x.2002.00742.x.
  19. Tsiatis AA, Davidian M. A semiparametric estimator for the proportional hazards model with longitudinal covariates measured with error. Biometrika. 2001;88:447–458. doi: 10.1093/biostatistics/3.4.511.
  20. Vonesh EF, Greene T, Schluchter MD. Shared parameter models for the joint analysis of longitudinal data and event times. Statistics in Medicine. 2006;25:143–163. doi: 10.1002/sim.2249.
  21. Wang CY, Wang N, Wang S. Regression analysis when covariates are regression parameters of a random-effects model for observed longitudinal measurements. Biometrics. 2000;56:487–495. doi: 10.1111/j.0006-341x.2000.00487.x.
  22. Wulfsohn MS, Tsiatis AA. A joint model for survival and longitudinal data measured with error. Biometrics. 1997;53:330–339.
  23. Zeng D, Cai J. Simultaneous modeling of survival and longitudinal data with an application to repeated quality of life measures. Lifetime Data Analysis. 2005;11:151–74. doi: 10.1007/s10985-004-0381-0.
  24. Zhang D, Davidian M. Linear mixed models with flexible distributions of random effects for longitudinal data. Biometrics. 2001;57:795–802. doi: 10.1111/j.0006-341x.2001.00795.x.
