Author manuscript; available in PMC: 2019 Jan 1.
Published in final edited form as: Stat Methods Med Res. 2016 Jan 5;27(1):126–141. doi: 10.1177/0962280215623583

Probability of Atrial Fibrillation after Ablation: Using a Parametric NonLinear Temporal Decomposition Mixed Effects Model

Jeevanantham Rajeswaran a,*, Eugene H Blackstone a, John Ehrlinger a, Liang Li b, Hemant Ishwaran c, Michael K Parides d
PMCID: PMC5633490  NIHMSID: NIHMS867851  PMID: 26740575

Abstract

Atrial fibrillation is an arrhythmic disorder in which the electrical signals of the heart become irregular. The probability of atrial fibrillation (a binary response) often varies over time in a structured fashion, as does the influence of associated risk factors. We present a generalized nonlinear mixed effects model that estimates the time-related probability of atrial fibrillation using a temporal decomposition approach, which reveals the pattern of the probability of atrial fibrillation and its determinants. This methodology generalizes to patient-specific analysis of longitudinal binary data with possibly time-varying covariate effects and with different patient-specific random effects influencing different temporal phases. The motivation for and application of this model are illustrated using longitudinally measured atrial fibrillation data obtained through weekly trans-telephonic monitoring in an NIH-sponsored clinical trial being conducted by the Cardiothoracic Surgery Clinical Trials Network.

Keywords: binary longitudinal response, nonlinear model, mixed effects model, temporal decomposition, multiphase model, time varying coefficient

1 Introduction

Arrhythmias are cardiac disorders affecting the regular rhythmic beating of the heart. In atrial fibrillation, one type of arrhythmia, different parts of the atria emit uncoordinated electrical signals. This irregularity causes the heart to beat unevenly and too fast, and it also prevents the heart from contracting fully. An estimated 2.5 million Americans are living with atrial fibrillation, and estimates indicate that as many as 12 million people will have the condition by 2050 (Lloyd-Jones et al. [1]), making it the most common serious heart rhythm abnormality. Although atrial fibrillation is not life threatening, if left untreated it may lead to serious heart-related problems such as stroke or congestive heart failure. Traditionally, atrial fibrillation has been treated medically with aspirin or warfarin. More recently, surgical intervention or catheter ablation has gained widespread acceptance, particularly in patients undergoing concomitant cardiac surgery.

The pathogenesis of atrial fibrillation is incompletely understood, and its mechanism(s) vary among affected individuals. The mechanisms are probably more complex than the discrete, well-characterized causes of most other arrhythmias (Gillinov [2]). With this in mind, we focus on assessing the time-varying probability of atrial fibrillation as a binary response and endeavor to identify patient risk factors whose influence may also vary over time.

1.1 Literature Review

Longitudinal methods have been widely used in medicine and epidemiology to study patterns in time-varying variables, such as disease progression or trends in health status. In observational studies, one often encounters unbalanced longitudinal data, in which each subject can have a different number of measurements taken at different time points. In such situations, the correlation among the observations within a subject can be accounted for by unobservable random effects. Further, with patient data, an important aim is also to study patient-specific profiles to improve the quality of patient management. Laird and Ware [3] introduced linear random effects (mixed effects) models for analyzing continuous response data. However, the temporal progression of most biological events or biomarkers is nonlinear (Muller and Rosner [4], Mikulich et al. [5], Wu and Ding [6]). A thorough overview of the research literature on nonlinear mixed effects modeling, especially with a continuous response, can be found in Davidian and Giltinan [7] and Vonesh and Chinchilli [8]. Because the link functions used for non-normal responses are already nonlinear, most of the nonlinear mixed effects modeling literature for such responses still involves linear predictors. Comprehensive overviews of generalized linear and nonlinear mixed effects modeling for non-normal responses can be found in Molenberghs and Verbeke [9] and Vonesh [10].

Nonlinear longitudinal models have been widely used to model time-varying clinical data. Most modeling approaches are nonparametric; for example, an intercept coefficient may be modeled as a function of time using a cubic spline (Guo [11]). “Compartmental” or “multi-phase” models applied to pharmacokinetic data (Pinheiro [12], Vonesh [13], Molenberghs and Verbeke [9][ch.20]), the bi-phase exponential decay model proposed by Wu [14] to fit the temporal trend of viral load in an AIDS study, and the multiphase model proposed by Rajeswaran and Blackstone [15] to fit the temporal trend of longitudinal continuous lung function data are examples of parametric nonlinear models in which random effects enter the model nonlinearly. Faes et al. [16] used a nonlinear mixed effects model for binary response data.

Another important aspect of the statistical analysis of longitudinal data is evaluating the effects of covariates whose influence may change with time, that is, covariates with time-varying coefficients. Many time-varying coefficient models use nonparametric approaches (Wu and Yu [17], Wu and Zhang [18], Senturk and Muller [19]), and most of them were proposed for continuous longitudinal responses. Here again, the coefficient of each covariate is modeled as a function of time. However, when the number of covariates in a model is large, as is the case in most observational studies, this approach becomes computationally expensive. In our model, we identify a set of covariates for each time phase, and the influence of these covariates is modulated by the corresponding nonlinear time function. The proposed model for binary longitudinal data extends the nonlinear mixed effects model for continuous data proposed by Rajeswaran and Blackstone [15].

1.2 Contribution and Outline

In this paper, we present a parametric nonlinear mixed effects model for longitudinal binary data with two major aims: (i) to use multiple, “overlapping” nonlinear functions of time to explicitly identify the time-varying odds/probability of a longitudinal binary response; and (ii) to identify patient risk factors whose influence on the binary response may or may not change with time.

In our binary response model, random effects enter the model nonlinearly, and the model can be extended to other response types by changing the link function. The rest of the paper is organized as follows:

In Section 2 we detail the Cardiothoracic Surgery Clinical Trials Network (CTSN) data that motivated the development of the proposed nonlinear mixed effects temporal decomposition model. In Section 3, we introduce a logistic nonlinear mixed effects model. In Section 4, we discuss the model parameter estimation process. We then demonstrate the application of this model using the longitudinal binary outcome for atrial fibrillation data in Section 5 and in Section 6 we perform a simulation study to assess the model performance. Concluding remarks are given in Section 7.

2 The Atrial Fibrillation Study

We investigate data obtained from an NIH sponsored multicenter randomized clinical trial being conducted by the CTSN involving two hundred and fourteen patients enrolled from January 2010 to July 2013. All patients with non-paroxysmal atrial fibrillation undergoing mitral valve procedure were eligible for this trial. Details of the study design are given in Gillinov et al. [20].

The presence of atrial fibrillation or normal sinus rhythm was assessed by weekly trans-telephonic monitor (TTM) recording. Patients were asked to transmit rhythm data over ordinary telephone lines every week for 12 months. Patients who did not transmit on their weekly date were contacted by a research nurse, and rhythms were obtained at that time. All rhythm strips were adjudicated by a research nurse to classify the rhythm as atrial fibrillation (AFIB) or normal sinus rhythm (NSR). A total of 6709 rhythm records were available for the 214 patients, and 20% of the patients had 50 or more records. The frequency plot in Figure 1 shows the number of patients against the number of TTM recordings per patient.

Figure 1.

Frequency of patients with number of post operative Atrial Fibrillation TTM recordings.

Based on the trial protocol, each patient should have at most 52 binary measurements (AFIB: yes/no). However, at the time of data extraction, we found that some patients had not completed the full 1-year follow-up, some had transmitted more frequently than weekly, and some less frequently. We also note that most rhythm data were not obtained at exact weekly intervals. Hence, as is typical of observational studies, each patient may have a different number of longitudinal measurements at varying points in time. This is illustrated in Figure 2, which shows repeated measures of binary TTM response data for 40 patients randomly selected from the full cohort of 214.

Figure 2.

Repeated rhythm (binary) data for 40 randomly selected patients in the descending order of date of surgery (from most recent at the top). Repeated rhythm data is shown horizontally with one row for each patient. Symbols depict normal sinus rhythm (NSR) (circle) and atrial fibrillation (AFIB) (cross) as binary indicators.

We use a crude binned averaging procedure to investigate the shape of the time-varying probability of atrial fibrillation. Figure 3 is constructed by partitioning follow-up times into a number of disjoint intervals and taking the mean of the binary responses (the observed proportion of AFIB) within each interval. A loess nonparametric smoother is then used to smooth the probability curve. Note that this binned averaging procedure does not take the repeated-measures nature of the data into account.
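The binning step can be sketched as follows. This is a minimal illustration, not the authors' code: `binned_average` is a hypothetical helper that pools all records, assigns each follow-up time to a half-open interval, and averages the binary AFIB indicators. The loess smoothing step is omitted, and, like the procedure in the text, the sketch ignores within-patient correlation.

```python
def binned_average(times, responses, edges):
    """Pool all (time, 0/1) records and average the responses within each
    half-open bin [edges[k], edges[k+1]).  Within-patient correlation is ignored."""
    if len(times) != len(responses):
        raise ValueError("times and responses must have the same length")
    n_bins = len(edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for t, y in zip(times, responses):
        for k in range(n_bins):
            if edges[k] <= t < edges[k + 1]:
                sums[k] += y
                counts[k] += 1
                break  # each record belongs to at most one bin
    midpoints = [(edges[k] + edges[k + 1]) / 2 for k in range(n_bins)]
    # Empty bins have no estimate.
    means = [s / c if c else None for s, c in zip(sums, counts)]
    return midpoints, means


# Usage with made-up records (time in months, 1 = AFIB):
mids, probs = binned_average([0.5, 1.5, 1.7, 3.0], [1, 0, 1, 0], [0, 1, 2, 4])
```

Each bin mean is simply the observed proportion of AFIB records in that interval; patients with many records in a bin weigh more heavily than patients with few.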

Figure 3.

Probability of Atrial Fibrillation based on the binned smoothers.

Figure 3 indicates a higher probability of AFIB immediately after ablation that peaks around week two. The probability then decreases gradually to about 50% by 6 months and appears to decrease only slightly thereafter. We interpret this as the odds of atrial fibrillation peaking around two weeks after the procedure and decreasing gradually until 6 months post procedure, after which the odds stay relatively constant or increase, if at all, at a slower rate. This suggests two phases of odds: an early peaking phase followed by a constant or increasing phase of odds of AFIB.

To investigate the association between baseline covariates and the odds/probability of AFIB, Figure 4 shows the trend of the probability of AFIB over time, similar to Figure 3, stratified by selected baseline variables. There is no appreciable difference in the probability of AFIB between age groups split at 75 years. Further, while the BMI, diabetes and congestive heart failure (CHF) groupings have little effect on the early probability, there is a large difference in the later probability of AFIB. The figure thus supports the hypothesis that the effect of some risk factors changes over time. With Figures 3 and 4 as motivation, in Section 3 we propose a logistic nonlinear mixed effects model to identify the components of the temporal decomposition of the nonlinear trend and the risk factors whose effects are modulated by these trends.

Figure 4.

Probability of AFIB based on the binned smoothers stratified by baseline covariates.

3 A Logistic Nonlinear Mixed Effects Model

Let $Y_{ij}$ be a binary response observed at time $t_{ij}$ ($j = 1, \ldots, k_i$) for the $i$th subject ($i = 1, \ldots, n$), each subject having an associated covariate vector $X_i$ of length $p$. We define the conditional probability $\pi_{ij} = E(Y_{ij} \mid \mathbf{u}_i)$, where $\mathbf{u}_i$ is a subject-specific vector of random effects. Suppose the time-varying odds for subject $i$ can be decomposed into overlapping time phases, each modulated by a possibly different set of covariates $X_{il}$ with corresponding regression coefficients $\beta_l$ ($l = 1, \ldots, \mathcal{L}$). We then write a nonlinear mixed effects model in the odds domain as follows:

$$\frac{\pi_{ij}}{1 - \pi_{ij}} = \sum_{l=1}^{\mathcal{L}} \Psi_l(X_{il}, \beta_l, u_{il})\, \Lambda_l(t_{ij}, \Gamma_l), \qquad (1)$$

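where $\Lambda_l(t_{ij}, \Gamma_l) > 0$ is a flexible parametric function depending only on time $t$ and a shaping parameter vector $\Gamma_l = (\eta, \gamma, t_{1/2})$, and $\Psi_l(X_{il}, \beta_l, u_{il}) = \exp\{X_{il}'\beta_l + u_{il}\}$ is a log-linear mixed effects model for phase $l$. Here $\mathbf{u}_i = (u_{i1}, \ldots, u_{i\mathcal{L}})' \sim N(\mathbf{0}, \Sigma)$ is a vector of phase-specific random effects for subject $i$, and, conditional on $\mathbf{u}_i$, the responses $Y_{ij} \mid \mathbf{u}_i \sim \text{Binary}(\pi_{ij})$ are independent.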

Formulation of Model (1) indicates that the multiple overlapping time phases of risk are additive in the odds domain, with each phase shaped by a function of time Λl(t, Γl) and scaled by a multiplicative function of concomitant information Ψl(Xil, βl, uil). This is similar to a formulation of a dynamic multiplicative-additive regression model to analyze survival data, as detailed in Martinussen and Scheike [21].

Further, by introducing phase-specific random effects, we allow for different variability within different time phases. We parameterize the random effects for patient $i$ so that they differ proportionally across phases: $\mathbf{u}_i = (a_1 u_i, \ldots, a_{\mathcal{L}} u_i)'$, where $a_l$ is the random-effect coefficient for phase $l$ and $u_i \sim N(0, \sigma^2)$. Hence, the variance of the random effect within time phase $l$ is $a_l^2 \sigma^2$; for identifiability, we set $a_1 = 1$. If $a_l = 1$ for all $l \in \{1, \ldots, \mathcal{L}\}$, model (1) reduces to a simple random intercept model with one common random effect.

If a variable (or set of variables) $X_{ic}$ has a coefficient vector $\beta_c$ common to all phases, model (1) can be simplified further, and the proposed logistic nonlinear mixed effects model is given by

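$$\operatorname{logit}(\pi_{ij}) = \log\left(\frac{\pi_{ij}}{1 - \pi_{ij}}\right) = X_{ic}'\beta_c + \log\left(\sum_{l=1}^{\mathcal{L}} \Psi_l(X_{il}, \beta_l, a_l, u_i)\, \Lambda_l(t_{ij}, \Gamma_l)\right). \qquad (2)$$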
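The following minimal sketch shows how model (2) combines covariates, a random effect and the phase time functions into a conditional probability. The function names, argument layout and toy time functions are hypothetical and are not part of the paper or of PROC NLMIXED. The only modeling content assumed is $\Psi_l = \exp(X_{il}'\beta_l + a_l u_i)$ and the additive-odds structure of (1) and (2).

```python
import math


def conditional_probability(t, u, common_lp, phases):
    """pi_ij under model (2) for one subject at time t.

    common_lp : X_ic' beta_c, the linear predictor shared by all phases
    phases    : list of (phase_lp, a_l, time_fn) with phase_lp = X_il' beta_l,
                a_l the random-effect coefficient, time_fn(t) = Lambda_l(t, Gamma_l) > 0
    u         : the subject's scalar random effect u_i ~ N(0, sigma^2)
    """
    # Additive odds across phases, each scaled by Psi_l = exp(X_il' beta_l + a_l * u).
    phase_odds = sum(math.exp(lp + a * u) * fn(t) for lp, a, fn in phases)
    eta = common_lp + math.log(phase_odds)   # logit(pi_ij)
    return 1.0 / (1.0 + math.exp(-eta))


def logit(p):
    return math.log(p / (1.0 - p))


# Hypothetical time functions standing in for Lambda_1 (peaking) and Lambda_2 (increasing).
early = lambda t: t * math.exp(-t)
late = lambda t: 0.1 * t

phases = [(0.2, 1.0, early), (-0.5, 1.0, late)]   # a_1 = a_2 = 1
shift = logit(conditional_probability(2.0, 0.7, 0.3, phases)) - logit(
    conditional_probability(2.0, 0.0, 0.3, phases))
```

With $a_l = 1$ for every phase, moving $u$ by some amount moves the logit by exactly that amount, which is the random intercept special case noted above. With unequal $a_l$, the random effect reshapes the phases relative to one another rather than shifting the whole curve.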

3.1 Time function Λ(t, Γ)

We use a generic family of nonlinear functions of time that was originally used to model the cumulative mortality by Blackstone et al. [22] and Hazelrig et al. [23] as the time function Λ(t, Γ) in our model. The generic family is given by,

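$$G(t, \Gamma) = \frac{|\eta| - \eta}{2|\eta|} + \frac{\eta}{|\eta|}\left[1 + \phi(\gamma)\left(\frac{|\gamma| - \gamma}{2|\gamma|} + \frac{|\eta|\, t}{\rho}\right)^{-1/\eta}\right]^{-1/\gamma}, \qquad (3)$$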

where γ > 0 and/or η > 0, with ϕ(γ) = γ if γ > 0 and ϕ(γ) = −1 if γ < 0. The shaping parameter vector is Γ = (γ, η, $t_{1/2}$), and ρ is a function of $t_{1/2}$, γ and η. We define the parameter $t_{1/2}$ as the time point t such that $G(t_{1/2}, \Gamma) = 1/2$. The natural constraints on G are that $G(0, \Gamma) = 0$ and $G(t, \Gamma) \to 1$ as $t \to \infty$. When γ < 0 and η < 0, $G(0, \Gamma) \neq 0$, which violates the first constraint; hence (3) is not defined for γ < 0 and η < 0. The formulation (3) therefore reduces to three cases: γ > 0 and η > 0; γ > 0 and η < 0; and γ < 0 and η > 0. See Online Appendix A for further details.
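The generic family (3) can be evaluated directly. The sketch below transcribes (3) as written above, reading ϕ(γ) = −1 for γ < 0; with that reading, $G(0, \Gamma) = 0$ holds in all three admissible cases. It is an illustration, not the authors' software. Because the expression for ρ in terms of $t_{1/2}$, γ and η is given in Online Appendix A and not here, the hypothetical helper `solve_rho` finds ρ numerically so that $G(t_{1/2}) = 1/2$. It relies on the fact that G depends on t only through t/ρ and increases in t.

```python
import math


def G(t, eta, gamma, rho):
    """Generic family (3) for t >= 0, eta != 0, gamma != 0, not both negative."""
    if eta < 0 and gamma < 0:
        raise ValueError("(3) is not defined when eta < 0 and gamma < 0")
    if t <= 0:
        return 0.0                                     # natural constraint G(0) = 0
    phi = gamma if gamma > 0 else -1.0                 # phi(gamma)
    lead = (abs(eta) - eta) / (2 * abs(eta))           # 0 if eta > 0, 1 if eta < 0
    sign = eta / abs(eta)                              # +1 or -1
    shift = (abs(gamma) - gamma) / (2 * abs(gamma))    # 0 if gamma > 0, 1 if gamma < 0
    inner = (shift + abs(eta) * t / rho) ** (-1.0 / eta)
    return lead + sign * (1.0 + phi * inner) ** (-1.0 / gamma)


def solve_rho(t_half, eta, gamma, lo=1e-6, hi=1e6, iters=200):
    """Bisection (on a log scale) for rho such that G(t_half) = 1/2.
    G depends on t only through t/rho and increases in t, so it decreases in rho."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if G(t_half, eta, gamma, mid) > 0.5:
            lo = mid   # more than half the mass reached by t_half: rho must grow
        else:
            hi = mid
    return math.sqrt(lo * hi)


rho = solve_rho(2.0, 1.0, 0.5)   # eta = 1, gamma = 0.5, t_1/2 = 2 (illustrative values)
```

A numerical solve keeps the sketch uniform across the three cases. A closed form, where one exists, would be cheaper and is what an implementation would normally use.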

Our motivation for this type of time function is that the time-varying odds of AFIB in Figure 3 have a shape similar to that of the time-varying hazard of death after cardiac surgery. Further, this generic family can accommodate almost any shape. The function $G(t, \Gamma)$ or any of its transformations can be used as $\Lambda_l(t, \Gamma_l)$. In our data analysis experience, the most frequently used early and late phase functions are $g(t, \Gamma) = \partial G(t, \Gamma)/\partial t$ and $h(t, \Gamma) = g(t, \Gamma)/\{1 - G(t, \Gamma)\}$, respectively. In survival terminology, if $G(t, \Gamma)$ is a cumulative hazard, $g(t, \Gamma)$ is the hazard; if instead $G(t, \Gamma)$ is a cumulative distribution function, $h(t, \Gamma)$ is the hazard function. In our experience, most nonlinear trends can be modeled using only two phases, and we very rarely use three. Detailed descriptions of the equations, their limiting behavior and the shapes of some possible functions are given in Online Appendix A.
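Letting γ → 0+ in (3) with η > 0 gives, on our reading of (3), $G(t) = \exp\{-(\eta t/\rho)^{-1/\eta}\}$, because $[1 + \gamma x]^{-1/\gamma} \to \exp(-x)$. The sketch below implements this assumed limiting case together with $g = \partial G/\partial t$ and $h = g/(1 - G)$. It solves $G(t_{1/2}) = 1/2$ in closed form as $\rho = \eta\, t_{1/2} (\ln 2)^{\eta}$. As a consistency check, η = 1 and $t_{1/2} = 1.05$ give ρ ≈ 0.73 and $g(t) = 0.73\, t^{-2} \exp(-0.73/t)$, which matches the early-phase term of the fitted equation (6). All function names are hypothetical.

```python
import math


def G0(t, eta, rho):
    """Assumed gamma -> 0+ limit of (3) for eta > 0: G(t) = exp(-(eta*t/rho)**(-1/eta))."""
    return math.exp(-((eta * t / rho) ** (-1.0 / eta)))


def g0(t, eta, rho):
    """Early-phase shape g = dG/dt (closed-form derivative of G0)."""
    x = (eta * t / rho) ** (-1.0 / eta)
    return x / (eta * t) * math.exp(-x)


def h0(t, eta, rho):
    """Late-phase shape h = g / (1 - G)."""
    return g0(t, eta, rho) / (1.0 - G0(t, eta, rho))


def rho_from_half(t_half, eta):
    """rho such that G0(t_half) = 1/2."""
    return eta * t_half * math.log(2.0) ** eta


# Early phase of Table 1: eta = 1, t_1/2 = 1.05  ->  rho ~= 0.73, as in equation (6).
rho_early = rho_from_half(1.05, 1.0)
```

The derivative follows from $dx/dt = -x/(\eta t)$ with $x = (\eta t/\rho)^{-1/\eta}$. That is why `g0` needs no numerical differentiation.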

Development of the generic equation (3) took place over more than a decade in the 1970s, and represents a differential equation that Blackstone and colleagues ([23], [24]) formed from apparently disparate dynamic models of biochemical reaction rates, physical laws of thermodynamics, allometric growth, ecologic predator-prey phenomena, and population growth. They also established the relationship to certain statistical models. Special case models are found by setting exponents to ±1 and 0/∞ that lead to linear and various non-linear models, all nested by dint of the common generic differential equation.

Thus the models not only simplify (usually substantially) but also have robust statistical properties that are useful in nonlinear iterative unbounded optimization. The idea of multiple phases arose from the Makeham-Gompertz law of mortality (1860), which has an age-dependent and an age-independent component. The 3-phase model for time-to-event responses proposed by Blackstone et al. [22] permits multiple streams of concomitant information (including the same variables) to be estimated simultaneously. In the present paper, we extend the approach to longitudinal responses, with no limit on the number of phases. This permits us to characterize the temporal pattern and then to modulate it independently by phase-specific risk factors, as well as by factors affecting the entire course.

Further, the present formulation for longitudinal data differs from the one developed for survival data (Blackstone et al. [22]) in that we use a single flexible component (Equation 3) for all phases: by setting the exponents to various values, the equation reduces to a constant, to a Weibull-type increasing function, to a decreasing or peaking function, or to combinations of these. These are the shapes we frequently observe in the temporal trends of biomarkers or biologic mechanisms, such as AF, after cardiac surgery ([25], [26], [27], [28], [29]).

4 Estimation

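Shaping parameters and concomitant information coefficients are estimated by the method of marginal maximum likelihood (Diggle et al. [30][p.172]). Let $\beta = (\beta_c, \beta_1, \ldots, \beta_{\mathcal{L}})$, $a = (a_1, \ldots, a_{\mathcal{L}})$ and $\Gamma = (\Gamma_1, \ldots, \Gamma_{\mathcal{L}})$. The likelihood function for the unknown parameter $\delta = (\beta, a, \Gamma, \sigma^2)$ is then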

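$$L(\delta \mid \mathbf{y}) = \prod_{i=1}^{n} \int f(\mathbf{y}_i \mid u_i)\, f(u_i; \sigma^2)\, du_i = \prod_{i=1}^{n} \int \prod_{j=1}^{k_i} \pi_{ij}^{y_{ij}} (1 - \pi_{ij})^{1 - y_{ij}}\, \phi(u_i \mid 0, \sigma^2)\, du_i, \qquad (4)$$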

where $f(\mathbf{y}_i \mid u_i)$ is the conditional density of the binary longitudinal responses and $f(u_i; \sigma^2) = \phi(u_i \mid 0, \sigma^2)$ is the density of the random effect $u_i$, assumed to be normal with mean 0 and variance $\sigma^2$. Note that (4) is simply the marginal distribution of $Y$, obtained by integrating the joint distribution of $Y$ and $U$ with respect to $U$; the maximum likelihood estimates are obtained from this marginal likelihood. For the logistic mixed effects model (2), the conditional mean is a nonlinear function of the random effect, so (4) has no closed-form solution. Apart from some special cases, this is true in general, and numerical integration is required. For example, by assuming a beta distribution for the random effect, Kleinman [31] obtained a closed-form solution, the beta-binomial distribution.

Several estimation approaches are available. McCulloch [32] provides details on using an extension of the EM algorithm for parameter estimation in generalized linear mixed effects models. The Laplace approximation is another popular method for nonlinear mixed effects models, although Joe [33] urged caution in using it for non-normal responses. For generalized linear mixed effects models, another approach is to use a Taylor series expansion around the regression coefficients and/or the random effects (see, for example, Breslow and Clayton [34]). In general, both “exact” and “approximate” methods are available. The “approximate” methods, such as Taylor series or Laplace approximations, avoid integration, but for non-normal responses they may lead to biased estimates (see, for example, Wu [35][Ch 2]); higher-order approximations have been proposed to improve the estimation (see, for example, Lee et al. [36]). We use Gauss-Hermite quadrature, an “exact” method, to integrate (4) with respect to the random effect and obtain the marginal likelihood function.

However, this method is computationally expensive and can become infeasible when the dimension of the random effect is large. In our model, we keep the dimension of the random effect at 1. We implemented the parameter estimation using PROC NLMIXED (SAS®, Inc., Cary, NC), whose implementation is mainly based on Pinheiro and Bates [12].
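The Gauss-Hermite step can be made concrete with a sketch that integrates one subject's likelihood contribution in (4) over a single normal random effect. It is a plain (non-adaptive) quadrature written with the standard library only. `hermite_gauss` and `marginal_likelihood_subject` are hypothetical helpers rather than the PROC NLMIXED implementation, and `prob_given_u(j, u)` stands in for the model's $\pi_{ij}$ evaluated at random effect u. The change of variables $u = \sqrt{2}\,\sigma x$ turns the normal density into the Gauss-Hermite weight $\exp(-x^2)$.

```python
import math


def hermite_gauss(n):
    """Nodes and weights for integrals of f(x) * exp(-x**2) over the real line.

    Roots of the degree-n Hermite polynomial are bracketed on a fine grid and refined
    by bisection; weights use w_k = 1 / (n * p_{n-1}(x_k)**2) for orthonormal p.
    """
    def orthonormal(x):
        # Three-term recurrence for orthonormal Hermite polynomials; returns (p_n, p_{n-1}).
        p_prev, p = 0.0, math.pi ** -0.25
        for j in range(1, n + 1):
            p_prev, p = p, x * math.sqrt(2.0 / j) * p - math.sqrt((j - 1) / j) * p_prev
        return p, p_prev

    half = math.sqrt(2 * n + 1) + 1.0          # all roots lie inside (-half, half)
    m = 20 * n + 2                             # even count, so x = 0 is never a grid point
    grid = [-half + 2 * half * k / (m - 1) for k in range(m)]
    vals = [orthonormal(x)[0] for x in grid]
    nodes = []
    for k in range(m - 1):
        if vals[k] * vals[k + 1] < 0:          # sign change brackets exactly one root
            a, b, fa = grid[k], grid[k + 1], vals[k]
            for _ in range(100):
                mid = 0.5 * (a + b)
                fm = orthonormal(mid)[0]
                if fa * fm <= 0:
                    b = mid
                else:
                    a, fa = mid, fm
            nodes.append(0.5 * (a + b))
    if len(nodes) != n:
        raise RuntimeError("failed to bracket all Hermite roots")
    weights = [1.0 / (n * orthonormal(x)[1] ** 2) for x in nodes]
    return nodes, weights


def marginal_likelihood_subject(y, prob_given_u, sigma, n_nodes=20):
    """Integrate prod_j pi_j(u)^y_j (1 - pi_j(u))^(1 - y_j) against N(0, sigma^2)."""
    nodes, weights = hermite_gauss(n_nodes)
    total = 0.0
    for x, w in zip(nodes, weights):
        u = math.sqrt(2.0) * sigma * x         # change of variables u = sqrt(2) sigma x
        lik = 1.0
        for j, yj in enumerate(y):
            p = prob_given_u(j, u)
            lik *= p if yj == 1 else (1.0 - p)
        total += w * lik
    return total / math.sqrt(math.pi)


# Usage: three records for one subject with a toy logistic random intercept.
lik = marginal_likelihood_subject([1, 1, 0], lambda j, u: 1.0 / (1.0 + math.exp(-(0.5 + u))), 3.0)
```

The full log-likelihood is the sum of the logarithms of these subject contributions. An optimizer would then maximize it over δ, which this sketch does not attempt.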

Using (2), we can estimate the conditional probability $E(\hat Y_{ij} \mid u_i) = \pi_{ij}(\hat u_i)$. If one is also interested in the marginal probability $E(\hat Y_j)$, one can integrate the conditional probability over the distribution of the random effect $u_i$ (Fitzmaurice et al. [37]) to obtain the marginal estimates:

$$E(\hat Y_j) = E_u\{E(\hat Y_{ij} \mid u_i)\} = \int \pi_{ij}(u_i)\, f(u_i; \sigma^2)\, du_i,$$

where $f(u_i; \sigma^2)$ is the density function of $u_i$. Alternatively, one can approximate $E(\hat Y_j) \approx \frac{1}{n}\sum_{i=1}^{n} E(\hat Y_{ij} \mid \hat u_i)$, the average of the patient-specific profiles.
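The gap between the conditional probability for a typical patient ($u_i = 0$) and the marginal probability can be illustrated numerically. The sketch below is an illustration under stated assumptions rather than the authors' code. It integrates $\pi_{ij}(u)$ against a $N(0, \sigma^2)$ density with a simple midpoint rule truncated at ±8σ, and it computes the plug-in average over a list of (hypothetical) estimated random effects $\hat u_i$. The function names are illustrative.

```python
import math


def expit(x):
    """Numerically stable inverse logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def marginal_probability(logit_fn, sigma, half_width=8.0, steps=4000):
    """E_u[expit(logit_fn(u))] for u ~ N(0, sigma^2), midpoint rule on +/- half_width*sigma."""
    a = -half_width * sigma
    du = 2 * half_width * sigma / steps
    total = 0.0
    for k in range(steps):
        u = a + (k + 0.5) * du
        dens = math.exp(-0.5 * (u / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        total += expit(logit_fn(u)) * dens * du
    return total


def average_of_profiles(logit_fn, u_hats):
    """Plug-in approximation: mean of conditional probabilities at estimated u_i."""
    return sum(expit(logit_fn(u)) for u in u_hats) / len(u_hats)


# With a large random-effect SD the marginal curve is pulled toward 0.5.
typical = expit(2.6)
marginal = marginal_probability(lambda u: 2.6 + u, sigma=3.0)
```

This attenuation explains why the average of patient-specific profiles can sit well below the typical-patient curve when the random-effect variance is large.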

5 Data Analysis

In this section, we first focus on the problem of explicitly modeling the nonlinear time varying trend of odds/probability of AFIB. We then focus on the factors associated with odds/probability of AFIB after the ablation procedure. In the multivariable analysis, we consider the following ten covariates: age at the time of the procedure; gender; race; histories of congestive heart failure, cardiovascular disease, hypertension, diabetes; creatinine, diastolic and systolic blood pressure, and international normalized ratio.

5.1 Temporal Decomposition and Trend

In this section, we consider model (2) without covariates, focusing on the estimation of the shaping parameters $\Gamma$. Without covariates, model (2) can be written as

$$\operatorname{logit}(\pi_{ij}) = \log\left(\sum_{l=1}^{\mathcal{L}} \Psi_l(\beta_{0l}, a_l, u_i)\, \Lambda_l(t_{ij}, \Gamma_l)\right), \qquad (5)$$

where $\beta_{0l}$ is the phase-specific intercept (fixed effect).

Temporal trend analysis yields a bi-phase model: an early peaking phase and a late increasing phase. Figure 5 shows the temporal decomposition of the time-varying odds for a “typical patient” or “average patient”, that is, the decomposition when $u_i = 0$. Although the late increasing phase is very small for the typical patient, it can be much larger for some patients, depending on their estimated $u_i$. The estimates and standard errors of the shaping parameters and of the random-effect parameters are given in Table 1.

Figure 5.

The exact decomposition of the temporal trend in the odds domain for a typical patient ($u_i = 0$).

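Table 1.

Estimates of the shaping parameters of the temporal trend in the odds of AFIB. Note that, in the early phase, γ = 0 means Λ(t, Γ) = g(t, Γ) with η > 0 and γ → 0+, and in the late phase, γ = 0 means Λ(t, Γ) = h(t, Γ) with η > 0 and γ → 0+ (see Online Appendix A for further details). Further note that $\hat\beta_{02}$ is set to 0 when h(t, Γ) is used as Λ(t, Γ).

| Parameter | Estimate ± SE | P |
|---|---|---|
| Early peaking phase | | |
| $\hat\beta_{01}$ | 2.6 ± 0.23 | <0.0001 |
| $\hat\eta$ | 1 | |
| γ | 0 | |
| $\hat t_{1/2}$ | 1.05 | |
| Late increasing phase | | |
| $\hat\beta_{02}$ | 0 | |
| $\hat\eta$ | 0.76 ± 0.33 | 0.02 |
| γ | 0 | |
| $\hat t_{1/2}$ | 7.9 | |
| Random effect coefficients | | |
| $a_1$ | 1 | |
| $\hat a_2$ | 4.2 ± 0.69 | <0.0001 |
| $\widehat{\mathrm{var}}(u) = \hat\sigma^2$ | 8.9 ± 1.7 | <0.0001 |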

Note that the estimated standard deviation of the random effect is 3.0 in the early peaking phase ($\hat\sigma = \sqrt{8.9}$) and 12.5 in the late increasing phase ($\hat a_2 \hat\sigma = 4.2 \times 3.0$). This suggests larger variability in subject-specific profiles in the late time phase than in the early time phase. Based on the estimates in Table 1, the estimated multiphase temporal trend equation for the odds of AFIB simplifies to (6):

$$\frac{\hat\pi_{ij}}{1 - \hat\pi_{ij}} = \exp(2.6 + u_i)\, 0.73\, t_{ij}^{-2} \exp(-0.73/t_{ij}) + \exp(4.2\, u_i)\, 0.22\, (0.16\, t_{ij})^{2.3}. \qquad (6)$$
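The random-effect standard deviations quoted above follow from Table 1 with a little arithmetic, sketched here. With $a_1 = 1$, the early-phase SD is $\hat\sigma$. Because the late-phase effect is $a_2 u_i$, its SD is $|\hat a_2|\hat\sigma$.

```python
import math

sigma2_hat = 8.9   # Table 1: estimated var(u)
a2_hat = 4.2       # Table 1: late-phase random-effect coefficient (a1 = 1)

sd_early = math.sqrt(sigma2_hat)   # SD of a1 * u_i with a1 = 1
sd_late = abs(a2_hat) * sd_early   # SD of a2 * u_i

print(round(sd_early, 1), round(sd_late, 1))  # 3.0 12.5
```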

Remark

We used $\Lambda_1(t, \Gamma_1) = g(t, \Gamma)$ under limiting Case 1 (Online Appendix A) as the early phase equation, with shaping parameters η and γ fixed at 1 and 0, respectively. We then used $\Lambda_2(t, \Gamma_2) = h(t, \Gamma)$ under limiting Case 1 as the late phase equation, with η positive and γ fixed at 0. To obtain mathematically tractable, stable functions, we use one of the limiting cases when the estimate of η or γ is close to 0, and we simplify $G(t, \Gamma)$ by fixing η and/or γ at 1 when its estimate is close to 1 (not significantly different from 1). Further, because $t_{1/2}$ acts as a scale parameter in the transformed function $h(t, \Gamma)$, we set $\beta_0 = 0$ whenever $h(t, \Gamma)$ is used as the time function, to avoid redundancy in the parametrization of covariate information in Ψ(·). Hence, phase identification and parameter estimation are determined by the data in an ad hoc manner, briefly as follows. Based on the overall binned smoother trend (Figure 3), we started with 2 phases: an early phase using $g(t, \Gamma)$ with a smaller starting value (1 month) for $t_{1/2}$, and a late phase using $h(t, \Gamma)$ with a larger starting value (6 months) for $t_{1/2}$. For η and γ, we tried 3 possible combinations, (1, −1), (−1, 1) and (1, 1), as starting values for both phases and observed which combination gave stable estimation with the larger likelihood. For both phases, the estimate of γ turned out to be close to 0. For η, the early phase estimate was close to 1 (not significantly different from 1), and the late phase estimate was a positive value different from 0 and 1.

Figure 6 shows the patient-specific profiles and their average, and Figure 7 shows the average profile superimposed on the binned smoothers. As Figure 2 shows, some patients have no data as follow-up gets longer, so not all patients contribute to the binned averages at later times. The average curve, however, is obtained by averaging all patient-specific curves at each time point, which may explain the deviation between the estimated curve and the binned averages in Figure 7. Although the average of the patient-specific profiles increased from about 0.45 to about 0.75 within a month after ablation, decreased gradually to about 0.45 by month 6, and remained constant thereafter, there is large variability in the patient-specific probability profiles of AFIB after ablation. By one year, around 45% of patients have a probability of AFIB below 5%, and 30% have a probability above 95%.

Figure 6.

Patient-specific probability profiles and the average of the profiles.

Figure 7.

Average of the patient-specific profiles and binned averages. Binned averages are averages of the available data over various follow-up time intervals, computed without accounting for the possibility that some of these observations are correlated; they are provided here as a crude check of model fit.

An alternative Model

Using the corrected Akaike Information Criterion (AICc), we compared model (5) with the following simpler alternative model:

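$$\operatorname{logit}(\pi_{ij}) = \log\left(\sum_{l=1}^{\mathcal{L}} \Psi_l(\beta_{0l})\, \Lambda_l(t_{ij}, \Gamma_l)\right) + v_i, \qquad (7)$$

where $v_i \sim N(0, \tau^2)$. Here, instead of patient-specific random effects for each phase, we simplify the model to a patient-specific random intercept.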

Based on the AICc values (main model (5): AICc = 3300.8; alternative model (7): AICc = 3395.7), model (5), which has subject-specific random effects for each phase, is preferred over the random intercept model (7), which has one common random effect, because it has the lower AICc.
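For reference, AICc adds a small-sample correction to AIC. The sketch below uses the common definition AICc = −2 log L + 2k + 2k(k + 1)/(n − k − 1). How n is counted for longitudinal data (subjects or observations) is not stated in the text, so `aicc` is a generic, hypothetical helper. The last lines use only the two AICc values reported above: a difference of 94.9 corresponds to a relative likelihood exp(−Δ/2) that is negligible for the random intercept model.

```python
import math


def aicc(loglik, k, n):
    """Corrected AIC for a model with k parameters fitted to n units."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    aic = -2.0 * loglik + 2.0 * k
    return aic + 2.0 * k * (k + 1) / (n - k - 1)


# Reported AICc values from Section 5.1 (lower is better).
aicc_main, aicc_alt = 3300.8, 3395.7
delta = aicc_alt - aicc_main                    # AICc difference
relative_likelihood = math.exp(-delta / 2.0)    # of model (7) relative to model (5)
```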

5.2 Factors associated with temporal change odds of AFIB

We now illustrate the multivariable analysis to identify phase-specific baseline covariates associated with AFIB after ablation, using selected variables. We considered the following variables: demographics (age, gender, race, body mass index (BMI)), cardiac comorbidity (congestive heart failure (CHF), cardiovascular disease (CVD), hypertension (HTN)), and non-cardiac comorbidity (diabetes, serum creatinine, diastolic and systolic blood pressure, international normalized ratio (INR)). Our objective is to identify variables that (1) are associated with early outcomes, (2) are associated with late outcomes, or (3) influence the outcome regardless of time phase. Because PROC NLMIXED has limited built-in capability for variable selection, we used the following ad hoc selection strategy. We first forced a variable into each phase and noted its significance and the magnitudes of its regression coefficients. If the magnitudes were approximately equal and significant in at least one phase, we moved the variable to the common (all-phase) component. If, on the other hand, it was significant in both phases with different magnitudes, we kept it in both phases. Finally, if it was significant in only one phase, we kept it only in that phase. We continued this forward-selection-like process until all variables had been considered.
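The ad hoc placement rule can be written as a small decision function. This sketch follows the rule as described, with two assumptions that the text does not pin down. First, significance means p < `alpha` (0.05 here). Second, "approximately equal" means the same sign and a difference within `rel_tol` (25% here) of the larger coefficient. Requiring the same sign is consistent with BMI, whose estimates in Table 2 have opposite signs, being kept in both phases. `place_variable` is a hypothetical name, and the numbers in the usage line are illustrative.

```python
def place_variable(est_early, p_early, est_late, p_late, alpha=0.05, rel_tol=0.25):
    """Return 'common', 'both', 'early', 'late' or 'exclude' for a variable forced
    into both phases.  alpha and rel_tol are hypothetical thresholds."""
    sig_early, sig_late = p_early < alpha, p_late < alpha
    if not (sig_early or sig_late):
        return "exclude"
    # "Approximately equal": same sign and within rel_tol of the larger magnitude.
    similar = (est_early * est_late > 0 and
               abs(est_early - est_late) <= rel_tol * max(abs(est_early), abs(est_late)))
    if similar:
        return "common"
    if sig_early and sig_late:
        return "both"
    return "early" if sig_early else "late"


decision = place_variable(1.6, 0.02, -1.5, 0.008)  # BMI-like pattern -> "both"
```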

Table 2 shows the patient-specific estimates of the regression coefficients of the selected covariates that are significantly associated with the post-ablation probability of AFIB.

Table 2.

Patient-specific risk factors associated with AFIB

| Parameter | Estimate ± SE | P |
|---|---|---|
| Overall | None | |
| Early peaking phase | | |
| BMI | 1.6 ± 0.67 | 0.02 |
| Late increasing phase | | |
| BMI | −1.5 ± 0.55 | 0.008 |
| Diabetes | 9.6 ± 2.4 | <0.0001 |
| Congestive heart failure | 6.0 ± 1.9 | 0.002 |

Figure 8 shows the estimated probability stratified by BMI, diabetes and congestive heart failure for a “typical patient”. Based on this limited variable selection and analysis, among the patient demographics, a larger body mass index is associated with an elevated early risk of AFIB, but the direction of its effect appears to reverse for the late odds of AFIB (top row of Figure 8). Notably, neither age nor gender has any impact on the likelihood of AFIB. A history of congestive heart failure or diabetes also appears to be associated with elevated late odds of AFIB; in particular, diabetes appears to have a larger impact on the likelihood of late AFIB (bottom row of Figure 8).

Figure 8.

Estimated probability of post-op AFIB for a “typical patient” ($u_i = 0$) stratified by different baseline covariates.

6 A Simulation Study

We assess the performance of model (2) using a focused simulation study. The major objective is to assess the performance of the shaping parameter and regression coefficient estimators. The simulation study does not address model building; instead, given a model, we assess how well its parameters are estimated.

We generated binary longitudinal responses for 250 subjects. Mimicking the data described in Section 2 (Figure 2), we assumed that the first 150 patients have complete follow-up and the remaining 100 patients have partial follow-up because of staggered study entry (assumed linear). For the first 150 patients, we generated data at the following 18 time points over a 12-month period: days 1, 2, 3 and 5, weeks 1 and 2, and monthly at months 1 through 12. For the remaining 100 patients, assuming a minimum of 3 months of follow-up, we generated data at the first 9 time points (within the first 3 months) and then, depending on study entry, generated the remaining data. We generated 1000 simulated datasets, each of sample size 250.
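The observation schedule can be sketched as follows. Two assumptions are not stated in the text: time is measured in months with 365.25/12 days per month, and the 100 partially followed patients have follow-up end points spread linearly from 3 months up to (but not including) 12 months. `full_schedule` and `partial_schedule` are hypothetical helpers.

```python
DAYS_PER_MONTH = 365.25 / 12   # assumption: convert day-based visits to months


def full_schedule():
    """18 visit times (months): days 1, 2, 3, 5, weeks 1 and 2, then months 1 to 12."""
    early_days = [1, 2, 3, 5, 7, 14]
    return [d / DAYS_PER_MONTH for d in early_days] + [float(m) for m in range(1, 13)]


def partial_schedule(k, n_partial=100, min_months=3.0, max_months=12.0):
    """Visits for the k-th (0-based) partially followed patient; the follow-up end is
    spread linearly from min_months toward max_months (staggered entry)."""
    end = min_months + (max_months - min_months) * k / n_partial
    return [t for t in full_schedule() if t <= end + 1e-9]
```

Every partially followed patient keeps the 9 visits of the first 3 months, and later visits are dropped according to the patient's position in the entry order.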

For the temporal trend, we assume a bi-phase model with Λ1(t, Γ) = g(t, Γ), Γ = (η = 1, γ = 0, t1/2 = 1), as the early phase (limiting Case 1 in Appendix A) and Λ2(t, Γ) = h(t, Γ), Γ = (η = 0.5, γ = 0, t1/2 = 5), as the late phase (limiting Case 3 in Appendix A). With these shaping parameter values, Λ1(t, Γ) is an early peaking function and Λ2(t, Γ) is a late increasing function (Figure 9). For simplicity, we generate 3 covariates: $V_1 \sim \text{Binary}(0.25)$, $V_2 \sim N(\text{mean} = 0, \text{sd} = 0.5)$, and $V_3 \sim \text{Binary}(0.4)$. $V_1$ is an early phase factor positively associated with the response; $V_3$ is a late phase factor positively associated with the response; and $V_2$ is positively associated with the response in both phases, with a stronger influence in the early phase than in the late phase. The random effect is $u_i \sim N(\text{mean} = 0, \text{sd} = 3)$, with coefficients $a_1 = 1$ (fixed) and $a_2 = 2$ (estimated).
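Generating one subject's responses from this model can be sketched as below. The coefficient values are the true values in Table 3: in the early phase, an intercept of −1 and coefficients of 1 for V1 and V2; in the late phase, coefficients of 0.5 for V2 and V3, with the late intercept set to 0 as when h(t, Γ) is used. The early-phase function is g under the assumed γ → 0+ limit of (3) with η = 1 and $t_{1/2} = 1$. The Case 3 late-phase function is defined in Online Appendix A, which is not reproduced here, so `late_fn` is a clearly hypothetical increasing stand-in. `simulate_subject` is likewise an illustrative name.

```python
import math
import random

RHO_EARLY = math.log(2.0)   # eta = 1, t_1/2 = 1: rho = t_1/2 * ln 2 (assumed gamma -> 0+ limit)


def early_fn(t):
    """Early peaking phase g(t) = rho t^-2 exp(-rho/t)."""
    return RHO_EARLY / t ** 2 * math.exp(-RHO_EARLY / t)


def late_fn(t):
    """HYPOTHETICAL increasing stand-in for the Case 3 late-phase h(t); not from the paper."""
    return 0.05 * t


def simulate_subject(times, rng, sigma=3.0, a2=2.0):
    """Draw covariates, the random effect, and one 0/1 response per time point."""
    v1 = 1 if rng.random() < 0.25 else 0          # V1 ~ Binary(0.25)
    v2 = rng.gauss(0.0, 0.5)                      # V2 ~ N(0, sd = 0.5)
    v3 = 1 if rng.random() < 0.4 else 0           # V3 ~ Binary(0.4)
    u = rng.gauss(0.0, sigma)                     # u_i ~ N(0, sd = 3)
    psi1 = math.exp(-1.0 + 1.0 * v1 + 1.0 * v2 + 1.0 * u)   # early phase, a1 = 1
    psi2 = math.exp(0.5 * v2 + 0.5 * v3 + a2 * u)            # late phase, intercept 0
    ys = []
    for t in times:
        odds = psi1 * early_fn(t) + psi2 * late_fn(t)        # additive odds, as in (1)
        p = odds / (1.0 + odds)
        ys.append(1 if rng.random() < p else 0)
    return (v1, v2, v3, u), ys


rng = random.Random(2016)
times = [1 / 30.4375, 0.25, 1.0, 3.0, 6.0, 12.0]
covariates, ys = simulate_subject(times, rng)
```

Passing a seeded `random.Random` instance keeps each simulated dataset reproducible without touching global random state.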

Figure 9.

True shapes of the phases of the temporal trend in the simulated model for a typical patient (ui = 0). Dashed lines depict the shapes of the early and late phases, and the solid line depicts the total odds.

6.1 Simulation Results

We now assess the performance of the logistic multiphase nonlinear mixed effects model on the simulated data using the following summary measures, where $B$ is the number of simulated datasets, $\theta$ is the true value, $\hat\theta_i$ is the estimate from the $i$th simulated dataset, $\bar{\hat\theta} = B^{-1}\sum_{i=1}^{B}\hat\theta_i$, and $SE(\hat\theta_i)$ is the estimated standard error of $\hat\theta_i$: average bias, $\%\mathrm{Bias} = 100 \times (\theta - \bar{\hat\theta})/|\theta|$; average within standard error, $\mathrm{AvgSE} = \sum_{i=1}^{B} SE(\hat\theta_i)/B$; empirical standard error, $\mathrm{EmpSE} = \sqrt{\frac{1}{B-1}\sum_{i=1}^{B}(\hat\theta_i - \bar{\hat\theta})^2}$; and 95% coverage probability, CP.
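The four summary measures can be computed as in the sketch below. Note that with the %Bias definition above, a positive value means the true value exceeds the average estimate. How the 95% intervals behind CP were formed is not stated; this sketch assumes Wald intervals $\hat\theta_i \pm 1.96\,SE(\hat\theta_i)$, and the function name is hypothetical.

```python
import math


def summary_measures(theta, estimates, std_errors, z=1.959964):
    """%Bias, AvgSE, EmpSE and coverage over B simulated datasets.

    theta: true parameter value; estimates, std_errors: per-dataset results.
    Coverage uses Wald intervals estimate +/- z * SE (an assumption).
    """
    B = len(estimates)
    mean_est = sum(estimates) / B
    pct_bias = 100.0 * (theta - mean_est) / abs(theta)  # sign convention of the text
    avg_se = sum(std_errors) / B
    emp_se = math.sqrt(sum((e - mean_est) ** 2 for e in estimates) / (B - 1))
    covered = sum(1 for e, s in zip(estimates, std_errors) if e - z * s <= theta <= e + z * s)
    return {"pct_bias": pct_bias, "avg_se": avg_se, "emp_se": emp_se, "cp": covered / B}
```

Comparing `avg_se` with `emp_se` indicates whether the model-based standard errors reflect the actual sampling variability; CP below 0.95 signals that they, or the point estimates, are off.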

The summary measures for the shaping parameters and the regression coefficients of the covariates, based on the 1000 simulated datasets with sample size 250, are given in Table 3.

Table 3.

Summary measures of the parameter estimates based on the 1000 simulated datasets with sample size 250.

| Parameter | True value | % Bias | AvgSE | EmpSE | CP |
|---|---|---|---|---|---|
| Early peaking phase | | | | | |
| V0 | −1 | 2.96 | 0.4244 | 0.4406 | 0.934 |
| V1 | 1 | −1.55 | 0.3382 | 0.3448 | 0.956 |
| V2 | 1 | −0.892 | 0.4992 | 0.4965 | 0.953 |
| Shaping parameters | | | | | |
| η | 1 | 1.74 | 0.1581 | 0.1698 | 0.924 |
| t1/2 | 1 | −5.09 | 0.3926 | 0.4403 | 0.874 |
| Late increasing phase | | | | | |
| t1/2 | 5 | −1.71 | 1.264 | 1.355 | 0.927 |
| V2 | 0.5 | 6.22 | 0.8261 | 0.8224 | 0.950 |
| V3 | 0.5 | 3.77 | 0.5552 | 0.5837 | 0.940 |
| Shaping parameters | | | | | |
| η | −0.5 | −0.356 | 0.0199 | 0.0197 | 0.946 |
| Random effect | | | | | |
| Var(U) = σ² | 9 | −1.48 | 2.472 | 2.581 | 0.928 |
| Coefficient a2 | 2 | 0.528 | 0.2872 | 0.2990 | 0.921 |

All estimated parameters except t1/2 in the early phase (%Bias −5.09) and V2 in the late phase (%Bias 6.22) have small bias (absolute %Bias below 4). For late-phase V2, however, the other summary measures are close to their expected values (AvgSE 0.8261 vs EmpSE 0.8224; CP 0.950). Overall, this suggests that our logistic nonlinear mixed effects model performs as expected in estimating the parameters.

6.2 Varying Sample Size

In this simulation scenario, we simulate three sample sizes, n = 500 (large), n = 250 (moderate), and n = 100 (small), and compare how sample size influences the performance of parameter estimation.

Table 4 shows the summary estimates for the 3 sample sizes. In general, as the sample size decreases, the bias and standard errors of the parameter estimates tend to increase, most noticeably at n = 100. One reason may be that the model has a relatively large number of parameters to estimate (12 in this simulated model), so efficiency drops as the sample size decreases. Good parameter estimation therefore appears to require at least a moderate sample size.
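As a rough check on the efficiency argument, if the estimators were in the usual asymptotic regime where standard errors scale as $1/\sqrt{n}$, moving from n = 500 to n = 100 would inflate them by about $\sqrt{5} \approx 2.24$. The sketch below compares that benchmark with the AvgSE ratios in Table 4 (values copied from the table; the function name is hypothetical). Nine of the eleven ratios exceed the benchmark, the early-phase t1/2 most markedly (about 3.6), while the two η rows sit just below it.

```python
import math

# AvgSE at n = 500 and n = 100, copied from Table 4: (AvgSE_500, AvgSE_100).
AVG_SE = {
    "V0 (early)": (0.2921, 0.6961),
    "V1 (early)": (0.2313, 0.5738),
    "V2 (early)": (0.3444, 0.8406),
    "eta (early)": (0.1102, 0.2433),
    "t1/2 (early)": (0.2491, 0.8868),
    "t1/2 (late)": (0.8576, 2.439),
    "V2 (late)": (0.5808, 1.348),
    "V3 (late)": (0.3866, 0.9148),
    "eta (late)": (0.0140, 0.0311),
    "Var(U)": (1.679, 4.289),
    "a2": (0.1993, 0.4972),
}


def se_inflation(avg_se, n_large=500, n_small=100):
    """Return the 1/sqrt(n) benchmark ratio and observed AvgSE ratios (small n / large n)."""
    benchmark = math.sqrt(n_large / n_small)
    observed = {name: small / large for name, (large, small) in avg_se.items()}
    return benchmark, observed


benchmark, observed = se_inflation(AVG_SE)
for name, ratio in observed.items():
    flag = "above" if ratio > benchmark else "at/below"
    print(f"{name:14s} {ratio:5.2f} ({flag} {benchmark:.2f})")
```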

Table 4.

Summary measures of the parameter estimates based on the 1000 simulated datasets with varying sample size.

| Parameter | True value | n=500 % Bias | n=500 AvgSE | n=500 EmpSE | n=500 CP | n=250 % Bias | n=250 AvgSE | n=250 EmpSE | n=250 CP | n=100 % Bias | n=100 AvgSE | n=100 EmpSE | n=100 CP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Early peaking phase | | | | | | | | | | | | | |
| V0 | −1 | −0.768 | 0.2921 | 0.2855 | 0.954 | 2.96 | 0.4244 | 0.4406 | 0.934 | 11.9 | 0.6961 | 0.7532 | 0.928 |
| V1 | 1 | −0.085 | 0.2313 | 0.2358 | 0.942 | −1.55 | 0.3382 | 0.3448 | 0.956 | −1.36 | 0.5738 | 0.5902 | 0.961 |
| V2 | 1 | 0.793 | 0.3444 | 0.3450 | 0.944 | −0.892 | 0.4992 | 0.4965 | 0.953 | −2.45 | 0.8406 | 0.8616 | 0.954 |
| Shaping parameters | | | | | | | | | | | | | |
| η | 1 | 0.984 | 0.1102 | 0.1085 | 0.943 | 1.74 | 0.1581 | 0.1698 | 0.924 | 4.40 | 0.2433 | 0.2845 | 0.848 |
| t1/2 | 1 | −1.39 | 0.2491 | 0.2610 | 0.932 | −5.09 | 0.3926 | 0.4403 | 0.874 | −18.8 | 0.8868 | 1.111 | 0.790 |
| Late increasing phase | | | | | | | | | | | | | |
| t1/2 | 5 | 1.07 | 0.8576 | 0.8578 | 0.922 | −1.71 | 1.264 | 1.355 | 0.927 | −16.6 | 2.439 | 3.191 | 0.941 |
| V2 | 0.5 | 9.17 | 0.5808 | 0.5730 | 0.952 | 6.22 | 0.8261 | 0.8224 | 0.950 | 10.1 | 1.348 | 1.467 | 0.934 |
| V3 | 0.5 | 0.691 | 0.3866 | 0.3918 | 0.946 | 3.77 | 0.5552 | 0.5837 | 0.940 | −12.3 | 0.9148 | 0.9376 | 0.954 |
| Shaping parameters | | | | | | | | | | | | | |
| η | −0.5 | 0.183 | 0.0140 | 0.0143 | 0.942 | −0.356 | 0.0199 | 0.0197 | 0.946 | 0.272 | 0.0311 | 0.0332 | 0.933 |
| Random effect | | | | | | | | | | | | | |
| Var(U) = σ² | 9 | 1.81 | 1.679 | 1.658 | 0.946 | −1.48 | 2.472 | 2.581 | 0.928 | −8.55 | 4.289 | 4.689 | 0.925 |
| Coefficient a2 | 2 | 0.265 | 0.1993 | 0.2003 | 0.933 | 0.528 | 0.2872 | 0.2990 | 0.921 | −1.80 | 0.4972 | 0.5322 | 0.938 |

6.3 Influence of Random Effect Coefficient

In this simulation scenario, we illustrate the influence of the random effect coefficient. If all random effect coefficients equal 1, the model simplifies to a random intercept model. However, when the true coefficient differs from 1, it is not clear how a model that assumes all coefficients equal 1 will perform. We simulated data from the bi-phase model described at the beginning of this section, with sample size 250, under two scenarios for the random effect coefficient: (i) a2 = 2 and (ii) a2 = 1. We then assessed model performance under the assumption a2 = 1.

Table 5 shows the summary estimates from the random intercept model (a2 = 1) for data simulated under the two true values of a2. As expected, when the true coefficient is 1, the random intercept model performs very well. However, when the true coefficient is 2, the random intercept model performs very poorly, because fixing a2 = 1 misspecifies the correlation structure.
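Using the %Bias definition of Section 6.1, the −160% entry for Var(U) in the a2 = 2 column of Table 5 can be inverted to recover the average fitted variance, about 9 × 2.6 = 23.4. Under the true model the random effect has variance a1²σ² = 9 in the early phase and a2²σ² = 36 in the late phase; one plausible reading, offered here as an interpretation rather than a result from the source, is that the single variance of the misspecified fit settles between the two. The helper name is hypothetical.

```python
def implied_mean_estimate(theta, pct_bias):
    """Invert %Bias = 100 * (theta - mean_estimate) / |theta| for mean_estimate."""
    return theta - pct_bias * abs(theta) / 100.0


sigma2 = 9.0                    # true Var(U)
a1, a2 = 1.0, 2.0               # true random effect coefficients
early_var = a1 ** 2 * sigma2    # variance of a1 * u in the early phase
late_var = a2 ** 2 * sigma2     # variance of a2 * u in the late phase

# Table 5, true a2 = 2: %Bias of Var(U) when the model fixes a2 = 1.
fitted_var = implied_mean_estimate(sigma2, -160.0)
print(early_var, fitted_var, late_var)
```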

Table 5.

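Summary measures of the parameter estimates based on the 1000 simulated datasets with different random effect coefficients.

True random effect coefficient for phase 2 (model fitted assuming a2 = 1):

| Parameter | True value | a2 = 2: % Bias | a2 = 2: AvgSE | a2 = 2: EmpSE | a2 = 2: CP | a2 = 1: % Bias | a2 = 1: AvgSE | a2 = 1: EmpSE | a2 = 1: CP |
|---|---|---|---|---|---|---|---|---|---|
| Early peaking phase | | | | | | | | | |
| V0 | −1 | 54.8 | 0.5421 | 0.6030 | 0.806 | 1.93 | 0.3351 | 0.3384 | 0.946 |
| V1 | 1 | −37.4 | 0.4369 | 0.5110 | 0.83 | −1.60 | 0.3119 | 0.3141 | 0.949 |
| V2 | 1 | −46.1 | 0.7519 | 0.7603 | 0.908 | −0.871 | 0.4581 | 0.4544 | 0.952 |
| Shaping parameters | | | | | | | | | |
| η | 1 | 10.0 | 0.1552 | 0.1984 | 0.785 | 0.467 | 0.1034 | 0.1105 | 0.924 |
| t1/2 | 1 | −3.18 | 0.4524 | 1.606 | 0.780 | −2.70 | 0.2535 | 0.2772 | 0.907 |
| Late increasing phase | | | | | | | | | |
| t1/2 | 5 | 17.3 | 0.8516 | 0.9273 | 0.704 | −1.56 | 0.6373 | 0.6414 | 0.946 |
| V2 | 0.5 | 10.0 | 0.6769 | 0.6826 | 0.947 | 2.33 | 0.4299 | 0.4264 | 0.954 |
| V3 | 0.5 | 27.8 | 0.3876 | 0.4700 | 0.881 | −0.458 | 0.2708 | 0.2695 | 0.954 |
| Shaping parameters | | | | | | | | | |
| η | −0.5 | 8.52 | 0.0172 | 0.0169 | 0.288 | −0.548 | 0.0189 | 0.0190 | 0.959 |
| Random effect | | | | | | | | | |
| Var(U) = σ² | 9 | −160 | 3.482 | 3.301 | 0.000 | 0.618 | 1.235 | 1.261 | 0.929 |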

7 Conclusion

In this article, we have proposed and demonstrated a nonlinear multiphase mixed effects model for longitudinal binary data. Our model differs from the classical generalized linear mixed effects model in that the time-varying odds of an event are decomposed into multiple overlapping linear or nonlinear time phases, thus explicitly identifying the linear or nonlinear time-varying trend. Further, each phase can have its own stream of concomitant information and its own patient-specific random effect. In medical science, these two features are important for patient management after a procedure. Knowing both the patient-specific and the overall time-varying trend of an event, as well as the phase-specific (time-specific) factors that influence early and late events, would enable a physician to tailor post-procedure management to a patient's risk factors. For example, in our data analysis we showed that higher body mass index is an early risk factor and diabetes is a late risk factor. Based on this information, a physician can closely monitor an obese patient right after the procedure and, for patients with diabetes, schedule more follow-up visits after, say, 3 months. The multiphase model is similar to time-varying coefficient models. However, for this type of cardiac data, its main advantage is that in time-varying coefficient models the coefficient of each covariate is a smooth function of time, which may pose computational difficulties when there are many covariates. The multiphase model still works with a large number of covariates, at the cost of a more restrictive assumption: covariates within a phase share a similar shape of time-varying effect. These assumptions are likely to be satisfied for cardiac data ([25], [26], [27], [28], [29]).
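The restriction that covariates within a phase share a similar shape of time-varying effect can be made explicit with a short derivation. This is a sketch under an assumed form of model (2): total odds equal to a sum of phase terms $\mu_{ji}\Lambda_j(t)$ with $\mu_{ji} = \exp(x_i^\top\beta_j + a_j u_i)$, and a covariate $x_k$ that enters only phase 1 with coefficient $\beta_{1k}$. Increasing $x_k$ by one unit multiplies $\mu_{1i}$ by $e^{\beta_{1k}}$, so the implied odds ratio over time is

```latex
\begin{aligned}
\operatorname{odds}_i(t) &= \mu_{1i}\,\Lambda_1(t) + \mu_{2i}\,\Lambda_2(t),
  \qquad \mu_{ji} = \exp\!\left(x_i^\top\beta_j + a_j u_i\right),\\[4pt]
\mathrm{OR}_{k}(t) &= \frac{e^{\beta_{1k}}\mu_{1i}\Lambda_1(t) + \mu_{2i}\Lambda_2(t)}
                         {\mu_{1i}\Lambda_1(t) + \mu_{2i}\Lambda_2(t)}
  = 1 + \left(e^{\beta_{1k}} - 1\right) w_{1i}(t),\\[4pt]
w_{1i}(t) &= \frac{\mu_{1i}\Lambda_1(t)}{\mu_{1i}\Lambda_1(t) + \mu_{2i}\Lambda_2(t)}.
\end{aligned}
```

Every phase-1-only covariate therefore has a time-varying effect whose shape is driven by the same weight $w_{1i}(t)$, and only its size $\beta_{1k}$ differs, whereas a time-varying coefficient model would estimate a separate smooth function of time for each covariate.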

We have demonstrated the application of this model using readily available software, which is another advantage of our model. Further, by reparameterizing the patient-specific random effects in each phase, we reduced the dimension of the random effects to one, regardless of the number of phases. This greatly reduces the computational difficulties arising from the numerical integration used in parameter estimation.
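Because every phase shares the single random effect $u_i$ (scaled by $a_j$), each patient's marginal likelihood is a one-dimensional integral no matter how many phases there are. The sketch below approximates it with non-adaptive Gauss-Hermite quadrature from NumPy; this is one standard choice, not necessarily what the authors' software does (adaptive quadrature or Laplace approximations are common alternatives). It assumes the additive odds decomposition with $a_1 = 1$, and the function and argument names are hypothetical.

```python
import math

import numpy as np


def patient_marginal_likelihood(y, lam1, lam2, eta1, eta2, a2, sigma_u, n_nodes=30):
    """Approximate the integral over u of prod_t p_t(u)^y_t (1 - p_t(u))^(1 - y_t)
    against the N(0, sigma_u^2) density, by Gauss-Hermite quadrature.

    y, lam1, lam2: arrays over one patient's visits (responses, phase shapes).
    eta1, eta2: fixed-effect linear predictors x'beta_1 and x'beta_2 (a1 fixed at 1).
    """
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    u = math.sqrt(2.0) * sigma_u * nodes  # change of variables u = sqrt(2) * sigma * x
    # odds[k, t] at quadrature node k and visit t (assumed additive phase decomposition)
    odds = (np.exp(eta1 + u)[:, None] * lam1[None, :]
            + np.exp(eta2 + a2 * u)[:, None] * lam2[None, :])
    p = odds / (1.0 + odds)
    lik_given_u = np.prod(np.where(y[None, :] == 1, p, 1.0 - p), axis=1)
    return float(np.sum(weights * lik_given_u) / math.sqrt(math.pi))
```

The overall log-likelihood would be the sum of the logs of these patient contributions, maximized over the fixed effects, shaping parameters, $a_2$, and $\sigma_u$; adding a third phase adds another term to `odds` but no extra integration dimension.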

We used the cumulative mortality function of Hazelrig et al. [23] as the generic time function for our analysis. This is a very flexible family of functions that can represent almost any shape. In our experience analyzing cardiac surgery data, two-phase models are the most frequent and three-phase models are rare; we have never encountered a four-phase model. Hence, although the variable selection process is ad hoc, having to deal with only two (or at most three) phases keeps the data analysis feasible and tractable. Variable selection in longitudinal models is an active area of research.

Model (2) can be easily extended to other types of longitudinal data (continuous, ordinal, and nominal) by changing the link function to match the corresponding conditional distribution and, for ordinal data, introducing cutoff parameters. These extensions are under active research in our group. Another area of research currently under consideration is extending model (2) to multivariate longitudinal data of the same type (for example, all continuous) or of different types (continuous and binary).

Supplementary Material

Acknowledgments

This research was partially supported by a grant from the National Institutes of Health, National Heart, Lung, and Blood Institute (Grant Number 1R01HL103552). The authors thank the reviewers for their helpful and constructive comments.

References

  • 1. Lloyd-Jones D, Adams RJ, Brown TM, Carnethon M, Dai S, et al. Heart Disease and Stroke Statistics–2010 Update: A Report From the American Heart Association. Circulation. 2010;121:e1–e170. doi: 10.1161/CIRCULATIONAHA.109.192667.
  • 2. Gillinov AM. Choice of Surgical Lesion Set: Answers from the Data. Annals of Thoracic Surgery. 2007;84:1786–92. doi: 10.1016/j.athoracsur.2007.05.040.
  • 3. Laird NM, Ware JH. Random-Effects Models for Longitudinal Data. Biometrics. 1982;38:963–74.
  • 4. Muller P, Rosner GL. A Bayesian Population Model with Hierarchical Mixture Prior Applied to Blood Count Data. Journal of the American Statistical Association. 1997;92:1279–1292.
  • 5. Mikulich SK, Zerbe GO, Jones RH, Crowley TJ. Comparing Linear and Nonlinear Mixed Model Approaches to Cosinor Analysis. Statistics in Medicine. 2003;22:3195–3211. doi: 10.1002/sim.1560.
  • 6. Ding A, Wu H. Population HIV-1 Dynamics in Vivo: Applicable Models and Inferential Tools for Virological Data from AIDS Clinical Trials. Biometrics. 1999;55:410–418. doi: 10.1111/j.0006-341x.1999.00410.x.
  • 7. Davidian M, Giltinan DM. Nonlinear Models for Repeated Measurement Data. Chapman and Hall; New York: 1995.
  • 8. Vonesh EF, Chinchilli VG. Linear and Nonlinear Models for the Analysis of Repeated Measurements. Chapman and Hall; London: 1997.
  • 9. Molenberghs G, Verbeke G. Models for Discrete Longitudinal Data. Springer; New York: 2005.
  • 10. Vonesh EF. Generalized Linear and Nonlinear Models for Correlated Data: Theory and Applications Using SAS. SAS Institute Inc; Cary, NC, USA: 2012.
  • 11. Guo W. Functional data analysis in longitudinal setting using smoothing splines. Statistical Methods in Medical Research. 2004;13:49–62. doi: 10.1191/0962280204sm352ra.
  • 12. Pinheiro JC, Bates DM. Approximations to the Log-likelihood Function in the Nonlinear Mixed-Effects Model. Journal of Computational and Graphical Statistics. 1995;4:12–35.
  • 13. Vonesh EF. Non-Linear Models for the Analysis of Longitudinal Data. Statistics in Medicine. 1992;11:1929–1954. doi: 10.1002/sim.4780111413.
  • 14. Wu L. A Joint Model for Nonlinear Mixed-Effects Models with Censoring and Covariates Measured with Error, With Application to AIDS Studies. Journal of the American Statistical Association. 2002;97:955–964.
  • 15. Rajeswaran J, Blackstone EH. A Multiphase Non-Linear Mixed Effects Model: An Application to Spirometry after Lung Transplantation. Statistical Methods in Medical Research. 2014 Jun 11; doi: 10.1177/0962280214537255.
  • 16. Faes C, Aerts M, Geys H, Molenberghs G, Declerck L. Bayesian testing for trend in a power model for clustered binary data. Environmental and Ecological Statistics. 2004;11:305–322.
  • 17. Yu CO, Wu KF. Nonparametric Varying-Coefficient Models for the Analyses of Longitudinal Data. International Statistical Review. 2000;3:373–393.
  • 18. Wu H, Zhang JT. Nonparametric Regression Methods for Longitudinal Data Analysis: Mixed-Effects Modeling Approaches. John Wiley & Sons Inc; New York: 2006.
  • 19. Senturk D, Muller HG. Generalized varying coefficient models for longitudinal data. Biometrika. 2008;95:653–666.
  • 20. Gillinov AM, Argenziano M, Blackstone EH, Iribarne A, DeRose JJ, et al. Designing comparative effectiveness trials of surgical ablation for atrial fibrillation: Experience of the Cardiothoracic Surgical Trials Network. Journal of Thoracic and Cardiovascular Surgery. 2011;142:257–264. doi: 10.1016/j.jtcvs.2011.04.010.
  • 21. Martinussen T, Scheike TH. Dynamic Regression Models for Survival Data. Springer; New York: 2005.
  • 22. Blackstone EH, Naftel DC, Turner ME, Jr. The decomposition of time-varying hazard into phases, each incorporating a separate stream of concomitant information. Journal of the American Statistical Association. 1986;81:615–24.
  • 23. Hazelrig JB, Turner ME, Jr, Blackstone EH. Parametric Survival Analysis Combining Longitudinal and Cross-sectional Censored and Interval-Censored Data with Concomitant Information. Biometrics. 1982;38:1–15.
  • 24. Turner ME, Hazelrig JB, Blackstone EH. Bounded Survival. Mathematical Biosciences. 1982;59:33–46.
  • 25. Banga A, Gildea T, Rajeswaran J, Rokadia H, Blackstone EH, Stoller JK. The Natural History of Lung Function after Lung Transplantation for α1-Antitrypsin Deficiency. American Journal of Respiratory and Critical Care Medicine. 2014;190:274–281. doi: 10.1164/rccm.201401-0031OC.
  • 26. Beach JM, Mihaljevic T, Rajeswaran J, Marwick T, Edwards ST, Nowicki ER, et al. Ventricular hypertrophy and left atrial dilatation persist and are associated with reduced survival after valve replacement for aortic stenosis. Journal of Thoracic and Cardiovascular Surgery. 2014;147:362–9. doi: 10.1016/j.jtcvs.2012.12.016.
  • 27. Mason DP, JL, Rajeswaran J, Li L, Murthy SC, Su JW, Pettersson GB, Blackstone EH. Effect of changes in postoperative spirometry on survival after lung transplantation. Journal of Thoracic and Cardiovascular Surgery. 2012;144:197–203. doi: 10.1016/j.jtcvs.2012.03.028.
  • 28. Mason DP, Rajeswaran J, Murthy SC, McNeill AM, Budev MM, Mehta AC, Pettersson GB, Blackstone EH. Spirometry After Transplantation: How Much Better Are Two Lungs Than One? Annals of Thoracic Surgery. 2008;85:1193–201. doi: 10.1016/j.athoracsur.2007.12.023.
  • 29. Gillinov AM, Bhavani S, Blackstone EH, Rajeswaran J, Svensson LG, Navia JL, et al. Surgery for Permanent Atrial Fibrillation: Impact of Patient Factors and Lesion Set. Annals of Thoracic Surgery. 2006;82:502–14. doi: 10.1016/j.athoracsur.2006.02.030.
  • 30. Diggle PJ, Heagerty P, Liang KY, Zeger SL. Analysis of Longitudinal Data. 2nd ed. Oxford; New York: 2002.
  • 31. Kleinman J. Proportions with extraneous variance: single and independent samples. Journal of the American Statistical Association. 1973;68:46–54.
  • 32. McCulloch CE. Maximum likelihood algorithms for generalized linear mixed models. Journal of the American Statistical Association. 1997;92:162–170.
  • 33. Joe H. Accuracy of Laplace approximation for discrete response mixed models. Computational Statistics and Data Analysis. 2008;52:5066–74.
  • 34. Breslow NE, Clayton DG. Approximate inference in generalized linear mixed models. Journal of the American Statistical Association. 1993;88:9–25.
  • 35. Wu L. Mixed Effects Models for Complex Data. Chapman and Hall; New York: 2010.
  • 36. Lee Y, Nelder JA, Pawitan Y. Generalized Linear Models with Random Effects: Unified Analysis via H-likelihood. Chapman and Hall; New York: 2006.
  • 37. Fitzmaurice GM, Laird NM, Ware JH. Applied Longitudinal Analysis. John Wiley & Sons; 2002.
