Misspecification of a binary dependent variable in the logistic model controlling for the repeated longitudinal measures

Chun-Chao Wang; Yi-Ting Hwang; Chung-Chuan Chou; Hui-Ling Lee

doi:10.1080/02664763.2021.1982877

2021 Oct 4;50(1):155–169.

PMCID: PMC9754046 PMID: 36530783

Abstract

In many medical applications, the quantity of interest is a patient's disease status, which can be related to multiple serial measurements. The binary outcome, however, can be recorded incorrectly for various reasons, and estimators derived from a misspecified outcome can be biased. This paper derives the complete-data likelihood function that incorporates both the multiple serial measurements and the misspecified outcome. Because the model involves latent variables, the EM algorithm is used to obtain the maximum-likelihood estimators. Monte Carlo simulations are conducted to assess the impact of misspecification on the estimates. A retrospective data set on the recurrence of atrial fibrillation illustrates the use of the proposed model.

Keywords: Atrial fibrillation, EM algorithm, joint likelihood function, logistic regression, misspecification, random effect model

2020 Mathematics Subject Classification: 62P10

1. Introduction

In many medical applications, the goal is to identify the status of a disease, and this status is often related to multiple serial measurements. For instance, atrial fibrillation (AF) is the most common cardiac arrhythmia and is associated with stroke, heart failure, increased mortality and a worsening quality of life. Rhythm control of AF relies primarily on antiarrhythmic drug treatment. Radiofrequency catheter ablation (RFCA) is considered a standard procedure for drug-refractory symptomatic AF. With improvements in technology, RFCA could be superior to medical therapy for the maintenance of sinus rhythm. Nevertheless, AF can still recur after RFCA. Previous studies summarized the five-year outcomes of AF ablation and reported that the atrial tachyarrhythmia-free rate was 29% to 47% after index procedures and 63% to 80% after multiple procedures [15,20]. Investigating potential predictors of AF recurrence after RFCA may help physicians identify high-risk patients. Echocardiography is a common cardiac examination for assessing the structure and function of the heart in AF patients. Left atrial (LA) size, left ventricular ejection fraction (LVEF), and mitral regurgitation (MR) are three commonly obtained echocardiographic parameters. Earlier studies [12,18] found that an enlarged LA may predict the recurrence of AF after RFCA. Using baseline echocardiography reports, [14] found that only LA size is associated with the recurrence of AF, while neither LVEF nor MR is significantly associated with the duration from the time of the echocardiogram until the first recurrence of AF. Serial measurements of LA size and LVEF are thus needed to understand the long-term mutual influences among LA size, LVEF, and AF recurrence. Lee et al. [10] showed that a two-stage model, in which the first stage models the longitudinal trajectory of LA size and the second stage models the AF recurrence status using the predicted value from the first stage, has better predictive power than a model based only on the baseline LA size.

Maximum likelihood estimation requires that the binary outcome be measured precisely. In practice, however, binary data may be measured incorrectly. [1] noted that sexual maturation can be measured with sizeable diagnostic error. Furthermore, [11] reported a net bias of 13% in mean estimates of food stamp participation. In the AF setting, the physician arranges regular follow-up after RFCA. The determination of possible AF recurrence often relies on patients' self-reported symptoms of typical palpitation episodes (>30 s) [5]. However, confirming AF recurrence requires ambulatory electrocardiogram documentation at specific time points or when patients present with symptoms; therefore, patients with asymptomatic AF between visits may not be identified. This may lead to an underestimation of the risk of AF recurrence and result in a misspecified recurrence status.

Biased estimators and incorrect inferences can result when fallible data are used [4]. Even low rates of misclassification can have pronounced effects on estimation and inference [2,3,13]. Hausman et al. [6] showed that a misspecified dependent variable in the logistic or probit model might result in biased or inconsistent estimators. By incorporating the latent construction proposed in [6], the bias of the estimators can be reduced.

Because the outcome may be misspecified, this paper incorporates the misspecification information into the joint model construction to increase the efficiency of the estimators. Section 2 formulates the joint model, which combines the logistic model with multiple general linear models for longitudinal data and incorporates the latent information on misspecification. Section 2.1 describes the complete-data likelihood function for the binary and multiple longitudinal data with misspecification and the numerical algorithm for obtaining the estimates. Simulations in Section 3 evaluate the feasibility of the proposed methods. In Section 4, a retrospective data set on the recurrence of AF illustrates the use of the proposed model. We conclude with a discussion in Section 5.

2. The proposed model

Suppose there are n subjects. Let ${\tilde{y}}_{i}$ denote the true binary status of an event of interest for the ith patient, where ${\tilde{y}}_{i} = 1$ denotes that the event of interest occurred and ${\tilde{y}}_{i} = 0$ otherwise. Suppose ${\tilde{y}}_{i}$ is subject to misspecification, so that the observed binary variable might differ from the true binary variable. Let $y_{i}$ denote the observed binary dependent variable. Following the setting considered in [6], the probability of misclassification is assumed to depend only on the value of ${\tilde{y}}_{i}$ . That is, the probabilities of misspecification are defined as

\begin{aligned} α_{0} & = P [y_{i} = 1 | {\tilde{y}}_{i} = 0], \end{aligned}

(1)

\begin{aligned} α_{1} & = P [y_{i} = 0 | {\tilde{y}}_{i} = 1] . \end{aligned}

(2)

For the ith subject, we assume that $m_{i}$ serial measurements are taken over time and that J time-dependent covariates are measured. Let $Z_{i j k}$ denote the jth time-dependent covariate of the ith patient observed at time $t_{i j k}$ , $i = 1, \dots, n$ , $j = 1, \dots, J$ and $k = 1, \dots, m_{i}$ . Also, let $X_{i} = (X_{i 1}, X_{i 2}, \dots, X_{i r})^{'}$ denote r time-independent covariates, where $X_{i 1} = 1$ denotes the intercept.

Assuming there are measurement errors, the time-dependent longitudinal covariates can be defined as

Z_{i j k} = Z_{i j k}^{*} + e_{i j k},

(3)

where $Z_{i j k}^{*}$ denotes the true value of the time-dependent longitudinal covariate $Z_{i j k}$ and $e_{i j k}$ is the measurement error, which is independent of $Z_{i j k}^{*}$ and normally distributed with mean zero and variance $σ_{j}^{2}$ . In practice, the true time-dependent longitudinal covariate $Z_{i j k}^{*}$ is assumed to follow a known polynomial function of time,

Z_{i j k}^{*} = \sum_{l = 0}^{q_{j} - 1} θ_{i j l} t_{i j k}^{l}

where $q_{j} - 1$ is the given degree of the polynomial for the jth covariate and the coefficient vector $θ_{i j} = (θ_{i j 0}, \dots, θ_{i j, q_{j} - 1})^{'}$ is a vector of random effects assumed to follow a multivariate normal distribution with mean $θ_{j} = (θ_{j 0}, \dots, θ_{j, q_{j} - 1})^{'}$ and covariance matrix $D_{j j}$ . Measurements taken on the same subject across the different longitudinal measures indexed by j are likely to be correlated, and this dependence is taken into account through the random effects by $Cov (θ_{i j}, θ_{i s}) = D_{j s}$ , for $j \neq s$ .

Since (3) provides the underlying time trajectory of the time-dependent covariates, this information can be used to predict the binary outcome. To avoid collinearity, however, only the time-dependent covariates at the last time points are used to build the model. The area under the true time trajectory of each time-dependent covariate might be an alternative choice. For illustration, let the probability of the event of interest be associated with the time-dependent covariates at the last time points and be denoted as

\tilde{π} (Z_{i}^{*}, X_{i}) = P [{\tilde{y}}_{i} = 1 | Z_{i}^{*}, X_{i}],

(4)

where $Z_{i}^{*} = (Z_{i 1 m_{i 1}}^{*}, Z_{i 2 m_{i 2}}^{*}, \dots, Z_{i J m_{i J}}^{*})^{'}$ collects the true values of the time-dependent covariates at their last time points. Logistic regression is often used to model a binary response and is defined as

\operatorname{logit} [\tilde{π} (Z_{i}^{*}, X_{i})] = \log \left( \frac{\tilde{π} (Z_{i}^{*}, X_{i})}{1 - \tilde{π} (Z_{i}^{*}, X_{i})} \right) = β^{'} Z_{i}^{*} + γ^{'} X_{i},

(5)

where $β = (β_{1}, β_{2}, \dots, β_{J})^{'}$ and $γ = (γ_{1}, γ_{2}, \dots, γ_{r})^{'}$ . The probit link is another possible choice.

Under the misspecification, we are only able to observe $y_{i}$ . From (1), (2) and (4), the joint probability mass function of $y_{i}$ and ${\tilde{y}}_{i}$ is given as

\begin{aligned} P [y_{i} = 1, {\tilde{y}}_{i} = 1 | Z_{i}^{*}, X_{i}] & = (1 - α_{1}) \tilde{π} (Z_{i}^{*}, X_{i}), \\ P [y_{i} = 0, {\tilde{y}}_{i} = 1 | Z_{i}^{*}, X_{i}] & = α_{1} \tilde{π} (Z_{i}^{*}, X_{i}), \\ P [y_{i} = 1, {\tilde{y}}_{i} = 0 | Z_{i}^{*}, X_{i}] & = α_{0} (1 - \tilde{π} (Z_{i}^{*}, X_{i})), \\ P [y_{i} = 0, {\tilde{y}}_{i} = 0 | Z_{i}^{*}, X_{i}] & = (1 - α_{0}) (1 - \tilde{π} (Z_{i}^{*}, X_{i})), \end{aligned}

(6)

which can be re-expressed as

\begin{aligned} f (y_{i}, {\tilde{y}}_{i} | Z_{i}^{*}, X_{i}) & = [(1 - α_{1}) \tilde{π} (Z_{i}^{*}, X_{i})]^{y_{i} {\tilde{y}}_{i}} [(1 - α_{0}) (1 - \tilde{π} (Z_{i}^{*}, X_{i}))]^{(1 - y_{i}) (1 - {\tilde{y}}_{i})} \\ & \quad \times [α_{1} \tilde{π} (Z_{i}^{*}, X_{i})]^{(1 - y_{i}) {\tilde{y}}_{i}} [α_{0} (1 - \tilde{π} (Z_{i}^{*}, X_{i}))]^{y_{i} (1 - {\tilde{y}}_{i})} \\ & = (1 - α_{1})^{y_{i} {\tilde{y}}_{i}} α_{1}^{(1 - y_{i}) {\tilde{y}}_{i}} α_{0}^{y_{i} (1 - {\tilde{y}}_{i})} (1 - α_{0})^{(1 - y_{i}) (1 - {\tilde{y}}_{i})} \\ & \quad \times \tilde{π} (Z_{i}^{*}, X_{i})^{{\tilde{y}}_{i}} (1 - \tilde{π} (Z_{i}^{*}, X_{i}))^{1 - {\tilde{y}}_{i}}, \quad y_{i} = 0, 1; \ {\tilde{y}}_{i} = 0, 1. \end{aligned}

(7)
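As a quick numerical check of (6) and (7), the joint probabilities can be tabulated for given values of $α_{0}$ , $α_{1}$ and $\tilde{π}$ . The following is a minimal sketch; the function name `joint_pmf` and the numeric values are illustrative assumptions and do not come from the paper.

```python
def joint_pmf(y, y_true, alpha0, alpha1, pi_true):
    """Joint probability f(y, y_true | Z*, X) in the factored form of (7).

    alpha0 = P[y = 1 | y_true = 0], alpha1 = P[y = 0 | y_true = 1],
    pi_true = P[y_true = 1 | Z*, X].
    """
    # Misclassification part: exactly one of the four factors is active.
    misclass = (
        (1 - alpha1) ** (y * y_true)
        * alpha1 ** ((1 - y) * y_true)
        * alpha0 ** (y * (1 - y_true))
        * (1 - alpha0) ** ((1 - y) * (1 - y_true))
    )
    # Bernoulli part for the true status.
    outcome = pi_true ** y_true * (1 - pi_true) ** (1 - y_true)
    return misclass * outcome


# Illustrative values only (hypothetical, chosen for the example).
table = {(y, yt): joint_pmf(y, yt, 0.05, 0.2, 0.6) for y in (0, 1) for yt in (0, 1)}
total = sum(table.values())  # the four cells of (6) should sum to one
```

Each cell of `table` corresponds to one line of (6), so summing over the true status for a fixed observed value gives the marginal probability of the observed outcome.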

To simplify derivations for estimations, the following notations are introduced. Let $Z_{i} = (Z_{i 1}^{'}, Z_{i 2}^{'}, \dots, Z_{i J}^{'})^{'}$ , where $Z_{i j} = (Z_{i j 1}, Z_{i j 2}, \dots, Z_{i j m_{i j}})^{'}$ denote the vector with all the observed longitudinal variables for subject i at all time points, and $θ_{i} = (θ_{i 1}^{'}, θ_{i 2}^{'}, \dots, θ_{i J}^{'})^{'}$ denote the vector of all random coefficients. Let $T_{i} = T_{i 1} \oplus T_{i 2} \oplus \dots \oplus T_{i J}$ , where

T_{i j} = [\begin{array}{ccccc} 1 & t_{i j 1} & t_{i j 1}^{2} & \dots & t_{i j 1}^{q_{j} - 1} \\ 1 & t_{i j 2} & t_{i j 2}^{2} & \dots & t_{i j 2}^{q_{j} - 1} \\ ⋮ & ⋮ & ⋮ & ⋱ & ⋮ \\ 1 & t_{i j m_{i j}} & t_{i j m_{i j}}^{2} & \dots & t_{i j m_{i j}}^{q_{j} - 1} \end{array}],

and

T_{i 1} \oplus T_{i 2} = [\begin{array}{cc} T_{i 1} & 0 \\ 0 & T_{i 2} \end{array}] .

The matrix representation of (3) is

Z_{i} = T_{i} θ_{i} + e_{i},

(8)
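The construction of $T_{i}$ and the product $T_{i} θ_{i}$ in (8) can be sketched in plain Python. The helper names (`design_matrix`, `direct_sum`, `mat_vec`), the observation times and the coefficient values below are hypothetical and serve only to illustrate the block-diagonal layout.

```python
def design_matrix(times, q):
    """Rows (1, t, t^2, ..., t^(q-1)), one per observation time, as in T_ij."""
    return [[t ** l for l in range(q)] for t in times]


def direct_sum(blocks):
    """Block-diagonal matrix T_i1 (+) T_i2 (+) ... built from lists of rows."""
    n_cols = sum(len(b[0]) for b in blocks)
    out, offset = [], 0
    for b in blocks:
        width = len(b[0])
        for row in b:
            out.append([0] * offset + list(row) + [0] * (n_cols - offset - width))
        offset += width
    return out


def mat_vec(matrix, vec):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


# Hypothetical subject: covariate 1 is linear (q = 2) at three times,
# covariate 2 is quadratic (q = 3) at two times.
T_i = direct_sum([design_matrix([0, 1, 2], 2), design_matrix([0, 2], 3)])
theta_i = [10, -1, 5, 0.5, 0.25]   # stacked (theta_i1', theta_i2')'
z_star = mat_vec(T_i, theta_i)     # true trajectory T_i theta_i, before adding e_i
```

Because the blocks do not overlap, each covariate's trajectory depends only on its own coefficients; the correlation between covariates enters through the covariance of the random coefficients, not through $T_{i}$ .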

where $e_{i} = (e_{i 1}^{'}, e_{i 2}^{'}, \dots, e_{i J}^{'})^{'}$ and $e_{i j} = (e_{i j 1}, e_{i j 2}, \dots, e_{i j m_{i j}})^{'}$ , $j = 1, 2, \dots, J$ . Finally, let $Σ_{i} = σ_{1}^{2} I_{m_{i 1}} \oplus σ_{2}^{2} I_{m_{i 2}} \oplus \dots \oplus σ_{J}^{2} I_{m_{i J}}$ , where $I_{m_{i j}}$ is an identity matrix of size $m_{i j}$ . The matrix expression for the distributional assumption in (3) is

θ_{i} \sim MVN (θ, D),

(9)

where $θ = (θ_{1}^{'}, θ_{2}^{'}, \dots, θ_{J}^{'})^{'}$ and

D = (\begin{array}{cccc} D_{11} & D_{12} & \dots & D_{1 J} \\ D_{21} & D_{22} & \dots & D_{2 J} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ D_{J 1} & D_{J 2} & \dots & D_{J J} \end{array}) .

The complete data likelihood function of $(y_{i}, {\tilde{y}}_{i}, Z_{i}, θ_{i})$ , $i = 1, 2, \dots, n$ , is

\prod_{i = 1}^{n} f (Z_{i} | θ_{i}, Σ_{i}) f (θ_{i} | θ, D) f (y_{i}, {\tilde{y}}_{i} | α_{0}, α_{1}, θ_{i}, X_{i}, β, γ),

(10)

where $f (y_{i}, {\tilde{y}}_{i} | α_{0}, α_{1}, θ_{i}, X_{i}, β, γ)$ is defined in (7), and

\begin{aligned} f (Z_{i} | θ_{i}, Σ_{i}) & = \frac{\exp [- \frac{1}{2} (Z_{i} - T_{i} θ_{i})^{'} Σ_{i}^{- 1} (Z_{i} - T_{i} θ_{i})]}{(2 π)^{\sum_{j = 1}^{J} m_{i j} / 2} | Σ_{i} |^{\frac{1}{2}}}, \end{aligned}

(11)

\begin{aligned} f (θ_{i} | θ, D) & = \frac{\exp [- \frac{1}{2} (θ_{i} - θ)^{'} D^{- 1} (θ_{i} - θ)]}{(2 π)^{\sum_{j} q_{j} / 2} | D |^{\frac{1}{2}}} . \end{aligned}

(12)

2.1. Estimation

Since $θ_{i}$ and ${\tilde{y}}_{i}$ , $i = 1, 2, \dots, n,$ are unobservable, the MLEs cannot be obtained in closed form, and a numerical solution is obtained with the EM algorithm. Let $Ω = (α_{0}, α_{1}, θ, D, β, γ, Σ)$ denote the vector of parameters. The EM algorithm proceeds as follows:

E step

Since $Z_{i} \sim MVN (T_{i} θ, T_{i} D T_{i}^{'} + Σ_{i})$ and $Cov (Z_{i}, θ_{i}) = T_{i} D$ , the distribution of $θ_{i} | Z_{i}$ is

θ_{i} | Z_{i} \sim MVN (θ + V_{i} [T_{i}^{'} Σ_{i}^{- 1} (Z_{i} - T_{i} θ)], V_{i}),

(13)

where $V_{i} = (T_{i}^{'} Σ_{i}^{- 1} T_{i} + D^{- 1})^{- 1}$ . The latent variables $θ_{i}$ are generated from this conditional distribution.
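For completeness, (13) can be recovered from the standard conditional distribution of a jointly normal vector; this short derivation is added here as a check and is not part of the original text. With $Cov (θ_{i}, Z_{i}) = D T_{i}^{'}$ and $Var (Z_{i}) = T_{i} D T_{i}^{'} + Σ_{i}$ ,

\begin{aligned} E [θ_{i} | Z_{i}] & = θ + D T_{i}^{'} (T_{i} D T_{i}^{'} + Σ_{i})^{- 1} (Z_{i} - T_{i} θ), \\ Var [θ_{i} | Z_{i}] & = D - D T_{i}^{'} (T_{i} D T_{i}^{'} + Σ_{i})^{- 1} T_{i} D . \end{aligned}

By the Woodbury matrix identity, the conditional variance equals $(D^{- 1} + T_{i}^{'} Σ_{i}^{- 1} T_{i})^{- 1} = V_{i}$ . Moreover,

V_{i} T_{i}^{'} Σ_{i}^{- 1} (T_{i} D T_{i}^{'} + Σ_{i}) = V_{i} (T_{i}^{'} Σ_{i}^{- 1} T_{i} + D^{- 1}) D T_{i}^{'} = D T_{i}^{'},

so that $D T_{i}^{'} (T_{i} D T_{i}^{'} + Σ_{i})^{- 1} = V_{i} T_{i}^{'} Σ_{i}^{- 1}$ , which gives the conditional mean in the form stated in (13).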

From (7), the marginal probability of $y_{i}$ equals

\begin{aligned} P [y_{i} = 1 | Z_{i}^{*}, X_{i}] & = α_{0} + (1 - α_{0} - α_{1}) \tilde{π} (Z_{i}^{*}, X_{i}), \\ P [y_{i} = 0 | Z_{i}^{*}, X_{i}] & = α_{1} \tilde{π} (Z_{i}^{*}, X_{i}) + (1 - α_{0}) (1 - \tilde{π} (Z_{i}^{*}, X_{i})) . \end{aligned}

(14)

Given observed data, the conditional expectation of ${\tilde{y}}_{i}$ is obtained from (7) and (14) as

\begin{aligned} E [{\tilde{y}}_{i} | y_{i} = 0, Z_{i}^{*}, X_{i}] & = \frac{α_{1} \tilde{π} (Z_{i}^{*}, X_{i})}{α_{1} \tilde{π} (Z_{i}^{*}, X_{i}) + (1 - α_{0}) (1 - \tilde{π} (Z_{i}^{*}, X_{i}))}, \\ E [{\tilde{y}}_{i} | y_{i} = 1, Z_{i}^{*}, X_{i}] & = \frac{(1 - α_{1}) \tilde{π} (Z_{i}^{*}, X_{i})}{(1 - α_{1}) \tilde{π} (Z_{i}^{*}, X_{i}) + α_{0} (1 - \tilde{π} (Z_{i}^{*}, X_{i}))} . \end{aligned}

The unobservable quantities ${\tilde{y}}_{i}$ , $i = 1, \dots, n$ , can also be generated from the conditional distribution using $E [{\tilde{y}}_{i} | y_{i} = 0, Z_{i}^{*}, X_{i}]$ and $E [{\tilde{y}}_{i} | y_{i} = 1, Z_{i}^{*}, X_{i}]$ . However, since ${\tilde{y}}_{i}$ takes only the two values 0 and 1, the resulting estimates of $α_{0}$ and $α_{1}$ oscillate dramatically. Thus, for the M step, the conditional expectation $E [{\tilde{y}}_{i} | y_{i} = d, Z_{i}^{*}, X_{i}]$ , d = 0, 1, is used instead of a generated random sample of ${\tilde{y}}_{i}$ . To simplify the notation, $E [{\tilde{y}}_{i} | y_{i} = d, Z_{i}^{*}, X_{i}]$ is denoted as $E^{*} [{\tilde{y}}_{i}]$ .
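The conditional expectation of the true status is simply Bayes' rule applied to the joint probabilities. A minimal sketch follows; the function name `posterior_true_status` and the numeric inputs are assumptions made for illustration.

```python
def posterior_true_status(y, alpha0, alpha1, pi_true):
    """E[y_true | y, Z*, X]: posterior probability that the true status is 1."""
    if y == 1:
        num = (1 - alpha1) * pi_true
        den = num + alpha0 * (1 - pi_true)
    else:
        num = alpha1 * pi_true
        den = num + (1 - alpha0) * (1 - pi_true)
    return num / den


# Illustrative values: alpha0 = 0.05, alpha1 = 0.2, pi_true = 0.6.
e_given_y1 = posterior_true_status(1, 0.05, 0.2, 0.6)
e_given_y0 = posterior_true_status(0, 0.05, 0.2, 0.6)
```

With these inputs an observed event is very likely to be a true event, while an observed non-event still carries a non-negligible posterior probability of being a true event because $α_{1}$ is relatively large.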

M step

From (10), closed forms for the maximum likelihood estimators of some parameters are given as

\begin{aligned} \hat{θ} & = \frac{\sum_{i = 1}^{n} E_{i} (θ_{i})}{n}, \end{aligned}

(15)

\begin{aligned} \hat{D} & = \frac{\sum_{i = 1}^{n} E_{i} (θ_{i} - θ) (θ_{i} - θ)^{'}}{n}, \end{aligned}

(16)

\begin{aligned} {\hat{σ}}_{j}^{2} & = \frac{\sum_{i = 1}^{n} \sum_{k = 1}^{m_{i j}} E_{i} [Z_{i j k} - T_{i j} θ_{i j}]^{2}}{\sum_{i = 1}^{n} m_{i j}}, j = 1, \dots, J . \end{aligned}

(17)

The details for obtaining (15)–(17) are given in Appendices 1 and 2.

Estimators for the parameters in the logistic regression are derived numerically from the complete log-likelihood function

\begin{aligned} Q (α_{0}, α_{1}, β, γ) & = \log (1 - α_{1}) \sum_{i = 1}^{n} y_{i} E^{*} [{\tilde{y}}_{i}] + \log (α_{1}) \sum_{i = 1}^{n} (1 - y_{i}) E^{*} [{\tilde{y}}_{i}] \\ + \log (α_{0}) \sum_{i = 1}^{n} y_{i} (1 - E^{*} [{\tilde{y}}_{i}]) + \log (1 - α_{0}) \sum_{i = 1}^{n} (1 - y_{i}) (1 - E^{*} [{\tilde{y}}_{i}]) \\ + \sum_{i = 1}^{n} E^{*} [{\tilde{y}}_{i}] \log (\tilde{π} (Z_{i}^{*}, X_{i})) + \sum_{i = 1}^{n} (1 - E^{*} [{\tilde{y}}_{i}]) \log (1 - \tilde{π} (Z_{i}^{*}, X_{i})) . \end{aligned}

The MLEs of $α_{0}$ and $α_{1}$ are derived explicitly as

\begin{aligned} {\hat{α}}_{0} & = \frac{\sum_{i = 1}^{n} y_{i} (1 - E^{*} [{\tilde{y}}_{i}])}{n - \sum_{i = 1}^{n} E^{*} [{\tilde{y}}_{i}]}, \end{aligned}

(18)

\begin{aligned} {\hat{α}}_{1} & = \frac{\sum_{i = 1}^{n} E^{*} [{\tilde{y}}_{i}] (1 - y_{i})}{\sum_{i = 1}^{n} E^{*} [{\tilde{y}}_{i}]} . \end{aligned}

(19)
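The updates (18) and (19) are ratios of expected counts, which can be sketched directly. The function name `update_alphas` and the toy inputs are hypothetical; the expected true statuses would come from the E step.

```python
def update_alphas(y, e_true):
    """Closed-form updates (18) and (19) from observed y_i and E*[y_true_i]."""
    n = len(y)
    sum_e = sum(e_true)  # expected number of subjects with true status 1
    # Expected number observed as 1 among the expected true 0s.
    alpha0 = sum(yi * (1 - ei) for yi, ei in zip(y, e_true)) / (n - sum_e)
    # Expected number observed as 0 among the expected true 1s.
    alpha1 = sum(ei * (1 - yi) for yi, ei in zip(y, e_true)) / sum_e
    return alpha0, alpha1


# Toy input (hypothetical): two observed events and two observed non-events.
alpha0_hat, alpha1_hat = update_alphas([1, 1, 0, 0], [0.96, 0.96, 0.24, 0.24])
```

The denominators are the expected numbers of true non-events and true events, so each estimate is an expected misclassification proportion within the corresponding true class.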

The function ‘fminsearch’ in MATLAB is used to find $\hat{β}$ and $\hat{γ}$ [8].
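To illustrate the objective being maximized over $β$ and $γ$ , the sketch below maximizes the last two terms of $Q$ with $E^{*} [{\tilde{y}}_{i}]$ held fixed. Note the substitution: the paper uses MATLAB's derivative-free 'fminsearch', whereas this sketch uses plain gradient ascent in Python purely for illustration. The function names, step size, iteration count and data are all assumptions.

```python
import math


def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))


def fit_soft_logistic(rows, e_true, steps=5000, lr=0.1):
    """Maximize sum_i e_i*log(pi_i) + (1 - e_i)*log(1 - pi_i) by gradient ascent.

    rows[i] holds the stacked predictors (Z*_i, X_i); e_true[i] is E*[y_true_i].
    """
    coef = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(steps):
        grad = [0.0] * len(coef)
        for x, e in zip(rows, e_true):
            # Gradient contribution of one subject: (e_i - pi_i) * x_i.
            resid = e - sigmoid(sum(c * v for c, v in zip(coef, x)))
            for j, v in enumerate(x):
                grad[j] += resid * v
        coef = [c + lr * g / n for c, g in zip(coef, grad)]
    return coef


# Hypothetical data: an intercept plus one binary covariate, with soft labels.
rows = [[1, 0], [1, 0], [1, 1], [1, 1]]
e_true = [0.2, 0.4, 0.7, 0.9]
coef = fit_soft_logistic(rows, e_true)
```

For this saturated toy model the fitted probabilities match the mean soft label in each covariate group, which gives a simple way to check the optimizer.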

Standard error of estimators

The standard errors of these estimators are obtained by bootstrap sampling. The estimators themselves are computed as follows:

Procedure 1. Use (13) to generate M samples of $θ_{i}$ .

Procedure 2. Use (A4) to find the conditional expectations in (15), (16) and (17) and compute $\hat{θ}$ , $\hat{D}$ and ${\hat{σ}}_{j}^{2}, j = 1, \dots, J$ .

Procedure 3. Obtain ${\hat{α}}_{0}$ , ${\hat{α}}_{1}$ , $\hat{β}$ and $\hat{γ}$ .

Let b = 1. The standard errors of the estimators are then obtained using the bootstrap as follows:

Procedure A. Use $Z_{i}$ to generate a bootstrap sample.

Procedure B. Based on this bootstrap sample, use Procedures 1 to 3 to find the estimates of $Ω$ , denoted ${\hat{Ω}}_{b}^{⋆}$ .

Procedure C. Set b = b + 1 and repeat Procedures A and B W times.

Procedure D. Compute

Cov ({\hat{Ω}}^{⋆}) = \frac{\sum_{b = 1}^{W} ({\hat{Ω}}_{b}^{⋆} - {\bar{Ω}}^{⋆}) ({\hat{Ω}}_{b}^{⋆} - {\bar{Ω}}^{⋆})^{'}}{W - 1},

where ${\bar{Ω}}^{⋆} = \frac{\sum_{b = 1}^{W} {\hat{Ω}}_{b}^{⋆}}{W} .$
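The bootstrap covariance above can be sketched generically. The text does not spell out how the bootstrap sample is drawn, so resampling subjects with replacement is an assumption here; the `estimator` argument stands in for Procedures 1 to 3, and the toy estimator and data are hypothetical.

```python
import random


def bootstrap_cov(data, estimator, n_boot, seed=0):
    """Covariance matrix of bootstrap estimates (W = n_boot replicates).

    data: list of per-subject records; estimator: function returning a list of
    parameter estimates for a resampled data set.
    """
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        sample = [data[rng.randrange(n)] for _ in range(n)]  # resample subjects
        estimates.append(estimator(sample))
    p = len(estimates[0])
    mean = [sum(est[j] for est in estimates) / n_boot for j in range(p)]
    return [
        [
            sum((est[r] - mean[r]) * (est[s] - mean[s]) for est in estimates)
            / (n_boot - 1)
            for s in range(p)
        ]
        for r in range(p)
    ]


# Toy estimator (hypothetical): the sample mean of a single measurement.
data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
cov = bootstrap_cov(data, lambda s: [sum(s) / len(s)], n_boot=40)
se = cov[0][0] ** 0.5  # bootstrap standard error of the toy estimator
```

The square roots of the diagonal entries of the returned matrix are the bootstrap standard errors of the individual parameters.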

3. Simulations

Monte Carlo simulations are used to demonstrate the feasibility of the proposed model. We consider two longitudinal measures $(J = 2)$ and one fixed binary effect. To be consistent with the model constructed for the case study, a piecewise linear mean model with one knot located at 1 is assumed. Let $(x)_{+}$ be x if x>0 and be 0 if $x \leq 0$ . Then, the observed data for the longitudinal measures are generated from

Z_{i j k} = Z_{i j k}^{*} + e_{i j k} = θ_{i j 0} + θ_{i j 1} t_{i j k} + θ_{i j 2} (t_{i j k} - t^{*})_{+} + e_{i j k},

(20)

$i = 1, \dots, n, j = 1, 2, k = 1, \dots, 7$ , where $t^{*} = 1$ and $θ_{i} \sim MVN (θ, D)$ with

θ = (\begin{matrix} θ_{1} \\ θ_{2} \end{matrix}) = (\begin{matrix} θ_{10} \\ θ_{11} \\ θ_{12} \\ θ_{20} \\ θ_{21} \\ θ_{22} \end{matrix}), D = (\begin{array}{cc} D_{11} & D_{12} \\ D_{21} & D_{22} \end{array}),

and the ijth element of $D_{r s}$ is denoted by $d_{r s}^{i j}$ , i, j = 1, 2, 3, r, s = 1, 2. Furthermore, the random error $e_{i j k}$ is normally distributed with mean 0 and variance $σ_{j}^{2}$ , $j = 1, 2.$ The event status is assumed to be related to the two longitudinal measures and one fixed effect and is simulated from

\operatorname{logit} [\tilde{π} (Z_{i}^{*}, X_{i})] = β_{1} Z_{i 1 m_{i 1}}^{*} + β_{2} Z_{i 2 m_{i 2}}^{*} + γ_{0} + γ_{1} X_{i},

where $X_{i}$ takes only two values, 0 and 1. The true status is set to ${\tilde{y}}_{i} = 1$ when $\tilde{π} (Z_{i}^{*}, X_{i}) > 0.5$ .
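The data-generating steps for the true trajectory and the true status can be sketched as follows. The function names, the coefficient values and the last observation time are illustrative assumptions; in particular, the grid of observation times is not stated in the text.

```python
import math


def pos(x):
    """(x)_+ : x if x > 0, else 0."""
    return x if x > 0 else 0.0


def true_trajectory(theta, t, knot=1.0):
    """Piecewise linear mean in (20): theta0 + theta1*t + theta2*(t - knot)_+."""
    return theta[0] + theta[1] * t + theta[2] * pos(t - knot)


def true_status(z1_last, z2_last, x, beta, gamma):
    """Return (pi, y_true), with y_true = 1 when pi > 0.5 as described above."""
    eta = beta[0] * z1_last + beta[1] * z2_last + gamma[0] + gamma[1] * x
    pi = 1.0 / (1.0 + math.exp(-eta))
    return pi, int(pi > 0.5)


# Hypothetical subject-level coefficients and last observation time.
t_last = 3.0
z1 = true_trajectory([150, -9, 9], t_last)
z2 = true_trajectory([90, 3, -3], t_last)
pi, y_true = true_status(z1, z2, 1, beta=(-0.05, 0.05), gamma=(0.5, 0.5))
```

In a full simulation the coefficient vectors would be drawn from the multivariate normal distribution of the random effects, and measurement error would be added to obtain the observed longitudinal values.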

The probabilities of misspecification are set to $(α_{0}, α_{1}) = (0.05, 0.2)$ and $(0.1, 0.05)$ . To generate $y_{i}$ , a uniform random number U is generated. When ${\tilde{y}}_{i} = 0$ , $y_{i} = 1$ if $U < α_{0}$ and $y_{i} = 0$ otherwise. Similarly, when ${\tilde{y}}_{i} = 1$ , $y_{i} = 0$ if $U < α_{1}$ and $y_{i} = 1$ otherwise. Finally, n is set to 500 and 1000. Table 1 provides the detailed parameter settings. For each scenario, the estimates are computed from 500 simulated samples. Four indices are reported: the estimate, the standard error of the estimators (SSE), the mean of the standard errors of the estimator (SEE) and the 95% coverage probability (CP), where SSE is the sample standard deviation of the parameter estimates obtained from the 500 random samples. Following [7], the standard error of each estimate is obtained from 40 bootstrap samples.
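The rule for generating the observed outcome from the true status can be sketched as below. The function name `observe`, the seed and the artificial sequence of true statuses are assumptions chosen only to make the example reproducible.

```python
import random


def observe(y_true, alpha0, alpha1, rng):
    """Generate the observed y from the true status using one uniform draw U."""
    u = rng.random()
    if y_true == 0:
        return 1 if u < alpha0 else 0
    return 0 if u < alpha1 else 1


rng = random.Random(2021)  # arbitrary seed, for reproducibility only
y_true = [i % 2 for i in range(20000)]  # artificial true statuses
y_obs = [observe(t, 0.05, 0.2, rng) for t in y_true]

# Empirical misclassification rates within each true class.
n0 = y_true.count(0)
n1 = y_true.count(1)
rate0 = sum(1 for t, y in zip(y_true, y_obs) if t == 0 and y == 1) / n0
rate1 = sum(1 for t, y in zip(y_true, y_obs) if t == 1 and y == 0) / n1
```

With a large artificial sample the empirical rates `rate0` and `rate1` should be close to the chosen values of $α_{0}$ and $α_{1}$ .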

Table 1.

Comparison of parameter estimates for joint models with and without misspecification based on n = 500.

		Model
		Without misspecification ( $M_{2}$ )				With misspecification ( $M_{1}$ )
Parameter	True value	Estimate	SEE	SSE	CP	Estimate	SEE	SSE	CP
$α_{0}$	0.05					0.054	0.029	0.021	0.95
$α_{1}$	0.20					0.192	0.098	0.078	0.95
$β_{1}$	−0.05	−0.03	0.004	0.004	0	−0.071	0.034	0.021	0.84
$β_{2}$	0.05	0.03	0.004	0.003	0	0.072	0.035	0.021	0.82
$γ_{0}$	0.5	−0.122	0.373	0.374	0.61	0.785	1.458	1.146	0.92
$γ_{1}$	0.5	0.304	0.234	0.27	0.89	0.7	0.885	0.669	0.93
$θ_{01}$	150	150.054	0.255	0.264	0.96	150.044	0.257	0.269	0.95
$θ_{11}$	−9	−9.052	0.295	0.293	0.95	−9.063	0.293	0.294	0.95
$θ_{21}$	9	9.072	0.301	0.316	0.94	9.068	0.3	0.305	0.94
$θ_{02}$	90	89.998	0.299	0.289	0.94	89.979	0.296	0.31	0.95
$θ_{12}$	3	3.025	0.327	0.323	0.93	3.016	0.322	0.354	0.95
$θ_{22}$	−3	−3.012	0.323	0.334	0.94	−3.024	0.316	0.33	0.94
$d_{11}^{11}$	9	9.482	1.553	1.462	0.94	9.444	1.556	1.437	0.93
$d_{11}^{12}$	5	4.687	0.869	0.725	0.93	4.625	0.877	0.803	0.93
$d_{11}^{13}$	5	5.3	1.167	1.084	0.95	5.322	1.176	1.09	0.95
$d_{11}^{22}$	7	7.579	1.35	1.149	0.93	7.594	1.376	1.322	0.93
$d_{11}^{23}$	5	4.421	0.625	0.474	0.75	4.391	0.625	0.495	0.77
$d_{11}^{33}$	5	5.569	1.187	1.101	0.91	5.569	1.189	1.091	0.92
$d_{12}^{11}$	9	8.941	1.489	1.519	0.97	8.977	1.503	1.458	0.96
$d_{12}^{12}$	5	5	1.445	1.347	0.94	4.872	1.415	1.355	0.95
$d_{12}^{13}$	5	5.07	1.374	1.467	0.95	5.141	1.367	1.353	0.96
$d_{12}^{22}$	7	7.158	1.367	1.177	0.95	7.135	1.39	1.24	0.94
$d_{12}^{23}$	5	4.615	1.223	1.061	0.93	4.772	1.226	1.221	0.95
$d_{12}^{33}$	5	5.265	1.216	1.178	0.95	5.262	1.204	1.132	0.94
$d_{22}^{11}$	20	20.39	2.621	2.761	0.95	20.588	2.634	2.56	0.94
$d_{22}^{12}$	10	9.08	1.837	1.811	0.92	9.194	1.816	1.75	0.94
$d_{22}^{13}$	10	10.639	2.241	2.424	0.95	10.777	2.233	2.12	0.93
$d_{22}^{22}$	15	16.275	2.659	2.307	0.92	16.416	2.706	2.588	0.92
$d_{22}^{23}$	8	6.546	1.773	1.643	0.86	6.583	1.77	1.576	0.87
$d_{22}^{33}$	10	11.313	2.444	2.36	0.92	11.373	2.421	2.17	0.89
$σ_{1}^{2}$	25	24.899	0.684	0.722	0.95	24.798	0.691	0.717	0.94
$σ_{2}^{2}$	25	24.804	0.719	0.79	0.95	24.757	0.731	0.713	0.94

Notes: SEE is the mean of standard error of estimator. SSE is the standard error of estimators. CP is the 95% coverage probability.

For n = 500, Table 1 displays the estimates derived from the proposed likelihood function, denoted $M_{1}$ , and from the joint likelihood function derived in [7], denoted $M_{2}$ . The four indices for $M_{1}$ and $M_{2}$ are similar for the parameters of the random effect models. However, the estimates of $β_{1}$ , $β_{2}$ , $γ_{0}$ and $γ_{1}$ in the logistic model differ markedly between the two models. Specifically, the bias of the estimate of $γ_{0}$ under $M_{2}$ is negative (−0.622) and its magnitude exceeds the true value. The bias of the estimate of $γ_{1}$ under $M_{2}$ is also negative but smaller in magnitude (−0.196). The biases of the estimates of $γ_{0}$ and $γ_{1}$ under $M_{1}$ are positive (0.285 and 0.200); for $γ_{0}$ the bias is less than half that under $M_{2}$ in magnitude, while for $γ_{1}$ it is comparable. The SEE and SSE of the estimates of $γ_{0}$ and $γ_{1}$ under $M_{1}$ are much larger than those under $M_{2}$ . In turn, the CP of the estimates of $γ_{0}$ and $γ_{1}$ under $M_{1}$ is much closer to the nominal level than that under $M_{2}$ . Moreover, the absolute bias of the estimates of $β_{1}$ and $β_{2}$ is similar under $M_{1}$ and $M_{2}$ (about 0.02). Nevertheless, the SEE and SSE of the estimates of $β_{1}$ and $β_{2}$ under $M_{1}$ are roughly 5 to 9 times larger than those under $M_{2}$ . As a result, the CP of the estimates of $β_{1}$ and $β_{2}$ under $M_{2}$ equals zero, whereas under $M_{1}$ it is 0.84 and 0.82.
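The indices discussed here can be computed from the replicate estimates and their standard errors. The sketch below makes two assumptions that the text does not state explicitly: SSE is taken as the sample standard deviation of the estimates, and coverage is based on a Wald interval (estimate ± 1.96 × standard error). The function name `summarize` and the replicate values are hypothetical.

```python
import statistics


def summarize(estimates, std_errors, true_value, z=1.96):
    """Mean estimate, bias, SSE, SEE and coverage over simulation replicates."""
    mean_est = statistics.fmean(estimates)
    sse = statistics.stdev(estimates)    # spread of the estimates
    see = statistics.fmean(std_errors)   # average reported standard error
    covered = [
        est - z * se <= true_value <= est + z * se
        for est, se in zip(estimates, std_errors)
    ]
    return {
        "estimate": mean_est,
        "bias": mean_est - true_value,
        "SSE": sse,
        "SEE": see,
        "CP": sum(covered) / len(covered),
    }


# Hypothetical replicates, not taken from the paper's simulations.
out = summarize([0.4, 0.6, 0.5, 0.9], [0.1, 0.1, 0.1, 0.1], true_value=0.5)
```

A small SEE combined with a large bias, as seen for $β_{1}$ and $β_{2}$ under $M_{2}$ , produces intervals that are narrow and off-centre, which is how a coverage probability of zero can arise.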

To better understand how the estimates perform as the sample size changes, an additional simulation with n = 1000 is conducted; the results are displayed in Table 2. The estimates of the parameters in the random effect models are similar for the two sample sizes. As expected, the SEE and SSE of the estimates are smaller when n increases, and the CPs of the estimates are closer to the nominal level. Furthermore, the estimates of the parameters in the logistic regression are less biased and have smaller SEE and SSE, and their CPs are also closer to the nominal level.

Table 2.

Parameter estimates for the proposed models based on 500 simulation samples when sample sizes vary.

		Sample size
		n=500				n=1000
Parameter	True value	Estimate	SEE	SSE	CP	Estimate	SEE	SSE	CP
$α_{0}$	0.05	0.054	0.029	0.021	0.95	0.048	0.018	0.016	0.94
$α_{1}$	0.2	0.192	0.098	0.078	0.95	0.18	0.071	0.068	0.93
$β_{1}$	−0.05	−0.071	0.034	0.021	0.84	−0.054	0.012	0.011	0.93
$β_{2}$	0.05	0.072	0.035	0.021	0.82	0.054	0.012	0.011	0.92
$γ_{0}$	0.5	0.785	1.458	1.146	0.92	0.544	0.592	0.568	0.94
$γ_{1}$	0.5	0.7	0.885	0.669	0.93	0.547	0.334	0.327	0.96
$θ_{01}$	150	150.044	0.257	0.269	0.95	150.018	0.182	0.172	0.95
$θ_{11}$	−9	−9.063	0.293	0.294	0.95	−9.048	0.207	0.21	0.94
$θ_{21}$	9	9.068	0.300	0.305	0.94	9.043	0.211	0.204	0.94
$θ_{02}$	90	89.979	0.296	0.31	0.95	89.987	0.21	0.205	0.95
$θ_{12}$	3	3.016	0.322	0.354	0.95	3.014	0.228	0.229	0.96
$θ_{22}$	−3	−3.024	0.316	0.330	0.94	−3.026	0.226	0.226	0.95
$d_{11}^{11}$	9	9.444	1.556	1.437	0.93	9.366	1.175	1.099	0.93
$d_{11}^{12}$	5	4.625	0.877	0.803	0.93	4.758	0.611	0.561	0.92
$d_{11}^{13}$	5	5.322	1.176	1.090	0.95	5.217	0.833	0.811	0.93
$d_{11}^{22}$	7	7.594	1.376	1.322	0.93	7.359	0.976	0.912	0.92
$d_{11}^{23}$	5	4.391	0.625	0.495	0.77	4.664	0.394	0.336	0.84
$d_{11}^{33}$	5	5.569	1.189	1.091	0.92	5.311	0.825	0.778	0.94
$d_{12}^{11}$	9	8.977	1.503	1.458	0.96	8.994	1.076	1.064	0.96
$d_{12}^{12}$	5	4.872	1.415	1.355	0.95	4.978	1.019	0.959	0.95
$d_{12}^{13}$	5	5.141	1.367	1.353	0.96	5.078	0.982	0.947	0.94
$d_{12}^{22}$	7	7.135	1.390	1.24	0.94	7.076	0.969	0.919	0.95
$d_{12}^{23}$	5	4.772	1.226	1.221	0.95	4.87	0.859	0.830	0.94
$d_{12}^{33}$	5	5.262	1.204	1.132	0.94	5.197	0.841	0.802	0.94
$d_{22}^{11}$	20	20.588	2.634	2.56	0.94	20.416	1.873	1.953	0.94
$d_{22}^{12}$	10	9.194	1.816	1.75	0.94	9.518	1.290	1.235	0.93
$d_{22}^{13}$	10	10.777	2.233	2.12	0.93	10.529	1.576	1.595	0.93
$d_{22}^{22}$	15	16.416	2.706	2.588	0.92	15.681	1.871	1.779	0.93
$d_{22}^{23}$	8	6.583	1.770	1.576	0.87	7.197	1.209	1.122	0.89
$d_{22}^{33}$	10	11.373	2.421	2.170	0.89	10.882	1.700	1.632	0.92
$σ_{1}^{2}$	25	24.798	0.691	0.717	0.94	24.863	0.489	0.507	0.94
$σ_{2}^{2}$	25	24.757	0.731	0.713	0.94	24.853	0.520	0.547	0.93

Notes: SEE is the mean of standard error of estimator. SSE is the standard error of estimators. CP is the 95% coverage probability.

Table 3 provides the simulation results for the second misspecification setting, in which $α_{0} = 0.1$ and $α_{1} = 0.05$ and the other parameters are the same as those in Table 1. This setting mimics the situation in the real data. The performance of the estimates is similar to that in Table 1.

Table 3.

Parameter estimates for the proposed models based on 500 simulation samples assuming $α_{0} = 0.1$ and $α_{1} = 0.05$ .

		Sample size
		n=500
Parameter	True value	Estimate	SEE	SSE	CP
$α_{0}$	0.10	0.103	0.036	0.029	0.94
$α_{1}$	0.05	0.070	0.059	0.050	0.93
$β_{1}$	−0.05	−0.071	0.032	0.024	0.88
$β_{2}$	0.05	0.071	0.032	0.024	0.88
$γ_{0}$	0.5	0.778	1.207	0.996	0.93
$γ_{1}$	0.5	0.686	0.752	0.577	0.93
$θ_{01}$	150	150.058	0.256	0.271	0.94
$θ_{11}$	−9	−9.059	0.292	0.306	0.95
$θ_{21}$	9	9.068	0.300	0.309	0.95
$θ_{02}$	90	90.010	0.299	0.326	0.93
$θ_{12}$	3	3.013	0.324	0.314	0.95
$θ_{22}$	−3	−3.001	0.319	0.339	0.95
$d_{11}^{11}$	9	9.520	1.667	1.577	0.94
$d_{11}^{12}$	5	4.445	1.008	0.995	0.91
$d_{11}^{13}$	5	5.509	1.373	1.288	0.93
$d_{11}^{22}$	7	7.906	1.595	1.566	0.92
$d_{11}^{23}$	5	4.035	0.783	0.649	0.73
$d_{11}^{33}$	5	5.976	1.429	1.316	0.88
$d_{12}^{11}$	9	8.940	1.562	1.524	0.95
$d_{12}^{12}$	5	4.864	1.533	1.564	0.96
$d_{12}^{13}$	5	5.084	1.472	1.423	0.94
$d_{12}^{22}$	7	7.212	1.533	1.491	0.95
$d_{12}^{23}$	5	4.779	1.408	1.298	0.95
$d_{12}^{33}$	5	5.177	1.380	1.339	0.94
$d_{22}^{11}$	20	20.753	2.669	2.768	0.94
$d_{22}^{12}$	10	8.771	1.935	1.909	0.90
$d_{22}^{13}$	10	11.156	2.323	2.365	0.92
$d_{22}^{22}$	15	16.724	2.845	2.689	0.90
$d_{22}^{23}$	8	6.103	1.996	1.936	0.84
$d_{22}^{33}$	10	11.896	2.634	2.654	0.88
$σ_{1}^{2}$	25	24.787	0.693	0.720	0.95
$σ_{2}^{2}$	25	24.763	0.724	0.733	0.92

Notes: SEE is the mean of standard error of estimator. SSE is the standard error of estimators. CP is the 95% coverage probability.

4. Example

A total of 292 patients who underwent RFCA for drug-refractory symptomatic AF between July 2004 and March 2014 were retrospectively evaluated at an institution. Patients who were lost to clinical follow-up after RFCA, who had severe valvular heart disease requiring surgery, or who had previously received surgical MAZE were excluded. Finally, 265 patients were analyzed in the study.

A commercially available ultrasound scanner (Vivid 7 or 9, General Electric Medical Health, Waukesha, WI, USA) with a 2.5-MHz phased-array transducer was used to perform echocardiographic examinations. LA size was obtained as an anteroposterior diameter in parasternal long axis view according to the guidelines of the American Society of Echocardiography ([9]). The biplane Simpson method was used to determine LVEF. The baseline echocardiographic images in sinus rhythm were obtained on the next day after RFCA. Serial LA size and LVEF were measured at 1, 3, 6, 12, 18 and 24 months after RFCA. The average number of serial measurements was 5.96, where 150 participants (56.6%) have complete data and 88% of the participants have at least 4 repeated measures. Recurrence was defined in the introduction section. The percent of recurrence of AF was 25%. Among the eligible patients, 67% of respondents were male and 22% of respondents had paroxysmal AF.

The time plots for LA size and LVEF are given in Figure 1, where the middle solid line represents the loess curve of the data. The time plots show individual heterogeneity. Furthermore, the loess curve for LA size declines over the first 3 months and increases steadily afterwards, while the loess curve for LVEF drops very gently over the first 3 months and then remains constant. Based on these findings, piecewise linear models with the knot set at 3 months, as given in (20) with $t^{*} = 3$ , are chosen to model the serial measurements of LA size and LVEF. Details of the model selection are given in [10].

Using similar data, Lee et al. [10] indicated that gender and the type of AF at baseline are significant predictors in the logistic regression model, where a two-stage estimation approach is used. Thus, these two factors are also included in the logistic regression model defined in (5). The event of interest is defined as non-recurrence. Two serial measurements (LA size and LVEF) are considered. Estimates are derived from the likelihood function with misspecification and from the likelihood function without misspecification.

Table 4 displays the estimates derived from the two models. The estimate of $α_{0}$ is not significant, while the estimate of $α_{1}$ is close to significance at the 0.05 level (p=0.079), which suggests that the determination of recurrence is slightly conservative and that a small portion of AF non-recurrent participants is misspecified.

Table 4.

Parameter estimates of the proposed model with and without misspecification for the AF recurrent data.

	Without misspecification			With misspecification
Parameter	Estimate	SE	p	Estimate	SE	p
$α_{0}$				0.101	0.065	0.122
$α_{1}$				0.050	0.029	0.079
Logistic regression
LA size	−0.190	0.044	0.000	−0.290	0.098	0.003
LVEF	0.128	0.052	0.015	0.163	0.074	0.028
Intercept	0.033	3.789	0.993	2.162	5.345	0.686
Male	0.915	0.419	0.029	1.280	0.790	0.105
Non-paroxysmal AF	−1.298	0.414	0.002	−2.048	0.854	0.017
Piecewise linear mixed model
$θ_{01}$	41.595	0.409	0.000	41.580	0.400	0.000
$θ_{11}$	−9.362	0.696	0.000	−9.327	0.625	0.000
$θ_{21}$	9.396	0.751	0.000	9.334	0.665	0.000
$θ_{02}$	66.608	0.309	0.000	66.602	0.321	0.000
$θ_{12}$	5.189	0.869	0.000	5.036	0.726	0.000
$θ_{22}$	−5.166	0.943	0.000	−5.010	0.777	0.000
$d_{11}^{11}$	35.030	7.117	0.000	32.057	4.607	0.000
$d_{11}^{12}$	−2.764	13.703	0.840	4.103	4.643	0.377
$d_{11}^{13}$	7.727	10.436	0.459	2.529	3.107	0.416
$d_{11}^{22}$	21.596	31.053	0.487	4.733	4.061	0.244
$d_{11}^{23}$	−16.916	34.213	0.621	1.636	4.514	0.717
$d_{11}^{33}$	21.450	33.588	0.523	3.221	3.909	0.410
$d_{12}^{11}$	−4.381	2.089	0.036	−3.740	1.548	0.016
$d_{12}^{12}$	1.177	4.617	0.799	−0.297	1.535	0.847
$d_{12}^{13}$	−0.906	4.803	0.850	0.314	1.766	0.859
$d_{12}^{22}$	−5.664	11.871	0.633	−0.228	1.511	0.880
$d_{12}^{23}$	5.206	11.918	0.662	−0.201	1.660	0.904
$d_{12}^{33}$	−5.516	12.178	0.651	−0.057	1.608	0.972
$d_{22}^{11}$	12.260	4.552	0.007	10.515	3.178	0.001
$d_{22}^{12}$	0.006	3.755	0.999	1.338	1.256	0.287
$d_{22}^{13}$	0.962	3.126	0.758	0.243	1.194	0.839
$d_{22}^{22}$	5.239	8.293	0.528	1.812	2.345	0.440
$d_{22}^{23}$	−3.637	8.772	0.678	0.000	2.316	1.000
$d_{22}^{33}$	5.062	8.881	0.569	1.503	2.123	0.479
$σ_{1}^{2}$	14.107	7.002	0.044	16.850	5.261	0.001
$σ_{2}^{2}$	18.972	1.553	0.000	19.414	1.415	0.000


Similar to the simulation results, the estimates of the parameters in the mixed model are similar for the models with and without misspecification. The estimates of the mean parameters in the mixed model are all significant. LA size declines during the first three months but remains stable afterwards. LVEF increases slightly during the first three months and then remains constant. Regarding the estimates of $D$, only the variances of the random intercepts and the covariance between them are significant.
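The reading of the mean parameters can be illustrated with a small sketch. It assumes that (20) uses the common truncated-line basis $(1, t, (t - t^{*})_{+})$, so that the slope before the knot is $θ_{1j}$ and the slope after it is $θ_{1j} + θ_{2j}$; the exact parameterization and time scale of (20) may differ, and the names `piecewise_design` and `post_knot_slope` are illustrative.

```python
import numpy as np

def piecewise_design(t, knot):
    """Design matrix with columns 1, t, (t - knot)_+ for a one-knot linear spline."""
    t = np.asarray(t, dtype=float)
    return np.column_stack([np.ones_like(t), t, np.clip(t - knot, 0.0, None)])

def post_knot_slope(theta):
    """Slope after the knot under the assumed basis: theta_1 + theta_2."""
    return theta[1] + theta[2]

# Fixed-effect estimates from Table 4 (model with misspecification).
theta_la = np.array([41.580, -9.327, 9.334])    # LA size
theta_lvef = np.array([66.602, 5.036, -5.010])  # LVEF

print("LA size: slope before knot", theta_la[1], "after knot", post_knot_slope(theta_la))
print("LVEF:    slope before knot", theta_lvef[1], "after knot", post_knot_slope(theta_lvef))

# Design rows on an illustrative time grid with the knot at t* = 3 (assumed basis).
print(piecewise_design([0.0, 3.0, 12.0], knot=3.0))
```

Under this assumed basis the two post-knot slopes are close to zero, which agrees with the description of both markers levelling off after the first three months.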

Some discrepancies in the estimates of the logistic regression parameters are found. The type of AF at baseline is significant in both models. Participants who had non-paroxysmal AF are less likely to have non-recurrent AF than those who had paroxysmal AF. The odds ratios for the models with and without misspecification are 0.129 and 0.273, respectively. However, the finding for gender is inconsistent. Both estimates are positive, which means that male participants are more likely to have non-recurrent AF. Nevertheless, the estimate is not significant in the model adjusting for misspecification, while it is significant in the model without the adjustment.
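The odds ratios quoted for non-paroxysmal AF follow from exponentiating the corresponding coefficients in Table 4. The sketch below reproduces them and also computes Wald 95% intervals; the intervals are not reported in the paper and rest on the assumption that the estimates are approximately normal on the log-odds scale. The function name `odds_ratio` is illustrative.

```python
import math

def odds_ratio(estimate, se, z=1.96):
    """Return (OR, lower, upper) using a Wald interval on the log-odds scale."""
    return (math.exp(estimate),
            math.exp(estimate - z * se),
            math.exp(estimate + z * se))

# Coefficients and standard errors for non-paroxysmal AF, taken from Table 4.
with_mis = odds_ratio(-2.048, 0.854)
without_mis = odds_ratio(-1.298, 0.414)

print(f"with misspecification:    OR = {with_mis[0]:.3f}")
print(f"without misspecification: OR = {without_mis[0]:.3f}")
```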

The estimates corresponding to LA size and LVEF are significant in both models. In particular, the estimates derived from the model with misspecification are slightly larger in magnitude than those from the model without misspecification, while their SEs are roughly 1.4 to 2.2 times larger. The direction of the influence is the same: as LA size increases, the odds of having non-recurrent AF decline, whereas as LVEF increases, the odds of having non-recurrent AF increase.

5. Discussions

The misspecification of the outcome can occur owing to various reasons. Assuming that the probability of misspecification depends only upon the observed outcome, this paper constructs the complete data likelihood that takes into account multiple serial measurements and a binary outcome subject to misspecification. Owing to the misspecification, the computation is much more complex. Based on Monte Carlo simulations, the estimates can be biased when the misspecification is not incorporated in the model. This finding is similar to the discussion in [6].

Using the baseline echocardiography reports, [14] found that only LA size is associated with the recurrence of AF, while neither LVEF nor MR is significantly associated with the duration from the time of the echocardiogram until the first recurrence of AF. However, [19] reported that heart failure was an independent marker of adverse procedural outcome, indicating that the poorer outcome was independent of secondary phenomena such as dilated LA but primarily related to low LVEF. In this study, LVEF is a strong predictor of the recurrence of AF under the proposed model. The advantage of the proposed model is that the most current values of LA size and LVEF are used, which reflect more current cardiac function. Nevertheless, one of the indications of recurrence is self-reported typical palpitation episodes. It may be possible that female participants are more cautious about their condition. Nevertheless, other indications of recurrence might be stronger. Thus, by incorporating the misspecification, the association between gender and the recurrence of AF is weakened.

The simulation only considered complete data, that is, data in which all the repeated measures are available. In reality, owing to missed follow-ups, repeated measures might be missing. The impact of attrition is unclear. Further investigations of the impact of missing values in the repeated measures are needed.

The misspecification of the outcome can occur in many applications. Adapting the setting of [6], this paper assumes that the probability of misspecification depends only on the observed value. Very limited information about the misspecification is available. [16] and [17] suggested using surrogate outcome data and a validation sample to correct for it. Using Monte Carlo simulation, Cheng and Hsueh [4] found that the estimator is quite stable when a reasonable surrogate classifier is given. The estimates of $α_{0}$ and $α_{1}$ may converge faster when some additional information is given.

The area under the receiver operating characteristic (ROC) curve (AUC) has been used to assess the predictive power of the joint model of multiple repeated measures and a binary outcome [7]. It might be possible to adopt the AUC measure for the current model. Nevertheless, the construction of the ROC curve and the AUC has to incorporate the possible misspecification of the binary outcome. The modification is not straightforward.

Acknowledgments

The authors gratefully acknowledge the referees for their insightful comments, which enhanced the presentation and methodology of this paper.

Appendices.

Appendix 1. M step in the joint model

From (10), the complete likelihood function can be decomposed into three parts. The estimators of $σ_{j}^{2}$ can be derived from the first part as

\log (\prod_{i = 1}^{n} \prod_{k = 1}^{m_{i j}} f (Z_{i j k} | θ_{i j}, σ_{j}^{2})) = - \frac{\sum_{i = 1}^{n} m_{i j}}{2} \log (2 π σ_{j}^{2}) - \frac{\sum_{i = 1}^{n} \sum_{k = 1}^{m_{i j}} (Z_{i j k} - T_{i j k} θ_{i j})^{2}}{2 σ_{j}^{2}} .

(A1)

The estimators defined in (17) can be obtained by setting the derivative of (A1) with respect to $σ_{j}^{2}$ to zero, that is,

- \frac{1}{2 σ_{j}^{2}} \sum_{i = 1}^{n} m_{i j} + \frac{1}{2 σ_{j}^{4}} \sum_{i = 1}^{n} \sum_{k = 1}^{m_{i j}} (Z_{i j k} - T_{i j k} θ_{i j})^{2} = 0.
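For fixed values of $θ_{i j}$, the score equation above has a closed-form root: the residual sum of squares divided by the total number of measurements. The sketch below checks this numerically on simulated residuals. In the EM algorithm the squared residuals would be replaced by their conditional expectations, which this sketch does not attempt; the simulated data and the names `sigma2_hat` and `score` are illustrative assumptions, not the authors' code.

```python
import numpy as np

def _rss_and_count(residuals):
    """Residual sum of squares and total number of observations across subjects."""
    rss = sum(float(np.sum(r ** 2)) for r in residuals)
    n_obs = sum(r.size for r in residuals)
    return rss, n_obs

def sigma2_hat(residuals):
    """Closed-form root of the score equation: RSS / total number of observations.

    `residuals` is a list of 1-D arrays, one per subject, holding
    Z_ijk - T_ijk theta_ij for a fixed marker j.
    """
    rss, n_obs = _rss_and_count(residuals)
    return rss / n_obs

def score(sigma2, residuals):
    """Derivative of the log-likelihood (A1) with respect to sigma_j^2."""
    rss, n_obs = _rss_and_count(residuals)
    return -n_obs / (2.0 * sigma2) + rss / (2.0 * sigma2 ** 2)

rng = np.random.default_rng(0)
# Hypothetical residuals: 50 subjects with 3 to 6 visits each, true variance 4.
residuals = [rng.normal(0.0, 2.0, size=rng.integers(3, 7)) for _ in range(50)]

s2 = sigma2_hat(residuals)
print("sigma^2 estimate:", s2)
print("score at the estimate:", score(s2, residuals))
```

The score is positive below the root and negative above it, so the root is the maximizer of (A1) in $σ_{j}^{2}$.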

The second part can be used to derive the estimator that is related to the distribution of $θ_{i}$ .

Appendix 2. Expectation in M step

Many computations in the EM algorithm require finding the conditional expectation of $θ_{i}$ given the data when searching for the MLE. Furthermore, to compute the conditional expectation, we need to know the conditional probability density function of $θ_{i}$ , which is

\begin{aligned} f (θ_{i} | y_{i}, {\tilde{y}}_{i}, X_{i}, Z_{i}, T_{i}, \hat{Ω}) \\ = \frac{f (y_{i}, {\tilde{y}}_{i} | α_{0}, α_{1}, θ_{i}, X_{i}, \hat{β}, \hat{γ}) f (θ_{i} | Z_{i}, T_{i}, α_{0}, α_{1}, \hat{θ}, \hat{D}, {\hat{σ}}_{1}^{2}, \dots, {\hat{σ}}_{J}^{2})}{\int_{- \infty}^{\infty} f (y_{i}, {\tilde{y}}_{i} | α_{0}, α_{1}, θ_{i}, X_{i}, \hat{β}, \hat{γ}) f (θ_{i} | Z_{i}, T_{i}, α_{0}, α_{1}, \hat{θ}, \hat{D}, {\hat{σ}}_{1}^{2}, \dots, {\hat{σ}}_{J}^{2}) d θ_{i}}, \end{aligned}

(A2)

where the conditional distribution of $y_{i}$ and ${\tilde{y}}_{i}$ is given in (7), but $f (θ_{i} | Z_{i}, T_{i}, \hat{θ}, \hat{D}, {\hat{σ}}_{1}^{2}, \dots, {\hat{σ}}_{J}^{2})$ is unknown.

Let $h (x)$ denote a function of $x$. We want to find the conditional expectation of $h (θ_{i})$ given the observed data and parameter estimators, i.e. $E [h (θ_{i}) | y_{i}, {\tilde{y}}_{i}, X_{i}, Z_{i}, T_{i}, \hat{Ω}],$ where $\hat{Ω} = (α_{0}, α_{1}, \hat{θ}, \hat{D}, \hat{β}, \hat{γ}, {\hat{Σ}}_{i}) .$ Using (7) and (13), we obtain the conditional expectation of $h (θ_{i})$ as

\begin{aligned} E [h (θ_{i}) | y_{i}, {\tilde{y}}_{i}, X_{i}, Z_{i}, T_{i}, \hat{Ω}] \\ = \frac{\int_{- \infty}^{\infty} h (θ_{i}) f (y_{i}, {\tilde{y}}_{i} | α_{0}, α_{1}, θ_{i}, X_{i}, \hat{β}, \hat{γ}) f (θ_{i} | Z_{i}, T_{i}, α_{0}, α_{1}, \hat{θ}, \hat{D}, {\hat{σ}}_{1}^{2}, \dots, {\hat{σ}}_{J}^{2}) d θ_{i}}{\int_{- \infty}^{\infty} f (y_{i}, {\tilde{y}}_{i} | α_{0}, α_{1}, θ_{i}, X_{i}, \hat{β}, \hat{γ}) f (θ_{i} | Z_{i}, T_{i}, α_{0}, α_{1}, \hat{θ}, \hat{D}, {\hat{σ}}_{1}^{2}, \dots, {\hat{σ}}_{J}^{2}) d θ_{i}} . \end{aligned}

(A3)

Since the explicit value of $E [h (θ_{i}) | y_{i}, {\tilde{y}}_{i}, X_{i}, Z_{i}, T_{i}, \hat{Ω}]$ is complicated, we approximate it by Monte Carlo simulation. The procedure begins by simulating M samples of $θ_{i}$ from (13), denoted as $θ_{i}^{(s)}$ , $s = 1, 2, \dots, M$ . Based on these samples, the numerator of (A3) can be approximated by

\begin{aligned} \int_{- \infty}^{\infty} h (θ_{i}) f (y_{i}, {\tilde{y}}_{i} | α_{0}, α_{1}, θ_{i}, X_{i}, \hat{β}, \hat{γ}) f (θ_{i} | Z_{i}, T_{i}, α_{0}, α_{1}, \hat{θ}, \hat{D}, {\hat{σ}}_{1}^{2}, \dots, {\hat{σ}}_{J}^{2}) d θ_{i} \\ \approx \frac{1}{M} \sum_{s = 1}^{M} h (θ_{i}^{(s)}) f (y_{i}, {\tilde{y}}_{i} | α_{0}, α_{1}, θ_{i}^{(s)}, X_{i}, \hat{β}, \hat{γ}) . \end{aligned}

Approximating the denominator in the same way, the approximate value of (A3) is obtained as

\frac{\sum_{s = 1}^{M} h (θ_{i}^{(s)}) f (y_{i}, {\tilde{y}}_{i} | α_{0}, α_{1}, θ_{i}^{(s)}, X_{i}, \hat{β}, \hat{γ})}{\sum_{s = 1}^{M} f (y_{i}, {\tilde{y}}_{i} | α_{0}, α_{1}, θ_{i}^{(s)}, X_{i}, \hat{β}, \hat{γ})} .

(A4)
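The ratio in (A4) is a self-normalized weighted average: each simulated $θ_{i}^{(s)}$ is weighted by the outcome likelihood, and the common factor $1/M$ cancels. The following is a minimal sketch of that computation under stated assumptions. The multivariate normal used for the draws is a stand-in for the distribution in (13), and `logistic_lik` is a hypothetical placeholder for the outcome likelihood in (7) that omits the misspecification parameters; neither is taken from the paper.

```python
import numpy as np

def mc_conditional_expectation(h, outcome_lik, mean, cov, n_draws=5000, seed=0):
    """Approximate E[h(theta_i) | data] by the weighted ratio in (A4).

    Draws theta^(s) from a multivariate normal (an assumed stand-in for (13))
    and weights each draw by the outcome likelihood evaluated at that draw.
    """
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(mean, cov, size=n_draws)    # theta^(s)
    weights = np.array([outcome_lik(th) for th in draws])        # f(y, y~ | theta^(s))
    values = np.array([h(th) for th in draws]).reshape(n_draws, -1)
    # Numerator and denominator share the factor 1/M, so it is omitted.
    return (weights[:, None] * values).sum(axis=0) / weights.sum()

def logistic_lik(theta, y=1, intercept=0.0, gamma=np.array([2.0, 0.0])):
    """Hypothetical Bernoulli-logit likelihood of one observed outcome given theta."""
    p = 1.0 / (1.0 + np.exp(-(intercept + gamma @ theta)))
    return p if y == 1 else 1.0 - p

post_mean = mc_conditional_expectation(
    h=lambda th: th,
    outcome_lik=logistic_lik,
    mean=np.zeros(2),
    cov=np.eye(2),
)
print(post_mean)
```

In this toy setting only the first component of $θ_{i}$ enters the likelihood, so its conditional mean shifts away from zero while the second component stays near its unconditional mean.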

Funding Statement

This research was supported in part by the Ministry of Science and Technology in Taiwan [grant number MOST 106-2118-M-305-006-MY2].

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

1. Albert P.S., Hunsberger S.A., and Bird F.M., Modeling repeated measures with monotonic ordinal responses and misclassification, J. Amer. Statist. Assoc. 92 (1997), pp. 1304–1311.
2. Assakul K. and Proctor C., Testing independence in two way contingency tables with data subject to misclassification, Psychometrika 32 (1967), pp. 67–76.
3. Bross I., Misclassification in 2 × 2 tables, Biometrics 10 (1954), pp. 478–486.
4. Cheng K.F. and Hsueh H.M., Correcting bias due to misclassification in the estimation of logistic regression models, Stat. Probab. Lett. 44 (1999), pp. 229–240.
5. Chou C.C., Lee H.L., Chang P.C., Wo H.T., Wen M.S., Yeh S.J., Lin F.C., and Hwang Y.T., Left atrial emptying fraction predicts recurrence of atrial fibrillation after radiofrequency catheter ablation, PLoS ONE 13 (2018), e0191196.
6. Hausman J., Abrevaya J., and Scott-Morton F.M., Misclassification of the dependent variable in a discrete-response setting, J. Econom. 87 (1998), pp. 239–269.
7. Hwang Y.T., Wang C.C., Wang C.H., Tseng Y.K., and Chang Y.J., Joint model of multiple longitudinal measures and a binary outcome – an application to predict orthostatic hypertension for subacute stroke patients, Biom. J. 57 (2015), pp. 662–675.
8. Lagarias J.C., Reeds J.A., Wright M.H., and Wright P.E., Convergence properties of the Nelder-Mead simplex method in low dimensions, SIAM J. Optim. 9 (1998), pp. 112–147.
9. Lang R.M., Bierig M., Devereux R.B., Flachskamph F.A., Foster E., and Pellikka P.A., et al. Recommendations for chamber quantification: A report from the American Society of Echocardiography's guidelines and standards committee and the chamber quantification writing group, developed in conjunction with the European Association of Echocardiography, a branch of the European Society of Cardiology, J. Am. Soc. Echocardiogr. 18 (2005), pp. 1440–1463.
10. Lee H.L., Hwang Y.T., Chang P.C., Wen M.S., and Chou C.C., A 3-year longitudinal study of the relation between left atrial diameter remodeling and atrial fibrillation ablation outcome, J. Geriatr. Cardiol. 15 (2018), pp. 486–491.
11. Marquis K. and Moore J., Measurement Errors in SIPP Program Reports, Proceedings of the Bureau of the Census 1990 Annual Research Conference, Bureau of the Census, Washington, DC, 1990, pp. 721–745.
12. Miyazaki S., Kuwahara T., Kobori A., Takahashi Y., Takei A., and Sato A., et al. Preprocedural predictors of atrial fibrillation recurrence following pulmonary vein antrum isolation in patients with paroxysmal atrial fibrillation: long-term follow-up results, J. Cardiovasc. Electrophysiol. 22 (2011), pp. 621–625.
13. Mote V.L. and Anderson R.L., An investigation of the effect of misclassification on the properties of $χ^{2}$ tests in the analysis of categorical data, Biometrika 52 (1965), pp. 95–109.
14. Mulukutla A., Althouse A.D., Jain S.K., and Saba S., Increased left atrial size is associated with higher atrial fibrillation recurrence in patients treated with antiarrhythmic medications, Clin. Cardiol. 41 (2018), pp. 825–829.
15. Ouyang F., Tilz R., Chun J., Schmidt B., Wissner E., Zerm T., Neven K., Köktürk B., Konstantinidou M., Metzner A., Fuernkranz A., and Kuck K.H., Long-term results of catheter ablation in paroxysmal atrial fibrillation: Lessons from a 5-year follow-up, Circulation 122 (2010), pp. 2368–2377.
16. Pepe M.S., Inference using surrogate outcome data and a validation sample, Biometrika 79 (1992), pp. 355–365.
17. Sepanski J., Knickerbocker R., and Carroll R., A semiparametric correction for attenuation, J. Amer. Statist. Assoc. 89 (1994), pp. 1366–1373.
18. Shin S.H., Park M.Y., Oh W.J., Hong S.J., Pak H.N., and Song W.H., et al. Left atrial volume is a predictor of atrial fibrillation recurrence after catheter ablation, J. Am. Soc. Echocardiogr. 21 (2008), pp. 697–702.
19. Ullah W., S. Prabhu L.L.H., Lee G., and Kistler P., et al. Catheter ablation of atrial fibrillation in patients with heart failure: Impact of maintaining sinus rhythm on heart failure status and long-term rates of stroke and death, Clin. Res. 18 (2016), pp. 679–686.
20. Weerasooriya R., Khairy P., Litalien J., Macle L., Hocini M., Sacher F., Lellouche N., Knecht S., Wright M., Nault I., Miyazaki S., Scavee C., Clementy J., Haissaguerre M., and Jais P., Catheter ablation for atrial fibrillation: are results maintained at 5 years of follow-up?, Circulation 57 (2011), pp. 160–166.
