Assessing Complier Average Causal Effects from Longitudinal Trials with Multiple Endpoints and Treatment Noncompliance: an Application to a Study of Arthritis Health Journal

Lulu Guo; Yi Qian; Hui Xie

doi:10.1002/sim.9364

Author manuscript; available in PMC: 2023 Jun 15.

Published in final edited form as: Stat Med. 2022 Mar 10;41(13):2448–2465. doi: 10.1002/sim.9364

PMCID: PMC9035077 NIHMSID: NIHMS1782678 PMID: 35274333

Summary

Treatment noncompliance often occurs in longitudinal randomized controlled trials (RCTs) on human subjects, and can greatly complicate treatment effect assessment. The complier average causal effect (CACE) informs the intervention efficacy for the subpopulation who would comply regardless of assigned treatment and has been considered a patient-oriented treatment effect of interest in the presence of noncompliance. Real-world RCTs evaluating multifaceted interventions often employ multiple study endpoints to measure treatment success. In such trials, limited sample sizes, low compliance rates and small to moderate effect sizes on individual endpoints can significantly reduce the power to detect CACE when these correlated endpoints are analyzed separately. To overcome the challenge, we develop a multivariate longitudinal potential outcome model with stratification on latent compliance types to efficiently assess multivariate CACEs (MCACE) by combining information across multiple endpoints and visits. Evaluation using simulation data shows a significant increase in the estimation efficiency with the MCACE model, including up to 50% reduction in standard errors of CACE estimates and a 1-fold increase in the power to detect CACE. Finally, we apply the proposed MCACE model to an RCT on the Arthritis Health Journal online tool. Results show that the MCACE analysis detects significant and beneficial intervention effects on two of the six endpoints, while estimating CACEs for these endpoints separately fails to detect a treatment effect on any endpoint.

Keywords: causal inference, potential outcome model, treatment effects estimation, multi-level model, hierarchical random effects model, principal stratification

1 |. INTRODUCTION

Randomized controlled trials (RCTs) are the gold standard for evaluating treatment effects. However, noncompliance often occurs and can greatly complicate assessing treatment effects. Intention-to-treat (ITT) analysis preserves randomization and is the main method to report trial results.^1,2 However, ITT analysis estimates the effect of assigning treatment and targets program effectiveness. While program effectiveness is often of interest to policymakers, patients and their health decision-makers are more interested in intervention efficacy that informs what to expect when patients comply with treatment.^3,4 Generally, the ITT gives conservative estimates of intervention efficacy.³ The alternative as-treated (AT) approach compares outcomes based on actually received treatments. The AT estimate violates the randomization assumption and can be confounded by unobserved factors correlated with compliance behaviors. Thus AT is subject to selection bias as an estimate of intervention efficacy and should not be used without first evaluating the size of potential bias.⁵

An appealing alternative is to estimate the complier average causal effect (CACE) via the method of latent class instrumental variables⁶ that can properly adjust for post-randomization compliance status when estimating treatment effects. Under noncompliance, the principal strata are (partially) unobserved compliance types (compliers, always-takers, never-takers and defiers) determined by the joint potential compliance behaviors under both control and treatment groups.^7,8 These compliance types are predetermined before randomization, which permits one to define causal effects within subpopulations partitioned by compliance types. Imbens and Angrist⁷ and Baker and Lindeman⁸ showed how to estimate the average treatment effects for the subpopulation who would comply regardless of treatment assignment. Patients and their treatment decision makers are typically more interested in knowing the expected treatment effect when taking the treatment. The CACE is considered as more relevant for such patient-oriented treatment effects, as compared with the program effectiveness targeted by ITT.⁴

In this paper, we propose a multivariate longitudinal potential outcome model with principal strata for latent compliance types to efficiently assess multivariate CACEs (MCACE) in longitudinal RCTs with multiple endpoints. Real-world RCTs evaluating multifaceted interventions often employ longitudinal designs and multiple endpoints. In such trials, limited sample sizes, low compliance rates or small to moderate effect sizes on endpoints can significantly reduce the power to detect CACE. This work is motivated by a longitudinal RCT conducted at Arthritis Research Canada to evaluate the effectiveness of a behavioral intervention, the Arthritis Health Journal (AHJ).⁹ AHJ is an online tool that enables patients with rheumatoid arthritis to monitor their disease activity. Six health endpoints were collected longitudinally, measuring multifaceted aspects of managing disease, symptoms, knowledge etc. A preliminary evaluation using ITT analysis for each endpoint separately reported no significant treatment effect on any endpoint,⁹ with full results presented in Web Table 1 in the supporting information. However, a substantial number of participants did not use the intervention, or used it rarely, which can render ITT estimates too conservative for evaluating the patient-oriented intervention effect. The low compliance rate, combined with the limited sample size (n = 94) and the moderate effect size expected from using AHJ, motivated us to seek the most efficient analysis to estimate CACEs by combining data across multiple endpoints and visits.

The literature demonstrating the benefits of pooling information across multiple endpoints in RCTs has largely focused on perfect compliance. One exception is Jo and Muthén,¹⁰ who considered CACE estimation with multiple correlated endpoints to increase the power to detect intervention effects for cross-sectional data. Mealli and Pacini¹¹ and Mattei et al¹² showed that a secondary outcome can be exploited to sharpen the nonparametric bound and Bayesian inference of the CACE of the primary endpoint in the cross-sectional setting. Yau and Little¹³ and Jo and Muthén¹⁰ developed methods for CACE estimation with longitudinal measurements of a single endpoint subject to noncompliance and attrition, demonstrating the benefits of longitudinal data for CACE estimation, including increased power, better handling of missing data and estimation of growth trends. We extend these prior works to longitudinal RCTs with multiple endpoints and treatment noncompliance.

Our MCACE model consists of a sub-model for the unobserved compliance types and a hierarchical random-effects potential outcome sub-model for longitudinal measurements of multiple endpoints within each compliance type. Unlike univariate CACE (UCACE) analysis that analyzes each endpoint separately, under MCACE each subject has a single estimate of class membership of compliance type which permits more accurate estimation of the unobserved subject-specific compliance types. By combining data from all endpoints and longitudinal trajectories, MCACE model maximizes the information used to estimate CACE. A global likelihood ratio test is used to test the null hypothesis of no treatment effect on all endpoints. We compared MCACE analysis with UCACE analysis in simulation studies. Results show a significant increase in the estimation efficiency with the MCACE model, including up to 50% reduction in standard errors of CACE estimates and a 1-fold increase in the power to detect CACE. Finally, we apply the proposed MCACE model to the AHJ data. With MCACE model, we detect a significant overall treatment effect using the global likelihood ratio test and identify statistically significant CACEs on two out of six endpoints. In contrast, the less powerful UCACE analysis cannot detect any significant treatment effects.

Next, in Section 2, we describe the proposed model and its estimation and inference. Section 3 describes a simulation study that compares the MCACE method with the UCACE method in terms of the point estimate, the coverage rate and width of confidence intervals, and the power of hypothesis testing. Finally, we apply the proposed MCACE model and UCACE model to the AHJ data in Section 4, followed by a discussion in Section 5.

2 |. NOTATION AND MODEL

Let A_i indicate the i^th subject’s treatment assignment, where i = 1, ⋯, N. A_i = 1 (or 0) if subject i is assigned to the treatment (or placebo). Let D_i be the indicator of the receipt of the treatment. D_i equals 1 (or 0) if subject i receives the treatment (or placebo). Let A and D be N-dimensional vectors with the i^th elements equal to A_i and D_i. We consider K endpoints measured over time on each of N participants. For CACE analysis, we define two types of potential outcomes, the secondary potential outcome D_i(A) and the primary potential outcome Y_ijk(A, D(A)). D_i(A) is the potential treatment received by subject i when subjects are randomized to A. Y_ijk(A, D(A)) is the potential outcome value for the k^th outcome at occasion j for subject i under treatment assignment A and treatment received D(A), where i = 1, ⋯, N, j = 0, ⋯, J and k = 1, ⋯, K. Let Y_i(A, D) be the vector of K * (J + 1) potential outcomes for subject i given A and D(A). Let Y (A, D) denote the vector of potential outcomes collecting all the Y_ijk(A, D(A)) over i, j, k.

2.1 |. Assumptions for complier average causal effect analysis

Our analysis makes two assumptions which are often invoked for causal inference in RCTs. The stable unit treatment value assumption (SUTVA^14,15,16) requires no interference between subjects and no multiple versions of treatments. In the AHJ study, SUTVA is plausible because the online tool was the same for all participants, who accessed and used it independently with minimal expected interference. SUTVA allows us to write Y_i(A, D) and D_i(A) as Y_i(A_i, D_i) and D_i(A_i). The second assumption is random assignment, which assumes that given observed baseline variables, A_i is independent of potential outcomes Y_i(A, D) and D_i(A). This assumption is satisfied in RCTs since treatments were randomly assigned to study participants.

In our analysis of CACEs in the AHJ data, we make two additional assumptions. The third assumption is no access to the treatment in the control group. This assumption holds in many placebo-controlled trials, including the AHJ study in which participants in the control group would not have access to the online tool during the 6 months after randomization. Following Imbens and Rubin,¹⁷ we denote C_i, the compliance behavior of participant i, as:

C_{i} = \begin{cases} c \text{ (complier)}, & \text{if } D_{i} (a) = a \text{ for } a = 0, 1, \\ n \text{ (never-taker)}, & \text{if } D_{i} (a) = 0 \text{ for } a = 0, 1, \\ a \text{ (always-taker)}, & \text{if } D_{i} (a) = 1 \text{ for } a = 0, 1, \\ d \text{ (defier)}, & \text{if } D_{i} (a) = 1 - a \text{ for } a = 0, 1. \end{cases}

When it holds, the third assumption excludes defiers and always-takers. As a result, the compliance status (complier vs. never-taker) is known for participants in the treatment group but remains unknown for those in the control group. The fourth assumption is the exclusion restriction.^7,8 Under this assumption, Y (A, D) = Y (A′, D) ∀A, A′ and ∀D, and thus Y (A, D) can be written as Y (D). In the AHJ study, this implies never-takers’ outcomes were the same regardless of treatment assignment. Because compliance behavior C_i is based on the potential outcomes of D_i(A_i), C_i is unaffected by the treatment assignment and thus behaves like a baseline variable. One could perform treatment effect evaluation within each compliance stratum if C_i were observed for each participant. However, because C_i is only partially observed, the fourth assumption can be exploited to sharpen the estimation of the compliance-strata-specific treatment effects.¹⁷ Under the above set of assumptions, CACE is shown to be identifiable with likelihood-based inference.^8,17
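The mapping from potential treatment receipt to compliance type, and what the no-access assumption leaves observable, can be sketched in a few lines of Python. This is only an illustration of the definitions above; the function names `compliance_type` and `observed_type` are hypothetical and do not come from the paper.

```python
def compliance_type(d0: int, d1: int) -> str:
    """Map potential treatment receipt (D_i(0), D_i(1)) to a compliance type."""
    types = {
        (0, 1): "c",  # complier: D_i(a) = a
        (0, 0): "n",  # never-taker: D_i(a) = 0
        (1, 1): "a",  # always-taker: D_i(a) = 1
        (1, 0): "d",  # defier: D_i(a) = 1 - a
    }
    return types[(d0, d1)]


def observed_type(a: int, d: int):
    """Compliance type recoverable from observed (A_i, D_i) under no access in control.

    With D_i(0) = 0 for everyone, only compliers and never-takers exist.
    In the treatment arm the observed d equals D_i(1), so the type is known;
    in the control arm the type stays latent and None is returned.
    """
    if a == 1:
        return "c" if d == 1 else "n"
    if d != 0:
        raise ValueError("control participants cannot receive treatment under assumption 3")
    return None


print(observed_type(1, 1), observed_type(1, 0), observed_type(0, 0))
```

The `None` returned for the control arm is the reason the likelihood later treats control participants as a mixture of compliers and never-takers.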

2.2 |. Models for outcomes and compliance

We consider modeling the joint distribution of two types of potential outcomes: the potential values of multiple endpoints after being assigned to control and treatment (Y (D(0)), Y (D(1))) and the potential treatment received (D(0), D(1)), given the randomization A and covariates W. For notational convenience, we denote Y (D(l)) by Y^l, Y_i(D_i(l)) by $Y_{i}^{l}$ and Y_ijk(D_i(l)) by $Y_{i j k}^{l}$ , with l ∈ {0, 1} denoting potential assignment to control and treatment, respectively. Because the potential outcomes (D(0), D(1)) are a one-to-one function of the compliance behavior C, we can equivalently model the joint distribution of (Y⁰, Y¹, C), which is then expressed as the product of the conditional distribution of (Y⁰, Y¹) given the (latent) compliance type C and the distribution of C. When modeling the conditional distribution of (Y⁰, Y¹) given the compliance type C, we employ a hierarchical random-effects potential outcome model for longitudinal measurements of multiple endpoints within each compliance type. This hierarchical model has two levels, with the first level specifying a within-subjects model for potential outcomes given subject- and endpoint-specific parameters $b_{m i k}^{l}$ , and the second level specifying a between-subjects model for $b_{m i k}^{l}$ . The distribution of C is specified based on a logistic regression model. To guide model development, we depict the main structure of the proposed MCACE model in Figure 1, with nodes to be fully defined in the following two subsections. Section 2.2.1 below describes the two-level sub-model for potential outcomes Y^l given partially observed compliance type C. Section 2.2.2 describes the sub-model for the compliance type C given baseline covariates W.

Figure 1: Illustration of the structure of the MCACE model.

2.2.1 |. The sub-model for Y^l | C

Let m denote the unique value of compliance type, m ∈ {c, n}. We assume the following potential outcome model for multiple endpoints within the compliance type m:

Y_{i}^{l} ∣ (C_{i} = m, X_{i}) ~ N (μ_{m i}^{l}, Σ_{m i}^{l})

(1)

where $Y_{i}^{l} = (Y_{i 01}^{l}, \dots, Y_{i 0 K}^{l}, Y_{i 11}^{l}, \dots, Y_{i 1 K}^{l}, \dots, Y_{i J 1}^{l}, \dots, Y_{i J K}^{l})$ , and X_i is the vector of explanatory variables for the potential outcomes. The above model assumes $Y_{i}^{0}$ and $Y_{i}^{1}$ are independent conditional on compliance type, covariates and parameters. This is justified because $Y_{obs, i} = A_{i} Y_{i}^{1} + (1 - A_{i}) Y_{i}^{0}$ and thus $Y_{obs, i} ∣ C_{i}, X_{i} ~ N (A_{i} μ_{m i}^{1} + (1 - A_{i}) μ_{m i}^{0}, A_{i} Σ_{m i}^{1} + (1 - A_{i}) Σ_{m i}^{0})$ , meaning that the likelihood function of the observed data does not depend on the correlation between potential outcomes $Y_{i}^{0}$ and $Y_{i}^{1}$ . Therefore, the correlation between potential outcomes $Y_{i}^{0}$ and $Y_{i}^{1}$ becomes unimportant under likelihood-based methods (Page 181 in Chapter 8 in Imbens and Rubin,¹⁸ Hirano et al¹⁹). Even with this modeling simplification, for multivariate longitudinal data, it is still very challenging to specify a sensible structure for $Σ_{m i}^{l}$ . For a general unstructured covariance matrix, $Σ_{m i}^{l}$ is a (J + 1)K × (J + 1)K covariance matrix with $\binom{(J + 1) K + 1}{2}$ parameters. When K = 6 and J = 2, the number of unique parameters to be estimated in $Σ_{m i}^{l}$ is 171. Such parameter proliferation can be a severe issue because the limited sample sizes, compounded by the low compliance rates in many practical RCTs, lead to a number of compliers too small to afford enough degrees of freedom to estimate a general covariance matrix. To reduce the number of nuisance parameters in $Σ_{m i}^{l}$ , we employ the hierarchical random-effects modeling approach²⁰ that captures the potentially complex variance structure by explicitly modeling individual heterogeneity in longitudinal trajectories of multiple outcomes.

The hierarchical random-effects model is also known as multi-level model. The level-1 part of our multi-level model specifies the following within-subjects model for the potential outcome for the k^th endpoint at occasion j for individual i under treatment assignment l, given the participant i’s compliance type m and random effects $b_{m i k}^{l}$ :

y_{i j k}^{l} ∣ (C_{i} = m, b_{m i k}^{l}, Z_{i j}) = {Z_{i j}}^{T} b_{m i k}^{l} + ϵ_{mijk}^{l} .

(2)

In Eqn 2, Z_ij contains time-varying covariates, such as the time t_ij and higher-order terms of t_ij to capture potentially non-linear time trends in the potential outcomes. Let $ϵ_{m i j}^{l} = {(ϵ_{m i j 1}^{l}, \dots, ϵ_{mijK}^{l})}^{T}$ and $ϵ_{m i j}^{l} \overset{iid}{\sim} N (0, Φ_{m})$ , where $Φ_{m} = diag (σ_{m 1}^{2}, \dots, σ_{m K}^{2})$ . We assume $ϵ_{m i j}^{1}$ is independent of $ϵ_{m i j}^{0}$ . The level-2 model specifies the between-subjects model for the individual-specific parameters $b_{m i k}^{l}$ :

b_{m i k}^{l} = β_{m 0 k} + β_{m 1 k} D_{i} (l) + v_{m i}^{l},

(3)

where β_m0k is the vector containing population average regression coefficients for subjects with compliance type m who are assigned to treatment l and actually receive the control; β_m1k represents the population average changes in these regression coefficients when these subjects actually receive the treatment; $v_{m i}^{l}$ is the deviation of subject i’s coefficients from the population mean. Here we assume that $v_{m i}^{l}$ is a mean-zero Gaussian variable with variance Σ_mv. We also assume that $v_{m i}^{1}$ is independent of $v_{m i}^{0}$ , and that $v_{m i}^{l}$ and $ϵ_{m i}^{l}$ are independent, where $ϵ_{m i}^{l} = {({ϵ_{m i 0}^{l}}^{T}, \dots, {ϵ_{m i J}^{l}}^{T})}^{T}$ .

Combining the above two-level models for all k endpoints at all time points, we can obtain one overall model for the potential outcomes for individual i with compliance type m as

Y_{i}^{l} ∣ (C_{i} = m, v_{m i}, X_{i}) = (X_{i, l} \otimes I_{K}) β_{m} + (Z_{i} v_{m i}^{l}) \otimes 1_{K} + ϵ_{m i}^{l},

where $Y_{i}^{l} = \{Y_{i j k}^{l} : j = 0, \dots, J; k = 1, \dots, K\}$ , β_m = {β_mpqk : p = 0, ⋯, P; q = 0, ⋯, Q; k = 1, ⋯, K}, where P and Q depend on the forms of Eqns 2 and 3: (P + 1) equals the dimension of random effects in the level-1 model; Q equals the number of predictors in the level-2 model and could be greater than 1 if more predictors are included in the level-2 model. X_i,l and Z_i are design matrices for fixed effects and random effects, respectively. X_i,l is a (J + 1) by R matrix, where R equals the number of fixed-effects coefficients in Eqn 3. Z_i is a (J + 1) by H matrix, where H equals the dimension of random effects in Eqn 3.

By combining random effects and residual error terms, we obtain the marginal distribution for ${Y_{i}^{l} ∣ C_{i} = m}$ in Eqn 1 as

y_{i}^{l} ∣ C_{i} = m, X_{i} ~ MVN_{β_{m}, ψ_{m}} (μ_{m i}^{l}, Σ_{m i}),

(4)

where $μ_{m i}^{l} = (X_{i, l} \otimes I_{K}) β_{m}$ , $Σ_{m i} = (Z_{i} Σ_{m v} Z_{i}^{T}) \otimes (1_{K} 1_{K}^{T}) + V_{m}$ , $V_{m} = \mathrm{var} (ϵ_{m i}^{l}) = diag (Φ_{m 0}, Φ_{m 1}, \dots, Φ_{m J})$ , $Φ_{m 0} = Φ_{m 1} = \dots = Φ_{m J} = Φ_{m}$ , $ψ_{m} = {({σ_{m v}}^{T}, σ_{m 1}^{2}, \dots, σ_{m K}^{2})}^{T}$ , and σ_mv is the vector of unique parameters in Σ_mv, the variance-covariance matrix of random effects $v_{m i}^{l}$ .

An illustrative example

For illustration purposes, consider the following level-1 model with a quadratic function of time since baseline:

y_{i j k}^{l} ∣ (C_{i} = m, b_{m i k}^{l}, Z_{i j}) = b_{m 0 i k}^{l} + b_{m 1 i k}^{l} t_{i j} + b_{m 2 i k}^{l} t_{i j}^{2} + ϵ_{mijk}^{l},

(5)

with the following level-2 (between-subjects) models:

b_{m 0 i k}^{l} = β_{m 00 k} + β_{m 01 k} D_{i} (l) + v_{m 0 i}^{l}, \\
b_{m 1 i k}^{l} = β_{m 10 k} + β_{m 11 k} D_{i} (l) + v_{m 1 i}^{l}, \\
b_{m 2 i k}^{l} = β_{m 20 k} + β_{m 21 k} D_{i} (l) + v_{m 2 i}^{l},

(6)

where $v_{m i}^{l} = {(v_{m 0 i}^{l}, v_{m 1 i}^{l}, v_{m 2 i}^{l})}^{T} \overset{iid}{\sim} N (0, Σ_{m v})$ , $Σ_{m v} = \begin{bmatrix} σ_{v_{m 0}}^{2} & σ_{v_{m 0} v_{m 1}} & σ_{v_{m 0} v_{m 2}} \\ σ_{v_{m 0} v_{m 1}} & σ_{v_{m 1}}^{2} & σ_{v_{m 1} v_{m 2}} \\ σ_{v_{m 0} v_{m 2}} & σ_{v_{m 1} v_{m 2}} & σ_{v_{m 2}}^{2} \end{bmatrix}$ .

The matrix form of the model for the k^th outcome of individual i is

y_{i k}^{l} ∣ (C_{i} = m) = X_{i, l} * β_{m k} + Z_{i} * v_{m i}^{l} + ϵ_{m i k}^{l},

(7)

where $y_{i k}^{l} = {(y_{i 0 k}^{l}, y_{i 1 k}^{l}, \dots, y_{i J k}^{l})}^{T}$ , β_mk = (β_m00k, β_m10k, β_m20k, β_m01k, β_m11k, β_m21k)^T, $v_{m i}^{l} = {(v_{m 0 i}^{l}, v_{m 1 i}^{l}, v_{m 2 i}^{l})}^{T}$ , $ϵ_{m i k}^{l} = {(ϵ_{m i 0 k}^{l}, ϵ_{m i 1 k}^{l}, \dots, ϵ_{miJk}^{l})}^{T}$ . Specifically,

X_{i, l} = (\begin{matrix} 1 & 0 & 0 & D_{i} (l) & 0 & 0 \\ 1 & t_{i 1} & t_{i 1}^{2} & D_{i} (l) & t_{i 1} * D_{i} (l) & t_{i 1}^{2} * D_{i} (l) \\ ⋮ & ⋮ & ⋮ & ⋮ & ⋮ & ⋮ \\ 1 & t_{i J} & t_{i J}^{2} & D_{i} (l) & t_{i J} * D_{i} (l) & t_{i J}^{2} * D_{i} (l) \end{matrix}), Z_{i} = Z = (\begin{matrix} 1 & 0 & 0 \\ 1 & t_{i 1} & t_{i 1}^{2} \\ ⋮ & ⋮ & ⋮ \\ 1 & t_{i J} & t_{i J}^{2} \end{matrix}) .

When the errors $ϵ_{mijk}^{l}$ are independent of each other over i, j, k, the correlations among endpoints at the same or different times are induced by the shared random effects $v_{m i}^{l}$ . Correlations among longitudinal repeated measurements for the same endpoint are induced by the random effects $v_{m i}^{l}$ as shown in Eqn 7. In Eqn 6, $b_{m i k}^{l}$ for different endpoints contains the same $v_{m i}^{l}$ , which induces the correlations among different endpoints. The correlations for all endpoints at all time points within the same subject induced by the random effects $v_{m i}^{l}$ could also be seen by examining the specific form of Σ_mi in Eqn 4, in which $Σ_{m i} = (Z_{i} Σ_{m v} Z_{i}^{T}) \otimes (1_{K} 1_{K}^{T}) + V_{m}$ . A combination of flexible specifications of design matrix Z_i and variance-covariance matrix Σ_mv for random-effects $v_{m i}^{l}$ and V_m for residuals can generate complex forms of Σ_mi. For example, variance-covariance matrix and correlation matrix of repeated measures are allowed to differ across endpoints in Σ_mi.
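To make the covariance structure concrete, the following NumPy sketch assembles Σ_mi = (Z Σ_mv Z^T) ⊗ (1_K 1_K^T) + V_m for the quadratic-time example. The numeric values of `sigma_v` and `resid_var` are made up for illustration and are not estimates from the paper; the function name `marginal_cov` is likewise hypothetical.

```python
import numpy as np


def marginal_cov(times, sigma_v, resid_var):
    """Build Sigma_mi = (Z Sigma_mv Z^T) kron (1_K 1_K^T) + V_m.

    Rows and columns are ordered by visit first, then endpoint, matching
    the ordering of Y_i (Y_i01, ..., Y_i0K, Y_i11, ..., Y_iJK).
    """
    times = np.asarray(times, dtype=float)
    Z = np.column_stack([np.ones_like(times), times, times**2])  # (J+1) x 3
    K = len(resid_var)
    # Shared random effects induce covariance across endpoints and visits.
    between = np.kron(Z @ sigma_v @ Z.T, np.ones((K, K)))
    # V_m = diag(Phi_m, ..., Phi_m): endpoint-specific residual variances.
    within = np.kron(np.eye(len(times)), np.diag(resid_var))
    return between + within


# Hypothetical values: J = 2 (three visits), K = 6 endpoints.
sigma_v = np.diag([1.0, 0.2, 0.05])
resid_var = np.linspace(0.5, 1.0, 6)
Sigma = marginal_cov([0.0, 1.0, 2.0], sigma_v, resid_var)
print(Sigma.shape)
```

At baseline (t = 0) only the random intercept contributes, so the covariance between any two endpoints at that visit equals the intercept variance, while each diagonal entry adds the endpoint's residual variance.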

Based on the above X_i,l and Z_i, we can obtain the marginal distribution using Eqn 4. Compared with specifying a general variance matrix for Σ_mi, this model reduces the number of nuisance parameters from $\binom{(J + 1) K + 1}{2}$ down to K + 6. If J = 2 and K = 6, the number of nuisance covariance parameters drops from 171 to 12.
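The two parameter counts can be checked directly. The helper names below are hypothetical; the second count assumes, as in the illustrative example, three random effects shared across endpoints plus one residual variance per endpoint.

```python
from math import comb


def n_unstructured(J: int, K: int) -> int:
    """Unique entries of a symmetric (J+1)K x (J+1)K covariance matrix."""
    dim = (J + 1) * K
    return comb(dim + 1, 2)


def n_random_effects(K: int, n_random: int = 3) -> int:
    """Unique entries of Sigma_mv plus K residual variances."""
    return comb(n_random + 1, 2) + K


print(n_unstructured(J=2, K=6), n_random_effects(K=6))
```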

Principal causal effects

Principal causal effects (PCE) are defined as the ITT effects of the treatment within subpopulations defined by compliance behavior. In many RCTs including the AHJ study, subjects were followed up at equal time intervals (t_ij = t_j). Our interest is the visit-specific PCEs for compliers, which are:

ITT_{c, j k} = E (y_{i j k}^{1} - y_{i j k}^{0} ∣ C_{i} = c)

(8)

For the above illustrative example with the level-1 and level-2 models specified in Eqns 5 and 6, respectively, we have $y_{i j k}^{l} ∣ (C_{i} = m) = β_{m 00 k} + β_{m 10 k} t_{j} + β_{m 20 k} t_{j}^{2} + β_{m 01 k} D_{i} (l) + β_{m 11 k} D_{i} (l) t_{j} + β_{m 21 k} D_{i} (l) t_{j}^{2} + v_{m 0 i}^{l} + v_{m 1 i}^{l} t_{j} + v_{m 2 i}^{l} t_{j}^{2} + ϵ_{mijk}^{l}$ . For compliers and never-takers, the mean responses for the k^th outcome at visit j are:

E (y_{i j k}^{1} ∣ C_{i} = c) = β_{c 00 k} + β_{c 10 k} t_{j} + β_{c 20 k} t_{j}^{2} + β_{c 01 k} + β_{c 11 k} t_{j} + β_{c 21 k} t_{j}^{2},

E (y_{i j k}^{0} ∣ C_{i} = c) = β_{c 00 k} + β_{c 10 k} t_{j} + β_{c 20 k} t_{j}^{2},

E (y_{i j k}^{1} ∣ C_{i} = n) = E (y_{i j k}^{0} ∣ C_{i} = n) = β_{n 00 k} + β_{n 10 k} t_{j} + β_{n 20 k} t_{j}^{2} .

Therefore, $ITT_{c, j k} = E (y_{i j k}^{1} ∣ C_{i} = c) - E (y_{i j k}^{0} ∣ C_{i} = c) = β_{c 01 k} + β_{c 11 k} t_{j} + β_{c 21 k} t_{j}^{2}$ and $ITT_{n, j k} = E (y_{i j k}^{1} ∣ C_{i} = n) - E (y_{i j k}^{0} ∣ C_{i} = n) = 0$ . ITT_c,jk and ITT_n,jk are the effects of treatment assignment for compliers and never-takers, respectively. For compliers, the treatment received is the same as the treatment assigned. Thus, ITT_c,jk is also the complier average causal effect of the treatment received. On the other hand, ITT_n,jk compares potential outcomes which result from actually receiving the control regardless of treatment assignment. Thus ITT_n,jk always equals zero under the exclusion restriction assumption. Under the randomization assumption, there should be no difference at baseline for compliers between the two groups. Thus, we expect β_c01k to be zero, and β_c11k and β_c21k jointly determine the CACE for the k^th outcome. The CACE is null for the k^th outcome if both β_c11k and β_c21k equal zero.
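As a small worked example of the visit-specific CACE formula, the sketch below evaluates β_c01k + β_c11k t_j + β_c21k t_j² at three visits. The coefficient values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def cace(t: float, beta_c01k: float, beta_c11k: float, beta_c21k: float) -> float:
    """Visit-specific complier average causal effect for endpoint k at time t."""
    return beta_c01k + beta_c11k * t + beta_c21k * t**2


# Hypothetical coefficients: no baseline difference, positive linear effect,
# slight negative curvature.
effects = [cace(t, 0.0, 1.5, -0.25) for t in (0.0, 1.0, 2.0)]
print(effects)  # [0.0, 1.25, 2.0]
```

With β_c01k = 0 the effect at baseline is zero by construction, which matches what randomization implies.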

2.2.2 |. Model for compliance status C

Given the baseline covariates W, we can model the probability of being a complier using a logistic regression model as

p_{c i} = \Pr (C_{i} = c ∣ W_{i} = w_{i}, α) = \frac{\exp (w_{i}^{'} α)}{1 + \exp (w_{i}^{'} α)} .

(9)

As noted before, compliance status (complier or never-taker) is observed for participants assigned to the treatment group, but is unobserved for those in the control group. Therefore, the compliance model in Eqn 9 cannot be estimated directly on the entire sample.
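Evaluating Eqn 9 for a given covariate vector is straightforward; a minimal sketch follows. The function name `complier_prob` is hypothetical, and the branch on the sign of the linear predictor is only a numerical-stability choice, not part of the model.

```python
import math


def complier_prob(w, alpha) -> float:
    """P(C_i = c | W_i = w) under the logistic model of Eqn 9."""
    eta = sum(wi * ai for wi, ai in zip(w, alpha))
    # Evaluate the logistic function without overflowing exp().
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


# Intercept-only model: alpha_0 = logit(0.3) gives a 30% complier probability.
p_c = complier_prob([1.0], [math.log(0.3 / 0.7)])
print(round(p_c, 3))
```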

Finally, when K = 1, the multivariate CACE (MCACE) model proposed above reduces to the univariate CACE (UCACE) model, which is akin to Yau and Little.¹³ One can apply UCACE to analyze each endpoint separately. Unlike MCACE, the UCACE method ignores and does not pool the information from multiple correlated endpoints to sharpen the estimation of compliance status and CACE.

2.3 |. Estimation and Inference

Let $Y_{obs, i} = a_{i} Y_{i}^{1} + (1 - a_{i}) Y_{i}^{0}$ and d_i = D_obs,i = D_i(a_i). We denote Y_obs as the vector of observed outcomes collecting all the Y_obs,i over i, and D_obs as an N × 1 vector with the i^th element equal to d_i. In our case, there are three combinations for (a_i, d_i): (1,1), (1,0) and (0,0). We use $S (1, 1)$ , $S (1, 0)$ and $S (0, 0)$ to indicate the subsets of units exhibiting each pattern of (a_i, d_i). In our case, where the population only includes compliers and never-takers, $S (1, 1)$ and $S (1, 0)$ include the compliers and never-takers, respectively, in the treatment group, and $S (0, 0)$ represents a mixture of compliers and never-takers in the control group. Let π = (β_c, β_n, ψ_c, ψ_n, α) denote the vector collecting all model parameters and X denote the vector collecting all X_i over i. The likelihood function based on observed data for all participants in the study is

L (π; Y_{obs}, D_{obs}, A_{obs} ∣ W, X) = L_{11} \times L_{10} \times L_{00},

where

L_{11} = \prod_{i \in S (1, 1)} p_{c i} \frac{1}{{(2 π)}^{\frac{(J + 1) K}{2}} {| Σ_{c} |}^{\frac{1}{2}}} \exp \{- \frac{1}{2} {(y_{obs, i} - μ_{c i}^{1})}^{T} Σ_{c}^{- 1} (y_{obs, i} - μ_{c i}^{1}) \},

L_{10} = \prod_{i \in S (1, 0)} (1 - p_{c i}) \frac{1}{{(2 π)}^{\frac{(J + 1) K}{2}} {| Σ_{n} |}^{\frac{1}{2}}} \exp \{- \frac{1}{2} {(y_{obs, i} - μ_{n i}^{1})}^{T} Σ_{n}^{- 1} (y_{obs, i} - μ_{n i}^{1}) \},

L_{00} = \prod_{i \in S (0, 0)} [ p_{c i} \frac{1}{{(2 π)}^{\frac{(J + 1) K}{2}} {| Σ_{c} |}^{\frac{1}{2}}} \exp \{- \frac{1}{2} {(y_{obs, i} - μ_{c i}^{0})}^{T} Σ_{c}^{- 1} (y_{obs, i} - μ_{c i}^{0}) \} + (1 - p_{c i}) \frac{1}{{(2 π)}^{\frac{(J + 1) K}{2}} {| Σ_{n} |}^{\frac{1}{2}}} \exp \{- \frac{1}{2} {(y_{obs, i} - μ_{n i}^{0})}^{T} Σ_{n}^{- 1} (y_{obs, i} - μ_{n i}^{0}) \} ] .
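The three likelihood factors can be sketched as a per-participant log-likelihood in NumPy. This is an illustrative sketch rather than the authors' implementation: the function names are hypothetical, the mean vectors and covariance matrices are taken as given inputs, and a single never-taker mean `mu_n` is used because D_i(l) = 0 for never-takers makes μ_ni^1 and μ_ni^0 coincide under the model.

```python
import numpy as np


def mvn_logpdf(y, mu, cov):
    """Log-density of a multivariate normal evaluated at y."""
    diff = y - mu
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (len(y) * np.log(2 * np.pi) + logdet + quad)


def loglik_i(y, a, d, p_c, mu_c1, mu_c0, mu_n, cov_c, cov_n):
    """Observed-data log-likelihood contribution of one participant."""
    if a == 1 and d == 1:   # S(1,1): observed complier in the treatment arm
        return np.log(p_c) + mvn_logpdf(y, mu_c1, cov_c)
    if a == 1 and d == 0:   # S(1,0): observed never-taker in the treatment arm
        return np.log1p(-p_c) + mvn_logpdf(y, mu_n, cov_n)
    # S(0,0): control arm, a mixture over the latent compliance type
    log_c = np.log(p_c) + mvn_logpdf(y, mu_c0, cov_c)
    log_n = np.log1p(-p_c) + mvn_logpdf(y, mu_n, cov_n)
    return np.logaddexp(log_c, log_n)


# Toy check: identical component densities make the mixture weight irrelevant.
y = np.zeros(2)
mu = np.zeros(2)
cov = np.eye(2)
ll_control = loglik_i(y, 0, 0, 0.3, mu, mu, mu, cov, cov)
print(ll_control)
```

Summing `loglik_i` over participants gives the observed-data log-likelihood; working on the log scale with `np.logaddexp` avoids underflow in the mixture term when (J + 1)K is large.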

The maximum likelihood estimates (MLEs) of model parameters can be obtained by maximizing the observed-data log-likelihood function via a quasi-Newton algorithm for function optimization. The starting values are chosen based on the results from univariate ITT analyses conducted for the six outcomes separately. Different starting values were tried and the algorithm converged to the same result. The variance of the estimates can be obtained via the inverse of the negative Hessian matrix of the log-likelihood function evaluated at the MLEs. The estimates of PCEs in Eqn 8 can be obtained by plugging in the MLEs of the model parameters, with the standard errors of the PCE estimates obtained using the delta method.
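For the quadratic-time example, the PCE at time t is a linear combination of (β_c01k, β_c11k, β_c21k) with gradient (1, t, t²), so the delta-method standard error reduces to a quadratic form in the estimated covariance matrix. The sketch below illustrates this; the estimates and covariance matrix are made-up numbers, and `pce_with_se` is a hypothetical helper name.

```python
import numpy as np


def pce_with_se(t, beta_hat, cov_hat):
    """Plug-in PCE estimate at time t and its delta-method standard error.

    beta_hat: estimates of (beta_c01k, beta_c11k, beta_c21k)
    cov_hat:  their 3 x 3 estimated covariance (from the inverse negative Hessian)
    """
    g = np.array([1.0, t, t**2])        # gradient of the PCE w.r.t. beta
    est = g @ beta_hat
    se = np.sqrt(g @ cov_hat @ g)
    return est, se


# Hypothetical inputs for illustration only.
beta_hat = np.array([0.0, 1.5, -0.25])
cov_hat = np.diag([0.04, 0.09, 0.01])
est, se = pce_with_se(2.0, beta_hat, cov_hat)
print(est, se)
```

Because the PCE is linear in the coefficients here, the delta method is exact rather than a first-order approximation.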

3 |. SIMULATION STUDY

We compare the MCACE model proposed above with the alternative approach of fitting separate univariate CACE (UCACE) models in their performance of treatment effect estimation and inference with longitudinal observations of multiple study endpoints. Our comparison evaluates the consistency and variability of CACE estimates, width and coverage rate of confidence intervals as well as the power of hypothesis testing for CACE.

3.1 |. Description of data generation

We simulated data from the MCACE model as specified in Eqns 5, 6 and 9 with six endpoints (K = 6) at three time points (J = 2) for n individuals, where n = 100, 200 or 500. To simplify the simulation setting, the model is set as a random-intercept and fixed-slope model, which means $b_{m 2 i k}^{l} = 0$ in Eqn 5, v_c1i = 0 in Eqn 6 and the level-2 model only includes the first two rows in Eqn 6. We further set β_c01k = 0 (i.e., no baseline difference between the two randomized arms). The compliance status model (Eqn 9) includes only an intercept, i.e., p_ci = p_c.

Under the above linear trend model, β_c11k informs complier average causal effect on the k^th endpoint, which is of primary interest. Recall β_c11 = (β_c111, ⋯, β_c116)^T. For ease in result presentation, we set β_c11 as a vector of the same value in the simulation. When evaluating the consistency and efficiency of the MLEs and confidence intervals of β_c11, we choose the common value as 0, 1.5 and 3. However, the treatment effect size for each endpoint also depends on the variance of the endpoint, thus differs across endpoints because variances vary over endpoints (Web Appendix A.1 in the supporting information). True values of other parameters in the multi-level MCACE model are informed by the AHJ data and can be found in Web Appendix A.1. Last, we simulated compliance status from a Bernoulli distribution with p_c = 0.3. A detailed description of the data generating process can be found in Web Appendix A.1.
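A data-generating process of this shape can be sketched as follows. All numeric parameter values except p_c = 0.3, K = 6 and the three visits are hypothetical placeholders (the true values used in the paper are in Web Appendix A.1, which is not reproduced here), and the 1:1 coin-flip randomization is also an assumption.

```python
import numpy as np


def simulate(n, effect, p_c=0.3, K=6, times=(0.0, 1.0, 2.0), seed=0):
    """Simulate a random-intercept, fixed-slope version of the MCACE model."""
    rng = np.random.default_rng(seed)
    t = np.asarray(times)
    a = rng.integers(0, 2, size=n)            # treatment assignment (assumed 1:1)
    complier = rng.random(n) < p_c            # latent compliance type
    d = a * complier                          # only assigned compliers receive treatment
    # Hypothetical values for intercepts, slope and variances.
    intercept = np.where(complier, 50.0, 45.0)[:, None, None]   # beta_m00k
    slope = 0.5                                                  # beta_m10k
    v0 = rng.normal(0.0, 3.0, size=(n, 1, 1))        # random intercept shared by endpoints
    eps = rng.normal(0.0, 2.0, size=(n, len(t), K))  # independent residual errors
    # beta_c01k = 0, so treatment changes only the slope (beta_c11k = effect).
    trend = (slope + effect * d[:, None, None]) * t[None, :, None]
    y = intercept + trend + v0 + eps
    return a, d, y, complier


a, d, y, complier = simulate(n=200, effect=1.5)
print(y.shape)  # (200, 3, 6)
```

In an analysis, `complier` would be hidden for participants with `a == 0`, which is what makes the control arm a mixture.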

3.2 |. Simulation results

Because the complier average causal effect is of primary interest, we focus on the results for the CACE effects captured by {β_c11k}.

3.2.1 |. Point estimate

Figure 2 shows the sample means, sample standard deviations of estimates and asymptotic standard error estimates for β_c11k’s from fitting the MCACE model (red lines) and multiple UCACE models (green lines) under different sample sizes (listed as column headings) and different effect sizes (listed as row headings). The dashed lines indicate the true values in different settings. These results are obtained based on 500 repetitions. Web Table 2 in the supporting information presents the same result in a tabular format.

Figure 2: Means ± standard deviations (wide bar) of MCACE and UCACE estimates of β_c11k as well as ± mean of standard error estimates (narrow bar) computed from the Fisher information matrix of the MCACE model and UCACE model over 500 replications.

For both MCACE and multiple UCACE models, the sample means of estimates are close to the corresponding true values. This supports the consistency of the MLEs. Figure 2 also shows that the sample standard deviations of the estimates from the MCACE model are almost half of those yielded by multiple UCACE models. Therefore, we conclude that the MCACE model significantly improves the estimation efficiency compared with UCACE analyses. Furthermore, the means of standard error estimates (red narrow bars) produced by Fisher information are almost identical to the true standard deviations (red wide bars) of the MCACE estimates. When performing the UCACE analysis of each endpoint separately, most means of standard error estimates (green narrow bars) produced by Fisher information are noticeably smaller than their true values (green wide bars) when the sample size equals 100. However, as the sample size increases, the means of standard error estimates produced by Fisher information become closer to their true values. This is expected because the standard error estimator produced by Fisher information approximates the true standard error well when the sample size is large enough, and may perform poorly when the sample size is small. We do not observe inaccurate estimation of standard errors from the MCACE model because it analyzes six outcomes simultaneously and thus obtains estimates based on more data.

3.2.2 |. Confidence interval

In the MCACE model, under appropriate regularity conditions, the MLE of β_c11 is asymptotically normal: as n → ∞, $\sqrt{n} ({\hat{β}}_{c 11} - β_{c 11}) \overset{d}{\to} M V N (0, {[I (β_{c 11})]}^{- 1})$ , where I(β_c11) is the Fisher information. Based on the asymptotic normality, we calculate simultaneous confidence intervals and use the Bonferroni correction to ensure that the probability that all confidence intervals contain their true values is no less than 1 − α. For β_c11k, the confidence interval is $t_{k}^{'} {\hat{β}}_{c 11} \pm c \sqrt{t_{k}^{'} ({[I_{n} ({\hat{β}}_{c 11})]}^{- 1}) t_{k}}$ , where t_k is a six-dimensional column vector with the k^th element being 1 and all other elements being 0, and $I_{n} ({\hat{β}}_{c 11})$ is the Fisher information for n independent units, estimated by the negative Hessian matrix of the log-likelihood for the sample of n units. When conducting UCACE analysis, the confidence intervals are constructed as ${\hat{β}}_{c 11 k} \pm c \sqrt{1 / I_{n} ({\hat{β}}_{c 11 k})}$ , where ${\hat{β}}_{c 11 k}$ and $I_{n} ({\hat{β}}_{c 11 k})$ are obtained from UCACE analysis on the k^th endpoint only and are generally different from those calculated from MCACE. For both the MCACE model and the UCACE model, c is the critical value and is set as Z_{1−α/(2K)}, where K = 6 is the number of endpoints, based on the Bonferroni correction.
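The interval construction above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: the function name `bonferroni_wald_intervals`, the three-endpoint estimates and the covariance matrix are all hypothetical, and the covariance matrix stands in for the inverse Fisher information of the n units.

```python
from math import sqrt
from statistics import NormalDist


def bonferroni_wald_intervals(estimates, cov, alpha=0.05):
    """Simultaneous Wald intervals for each element of `estimates`.

    `cov` is the estimated covariance matrix of the estimates (standing in
    for the inverse Fisher information), given as a list of rows.
    """
    n_endpoints = len(estimates)
    # Two-sided critical value with the error rate split over K intervals.
    c = NormalDist().inv_cdf(1 - alpha / (2 * n_endpoints))
    intervals = []
    for k, est in enumerate(estimates):
        # t_k' Cov t_k simply selects the k-th diagonal entry of Cov.
        half_width = c * sqrt(cov[k][k])
        intervals.append((est - half_width, est + half_width))
    return c, intervals


# Hypothetical inputs (not from the paper): three estimates and a covariance.
est = [5.0, 4.0, 6.0]
cov = [[4.0, 1.0, 0.5],
       [1.0, 9.0, 1.5],
       [0.5, 1.5, 1.0]]
crit, cis = bonferroni_wald_intervals(est, cov)
for k, (lo, hi) in enumerate(cis, start=1):
    print(f"endpoint {k}: [{lo:.2f}, {hi:.2f}] (critical value {crit:.3f})")
```

With six endpoints and α = 0.05, the same function uses the critical value Z_{1−0.05/12}, which is roughly 2.64 rather than the unadjusted 1.96.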

Figure 3.a shows the distribution of the length of 95% confidence intervals based on 500 simulated datasets. For both MCACE model and UCACE models, the average length of the 95% confidence intervals decreases and the distributions of the length become more concentrated as sample size increases. With the same sample size, the confidence intervals from MCACE model are shorter than those from UCACE models.

Plot 3.a describes the distribution of the length of 95% confidence intervals; plot 3.b shows coverage rate of 95% confidence intervals, the dashed line corresponds to the value of 0.95. The results from both figures are based on 500 repetitions.

Figure 3.b plots the coverage rates of confidence intervals from the MCACE model and multiple UCACE models. The coverage rate is calculated as the proportion of times that all six confidence intervals include their corresponding true values simultaneously. We observe that the coverage rates calculated from the MCACE model are higher than those from the multiple UCACE models and are closer to the nominal 95% rate. However, as the sample size increases, the coverage rates from the UCACE models improve and get closer to the 95% nominal rate. As with the point estimates, the improvement in UCACE performance as the sample size increases can be explained by the asymptotic property of the MLE, which requires large sample sizes for MLEs to perform well. The MCACE model makes inference by analyzing six outcomes at one time, which makes use of more information from a larger number of observations. Thus the coverage rates from MCACE are closer to the nominal 95% rate, and MCACE performs well with as few as 100 subjects.
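The simultaneous coverage rate described above counts a replication as a success only when every interval covers its true value. A minimal sketch, assuming the intervals from each replication are already available; the function name and the toy intervals are hypothetical:

```python
def simultaneous_coverage(intervals_by_rep, true_values):
    """Proportion of replications in which every interval covers its truth."""
    hits = 0
    for intervals in intervals_by_rep:
        covered = all(
            lo <= truth <= hi
            for (lo, hi), truth in zip(intervals, true_values)
        )
        if covered:
            hits += 1
    return hits / len(intervals_by_rep)


# Toy input (hypothetical): 4 replications, 2 endpoints, true values (5, 5).
reps = [
    [(3.0, 7.0), (4.0, 8.0)],    # both intervals cover
    [(5.5, 9.0), (4.0, 8.0)],    # first interval misses
    [(2.0, 6.0), (1.0, 4.5)],    # second interval misses
    [(4.9, 5.1), (0.0, 10.0)],   # both intervals cover
]
rate = simultaneous_coverage(reps, [5.0, 5.0])
print(rate)
```

Because a single miss on any endpoint fails the whole replication, this joint rate is never larger than the smallest per-endpoint coverage rate.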

3.2.3 |. Statistical power

We also compare the statistical power of the MCACE model with that of multiple UCACE models. Power is the probability of rejecting the null hypothesis when the null hypothesis is false. When conducting power analysis for MCACE model, we consider the global null hypothesis as H₀ : β_c11k = 0 for all k, and calculate the proportion of times of rejecting the null hypothesis among 500 simulated data sets. The likelihood ratio statistic is

$λ = - 2 ({l_{reduced} |}_{{\hat{π}}_{r}} - {l_{full} |}_{{\hat{π}}_{f}})$ ,

where l is the log-likelihood, and ${\hat{π}}_{r}$ and ${\hat{π}}_{f}$ are the MLEs of model parameters obtained from the reduced model and the full model, respectively. The full model consists of all parameters and the reduced model sets β_c11k = 0 for all k. Under H₀, the test statistic λ asymptotically follows a chi-square distribution with K degrees of freedom. Because the UCACE model analyzes each endpoint separately, UCACE analysis does not offer a single global test for H₀ : β_c11k = 0 for all k, as MCACE analysis does. Thus, when conducting the power analysis of multiple UCACE models, we conduct a likelihood ratio test of zero complier average causal effect for each endpoint separately, with Bonferroni's adjustment for multiple tests to control the overall Type I error rate at 0.05. That is, if any of these K hypotheses is rejected at the significance level 0.05/6 ≈ 0.008, we conclude that the CACE is present for at least one endpoint. Because of the high efficiency of MCACE estimation and the conservativeness of Bonferroni's adjustment, we expect the Bonferroni adjustment for multiple testing employed in UCACE analysis to have substantially inflated Type II error rates (i.e., low power), as compared with the global likelihood ratio test available in MCACE analysis.
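The two decision rules compared here can be sketched as follows. This is an illustration only: the log-likelihood values are made up, the function names are hypothetical, and the chi-square tail probability uses a closed form that holds only for even degrees of freedom (K = 6 in this setting) so that the sketch needs nothing beyond the standard library.

```python
from math import exp, factorial


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable; even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    return exp(-half) * sum(half ** j / factorial(j) for j in range(df // 2))


def global_lrt(loglik_reduced, loglik_full, n_endpoints=6):
    """Likelihood ratio statistic and its asymptotic p-value with K df."""
    lam = -2.0 * (loglik_reduced - loglik_full)
    return lam, chi2_sf_even_df(lam, n_endpoints)


def bonferroni_any_reject(p_values, alpha=0.05):
    """UCACE-style rule: reject if any endpoint-wise p-value is below alpha/K."""
    return any(p < alpha / len(p_values) for p in p_values)


# Hypothetical maximized log-likelihoods for the reduced and full models.
lam, p_global = global_lrt(loglik_reduced=-1510.0, loglik_full=-1500.0)
print(lam, p_global)

# Hypothetical endpoint-wise p-values from six separate tests.
print(bonferroni_any_reject([0.20, 0.04, 0.01, 0.30, 0.60, 0.09]))
```

In the second call no p-value falls below 0.05/6, so the Bonferroni rule does not reject even though two of the unadjusted p-values are below 0.05.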

Figure 4 plots the power curves under different sample sizes. To ease the presentation of results, we set the values of β_c11k to be the same across k and vary them from 0 to 15 when simulating data. When the β_c11k values are equal to 0, the value of the power function is the Type I error rate. We observe that the Type I error rates for MCACE in the three plots are around 0.05, which means the global likelihood ratio test provided by MCACE controls the Type I error rate well in our simulation settings. It is worth noting that the Type I error rate under the UCACE models when the sample size equals 100 is 0.06, which is slightly higher than 0.05. This is consistent with our earlier results, where the standard error estimator tends to give inaccurate estimates under UCACE models when the sample size is small. As the sample size increases, the power reaches 1 for both the MCACE model and multiple UCACE models. However, the power curves of the MCACE model consistently have a steeper slope than the corresponding power curves of multiple UCACE models and reach 1 sooner as the effect size increases. The increase in study power can be substantial. For example, when β_c11k = 5 and n = 100, the power increases from 0.46 when using multiple UCACE analyses to 0.90 when conducting the MCACE analysis. Thus, the MCACE model can lead to a nearly 100% increase in the power to reject the null compared with the separate UCACE analyses.
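The quantities in this paragraph are simple proportions over simulated datasets, so their arithmetic can be checked directly. The sketch below is illustrative: the helper names are hypothetical, and the binomial standard error is a standard approximation for a proportion estimated from independent replications, not something reported in the paper.

```python
from math import sqrt


def rejection_rate(rejections):
    """Empirical power (or Type I error rate): share of replications rejecting H0."""
    return sum(rejections) / len(rejections)


def monte_carlo_se(rate, n_reps):
    """Binomial standard error of a proportion estimated from n_reps replications."""
    return sqrt(rate * (1.0 - rate) / n_reps)


n_reps = 500  # number of simulated datasets stated in the text

# Precision of an estimated rejection rate when the true rate is 0.05.
se_at_nominal = monte_carlo_se(0.05, n_reps)

# Relative power gain quoted for beta_c11k = 5 and n = 100.
relative_gain = (0.90 - 0.46) / 0.46

print(round(se_at_nominal, 4), round(relative_gain, 3))
```

Under these assumptions the Monte Carlo standard error near a rate of 0.05 is about 0.01, and moving from a power of 0.46 to 0.90 is a relative gain of about 96%, which matches the roughly twofold improvement described above.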

Power analysis, based on 500 simulated datasets.

Overall, the simulation results demonstrate that the MCACE model outperforms multiple UCACE models in terms of the efficiency of point estimates for CACE, the coverage rate and width of confidence intervals, and the power of hypothesis testing.

4 |. APPLICATION

4.1 |. Study description and preliminary analysis

In this section, we apply the proposed model to estimate the CACE of the Arthritis Health Journal (AHJ). The study is a randomized clinical trial comparing the AHJ with usual care in managing rheumatoid arthritis (RA). AHJ is a patient-centered online tool to help patients track symptoms, monitor disease activity and develop action plans.⁹ By helping RA patients better monitor their disease activity, this tool aims to facilitate the treat-to-target approach by providing early signs when the disease is not controlled.

A total of 94 patients were randomly assigned to two groups. Patients in the first group (n = 45) were provided with online access to AHJ (the intervention) immediately; patients in the second group (n = 49) received usual care (control) for 6 months, at which time point they were provided with online access to AHJ. We illustrate the proposed methodology using the 6-month data of the study, during which period the second group served as the control to the intervention. When they began the intervention, participants were provided with online access to the AHJ and were asked to use it for 6 months. They were evaluated every three months using a self-administered questionnaire. The baseline questionnaires collected demographic and disease information. The follow-up questionnaires evaluated the frequency of using the tool, satisfaction with care, self-management, consumer effectiveness and health status. The study has the following six endpoints on which to evaluate the treatment effects of using AHJ: (1) effective consumer 17 scale, the overall score of questions about how patients manage their disease on a 0 to 100 scale with 100 indicating “most confident”; (2) manage symptoms scale, the overall score of questions about how patients manage their symptoms on a 0 to 10 scale with 10 indicating “totally confident”; (3) manage disease in general scale, the overall score of questions about how patients manage their disease in general on a 0 to 10 scale with 10 indicating “totally confident”; (4) communicate with physician scale, the overall score of patients’ confidence in communicating with their rheumatologists on a 0 to 10 scale with 10 indicating “totally confident”; (5) partners in health scale, the overall score of patients’ knowledge of disease and treatment on a 0 to 80 scale with 80 indicating “poor self-management”; (6) satisfaction with various aspects of medical care, the overall score of their satisfaction with the content and format of the tool on a 0 to 10 scale with 10 indicating “completely satisfied”. Because these six endpoints are on different scales, we rescaled them all to the 0 to 100 scale. For the fifth endpoint, a higher value represents a worse outcome. We thus redefine the endpoint as 100 minus the rescaled value so that a higher value represents a better outcome for all endpoints.
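The rescaling step can be sketched as below. This is a minimal illustration assuming, as stated above, that every scale starts at 0; the function name and the example scores are hypothetical.

```python
def rescale_to_100(score, scale_max, higher_is_worse=False):
    """Map a score on a 0..scale_max scale to 0..100, with higher meaning better."""
    rescaled = 100.0 * score / scale_max
    # For a scale where larger values are worse, flip it after rescaling.
    return 100.0 - rescaled if higher_is_worse else rescaled


# Hypothetical raw scores, one on a 0-10 scale and one on the 0-80 scale.
manage_symptoms = rescale_to_100(7, scale_max=10)
partners_in_health = rescale_to_100(20, scale_max=80, higher_is_worse=True)
print(manage_symptoms, partners_in_health)
```

After this transformation a one-unit difference has the same meaning on every endpoint, which makes the six effect estimates directly comparable.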

Figure 5 plots the means and standard errors of means for the six endpoints by treatment arm and visit. We observe that the AHJ intervention group (green lines) had baseline values comparable to those of the control group (red lines) for all endpoints and that the AHJ group appeared to have consistently higher average values than the control group for all endpoints at the two post-intervention visits. However, the standard error bars are wide. Analyzing these endpoints separately showed that none of the group differences at the sixth month was statistically significant (full results are available in Web Table 1 in the supporting information).⁹ This suggests examining multiple endpoints simultaneously in order to pool similar treatment effects across endpoints to increase study power. Furthermore, many in the intervention group rarely used the AHJ for a variety of reasons. Consequently, the program effectiveness estimated by the ITT analysis can be substantially smaller than the treatment efficacy, the latter of which is often of more interest to patients and caregivers. Figure 5 also plots the raw statistics for the compliers in the treatment group. These compliers consist of patients who used the AHJ at least once per month on average within six months after being randomized to the intervention group. We observe that the upward trends in endpoint measurements for compliers in the treatment group (blue lines) appear to be larger than those for the overall treatment group, especially for the fourth, fifth and sixth endpoints. Overall, the moderate sample size, low compliance rate and moderate beneficial treatment effect sizes across multiple endpoints motivated us to perform a multivariate longitudinal analysis of treatment efficacy. Such analysis aims to maximize the power to detect the overall treatment efficacy by pooling CACE estimation across all endpoints over longitudinal measurement occasions.

Means and standard errors for means in treatment group, control group and the compliers in treatment group for each of the six endpoints.

Our CACE analysis also considers the following baseline covariates: Disease Duration: an indicator variable for early disease (having RA for no more than two years); Disease Activity: an indicator variable for high disease activity (high RAPID4 values) with the reference level including remission and moderate/low RAPID4 values; Gender: an indicator variable for male; Age: an indicator variable for older than the median age (54.5). These baseline variables are well balanced between the intervention and control groups (Table 1). In contrast, the compliers in the intervention group had longer RA duration, higher disease activity, and were younger and all female (Table 1). Table 1 also reports the missing data patterns. There were a moderate number of dropouts and a small amount of intermittent missingness in both treatment arms. Our CACE analysis employs the likelihood approach, which has the benefit of yielding valid inference under the missing at random mechanism, which is more general than missing completely at random. Interestingly, there were no dropouts or intermittent missingness for the compliers, another indication of inherent differences between the subgroup of compliers and the overall study population. Thus the conventional AT analysis that directly compares the compliers in the treatment group with those untreated will be confounded by these inherent differences and be biased for treatment efficacy. The CACE analysis overcomes this limitation of the AT analysis.

TABLE 1.

Baseline characteristics and missingness patterns during follow up visits

Covariates^*	Control group (N = 49)		Treatment group (N = 45)		Compliers (Treatment) (N = 15)
Covariates^*	n	%	n	%	n	%
Disease Duration (Early)	6	12.2	5	11.1	1	6.7
Disease Activity (High)	38	77.6	33	73.3	13	86.7
Gender (Male)	5	10.2	6	13.3	0	0
Age (>54.5)	25	51.0	22	48.9	6	40.0
Missing data pattern^†	0 (1): presence (absent)
000	41	83.7	33	73.3	15	100
011	2	4.1	9	20.0	0	0
001	5	10.2	1	2.2	0	0
010	1	2.0	2	4.4	0	0

^* The covariates are binary and are reported as n and percentage for the category indicated in the parentheses.

^† The 3-digit indicator represents the missing data pattern for outcomes at baseline, months 3 and 6.

4.2 |. MCACE analysis

We analyzed the AHJ data using the method proposed in Section 2. Figure 5 suggests the possibility of quadratic time trends in the study endpoints for the compliers in the treatment group, which motivated us to start with a quadratic time trend in the submodel (Eqns 5 and 6) for our MCACE analysis. This submodel corresponds to a saturated time effect model for three visits. For the submodel of compliance status (Eqn 9), disease duration, disease activity, gender and age were included in W to predict the probability of being a complier. Using likelihood ratio tests and AIC statistics, we conducted model selection to select a parsimonious and reasonable model (see Web Table 4) and chose MCACE.M7 as our best model for MCACE analysis. The estimation results from the model MCACE.M7 are presented in Web Table 5. In model MCACE.M7, the fixed-effects parameters on quadratic trends for compliers (β_c20k and β_c21k) and never-takers (β_n20k) were not significantly different from zero and were thus dropped, while the random effects of quadratic trends were kept for both compliance strata ( $σ_{v_{c 2}}^{2} \neq 0$ and $σ_{v_{n 2}}^{2} \neq 0$ ). In addition, based on the likelihood ratio test, the set of β_c01k parameters was not significantly different from zero, which is expected in an RCT.

As a comparison, we also conducted the UCACE analysis by performing CACE estimation for the endpoints one by one. The model specification for each endpoint was the same as that in the model MCACE.M7, but unlike MCACE, the UCACE analysis ignored the correlations among endpoints. Thus, the UCACE analysis did not borrow information across multiple endpoints as MCACE did when attempting to distinguish compliers from never-takers in the control group, which is a mixture of these two subgroups of patients. Consequently, we expect a reduced power to detect the presence of treatment efficacy for UCACE as compared with MCACE. The estimates of the 6-month treatment efficacy $(2 {\hat{β}}_{c 11 k})$ from both UCACE and MCACE analyses are reported in Table 2, where β_c11k represents the average treatment difference at the 3rd month in the k^th outcome for compliers in the treatment group. We observe that the treatment efficacy estimates from the MCACE model are different from those from multiple UCACE models. In the UCACE analysis, half of the estimates point to a harmful treatment effect, although none of them is statistically significant. However, in the MCACE model, all estimates except the one for the 3rd endpoint point to beneficial treatment effects. That the MCACE model and the UCACE models give different directions of treatment effects is possible because of the large variability of these estimates. Consistent with the results from simulation studies, the standard errors from MCACE analysis are smaller than those from UCACE analysis except for the first two endpoints. As noted in the simulation study, the standard error estimator via Fisher information gives inaccurate estimates in UCACE when the sample size is as small as 100. Therefore, it is likely that the true standard errors from the MCACE analysis are no greater than those from the UCACE analysis for all endpoints.

TABLE 2.

Estimates and standard errors for causal treatment effects at six months

Outcome	CACE ^†						ITT ^‡			AT ^‡
Outcome	MCACE			UCACE			ITT ^‡			AT ^‡
	est	se	p-value	est	se	p-value	est	se	p-value	est	se	p-value
1	0.324	2.275	0.887	−0.410	1.857	0.825	1.262	2.096	0.547	0.232	3.178	0.942
2	1.706	4.379	0.697	−1.180	3.568	0.741	0.711	3.198	0.824	1.628	5.720	0.776
3	−4.005	4.379	0.360	7.810	8.213	0.342	−1.625	2.738	0.553	−1.283	4.724	0.786
4	17.754	6.329	0.005	16.383	7.173	0.022	6.413	3.189	0.044	12.858	5.681	0.024
5	1.834	4.375	0.675	−2.125	5.830	0.716	0.408	2.540	0.872	5.609	4.223	0.184
6	15.647	5.457	0.004	14.965	5.885	0.011	4.634	3.196	0.147	16.013	5.613	0.004
overall p-value			0.008			-			0.294			0.028

^† The estimates from CACE models are for parameters 2 * β_c11k

^‡ The estimates from ITT analysis and AT analysis are for 2 * β_11k

Table 2 also reports estimation results from ITT analysis and AT analysis. Both ITT and AT analyses employ hierarchical random-effects models with fixed-effects linear time trends for all longitudinal endpoints that pool information across endpoints. However, unlike the MCACE model, they do not model the partially observed compliance behavior. Instead, they compare the outcome trajectories between either the treatment assigned for ITT analysis or treatment received in AT analysis (Web Appendix A.2). Thus ITT and AT analyses generally do not yield consistent estimates for treatment efficacy as MCACE and UCACE do. Table 2 shows appreciable differences between estimates from MCACE and ITT, especially for endpoints 4 and 6 while AT estimates are relatively closer to those from MCACE.

We next move to the hypothesis testing of treatment effects on the six endpoints in the AHJ study. A hypothesis-testing strategy to control the Type I error rate in RCTs with multiple endpoints is to first conduct a global test of no treatment effects for all endpoints and to proceed to examine the individual endpoints only if the global test rejects the null hypothesis of no treatment effects for all endpoints. For this purpose, we conduct multivariate Wald global tests of treatment differences for MCACE, ITT and AT. The UCACE analysis does not provide such a global test since it analyzes each endpoint separately. The null hypothesis for the global test in MCACE analysis is that the population mean differences between treatment received among compliers are zero for all six endpoints simultaneously (i.e., β_c11k = 0 for k = 1, ⋯, 6), whereas the global null hypothesis for ITT and AT is that the population mean differences between treatment assigned (for ITT) and treatment received (for AT) are zero for all endpoints (i.e., β_11k = 0 for k = 1, ⋯, 6). The last row in Table 2 reports the p-values from the global test for MCACE, ITT and AT. Both MCACE and AT analyses rejected the global null hypothesis (p-value < 0.05) while ITT failed to reject the global null hypothesis.

Given that MCACE rejected the global null hypothesis, we conclude that there were non-zero CACEs for at least one endpoint and move to examine which endpoints have non-zero CACEs. We apply a Wald test for each endpoint separately with Bonferroni correction that sets the threshold value for statistical significance at 0.05/6=0.0083 for each test. We observe that MCACE analysis found statistically significant beneficial CACEs of using AHJ on the fourth endpoint (communication with a physician) and the sixth endpoint (satisfaction with medical care). In comparison, UCACE analysis failed to detect a treatment effect for any endpoint with a threshold value of 0.0083. We attribute the lack of power to detect treatment effects in UCACE to its loss of estimation efficiency because of its ignoring correlations among endpoints. Although AT analysis also rejects the global null hypothesis and finds statistical significance for endpoint 6 at the level of 0.0083, we note that its test result and the AT estimates for individual endpoints are confounded by subjects’ nonrandom compliance behavior and thus are generally biased for treatment efficacy.
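The endpoint-wise step can be retraced from the numbers printed in Table 2. The sketch below is an illustration, not the authors' code: the helper names are hypothetical, and the two-sided Wald p-value is recomputed from the reported estimate and standard error under a normal reference distribution.

```python
from statistics import NormalDist


def wald_p_value(estimate, std_error):
    """Two-sided p-value for H0: parameter = 0, normal reference distribution."""
    z = abs(estimate / std_error)
    return 2.0 * (1.0 - NormalDist().cdf(z))


def significant_endpoints(p_values, alpha=0.05):
    """Endpoints (numbered from 1) whose p-value is below alpha / K."""
    threshold = alpha / len(p_values)
    return [k for k, p in enumerate(p_values, start=1) if p < threshold]


# p-values for endpoints 1 to 6 as reported in Table 2.
mcace_p = [0.887, 0.697, 0.360, 0.005, 0.675, 0.004]
ucace_p = [0.825, 0.741, 0.342, 0.022, 0.716, 0.011]
at_p = [0.942, 0.776, 0.786, 0.024, 0.184, 0.004]

print(significant_endpoints(mcace_p))  # endpoints 4 and 6
print(significant_endpoints(ucace_p))  # no endpoint
print(significant_endpoints(at_p))     # endpoint 6 only

# Recomputing one Table 2 entry: MCACE estimate and SE for endpoint 4.
print(round(wald_p_value(17.754, 6.329), 3))
```

The UCACE p-values for endpoints 4 and 6 (0.022 and 0.011) would pass an unadjusted 0.05 threshold but not the Bonferroni threshold of 0.05/6.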

We now turn to the estimation results of the compliance model. In the compliance model (Eqn 9), W includes disease duration, disease activity, gender and age at baseline, with corresponding regression coefficients reported as α₁ to α₄ in Web Table 5. All these baseline variables are binary. The intercept estimate ${\hat{α}}_{0} = - 0.970$ indicates that a female patient under 54.5 years old with more than two years of disease and low disease activity had a probability of 0.27 of being a complier. Recall that there are no males among the compliers in the treatment group (Table 1). Thus we expect a coefficient estimate of large magnitude for gender: indeed ${\hat{α}}_{3} = - 16.744$ in Web Table 5. In this case, the coefficients of the other baseline variables represent the independent effects of these variables on the probability of being a complier in females only. For example, ${\hat{α}}_{2} = 1.053$ (p-value=0.140) implies that a female patient with high disease activity is more likely to be a complier than a female patient with low disease activity, holding other predictors constant. Overall, these coefficient estimates seem to suggest that, among female participants, patients who are younger, with longer disease duration and high disease activity, were more likely to be compliers. Although only the coefficient estimate for disease activity approaches statistical significance, such analysis could be useful for understanding the characteristics of compliers and for predicting the compliance of participants.
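The probabilities quoted above follow from the inverse logit of the linear predictor. The sketch below assumes the compliance model in Eqn 9 is a logistic regression, which is what the reported value of 0.27 implies for an intercept of −0.970; the function names are hypothetical, and only the three coefficients quoted in the text are used, with all other covariates held at their reference levels.

```python
from math import exp


def inverse_logit(x):
    """Logistic function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + exp(-x))


# Coefficient estimates quoted in the text (from Web Table 5).
alpha_0 = -0.970    # intercept
alpha_2 = 1.053     # high disease activity
alpha_3 = -16.744   # male

# Reference patient: female, younger, longer disease duration, low activity.
p_reference = inverse_logit(alpha_0)

# Same patient but with high disease activity.
p_high_activity = inverse_logit(alpha_0 + alpha_2)

# Same reference patient but male.
p_male = inverse_logit(alpha_0 + alpha_3)

print(round(p_reference, 2), round(p_high_activity, 2), p_male)
```

Under these assumptions the reference probability is about 0.27, high disease activity raises it to roughly 0.52, and the large negative gender coefficient drives the fitted probability for males to essentially zero, reflecting the absence of male compliers in the treatment group.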

Based on the compliance model, we calculate the probability of being compliers for participants in the control group. Table 3 reports the average probability of being compliers for participants in the control group from MCACE and UCACE models. UCACE analysis yields six different fitted compliance models and the average probability of being compliers in the control group ranges from 0.29 to 0.36. However, MCACE model is able to pool the information across all endpoints to provide one fitted compliance model. The improved accuracy in identifying compliers helps MCACE achieve higher accuracy in CACE estimation.

TABLE 3.

Average probability of being a complier for control group

Treatment group	Control group
Proportion of compliers	Aver.prob. from UCACE						Aver.prob. from MCACE
Proportion of compliers	1	2	3	4	5	6	Aver.prob. from MCACE
0.333	0.329	0.336	0.289	0.357	0.352	0.360	0.311

4.3 |. Alternative analysis

One issue separate from the CACE analysis of multiple endpoints (our primary focus here) is the definition of compliers, which may not be clear-cut in all RCTs with treatment noncompliance. In the AHJ study, the expected benefit of AHJ is mainly increased general self-awareness among patients. Use of the AHJ tool is expected to lead to patients’ increased realization of uncontrolled symptoms, increased understanding of the patterns in how their disease worked or the connection between symptoms (e.g., pain) and day-to-day life events (e.g., sleep and medications), as well as more efficient and effective rheumatology consultation during visits to doctors. Regular use of the AHJ is needed to achieve the anticipated benefits of the tool, and it is believed that this requires a minimum of monthly usage of AHJ. Hence the analysis so far defines the compliers as those who would use the AHJ at least once per month on average during the study period if assigned to the treatment group. The definition implies that never-takers in the intervention group include patients who used the AHJ too rarely (less than once per month on average) to experience a treatment effect. Although the implication seems reasonable, the difference between never-takers in the treatment group and never-takers in the control group reduces the plausibility of the exclusion restriction assumption.

One approach to increasing the plausibility of exclusion restriction is to relax the definition of compliers. We conduct the following analyses using two alternative definitions of compliers. The first alternative definition (A1) defines compliers as patients who would use the AHJ at least once within six months if assigned to treatment group. With this definition, never-takers in both groups did not use the AHJ and the assumption of exclusion restriction is more plausible. The tradeoff is to mis-classify those patients who used AHJ rarely as compliers and consequently dilute the CACEs. The second alternative (A2) defines compliers as patients who would use the AHJ at least three times in the 6-month period after being assigned to treatment group. The estimation results using the two alternative definitions of compliers are reported in Web Table 6. The last row in Web Table 6 reports the p-values from the multivariate Wald global tests of overall treatment differences across all six endpoints from MCACE analysis. The p-values for the global tests are 0.119 using definition A1, 0.010 using definition A2 and 0.008 as reported in Table 2 using the original definition of compliers. The finding is consistent with the expectation that the CACEs could be diluted by the less strict definition of compliers, albeit with increased plausibility of exclusion restriction. However, regardless of the definition used to classify compliers, we find that the p-values for the global test from MCACE analysis are all smaller than the overall p-value of 0.294 from the ITT analysis reported in Table 2.

5 |. DISCUSSION

CACE is considered more relevant as a patient-oriented treatment effect of interest for RCTs under noncompliance. We propose a multivariate longitudinal potential outcome model with principal strata for latent compliance types to make inferences for CACE in longitudinal studies with multiple endpoints and treatment noncompliance. The method combines all data from correlated endpoints and over all longitudinal visits, and can substantially improve the estimation efficiency in RCTs with low compliance rates and moderate effect sizes on correlated endpoints. Simulation studies show significantly higher estimation efficiency for MCACE as compared with the UCACE analysis, including up to 50% smaller standard errors of CACE estimates. In the power analysis, we evaluate a single overall test of the null hypothesis of no treatment effect under the MCACE model, which produces a 1-fold increase in the power of rejecting the null hypothesis compared with the UCACE analysis. These results demonstrate the potential of the proposed MCACE method to improve the efficiency and accuracy of evaluating comparative effectiveness and to reduce financial and time costs of conducting patient-oriented research.

We apply the proposed MCACE model, multiple UCACE models, multivariate ITT and AT models to the study of Arthritis Health Journal. Examining the overall p-value in Table 2, both MCACE analysis and AT analysis show the presence of a significant overall treatment effect while ITT analysis does not. However, AT analysis violates the randomization assumption and its p-value is not reliable. Besides, under Bonferroni correction, none of the p-values for the CACE estimates of individual endpoints from the UCACE analysis exhibits statistically significant treatment effects, whereas the MCACE finds significant CACEs for two out of six endpoints. These findings demonstrate the impact that the efficient MCACE procedure can make in real-world RCTs.

In our level-1 model, we assume diagonal matrices for V_c and V_n, the variance-covariance matrix for the residuals given the random effects and compliance type, which means the correlations among six outcomes and all time points are attributed to random effects and compliance types. This assumption could be relaxed by specifying a structure for Φ_m (e.g., compound symmetric or auto-regressive). Our MCACE assumes the potential outcomes given the compliance type follow multivariate normal distributions. The parametric distributional assumption permits efficient estimation of CACE estimates at the expense of potential model misspecifications. Future work can relax this assumption. One approach is to consider more flexible distributions, such as the multivariate t-distribution.

The assumption of exclusion restriction is often invoked to sharpen the CACE estimation. The definition of compliers may not be clear in all RCTs with treatment noncompliance and may involve a trade-off between the plausibility of exclusion restriction and the accuracy in classifying compliers. Instead of considering all-or-none compliance, extending the proposed methodology to continuously-measured partial compliance could be considered in the future, which avoids the need to define a dichotomized compliance measure. A major challenge in the partial compliance approach is to find reasonable assumptions for model identification.⁶ Besides, with multiple endpoints, the exclusion restriction assumption may be more plausible for some of these endpoints than the remaining ones. Although it is not the focus of this work, the proposed method can be extended to relax the assumption of exclusion restriction for all endpoints.

ACKNOWLEDGMENTS

We acknowledge the funding supports from BC Academic Health Sciences Network, NSERC Grant (RGPIN-2018-04313) and NIH grants (R01CA178061). We thank the Associate Editor, reviewers, Professor X. Joan Hu, and Professor Diane Lacaille for helpful comments and suggestions.

Footnotes

SUPPORTING INFORMATION

Additional supporting information can be found online in the Supporting Information section at the end of this article.

DATA AVAILABILITY STATEMENT

The code and data that support the findings in simulation study are available on request from the corresponding author. The data in the application section are not publicly available due to privacy or ethical restrictions.

References

1. Lee YJ, Ellenberg JH, Hirtz DG, Nelson KB. Analysis of clinical trials by treatment actually received: is it really an option? Statist Med. 1991; 10(10): 1595–1605.
2. Meier P. Compliance as an explanatory variable in clinical trials: comment. J Am Stat Assoc. 1991; 86(413): 19–22.
3. Sheiner LB, Rubin DB. Intention-to-treat analysis and the goals of clinical trials. Clin Pharmacol Ther. 1995; 57(1): 6–15.
4. Steele RJ, Shrier I, Kaufman JS, Platt RW. Simple estimation of patient-oriented effects from randomized trials: an open and shut CACE. Am J Epidemiol. 2015; 182(6): 557–566.
5. Xie H, Heitjan DF. Sensitivity analysis of causal inference in a clinical trial subject to crossover. Clin Trials. 2004; 1(1): 21–30.
6. Baker SG, Kramer BS, Lindeman KS. Latent class instrumental variables: a clinical and biostatistical perspective. Statist Med. 2016; 35(1): 147–160. [correction 2019; 38(5): 901].
7. Imbens GW, Angrist JD. Identification and estimation of local average treatment effects. Econometrica. 1994; 62(2): 467–475.
8. Baker SG, Lindeman KS. The paired availability design: a proposal for evaluating epidural analgesia during labor. Statist Med. 1994; 13(21): 2269–2278.
9. Lacaille D, Carruthers E, As vB, et al. Proof of Concept Study of the Arthritis Health Journal: An Online Tool to Promote Self-Monitoring in People with Rheumatoid Arthritis. American College of Rheumatology Annual Meeting, Abstract 2334. 2015.
10. Jo B, Muthén BO. Modeling of intervention effects with noncompliance: a latent variable approach for randomized trials. In: Marcoulides GA, Schumacker RE, eds. New Developments and Techniques in Structural Equation Modeling. Mahwah, NJ: Lawrence Erlbaum Associates. 2001 (pp. 57–87).
11. Mealli F, Pacini B. Using secondary outcomes to sharpen inference in randomized experiments with noncompliance. J Am Stat Assoc. 2013; 108(503): 1120–1131.
12. Mattei A, Li F, Mealli F, et al. Exploiting multiple outcomes in Bayesian principal stratification analysis with application to the evaluation of a job training program. Ann Appl Stat. 2013; 7(4): 2336–2360.
13. Yau LH, Little RJ. Inference for the complier-average causal effect from longitudinal data subject to noncompliance and missing data, with application to a job training assessment for the unemployed. J Am Stat Assoc. 2001; 96(456): 1232–1244.
14. Rubin DB. Bayesian inference for causal effects: the role of randomization. Ann Stat. 1978; 6(1): 34–58.
15. Rubin DB. Randomization analysis of experimental data: the Fisher randomization test comment. J Am Stat Assoc. 1980; 75(371): 591–593.
16. Rubin DB. Comment: Neyman (1923) and causal inference in experiments and observational studies. Stat Sci. 1990; 5(4): 472–480.
17. Imbens GW, Rubin DB. Bayesian inference for causal effects in randomized experiments with noncompliance. Ann Stat. 1997; 25(1): 305–327.
18. Imbens GW, Rubin DB. Causal inference in statistics, social, and biomedical sciences. New York: Cambridge University Press. 2015.
19. Hirano K, Imbens GW, Rubin DB, Zhou XH. Assessing the effect of an influenza vaccine in an encouragement design. Biostatistics. 2000; 1(1): 69–88.
20. Laird NM, Ware JH. Random-effects models for longitudinal data. Biometrics. 1982; 38(4): 963–974.
