Statistical methods for testing carryover effects: A mixed effects model approach

S Gwynn Sturdevant; Thomas Lumley

doi:10.1016/j.conctc.2021.100711

2021 Feb 25;22:100711.

PMCID: PMC8102872 PMID: 33997456

Abstract

Carryover, or the effects of treatment after it ceases, has been largely ignored in statistical literature except as a nuisance parameter. When testing for carryover, comparing cumulative incidence rates is biased when diagnosis is based on a noisy measurement crossing a threshold (such as in blood pressure) then followed by open-label treatment. This issue was raised in the context of preventing hypertension by the TROPHY trial. We show that modelling the noisy measurement itself using linear mixed effect models, then computing the expected proportion over the threshold, gives valid tests and consistent estimates. The key insight is that the data made unavailable by open-label treatment after diagnosis are missing at random. We demonstrate the analysis in simulations based on a large set of blood pressure measurements from a New Zealand healthcare organisation and show that properly specified random effects models accurately estimate carryover effects even in the presence of data censored at diagnosis.

Keywords: Measurement error, Longitudinal data, Censoring, Hypertension, Linear mixed model, Blood pressure, Diabetes, Cholesterol

1. Introduction

One goal of healthcare is prevention; “prevention is better than treatment” [1]. As such, some researchers want to investigate treatments that delay the onset of diseases, which is equivalent to measuring carryover effects. Unfortunately, statistical methodology that enables investigators to test a carryover hypothesis has not yet been developed. We use Stephen Senn's definition of carryover effects as the “residual effects” of treatment after it has ceased [2]. He also laments “that little has appeared... on the subject of modelling for carry-over [sic] that is more grounded in clinical and pharmacological reality” [2]. This article begins to fill this gap. Recently, three studies have attempted to measure carryover effects of pharmacological treatments [3–5], but all used faulty methodology. For concreteness we focus on one, the TROPHY trial.

Rates of diabetes and hypertension are on the rise and responsible for increased death and disease [6]. To explore a prevention strategy for reducing rates of hypertension, AstraZeneca tested Candesartan for carryover effects. Since parallel-group trials with blinding are the gold standard for determining causality [7], TROPHY trial investigators believed such a trial could effectively test a carryover hypothesis. They randomised 809 participants with blood pressure (BP) close to the 140/90 mm Hg threshold for hypertension to the treatment arm (Candesartan for 2 years followed by 2 years of monitoring) or to placebo for 4 years. Fig. 1 plots a possible BP trajectory of a participant in the treatment arm of the TROPHY trial, which involved BP measurements every 3 months for the duration of the study. Investigators diagnosed participants with hypertension when any 3 measurements were above the 140/90 mm Hg threshold. Cumulative incidence was 53.2% in the treatment arm and 63.0% in the placebo arm. The investigators concluded that “the effect of active treatment on delaying the onset of hypertension can extend up to 2 years after the discontinuation of treatment” [5].

Fig. 1 — One possible trajectory (of the many we simulated) of BP for a participant in the treatment arm of the TROPHY trial which concluded that the effect of treatment could be extended up to 2 years. Carryovers ( $Z_{i t_{j}}$ ) of lengths 0–2 years are plotted and randomisation occurs at time 0.

Other authors have criticised TROPHY's design. Meltzer (2006) said an “idiosyncratic primary endpoint seriously impairs external applicability.” Persell and Baker (2006) noted that cumulative diagnosis rates would differ even with identical underlying BP.

A person is diagnosed with hypertension when their long-term average BP exceeds a threshold. However, BP measurements vary for multiple reasons [8–13]. Investigators must carefully consider this variation when analysing data, as the instantaneous BP, not the long-term average, is used in most analyses, and was used in TROPHY. Determining the long-term average BP requires multiple measurements over multiple days and is not practical to implement. Noisy measurements complicate the diagnosis of hypertension, which occurs when long-term average BP crosses a threshold; localising when the threshold is crossed is challenging.

We can visualise the problem with the TROPHY design by simulating systolic BP ( $Y_{i t_{j}}$ for individual i at time t_j) as normally distributed around an individual-specific long-term average trend:

$$Y_{i t_{j}} = a_{i} + b_{i} t_{i t_{j}} + c_{i} X_{i t_{j}} + d_{i} Z_{i t_{j}} + ε_{i t_{j}} \qquad (1)$$

where $Y_{i t_{j}}$ is the BP measurement of person i at time t_j, a_i ~ Unif (125, 140), b_i ~ N (0, Σ), c_i estimates the treatment effect, d_i estimates the carryover, $X_{i t_{j}}$ is 1 if person i is on treatment at time t_j and 0 otherwise, and $Z_{i t_{j}}$ starts at 1 when someone stops treatment and decreases linearly to 0 over the predefined carryover period. TROPHY's inclusion criteria allowed for randomisation of participants with baseline BP close to the threshold of 140/90 mm Hg. To account for this, we modelled the individual random intercepts with a uniform distribution.
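As a rough illustration of Eq (1), the following Python sketch simulates one systolic BP trajectory. It is not the authors' code (they worked in R). The function names, the default parameter values, the choice to give the carryover term the same magnitude as the treatment effect, and the convention that a measurement taken exactly at the end of treatment still counts as on treatment are all assumptions made for this example.

```python
import random


def carryover_indicator(t, treatment_end, carryover_length):
    """Z: decays linearly from 1 just after treatment ends to 0 after `carryover_length` years."""
    if t <= treatment_end or carryover_length <= 0:
        return 0.0
    return max(0.0, 1.0 - (t - treatment_end) / carryover_length)


def simulate_trajectory(rng, treated, times, trend=1.0, effect=-10.0,
                        treatment_end=2.0, carryover_length=1.0, noise_sd=5.0):
    """Simulate one participant's systolic BP at each time in `times` (years)."""
    a = rng.uniform(125.0, 140.0)  # individual long-term baseline a_i
    trajectory = []
    for t in times:
        # X is 1 while on treatment (assumed: after baseline, up to and including treatment_end)
        x = 1.0 if treated and 0.0 < t <= treatment_end else 0.0
        z = carryover_indicator(t, treatment_end, carryover_length) if treated else 0.0
        # Assumption: the carryover coefficient equals the treatment effect
        long_term_mean = a + trend * t + effect * x + effect * z
        trajectory.append(long_term_mean + rng.gauss(0.0, noise_sd))
    return trajectory


times = [0.25 * k for k in range(17)]  # every 3 months for 4 years
bp = simulate_trajectory(random.Random(1), treated=True, times=times)
print([round(y, 1) for y in bp])
```

Setting `noise_sd=0.0` returns the long-term average trajectory, which makes it easy to compare the treatment and control arms for the same participant.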

Fig. 2 visualises variation and its impact upon diagnosis using the threshold of 140 mm Hg. The plot on the left includes long-term average values for both treatment and control arms. The plot on the right includes both long-term and instantaneous BP for the treatment arm. In the plot on the right, two points show the impact of variation on instantaneous BP. At time 0 there are two instantaneous measurements: one slightly before treatment starts, the other after. At the first measurement, the participant has instantaneous BP in excess of the threshold due to variation despite the long-term average BP being below it. For the second measurement, because the long-term average BP is much lower, the instantaneous BP is not likely to be in excess of the threshold. Medicated participants have lower long-term average BP, so are less likely to have instantaneous BP in excess of the threshold.

In the TROPHY trial, hypertension was diagnosed when 3 measurements were above the threshold and the treatment arm had lower BP for the first two years. Random variation likely resulted in measurements in excess of the threshold in the control arm regardless of carryover and when the long-term average measurement was in fact below the threshold. Furthermore, simulations suggest an 80% Type I error rate for the null hypothesis of no carryover in the underlying data [14].
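The direction of this bias can be illustrated with a small simulation. The Python sketch below generates both arms with no carryover at all and applies a simplified "any 3 measurements above 140 mm Hg" rule to systolic BP only. The sample size, trend, treatment effect, and noise level are illustrative assumptions, and the rule is a simplification of the TROPHY protocol, so the output shows the qualitative effect rather than reproducing the published error rate.

```python
import random

THRESHOLD = 140.0


def diagnosed_any_three(measurements, threshold=THRESHOLD):
    """Simplified rule: diagnosed once any 3 measurements exceed the threshold."""
    return sum(1 for y in measurements if y > threshold) >= 3


def cumulative_incidence(rng, treated, n=2000, effect=-10.0, trend=1.0, noise_sd=5.0):
    """Proportion diagnosed over 4 years when there is NO carryover in the data."""
    times = [0.25 * k for k in range(17)]  # every 3 months for 4 years
    diagnosed = 0
    for _ in range(n):
        a = rng.uniform(125.0, 140.0)
        ys = []
        for t in times:
            on_treatment = treated and 0.0 < t <= 2.0
            # Treatment lowers BP only while it is taken; there is no carryover term
            mean = a + trend * t + (effect if on_treatment else 0.0)
            ys.append(mean + rng.gauss(0.0, noise_sd))
        diagnosed += diagnosed_any_three(ys)
    return diagnosed / n


rng = random.Random(2021)
control = cumulative_incidence(rng, treated=False)
treatment = cumulative_incidence(rng, treated=True)
print(f"control: {control:.3f}, treatment: {treatment:.3f}")
```

Under these assumptions the treatment arm shows lower cumulative incidence even though treatment has no residual effect, because its participants had fewer opportunities to accumulate measurements above the threshold.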

Another complication of the TROPHY trial is missing data. After diagnosis occurs, participants must be treated for hypertension and subsequent measurements cannot be used to determine the long-term average BP. This further impairs our ability to test a carryover hypothesis.

A more thorough discussion of the faults in TROPHY and of carryover effects can be found in Sturdevant and Lumley (2016) [15].

1.1. Solutions

We explored several methods to test a carryover hypothesis: a simple design, a rerandomisation design, methodology similar to the Prostate Cancer Prevention Trial (PCPT), and using a mixed effect model to effectively impute missing data.

In a simple design, comparing cumulative incidences in a parallel or a crossover design is ineffective at testing a carryover hypothesis [15]. In the parallel design, noisy measurements combined with treatment result in diagnosis rates that are consistently lower in the treatment arm regardless of carryover. In the crossover design, a nominal trend results in more diagnoses in the second period, which skews estimates. In addition, a three-arm trial with parallel, crossover and control arms is also ineffective. Sturdevant and Lumley (2016) [15] discuss this further.

In a rerandomisation design, participants are randomised to treatment or control at the beginning of the study; after the first period, the remaining participants are rerandomised to treatment or control [16]. Applying this method to TROPHY would require investigators to rerandomise the remaining participants after treatment has ceased in the treatment arm, and then analyse the data. Nason suggests two methods for analysis: mixture models and the Mantel-Haenszel estimator. Our data violate one assumption for the mixture model, that there is no “period effect” [16]. In BP measurements, there is a non-negligible trend which results in more diagnoses in the second period [14,17]; this also caused the crossover design to fail [15]. The second approach uses the Mantel-Haenszel estimator, which compares risks of participants with the same treatment in the past period but different treatments in the current period [18]. Investigators would compare risks of participants who all had treatment in the first period, but are currently on either treatment or control. Unfortunately, our question differs: for people with different treatments in the past period but the same (control) treatment in the current period, how does the risk differ? It may be possible to modify the estimator to look at carryover, but it does not do so currently. At present, rerandomisation is not a feasible solution for testing a carryover hypothesis.

The PCPT was designed to determine if treatment with finasteride was effective at preventing prostate cancer [19]. The investigators randomised 18,882 men to treatment or placebo for 7 years. Finasteride impacts upon PSA levels which increases rates of diagnosis in the treatment arm. The investigators addressed this in two ways: all participants were required to have biopsies at the end of the study, and the investigators used different thresholds of PSA levels in the two arms of the study to determine if a biopsy was necessary; high levels were sent to a statistical centre for analysis [19].

Although extremely burdensome, PCPT methodologies could be implemented as a solution. Ambulatory BP (participants wear a cuff for an extended time which measures BP every 15–30 min) could be used a few years after the end of the trial to diagnose hypertension in all participants [20]. A wash-out period is necessary to remove any carryover effects.

Although extremely dubious, thresholds in the treatment and control arms could have been adjusted to allow similar diagnosis rates in both arms. An initial threshold of 140/90 mm Hg could be used in both arms. The first half of the trial would require more participants to be diagnosed in the treatment group, as their BP is artificially lowered. The second half would require more diagnoses in the control group. A further complication is variation in the impact of treatment on BP [21]. Diagnosing the people with the highest BP under treatment may not equate to diagnosing the people with the highest underlying BP. We believe that simpler methodologies exist.

As a simple design, a rerandomisation design, and methodology similar to the PCPT are neither feasible nor practical for this problem, the rest of this article discusses a mixed effect model.

1.2. Mixed effect model

As we discussed in the introduction, analysis of data is complicated due to missing data. However, all data subsequent to diagnosis is missing deterministically based upon previous measurements. As missing data is based on previous measurements, the missing data is missing at random and a linear mixed model should give unbiased estimates of carryover parameters.

Mathematically, to discuss diagnosis times we define D_i to be the time at which person i is diagnosed as hypertensive. We let $Y_{i t_{j}}$ be the systolic BP of person i at time t_j and ${\tilde{Y}}_{i t_{j}}$ be the counterfactual systolic BP of person i at time t_j without treatment after diagnosis. To discuss how the data relate to diagnosis, we use the rule $f_{i t_{j}}$ , laid out in the trial protocol, that defines when person i is diagnosed as hypertensive using $Y_{i t_{j}}$ . For example, the TROPHY trial diagnosed people with hypertension when 3 measurements were above the threshold. We have that ${\tilde{Y}}_{i t_{j}}$ = $Y_{i t_{j}}$ for all t_j ≤ D_i (until diagnosis occurs). After D_i they differ because of $f_{i t_{j}}$ : a person in the control group will receive treatment after diagnosis per the trial protocol, and a person in the treatment group will receive new or additional treatment. As a result, measurements $Y_{i t_{j}}$ with t_j > D_i do not adhere to Eq (2). D_i can be seen as a stopping time for the stochastic process.

$$Y_{i t_{j}} = a_{i} + β t_{i t_{j}} + γ X_{i t_{j}} + δ Z_{i t_{j}} + ε_{i t_{j}} \qquad (2)$$

where $Y_{i t_{j}}$ is the systolic BP measurement, a_i estimates the intercept for person i, β is the average trend, γ is the average treatment effect, δ is the average carryover, $X_{i t_{j}}$ is 1 if person i is on treatment at time t_j and 0 otherwise, and $Z_{i t_{j}}$ is 1 at the beginning of period 2, when treatment ceases, and decreases linearly to 0 over the predefined carryover period. Unfortunately, we cannot access all ${\tilde{Y}}_{i t_{j}}$ , as diagnosis occurs and subsequent data are missing because treatment begins when $f_{i t_{j}}$ = 1. To estimate parameters we analyse $Y_{i t_{j}}$ , where data are available only until diagnosis. The unavailability of all ${\tilde{Y}}_{i t_{j}}$ results in biased parameter estimates if analysis is based on a linear model.

As our missing data are due to diagnosis based upon past measurements, and none of the parameters has any impact upon diagnosis, our data are missing at random [22]. We formalise this in Theorem 1.1.

Definition

D_i = d if f_d = 1 and $f_{t_{j}}$ = 0 for all t_j < d.

Theorem 1.1

If we set $Y_{i t_{j}}$ to missing for t_j > D_i it is missing at random.

Proof. Diagnosing person i is based on $Y_{i t_{j}}$ in a deterministic manner defined by $f_{t_{j}}$ . D_i = d is completely determined by $Y_{i t_{j}, t_{j} \leq D_{i}}$ and therefore is independent of $Y_{i t_{j}, t_{j} > D_{i}}$ conditional on $Y_{i t_{j}, t_{j} \leq D_{i}}$ . If $Y_{i t_{j}, t_{j} > D_{i}}$ are treated as missing, they are missing at random.

If we coarsened $Y_{i t_{j}}$ , then missing data after diagnosis would be missing at random only if diagnosis was deterministic given the coarsened data. For example, if $A_{i t_{j}}$ was the indicator that $Y_{i t_{j}}$ ≥ 140, missing data after diagnosis would be missing at random if $f_{t_{j}}$ defined hypertension by counting the number of measurements above the 140 mm Hg threshold. It would not be missing at random if $f_{t_{j}}$ diagnosed hypertension by averaging measurements, as the likelihood then ceases to factor into independent products with only one containing the parameters. For the rest of this article we refer to $Y_{i t_{j}, t_{j} \leq D_{i}}$ as $Y_{i t_{j}}$ .

If data missing after diagnosis are missing at random, the likelihood factors into the data distribution and a function of the parameters governing the probability of missingness. These two sets of parameters are distinct and independent. The restricted maximum likelihood estimate of the model parameters therefore depends only on the likelihood for the data; the probability of missingness is irrelevant. The log-likelihood contribution for person i is:

$$l_{i} = \sum_{j = 1}^{M} \left( - \frac{1}{2} \log | Σ | - \frac{1}{2} \left( Y_{i t_{j}} - (a_{i} + β t_{i t_{j}} + γ X_{i t_{j}} + δ Z_{i t_{j}}) \right) Σ^{- 1} \left( Y_{i t_{j}} - (a_{i} + β t_{i t_{j}} + γ X_{i t_{j}} + δ Z_{i t_{j}}) \right) \right)$$

with measurement times t_j = t₁, t₂, ..., t_M. The log-likelihood for the whole data set is $L = \sum_{i = 1}^{N} l_{i}$ , where there are i = 1, 2, ..., N participants.
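The factorisation invoked above can be written out explicitly. The display below is the standard ignorability argument for data that are missing at random, not a formula taken from this article. The notation is assumed for the illustration: $R_{i}$ is the missingness pattern implied by D_i, $Y_{i}^{obs}$ and $Y_{i}^{mis}$ are the observed and missing measurements for person i, θ collects the model parameters, and ψ collects any parameters of the diagnosis rule.

```latex
\begin{aligned}
f(Y_{i}^{obs}, R_{i} \mid θ, ψ)
  &= \int f(Y_{i}^{obs}, Y_{i}^{mis} \mid θ)\, f(R_{i} \mid Y_{i}^{obs}, Y_{i}^{mis}, ψ)\, dY_{i}^{mis} \\
  &= f(R_{i} \mid Y_{i}^{obs}, ψ) \int f(Y_{i}^{obs}, Y_{i}^{mis} \mid θ)\, dY_{i}^{mis} \\
  &= f(R_{i} \mid Y_{i}^{obs}, ψ)\, f(Y_{i}^{obs} \mid θ).
\end{aligned}
```

The second line uses the missing at random property: because diagnosis is a deterministic function of the measurements up to D_i, the missingness term does not depend on $Y_{i}^{mis}$ and can be taken outside the integral. Only the last factor involves θ, so maximising over θ ignores the missingness mechanism.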

We found the restricted maximum likelihood (REML) estimates by maximising $\int L dθ$ where θ = (α, β, γ, δ)^T. This gave consistent estimates $(a_{i}, b_{i}, c_{i}, d_{i})^{T}$ of $(α, β, γ, δ)^{T}$ with distribution $N [(a_{i}, b_{i}, c_{i}, d_{i})^{T}, Σ]$ , where Σ is an unconstrained covariance matrix not assumed to be diagonal. We optimised the likelihood using the lmer function from the lme4 package in R.

With this model, we used a parametric bootstrap to find the risk ratio (rr) of hypertension. We define the rr as “a measure of the risk of a certain event happening in one group compared to the risk of the same event happening in another group” [23]. Mathematically, we have

$$rr = \frac{TE / (TE + TN)}{CE / (CE + CN)} \qquad (3)$$

where TE is the number of people with hypertension in the treatment arm, TN the number of non-hypertensives in the treatment arm, CE the number of people with hypertension in the control arm, and CN the number of non-hypertensives in the control arm. We bootstrapped the risk ratio for diagnosis of hypertension at the first time point after treatment ceases. In Fig. 1 we find the rr 2 years and 3 months after the start of the trial, when systolic BP with no carryover and systolic BP with all other lengths of carryover differ the most.
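The risk ratio defined above is a direct computation on the four counts. A minimal Python sketch follows; the function name and the counts in the usage line are hypothetical and are not data from any trial.

```python
def risk_ratio(te, tn, ce, cn):
    """Risk of the event in the treatment arm divided by the risk in the control arm.

    te, tn: hypertensive and non-hypertensive counts in the treatment arm
    ce, cn: hypertensive and non-hypertensive counts in the control arm
    """
    risk_treatment = te / (te + tn)
    risk_control = ce / (ce + cn)
    if risk_control == 0:
        raise ZeroDivisionError("no events in the control arm; risk ratio undefined")
    return risk_treatment / risk_control


# Hypothetical counts: 40 of 100 diagnosed on treatment, 50 of 100 on control
print(risk_ratio(40, 60, 50, 50))
```

A value below 1 means diagnosis is less likely in the treatment arm, and a value of 1 means diagnosis is equally likely in both arms.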

To find the true risk ratio of hypertension from the mixed model, Eq (2), we simulated 10⁴ participants with no measurement error ( $ε_{i t_{j}}$ = 0). When carryover is 0, the risk ratios at the first measurement after treatment ceased were centred around 1. Risk ratios centred on 1 suggest that diagnosis of hypertension is equiprobable in the two arms: because there is no carryover, the effects of treatment cease immediately with no residual benefit, so diagnosis rates in the two arms should be similar.

To test a carryover hypothesis, fit a mixed model using lmer, conduct a parametric bootstrap of the risk ratio for the first measurement after treatment ceases using bootMer, and check whether 1 is in the confidence interval. If it is, our simulations suggest a 99% probability that no carryover was in the model used to simulate the data. We first describe some preliminary data analysis to check the assumptions of our models; the details of the method follow in Section 2.

1.3. Exploring the correlation structure

Assumptions for a linear mixed model are: random effects in the data, normality of random effects, and errors that are normal, $ε_{i t_{j}}$ ~ N (0, σ²), and uncorrelated [24]. The first two assumptions show that when using linear mixed models, correct specification of the random effects is important. To ensure a random effects model was realistic, we analysed a large dataset of longitudinal BP.

Repeated BP measurements were found in the chronic care management programme in Counties Manukau. Counties Manukau has large Pacific Islander and Maaori populations [25]; 34% of the population lives in “very deprived” conditions [25]. The District Health Board analysed increasing rates of acute adult medical admissions and found that they could be reduced by up to 30% “with more timely primary care intervention.” The aim of the study was “to develop an effective and efficient process for the seamless delivery of care for targeted patients with specific chronic diseases.” One of the critical objectives was “to achieve best-practice management of ... blood pressure control” [25]. We obtained 103,098 repeated BP measurements from 9,043 people from the chronic care management programme in Counties Manukau [25].

Our purpose was to explore the correlation structure of BP for people who were prehypertensive. To have data that reflected this population, we deleted rows with missing systolic measurements or systolic measurements below 50 mm Hg or above 160 mm Hg. The elevated upper bound of 160 mm Hg was due to the uncertainty in measurement; systolic BP can vary up to 20 mm Hg in any given day [26]. We deleted diastolic measurements less than 30 mm Hg which were likely the result of a recording error. Participants with only 1 measurement were also deleted.

1.4. Normality

Fig. 3 demonstrates a well-documented phenomenon: terminal digit preference in BP measurements. Research indicates the terminal digit is 0 in 45% of systolic BP measurements and 48% of diastolic BP. Even numbers are more likely to occur than odd, with the most common odd digit being 5 [27]. Besides terminal digit preference the quantiles of the data appear to follow a normal distribution.

1.5. Correlation structure

We used a variogram to determine how time impacted upon systolic BP. For each person, we found all pairs of measurement times and the corresponding BP measurements. We computed $y_{i,jk} = \frac{1}{2} (B P_{i t_{j}} - B P_{i t_{k}})^{2}$ and $x_{i,jk} = t_{ij} - t_{ik}$ , where t_ij and $B P_{i t_{j}}$ refer to the time of measurement and the BP measurement, respectively, for person i at time t_j. For the first 4 years, we partitioned the time differences into 90 day intervals; after 4 years, we increased the interval lengths to 180 days for smoothness. Four years is also the duration of our study. Fig. 4 shows the mean of each partition.
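The variogram computation described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, the lag is taken as the absolute time difference in days, and a single fixed bin width is used instead of switching from 90 to 180 days after 4 years.

```python
from collections import defaultdict
from itertools import combinations


def variogram_points(times, values):
    """All pairwise (lag, half squared difference) points for one person."""
    points = []
    for (tj, yj), (tk, yk) in combinations(zip(times, values), 2):
        points.append((abs(tj - tk), 0.5 * (yj - yk) ** 2))
    return points


def binned_variogram(people, bin_width=90):
    """Mean half squared difference per lag bin.

    `people` is a list of (times_in_days, bp_values) pairs, one per person.
    Returns a dict mapping the lower edge of each lag bin to the bin mean.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for times, values in people:
        for lag, half_sq in variogram_points(times, values):
            b = int(lag // bin_width)
            sums[b] += half_sq
            counts[b] += 1
    return {b * bin_width: sums[b] / counts[b] for b in sorted(sums)}


# One hypothetical person measured on days 0, 30 and 100
print(binned_variogram([([0, 30, 100], [130, 134, 128])]))
```

A roughly flat sequence of bin means across lags is what one would expect when measurements within a person are not autocorrelated.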

The stability of Fig. 4 suggests no evidence of autocorrelation in these data. Fluctuations are likely due to variation in the data.

1.6. Random effects

We also used these data to identify whether a fixed or random effect for the gradient most accurately represented the measurements within people. We separated the data into participants above and below age 50 and randomly drew 10,000 samples of size 400 from each subpopulation. We fit 2 mixed effect models to both systolic and diastolic BP for all 20,000 samples. One model was $Y_{i t_{j}}$ = a_i + $β t_{i t_{j}}$ + $ε_{i t_{j}}$ , the other $Y_{i t_{j}}$ = a_i + b_i $t_{i t_{j}}$ + $ε_{i t_{j}}$ ; the first model has a fixed trend, the second a random trend. To find the population trend, we fit fixed and random trend effects for systolic and diastolic BP to the 23,786 measurements for people below age 50 and the 79,312 measurements for people above. The percentages of confidence intervals for β and b_i that contained the population trend are given in Table 1. The confidence interval from a random effect model is more likely to contain the true trend, so random effects were used for both trend and treatment effect in our third set of simulations.

Table 1. Summary of fixed and random effects trends for systolic and diastolic blood pressures: percentage of confidence intervals containing the population trend, by age group. Random effects represent the data better.

Model	Systolic (age < 50)	Diastolic (age < 50)	Systolic (age > 50)	Diastolic (age > 50)
Fixed trend	70%	85%	64%	72%
Random trend	97%	97%	92%	96%


2. Methods

We conducted three systematic simulation studies as outlined in Fig. 5.

The first set demonstrated that restricted maximum likelihood gave unbiased estimates of the carryover parameter using the lmer function in R [28,29]. The second set demonstrated the same using both systolic and diastolic BP. The third involved using the bootMer function to find confidence intervals for the risk ratio.

Our first simulations removed some of the complications by simulating only systolic BP. We started simply to explore bias in our estimates before proceeding. We began by randomly generating a baseline value from a uniform distribution between 125 and 140 mm Hg; TROPHY trial participants had systolic BP close to the 140 mm Hg threshold [5]. Trends of 0, 1, and 2 mm Hg per year were used, as systolic BP increases over time [30,31]. The treatment arm was given treatment effects of either −5 mm Hg or −10 mm Hg for either 1, 1.5, 2, 2.5, or 3 years [14]. Measurements were taken every 3 months, every 6 months, or yearly, and carryovers of length 0, 0.5, 1, 1.5, and 2 years were included. BP varies due to both measurement error and intraindividual variability, which we combined and assumed to be normally distributed with standard deviations of 3, 5, and 7 mm Hg [14,17]. We analysed the data by fitting a model similar to Eq (2) using both lm and lmer in R; lm fits a linear model, while lmer fits a mixed effect model with the random and fixed effects outlined in Fig. 5.

The second set of simulations involved both systolic and diastolic measurements. Although more extreme, the simulations were qualitatively similar and had the goal of exploring bias in estimates. A description of the systolic BP simulation is above. Baseline diastolic BP was randomly generated between 80 and 90 mm Hg and a quadratic effect with age meant trends of −1, 0, and 1 mm Hg per year were used [32]. All combinations of systolic and diastolic trends were used in the simulation. To reflect day-to-day variation and measurement error, we added normally-distributed random error with standard deviation 3, 5, and 7 mm Hg for systolic BP which corresponded to 2, 3, and 4 mm Hg for diastolic BP [32]. Active treatment was assumed to lower systolic pressure by 5 or 10 mm Hg which corresponded to diastolic decreasing by 3 and 6 mm Hg, respectively [32]. We used the same lengths of study, durations of treatments, and carryover lengths as our first set.

We fit the mixed effects model $Y_{i k t_{j}}$ = a_ik + β_k $t_{i t_{j}}$ + γ_k $X_{i t_{j}}$ + δ_k $Z_{i t_{j}}$ + $ε_{i k t_{j}}$ using lm and lmer in R. Here, $Y_{i k t_{j}}$ is systolic BP when k is 1, and diastolic BP otherwise. The lm estimates were obtained by least squares (minimising the residual sum of squares), while lmer used restricted maximum likelihood [28,29].

In the third set of simulations we generated data using random effects for both treatment effect and trend. When both systolic and diastolic BP were included, and diagnosis happened early in a simulated trial, the model fit by lmer was unidentifiable or barely identifiable. For that reason, we included only systolic BP. In addition, these simulations included bootstrapping the risk ratio at the first time point after treatment ceased.

We generated baseline systolic measurements as above. Our simulated trend for each person was sampled from a normal distribution with mean 0, 1 or 2 mm Hg per year and standard deviation 1.5 mm Hg, as our data indicated. Random treatment effects were sampled from a normal distribution with mean −5, −10 or 0 mm Hg and standard deviation 2.5 mm Hg [21]. We fit a mixed effects model with random effects for intercept, trend, and treatment effects. We used the measurement schedules and lengths of carryover outlined above.

$$Y_{i t_{j}} = a_{i} + b_{i} t_{i t_{j}} + c_{i} X_{i t_{j}} + δ Z_{i t_{j}} + ε_{i t_{j}} \qquad (4)$$

For each combination of parameters we simulated 400 trials, fit a mixed effects model using lmer, and used the bootMer function to simulate 100 parametric bootstraps of the risk ratio found in Eq (3) [28,29]. A schematic of these simulations can be found in Fig. 9.

Fig. 9 — Schematic overview of bootstrapping for relative risks. This method allows us to accurately test a carryover hypothesis. Adapted from [34].

In a parametric bootstrap we assume a distribution based on the estimates obtained from fitting a model to the data, then sample from this distribution [33]. In our case, we fit a mixed effects model and sampled from the model to find bootstrap estimates of the risk ratio.
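The idea of a parametric bootstrap can be illustrated with a deliberately simplified Python sketch. It does not fit a mixed effects model and is not a substitute for bootMer: as an assumption made for brevity, each arm's BP at a single time point is modelled as an independent normal distribution, and the risk ratio is the ratio of the proportions above 140 mm Hg. All function names and default values are hypothetical.

```python
import random
import statistics


def fit_normal(values):
    """'Fit' the simplified model: estimate mean and standard deviation."""
    return statistics.mean(values), statistics.stdev(values)


def proportion_above(values, threshold=140.0):
    return sum(1 for v in values if v > threshold) / len(values)


def bootstrap_risk_ratio_ci(treatment, control, n_boot=500, level=0.95, seed=1):
    """Percentile interval for the risk ratio from a parametric bootstrap."""
    rng = random.Random(seed)
    mu_t, sd_t = fit_normal(treatment)
    mu_c, sd_c = fit_normal(control)
    ratios = []
    for _ in range(n_boot):
        # Simulate a new data set of the same size from the fitted distributions
        sim_t = [rng.gauss(mu_t, sd_t) for _ in treatment]
        sim_c = [rng.gauss(mu_c, sd_c) for _ in control]
        p_c = proportion_above(sim_c)
        if p_c > 0:  # skip replicates where the ratio is undefined
            ratios.append(proportion_above(sim_t) / p_c)
    ratios.sort()
    alpha = (1 - level) / 2
    # Simple order-statistic quantiles, without interpolation
    lower = ratios[int(alpha * (len(ratios) - 1))]
    upper = ratios[int((1 - alpha) * (len(ratios) - 1))]
    return lower, upper


# Hypothetical data: the treatment arm is still 10 mm Hg lower than control
data_rng = random.Random(0)
control_bp = [data_rng.gauss(138.0, 8.0) for _ in range(400)]
treated_bp = [data_rng.gauss(128.0, 8.0) for _ in range(400)]
print(bootstrap_risk_ratio_ci(treated_bp, control_bp))
```

With a residual difference this large between arms, the interval should lie well below 1; when the arms share the same distribution, the interval would be expected to contain 1 in most replications.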

We explored differing definitions of hypertension that investigators could define in trial protocol and used them for all 3 sets of simulations. There were 6 rules, some of which involved averaging multiple measurements, and others that counted the number of measurements above the threshold. Our rules diagnosed hypertension when:

• 1 measurement is above the threshold
• 2 consecutive measurements are above the threshold
• any 3 measurements are above the threshold
• the average of 2 consecutive measurements is above the threshold
• the average of 3 consecutive measurements is above the threshold
• a rule that signalled when 1 measurement was above the threshold, then removed measurement error and diagnosed when the long-term average BP was above the threshold.

To reduce the impact of measurement error we tried diagnosing after averaging measurements. We emphasised the importance of measurement error in the last rule. We denote these rules as $f_{i t_{j}}$ . To simulate missing data, we deleted data after diagnosis.
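The first five rules and the deletion of data after diagnosis can be sketched in Python as follows. The function names are hypothetical, a measurement counts as "above" when it is strictly greater than the threshold (an assumption), and the sixth rule is omitted because it requires the noise-free long-term average, which is only available inside a simulation. Each rule returns the index of the measurement at which diagnosis occurs, or None.

```python
def rule_one_above(ys, thr=140.0):
    """Diagnose at the first measurement above the threshold."""
    for i, y in enumerate(ys):
        if y > thr:
            return i
    return None


def rule_two_consecutive(ys, thr=140.0):
    """Diagnose at the second of 2 consecutive measurements above the threshold."""
    for i in range(1, len(ys)):
        if ys[i - 1] > thr and ys[i] > thr:
            return i
    return None


def rule_any_three(ys, thr=140.0):
    """Diagnose at the third measurement above the threshold, consecutive or not."""
    count = 0
    for i, y in enumerate(ys):
        count += y > thr
        if count >= 3:
            return i
    return None


def rule_mean_consecutive(ys, k, thr=140.0):
    """Diagnose when the mean of the last k consecutive measurements is above the threshold."""
    for i in range(k - 1, len(ys)):
        if sum(ys[i - k + 1:i + 1]) / k > thr:
            return i
    return None


def censor_after_diagnosis(ys, diagnosis_index):
    """Keep measurements up to and including diagnosis; set later ones to None."""
    if diagnosis_index is None:
        return list(ys)
    return [y if i <= diagnosis_index else None for i, y in enumerate(ys)]


ys = [138, 142, 139, 141, 143, 137]
print(censor_after_diagnosis(ys, rule_any_three(ys)))
```

Because every rule looks only at measurements up to the current time, the diagnosis index is a deterministic function of the data observed so far, which is the property the missing at random argument relies on.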

3. Results and discussion

3.1. Primary simulation results

Our first simulation demonstrated that the lmer function in R gave negligibly biased estimates for the carryover parameter. Fig. 6 shows bias in estimates for carryover found using the linear model and the mixed effects model fit to 100 simulated trials. The relevant parameters are: treatment effect −10 mm Hg, carryover 2 years, and diagnosis defined by 3 measurements above the 140 mm Hg threshold. The horizontal axis separates lengths of treatment 1, 1.5, 2, 2.5, or 3 years and the bottom two graphs have standard deviation 3 mm Hg, the top 7 mm Hg. Trials with standard deviation 5 mm Hg were also simulated with results similar to those in Fig. 6.

The baseline BP for the simulated data came from a uniform distribution as described in Sec 2 but our model assumed a normal distribution for baseline BP. As we can see from Fig. 6 our model appears to be robust to the normality assumption at baseline.

3.2. Bivariate simulation

Fig. 7 shows the mean biases in our estimates of the carryover parameter from all 7016 combinations of parameters discussed in Sec 2. For each set of parameters, we conducted 100 bivariate simulated trials, then found the mean of the 100 estimates of the carryover parameters. All the biases for feasible models for both systolic and diastolic measurements were computed and are included here.

The bivariate linear mixed model also appears robust to the normality assumption at baseline as the simulated data came from a uniform distribution.

3.3. Risk ratio

Simulations that used a parametric bootstrap to find a 95% confidence interval of the risk ratio for the time after treatment ceased are helpful in testing a carryover hypothesis.

Mathematically, we let $rr_{k t_{j}} (f_{l}^{*})$ be the kth simulated bootstrap risk ratio from the lth simulated trial at time t_j. We define $f_{l}^{*}$ as the mixed effect model found using lmer to maximise the likelihood from the lth simulated trial. For one combination of parameters in our simulations, l = 1, 2, ... , 400 and j = 1, 2, ..., M. We found the 2.5% and 97.5% quantiles of $rr_{k t_{j}} (f_{l}^{*})$ where k = 1, 2, ... , 100 and t_j was the first time point after treatment ceased. We then checked whether these confidence intervals contained 1.

Intuitively, with no carryover in the model, we would expect equiprobable diagnosis in the treatment and control arms (a risk ratio of 1) after treatment ceases. As there is no extended impact of treatment and participants were randomised, BP in both arms is similar, and diagnosis rates should be similar. To support this intuition we used simulation. The true risk ratio was found using a sample size of 10⁴ and no error; all other parameters were the same as those discussed in Sec 2. As there is no error in the data, we diagnosed hypertension when 1 measurement was above the threshold. We found the risk ratio of incident hypertension at times when simulated carryover was present in the model (i.e. $Z_{i t_{j}} \neq 0$ in Eq (4)). We plotted the true risk ratio at the first measurement after treatment ceases. Fig. 8 shows the stability around 1 when carryover is 0. With increasing magnitudes of carryover, no participants in the treatment arm are diagnosed, which results in a risk ratio of 0. This complicates distinguishing magnitudes of carryover, which requires further work.

Fig. 8 — Because the true risk ratio of hypertension is approximately 1 the first measurement after treatment ceases when carryover is 0, we can test the bootstrap confidence intervals for 1.

We simulated 400 trials for each combination of parameters. For each trial we fit a mixed effect model and used the bootMer function in the lme4 package in R to find 100 parametric bootstraps of risk ratio at the first measurement after treatment ceased (the first t_j where $Z_{i t_{j}} \neq 0$ ). Using these, we computed a 95% confidence interval which we tested for 1. The coverage for the combination of parameters was the percentage of bootstrap confidence intervals that contained the value 1. Fig. 10 partitions this coverage at the first measurement after treatment has ceased by length of carryover. If 1 is in the confidence interval our simulations suggest that it is 99% likely that the data came from a simulation that contained no carryover in the model. The process is outlined in Fig. 9.

Fig. 10. Proportion of bootstrap confidence intervals that contain 1, partitioned by length of carryover. When carryover is 0, 1 is the true value and is likely to be in the confidence intervals.

4. Conclusion

To summarise, our simulations suggest that using a mixed effect model similar to Eq (4) to estimate carryover parameters gives negligibly biased estimates in both systolic and diastolic models. Once a model is fit, the bootMer function can be used to obtain a parametric bootstrap of the risk ratio at the measurement directly after treatment ceases. If 1 is in the bootstrap confidence interval, it is highly likely that there is no carryover in the model from which the data were simulated. This provides a test of a carryover hypothesis. However, testing for duration of carryover will require more effort because too few participants are diagnosed in the treatment arm. Survival analysis may provide a viable solution in these situations.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

1. Prevention & treatment. http://www.oxfordreference.com/view/10.1093/oi/authority.20110803100344375
2. Senn S. Cross-over trials in statistics in medicine: the first ’25’ years. Stat. Med. 2006;25:3430–3442. doi: 10.1002/sim.2706.
3. Sacks F.M., Svetkey L.P., Vollmer W.M., Appel L.J., Bray G.A., Harsha D., Obarznek E., Conlin P.R., M E.R., III, Simons-Morton D.G., Karanja N., Lin P.-H. Effects on blood pressure of reduced dietary sodium and the dietary approaches to stop hypertension. N. Engl. J. Med. 2001;344(1):3–10. doi: 10.1056/NEJM200101043440101.
4. Knowler W., Barrett-Connor E., Fowler S., Hamman R., Lachin J., Walker E., Nathan D. Reduction in the incidence of type 2 diabetes with lifestyle intervention or metformin. N. Engl. J. Med. 2002;346(6):393–403. doi: 10.1056/NEJMoa012512.
5. Julius S., Nesbitt S.D., Egan B.M., Weber M.A., Michelson E.L., Kaciroti N., Black H.R., Grimm R.H., Messerli F.H., Oparil S. Feasibility of treating prehypertension with an angiotensin-receptor blocker. N. Engl. J. Med. 2006;354(16):1685–1697. doi: 10.1056/NEJMoa060838.
6. Lawes C.M.M., Hoorn S.V., Law M.R., Elliot P., MacMahon S., Rodgers A. High blood pressure. In: Ezzati M., Lopez A.D., Rodgers A., Murray C.J., editors. Comparative Quantification of Health Risks: Global and Regional Burden of Disease Attributable to Selected Major Risk Factors. World Health Organization; 2004. pp. 281–389.
7. Sibbald B., Roland M. Understanding controlled trials. Why are randomised controlled trials important? BMJ. 1998;316(7126):201. doi: 10.1136/bmj.316.7126.201.
8. Millar-Craig M.W., Bishop C.N., Raferty E.B. Circadian variation of blood-pressure. Lancet. 1978;311(8068):795–797. doi: 10.1016/s0140-6736(78)92998-7.
9. Woodhouse P.R., Khaw K.-T., Plummer M. Seasonal variation of blood pressure and its relationship to ambient temperature in an elderly population. J. Hypertens. 1993;11(11):1267–1274.
10. James G.D., Yee L.S., Pickering T.G. Winter-summer difference in the effects of emotion, posture and place of measurement on blood pressure. Soc. Sci. Med. 1990;31(11):1213–1217. doi: 10.1016/0277-9536(90)90126-d.
11. Clark L.A., Denby L., Pregibon D., Harshfield G.A., Pickering T.G., Blank S., Laragh J.H. A quantitative analysis of the effects of activity and time of day on the diurnal variations of blood pressure. J. Chron. Dis. 1987;40(7):671–681. doi: 10.1016/0021-9681(87)90103-2.
12. Lusardi P., Zoppi A., Preti P., Pesce R.M., Piazza E., Fogari R. Effects of insufficient sleep on blood pressure in hypertensive patients a 24-h study. Am. J. Hypertens. 1999;12(1):63–68. doi: 10.1016/s0895-7061(98)00200-3.
13. Neufeld P.D., Johnson D.L. Observer error in blood pressure measurement. Can. Med. Assoc. J. 1986;135(6):633–637.
14. Lumley T., Rice K.M., Psaty B.M. Carryover effects after cessation of drug treatment: trophies or dreams? Am. J. Hypertens. 2008;21:14–16. doi: 10.1038/ajh.2007.21.
15. Sturdevant S.G., Lumley T. Testing for carryover effects after cessation of treatments: a design approach. BMC Med. Res. Methodol. 2016;16(1):92. doi: 10.1186/s12874-016-0191-6.
16. Nason M., Follman D. Design and analysis of crossover trials for absorbing binary endpoints. Biometrics. 2010;66:958–965. doi: 10.1111/j.1541-0420.2009.01358.x.
17. Persell S.D., Baker D.W. Studying interventions to prevent the progression from prehypertension to hypertension: does trophy win the prize? Am. J. Hypertens. 2006;19(11):1095–1097. doi: 10.1016/j.amjhyper.2006.09.013.
18. Mantel N., Haenszel W. Statistical aspects of the analysis of data from retrospective studies of disease. J. Natl. Cancer Inst. 1959;22(4):719–748.
19. Goodman P.J., Thompson I.M., Tangen C.M., Crowley J.J., Ford L.G., Coltman C.A. The prostate cancer prevention trial: design, biases and interpretation of study results. J. Urol. 2006;175(6):2234–2242. doi: 10.1016/S0022-5347(06)00284-9.
20. Pickering T.G., Hall J.E., Appel L.J., Falkner B.E., Graves J., Hill M.N., Jones D.W., Kurtz T., Sheps S.G., Roccella E.J. Recommendations for blood pressure measurement in humans and experimental animals part 1: blood pressure measurement in humans: a statement for professionals from the subcommittee of professional and public education of the American Heart Association council on high blood pressure research. Hypertension. 2005;45(1):142–161. doi: 10.1161/01.HYP.0000150859.47929.8e.
21. Bell K.J., Hayen A., Macaskill P., Craig J.C., Neal B.C., Fox K.M., Remme W.J., Asselbergs F.W., van Gilst W.H., MacMahon S., Remuzzi G., Ruggenenti P., Teo K.K., Irwig L. Monitoring initial response to angiotensin-converting enzyme inhibitor-based regimens: an individual patient data meta-analysis from randomized, placebo-controlled trials, Hypertension. J. Am. Heart Assoc. 2010;56:533–539. doi: 10.1161/HYPERTENSIONAHA.110.152421.
22. Rubin D.B. Inference and missing data. Biometrika. 1976;63:581–592.
23. Risk ratio. https://www.cancer.gov/publications/dictionaries/cancer-terms/def/risk-ratio accessed: 2020-12-04.
24. Verbeke G., Molenberghs G. Springer; 2000. Linear Mixed Models for Longitudinal Data, Springer Series in Statistics.
25. Wellingham J., Tracey J., Rea H., Gribben B. The development and implementation of the chronic care management programme in Counties Manukau. N. Z. Med. J. 2003;116:33–46.
26. Muntner P., Shimbo D., Tonelli M., Reynolds K., Arnett D.K., Oparil S. The relationship between visit-to-visit variability in systolic blood pressure and all-cause mortality in the general population. Hypertension. 2011;57(2):160–166. doi: 10.1161/HYPERTENSIONAHA.110.162255. http://hyper.ahajournals.org/content/57/2/160
27. Nietert P.J., Wessell A.M., Feifer C., Ornstein S.M. Effect of terminal digit preference on blood pressure measurement and treatment in primary care. Am. J. Hypertens. 2006;19(2):147–152. doi: 10.1016/j.amjhyper.2005.08.016.
28. Bates D., Maechler M., Bolker B., Walker S. lme4: linear mixed-effects models using Eigen and S4, R package version 1.1-8. 2015. http://CRAN.R-project.org/package=lme4
29. Bates D., Maechler M., Bolker B.M., Walker S. Fitting linear mixed-effects models using lme4, arXiv e-print; in press. J. Stat. Software. 2015;67(1):1–48. doi: 10.18637/jss.v067.i01. http://arxiv.org/abs/1406.5823
30. Hajjar I., Kotchen T.A. Trends in prevalence, awareness, treatment, and control of hypertension in the United States. JAMA. 2003;290:199–206. doi: 10.1001/jama.290.2.199.
31. Wolf-Maier K., Cooper R.S., Banegas J.R., Giampaoli S., Hense H.-W., Joffres M., Kastarinen M., Poulter N., Primatesta P., Rodríguez-Artalejo F., Stegmayr B., Thamm M., Tuomilehto J., Vanuzzo D., Vescio F. Hypertension prevalence and blood pressure levels in 6 European countries, Canada, and the United States. JAMA. 2003;289:2363–2369. doi: 10.1001/jama.289.18.2363.
32. Wright J.D., Hughes J.P., Ostchega Y., Yoon S.S., Nwankwo T. Mean systolic and diastolic blood pressure in adults aged 18 and over in the United States, 2001-2008. National Health Statistics Reports. 2011;35:1–23.
33. Murphy K.P. Adaptive Computation and Machine Learning Series. MIT Press; 2012. Machine learning: a probabilistic perspective.
34. Efron B., Tibshirani R. Chapman & Hall/CRC; 1994. An Introduction to the Bootstrap.

