BMC Medical Research Methodology. 2021 Jul 24;21:150. doi: 10.1186/s12874-021-01323-9

Statistical analysis of two arm randomized pre-post designs with one post-treatment measurement

Fei Wan
PMCID: PMC8305561  PMID: 34303343

Abstract

Background

Randomized pre-post designs, with outcomes measured at baseline and after treatment, have been commonly used to compare the clinical effectiveness of two competing treatments. There is a vast, but often conflicting, amount of information in the current literature about the best analytic methods for pre-post designs. It is challenging for applied researchers to make an informed choice.

Methods

We discuss six methods commonly used in literature: one way analysis of variance (“ANOVA”), analysis of covariance main effect and interaction models on the post-treatment score (“ANCOVAI” and “ANCOVAII”), ANOVA on the change score between the baseline and post-treatment scores (“ANOVA-Change”), repeated measures (“RM”) and constrained repeated measures (“cRM”) models on the baseline and post-treatment scores as joint outcomes. We review a number of study endpoints in randomized pre-post designs and identify the mean difference in the post-treatment score as the common treatment effect that all six methods target. We delineate the underlying differences and connections between these competing methods in homogeneous and heterogeneous study populations.

Results

ANCOVA and cRM outperform other alternative methods because their treatment effect estimators have the smallest variances. cRM has comparable performance to ANCOVAI in the homogeneous scenario and to ANCOVAII in the heterogeneous scenario. In spite of that, ANCOVA has several advantages over cRM: i) the baseline score is adjusted as covariate because it is not an outcome by definition; ii) it is very convenient to incorporate other baseline variables and easy to handle complex heteroscedasticity patterns in a linear regression framework.

Conclusions

ANCOVA is a simple and the most efficient approach for analyzing pre-post randomized designs.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12874-021-01323-9.

Keywords: Pre-post design, ANCOVA, ANOVA, Repeated measures, Change score, Treatment effect

Background

Two arm parallel randomized trials have been widely used to compare the clinical effectiveness of competing treatments in improving patients’ health outcomes. In these trials, continuous outcomes of interest are routinely measured at baseline (defined as “baseline score”) and at one post treatment time point (defined as “post-treatment score”). The primary purpose of designing a pre-post randomized study is to answer the scientific question of interest: is treatment A more effective than treatment B? To assess the difference in effectiveness between two treatments, we need to select a study endpoint and quantify a treatment effect. Common study endpoints include the post treatment score, the change score from baseline to post treatment, a percentage change from baseline, and the rate of change from baseline. The difference between two arms on the selected study endpoint is defined as the treatment effect. Few studies have investigated the links between these different metrics of treatment effect in a randomized pre-post trial. These underlying connections are critical in understanding the equivalence among some statistical methods that may appear to be very different at first sight. We need to be certain about the type of treatment effect each method targets and select the method that yields an unbiased and most efficient estimator of the treatment effect of interest.

There are a number of statistical methods commonly used in analyzing pre-post trials. We can analyze the post-treatment score using a one way analysis of variance model (ANOVA) [1, 2], an analysis of covariance model adjusting for the baseline score (ANCOVAI) [2–7], and ANCOVA including a baseline score by treatment interaction (ANCOVAII) [3, 4, 8–10]. We can also analyze the change score using ANOVA (ANOVA-Change) [11]. Alternatively, we can model the baseline and post-treatment scores jointly using repeated measures models (RM) and constrained repeated measures models (cRM) [10, 12–14]. Despite the simplicity and wide application of randomized pre-post designs, which method is the best analytic approach has been a debated topic, and many methodological studies have been performed to compare different statistical methods over the past decades [1–13]. However, it is challenging for applied researchers to evaluate this vast, but often conflicting, amount of information in the current literature and make an informed choice.

In this study we aim to review ANOVA, ANCOVAI, ANCOVAII, ANOVA-Change, RM, and cRM from a practical standpoint, with the focus on delineating the differences and underlying connections between them. In section Methods, we first provide notations and assumptions for a typical pre-post design, define homogeneous and heterogeneous study populations, and discuss some common study endpoints and the associated metrics of treatment effects. We next analytically assess differences and connections between these competing models in the homogeneous and heterogeneous scenarios by first describing each model using the same set of population mean, variance, and covariance parameters. In section Results, we compare the relative efficiency of these competing methods theoretically using three simulated weight loss trial examples (homogeneous data, heterogeneous data with balanced design, heterogeneous data with unbalanced design). In the last two sections, we discuss the results and give recommendations on the best analytical approach in a randomized pre-post design.

Methods

A hypothetical weight loss trial and metrics of treatment effects

Notations

In a hypothetical two arm parallel weight loss trial comparing the effect of a new drug (“treatment”) and a placebo (“control”) in reducing participants’ body weights, we use $Y_{ijt}$ to denote the body weight of the $i$th subject ($i = 1, 2, 3, \ldots, n_j$) in the $j$th treatment arm ($j = 0, 1$) at the $t$th time ($t = t_0, t_1$). $n_0$ and $n_1$ are the numbers of subjects in the control and treatment arms.

We denote the mean baseline weights for the treatment and control arms by $\mu_{1t_0}$ and $\mu_{0t_0}$, respectively. Random allocation guarantees $\mu_{1t_0} = \mu_{0t_0}$ and we let $\mu_{t_0}$ denote the overall mean baseline weight. The mean weights of the treatment and control arms at time $t_1$ are denoted by $\mu_{1t_1}$ and $\mu_{0t_1}$, respectively (Fig. 1). We define homogeneous and heterogeneous study populations as follows:

  • i)
    The homogeneous scenario: every participant has the same variance and covariance structure for their baseline and post-treatment weights, which is parameterized as below:
    $$\Sigma = \begin{pmatrix} \sigma_0^2 & \rho\sigma_0\sigma_1 \\ \rho\sigma_0\sigma_1 & \sigma_1^2 \end{pmatrix},$$

where $\sigma_0^2$ and $\sigma_1^2$ are the variances of the baseline and post-treatment weights, and $\rho$ is the correlation coefficient between the baseline and post-treatment weights.

  • ii)
    The heterogeneous scenario: variance and covariance structures of the baseline and post-treatment weights differ between the treatment and control arms. Formally, we have
    $$\Sigma_0 = \begin{pmatrix} \sigma_0^2 & \rho_0\sigma_0\sigma_{01} \\ \rho_0\sigma_0\sigma_{01} & \sigma_{01}^2 \end{pmatrix},$$

and

$$\Sigma_1 = \begin{pmatrix} \sigma_0^2 & \rho_1\sigma_0\sigma_{11} \\ \rho_1\sigma_0\sigma_{11} & \sigma_{11}^2 \end{pmatrix},$$

where $\sigma_0^2$ is the common variance of the baseline body weight in the control and treatment arms. Randomization guarantees that the variances of the baseline weights in both arms are equal to $\sigma_0^2$. $\sigma_{01}^2$ and $\sigma_{11}^2$ are the variances of the post-treatment weight in the control and treatment arms. $\rho_0$ and $\rho_1$ are the correlation coefficients between the baseline and post-treatment weights in the control and treatment arms, respectively. In practice, participants may respond to the treatment more variably, so that the variability of the post-treatment weight tends to be larger in the treatment arm than in the control arm and the correlation between pre- and post-treatment weights is usually stronger in the control arm than in the treatment arm, i.e., $\rho_0 > \rho_1$ and $\sigma_{11}^2 > \sigma_{01}^2$.

Fig. 1. Hypothetical two arm pre-post weight loss randomized trial

Metrics of treatment effect

We discuss the following three metrics of treatment effect commonly reported in pre-post trials:

  • i)
    The primary endpoint is the post-treatment weight measured at $t_1$. The difference in the mean post-treatment weights of two arms is defined as a treatment effect, which is parameterized as follows:
    $$\tau = \mu_{1t_1} - \mu_{0t_1}$$

For example, if $\tau = -10$, we can interpret the results as “at the end of the trial, the mean weight was 10 pounds lower in the treatment group than in the control group.”

  • ii)
    The primary endpoint is the change score calculated by subtracting the baseline weight from the post-treatment weight, i.e., $\Delta_{ij} = Y_{ijt_1} - Y_{ijt_0}$. The difference in the mean change scores of two arms is a treatment effect. Formally, we have:
    $$\tilde{\tau} = (\mu_{1t_1} - \mu_{1t_0}) - (\mu_{0t_1} - \mu_{0t_0})$$

For example, if $\tilde{\tau} = -10$, this difference is usually interpreted as “weight reductions were 10 pounds greater in the treatment group than in the control group”. Since randomization ensures $\mu_{1t_0} = \mu_{0t_0}$, it follows directly that $\tilde{\tau} = \tau$. When we code “0” for $t_0$ and “1” for $t_1$, the mean change score for each arm can also be interpreted as the mean change rate per unit time for each arm, represented by the slopes in Fig. 1. Thus, the difference in slopes, denoted by $\tilde{\tilde{\tau}} = \alpha_1 - \alpha_0$, is also equivalent to $\tau$. As shown in the Statistical models section, ANOVA and ANCOVA target $\tau$, ANOVA-Change targets $\tilde{\tau}$, and RM targets $\tilde{\tilde{\tau}}$. However, we can compare these statistical methods targeting seemingly very different types of treatment effects in a meaningful way because of the equivalence between $\tau$, $\tilde{\tau}$, and $\tilde{\tilde{\tau}}$ in randomized pre-post designs.

  • iii)
    The primary endpoint is the percent change from baseline weight, denoted by $\varphi_{ij} = (Y_{ijt_1} - Y_{ijt_0}) / Y_{ijt_0}$. The mean difference in the percent change between two arms is defined as a treatment effect and parameterized as follows:
    $$\tau = \bar{\varphi}_1 - \bar{\varphi}_0,$$

where $\bar{\varphi}_1$ and $\bar{\varphi}_0$ are the mean percent changes of the treatment and control arms. Although the percent change is popular among clinical researchers, this metric has several drawbacks [1, 15, 16]: i) the percent change is a function of the ratio $Y_{ijt_1} / Y_{ijt_0}$. The distribution of the percent change is highly skewed. Analyzing it with normal-theory based statistical methods is not justified, and non-parametric statistical methods are generally less powerful; ii) the percent change is not a symmetric measure. For example, the mean weight of adults over 20 in the US is 197.8 pounds for men and 170.5 pounds for women. The mean difference is 27.3 pounds between men and women. Men weigh 16% (i.e., 100 × ((197.8 − 170.5)/170.5)) more than women, whereas women weigh 13.8% (i.e., 100 × ((197.8 − 170.5)/197.8)) less than men. The percent difference depends on which sex is used as the divisor; iii) the percent change is not an additive measure. For example, if a participant’s weight increases by 10% in the first 6 months and falls by 10% in the next 6 months, the two percent changes do not cancel out. The participant’s weight at the end would be only 99% of the participant’s starting weight.
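The asymmetry and non-additivity of the percent change can be checked with a few lines of arithmetic. The following is a minimal sketch; the helper name `percent_change` and the starting weight of 200 pounds are illustrative assumptions, while the two mean weights are the figures quoted above.

```python
def percent_change(new, old):
    """Percent change of `new` relative to the divisor `old`."""
    return 100.0 * (new - old) / old

# Asymmetry: the same 27.3 pound gap gives two different percentages.
men, women = 197.8, 170.5
men_vs_women = percent_change(men, women)   # women as divisor
women_vs_men = percent_change(women, men)   # men as divisor

# Non-additivity: +10% followed by -10% does not return to the start.
start = 200.0                # hypothetical starting weight in pounds
after_gain = start * 1.10    # weight after a 10% increase
after_loss = after_gain * 0.90  # weight after a subsequent 10% decrease
final_ratio = after_loss / start

print(round(men_vs_women, 1), round(women_vs_men, 1), round(final_ratio, 2))
```

The final ratio does not depend on the chosen starting weight, because the two multiplicative factors 1.10 and 0.90 always combine to 0.99.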

Statistical models

In this section, we focus on six methods that estimate $\tau$. We describe each statistical model using the same set of population mean, variance, and covariance parameters defined in section Methods, separately for the homogeneous and heterogeneous scenarios. For each method, we present the closed-form expressions of the point estimator of the treatment effect and its variance. It often goes unnoticed in practice that different statistical methods have different types of variances (i.e., conditional vs. unconditional variances) associated with their treatment effect estimators. For example, the OLS model-based variances for ANCOVA are conditional because OLS assumes the baseline weight is fixed. Generally speaking, the baseline weight is random because we rarely enroll participants into randomized trials based on predetermined values of the baseline weight. Thus, the unconditional variance and the corresponding unconditional inference are of greater interest because we want the findings derived from the current sample to be generalizable to the population of interest. We will discuss in detail whether the OLS model-based conditional inference (i.e., test statistics and p-values from standard statistical software) for ANCOVA is still valid for unconditional hypothesis testing, and the potential fixes that we can use to draw valid unconditional inference if the usual OLS model-based inference is biased.

When the study population is homogeneous

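Method 1: ANOVA modeling post treatment measure (“ANOVA-Post”). We model the post-treatment body weight $Y_{ijt_1}$ using the binary treatment indicator $G_{ij}$ (1 if in the treatment arm; 0 if in the control arm) as follows:

$$Y_{ijt_1} = \beta_0^{(1)} + \beta_1^{(1)} G_{ij} + e_{ij}^{(1)}, \quad i = 1, 2, \ldots, n_j;\ j = 0, 1; \tag{1}$$
$$e_{ij}^{(1)} \sim N(0, \sigma_1^2),$$

where $\beta_0^{(1)} = \mu_{0t_1}$, $\beta_1^{(1)} = \mu_{1t_1} - \mu_{0t_1} = \tau$, and $e_{ij}^{(1)}$ is an independently and identically distributed (i.i.d.) random error. $\beta_1^{(1)}$ represents the treatment effect. Model (1) is homoscedastic with a constant residual variance $\sigma_1^2$.

We can fit an ordinary least squares (OLS) regression to estimate the coefficients and standard errors of model (1). The closed-form expressions of the OLS estimator $\hat{\beta}_{1,\mathrm{ols}}^{(1)}$ and its “unconditional” variance, denoted by $\operatorname{var}(\hat{\beta}_{1,\mathrm{ols}}^{(1)})$, are presented in Table 1. $\hat{\beta}_{1,\mathrm{ols}}^{(1)}$ is the sample group mean difference in the post-treatment weight between two arms and is unbiased for $\tau$. The OLS model-based variance of $\hat{\beta}_{1,\mathrm{ols}}^{(1)}$ assuming known $\sigma_1^2$ is:

$$\operatorname{var}_{\mathrm{ols}}(\hat{\beta}_{1,\mathrm{ols}}^{(1)}) = \frac{\sigma_1^2}{\sum_{j=0}^{1}\sum_{i=1}^{n_j}(G_{ij} - \bar{G}_{..})^2},$$

where $\bar{G}_{..} = \frac{\sum_{j=0}^{1}\sum_{i=1}^{n_j} G_{ij}}{n_0 + n_1} = \frac{n_1}{n_0 + n_1}$. $\sigma_1^2$ is estimated by

$$\hat{\sigma}_1^2 = \frac{\sum_{j=0}^{1}\sum_{i=1}^{n_j}(y_{ijt_1} - \hat{y}_{ijt_1}^{(1)})^2}{n_0 + n_1 - 2},$$

where $\hat{y}_{ijt_1}^{(1)} = \hat{\beta}_{0,\mathrm{ols}}^{(1)} + \hat{\beta}_{1,\mathrm{ols}}^{(1)} G_{ij}$ is the predicted value from model (1). We let $\widehat{\operatorname{var}}_{\mathrm{ols}}(\hat{\beta}_{1,\mathrm{ols}}^{(1)})$ denote the OLS model-based variance estimator with $\hat{\sigma}_1^2$ substituted for $\sigma_1^2$, which is output by standard statistical software (Table 1). Since $\sum_{j=0}^{1}\sum_{i=1}^{n_j}(G_{ij} - \bar{G}_{..})^2 = \frac{n_0 n_1}{n_0 + n_1}$, it follows that $\operatorname{var}_{\mathrm{ols}}(\hat{\beta}_{1,\mathrm{ols}}^{(1)}) = \operatorname{var}(\hat{\beta}_{1,\mathrm{ols}}^{(1)})$. It is well established that $\widehat{\operatorname{var}}_{\mathrm{ols}}(\hat{\beta}_{1,\mathrm{ols}}^{(1)})$ is unbiased for $\operatorname{var}_{\mathrm{ols}}(\hat{\beta}_{1,\mathrm{ols}}^{(1)})$. Thus, $\widehat{\operatorname{var}}_{\mathrm{ols}}(\hat{\beta}_{1,\mathrm{ols}}^{(1)})$ is unbiased for $\operatorname{var}(\hat{\beta}_{1,\mathrm{ols}}^{(1)})$. The usual OLS model-based inference (i.e., the test statistic $t = \hat{\beta}_{1,\mathrm{ols}}^{(1)} / \sqrt{\widehat{\operatorname{var}}_{\mathrm{ols}}(\hat{\beta}_{1,\mathrm{ols}}^{(1)})}$ and the associated p-value) is valid for testing $H_0: \tau = 0$ unconditionally.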
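The ANOVA-Post calculations can be sketched numerically. This is only an illustration on simulated data: the sample sizes, mean weights, treatment effect, and standard deviation below are hypothetical, and NumPy's least squares solver stands in for any standard regression routine.

```python
import numpy as np

rng = np.random.default_rng(0)
n0, n1 = 40, 60                    # hypothetical arm sizes
tau, sigma1 = -10.0, 15.0          # hypothetical effect and post-treatment SD

# Simulated post-treatment weights for the control and treatment arms.
y0 = 200.0 + sigma1 * rng.standard_normal(n0)
y1 = 200.0 + tau + sigma1 * rng.standard_normal(n1)
y = np.concatenate([y0, y1])
g = np.concatenate([np.zeros(n0), np.ones(n1)])   # treatment indicator G

# OLS fit of the post-treatment weight on an intercept and G.
X = np.column_stack([np.ones_like(g), g])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
tau_hat = beta[1]

# Residual variance with n0 + n1 - 2 degrees of freedom.
resid = y - X @ beta
sigma2_hat = resid @ resid / (n0 + n1 - 2)

# OLS model-based variance estimator of the treatment coefficient.
ssg = np.sum((g - g.mean()) ** 2)
var_ols_hat = sigma2_hat / ssg

t_stat = tau_hat / np.sqrt(var_ols_hat)
print(tau_hat, var_ols_hat, t_stat)
```

In this sketch the fitted coefficient coincides with the difference in arm means, and the sum of squares of the indicator equals n0·n1/(n0 + n1), so the model-based variance reduces to the pooled residual variance times (1/n0 + 1/n1).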

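Table 1.

Estimators of treatment effect and variance estimators in a homogeneous study population

ANOVA-Post
  Estimator of treatment effect ($\tau$): $\hat{\beta}_{1,\mathrm{ols}}^{(1)} = \bar{y}_{\cdot 1t_1} - \bar{y}_{\cdot 0t_1}$
  Type: U
  True variance of treatment effect estimator: $\operatorname{var}(\hat{\beta}_{1,\mathrm{ols}}^{(1)}) = \frac{\sigma_1^2}{n_0} + \frac{\sigma_1^2}{n_1}$
  OLS model-based variance estimator: $\widehat{\operatorname{var}}_{\mathrm{ols}}(\hat{\beta}_{1,\mathrm{ols}}^{(1)}) = \frac{\hat{\sigma}_1^2}{\sum_{j=0}^{1}\sum_{i=1}^{n_j}(G_{ij} - \bar{G}_{..})^2}$, with $\hat{\sigma}_1^2 = \frac{\sum_{j=0}^{1}\sum_{i=1}^{n_j}(y_{ijt_1} - \hat{y}_{ijt_1})^2}{n_0 + n_1 - 2}$

ANCOVA-Post I
  Estimator of treatment effect ($\tau$): $\hat{\beta}_{1,\mathrm{ols}}^{(2)} = (\bar{y}_{\cdot 1t_1} - \bar{y}_{\cdot 0t_1}) - \hat{\beta}_{2,\mathrm{ols}}^{(2)}(\bar{y}_{\cdot 1t_0} - \bar{y}_{\cdot 0t_0})$
  Type: C
    True variance of treatment effect estimator: $\operatorname{var}(\hat{\beta}_{1,\mathrm{ols}}^{(2)} \mid Y_{ijt_0}) = \left[\frac{1}{n_0} + \frac{1}{n_1} + \frac{(\bar{y}_{\cdot 1t_0} - \bar{y}_{\cdot 0t_0})^2}{\sum_{j=0}^{1}\sum_{i=1}^{n_j}(y_{ijt_0} - \bar{y}_{\cdot jt_0})^2}\right]\sigma_{\epsilon 2}^2$, with $\sigma_{\epsilon 2}^2 = (1 - \rho^2)\sigma_1^2$
    OLS model-based variance estimator: $\widehat{\operatorname{var}}_{\mathrm{ols}}(\hat{\beta}_{1,\mathrm{ols}}^{(2)} \mid Y_{ijt_0}) = \left[\frac{1}{n_0} + \frac{1}{n_1} + \frac{(\bar{y}_{\cdot 1t_0} - \bar{y}_{\cdot 0t_0})^2}{\sum_{j=0}^{1}\sum_{i=1}^{n_j}(y_{ijt_0} - \bar{y}_{\cdot jt_0})^2}\right]\hat{\sigma}_{\epsilon 2}^2$, with $\hat{\sigma}_{\epsilon 2}^2 = \frac{\sum_{j=0}^{1}\sum_{i=1}^{n_j}(y_{ijt_1} - \hat{y}_{ijt_1})^2}{n_0 + n_1 - 3}$
  Type: U
    True variance of treatment effect estimator: $\operatorname{var}(\hat{\beta}_{1,\mathrm{ols}}^{(2)}) = \left(\frac{1}{n_0} + \frac{1}{n_1}\right)(1 - \rho^2)\sigma_1^2$

RM
  Estimator of treatment effect ($\tau$): $\hat{\gamma}_{3,\mathrm{gls}}^{(3)} = (\bar{y}_{\cdot 1t_1} - \bar{y}_{\cdot 1t_0}) - (\bar{y}_{\cdot 0t_1} - \bar{y}_{\cdot 0t_0})$
  Type: U
  True variance of treatment effect estimator: $\operatorname{var}(\hat{\gamma}_{3,\mathrm{gls}}^{(3)}) = \left(\frac{1}{n_0} + \frac{1}{n_1}\right)(\sigma_1^2 + \sigma_0^2 - 2\rho\sigma_0\sigma_1)$

cRM
  Estimator of treatment effect ($\tau$): $\hat{\gamma}_{3,\mathrm{gls}}^{(4)} = (\bar{y}_{\cdot 1t_1} - \bar{y}_{\cdot 0t_1}) - \frac{\rho\sigma_0\sigma_1}{\sigma_0^2}(\bar{y}_{\cdot 1t_0} - \bar{y}_{\cdot 0t_0})$
  Type: U
  True variance of treatment effect estimator: $\operatorname{var}(\hat{\gamma}_{3,\mathrm{gls}}^{(4)}) = \left(\frac{1}{n_0} + \frac{1}{n_1}\right)(1 - \rho^2)\sigma_1^2$

ANOVA-Change
  Estimator of treatment effect ($\tau$): $\hat{\beta}_{1,\mathrm{ols}}^{(5)} = (\bar{y}_{\cdot 1t_1} - \bar{y}_{\cdot 1t_0}) - (\bar{y}_{\cdot 0t_1} - \bar{y}_{\cdot 0t_0})$
  Type: U
  True variance of treatment effect estimator: $\operatorname{var}(\hat{\beta}_{1,\mathrm{ols}}^{(5)}) = \left(\frac{1}{n_0} + \frac{1}{n_1}\right)(\sigma_1^2 + \sigma_0^2 - 2\rho\sigma_0\sigma_1)$
  OLS model-based variance estimator: $\widehat{\operatorname{var}}_{\mathrm{ols}}(\hat{\beta}_{1,\mathrm{ols}}^{(5)}) = \frac{\hat{\sigma}_{\epsilon 5}^2}{\sum_{j=0}^{1}\sum_{i=1}^{n_j}(G_{ij} - \bar{G}_{..})^2}$, with $\hat{\sigma}_{\epsilon 5}^2 = \frac{\sum_{j=0}^{1}\sum_{i=1}^{n_j}(\Delta_{ij} - \hat{\Delta}_{ij}^{(5)})^2}{n_0 + n_1 - 2}$

Type: U, unconditional variance; C, conditional variance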
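The unconditional variances listed in Table 1 can be compared directly for any set of parameter values. The sketch below simply codes the closed-form expressions from the table; the function name and the example parameter values are illustrative assumptions.

```python
def homogeneous_variances(n0, n1, sigma0, sigma1, rho):
    """Unconditional variances of the treatment effect estimators (Table 1)."""
    k = 1.0 / n0 + 1.0 / n1
    change_var = sigma1**2 + sigma0**2 - 2.0 * rho * sigma0 * sigma1
    adjusted_var = (1.0 - rho**2) * sigma1**2
    return {
        "ANOVA-Post": k * sigma1**2,
        "ANCOVAI": k * adjusted_var,
        "RM": k * change_var,
        "cRM": k * adjusted_var,
        "ANOVA-Change": k * change_var,
    }

# Hypothetical trial: 50 subjects per arm, equal SDs, pre-post correlation 0.5.
v = homogeneous_variances(n0=50, n1=50, sigma0=10.0, sigma1=10.0, rho=0.5)
for name, value in v.items():
    print(f"{name:13s} {value:.3f}")
```

Two algebraic facts follow from these expressions: the ANCOVAI and cRM variances are never larger than the ANOVA-Post variance because 1 − ρ² ≤ 1, and the gap between the change-score variance and the ANCOVAI variance is (1/n0 + 1/n1)(σ0 − ρσ1)², which is never negative.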

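Method 2: ANCOVA modeling post treatment measure (“ANCOVAI”): We model the post-treatment weight $Y_{ijt_1}$ using the binary treatment indicator $G_{ij}$ and the baseline weight $Y_{ijt_0}$:

$$Y_{ijt_1} = \beta_0^{(2)} + \beta_1^{(2)} G_{ij} + \beta_2^{(2)} Y_{ijt_0} + e_{ij}^{(2)}, \quad i = 1, 2, \ldots, n_j;\ j = 0, 1; \tag{2}$$
$$e_{ij}^{(2)} \sim N(0, \sigma_{\epsilon 2}^2) \quad \text{and} \quad \sigma_{\epsilon 2}^2 = (1 - \rho^2)\sigma_1^2,$$

where $\beta_0^{(2)} = \mu_{0t_1} - \rho\frac{\sigma_1}{\sigma_0}\mu_{t_0}$, $\beta_1^{(2)} = \tau$, $\beta_2^{(2)} = \rho\frac{\sigma_1}{\sigma_0}$, and $e_{ij}^{(2)}$ is an i.i.d. random error. $\beta_1^{(2)}$ measures the treatment effect $\tau$ and $\beta_2^{(2)}$ represents the slope of the pre-post association between $Y_{ijt_1}$ and $Y_{ijt_0}$. Model (2) has a common residual variance $\sigma_{\epsilon 2}^2$ and implicitly assumes that two arms share the common baseline mean $\mu_{t_0}$.

The coefficients and standard errors of model (2) are also estimated using an OLS regression. The OLS estimator $\hat{\beta}_{1,\mathrm{ols}}^{(2)}$ is the sample mean difference in the post-treatment weight adjusted for the sample mean difference in the baseline weight between two arms. The group mean difference in the baseline weight can be seen as chance imbalance in a randomized trial. $\hat{\beta}_{1,\mathrm{ols}}^{(2)}$ is unbiased for $\tau$ both conditional on $Y_{ijt_0}$ and unconditionally. The formulas of $\hat{\beta}_{1,\mathrm{ols}}^{(2)}$ and its “unconditional” variance $\operatorname{var}(\hat{\beta}_{1,\mathrm{ols}}^{(2)})$ are listed in Table 1. However, OLS assumes that the baseline weight $Y_{ijt_0}$ is fixed. OLS targets the conditional variance of $\hat{\beta}_{1,\mathrm{ols}}^{(2)}$, denoted by $\operatorname{var}(\hat{\beta}_{1,\mathrm{ols}}^{(2)} \mid Y_{ijt_0})$, instead of $\operatorname{var}(\hat{\beta}_{1,\mathrm{ols}}^{(2)})$. The formula of $\operatorname{var}(\hat{\beta}_{1,\mathrm{ols}}^{(2)} \mid Y_{ijt_0})$ with a known common residual variance $\sigma_{\epsilon 2}^2$ is presented in Table 1. Since $\sigma_{\epsilon 2}^2$ is generally unknown, it is estimated by the following sample residual variance:

$$\hat{\sigma}_{\epsilon 2}^2 = \frac{\sum_{j=0}^{1}\sum_{i=1}^{n_j}(y_{ijt_1} - \hat{y}_{ijt_1}^{(2)})^2}{n_0 + n_1 - 3},$$

where $\hat{y}_{ijt_1}^{(2)} = \hat{\beta}_{0,\mathrm{ols}}^{(2)} + \hat{\beta}_{1,\mathrm{ols}}^{(2)} G_{ij} + \hat{\beta}_{2,\mathrm{ols}}^{(2)} Y_{ijt_0}$ is the predicted value from model (2). We let $\widehat{\operatorname{var}}_{\mathrm{ols}}(\hat{\beta}_{1,\mathrm{ols}}^{(2)} \mid Y_{ijt_0})$ denote the OLS model-based variance estimator with $\hat{\sigma}_{\epsilon 2}^2$ substituted for $\sigma_{\epsilon 2}^2$. Note that $\widehat{\operatorname{var}}_{\mathrm{ols}}(\hat{\beta}_{1,\mathrm{ols}}^{(2)} \mid Y_{ijt_0})$ is reported by standard statistical software (e.g., “proc reg” in SAS). Its formula is presented in Table 1.

Since we want to generalize our conclusions to a general population and $Y_{ijt_0}$ can take different values from those collected in the current sample, we may wonder whether significance tests based on the model-based conditional variance assuming $Y_{ijt_0}$ is fixed (e.g., $t = \hat{\beta}_{1,\mathrm{ols}}^{(2)} / \sqrt{\widehat{\operatorname{var}}_{\mathrm{ols}}(\hat{\beta}_{1,\mathrm{ols}}^{(2)} \mid Y_{ijt_0})}$) are comparable to unconditional inference (e.g., $t = \hat{\beta}_{1,\mathrm{ols}}^{(2)} / \sqrt{\operatorname{var}(\hat{\beta}_{1,\mathrm{ols}}^{(2)})}$), in which $Y_{ijt_0}$ is treated as a random variable, for testing $H_0: \tau = 0$. To establish this equivalence, we need to show: i) $\widehat{\operatorname{var}}_{\mathrm{ols}}(\hat{\beta}_{1,\mathrm{ols}}^{(2)} \mid Y_{ijt_0})$ is unbiased for $\operatorname{var}(\hat{\beta}_{1,\mathrm{ols}}^{(2)} \mid Y_{ijt_0})$; ii) $\operatorname{var}(\hat{\beta}_{1,\mathrm{ols}}^{(2)} \mid Y_{ijt_0})$ is unbiased for $\operatorname{var}(\hat{\beta}_{1,\mathrm{ols}}^{(2)})$. The first part is well established in a homoscedastic linear model. The second part follows from the law of total variance, $\operatorname{var}(\hat{\beta}_{1,\mathrm{ols}}^{(2)}) = E[\operatorname{var}(\hat{\beta}_{1,\mathrm{ols}}^{(2)} \mid Y_{ijt_0})] + \operatorname{var}(E[\hat{\beta}_{1,\mathrm{ols}}^{(2)} \mid Y_{ijt_0}])$: because $\hat{\beta}_{1,\mathrm{ols}}^{(2)}$ is unbiased for $\tau$ conditional on $Y_{ijt_0}$, the second term is zero and $\operatorname{var}(\hat{\beta}_{1,\mathrm{ols}}^{(2)}) = E[\operatorname{var}(\hat{\beta}_{1,\mathrm{ols}}^{(2)} \mid Y_{ijt_0})]$. That is, the unconditional variance of $\hat{\beta}_{1,\mathrm{ols}}^{(2)}$ is the average of its conditional variance over the distribution of the baseline weight. Therefore, the usual model-based standard errors and associated p-values are valid for unconditional inference [3, 5, 17].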
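The ANCOVAI estimator and its model-based conditional variance can be reproduced with a plain least squares fit. The sketch below uses simulated data with hypothetical parameter values, and checks that the regression output agrees with the closed-form expressions given for ANCOVA-Post I in Table 1.

```python
import numpy as np

rng = np.random.default_rng(1)
n0, n1 = 50, 50                            # hypothetical arm sizes
mu_t0, tau = 200.0, -10.0                  # hypothetical baseline mean and effect
sigma0, sigma1, rho = 20.0, 20.0, 0.7      # hypothetical homogeneous structure
slope = rho * sigma1 / sigma0              # pre-post slope
resid_sd = np.sqrt(1.0 - rho**2) * sigma1  # residual SD after adjustment

g = np.concatenate([np.zeros(n0), np.ones(n1)])
base = mu_t0 + sigma0 * rng.standard_normal(n0 + n1)
post = (195.0 + tau * g + slope * (base - mu_t0)
        + resid_sd * rng.standard_normal(n0 + n1))

# OLS fit of post-treatment weight on intercept, G, and baseline weight.
X = np.column_stack([np.ones_like(g), g, base])
beta, *_ = np.linalg.lstsq(X, post, rcond=None)
resid = post - X @ beta
sigma2_hat = resid @ resid / (n0 + n1 - 3)
var_cond_hat = (sigma2_hat * np.linalg.inv(X.T @ X))[1, 1]

# Closed-form versions: adjusted mean difference and its conditional variance.
d_post = post[g == 1].mean() - post[g == 0].mean()
d_base = base[g == 1].mean() - base[g == 0].mean()
tau_closed = d_post - beta[2] * d_base
sxx_within = sum(np.sum((base[g == j] - base[g == j].mean()) ** 2) for j in (0, 1))
var_closed = (1 / n0 + 1 / n1 + d_base**2 / sxx_within) * sigma2_hat

print(beta[1], tau_closed, var_cond_hat, var_closed)
```

The term `d_base**2 / sxx_within` is the only part of the conditional variance that depends on the observed chance imbalance at baseline, which is why the conditional variance changes from sample to sample while its average gives the unconditional variance.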

Method 3: Repeated measures model (“RM”): RM models the baseline and post-treatment weights ($Y_{ijt_0}$, $Y_{ijt_1}$) jointly using the binary treatment indicator $G_{ij}$, the binary time factor $T_{ij}$, and the time by treatment interaction $G_{ij} \times T_{ij}$ as follows:

$$Y_{ijt} = \gamma_0^{(3)} + \gamma_1^{(3)} G_{ij} + \gamma_2^{(3)} T_{ij} + \gamma_3^{(3)} G_{ij} \times T_{ij} + e_{ijt}^{(3)}, \quad i = 1, 2, \ldots, n_j;\ j = 0, 1;\ t = t_0, t_1, \tag{3}$$
$$\begin{pmatrix} e_{ijt_0}^{(3)} \\ e_{ijt_1}^{(3)} \end{pmatrix} \sim N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \Sigma\right),$$

When $t_0 = 0$ and $t_1 = 1$, $\gamma_0^{(3)} = \mu_{0t_0}$, $\gamma_1^{(3)} = \mu_{1t_0} - \mu_{0t_0}$, $\gamma_2^{(3)} = \mu_{0t_1} - \mu_{0t_0}$, and $\gamma_3^{(3)} = (\mu_{1t_1} - \mu_{1t_0}) - (\mu_{0t_1} - \mu_{0t_0})$. $\gamma_0^{(3)}$ represents the mean baseline weight of the control arm, $\gamma_1^{(3)}$ represents the difference in the mean baseline weights of the treatment and control arms, $\gamma_2^{(3)}$ represents the mean change from baseline in the control arm, and $\gamma_3^{(3)}$ is generally interpreted as the difference in the mean change from baseline in a unit time interval between the treatment and control arms (“difference in difference”), also known as the difference in slopes. We have $\mu_{1t_0} = \mu_{0t_0}$ from random allocation, and it follows that $\gamma_1^{(3)} = 0$ and $\gamma_3^{(3)} = \mu_{1t_1} - \mu_{0t_1} = \tau$. Thus, testing $H_0: \gamma_3^{(3)} = 0$ is equivalent to testing $H_0: \tau = 0$.

The generalized least squares (GLS) model with correlated outcomes is routinely used to estimate the coefficients and standard errors of model (3). The GLS estimator of the treatment effect $\hat{\gamma}_{3,\mathrm{gls}}^{(3)}$ and its variance $\operatorname{var}(\hat{\gamma}_{3,\mathrm{gls}}^{(3)})$ given known variance and covariance parameters are presented in Table 1. $\hat{\gamma}_{3,\mathrm{gls}}^{(3)}$ is the sample mean difference in body weight change between two arms and is unbiased for $\tau$ in a large sample. The variance and covariance parameters are generally unknown and need to be estimated using restricted maximum likelihood (REML). Conventional maximum likelihood estimation (MLE) should be avoided. The REML variance estimator $\widehat{\operatorname{var}}_{\mathrm{reml}}(\hat{\gamma}_{3,\mathrm{gls}}^{(3)})$ is derived by plugging the REML estimators of the variance and covariance parameters (i.e., $\sigma_0^2$, $\sigma_1^2$, $\rho\sigma_0\sigma_1$) into the formula of $\operatorname{var}(\hat{\gamma}_{3,\mathrm{gls}}^{(3)})$. We use the Kenward and Roger method [18] (“ddfm = kenwardroger” in the SAS proc mixed procedure) to adjust for the potential finite sample bias in $\widehat{\operatorname{var}}_{\mathrm{reml}}(\hat{\gamma}_{3,\mathrm{gls}}^{(3)})$, which arises because it fails to incorporate the variability of the REML estimators of the variance and covariance parameters. This adjustment involves inflating the variance and covariance matrix and computing adjusted approximate degrees of freedom.

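Method 4: Constrained repeated measures model (“cRM”): By specifying $\gamma_1^{(3)}$ in the model, RM model (3) allows the mean baseline weight to differ between two arms. Liang and Zeger [8] proposed the following cRM model, which fixes $\gamma_1^{(3)} = 0$ to force the treatment and control arms to have the same intercept. Intuitively, cRM is more efficient than RM because cRM estimates one fewer parameter. Formally, we model the baseline and post-treatment weights ($Y_{ijt_0}$, $Y_{ijt_1}$) jointly using the binary time factor $T_{ij}$ and a time by treatment interaction $G_{ij} \times T_{ij}$ in the following cRM model:

$$Y_{ijt} = \gamma_0^{(4)} + \gamma_2^{(4)} T_{ij} + \gamma_3^{(4)} G_{ij} \times T_{ij} + e_{ijt}^{(4)}, \quad i = 1, 2, \ldots, n_j;\ j = 0, 1;\ t = t_0, t_1 \tag{4}$$
$$\begin{pmatrix} e_{ijt_0}^{(4)} \\ e_{ijt_1}^{(4)} \end{pmatrix} \sim N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \Sigma\right),$$

where $\gamma_0^{(4)} = \mu_{t_0}$, $\gamma_2^{(4)} = \mu_{0t_1} - \mu_{0t_0}$, and $\gamma_3^{(4)} = \tau$. Interpretations of $\gamma_0^{(4)}$, $\gamma_2^{(4)}$, and $\gamma_3^{(4)}$ are the same as their counterparts in RM. The formulas of the GLS point estimator $\hat{\gamma}_{3,\mathrm{gls}}^{(4)}$ and its variance $\operatorname{var}(\hat{\gamma}_{3,\mathrm{gls}}^{(4)})$ are listed in Table 1. $\hat{\gamma}_{3,\mathrm{gls}}^{(4)}$ is unbiased for $\tau$ asymptotically. The empirical or the model-based variance estimate for $\operatorname{var}(\hat{\gamma}_{3,\mathrm{gls}}^{(4)})$ is derived using REML in the same way as for a regular RM model.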

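Method 5: ANOVA with change score (“ANOVA-Change”): We model the change score $\Delta_{ij} = Y_{ijt_1} - Y_{ijt_0}$ using the binary treatment indicator $G_{ij}$ as follows:

$$\Delta_{ij} = \beta_0^{(5)} + \beta_1^{(5)} G_{ij} + e_{ij}^{(5)}, \quad i = 1, 2, \ldots, n_j;\ j = 0, 1; \tag{5}$$
$$e_{ij}^{(5)} \sim N(0, \sigma_{\epsilon 5}^2) \quad \text{and} \quad \sigma_{\epsilon 5}^2 = \sigma_1^2 + \sigma_0^2 - 2\rho\sigma_0\sigma_1,$$

where $\beta_0^{(5)} = \mu_{0t_1} - \mu_{0t_0}$, $\beta_1^{(5)} = (\mu_{1t_1} - \mu_{1t_0}) - (\mu_{0t_1} - \mu_{0t_0})$, and $e_{ij}^{(5)}$ is an i.i.d. random error. $\beta_0^{(5)}$ measures the mean change score in the control arm. $\beta_1^{(5)}$ measures the treatment effect $\tilde{\tau}$. Since $\mu_{1t_0} = \mu_{0t_0}$ due to randomization at baseline, $\beta_1^{(5)}$ reduces to $\tau$. The closed-form expressions of $\hat{\beta}_{1,\mathrm{ols}}^{(5)}$ and $\operatorname{var}(\hat{\beta}_{1,\mathrm{ols}}^{(5)})$ are listed in Table 1. $\hat{\beta}_{1,\mathrm{ols}}^{(5)}$ is the sample mean difference in the change score between two arms (“difference in difference”) and is unbiased for $\tau$. The OLS model-based variance of $\hat{\beta}_{1,\mathrm{ols}}^{(5)}$ assuming known $\sigma_{\epsilon 5}^2$ is

$$\operatorname{var}_{\mathrm{ols}}(\hat{\beta}_{1,\mathrm{ols}}^{(5)}) = \frac{\sigma_{\epsilon 5}^2}{\sum_{j=0}^{1}\sum_{i=1}^{n_j}(G_{ij} - \bar{G}_{..})^2},$$

where $\bar{G}_{..} = \frac{\sum_{j=0}^{1}\sum_{i=1}^{n_j} G_{ij}}{n_0 + n_1} = \frac{n_1}{n_0 + n_1}$. $\sigma_{\epsilon 5}^2$ is estimated by

$$\hat{\sigma}_{\epsilon 5}^2 = \frac{\sum_{j=0}^{1}\sum_{i=1}^{n_j}(\Delta_{ij} - \hat{\Delta}_{ij}^{(5)})^2}{n_0 + n_1 - 2},$$

where $\hat{\Delta}_{ij}^{(5)}$ is the fitted value from model (5). We let $\widehat{\operatorname{var}}_{\mathrm{ols}}(\hat{\beta}_{1,\mathrm{ols}}^{(5)})$ denote the OLS model-based variance estimator with $\hat{\sigma}_{\epsilon 5}^2$ substituted for $\sigma_{\epsilon 5}^2$ (Table 1), which is reported by standard statistical software. Since $\sum_{j=0}^{1}\sum_{i=1}^{n_j}(G_{ij} - \bar{G}_{..})^2 = \frac{n_0 n_1}{n_0 + n_1}$, it follows that $\operatorname{var}_{\mathrm{ols}}(\hat{\beta}_{1,\mathrm{ols}}^{(5)}) = \operatorname{var}(\hat{\beta}_{1,\mathrm{ols}}^{(5)})$. It is well established that $\widehat{\operatorname{var}}_{\mathrm{ols}}(\hat{\beta}_{1,\mathrm{ols}}^{(5)})$ is unbiased for $\operatorname{var}_{\mathrm{ols}}(\hat{\beta}_{1,\mathrm{ols}}^{(5)})$, and thus for $\operatorname{var}(\hat{\beta}_{1,\mathrm{ols}}^{(5)})$. The usual OLS model-based inference is valid for unconditional hypothesis testing.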
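As a quick numerical check of the change-score model, the sketch below fits an OLS regression of the change score on the treatment indicator and confirms that the coefficient equals the “difference in difference” of the sample means, which is also the closed-form RM estimator in Table 1. The simulated data and all parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n0, n1 = 30, 45                                   # hypothetical arm sizes
g = np.concatenate([np.zeros(n0), np.ones(n1)])   # treatment indicator G

# Hypothetical baseline and post-treatment weights.
base = 200.0 + 20.0 * rng.standard_normal(n0 + n1)
post = base - 2.0 - 10.0 * g + 8.0 * rng.standard_normal(n0 + n1)
delta = post - base                               # change score

# OLS fit of the change score on an intercept and G.
X = np.column_stack([np.ones_like(g), g])
beta, *_ = np.linalg.lstsq(X, delta, rcond=None)

# Difference in difference of the four sample means.
did = ((post[g == 1].mean() - base[g == 1].mean())
       - (post[g == 0].mean() - base[g == 0].mean()))

# OLS model-based variance estimator of the treatment coefficient.
resid = delta - X @ beta
sigma2_hat = resid @ resid / (n0 + n1 - 2)
var_ols_hat = sigma2_hat * (1 / n0 + 1 / n1)

print(beta[1], did, var_ols_hat)
```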

When the study population is heterogeneous

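Method 6: ANCOVAII: Different variance and covariance structures in the treatment and control arms suggest a baseline measurement by treatment interaction term in ANCOVA [2, 3, 9, 10]. To estimate $\tau$ using an interaction model, we first compute the mean centered baseline weight $\tilde{Y}_{ijt_0}$ by subtracting the overall mean baseline weight from individual baseline weights, i.e., $\tilde{Y}_{ijt_0} = Y_{ijt_0} - \mu_{t_0}$. We then model the post-treatment body weight $Y_{ijt_1}$ using the binary treatment indicator $G_{ij}$, the mean centered baseline weight $\tilde{Y}_{ijt_0}$, and the baseline weight by treatment interaction $G_{ij} \times \tilde{Y}_{ijt_0}$ as follows:

$$Y_{ijt_1} = \beta_0^{(6)} + \beta_1^{(6)} G_{ij} + \beta_2^{(6)} \tilde{Y}_{ijt_0} + \beta_3^{(6)} G_{ij} \times \tilde{Y}_{ijt_0} + e_{ij}^{(6)}, \quad i = 1, 2, \ldots, n_j;\ j = 0, 1; \tag{6}$$
$$e_{i0}^{(6)} \sim N(0, \sigma_{\epsilon 06}^2) \quad \text{and} \quad \sigma_{\epsilon 06}^2 = (1 - \rho_0^2)\sigma_{01}^2$$
$$e_{i1}^{(6)} \sim N(0, \sigma_{\epsilon 16}^2) \quad \text{and} \quad \sigma_{\epsilon 16}^2 = (1 - \rho_1^2)\sigma_{11}^2,$$

where $\beta_0^{(6)} = \mu_{0t_1}$, $\beta_1^{(6)} = \tau$, $\beta_2^{(6)} = \rho_0\frac{\sigma_{01}}{\sigma_0}$, and $\beta_3^{(6)} = \rho_1\frac{\sigma_{11}}{\sigma_0} - \rho_0\frac{\sigma_{01}}{\sigma_0}$. $e_{i0}^{(6)}$ and $e_{i1}^{(6)}$ are i.i.d. random errors in the control and treatment arms. $\beta_1^{(6)}$ measures the treatment effect. $\beta_2^{(6)}$ is the regression slope of the baseline body weight in the control arm. $\beta_3^{(6)}$ measures the difference in the regression slopes of the baseline weight between the treatment and control arms. Model (6) is heteroscedastic because the error terms in the treatment and control arms have different residual variances.

As presented in Table 2, the OLS estimator $\hat{\beta}_{1,\mathrm{ols}}^{(6)}$ is the adjusted mean difference in the post-treatment body weights controlling for a weighted mean difference of the baseline body weights between two arms with unequal weighting coefficients for the treatment and control arms (i.e., $\hat{\beta}_{2,\mathrm{ols}}^{(6)} + \hat{\beta}_{3,\mathrm{ols}}^{(6)}$ for the treatment group, and $\hat{\beta}_{2,\mathrm{ols}}^{(6)}$ for the control group). $\hat{\beta}_{1,\mathrm{ols}}^{(6)}$ is unbiased for $\tau$. The conditional variance of $\hat{\beta}_{1,\mathrm{ols}}^{(6)}$, denoted by $\operatorname{var}(\hat{\beta}_{1,\mathrm{ols}}^{(6)} \mid \tilde{Y}_{ijt_0})$, incorporates two different residual variances $\sigma_{\epsilon 06}^2$ and $\sigma_{\epsilon 16}^2$ (Table 2). Standard statistical software such as SAS does not output $\operatorname{var}(\hat{\beta}_{1,\mathrm{ols}}^{(6)} \mid \tilde{Y}_{ijt_0})$ because OLS incorrectly assumes a common residual variance $\sigma_{\epsilon 6}^2$, which is the following weighted average of $\sigma_{\epsilon 06}^2$ and $\sigma_{\epsilon 16}^2$:

$$\sigma_{\epsilon 6}^2 = \frac{n_0}{n_0 + n_1}\sigma_{\epsilon 06}^2 + \frac{n_1}{n_0 + n_1}\sigma_{\epsilon 16}^2$$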
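Table 2.

Estimators of treatment effect and variance estimators in a heterogeneous study population

ANCOVA-Post II
  Estimator of treatment effect ($\tau$): $\hat{\beta}_{1,\mathrm{ols}}^{(6)} = \left[\bar{y}_{\cdot 1t_1} - (\hat{\beta}_{2,\mathrm{ols}}^{(6)} + \hat{\beta}_{3,\mathrm{ols}}^{(6)})\bar{\tilde{y}}_{\cdot 1t_0}\right] - \left[\bar{y}_{\cdot 0t_1} - \hat{\beta}_{2,\mathrm{ols}}^{(6)}\bar{\tilde{y}}_{\cdot 0t_0}\right]$
  Type: C
    True variance of treatment effect estimator: $\operatorname{var}(\hat{\beta}_{1,\mathrm{ols}}^{(6)} \mid \tilde{Y}_{ijt_0}) = \left[\frac{1}{n_0} + \frac{\bar{\tilde{y}}_{\cdot 0t_0}^2}{\sum_{i=1}^{n_0}(\tilde{y}_{i0t_0} - \bar{\tilde{y}}_{\cdot 0t_0})^2}\right]\sigma_{\epsilon 06}^2 + \left[\frac{1}{n_1} + \frac{\bar{\tilde{y}}_{\cdot 1t_0}^2}{\sum_{i=1}^{n_1}(\tilde{y}_{i1t_0} - \bar{\tilde{y}}_{\cdot 1t_0})^2}\right]\sigma_{\epsilon 16}^2$, with $\sigma_{\epsilon 06}^2 = (1 - \rho_0^2)\sigma_{01}^2$ and $\sigma_{\epsilon 16}^2 = (1 - \rho_1^2)\sigma_{11}^2$
    Variance estimator from OLS model: $\widehat{\operatorname{var}}_{\mathrm{ols}}(\hat{\beta}_{1,\mathrm{ols}}^{(6)} \mid \tilde{Y}_{ijt_0}) = \left[\frac{1}{n_0} + \frac{1}{n_1} + \frac{\bar{\tilde{y}}_{\cdot 0t_0}^2}{\sum_{i=1}^{n_0}(\tilde{y}_{i0t_0} - \bar{\tilde{y}}_{\cdot 0t_0})^2} + \frac{\bar{\tilde{y}}_{\cdot 1t_0}^2}{\sum_{i=1}^{n_1}(\tilde{y}_{i1t_0} - \bar{\tilde{y}}_{\cdot 1t_0})^2}\right]\hat{\sigma}_{\epsilon 6}^2$, with $\hat{\sigma}_{\epsilon 6}^2 = \frac{\sum_{j=0}^{1}\sum_{i=1}^{n_j}(y_{ijt_1} - \hat{y}_{ijt_1})^2}{n_0 + n_1 - 4}$
  Type: U
    True variance of treatment effect estimator: $\operatorname{var}(\hat{\beta}_{1,\mathrm{ols}}^{(6)}) = \frac{1}{n_0}(1 - \rho_0^2)\sigma_{01}^2 + \frac{1}{n_1}(1 - \rho_1^2)\sigma_{11}^2 + \left(\rho_1\frac{\sigma_{11}}{\sigma_0} - \rho_0\frac{\sigma_{01}}{\sigma_0}\right)^2\frac{\sigma_0^2}{n_0 + n_1}$

ANCOVA-Post I
  Estimator of treatment effect ($\tau$): $\hat{\beta}_{1,\mathrm{ols}}^{(7)} = (\bar{y}_{\cdot 1t_1} - \bar{y}_{\cdot 0t_1}) - \hat{\beta}_{2,\mathrm{ols}}^{(7)}(\bar{y}_{\cdot 1t_0} - \bar{y}_{\cdot 0t_0})$
  Type: C
    True variance of treatment effect estimator: $\operatorname{var}(\hat{\beta}_{1,\mathrm{ols}}^{(7)} \mid Y_{ijt_0}) = \left[\frac{1}{n_0} + \frac{\sum_{i=1}^{n_0}(y_{i0t_0} - \bar{y}_{\cdot 0t_0})^2\,(\bar{y}_{\cdot 1t_0} - \bar{y}_{\cdot 0t_0})^2}{\left(\sum_{j=0}^{1}\sum_{i=1}^{n_j}(y_{ijt_0} - \bar{y}_{\cdot jt_0})^2\right)^2}\right]\sigma_{\epsilon 07}^2 + \left[\frac{1}{n_1} + \frac{\sum_{i=1}^{n_1}(y_{i1t_0} - \bar{y}_{\cdot 1t_0})^2\,(\bar{y}_{\cdot 1t_0} - \bar{y}_{\cdot 0t_0})^2}{\left(\sum_{j=0}^{1}\sum_{i=1}^{n_j}(y_{ijt_0} - \bar{y}_{\cdot jt_0})^2\right)^2}\right]\sigma_{\epsilon 17}^2$, with $\sigma_{\epsilon 07}^2 = (1 - \rho_0^2)\sigma_{01}^2$ and $\sigma_{\epsilon 17}^2 = (1 - \rho_1^2)\sigma_{11}^2$
    Variance estimator from OLS model: $\widehat{\operatorname{var}}_{\mathrm{ols}}(\hat{\beta}_{1,\mathrm{ols}}^{(7)} \mid Y_{ijt_0}) = \left[\frac{1}{n_0} + \frac{1}{n_1} + \frac{(\bar{y}_{\cdot 1t_0} - \bar{y}_{\cdot 0t_0})^2}{\sum_{j=0}^{1}\sum_{i=1}^{n_j}(y_{ijt_0} - \bar{y}_{\cdot jt_0})^2}\right]\hat{\sigma}_{\epsilon 7}^2$, with $\hat{\sigma}_{\epsilon 7}^2 = \frac{\sum_{j=0}^{1}\sum_{i=1}^{n_j}(y_{ijt_1} - \hat{y}_{ijt_1})^2}{n_0 + n_1 - 3}$
  Type: U
    True variance of treatment effect estimator: $\operatorname{var}(\hat{\beta}_{1,\mathrm{ols}}^{(7)}) = \frac{1}{n_0}\left[(1 - \rho_0^2)\sigma_{01}^2 + \left(\rho_1\frac{\sigma_{11}}{\sigma_0} - \rho_0\frac{\sigma_{01}}{\sigma_0}\right)^2 p_1^2\sigma_0^2\right] + \frac{1}{n_1}\left[(1 - \rho_1^2)\sigma_{11}^2 + \left(\rho_1\frac{\sigma_{11}}{\sigma_0} - \rho_0\frac{\sigma_{01}}{\sigma_0}\right)^2 p_0^2\sigma_0^2\right]$

cRM
  Estimator of treatment effect ($\tau$): $\hat{\gamma}_{2,\mathrm{gls}}^{(8)} = \bar{y}_{\cdot 1t_1} - \bar{y}_{\cdot 0t_1} + \left(\frac{\rho_0\sigma_0\sigma_{01}}{\sigma_0^2}(\bar{y}_{\cdot 0t_0} - \bar{y}_{\cdot\cdot t_0}) - \frac{\rho_1\sigma_0\sigma_{11}}{\sigma_0^2}(\bar{y}_{\cdot 1t_0} - \bar{y}_{\cdot\cdot t_0})\right)$
  Type: U
  True variance of treatment effect estimator: $\operatorname{var}(\hat{\gamma}_{2,\mathrm{gls}}^{(8)}) = \frac{1}{n_0}\left[(1 - \rho_0^2)\sigma_{01}^2 + \left(\rho_1\frac{\sigma_{11}}{\sigma_0} - \rho_0\frac{\sigma_{01}}{\sigma_0}\right)^2 p_1^2\sigma_0^2\right] + \frac{1}{n_1}\left[(1 - \rho_1^2)\sigma_{11}^2 + \left(\rho_1\frac{\sigma_{11}}{\sigma_0} - \rho_0\frac{\sigma_{01}}{\sigma_0}\right)^2 p_0^2\sigma_0^2\right]$

Type: U, unconditional variance; C, conditional variance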
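The two ANCOVA unconditional variances in Table 2 can be compared numerically. The sketch below codes the table's expressions for ANCOVA-Post II and ANCOVA-Post I; it assumes that $p_0$ and $p_1$ are the arm proportions $n_0/(n_0+n_1)$ and $n_1/(n_0+n_1)$, which this section does not define explicitly, and the function name and parameter values are illustrative.

```python
def heterogeneous_variances(n0, n1, sigma0, sigma01, sigma11, rho0, rho1):
    """Unconditional variances of the ANCOVA estimators (Table 2)."""
    n = n0 + n1
    p0, p1 = n0 / n, n1 / n  # assumption: p_j is the proportion in arm j
    # Difference in pre-post regression slopes between the two arms.
    slope_gap = rho1 * sigma11 / sigma0 - rho0 * sigma01 / sigma0
    resid0 = (1.0 - rho0**2) * sigma01**2   # control residual variance
    resid1 = (1.0 - rho1**2) * sigma11**2   # treatment residual variance
    ancova2 = resid0 / n0 + resid1 / n1 + slope_gap**2 * sigma0**2 / n
    ancova1 = ((resid0 + (slope_gap * p1) ** 2 * sigma0**2) / n0
               + (resid1 + (slope_gap * p0) ** 2 * sigma0**2) / n1)
    return {"ANCOVAII": ancova2, "ANCOVAI": ancova1}

# Hypothetical heterogeneous population, balanced and unbalanced designs.
balanced = heterogeneous_variances(50, 50, 20.0, 18.0, 25.0, 0.8, 0.5)
unbalanced = heterogeneous_variances(30, 70, 20.0, 18.0, 25.0, 0.8, 0.5)
print(balanced)
print(unbalanced)
```

Under this assumption about $p_0$ and $p_1$, the two expressions coincide when $n_0 = n_1$, and the ANCOVAI expression is larger otherwise, because $p_1^2/n_0 + p_0^2/n_1 \ge 1/(n_0 + n_1)$ with equality only in a balanced design.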

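We let $\operatorname{var}_{\mathrm{ols}}(\hat{\beta}_{1,\mathrm{ols}}^{(6)} \mid \tilde{Y}_{ijt_0})$ denote the OLS model-based conditional variance of $\hat{\beta}_{1,\mathrm{ols}}^{(6)}$ incorporating $\sigma_{\epsilon 6}^2$ (Table 2). Since $\sigma_{\epsilon 6}^2$ is generally unknown, it is estimated by

$$\hat{\sigma}_{\epsilon 6}^2 = \frac{\sum_{j=0}^{1}\sum_{i=1}^{n_j}(y_{ijt_1} - \hat{y}_{ijt_1})^2}{n_0 + n_1 - 4},$$

where $\hat{y}_{ijt_1}$ is the predicted value of $y_{ijt_1}$. We let $\widehat{\operatorname{var}}_{\mathrm{ols}}(\hat{\beta}_{1,\mathrm{ols}}^{(6)} \mid \tilde{Y}_{ijt_0})$ denote the OLS model-based variance estimator of $\hat{\beta}_{1,\mathrm{ols}}^{(6)}$ with $\hat{\sigma}_{\epsilon 6}^2$ substituted for $\sigma_{\epsilon 6}^2$ and with $\mu_{t_0}$ treated as a known constant (Table 2). $\widehat{\operatorname{var}}_{\mathrm{ols}}(\hat{\beta}_{1,\mathrm{ols}}^{(6)} \mid \tilde{Y}_{ijt_0})$ is reported by standard statistical software (e.g., “proc reg” in SAS). To assess the validity of the model-based standard errors and p-values from a regular ANCOVAII model for unconditional inference, we need to examine: i) whether $\widehat{\operatorname{var}}_{\mathrm{ols}}(\hat{\beta}_{1,\mathrm{ols}}^{(6)} \mid \tilde{Y}_{ijt_0})$ is unbiased for $\operatorname{var}(\hat{\beta}_{1,\mathrm{ols}}^{(6)} \mid \tilde{Y}_{ijt_0})$; ii) whether $\operatorname{var}(\hat{\beta}_{1,\mathrm{ols}}^{(6)} \mid \tilde{Y}_{ijt_0})$ is unbiased for $\operatorname{var}(\hat{\beta}_{1,\mathrm{ols}}^{(6)})$.

First, $\widehat{\operatorname{var}}_{\mathrm{ols}}(\hat{\beta}_{1,\mathrm{ols}}^{(6)} \mid \tilde{Y}_{ijt_0})$ is unbiased for $\operatorname{var}_{\mathrm{ols}}(\hat{\beta}_{1,\mathrm{ols}}^{(6)} \mid \tilde{Y}_{ijt_0})$. However, the unbiasedness of $\widehat{\operatorname{var}}_{\mathrm{ols}}(\hat{\beta}_{1,\mathrm{ols}}^{(6)} \mid \tilde{Y}_{ijt_0})$ as an estimator of $\operatorname{var}(\hat{\beta}_{1,\mathrm{ols}}^{(6)} \mid \tilde{Y}_{ijt_0})$ depends on the relationship between $\operatorname{var}_{\mathrm{ols}}(\hat{\beta}_{1,\mathrm{ols}}^{(6)} \mid \tilde{Y}_{ijt_0})$ and $\operatorname{var}(\hat{\beta}_{1,\mathrm{ols}}^{(6)} \mid \tilde{Y}_{ijt_0})$. Asymptotically, we have

$$\Delta_{\hat{\beta}_{1,\mathrm{ols}}^{(6)}} = \operatorname{var}_{\mathrm{ols}}(\hat{\beta}_{1,\mathrm{ols}}^{(6)} \mid \tilde{Y}_{ijt_0}) - \operatorname{var}(\hat{\beta}_{1,\mathrm{ols}}^{(6)} \mid \tilde{Y}_{ijt_0}) = (\sigma_{\epsilon 06}^2 - \sigma_{\epsilon 16}^2)\left(\frac{1}{n_1} - \frac{1}{n_0}\right)$$

It can be shown that in a balanced design ($n_0 = n_1$),

$$\operatorname{var}_{\mathrm{ols}}(\hat{\beta}_{1,\mathrm{ols}}^{(6)} \mid \tilde{Y}_{ijt_0}) \approx \operatorname{var}(\hat{\beta}_{1,\mathrm{ols}}^{(6)} \mid \tilde{Y}_{ijt_0}).$$

Thus, $\widehat{\operatorname{var}}_{\mathrm{ols}}(\hat{\beta}_{1,\mathrm{ols}}^{(6)} \mid \tilde{Y}_{ijt_0})$ is nearly unbiased for $\operatorname{var}(\hat{\beta}_{1,\mathrm{ols}}^{(6)} \mid \tilde{Y}_{ijt_0})$ [3]. When the design is unbalanced ($n_0 \ne n_1$),

$$\operatorname{var}_{\mathrm{ols}}(\hat{\beta}_{1,\mathrm{ols}}^{(6)} \mid \tilde{Y}_{ijt_0}) \ne \operatorname{var}(\hat{\beta}_{1,\mathrm{ols}}^{(6)} \mid \tilde{Y}_{ijt_0}).$$

Hence, $\widehat{\operatorname{var}}_{\mathrm{ols}}(\hat{\beta}_{1,\mathrm{ols}}^{(6)} \mid \tilde{Y}_{ijt_0})$ is biased for $\operatorname{var}(\hat{\beta}_{1,\mathrm{ols}}^{(6)} \mid \tilde{Y}_{ijt_0})$. Due to heteroscedasticity, $\widehat{\operatorname{var}}_{\mathrm{ols}}(\hat{\beta}_{1,\mathrm{ols}}^{(6)} \mid \tilde{Y}_{ijt_0})$ over-estimates $\operatorname{var}(\hat{\beta}_{1,\mathrm{ols}}^{(6)} \mid \tilde{Y}_{ijt_0})$ if the group with a larger residual variance has the larger sample size and the group with a smaller residual variance has the smaller sample size, and otherwise may underestimate $\operatorname{var}(\hat{\beta}_{1,\mathrm{ols}}^{(6)} \mid \tilde{Y}_{ijt_0})$ [3, 4].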
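The direction of this bias can be illustrated with the large-sample gap between the OLS model-based variance and the true conditional variance, which equals the difference in residual variances (control minus treatment) multiplied by (1/n1 − 1/n0). The sketch below is only an illustration of that expression; the function name and the numeric inputs are hypothetical.

```python
def ols_variance_gap(n0, n1, resid_var0, resid_var1):
    """Large-sample (OLS model-based variance) minus (true conditional variance).

    resid_var0 and resid_var1 are the residual variances in the control
    and treatment arms; a positive result means OLS over-estimates.
    """
    return (resid_var0 - resid_var1) * (1.0 / n1 - 1.0 / n0)

# Hypothetical residual variances: the treatment arm is more variable.
balanced = ols_variance_gap(50, 50, 100.0, 225.0)
big_arm_more_variable = ols_variance_gap(30, 70, 100.0, 225.0)
small_arm_more_variable = ols_variance_gap(70, 30, 100.0, 225.0)

print(balanced, big_arm_more_variable, small_arm_more_variable)
```

The gap vanishes in the balanced design, is positive when the more variable arm is also the larger arm, and is negative when the more variable arm is the smaller one, matching the over- and under-estimation pattern described above.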

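Second, the common mean baseline weight $\mu_{t_0}$ is generally unknown. We need to estimate $\mu_{t_0}$ in $\tilde{Y}_{ijt_0}$ using the overall sample mean $\hat{\mu}_{t_0} = \frac{\sum_{j=0}^{1}\sum_{i=1}^{n_j} Y_{ijt_0}}{n_0 + n_1}$, but ANCOVA treats $\hat{\mu}_{t_0}$ as fixed and fails to capture this additional variability in the conditional variances. As shown below, it turns out that $\operatorname{var}(\hat{\beta}_{1,\mathrm{ols}}^{(6)} \mid \tilde{Y}_{ijt_0})$ underestimates $\operatorname{var}(\hat{\beta}_{1,\mathrm{ols}}^{(6)})$ by an amount of $(\beta_3^{(6)})^2\operatorname{var}(\hat{\mu}_{t_0})$ [3]:

$$\operatorname{var}(\hat{\beta}_{1,\mathrm{ols}}^{(6)}) = E\left[\operatorname{var}(\hat{\beta}_{1,\mathrm{ols}}^{(6)} \mid \tilde{Y}_{ijt_0})\right] + (\beta_3^{(6)})^2\operatorname{var}(\hat{\mu}_{t_0}).$$

Thus, the OLS model-based conditional inference is biased for unconditional hypothesis testing because of heteroscedasticity and because it neglects the sampling variability in $\hat{\mu}_{t_0}$. To fix these two problems, we can use the following adjusted heteroscedasticity-consistent (HC) variance estimator to replace $\widehat{\operatorname{var}}_{\mathrm{ols}}(\hat{\beta}_{1,\mathrm{ols}}^{(6)} \mid \tilde{Y}_{ijt_0})$ for valid unconditional inference:

$$\widehat{\operatorname{var}}_{\mathrm{aHC}}(\hat{\beta}_{1,\mathrm{ols}}^{(6)} \mid \tilde{Y}_{ijt_0}) = \widehat{\operatorname{var}}_{\mathrm{HC}}(\hat{\beta}_{1,\mathrm{ols}}^{(6)} \mid \tilde{Y}_{ijt_0}) + (\hat{\beta}_{3,\mathrm{ols}}^{(6)})^2\frac{\hat{\sigma}_0^2}{n_0 + n_1},$$

where $\widehat{\operatorname{var}}_{\mathrm{HC}}(\hat{\beta}_{1,\mathrm{ols}}^{(6)} \mid \tilde{Y}_{ijt_0})$ is an HC variance estimator for $\operatorname{var}(\hat{\beta}_{1,\mathrm{ols}}^{(6)} \mid \tilde{Y}_{ijt_0})$ [19] and can be output from standard software. HC variance estimators are consistent (i.e., unbiased in large samples). Among all available HC variance estimators, HC2 was shown to have the best performance in finite samples [3, 4] (e.g., “HCCMETHOD = 2” in proc reg or “EMPIRICAL” in proc mixed, SAS). $\hat{\beta}_{3,\mathrm{ols}}^{(6)}$ is the OLS estimator of $\beta_3^{(6)}$, and $\hat{\sigma}_0^2$ is the overall sample variance of the baseline body weight. It follows directly that $\widehat{\operatorname{var}}_{\mathrm{aHC}}(\hat{\beta}_{1,\mathrm{ols}}^{(6)} \mid \tilde{Y}_{ijt_0})$ is asymptotically unbiased for $\operatorname{var}(\hat{\beta}_{1,\mathrm{ols}}^{(6)})$ and we can construct a valid test $t = \hat{\beta}_{1,\mathrm{ols}}^{(6)} / \sqrt{\widehat{\operatorname{var}}_{\mathrm{aHC}}(\hat{\beta}_{1,\mathrm{ols}}^{(6)} \mid \tilde{Y}_{ijt_0})}$ for testing $H_0: \tau = 0$ unconditionally.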
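The adjusted HC variance estimator can be sketched in a few lines of NumPy. This is an illustrative implementation under stated assumptions, not the SAS procedure referenced above: HC2 is computed by hand as the sandwich estimator with squared residuals divided by one minus the leverage, the baseline weight is centered at the overall sample mean, and the function name and simulated data are hypothetical.

```python
import numpy as np

def ancova2_adjusted_hc2(post, base, g):
    """ANCOVAII treatment effect with HC2 and adjusted HC2 variances."""
    n = len(post)
    centered = base - base.mean()          # baseline centered at sample mean
    X = np.column_stack([np.ones(n), g, centered, g * centered])
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ post
    resid = post - X @ beta
    # Leverage h_ii = x_i' (X'X)^{-1} x_i for each subject.
    leverage = np.einsum("ij,jk,ik->i", X, xtx_inv, X)
    # HC2 sandwich: weights are squared residuals divided by (1 - h_ii).
    meat = X.T @ (X * (resid**2 / (1.0 - leverage))[:, None])
    var_hc2 = (xtx_inv @ meat @ xtx_inv)[1, 1]
    # Extra term for estimating the baseline mean:
    # (interaction slope)^2 times (baseline sample variance) / n.
    var_adj = var_hc2 + beta[3] ** 2 * base.var(ddof=1) / n
    return beta[1], var_hc2, var_adj

# Hypothetical unbalanced, heteroscedastic trial.
rng = np.random.default_rng(3)
n0, n1 = 40, 80
g = np.concatenate([np.zeros(n0), np.ones(n1)])
base = 200.0 + 20.0 * rng.standard_normal(n0 + n1)
noise_sd = np.where(g == 1, 15.0, 8.0)
post = (195.0 - 10.0 * g + (0.8 - 0.3 * g) * (base - 200.0)
        + noise_sd * rng.standard_normal(n0 + n1))

tau_hat, var_hc2, var_adj = ancova2_adjusted_hc2(post, base, g)
t_stat = tau_hat / np.sqrt(var_adj)
print(tau_hat, var_hc2, var_adj, t_stat)
```

The adjustment term is never negative, so the adjusted variance is at least as large as the plain HC2 variance, and the two agree when the estimated interaction slope is zero.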

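Method 7: ANCOVAI: We model the post-treatment weight $Y_{ijt_1}$ using the binary treatment indicator $G_{ij}$ and the baseline weight $Y_{ijt_0}$:

$$Y_{ijt_1} = \beta_0^{(7)} + \beta_1^{(7)} G_{ij} + \beta_2^{(7)} Y_{ijt_0} + e_{ij}^{(7)} \tag{7}$$
$$e_{i0}^{(7)} \sim N(0, \sigma_{\epsilon 07}^2) \quad \text{and} \quad \sigma_{\epsilon 07}^2 = (1 - \rho_0^2)\sigma_{01}^2 + (\beta_3^{(6)} p_1)^2\sigma_0^2$$
$$e_{i1}^{(7)} \sim N(0, \sigma_{\epsilon 17}^2) \quad \text{and} \quad \sigma_{\epsilon 17}^2 = (1 - \rho_1^2)\sigma_{11}^2 + (\beta_3^{(6)} p_0)^2\sigma_0^2,$$

where $\beta_0^{(7)} = \beta_0^{(6)} - \beta_3^{(6)} p_0 \mu_0$, and $\beta_1^{(7)} = \tau$. $e_{i0}^{(7)}$ and $e_{i1}^{(7)}$ are random errors in the control and treatment arms. Since $e_{i0}^{(7)}$ and $e_{i1}^{(7)}$ have different variances in general, model (7) is heteroscedastic, and the severity of heteroscedasticity is determined by the correlation coefficients, the variances of the post-treatment weights in two arms, and whether the design is balanced.

As shown in Table 2, the OLS estimator $\hat{\beta}_{1,\mathrm{ols}}^{(7)}$ is an adjusted mean difference in the post-treatment weights controlling for a weighted mean difference of the baseline weights between two arms with an equal weighting coefficient for the treatment and control arms (i.e., $\hat{\beta}_{2,\mathrm{ols}}^{(7)}$ for both arms). $\hat{\beta}_{1,\mathrm{ols}}^{(7)}$ is unbiased for $\tau$. The true conditional variance $\operatorname{var}(\hat{\beta}_{1,\mathrm{ols}}^{(7)} \mid Y_{ijt_0})$ incorporates two different residual variances. Similar to ANCOVAII, the OLS model-based inference for ANCOVAI also mistakenly assumes a constant residual variance $\sigma_{\epsilon 7}^2$, which is a weighted average of $\sigma_{\epsilon 07}^2$ and $\sigma_{\epsilon 17}^2$, as follows:

$$\sigma_{\epsilon 7}^2 = \frac{n_0}{n_0 + n_1}\sigma_{\epsilon 07}^2 + \frac{n_1}{n_0 + n_1}\sigma_{\epsilon 17}^2.$$

Since $\sigma_{\epsilon 7}^2$ is unknown, it is estimated by

$$\hat{\sigma}_{\epsilon 7}^2 = \frac{\sum_{j=0}^{1}\sum_{i=1}^{n_j}(y_{ijt_1} - \hat{y}_{ijt_1})^2}{n_0 + n_1 - 3},$$

where $\hat{y}_{ijt_1}$ is the predicted value of $y_{ijt_1}$ from model (7). The closed form expressions of the OLS model-based conditional variance $\operatorname{var}_{\mathrm{ols}}(\hat{\beta}_{1,\mathrm{ols}}^{(7)} \mid Y_{ijt_0})$ incorporating $\sigma_{\epsilon 7}^2$ and the OLS model-based variance estimator $\widehat{\operatorname{var}}_{\mathrm{ols}}(\hat{\beta}_{1,\mathrm{ols}}^{(7)} \mid Y_{ijt_0})$ with $\hat{\sigma}_{\epsilon 7}^2$ substituted for $\sigma_{\epsilon 7}^2$ are given in Table 2. Recall that standard statistical software reports $\widehat{\operatorname{var}}_{\mathrm{ols}}(\hat{\beta}_{1,\mathrm{ols}}^{(7)} \mid Y_{ijt_0})$. To show that the model-based standard errors and p-values are valid for unconditional inference, we need to examine: i) whether $\widehat{\operatorname{var}}_{\mathrm{ols}}(\hat{\beta}_{1,\mathrm{ols}}^{(7)} \mid Y_{ijt_0})$ is unbiased for $\operatorname{var}(\hat{\beta}_{1,\mathrm{ols}}^{(7)} \mid Y_{ijt_0})$; ii) whether $\operatorname{var}(\hat{\beta}_{1,\mathrm{ols}}^{(7)} \mid Y_{ijt_0})$ is unbiased for $\operatorname{var}(\hat{\beta}_{1,\mathrm{ols}}^{(7)})$.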

First, var^olsβ^1,ols7Yijt0 is unbiased for varolsβ^1,ols7Yijt0 but the unbiasedness of var^olsβ^1,ols7Yijt0 as an estimator of varβ^1,ols7Yijt0 depends on the relationship between varolsβ^1,ols7Yijt0 and varβ^1,ols7Yijt0. Asymptotically, we have

β^1,ols7=varols(β^1,ols7Yijt0varβ^1,ols7Yijt0=σϵ072σϵ1721n11n0

When sample sizes are equal between two arms, we have

varols(β^1,ols7Yijt0var(β^1,ols7Yijt0.

Thus, var^olsβ^1,ols7Yijt0 is nearly unbiased for varβ^1,ols7Yijt0 in a balanced design [3]. When sample sizes are not equal between two arms,

varols(β^1,ols7Yijt0var(β^1,ols7Yijt0,

it follows directly that var^olsβ^1,ols7Yijt0 is biased for varβ^1,ols7Yijt0 due to heteroscedasticity. var^olsβ^1,ols7Yijt0 may over-estimate varβ^1,ols7Yijt0 when the group with a larger residual variance has larger sample size and the group with a smaller residual variance has smaller sample size, and otherwise may underestimate varβ^1,ols7Y~ijt0 [3, 4] . ANCOVAI is robust against heteroscedasticity in a balanced design, but not in an unbalanced design.

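Second, unlike for ANCOVAII, $\mathrm{var}(\hat{\beta}_{1,ols}^{(7)} \mid Y_{ijt_0})$ is unbiased for $\mathrm{var}(\hat{\beta}_{1,ols}^{(7)})$ because $\mathrm{var}(\hat{\beta}_{1,ols}^{(7)}) = E\left[\mathrm{var}(\hat{\beta}_{1,ols}^{(7)} \mid Y_{ijt_0})\right]$.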

Thus, the model-based standard errors and p-values are valid for unconditional inference in a balanced design, and are biased in an unbalanced design only because of heteroscedasticity. This bias can be easily corrected by replacing $\widehat{\mathrm{var}}_{ols}(\hat{\beta}_{1,ols}^{(7)} \mid Y_{ijt_0})$ with an HC variance estimator $\widehat{\mathrm{var}}_{HC}(\hat{\beta}_{1,ols}^{(7)} \mid Y_{ijt_0})$ [4, 19]; the corrected ANCOVAI then provides valid unconditional inference.
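To illustrate the correction, the sketch below fits the main effect ANCOVA by OLS with NumPy and computes both the model-based standard error and a heteroscedasticity-consistent one. It is an illustration under stated assumptions rather than the paper's SAS implementation: the HC3 variant is chosen here for concreteness (the text does not say which HC estimator is used), and the function names, seed, and simulated design values are hypothetical.

```python
import numpy as np


def ancova_main_effect(y_post, group, y_base):
    """Fit y_post ~ 1 + group + y_base by OLS.

    Returns the treatment coefficient, its OLS model-based standard error,
    and an HC3 heteroscedasticity-consistent standard error.
    """
    X = np.column_stack([np.ones_like(y_base), group, y_base])
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y_post
    resid = y_post - X @ beta

    # Model-based variance: a single pooled residual variance for both arms.
    sigma2 = resid @ resid / (n - p)
    se_ols = np.sqrt(sigma2 * xtx_inv[1, 1])

    # HC3 sandwich: squared residuals inflated by (1 - leverage)^2.
    leverage = np.einsum("ij,jk,ik->i", X, xtx_inv, X)
    omega = (resid / (1.0 - leverage)) ** 2
    meat = X.T @ (X * omega[:, None])
    cov_hc3 = xtx_inv @ meat @ xtx_inv
    se_hc3 = np.sqrt(cov_hc3[1, 1])
    return beta[1], se_ols, se_hc3


def simulate_arm(rng, n, mean_post, rho, mean_base=88.0, sd_base=14.0, sd_post=15.0):
    """Draw (baseline, post) weights for one arm from a bivariate normal."""
    cov = rho * sd_base * sd_post
    sigma = [[sd_base**2, cov], [cov, sd_post**2]]
    draws = rng.multivariate_normal([mean_base, mean_post], sigma, size=n)
    return draws[:, 0], draws[:, 1]


rng = np.random.default_rng(2021)             # arbitrary seed
b0, p0 = simulate_arm(rng, 60, 86.0, 0.9)     # control arm, smaller and less noisy
b1, p1 = simulate_arm(rng, 120, 83.0, 0.7)    # treatment arm, larger and noisier
y_base = np.concatenate([b0, b1])
y_post = np.concatenate([p0, p1])
group = np.concatenate([np.zeros(60), np.ones(120)])

est, se_ols, se_hc3 = ancova_main_effect(y_post, group, y_base)
print(f"estimate={est:.3f}  OLS SE={se_ols:.3f}  HC3 SE={se_hc3:.3f}")
```

Because the noisier arm is also the larger arm in this design, the model-based standard error would be expected to exceed the HC3 one.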

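Constrained Repeated Measures heterogeneous variance model (“cRM”): We model the baseline and post-treatment weights $(Y_{ijt_0}, Y_{ijt_1})$ jointly using the binary time point $T_{ij}$ and the time by treatment interaction $G_{ij} \times T_{ij}$:

$$Y_{ijt} = \gamma_0^{(8)} + \gamma_1^{(8)} T_{ij} + \gamma_2^{(8)} G_{ij} \times T_{ij} + e_{ijt}^{(8)}, \quad j = 0, 1;\ i = 1, 2, \ldots, n_j, \tag{8}$$

$$\begin{pmatrix} e_{i0t_0}^{(8)} \\ e_{i0t_1}^{(8)} \end{pmatrix} \sim N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \Sigma_0\right) \text{ in the control arm},$$

$$\begin{pmatrix} e_{i1t_0}^{(8)} \\ e_{i1t_1}^{(8)} \end{pmatrix} \sim N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \Sigma_1\right) \text{ in the treatment arm},$$

where $\gamma_0^{(8)} = \mu_{t_0}$, $\gamma_1^{(8)} = \mu_{0t_1} - \mu_{0t_0}$, and $\gamma_2^{(8)} = \tau$. Because subjects in the treatment and control arms have different variance-covariance structures for the association between the pre- and post-treatment weights, we fit a cRM heterogeneous variance GLS model with a group-specific variance-covariance structure (“repeated/group=” in SAS PROC MIXED specifies a distinct variance-covariance structure for each treatment arm). The formulas for $\hat{\gamma}_{2,gls}^{(8)}$ and $\mathrm{var}(\hat{\gamma}_{2,gls}^{(8)})$ are listed in Table 2. The GLS estimator $\hat{\gamma}_{2,gls}^{(8)}$ is asymptotically unbiased for $\gamma_2^{(8)}$. REML is used to derive the empirical or model-based variance estimator $\widehat{\mathrm{var}}_{reml}(\hat{\gamma}_{2,gls}^{(8)})$.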

Results

All treatment effect estimators, except the ANOVA estimator, are expressed as the mean difference in post-treatment measurements adjusted in some way for the chance imbalance in the baseline measurement between the two arms. Nonetheless, all estimators are unbiased for τ. To compare these competing methods, we evaluate the efficiency of the point estimators of the treatment effect by comparing their “unconditional” variances. The test of no treatment effect divides the point estimate by its standard error (i.e., the square root of its estimated variance) and rejects the null hypothesis when this ratio exceeds a given threshold. The method that produces an unbiased point estimate with the smallest unconditional variance is therefore preferred, because the standard error in the denominator of the test statistic determines the statistical power.

When study population is homogeneous

ANCOVAI is a more efficient alternative to ANOVA because $\mathrm{var}(\hat{\beta}_{1,ols}^{(2)}) \le \mathrm{var}(\hat{\beta}_{1,ols}^{(1)})$ (Table 1). This advantage of ANCOVA over ANOVA can also be seen from the fact that the residual error variance of ANCOVAI is no larger than the residual error variance of ANOVA (i.e., $(1-\rho^2)\sigma_1^2 \le \sigma_1^2$). The larger the correlation coefficient ρ, the smaller the variance of the ANCOVAI estimator. Since $Y_{ijt_1}$ and $Y_{ijt_0}$ are highly correlated in general, the inclusion of $Y_{ijt_0}$ in ANCOVAI explains away some variability in $Y_{ijt_1}$, and thus reduces the residual variance and yields a more efficient estimator of the treatment effect than ANOVA.

ANOVA-Change and RM have exactly the same point estimator of τ and thus the same variance (Table 1). To compare ANOVA-Change or RM with ANOVA, we can derive the difference between the unconditional variances of their treatment effect estimators as follows:

$$\Delta_1 = \sigma_0^2 - 2\rho\sigma_0\sigma_1 = \sigma_0\left(\sigma_0 - 2\rho\sigma_1\right).$$

When $\rho < \sigma_0/(2\sigma_1)$, ∆1 > 0 and ANOVA outperforms ANOVA-Change and RM because the ANOVA estimator has the smaller variance. When $\rho > \sigma_0/(2\sigma_1)$, ∆1 < 0 and ANOVA underperforms the other two methods.

It can be shown that the unconditional variance of the ANOVA-Change or RM estimator minus that of the ANCOVAI or cRM estimator is always nonnegative:

$$\Delta_2 = \sigma_1^2 + \sigma_0^2 - 2\rho\sigma_0\sigma_1 - (1-\rho^2)\sigma_1^2 = \left(\sigma_0 - \rho\sigma_1\right)^2 \ge 0.$$

Thus, ANOVA-Change or RM is less efficient than either ANCOVAI or cRM because their estimators have larger variances. Intuitively ANCOVAI or cRM assumes that mean baseline weights in two arms are equal in a randomized study but ANOVA-Change or RM assumes that there is a baseline difference and needs to estimate an extra parameter.
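The ordering of the three approaches can be made concrete with a small calculation. The sketch below is a rough illustration: it assumes the large-sample variance of each estimator is the corresponding per-subject variance term multiplied by the common factor 1/n0 + 1/n1 (an assumption of this sketch), and it plugs in the design values of the homogeneous data example described later in the paper. The function name is hypothetical.

```python
import math


def homogeneous_variances(sd_base, sd_post, rho, n0, n1):
    """Approximate large-sample variances of the treatment effect estimators
    when both arms share the same SDs and the same pre-post correlation."""
    scale = 1.0 / n0 + 1.0 / n1   # assumed common factor for all methods
    return {
        "ANOVA": sd_post**2 * scale,
        "ANOVA-Change/RM": (sd_post**2 + sd_base**2 - 2 * rho * sd_base * sd_post) * scale,
        "ANCOVAI/cRM": (1 - rho**2) * sd_post**2 * scale,
    }


# Design values of the homogeneous example: baseline SD 14 kg, post-treatment
# SD 15 kg, correlation 0.9, and 90 subjects per arm.
variances = homogeneous_variances(14.0, 15.0, 0.9, 90, 90)
for method, v in variances.items():
    print(f"{method:16s} approximate SE = {math.sqrt(v):.3f}")

# Differences in the per-subject variance terms (common factor removed).
scale = 1.0 / 90 + 1.0 / 90
delta1 = (variances["ANOVA-Change/RM"] - variances["ANOVA"]) / scale        # Change minus ANOVA
delta2 = (variances["ANOVA-Change/RM"] - variances["ANCOVAI/cRM"]) / scale  # Change minus ANCOVAI
print(f"delta1 = {delta1:.2f}, delta2 = {delta2:.2f}")
```

With these inputs the approximate standard errors are about 2.24 for ANOVA, 0.98 for ANOVA-Change or RM, and 0.97 for ANCOVAI or cRM, so the change score beats ANOVA here only because the correlation is high, while ANCOVAI is never worse than the change score.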

As shown in Table 1, the ANCOVAI and cRM estimators of τ are equivalent because the slope on the baseline score is $\beta_2^{(2)} = \rho\sigma_0\sigma_1/\sigma_0^2$. However, ANCOVAI plugs in the OLS estimator $\hat{\beta}_{2,ols}^{(2)}$ of this slope, whereas cRM plugs in the REML estimators of the variance and covariance parameters. The numerical difference between $\hat{\beta}_{1,ols}^{(2)}$ and $\hat{\gamma}_{3,gls}^{(4)}$ becomes negligible as the sample size increases. Because of this equivalence between $\hat{\beta}_{1,ols}^{(2)}$ and $\hat{\gamma}_{3,gls}^{(4)}$, $\mathrm{var}(\hat{\beta}_{1,ols}^{(2)})$ and $\mathrm{var}(\hat{\gamma}_{3,gls}^{(4)})$ are equal [3]. As discussed previously, ANCOVAI is a conditional model assuming fixed baseline covariates. Even though the model-based variance estimates are conditional, they are unbiased for the unconditional variance, and thus the usual model-based conditional inference is still valid for unconditional hypothesis testing. ANCOVAI performs comparably to cRM [3, 17].

When study population is heterogeneous

A heterogeneous study population justifies the inclusion of a treatment by baseline weight interaction term. Thus, ANCOVAII is the correctly specified model, whereas ANCOVAI is a mis-specified model. In this case, the “conditional” treatment effect is not constant across different values of baseline weight. The “marginal” treatment effect τ is simply the average of the conditional treatment effect over the distribution of the baseline weight and measures an overall treatment effect. As shown previously, both ANCOVA models can be used to estimate τ even though ANCOVAI is mis-specified. What, then, is the advantage of using a more complex interaction model over a main effect model? It turns out that the ANCOVAII estimator $\hat{\beta}_{1,ols}^{(6)}$ is more efficient than the ANCOVAI estimator $\hat{\beta}_{1,ols}^{(7)}$ because $\mathrm{var}(\hat{\beta}_{1,ols}^{(6)}) \le \mathrm{var}(\hat{\beta}_{1,ols}^{(7)})$ [5]. Only in a balanced design is $\mathrm{var}(\hat{\beta}_{1,ols}^{(6)}) = \mathrm{var}(\hat{\beta}_{1,ols}^{(7)})$, in which case the two ANCOVA models perform comparably. Note that the OLS model-based variance estimates for ANCOVAI and II are both biased for the corresponding unconditional variances, but the HC variance estimators provide simple fixes.
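The link between the conditional and marginal treatment effects can be written out in a few lines. This is a sketch under the assumption that the interaction model uses a baseline centered at its overall mean $\mu_{t_0}$; the coefficients $b_0, \ldots, b_3$ and the function $\tau(y)$ are illustrative notation introduced here, not the paper's numbered model.

$$E\left[Y_{ijt_1} \mid G_{ij}, Y_{ijt_0} = y\right] = b_0 + b_1 G_{ij} + b_2\,(y - \mu_{t_0}) + b_3\, G_{ij}\,(y - \mu_{t_0})$$

$$\tau(y) = E\left[Y_{ijt_1} \mid G_{ij} = 1, y\right] - E\left[Y_{ijt_1} \mid G_{ij} = 0, y\right] = b_1 + b_3\,(y - \mu_{t_0})$$

$$E\left[\tau(Y_{ijt_0})\right] = b_1 + b_3\left(E[Y_{ijt_0}] - \mu_{t_0}\right) = b_1$$

Under this parameterization the conditional effect varies linearly with the baseline weight whenever $b_3 \ne 0$, while its average over the baseline distribution is the single coefficient $b_1$, which plays the role of the marginal effect τ.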

The ANCOVAII and cRM estimators of τ are equivalent because $\beta_2^{(6)} + \beta_3^{(6)} = \rho_0\sigma_0\sigma_{01}/\sigma_0^2$ and $\beta_2^{(6)} = \rho_1\sigma_0\sigma_{11}/\sigma_0^2$ (Table 2). The two methods differ only in how these quantities are estimated: ANCOVAII plugs in the OLS estimators $\hat{\beta}_{2,ols}^{(6)}$ and $\hat{\beta}_{3,ols}^{(6)}$, whereas cRM plugs in the REML estimators of the variance and covariance parameters. The numerical difference between the ANCOVAII and cRM estimators becomes smaller as the sample size increases. As discussed previously, standard statistical software such as SAS does not output the unconditional variance for ANCOVAII directly, and the usual OLS model-based standard errors and p-values are biased for unconditional inference in the heterogeneous scenario. The adjusted HC variance estimator fixes this bias. The corrected ANCOVAII provides valid unconditional inference and performs comparably to cRM. An alternative way to estimate the variances of the ANCOVAI and II estimators is the bootstrap [20].
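As a rough illustration of the bootstrap alternative, the sketch below estimates the standard error of the ANCOVAI treatment effect by resampling subjects. The resampling scheme (whole subjects drawn with replacement within each arm, so that n0 and n1 stay fixed), the function names, and the simulated inputs are assumptions of this sketch; the paper's own bootstrap settings are in its SAS programs.

```python
import numpy as np


def ancova_effect(y_post, group, y_base):
    """OLS treatment coefficient from y_post ~ 1 + group + y_base."""
    X = np.column_stack([np.ones_like(y_base), group, y_base])
    beta, *_ = np.linalg.lstsq(X, y_post, rcond=None)
    return beta[1]


def bootstrap_se(y_post, group, y_base, n_boot=1000, seed=0):
    """Bootstrap SE of the ANCOVAI effect, resampling subjects within arm."""
    rng = np.random.default_rng(seed)
    idx0 = np.flatnonzero(group == 0)
    idx1 = np.flatnonzero(group == 1)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        # Each draw keeps a subject's baseline and post values together.
        take = np.concatenate([
            rng.choice(idx0, size=idx0.size, replace=True),
            rng.choice(idx1, size=idx1.size, replace=True),
        ])
        estimates[b] = ancova_effect(y_post[take], group[take], y_base[take])
    return estimates.std(ddof=1)


def draw_arm(rng, n, mean_post, rho):
    """Illustrative bivariate normal weights (baseline SD 14, post SD 15)."""
    cov = rho * 14.0 * 15.0
    xy = rng.multivariate_normal([88.0, mean_post],
                                 [[14.0**2, cov], [cov, 15.0**2]], size=n)
    return xy[:, 0], xy[:, 1]


rng = np.random.default_rng(3)            # arbitrary seed
b0, p0 = draw_arm(rng, 60, 86.0, 0.9)     # control arm
b1, p1 = draw_arm(rng, 120, 83.0, 0.7)    # treatment arm
y_base = np.concatenate([b0, b1])
y_post = np.concatenate([p0, p1])
group = np.concatenate([np.zeros(60), np.ones(120)])

print(f"estimate     = {ancova_effect(y_post, group, y_base):.3f}")
print(f"bootstrap SE = {bootstrap_se(y_post, group, y_base):.3f}")
```

Because the baseline values are resampled along with the outcomes, the spread of the bootstrap estimates reflects the unconditional rather than the conditional variability of the estimator.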

Data example

No human data were used in this study. Instead, we simulated three weight loss trial data sets based on a published study [21], one for each of three scenarios: homogeneous data, heterogeneous data with a balanced design, and heterogeneous data with an unbalanced design. The data were generated as follows:

  1. The baseline weights for the control and treatment arms were generated from a normal distribution with mean 88 kg and standard deviation 14 kg. Weights at 6 months after treatment in the control arm have mean 86 kg and standard deviation 15 kg, which gives a ~ 2.3% change from baseline. The mean and standard deviation of body weight at 6 months in the treatment arm are 83 kg and 15 kg, respectively, which corresponds to a 5.7% change from baseline.

  2. In the homogeneous data, the correlation coefficient between the pre- and post-treatment weights is 0.9. One hundred eighty subjects were assigned equally to the treatment and control arms. In the heterogeneous data, the correlation coefficient between the pre- and post-treatment weights is 0.9 in the control arm and 0.7 in the treatment arm. Sample sizes are (n0 = 90, n1 = 90) for the balanced design and (n0 = 60, n1 = 120) for the unbalanced design. We analyzed the data examples using the methods outlined in the Methods section. The statistical results are reported in Table 3 (SAS programs are provided in Additional file 1).
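The data-generating steps above can be sketched in Python as follows. This is not the author's SAS program: the bivariate normal form of the joint distribution, the function name, and the seeds are assumptions made for illustration, while the means, standard deviations, correlations, and sample sizes are those stated in the list.

```python
import numpy as np

BASE_MEAN, BASE_SD, POST_SD = 88.0, 14.0, 15.0    # kg, as stated in the text
POST_MEAN = {0: 86.0, 1: 83.0}                    # control arm, treatment arm


def simulate_trial(n0, n1, rho0, rho1, seed=0):
    """Simulate baseline and 6-month weights for a two-arm trial.

    rho0 and rho1 are the pre-post correlations in the control and
    treatment arms; equal values give the homogeneous scenario.
    """
    rng = np.random.default_rng(seed)
    base, post, group = [], [], []
    for arm, n, rho in ((0, n0, rho0), (1, n1, rho1)):
        cov = rho * BASE_SD * POST_SD
        sigma = [[BASE_SD**2, cov], [cov, POST_SD**2]]
        draws = rng.multivariate_normal([BASE_MEAN, POST_MEAN[arm]], sigma, size=n)
        base.append(draws[:, 0])
        post.append(draws[:, 1])
        group.append(np.full(n, arm))
    return np.concatenate(base), np.concatenate(post), np.concatenate(group)


scenarios = {
    "homogeneous": simulate_trial(90, 90, 0.9, 0.9, seed=1),
    "heterogeneous, balanced": simulate_trial(90, 90, 0.9, 0.7, seed=2),
    "heterogeneous, unbalanced": simulate_trial(60, 120, 0.9, 0.7, seed=3),
}

for name, (base, post, group) in scenarios.items():
    diff = post[group == 1].mean() - post[group == 0].mean()
    print(f"{name}: n={base.size}, unadjusted post-treatment difference={diff:.2f} kg")

# Percent change from baseline implied by the design means.
for arm, label in ((0, "control"), (1, "treatment")):
    pct = 100 * (BASE_MEAN - POST_MEAN[arm]) / BASE_MEAN
    print(f"{label}: {pct:.1f}% mean weight loss")
```

The last loop reproduces the 2.3% and 5.7% changes from baseline quoted for the control and treatment arms.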

Table 3. Statistical analysis of the three simulated data examples

Scenario                             Method          Estimate   Standard error   p-value
Homogeneous                          ANOVA           −3.089     2.106            0.144
                                     ANCOVAI         −2.422     0.955            0.0121
                                     ANOVA-Change    −2.354     0.971            0.0163
                                     RM              −2.354     0.971            0.0163
                                     cRM             −2.434     0.944            0.0108
Heterogeneous (n0 = 90, n1 = 90)     ANCOVAI         −3.203     1.403 a          0.0235
                                                                1.397 b          0.0231
                                                                1.400 d          n/a
                                     ANCOVAII        −3.165     1.333 a          0.0187
                                                                1.402 c          0.0252
                                                                1.397 d          n/a
                                     cRM             −3.203     1.405            0.0241
Heterogeneous (n0 = 60, n1 = 120)    ANCOVAI         −3.416     1.415 a          0.0167
                                                                1.279 b          0.0083
                                                                1.281 d          n/a
                                     ANCOVAII        −3.399     1.376 a          0.0145
                                                                1.258 c          0.0076
                                                                1.260 d          n/a
                                     cRM             −3.396     1.262            0.0078

a OLS regression model-based standard error

b HC standard error for ANCOVA I (main effect) model

c Modified HC standard error for ANCOVA II (interaction) model

d Bootstrapping standard error (n = 5000)

In the first data example, ANOVA produced the largest standard error and the largest p-value. ANOVA-Change and RM both outperformed ANOVA with much smaller standard errors and p-values. ANCOVAI and cRM outperformed ANOVA-Change and RM with smaller standard errors and p-values. Although ANCOVAI and cRM are equivalent when the sample size is large, there are still minor numerical differences between the two in finite samples.

For the second data example with a balanced design, Fig. 2a shows that there is a strong baseline weight by treatment interaction. Both ANCOVAI and II have heteroscedastic errors by treatment arm (Fig. 2b and c). As shown in Table 3, the OLS model-based standard error of ANCOVAI is very similar to its HC and bootstrap standard errors. Thus, heteroscedasticity does not bias the model-based standard error of ANCOVAI. Although ANCOVAII is robust against heteroscedasticity in the balanced design, the OLS model-based standard error of ANCOVAII (s.e. = 1.333) is still not correct because OLS fails to account for the variability in estimating the overall mean baseline weight. The adjusted HC standard error for ANCOVAII is 1.402, which is closer to the model-based and HC standard errors of ANCOVAI. The bootstrap standard errors for ANCOVAI and II are close to their HC or adjusted HC standard errors, which suggests that the HC and adjusted HC variances perform well in estimating the unconditional variances. The cRM estimate and its standard error are close to those from ANCOVAI and II.

Fig. 2.


Diagnostic plots of the ANCOVA main effect and interaction models in the heterogeneous scenario. a Scatter plot of baseline and follow-up weights in the balanced design. Black and red solid dots are data points in the treatment and control arms. Black and red solid lines are the fitted regression lines of follow-up weight on baseline weight in the treatment and control arms. b Boxplot of residuals from the treatment and control arms from the ANCOVAI model in the balanced design. c Boxplot of residuals from the treatment and control arms from the ANCOVAII model in the balanced design. d Scatter plot of baseline and follow-up weights in the unbalanced design. Black and red solid dots are data points in the treatment and control arms. Black and red solid lines are the fitted regression lines of follow-up weight on baseline weight in the treatment and control arms. e Boxplot of residuals from the treatment and control arms from the ANCOVAI model in the unbalanced design. f Boxplot of residuals from the treatment and control arms from the ANCOVAII model in the unbalanced design.

For the third example with an unbalanced design, Fig. 2d also reveals a baseline weight by treatment interaction. Both ANCOVA models have heteroscedastic errors by treatment arm (Fig. 2e and f). The model-based standard errors of ANCOVAI and II are not valid: they were larger than the HC standard errors and thus overestimated the true conditional variability of the estimators. Compared with ANCOVAI, ANCOVAII has a smaller HC standard error (and a smaller p-value) and thus is slightly more efficient. The adjusted HC standard error for ANCOVAII is very close to the model-based standard error for cRM. The bootstrap standard errors for ANCOVAI and II are very close to their HC or adjusted HC standard errors.

Discussion

In this study we compare the efficiency of six unbiased methods for analyzing pre-post designs. We found that ANCOVA and cRM are equally the most efficient methods in both homogeneous and heterogeneous scenarios. We focus on the scenario in which randomization is properly performed and these competing methods all target the same causal quantity. In scenarios where the treatment is not properly randomized or not randomized at all (e.g., in an observational study), the baseline score will not be balanced by design, and these competing methods may target different causal quantities. The debate over using change-score analysis (or RM) versus ANCOVA in the non-randomized setting, generally known as Lord’s paradox, is a well-known example [22, 23].

The majority of previous studies have examined only homogeneous study populations. In this setting, ANOVA is one of the least efficient approaches for analyzing pre-post designs because it does not utilize any baseline information. ANOVA-Change and RM incorporate the baseline score as part of the outcome, whereas ANCOVAI includes the baseline score as a covariate. ANCOVAI outperforms ANOVA-Change and RM because ANCOVAI utilizes the assumption that the baseline scores are balanced between the two arms in a randomized study. Thus, the change score is a less efficient way to utilize the baseline score than including the baseline score as a covariate. Since we can seldom control the values of the baseline score in randomized trials, the OLS assumption that the baseline score is fixed casts doubt on the validity of ANCOVA for hypothesis testing [6, 12]. Crager proved that ANCOVAI is valid for unconditional inference in the homogeneous scenario [6]. This conclusion follows from the fact that the conditional variance of the ANCOVAI estimator is an unbiased estimate of its unconditional variance [3].

A few studies further investigated a heterogeneous scenario [3, 4, 10, 12, 24]. Although the heterogeneity justifies the inclusion of the baseline measurement by treatment interaction term, ANCOVAI and II are both unbiased. Yang and Tsiatis showed that the ANCOVAII estimator has a smaller unconditional variance than the ANCOVAI estimator except in a balanced design [9]. However, the OLS model-based variances of the ANCOVAI and II estimators, reported by standard statistical software, are conditional variances, not unconditional variances. The OLS model-based standard errors and associated p-values for ANCOVAII are generally questionable for unconditional inference, and the model-based inference for ANCOVAI is biased only when the design is unbalanced [3, 4, 10, 24]. With the corrected HC variance estimators, both models provide valid unconditional inference. Choosing between ANCOVAI and II then becomes a trade-off between simplicity and some gain in efficiency.

In the homogeneous setting, cRM was suggested as a superior choice to ANCOVAI because the unconditional variance of the cRM estimator is smaller than the conditional variance of the ANCOVAI estimator [25]. Kenward et al. pointed out that such a direct comparison between conditional and unconditional variances is not meaningful. Since both estimators are equivalent, it can be shown that cRM coupled with REML and the Kenward-Roger adjustment performs almost identically to ANCOVAI in finite samples [17]. In the heterogeneous scenario, cRM is comparable to ANCOVAII [3]. In the presence of missing data, applied researchers often prefer cRM over ANCOVA because it can utilize all observed data, whereas ANCOVA uses only complete cases. However, imputation methods that utilize the strong pre-post correlation, such as weighting and regression imputation, can improve the statistical power of ANCOVA without biasing estimates, making it comparable to cRM [17].

Furthermore, ANCOVA has several advantages over cRM. First, an outcome should be a variable that can be influenced by treatment, and a baseline measurement is certainly not an outcome by this definition. It is conceptually more appropriate to include the baseline score as a covariate than to model it as an outcome [5]. Second, it is very convenient to include other baseline variables in a regression model to obtain more efficient estimates of the treatment effect. Third, it is easy to adjust for other patterns of heteroscedastic errors in an OLS regression. For example, we may expect larger variability in the post-treatment weights associated with larger baseline weights. cRM cannot easily handle this more complex type of heteroscedasticity. HC variance estimators for ANCOVA are simple fixes and are readily implemented in statistical software.

Conclusion

Compared with the alternative methods, ANCOVA is a simple and the most efficient approach for analyzing a pre-post randomized design. When there is a baseline score by treatment interaction, we need to assess the heteroscedasticity of ANCOVA, particularly when the design is not balanced. The HC variances should be used for valid inference when heteroscedasticity is present. Adding an interaction term to ANCOVA can gain some efficiency, but omitting this term does not bias the results.

Supplementary Information

Additional file 1. (21.7KB, zip)

Acknowledgements

Not applicable

Abbreviations

ANOVA

Analysis of variance model

ANCOVAI

Analysis of covariance model adjusting for the baseline measurement

ANCOVAII

Analysis of covariance model adjusting for the baseline measurement by treatment interaction

RM

Repeated measures model

cRM

Constrained repeated measures model

HC

Heteroscedasticity-consistent

Author’s contributions

FW developed the idea for the paper, performed the analysis, and drafted the manuscript. The author read and approved the final manuscript.

Funding

Not applicable

Availability of data and materials

SAS code is provided as Additional file 1. No real data were used. All data generated or analyzed during this study are included in this published article and its supplementary information files.

Declarations

Ethics approval and consent to participate

Not applicable

Consent for publication

Not applicable

Competing interests

The author declares that he has no competing interests.

Footnotes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1. Vickers AJ. Analysis of variance is easily misapplied in the analysis of randomized trials: a critique and discussion of alternative statistical approaches. Psychosom Med. 2005;67(4):652–655. doi: 10.1097/01.psy.0000172624.52957.a8.
  • 2. O'Connell NS, Dai L, Jiang Y, Speiser JL, Ward R, Wei W, Carroll R, Gebregziabher M. Methods for analysis of pre-post data in clinical research: a comparison of five common methods. J Biom Biostat. 2017;8(1):1–8. doi: 10.4172/2155-6180.1000334.
  • 3. Wan F. Analyzing pre-post randomized studies with one post-randomization score using repeated measures and ANCOVA models. Stat Methods Med Res. 2019;28(10-11):2952–2974. doi: 10.1177/0962280218789972.
  • 4. Wan F. Analyzing pre-post designs using the analysis of covariance models with and without the interaction term in a heterogeneous study population. Stat Methods Med Res. 2020;29(1):189–204. doi: 10.1177/0962280219827971.
  • 5. Senn S. Change from baseline and analysis of covariance revisited. Stat Med. 2006;25(24):4334–4344. doi: 10.1002/sim.2682.
  • 6. Crager MR. Analysis of covariance in parallel-group clinical trials with pretreatment baseline. Biometrics. 1987;43(4):895–901. doi: 10.2307/2531543.
  • 7. Frison L, Pocock SJ. Repeated measures in clinical trials: analysis using mean summary statistics and its implications for design. Stat Med. 1992;11(13):1685–1704. doi: 10.1002/sim.4780111304.
  • 8. Brogan DR, Kutner MH. Comparative analyses of pretest-posttest research designs. Am Stat. 1980;34:229–232.
  • 9. Yang L, Tsiatis AA. Efficiency study of estimators for a treatment effect in a pretest-posttest trial. Am Stat. 2001;55(4):314–321. doi: 10.1198/000313001753272466.
  • 10. Winkens B, van Breukelen GJ, Schouten HJ, Berger MP. Randomized clinical trials with a pre- and a post-treatment measurement: repeated measures versus ANCOVA models. Contemp Clin Trials. 2007;28(6):713–719. doi: 10.1016/j.cct.2007.04.002.
  • 11. Dimitrov DM, Rumrill J, Phillip D. Pretest-posttest designs and measurement of change. Work. 2003;20:159–165.
  • 12. Chen X. The adjustment of random baseline measurements in treatment effect estimation. J Stat Plan Inference. 2006;136(12):4161–4175. doi: 10.1016/j.jspi.2005.08.046.
  • 13. Jennings E. Models for pretest-posttest data: repeated measures ANOVA revisited. J Educ Behav Stat. 1988;13(3):273–280. doi: 10.3102/10769986013003273.
  • 14. Liang K, Zeger S. Longitudinal data analysis of continuous and discrete responses for pre-post designs. Sankhya. 2000;62:134–148.
  • 15. Donald AB, Gregory DA. Symmetrized percent change for treatment comparisons. Am Stat. 2006;60:27–31. doi: 10.1198/000313006X90684.
  • 16. Cole TJ, Altman DG. Statistics notes: what is a percentage difference? BMJ. 2017;358:j3663. doi: 10.1136/bmj.j3663.
  • 17. Kenward MG, White IR, Carpenter JR. Re: should baseline be a covariate or dependent variable in analyses of change from baseline in clinical trials? (Liu GF et al., Stat Med 2009; 28: 2509–30) Stat Med. 2010;29(13):1455–1456. doi: 10.1002/sim.3868.
  • 18. Kenward MG, Roger JH. Small sample inference for fixed effects from restricted maximum likelihood. Biometrics. 1997;53(3):983–997. doi: 10.2307/2533558.
  • 19. Long J, Ervin L. Using heteroscedasticity: consistent standard errors in the linear regression model. Am Stat. 2000;54:217–224.
  • 20. Efron B, Tibshirani RJ. An introduction to the bootstrap. New York: Chapman & Hall; 1993.
  • 21. Boozer CN, Daly PA, Homel P, et al. Herbal ephedra/caffeine for weight loss: a 6-month randomized safety and efficacy trial. Int J Obes Relat Metab Disord. 2002;6:593–604. doi: 10.1038/sj.ijo.0802023.
  • 22. Lord FM. A paradox in the interpretation of group comparisons. Psychol Bull. 1967;68(5):304–305. doi: 10.1037/h0025105.
  • 23. Egbewale BE, Lewis M, Sim J. Bias, precision and statistical power of analysis of covariance in the analysis of randomized trials with baseline imbalance: a simulation study. BMC Med Res Methodol. 2014;14(1):49. doi: 10.1186/1471-2288-14-49.
  • 24. Senn S. Various varying variances: the challenge of nuisance parameters to the practicing biostatistician. Stat Methods Med Res. 2015;24(4):403–419. doi: 10.1177/0962280214520728.
  • 25. Liu GF, Lu K, Mogg R, Mallick M, Mehrotra DV. Should baseline be a covariate or dependent variable in analyses of change from baseline in clinical trials? Stat Med. 2009;28(20):2509–2530. doi: 10.1002/sim.3639.


Articles from BMC Medical Research Methodology are provided here courtesy of BMC
