Statistical Evaluation of Absolute Change versus Responder Analysis in Clinical Trials

Peijin Wang; Sarah Peskoe; Rebecca Byrd; Patrick Smith; Rachel Breslin; Shein-Chung Chow

doi:10.15212/amm-2022-0020

Author manuscript; available in PMC: 2023 Jun 2.

Published in final edited form as: Acta Mater Med. 2022 Aug 26;1(3):320–332. doi: 10.15212/amm-2022-0020

PMCID: PMC10237148 NIHMSID: NIHMS1833344 PMID: 37274016

Abstract

In clinical trials, the primary analysis is often either a test of absolute/relative change in a measured outcome or a corresponding responder analysis. Though each of these tests may be reasonable, determining which test is most suitable for a particular research study is still an open question. These tests may require different sample sizes, define different clinically meaningful differences, and most importantly, lead to different study conclusions. This paper aims to compare a typical non-inferiority test using absolute change as the study endpoint to the corresponding responder analysis in terms of sample size requirements, statistical power, and hypothesis testing results. From numerical analysis, using absolute change as an endpoint generally requires a larger sample size; therefore, when the sample size is the same, the responder analysis has higher power. The cut-off value and non-inferiority margin are critical, as they can meaningfully affect whether the two types of endpoints yield conflicting conclusions. Specifically, an extreme cut-off value is more likely to cause different conclusions. However, this impact decreases as population variance increases. One important reason for conflicting conclusions is that the population distribution is not normal. To avoid conflicting results, researchers should pay attention to the population distribution and cut-off value selection.

Keywords: Primary Endpoints, Responder Analysis, Threshold Selection

1. Introduction

In clinical trials, an analysis of a primary study endpoint is often conducted to determine whether the intended study will achieve its objective with a desired statistical power. In practice, investigators can consider four different kinds of primary endpoints or outcomes based on this single study objective: (i) absolute change (i.e., endpoint absolute change from baseline), (ii) relative change (e.g., endpoint percent change from baseline), (iii) responder analysis based on absolute change (i.e., an individual subject is defined as a responder if his/her absolute change in the primary endpoint has exceeded a pre-specified threshold known as a clinically meaningful improvement), and (iv) responder analysis based on relative change. Although analyses based on these endpoints all sound reasonable, the following issues are often of great concern to principal investigators (Chow, 2011). First, clinically meaningful differences (improvements) of these derived endpoints may not directly translate to one another. Second, these derived endpoints generally have different sample size requirements. Third, and most importantly, these derived endpoints may not arrive at the same statistical conclusion (based on the same data set). As a result, it is of particular interest to determine which type of primary endpoint is most appropriate and can best inform the disease status and treatment effect.

Some scholars have criticized responder analysis for its loss of information, i.e., the statistical power of a trial is reduced when a continuous outcome is categorized into a binary variable (Snapinn and Jiang, 2007; Henschke et al., 2014). Though responder analysis may cost power, it still has its own uses. For example, we may need to use the original-scale (continuous) outcome to make a binary decision, such as whether the patient should be hospitalized. In a heterogeneous disease, a subset of patients may benefit more than others, in which case the distribution of the outcome variable would not be normal (Jones et al., 2016). If the trial is to investigate an additional second agent and the proportion of patients with more benefit is of greater interest, a responder analysis is more suitable (Jones et al., 2016). Weighing these benefits and drawbacks, Henschke et al. (2014) suggested using responder analysis as a secondary analysis to better interpret findings from the main analysis. However, it should be noted that an analysis using absolute change as the endpoint and the corresponding responder analysis have different statistical properties. Hence, it is of great importance to investigate their differences in terms of statistical power, sample size, and conclusions.

To study the relative performance of these derived endpoints, in addition to mathematical derivations, we conduct a numerical study and a real case study, using data from a recent rehabilitation program study in lung transplant candidates and recipients (Byrd et al., 2022). One of the commonly used clinical indicators for patients with pulmonary disease is 6-minute walk distance (6MWD), which can be used not only as a prognostic factor but also as a health outcome variable (Tuppin et al., 2008). For example, 6MWD has been used to measure functional status and exercise capacity of lung transplant recipients or patients (Martinu et al., 2008; Munro et al., 2009). Some studies have used change from baseline (absolute change) in 6MWD as the outcome variable to evaluate the performance of pulmonary disease treatment (Ryerson et al., 2014), while others have considered a responder analysis using 6MWD (Gilbert et al., 2009; Stoilkova-Hartmann et al., 2015). An individual may be defined as a responder if he/she meets a pre-specified threshold of improvement in 6MWD; otherwise he/she is defined as a non-responder. As an example, Stoilkova-Hartmann et al. (2015) classified patient performance after rehabilitation using the following criteria: a 6MWD increment ≥ 50 m is considered good, ≥ 25 to < 50 m is moderate, and < 25 m indicates a non-responder. However, in another study, Holland et al. (2014) reported that the minimal important difference of change in 6MWD in chronic respiratory disease was 25 to 33 m. Holland et al. (2017) used 25 m as the threshold for equivalence in the change of 6MWD. Though a range of 25 to 30 m is generally accepted, the exact threshold of change in 6MWD that is considered clinically meaningful remains debatable.

In this case study, for simplicity, we focus on the statistical evaluation of a rehabilitation program in lung transplant candidates and recipients in terms of absolute change of 6MWD and the responder analysis based on a pre-specified threshold (improvement) of 6MWD using absolute change. A comparison between the absolute change and responder analysis with various pre-specified thresholds is made in terms of sample size requirement and statistical power. In the next section, we present statistical methods for an analysis using absolute change as the study endpoint as well as the corresponding responder analysis. Additionally, we compare the performance of these study endpoints in terms of statistical power, sample size, and study results/conclusions. In Section 3, we discuss a numerical analysis of the comparison between absolute change and responder analysis and the case study of the rehabilitation program in lung transplant candidates and recipients. Brief concluding remarks and recommendations are given in the last section of this article.

2. Methods

2.1. Hypothesis Testing for Efficacy

In a randomized clinical trial evaluating the performance of a new drug or a new treatment as compared to an active control (e.g., standard of care), non-inferiority testing is commonly considered. The success of a non-inferiority trial depends upon the selection of the study endpoint and the non-inferiority margin. As indicated earlier, for a given study endpoint, there are four types of primary endpoints, namely, absolute change (e.g., endpoint change from baseline), relative change (e.g., endpoint percent change from baseline), responder analysis based on a pre-specified improvement (threshold) of absolute change, and responder analysis based on a pre-specified improvement (threshold) of relative change. As a result, the inference from a responder analysis is very sensitive to the pre-specified threshold (cutoff) value (Chow and Song, 2015). For simplicity and illustration purposes, in this article, we examine the performance of two of these primary endpoints: absolute change and the corresponding responder analysis.

We assume a two-arm parallel randomized clinical trial comparing a test treatment (T) and an active control (C) with a 1:1 treatment allocation ratio. Let W_1ij and W_2ij be the original responses of the ith patient in the jth treatment group at baseline and post-treatment, respectively, where i = 1, …, n_j and j = C, T. Furthermore, W_1ij is assumed to follow a lognormal distribution $LN (μ_{j}, σ_{j}^{2})$, and W_2ij = W_1ij(1 + Δ_ij), where $Δ_{i j} \sim LN (μ_{Δ_{j}}, σ_{Δ_{j}}^{2})$. Hence, the absolute change from baseline is

W_{2 i j} - W_{1 i j} = W_{1 i j} Δ_{i j} \sim LN (μ_{j} + μ_{Δ_{j}}, σ_{j}^{2} + σ_{Δ_{j}}^{2}),

(1)

where W_1ij and Δ_ij are assumed to be independent. Let X_ij = log(W_2ij − W_1ij) represent the log absolute change; then $X_{i j} \sim N (μ_{j} + μ_{Δ_{j}}, σ_{j}^{2} + σ_{Δ_{j}}^{2})$. Let x_ij denote the observations of the random variable X_ij. The reason for using W_1ij and W_2ij, rather than directly using a normally distributed X_ij, is that the same notation can also describe relative change. For example, $Y_{i j} = \log (\frac{W_{2 i j} - W_{1 i j}}{W_{1 i j}})$ represents the log relative change, which follows $N (μ_{Δ_{j}}, σ_{Δ_{j}}^{2})$. Though the relative change endpoint is not the focus of this paper, this notation will benefit future studies.

The outcome variable for responder analysis based on a pre-specified absolute change is then given by $r_{A_{j}} = \frac{\# \{x_{i j} > c_{1}\}}{n_{j}}$, the proportion of observations in group j that exceed the cutoff value c₁. The endpoint for the responder analysis is then p_{A_j} = E[r_{A_j}]. For a sufficiently large sample size, it can be verified that r_{A_j} asymptotically follows $N (p_{A_{j}}, \frac{p_{A_{j}} (1 - p_{A_{j}})}{n_{j}})$ (Chow, 2011). According to the definition of p_{A_j},

p_{A_{j}} = E [r_{A_{j}}] = P (X_{i j} > c_{1}) = P (\frac{X_{i j} - (μ_{j} + μ_{Δ_{j}})}{\sqrt{σ_{j}^{2} + σ_{Δ_{j}}^{2}}} > \frac{c_{1} - (μ_{j} + μ_{Δ_{j}})}{\sqrt{σ_{j}^{2} + σ_{Δ_{j}}^{2}}}) = 1 - Φ (\frac{c_{1} - (μ_{j} + μ_{Δ_{j}})}{\sqrt{σ_{j}^{2} + σ_{Δ_{j}}^{2}}}),

(2)

where Φ(·) is the cumulative distribution function (CDF) of the standard normal distribution. The hypotheses for non-inferiority testing based on the derived endpoint of absolute change and the corresponding responder analysis can be set up as follows.

Absolute change:
$H_{0} : (μ_{C} + μ_{Δ_{C}}) - (μ_{T} + μ_{Δ_{T}}) \geq δ_{1} \quad \text{vs.} \quad H_{A} : (μ_{C} + μ_{Δ_{C}}) - (μ_{T} + μ_{Δ_{T}}) < δ_{1},$ (3)
where δ₁ is the non-inferiority margin in hypothesis testing using absolute change.
Responder analysis based on a pre-specified threshold (improvement) of absolute change:
$H_{0} : p_{A_{C}} - p_{A_{T}} \geq δ_{2} \quad \text{vs.} \quad H_{A} : p_{A_{C}} - p_{A_{T}} < δ_{2},$ (4)
where δ₂ is the non-inferiority margin in hypothesis testing using responder analysis.
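As a minimal sketch of the responder endpoint in Equation (2), the following Python snippet evaluates p_A for a normally distributed log absolute change and compares it with the sample responder proportion r_A from simulated data. The function names, the seed, and the parameter values (mean 0.2, variance 1.0, cutoff 0.1) are illustrative assumptions, not part of the original analysis.

```python
import random
from statistics import NormalDist


def responder_probability(mean, variance, cutoff):
    """P(X > cutoff) for X ~ N(mean, variance), following Equation (2)."""
    return 1.0 - NormalDist(mu=mean, sigma=variance ** 0.5).cdf(cutoff)


def responder_rate(values, cutoff):
    """Sample proportion of observations strictly above the cutoff (r_A)."""
    return sum(v > cutoff for v in values) / len(values)


if __name__ == "__main__":
    rng = random.Random(2022)  # arbitrary seed, used only for reproducibility
    mean, variance, cutoff = 0.2, 1.0, 0.1  # illustrative values (assumed)
    sample = [rng.gauss(mean, variance ** 0.5) for _ in range(5000)]
    print("p_A (Equation 2):", responder_probability(mean, variance, cutoff))
    print("r_A (simulated): ", responder_rate(sample, cutoff))
```

With a large simulated sample, r_A should be close to p_A, which is the large-sample behavior the normal approximation for r_A relies on.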

For a non-inferiority test based on the derived endpoint of absolute difference, the Z test statistic under the null hypothesis in Equation (3) is given by

Z_{1} = \frac{{\bar{x}}_{T} - {\bar{x}}_{C} + δ_{1}}{\sqrt{\frac{σ_{T}^{2} + σ_{Δ_{T}}^{2}}{n_{1}} + \frac{σ_{C}^{2} + σ_{Δ_{C}}^{2}}{n_{1}}}} = \frac{{\bar{x}}_{T} - {\bar{x}}_{C} + δ_{1}}{\sqrt{\frac{σ_{T}^{2} + σ_{Δ_{T}}^{2} + σ_{C}^{2} + σ_{Δ_{C}}^{2}}{n_{1}}}} ~ N (0, 1),

(5)

where ${\bar{x}}_{T}$ and ${\bar{x}}_{C}$ are the sample means of absolute change in the treatment and control groups, and n₁ is the sample size of each group, assuming the allocation ratio is 1:1. Let δ_1A denote the true mean difference. The corresponding statistical power can be written as

\text{power}_{1} = P (\text{Reject } H_{0} \mid H_{A} \text{ is true}) = P (Z_{1} > z_{1 - α} \mid {\bar{x}}_{T} - {\bar{x}}_{C} = δ_{1 A}) = P ({\bar{x}}_{T} - {\bar{x}}_{C} > z_{1 - α} \sqrt{\frac{σ_{T}^{2} + σ_{Δ_{T}}^{2} + σ_{C}^{2} + σ_{Δ_{C}}^{2}}{n_{1}}} - δ_{1} \mid {\bar{x}}_{T} - {\bar{x}}_{C} = δ_{1 A}) = P (\frac{{\bar{x}}_{T} - {\bar{x}}_{C} - δ_{1 A}}{\sqrt{\frac{σ_{T}^{2} + σ_{Δ_{T}}^{2} + σ_{C}^{2} + σ_{Δ_{C}}^{2}}{n_{1}}}} > \frac{z_{1 - α} \sqrt{\frac{σ_{T}^{2} + σ_{Δ_{T}}^{2} + σ_{C}^{2} + σ_{Δ_{C}}^{2}}{n_{1}}} - δ_{1} - δ_{1 A}}{\sqrt{\frac{σ_{T}^{2} + σ_{Δ_{T}}^{2} + σ_{C}^{2} + σ_{Δ_{C}}^{2}}{n_{1}}}}) = 1 - Φ (z_{1 - α} - \frac{δ_{1} + δ_{1 A}}{\sqrt{\frac{σ_{T}^{2} + σ_{Δ_{T}}^{2} + σ_{C}^{2} + σ_{Δ_{C}}^{2}}{n_{1}}}}) .

(6)
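The closed form in Equation (6) can be evaluated directly. The sketch below is one possible implementation, in which `var_t` and `var_c` stand for the total variances σ_T² + σ_ΔT² and σ_C² + σ_ΔC²; the function name and argument names are assumptions introduced here for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def power_ac(delta1, delta1a, var_t, var_c, n_per_group, alpha=0.05):
    """Power of the absolute-change non-inferiority test, Equation (6).

    delta1      : non-inferiority margin
    delta1a     : true mean difference (treatment minus control)
    var_t, var_c: total variance of the log absolute change in each arm
    """
    standard_error = ((var_t + var_c) / n_per_group) ** 0.5
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha)
    return 1.0 - STD_NORMAL.cdf(z_crit - (delta1 + delta1a) / standard_error)


if __name__ == "__main__":
    # Illustrative values (assumed): margin 0.25, true difference 0.2.
    for n in (25, 50, 100, 200):
        print(n, round(power_ac(0.25, 0.2, 1.0, 1.0, n), 3))
```

When δ₁ + δ_1A = 0 the expression reduces to the significance level α, and the power increases with the per-group sample size, as expected from the formula.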

The sample size requirement for the non-inferiority test using absolute difference can then be obtained as follows

n_{1} = \frac{2 {(z_{1 - α} + z_{β})}^{2} (σ_{T}^{2} + σ_{Δ_{T}}^{2} + σ_{C}^{2} + σ_{Δ_{C}}^{2})}{{[(μ_{C} + μ_{Δ_{C}}) - (μ_{T} + μ_{Δ_{T}}) - δ_{1}]}^{2}} .

(7)
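A small sketch of Equation (7) as written is given below. It assumes that z_β denotes the upper β quantile of N(0,1), so that a desired power of 0.80 corresponds to `inv_cdf(0.80)`; the function and argument names are illustrative.

```python
import math
from statistics import NormalDist


def sample_size_ac(mean_t, mean_c, var_t, var_c, delta1, alpha=0.05, power=0.80):
    """Sample size from Equation (7) for the absolute-change endpoint.

    mean_t, mean_c: mu_j + mu_Delta_j in the treatment and control arms
    var_t, var_c  : sigma_j^2 + sigma_Delta_j^2 in each arm
    Assumes z_beta is the upper-beta quantile, i.e. inv_cdf(power).
    """
    z = NormalDist().inv_cdf
    numerator = 2.0 * (z(1.0 - alpha) + z(power)) ** 2 * (var_t + var_c)
    return numerator / (mean_c - mean_t - delta1) ** 2


if __name__ == "__main__":
    n1 = sample_size_ac(0.2, 0.0, 1.0, 1.0, 0.25)
    print("Equation (7):", n1)
    print("rounded up and doubled:", 2 * math.ceil(n1))
```

Under these assumptions, rounding the result up and doubling it gives 246 for a treatment mean of 0.2, unit variances, and δ₁ = 0.25, which agrees with the first entry of Table 1. This suggests, without confirming, that the tabulated values are totals over both arms.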

For a non-inferiority test of responder analysis based on a pre-specified threshold (improvement) of absolute difference, the Z test statistic under null hypothesis in Equation (4) can be derived as follows

Z_{2} = \frac{r_{A_{T}} - r_{A_{C}} + δ_{2}}{\sqrt{\frac{r_{A_{T}} (1 - r_{A_{T}})}{n_{2}} + \frac{r_{A_{C}} (1 - r_{A_{C}})}{n_{2}}}} = \frac{r_{A_{T}} - r_{A_{C}} + δ_{2}}{\sqrt{\frac{r_{A_{T}} (1 - r_{A_{T}}) + r_{A_{C}} (1 - r_{A_{C}})}{n_{2}}}} ~ N (0, 1),

(8)

where r_{A_T} and r_{A_C} are the sample proportions in the treatment and the control group, respectively, and n₂ is the sample size of the treatment or control group, assuming the allocation ratio is 1:1. Similarly, let δ_2A denote the true proportion difference, then the corresponding statistical power is

\text{power}_{2} = P (\text{Reject } H_{0} \mid H_{A} \text{ is true}) = P (Z_{2} > z_{1 - α} \mid r_{A_{T}} - r_{A_{C}} = δ_{2 A}) = P (r_{A_{T}} - r_{A_{C}} > z_{1 - α} \sqrt{\frac{r_{A_{T}} (1 - r_{A_{T}}) + r_{A_{C}} (1 - r_{A_{C}})}{n_{2}}} - δ_{2} \mid r_{A_{T}} - r_{A_{C}} = δ_{2 A}) = 1 - Φ (z_{1 - α} - \frac{δ_{2} + δ_{2 A}}{\sqrt{\frac{r_{A_{T}} (1 - r_{A_{T}}) + r_{A_{C}} (1 - r_{A_{C}})}{n_{2}}}}) \approx 1 - Φ (z_{1 - α} - \frac{δ_{2} + δ_{2 A}}{\sqrt{\frac{p_{A_{T}} (1 - p_{A_{T}}) + p_{A_{C}} (1 - p_{A_{C}})}{n_{2}}}}),

(9)

where the final approximation holds by Slutsky’s theorem (Chow, 2011). The sample size requirement for the non-inferiority test for the responder analysis based on a pre-specified threshold (improvement) of absolute difference is then given by

n_{2} = \frac{2 {(z_{1 - α} + z_{β})}^{2} (p_{A_{C}} (1 - p_{A_{C}}) + p_{A_{T}} (1 - p_{A_{T}}))}{{(p_{A_{C}} - p_{A_{T}} - δ_{2})}^{2}} .

(10)
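Equation (10) can be sketched in the same way, with the responder probabilities taken from Equation (2). As before, z_β is assumed to be the upper β quantile of N(0,1), and all function and argument names are illustrative rather than taken from the original analysis.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def responder_probability(mean, variance, cutoff):
    """p_A = 1 - Phi((c1 - mean) / sd), Equation (2)."""
    return 1.0 - STD_NORMAL.cdf((cutoff - mean) / math.sqrt(variance))


def sample_size_pac(mean_t, mean_c, var_t, var_c, cutoff, delta2,
                    alpha=0.05, power=0.80):
    """Sample size from Equation (10) for the responder analysis."""
    p_t = responder_probability(mean_t, var_t, cutoff)
    p_c = responder_probability(mean_c, var_c, cutoff)
    z = STD_NORMAL.inv_cdf
    numerator = 2.0 * (z(1.0 - alpha) + z(power)) ** 2
    numerator *= p_c * (1.0 - p_c) + p_t * (1.0 - p_t)
    return numerator / (p_c - p_t - delta2) ** 2


if __name__ == "__main__":
    n2 = sample_size_pac(0.2, 0.0, 1.0, 1.0, cutoff=0.1, delta2=0.25)
    print("Equation (10):", n2)
    print("rounded up and doubled:", 2 * math.ceil(n2))
```

For a treatment mean of 0.2, unit variances, cut-off 0.1, and δ₂ = 0.25, rounding up and doubling gives 114, matching the first entry of Table 2 under the same reading of the tabulated values as totals over both arms.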

2.2. Statistical Power Comparison in Non-inferiority Tests

Many previous studies have suggested avoiding relative difference due to statistical inefficiency (Vickers, 2001). Following their ideas, we instead consider a comparison of non-inferiority tests using absolute change and a responder analysis using absolute change in terms of statistical power. The required sample sizes and conclusion comparison for non-inferiority tests are also shown in this section. Here, let AC denote absolute change, and PAC denote responder analysis using absolute change.

From the power formulas for the non-inferiority tests in Section 2.1, the power difference can be computed using the cumulative distribution function (CDF) of N(0,1). Using a Taylor expansion, the CDF of N(0,1), Φ(·), can be written as

Φ (x) = \frac{1}{\sqrt{2 π}} \sum_{n = 0}^{\infty} \frac{{(- 1)}^{n}}{n! 2^{n} (2 n + 1)} x^{2 n + 1} + \frac{1}{2} .

(11)

Keeping only the first term of the Taylor expansion in Equation (11), Φ(x₁) − Φ(x₂) can be approximated as

Φ (x_{1}) - Φ (x_{2}) = \frac{1}{\sqrt{2 π}} (\sum_{n = 0}^{\infty} \frac{{(- 1)}^{n}}{n! 2^{n} (2 n + 1)} x_{1}^{2 n + 1} - \sum_{n = 0}^{\infty} \frac{{(- 1)}^{n}}{n! 2^{n} (2 n + 1)} x_{2}^{2 n + 1}) = \frac{1}{\sqrt{2 π}} \sum_{n = 0}^{\infty} \frac{{(- 1)}^{n}}{n! 2^{n} (2 n + 1)} (x_{1}^{2 n + 1} - x_{2}^{2 n + 1}) \approx \frac{1}{\sqrt{2 π}} (x_{1} - x_{2}) .

(12)
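To give a sense of how accurate the truncation behind Equation (12) is, the following sketch compares partial sums of the series in Equation (11) with the exact standard normal CDF. The function names are assumptions made for this illustration.

```python
import math
from statistics import NormalDist


def phi_series(x, n_terms):
    """Partial sum of the Taylor series for Phi(x) in Equation (11)."""
    total = 0.0
    for n in range(n_terms):
        coefficient = (-1) ** n / (math.factorial(n) * 2 ** n * (2 * n + 1))
        total += coefficient * x ** (2 * n + 1)
    return 0.5 + total / math.sqrt(2.0 * math.pi)


def phi_first_order(x):
    """First-term approximation used in Equation (12): 1/2 + x / sqrt(2*pi)."""
    return phi_series(x, 1)


if __name__ == "__main__":
    exact = NormalDist().cdf
    for x in (0.1, 0.5, 1.0, 2.0):
        print(x, exact(x), phi_first_order(x), phi_series(x, 30))
```

The first-order term is accurate near zero but degrades quickly as |x| grows; for example, it exceeds 1 at x = 2, so results based on Equation (12) are best read as local approximations.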

To compare the statistical power of a non-inferiority test using the absolute change endpoint with the statistical power of the corresponding responder analysis, we first simplify p_{A_j}. Keeping only the first term of Equation (11), p_{A_j} can be approximated as

p_{A_{j}} = 1 - Φ (\frac{c_{1} - (μ_{j} + μ_{Δ_{j}})}{\sqrt{σ_{j}^{2} + σ_{Δ_{j}}^{2}}}) \approx 1 - \frac{1}{\sqrt{2 π}} \cdot \frac{c_{1} - (μ_{j} + μ_{Δ_{j}})}{\sqrt{σ_{j}^{2} + σ_{Δ_{j}}^{2}}} - \frac{1}{2} = \frac{1}{2} - \frac{1}{\sqrt{2 π}} \cdot \frac{c_{1} - (μ_{j} + μ_{Δ_{j}})}{\sqrt{σ_{j}^{2} + σ_{Δ_{j}}^{2}}} .

(13)

And

p_{A_{j}} (1 - p_{A_{j}}) = (\frac{1}{2} - \frac{1}{\sqrt{2 π}} \cdot \frac{c_{1} - (μ_{j} + μ_{Δ_{j}})}{\sqrt{σ_{j}^{2} + σ_{Δ_{j}}^{2}}}) (\frac{1}{2} + \frac{1}{\sqrt{2 π}} \cdot \frac{c_{1} - (μ_{j} + μ_{Δ_{j}})}{\sqrt{σ_{j}^{2} + σ_{Δ_{j}}^{2}}}) = \frac{1}{4} - \frac{1}{2 π} {(\frac{c_{1} - (μ_{j} + μ_{Δ_{j}})}{\sqrt{σ_{j}^{2} + σ_{Δ_{j}}^{2}}})}^{2} = \frac{1}{4} - \frac{{[c_{1} - (μ_{j} + μ_{Δ_{j}})]}^{2}}{2 π (σ_{j}^{2} + σ_{Δ_{j}}^{2})} .

(14)

Hence,

p_{A_{C}} - p_{A_{T}} = \frac{1}{2} - \frac{1}{\sqrt{2 π}} \cdot \frac{c_{1} - (μ_{C} + μ_{Δ_{C}})}{\sqrt{σ_{C}^{2} + σ_{Δ_{C}}^{2}}} - (\frac{1}{2} - \frac{1}{\sqrt{2 π}} \cdot \frac{c_{1} - (μ_{T} + μ_{Δ_{T}})}{\sqrt{σ_{T}^{2} + σ_{Δ_{T}}^{2}}}) = \frac{1}{\sqrt{2 π}} [\frac{c_{1} - (μ_{T} + μ_{Δ_{T}})}{\sqrt{σ_{T}^{2} + σ_{Δ_{T}}^{2}}} - \frac{c_{1} - (μ_{C} + μ_{Δ_{C}})}{\sqrt{σ_{C}^{2} + σ_{Δ_{C}}^{2}}}],

(15)

and

p_{A_{C}} (1 - p_{A_{C}}) + p_{A_{T}} (1 - p_{A_{T}}) = \frac{1}{4} - \frac{{[c_{1} - (μ_{C} + μ_{Δ_{C}})]}^{2}}{2 π (σ_{C}^{2} + σ_{Δ_{C}}^{2})} + \frac{1}{4} - \frac{{[c_{1} - (μ_{T} + μ_{Δ_{T}})]}^{2}}{2 π (σ_{T}^{2} + σ_{Δ_{T}}^{2})} = \frac{1}{2} - \frac{{[c_{1} - (μ_{T} + μ_{Δ_{T}})]}^{2}}{2 π (σ_{T}^{2} + σ_{Δ_{T}}^{2})} - \frac{{[c_{1} - (μ_{C} + μ_{Δ_{C}})]}^{2}}{2 π (σ_{C}^{2} + σ_{Δ_{C}}^{2})} .

(16)
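The first-order expressions in Equations (13) and (14) are approximations whose quality depends on how far the cut-off lies from the population mean. The sketch below, with illustrative function names and parameter values that are assumptions, compares them with the exact values from Equation (2).

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def p_exact(mean, variance, cutoff):
    """Exact responder probability from Equation (2)."""
    return 1.0 - STD_NORMAL.cdf((cutoff - mean) / math.sqrt(variance))


def p_linear(mean, variance, cutoff):
    """First-order approximation of p_A from Equation (13)."""
    z = (cutoff - mean) / math.sqrt(variance)
    return 0.5 - z / math.sqrt(2.0 * math.pi)


def variance_term_linear(mean, variance, cutoff):
    """Approximation of p_A * (1 - p_A) from Equation (14)."""
    z = (cutoff - mean) / math.sqrt(variance)
    return 0.25 - z ** 2 / (2.0 * math.pi)


if __name__ == "__main__":
    mean, variance = 0.2, 1.0  # illustrative values (assumed)
    for cutoff in (0.1, 0.4, 0.8, 2.0, 3.0):
        p = p_exact(mean, variance, cutoff)
        print(cutoff, p, p_linear(mean, variance, cutoff),
              p * (1.0 - p), variance_term_linear(mean, variance, cutoff))
```

For cut-offs close to the mean the two versions agree closely, whereas for cut-offs far from the mean the linearized p_A can even fall outside [0, 1]. This is one reason to rely on the exact formulas for numerical work and on the approximations only for qualitative comparison.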

If we assume the sample sizes of the non-inferiority test using absolute change and the corresponding responder analysis are the same, denoted as n, then using Equation (12), the difference between power₁ and power₂ can be written as

\text{power}_{1} - \text{power}_{2} = 1 - Φ (z_{1 - α} - \frac{δ_{1} + δ_{1 A}}{\sqrt{\frac{σ_{T}^{2} + σ_{Δ_{T}}^{2} + σ_{C}^{2} + σ_{Δ_{C}}^{2}}{n}}}) - 1 + Φ (z_{1 - α} - \frac{δ_{2} + δ_{2 A}}{\sqrt{\frac{p_{A_{T}} (1 - p_{A_{T}}) + p_{A_{C}} (1 - p_{A_{C}})}{n}}}) = Φ (z_{1 - α} - \frac{δ_{2} + δ_{2 A}}{\sqrt{\frac{p_{A_{T}} (1 - p_{A_{T}}) + p_{A_{C}} (1 - p_{A_{C}})}{n}}}) - Φ (z_{1 - α} - \frac{δ_{1} + δ_{1 A}}{\sqrt{\frac{σ_{T}^{2} + σ_{Δ_{T}}^{2} + σ_{C}^{2} + σ_{Δ_{C}}^{2}}{n}}}) \approx \frac{1}{\sqrt{2 π}} (z_{1 - α} - \frac{δ_{2} + δ_{2 A}}{\sqrt{\frac{p_{A_{T}} (1 - p_{A_{T}}) + p_{A_{C}} (1 - p_{A_{C}})}{n}}} - z_{1 - α} + \frac{δ_{1} + δ_{1 A}}{\sqrt{\frac{σ_{T}^{2} + σ_{Δ_{T}}^{2} + σ_{C}^{2} + σ_{Δ_{C}}^{2}}{n}}}) = \frac{1}{\sqrt{2 π}} (\frac{δ_{1} + δ_{1 A}}{\sqrt{\frac{σ_{T}^{2} + σ_{Δ_{T}}^{2} + σ_{C}^{2} + σ_{Δ_{C}}^{2}}{n}}} - \frac{δ_{2} + δ_{2 A}}{\sqrt{\frac{p_{A_{T}} (1 - p_{A_{T}}) + p_{A_{C}} (1 - p_{A_{C}})}{n}}}) = \sqrt{\frac{n}{2 π}} (\frac{δ_{1} + δ_{1 A}}{\sqrt{σ_{T}^{2} + σ_{Δ_{T}}^{2} + σ_{C}^{2} + σ_{Δ_{C}}^{2}}} - \frac{δ_{2} + δ_{2 A}}{\sqrt{\frac{1}{2} - \frac{{[c_{1} - (μ_{T} + μ_{Δ_{T}})]}^{2}}{2 π (σ_{T}^{2} + σ_{Δ_{T}}^{2})} - \frac{{[c_{1} - (μ_{C} + μ_{Δ_{C}})]}^{2}}{2 π (σ_{C}^{2} + σ_{Δ_{C}}^{2})}}}) .

(17)

2.3. Sample Size Comparison in Non-inferiority Tests

From Equation (15) and (16), the sample size for the responder analysis using absolute change endpoint in Equation (10) can be written as

n_{2} = \frac{2 {(z_{1 - α} + z_{β})}^{2} (\frac{1}{2} - \frac{{[c_{1} - (μ_{T} + μ_{Δ_{T}})]}^{2}}{2 π (σ_{T}^{2} + σ_{Δ_{T}}^{2})} - \frac{{[c_{1} - (μ_{C} + μ_{Δ_{C}})]}^{2}}{2 π (σ_{C}^{2} + σ_{Δ_{C}}^{2})})}{{(\frac{1}{\sqrt{2 π}} [\frac{c_{1} - (μ_{T} + μ_{Δ_{T}})}{\sqrt{σ_{T}^{2} + σ_{Δ_{T}}^{2}}} - \frac{c_{1} - (μ_{C} + μ_{Δ_{C}})}{\sqrt{σ_{C}^{2} + σ_{Δ_{C}}^{2}}}] - δ_{2})}^{2}} .

(18)

When the significance level and desired statistical power are the same, we can compare the necessary sample size for a responder analysis to a test from absolute change with the ratio

\frac{n_{2}}{n_{1}} = \frac{\frac{2 {(z_{1 - α} + z_{β})}^{2} (p_{A_{C}} (1 - p_{A_{C}}) + p_{A_{T}} (1 - p_{A_{T}}))}{{(p_{A_{C}} - p_{A_{T}} - δ_{2})}^{2}}}{\frac{2 {(z_{1 - α} + z_{β})}^{2} (σ_{T}^{2} + σ_{Δ_{T}}^{2} + σ_{C}^{2} + σ_{Δ_{C}}^{2})}{{[(μ_{C} + μ_{Δ_{C}}) - (μ_{T} + μ_{Δ_{T}}) - δ_{1}]}^{2}}} = \frac{\frac{p_{A_{C}} (1 - p_{A_{C}}) + p_{A_{T}} (1 - p_{A_{T}})}{{(p_{A_{C}} - p_{A_{T}} - δ_{2})}^{2}}}{\frac{σ_{T}^{2} + σ_{Δ_{T}}^{2} + σ_{C}^{2} + σ_{Δ_{C}}^{2}}{{[(μ_{C} + μ_{Δ_{C}}) - (μ_{T} + μ_{Δ_{T}}) - δ_{1}]}^{2}}} = \frac{p_{A_{C}} (1 - p_{A_{C}}) + p_{A_{T}} (1 - p_{A_{T}})}{σ_{T}^{2} + σ_{Δ_{T}}^{2} + σ_{C}^{2} + σ_{Δ_{C}}^{2}} \cdot {[\frac{(μ_{C} + μ_{Δ_{C}}) - (μ_{T} + μ_{Δ_{T}}) - δ_{1}}{p_{A_{C}} - p_{A_{T}} - δ_{2}}]}^{2} \approx \frac{\frac{1}{2} - \frac{{[c_{1} - (μ_{C} + μ_{Δ_{C}})]}^{2}}{2 π (σ_{C}^{2} + σ_{Δ_{C}}^{2})} - \frac{{[c_{1} - (μ_{T} + μ_{Δ_{T}})]}^{2}}{2 π (σ_{T}^{2} + σ_{Δ_{T}}^{2})}}{σ_{T}^{2} + σ_{Δ_{T}}^{2} + σ_{C}^{2} + σ_{Δ_{C}}^{2}} {[\frac{(μ_{C} + μ_{Δ_{C}}) - (μ_{T} + μ_{Δ_{T}}) - δ_{1}}{\frac{1}{\sqrt{2 π}} (\frac{c_{1} - (μ_{T} + μ_{Δ_{T}})}{\sqrt{σ_{T}^{2} + σ_{Δ_{T}}^{2}}} - \frac{c_{1} - (μ_{C} + μ_{Δ_{C}})}{\sqrt{σ_{C}^{2} + σ_{Δ_{C}}^{2}}}) - δ_{2}}]}^{2} .

(19)
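Because the factor 2(z_{1−α} + z_β)² cancels, the ratio in Equation (19) depends only on the variances, the means, the cut-off, and the two margins. The sketch below evaluates the third (exact) expression of Equation (19), using Equation (2) for the responder probabilities; names and parameter values are illustrative assumptions.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def responder_probability(mean, variance, cutoff):
    """p_A from Equation (2)."""
    return 1.0 - STD_NORMAL.cdf((cutoff - mean) / math.sqrt(variance))


def sample_size_ratio(mean_t, mean_c, var_t, var_c, cutoff, delta1, delta2):
    """n2 / n1 from the exact form of Equation (19)."""
    p_t = responder_probability(mean_t, var_t, cutoff)
    p_c = responder_probability(mean_c, var_c, cutoff)
    variance_ratio = (p_c * (1.0 - p_c) + p_t * (1.0 - p_t)) / (var_t + var_c)
    effect_ratio = (mean_c - mean_t - delta1) / (p_c - p_t - delta2)
    return variance_ratio * effect_ratio ** 2


if __name__ == "__main__":
    # Illustrative setting (assumed): treatment variance 2, control variance 1,
    # equal margins of 0.25 for both tests.
    for cutoff in (0.1, 0.2, 0.4, 0.8):
        print(cutoff, sample_size_ratio(0.2, 0.0, 2.0, 1.0, cutoff, 0.25, 0.25))
```

A ratio below 1 means the responder analysis needs fewer subjects than the absolute-change test for the same margins, significance level, and power.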

2.4. Conflict Probability in Non-inferiority Tests

In this section, we aim to investigate the probabilities that a non-inferiority test using absolute change as the endpoint and the corresponding responder analysis reach similar or different conclusions. We assume the samples used to conduct these two types of non-inferiority test are the same. Thus, there are four possible types of events:

Both AC and PAC reject H₀
$P (\text{AC rejects } H_{0} \text{ and PAC rejects } H_{0}) = P (Z_{1} > z_{1 - α}, Z_{2} > z_{1 - α}) = P (\frac{{\bar{x}}_{T} - {\bar{x}}_{C} + δ_{1}}{\sqrt{\frac{σ_{T}^{2} + σ_{Δ_{T}}^{2} + σ_{C}^{2} + σ_{Δ_{C}}^{2}}{n}}} > z_{1 - α}, \frac{r_{A_{T}} - r_{A_{C}} + δ_{2}}{\sqrt{\frac{r_{A_{T}} (1 - r_{A_{T}}) + r_{A_{C}} (1 - r_{A_{C}})}{n}}} > z_{1 - α}) .$ (20)
AC fails to reject H₀, whereas PAC rejects H₀
$P (\text{AC fails to reject } H_{0} \text{ and PAC rejects } H_{0}) = P (Z_{1} \leq z_{1 - α}, Z_{2} > z_{1 - α}) = P (\frac{{\bar{x}}_{T} - {\bar{x}}_{C} + δ_{1}}{\sqrt{\frac{σ_{T}^{2} + σ_{Δ_{T}}^{2} + σ_{C}^{2} + σ_{Δ_{C}}^{2}}{n}}} \leq z_{1 - α}, \frac{r_{A_{T}} - r_{A_{C}} + δ_{2}}{\sqrt{\frac{r_{A_{T}} (1 - r_{A_{T}}) + r_{A_{C}} (1 - r_{A_{C}})}{n}}} > z_{1 - α}) .$ (21)
AC rejects H₀, whereas PAC fails to reject H₀
$P (\text{AC rejects } H_{0} \text{ and PAC fails to reject } H_{0}) = P (Z_{1} > z_{1 - α}, Z_{2} \leq z_{1 - α}) = P (\frac{{\bar{x}}_{T} - {\bar{x}}_{C} + δ_{1}}{\sqrt{\frac{σ_{T}^{2} + σ_{Δ_{T}}^{2} + σ_{C}^{2} + σ_{Δ_{C}}^{2}}{n}}} > z_{1 - α}, \frac{r_{A_{T}} - r_{A_{C}} + δ_{2}}{\sqrt{\frac{r_{A_{T}} (1 - r_{A_{T}}) + r_{A_{C}} (1 - r_{A_{C}})}{n}}} \leq z_{1 - α}) .$ (22)
Both AC and PAC fail to reject H₀
$P (\text{AC fails to reject } H_{0} \text{ and PAC fails to reject } H_{0}) = P (Z_{1} \leq z_{1 - α}, Z_{2} \leq z_{1 - α}) = P (\frac{{\bar{x}}_{T} - {\bar{x}}_{C} + δ_{1}}{\sqrt{\frac{σ_{T}^{2} + σ_{Δ_{T}}^{2} + σ_{C}^{2} + σ_{Δ_{C}}^{2}}{n}}} \leq z_{1 - α}, \frac{r_{A_{T}} - r_{A_{C}} + δ_{2}}{\sqrt{\frac{r_{A_{T}} (1 - r_{A_{T}}) + r_{A_{C}} (1 - r_{A_{C}})}{n}}} \leq z_{1 - α}) .$ (23)
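Because Z₁ and Z₂ are computed from the same sample, the joint probabilities in Equations (20) to (23) do not factorize and are most easily estimated by simulation. The following Monte Carlo sketch is one way to do this for normally distributed log absolute changes. It is not the simulation code used for the results in this paper; the function name, seed, handling of degenerate samples, and parameter values are assumptions.

```python
import random
from statistics import NormalDist, fmean


def simulate_conclusions(mean_t, mean_c, var_t, var_c, cutoff,
                         delta1, delta2, n, n_sims=1000, alpha=0.05, seed=0):
    """Estimate the four joint probabilities in Equations (20)-(23)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1.0 - alpha)
    sd_t, sd_c = var_t ** 0.5, var_c ** 0.5
    counts = {"both_reject": 0, "only_pac_rejects": 0,
              "only_ac_rejects": 0, "neither_rejects": 0}
    for _ in range(n_sims):
        x_t = [rng.gauss(mean_t, sd_t) for _ in range(n)]
        x_c = [rng.gauss(mean_c, sd_c) for _ in range(n)]
        # Z1 from Equation (5), using the known population variances.
        z1 = (fmean(x_t) - fmean(x_c) + delta1) / ((var_t + var_c) / n) ** 0.5
        # Z2 from Equation (8), using the sample responder proportions.
        r_t = sum(v > cutoff for v in x_t) / n
        r_c = sum(v > cutoff for v in x_c) / n
        var_r = (r_t * (1.0 - r_t) + r_c * (1.0 - r_c)) / n
        # Assumption: a degenerate sample (zero variance) counts as no rejection.
        pac_rejects = var_r > 0 and (r_t - r_c + delta2) / var_r ** 0.5 > z_crit
        ac_rejects = z1 > z_crit
        if ac_rejects and pac_rejects:
            counts["both_reject"] += 1
        elif pac_rejects:
            counts["only_pac_rejects"] += 1
        elif ac_rejects:
            counts["only_ac_rejects"] += 1
        else:
            counts["neither_rejects"] += 1
    return {event: count / n_sims for event, count in counts.items()}


if __name__ == "__main__":
    print(simulate_conclusions(0.2, 0.0, 1.0, 1.0, cutoff=0.1,
                               delta1=0.25, delta2=0.25, n=60))
```

The two off-diagonal entries, `only_pac_rejects` and `only_ac_rejects`, together estimate the probability that the two analyses reach conflicting conclusions.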

3. Results

In this section, a numerical analysis using simulated data is conducted to investigate the difference between using absolute change as an endpoint and the corresponding responder analysis in terms of sample size requirement, statistical power, and non-inferiority test conclusion. Responses are assumed to follow a normal distribution. The allocation ratio is 1:1. The simulation is conducted 1000 times. Additionally, a case study is established to investigate the difference between a typical non-inferiority test and responder analysis using real clinical data from Byrd et al. (2022). Again, AC denotes typical non-inferiority test using absolute change as the endpoint, and PAC denotes the corresponding responder analysis. The significance level is 0.05, and the desired power is 0.80.

3.1. Numerical analysis

According to Equation (7), the sample size of AC is associated with the population mean, population variance, and the non-inferiority margin. The treatment group population mean is set to 0.2 and 0.3, the control group population mean is set to 0, and the population variance of each group is set to 1.0, 2.0, and 3.0. Table 1 presents the required sample size of AC to achieve 80% statistical power. The sample size is associated with the effect size and the non-inferiority margin. When the effect size is fixed, a larger non-inferiority margin leads to a smaller sample size in AC; when the non-inferiority margin is fixed, a larger effect size leads to a smaller sample size in AC. Similarly, from Equation (10), the sample size of PAC is additionally related to the cut-off value (threshold) used to determine responders. As shown in Table 2, when the cut-off value is fixed, the impact of effect size and non-inferiority margin on sample size is the same as in Table 1. However, the impact of the cut-off value on the sample size calculation is quite complex, since it depends not only on the cut-off's absolute value but also on the population mean and variance.

Table 1.

Sample sizes for non-inferiority test using absolute change endpoint (AC).

	μ_T + μ_{Δ_T} = 0.2									μ_T + μ_{Δ_T} = 0.3
$σ_{T}^{2} + σ_{Δ_{T}}^{2}$	1.0			2.0			3.0			1.0			2.0			3.0
$σ_{C}^{2} + σ_{Δ_{C}}^{2}$	1.0	2.0	3.0	1.0	2.0	3.0	1.0	2.0	3.0	1.0	2.0	3.0	1.0	2.0	3.0	1.0	2.0	3.0
δ₁ = 0.25	246	368	490	368	490	612	490	612	734	164	246	328	246	328	410	328	410	492
δ₁ = 0.30	198	298	396	298	396	496	396	496	594	138	208	276	208	276	344	276	344	414
δ₁ = 0.35	164	246	328	246	328	410	328	410	492	118	176	236	176	236	294	236	294	352
δ₁ = 0.40	138	208	276	208	276	344	276	344	414	102	152	202	152	202	254	202	254	304
δ₁ = 0.45	118	176	236	176	236	294	236	294	352	88	132	176	132	176	220	176	220	264
δ₁ = 0.50	102	152	202	152	202	254	202	254	304	78	116	156	116	156	194	156	194	232
δ₁ = 0.55	88	132	176	132	176	220	176	220	264	70	104	138	104	138	172	138	172	206
δ₁ = 0.60	78	116	156	116	156	194	156	194	232	62	92	124	92	124	154	124	154	184
δ₁ = 0.65	70	104	138	104	138	172	138	172	206	56	84	110	84	110	138	110	138	166
δ₁ = 0.70	62	92	124	92	124	154	124	154	184	50	76	100	76	100	124	100	124	150

Table 2.

Sample sizes for responder analysis using absolute change endpoint (PAC).

	μ_T + μ_{Δ_T} = 0.2									μ_T + μ_{Δ_T} = 0.3
$σ_{T}^{2} + σ_{Δ_{T}}^{2}$	1.0			2.0			3.0			1.0			2.0			3.0
$σ_{C}^{2} + σ_{Δ_{C}}^{2}$	1.0	2.0	3.0	1.0	2.0	3.0	1.0	2.0	3.0	1.0	2.0	3.0	1.0	2.0	3.0	1.0	2.0	3.0
cut-off value = 0.1
δ₂ = 0.25	114	122	126	122	132	136	126	136	142	90	38	100	104	110	114	110	118	122
δ₂ = 0.30	86	92	94	92	98	100	94	100	104	70	96	76	80	84	86	84	88	92
δ₂ = 0.35	68	72	74	72	76	78	74	78	80	56	74	60	62	66	68	66	70	72
δ₂ = 0.40	54	58	58	58	60	62	58	62	64	46	60	50	50	54	54	54	56	56
δ₂ = 0.45	44	46	48	46	50	50	48	50	52	90	48	40	42	44	44	44	46	46
cut-off value = 0.2
δ₂ = 0.25	114	132	142	114	132	142	114	132	142	90	104	110	96	110	118	100	114	122
δ₂ = 0.30	86	98	104	86	98	104	86	98	104	70	80	84	74	84	88	76	86	92
δ₂ = 0.35	68	76	80	68	76	80	68	76	80	56	62	66	60	66	70	60	68	72
δ₂ = 0.40	54	60	62	54	60	62	54	60	62	46	50	54	48	54	56	50	54	56
δ₂ = 0.45	44	48	52	44	48	52	44	48	52	38	42	44	40	44	46	40	44	46
cut-off value = 0.3
δ₂ = 0.25	112	142	158	104	132	146	102	126	140	90	110	122	90	110	122	90	110	122
δ₂ = 0.30	84	104	114	80	98	106	78	94	104	70	84	92	70	84	92	70	84	92
δ₂ = 0.35	66	80	86	64	74	82	62	74	80	56	66	70	56	66	70	56	66	70
δ₂ = 0.40	54	62	68	52	60	64	50	58	62	46	54	56	46	54	56	46	54	56
δ₂ = 0.45	44	50	54	42	48	52	42	48	50	38	44	46	38	44	46	38	44	46
cut-off value = 0.4
δ₂ = 0.25	110	150	176	96	130	150	92	122	140	88	118	134	84	110	124	82	106	120
δ₂ = 0.30	84	108	124	74	96	108	70	90	102	68	88	100	66	82	94	64	80	90
δ₂ = 0.35	64	82	92	58	74	82	56	70	78	56	68	76	52	66	72	52	64	70
δ₂ = 0.40	52	64	72	48	58	64	46	56	62	46	56	60	44	52	58	42	52	56
δ₂ = 0.45	42	52	58	40	48	52	38	46	50	38	46	50	36	44	48	36	42	46

Comparing sample sizes in Table 1 and Table 2, we find that when the non-inferiority margin is fixed, the required sample size of AC is much larger than that of PAC. One important assumption used here is that the non-inferiority margins of the two tests are the same. We make this assumption because many scholars have suggested using responder analysis as a secondary analysis (Henschke et al., 2014), i.e., the sample size is computed based on the primary analysis. In other words, the statistical analysis of a typical non-inferiority test and the responder analysis will be conducted using the same dataset. However, in practice, it is more likely that the non-inferiority margins in these two tests are different, since the two tests have different meanings. Therefore, when responder analysis is conducted as the secondary analysis, it may not have enough power.

Next, we compare the statistical power of AC and PAC using Equations (6) and (9) when the sample size is fixed. In the simulation process, the sample size used to generate random samples is the minimum of all possible sample sizes, given the population mean and standard deviation. Here, the population means of the treatment and control groups are 0.2 and 0, the population variance of the treatment group is 2, the population variance of the control group ranges from 1 to 3, and the cut-off value ranges from 0.1 to 0.8. To make the power comparable, we assume the non-inferiority margins of AC and PAC are the same. As shown in Figure 1, with a fixed sample size, the statistical power of AC is the smallest, suggesting that AC requires a larger sample size than PAC to achieve the desired power. This finding is consistent with the results in Table 1 and Table 2. Additionally, in PAC, the statistical power values obtained with different cut-off values become closer to each other as the population variance increases. The statistical power of PAC is either slightly lower than 80% or over 80%, regardless of cut-off value. The statistical power of AC is always below 60%. Hence, if researchers conduct a sample size calculation based on responder analysis but end up using a typical non-inferiority test, they will not be able to achieve enough statistical power.

Figure 1. — Statistical power comparison of non-inferiority test using absolute change as endpoint (AC) and corresponding responder analysis (PAC).

To illustrate the relationship among required sample sizes, we assume the non-inferiority margins of the absolute change endpoint and the responder analysis are the same. The setting is the same as the one in Figure 1. In Figure 2, the ratio of the PAC sample size to the AC sample size is used to represent the relationship between the two, where N₁ denotes the sample size of AC, and N₂ denotes the sample size of PAC. Under the setting used here, N₂/N₁ is always smaller than 0.35, suggesting the sample size of AC is much larger than that of PAC. When the non-inferiority margin increases, the ratios with different cut-off values not only become smaller but also move closer to each other. Comparing Figure 2 (A), (B) and (C), we find that the sample size ratio decreases, and the sample size ratios with different cut-off values become closer to each other, when the variance of the control group increases.

Figure 2. — Sample size comparison of non-inferiority test using absolute change as endpoint (AC) and corresponding responder analysis (PAC).

Another essential parameter in responder analysis is the cut-off value (threshold) used to determine whether an observation is a responder. Let the population mean of the treatment group range from 0.10 to 0.30. To make the results comparable, the non-inferiority margins in AC and PAC are set to 0. The range of cut-off values is wider than before, from −3 to 3. In the simulation, we first randomly generate continuous samples from a normal distribution, with the sample size computed using AC’s sample size formula in Equation (7). Then, using the cut-off value, we label each subject as either a responder or a non-responder. As shown in Figure 3, the cut-off value can indeed drive the conclusion in a different direction. In Figure 3 (A), a negative cut-off value yields conflicting results; in Figure 3 (B), a more extreme cut-off value yields conflicting results; the same pattern is found in Figure 3 (C). Additionally, the influence of the cut-off value on the hypothesis test result depends on the population mean and variance; however, the overall pattern is similar. Hence, a more extreme cut-off value, i.e., a cut-off further away from the population mean, is more likely to lead to conflicting conclusions.

Figure 3. — Non-inferiority test results comparison of typical test using absolute change as endpoint (AC) and corresponding responder analysis (PAC).
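One way to see why the cut-off can reverse the conclusion is to look at the population-level quantities each test targets. The sketch below assumes normal populations with unequal variances and a responder defined as a change above the cut-off; it compares the sign of the mean difference with the sign of the difference in responder rates. The parameter values and function names are illustrative assumptions, not the settings behind Figure 3.

```python
from math import sqrt
from statistics import NormalDist


def responder_rate_diff(cutoff, mu_t, mu_c, var_t, var_c):
    """Treatment-minus-control difference in P(change > cutoff)."""
    p_t = 1.0 - NormalDist(mu_t, sqrt(var_t)).cdf(cutoff)
    p_c = 1.0 - NormalDist(mu_c, sqrt(var_c)).cdf(cutoff)
    return p_t - p_c


def conflicting_cutoffs(cutoffs, mu_t, mu_c, var_t, var_c):
    """Cut-offs at which the responder-rate difference has the opposite
    sign to the mean difference, so AC and PAC point in different directions."""
    mean_diff = mu_t - mu_c
    return [c for c in cutoffs
            if responder_rate_diff(c, mu_t, mu_c, var_t, var_c) * mean_diff < 0]


# Hypothetical setting: treatment mean 0.2 and variance 2, control mean 0 and variance 1.
grid = [-3, -2, -1, 0, 1, 2, 3]
flagged = conflicting_cutoffs(grid, 0.2, 0.0, 2.0, 1.0)
```

In this hypothetical setting the treatment group has the larger mean but also the larger variance, so it places more mass in the lower tail; for sufficiently negative cut-offs its responder rate falls below that of the control group even though its mean is higher. With equal variances the two differences share the same sign at every cut-off.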

3.2. Case study

In Section 3.1, we studied the impact of essential parameters on sample size requirement, statistical power, and test conclusions using simulated data. To illustrate more clearly the impact of the cut-off value on non-inferiority test results, we conduct a case study using real clinical data from an observational study of rehabilitation in lung transplant patients (Byrd et al., 2022). The primary aim of Byrd et al. (2022) is to compare the performance of individual rehabilitation with that of group rehabilitation in both pre-operative and post-operative participants, measured by the primary outcome variable, change in 6-minute walk distance (6MWD). Detailed information on the change in 6MWD for pre-operative and post-operative patients is presented in Table 3.

Table 3.

Change in 6MWD (m) of pre-operative and post-operative participants in Byrd et al. (2022).

	Pre-operative		Post-operative
Rehabilitation	Group	Individual	Group	Individual
Sample size	93	81	110	105
Mean (SD)	51.6 (81.3)	56.6 (62.9)	174 (97.6)	160 (89.4)
Median [Q1, Q3]	44.5 [6.40, 102]	59.7 [25.0, 93.9]	168 [106, 232]	159 [104, 208]

In this section, the non-inferiority test is used to study under what circumstances AC and PAC may lead to different conclusions. According to previous studies (Holland et al., 2014; Holland et al., 2017), a clinically meaningful change in 6MWD is between 25 m and 33 m. The cut-off value used here ranges from 20 m to 35 m, to give a more comprehensive understanding of the impact of cut-off value selection on study conclusions. The non-inferiority margin of AC ranges from −0.3 to 0.3, and the non-inferiority margin of PAC ranges from 0 to 0.03. As shown in Figure 4, for pre-operative patients, some cut-off values may lead to different conclusions. For instance, in Figure 4 (A), a cut-off value larger than 27 m yields conflicting results. However, for post-operative patients, if the cut-off value is between 20 m and 35 m, AC and PAC always give consistent results. Looking more closely at the data, we find that for most post-operative patients, the change in 6MWD is either very large (larger than 35 m) or very small (smaller than 20 m). In other words, in this scenario, even an extreme cut-off value within the range of 20 m to 35 m cannot substantially change the proportion of responders among post-operative patients. This suggests that cut-off value selection causes the responder analysis and the typical non-inferiority test to provide conflicting findings only under certain circumstances.

Figure 4. — Non-inferiority test results comparison of typical test using absolute change as endpoint (AC) and corresponding responder analysis (PAC) in Rehabilitation Program in Lung Transplant Study (Byrd et al, 2022).
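As a rough check on the AC side of this comparison, the summary statistics in Table 3 can be plugged into a large-sample test. The sketch below assumes a one-sided z-test computed from means, standard deviations, and sample sizes, with individual rehabilitation as the test arm and group rehabilitation as the reference arm, and a margin of 0 by default; the analysis behind Figure 4 may use a different statistic. The function name and defaults are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def ni_z_from_summary(mean_t, sd_t, n_t, mean_c, sd_c, n_c,
                      margin=0.0, alpha=0.05):
    """One-sided non-inferiority z-test from summary statistics (assumed form).
    Returns the z statistic and whether non-inferiority is concluded."""
    se = sqrt(sd_t ** 2 / n_t + sd_c ** 2 / n_c)
    z = (mean_t - mean_c + margin) / se
    return z, z > NormalDist().inv_cdf(1 - alpha)


# Table 3 values, change in 6MWD: individual = test arm, group = reference arm.
z_pre, ni_pre = ni_z_from_summary(56.6, 62.9, 81, 51.6, 81.3, 93)
z_post, ni_post = ni_z_from_summary(160, 89.4, 105, 174, 97.6, 110)
```

The PAC side cannot be reproduced from Table 3 alone, because the responder proportions at each cut-off depend on the individual-level data, which the text indicates are not normally distributed.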

As we mentioned in Section 1, a responder analysis answers a different question from the typical non-inferiority test. Specifically, if we pick an extreme cut-off value, responder analysis investigates whether the test treatment could bring substantial clinical benefit to patients. For example, in Figure 4 (A), AC gives a non-significant result, i.e., individual rehabilitation cannot be declared non-inferior to group rehabilitation, whereas PAC gives a significant result when the cut-off value is large. This suggests that individual rehabilitation is non-inferior to group rehabilitation only for a small proportion of patients, who experience a large improvement. In other words, the large cut-off value allows us to focus on a smaller proportion of patients who had a substantial improvement; this difference may not be detectable in typical non-inferiority tests, yielding conflicting findings.

4. Discussion

One of the most important steps in any clinical trial is determining the primary study endpoint, which may influence how hypotheses are established, statistical models are selected, and the sample size is calculated. Generally speaking, there are four types of study endpoints: (i) absolute change, (ii) relative change, (iii) responder analysis using absolute change, and (iv) responder analysis using relative change. This paper focuses on comparing endpoints (i) and (iii) in a non-inferiority test in terms of sample size requirement, statistical power, and whether different endpoints may lead to different conclusions, as an example of how to compare different study endpoints. The comparison process in this study can also be generalized to compare any two of the study endpoints mentioned above.

In the numerical study section, both a simulation study and a case study using data from Byrd et al. (2022) are conducted. According to the simulation study, the required sample size of a non-inferiority test using the absolute change endpoint (AC) depends on the population means and variances of the treatment and control groups and on the non-inferiority margin. The sample size of the corresponding responder analysis (PAC) additionally depends on the cut-off value used to determine responders. Fixing all parameters, we find that PAC requires a smaller sample size than AC. In other words, when the sample size is the same, PAC will always have larger statistical power than AC, as shown in Figure 1. When the desired statistical power is the same, the sample size ratio of PAC to AC is always smaller than 1, and it also depends on the non-inferiority margin and the cut-off value. However, the impact of these two parameters decreases as the population variance increases. As the cut-off value becomes more extreme, the likelihood of obtaining conflicting conclusions from a non-inferiority hypothesis test increases. This was seen in both the simulation study and the case study. We find that cut-off value selection is of great importance, and that it may cause conflicting results when the mean and median in the treatment and control groups are close to the cut-off value.

Without loss of generality, similar conclusions could be found for superiority and equivalence tests. The fundamental reason that the typical non-inferiority/superiority/equivalence test using absolute change as the endpoint and the corresponding responder analysis provide conflicting conclusions is the distribution of the target population. If the samples follow a normal distribution, the typical test and the responder analysis are very likely to give the same conclusion when the cut-off value is close to the population mean. Otherwise, the two types of analysis may provide conflicting results, especially when the cut-off value is further away from the population mean.

Due to the great importance of cut-off value selection and the possibility of obtaining conflicting conclusions, we suggest determining the cut-off value using domain knowledge in combination with statistics of the collected sample when conducting a responder analysis. Though the minimal clinically important difference (MCID) is always used as the cut-off value (Jones et al., 2016), some studies have proposed guidance or approaches for cut-off value selection (Farrar et al., 2006; Harrell, 2017). Additionally, since the sample size requirements of AC and PAC are different, it is necessary to check whether the sample size is large enough to achieve the desired statistical power. It should be noted that not only may the typical test and the responder analysis require different sample sizes, yield different power, and result in different study conclusions, but tests using absolute rather than relative change as the study endpoint are prone to the same challenges. Some studies have reported that absolute and relative change endpoints may lead to conflicting conclusions (Chow, 2011; Curran-Everett and Williams, 2015). In addition, these endpoints are viewed differently by drug approval agencies. According to the non-inferiority test guidance from the US Food and Drug Administration (FDA), the constancy assumption of a study is expected to be based on the constancy of relative effects, not absolute effects (FDA, 2016). However, the European Medicines Agency’s (EMA) guidance on non-inferiority tests uses the absolute difference to illustrate its instructions (EMA, 2005). Hence, it is possible that a drug approved by the FDA may not be approved by the EMA or vice versa, since the required sample size and statistical power when using absolute change and relative change as the study endpoint are different (Chow, 2011).

Hence, it would be useful to further provide the confidence interval of cut-off values within which the typical non-inferiority test and the responder analysis lead to consistent conclusions, and to investigate under what circumstances the absolute and relative change endpoints provide the same non-inferiority test results.

Acknowledgements

This research is in part supported by Grant Number UL1TR002553 from the National Center for Advancing Translational Sciences (NCATS) of the National Institutes of Health (NIH), and NIH Roadmap for Medical Research. Its contents are solely the responsibility of the authors and do not necessarily represent the official views of NCATS or NIH.

Footnotes

Conflict of interest

There is no conflict of interest.

References

[1] Byrd R, Breslin R, Wang P, Peskoe S, Chow SC, Lowers S, Snyder LD, Pastva AM: Group versus Individual Rehabilitation in Lung Transplantation: A Retrospective Non-Inferiority Assessment. 2022. [Manuscript submitted for publication].
[2] Chow SC and Song F: On Controversial Statistical Issues in Clinical Research. Open Access Journal of Clinical Trials 2015, 7, 43–51.
[3] Chow SC: Controversial Statistical Issues in Clinical Trials. Boca Raton, FL, USA: CRC Press; 2011: 135–147.
[4] Curran-Everett D and Williams CL: Explorations in Statistics: the Analysis of Change. Advances in Physiology Education 2015, 39(2), 49–54.
[5] EMA. Guideline on the Choice of the Non-inferiority Margin; 2005.
[6] Farrar JT, Dworkin RH, and Max MB: Use of the Cumulative Proportion of Responders Analysis Graph to Present Pain Data over a Range of Cut-Off Points: Making Clinical Trial Data More Understandable. Journal of Pain and Symptom Management 2006, 31(4), 369–377.
[7] FDA. Non-Inferiority Clinical Trials to Establish Effectiveness; 2016.
[8] Gilbert C, Brown MC, Cappelleri JC, Carlsson M, and McKenna SP: Estimating a Minimally Important Difference in Pulmonary Arterial Hypertension Following Treatment with Sildenafil. Chest 2009, 135(1), 137–142.
[9] Harrell FE: Regression Modeling Strategies. Springer International Publishing.
[10] Henschke N, van Enst A, Froud R and WG Ostelo R: Responder Analyses in Randomised Controlled Trials for Chronic Low Back Pain: An Overview of Currently Used Methods. European Spine Journal 2014, 23(4), 772–778.
[11] Holland AE, Spruit MA, Troosters T, Puhan MA, Pepin V, Saey D, … and Singh SJ: An Official European Respiratory Society/American Thoracic Society Technical Standard: Field Walking Tests in Chronic Respiratory Disease. European Respiratory Journal 2014, 44(6), 1428–1446.
[12] Holland AE, Mahal A, Hill CJ, Lee AL, Burge AT, Cox NS, … and McDonald CF: Home-Based Rehabilitation for COPD Using Minimal Resources: A Randomised, Controlled Equivalence Trial. Thorax 2017, 72(1), 57–65.
[13] Jones PW, Rennard S, Tabberer M, Riley JH, Vahdati-Bolouri M and Barnes NC: Interpreting Patient-Reported Outcomes from Clinical Trials in COPD: A Discussion. International Journal of Chronic Obstructive Pulmonary Disease 2016, 11, 3069.
[14] Munro PE, Holland AE, Bailey M, Button BM, and Snell GI: Pulmonary Rehabilitation Following Lung Transplantation. Transplantation Proceedings 2019, 41(1), 292–295.
[15] Martinu T, Babyak MA, O’Connell CF, Carney RM, Trulock EP, Davis RD, … and INSPIRE Investigators: Baseline 6-Min Walk Distance Predicts Survival in Lung Transplant Candidates. American Journal of Transplantation 2008, 8(7), 1498–1505.
[16] Ryerson CJ, Cayou C, Topp F, Hilling L, Camp PG, Wilcox PG, … and Garvey C: Pulmonary Rehabilitation Improves Long-Term Outcomes in Interstitial Lung Disease: a Prospective Cohort Study. Respiratory Medicine 2014, 108(1), 203–210.
[17] Snapinn SM, and Qi J: Responder Analyses and the Assessment of a Clinically Relevant Treatment Effect. Trials 2007, 8(1), 1–6.
[18] Stoilkova-Hartmann A, Janssen DJ, Franssen FM, and Wouters EF: Differences in Change in Coping Styles between Good Responders, Moderate Responders and Non-Responders to Pulmonary Rehabilitation. Respiratory Medicine 2015, 109(12), 1540–1545.
[19] Tuppin MP, Paratz JD, Chang AT, Seale HE, Walsh JR, Kermeeen FD, … and Hopkins PM: Predictive Utility of the 6-Minute Walk Distance on Survival in Patients Awaiting Lung Transplantation. The Journal of Heart and Lung Transplantation 2008, 27(7), 729–734.
[20] Vickers AJ: The Use of Percentage Change from Baseline as an Outcome in a Controlled Trial is Statistically Inefficient: a Simulation Study. BMC Medical Research Methodology 2001, 1(1), 1–4.
