Abstract
Background
Recently, three randomized clinical trials on coronavirus disease (COVID-19) treatments were completed: one for lopinavir-ritonavir and two for remdesivir. One trial reported that remdesivir was superior to placebo in shortening the time to recovery, while the other two showed no benefit of the treatment under investigation.
Objective
This paper aims to identify, from a statistical perspective, several key issues in the design and analysis of three COVID-19 trials and to reanalyze the data from the cumulative incidence curves in the three trials using more appropriate statistical methods.
Methods
The lopinavir-ritonavir trial enrolled 39 additional patients due to insignificant results after the sample size reached the planned number, which led to inflation of the type I error rate. The remdesivir trial of Wang et al failed to reach the planned sample size due to a lack of eligible patients, and the bootstrap method was used to predict the quantity of clinical interest conditionally and unconditionally if the trial had continued to reach the originally planned sample size. Moreover, we used a terminal (or cure) rate model and a model-free metric known as the restricted mean survival time or the restricted mean time to improvement (RMTI) to analyze the reconstructed data. The remdesivir trial of Beigel et al reported the median recovery time of the remdesivir and placebo groups, and the rate ratio for recovery, while both quantities depend on a particular time point representing local information. We use the restricted mean time to recovery (RMTR) as a global and robust measure for efficacy.
Results
For the lopinavir-ritonavir trial, with the increase of sample size from 160 to 199, the type I error rate was inflated from 0.05 to 0.071. The difference of RMTIs between the two groups evaluated at day 28 was –1.67 days (95% CI –3.62 to 0.28; P=.09) in favor of lopinavir-ritonavir but not statistically significant. For the remdesivir trial of Wang et al, the difference of RMTIs at day 28 was –0.89 days (95% CI –2.84 to 1.06; P=.37). The planned sample size was 453, yet only 236 patients were enrolled. The conditional prediction shows that the hazard ratio estimates would reach statistical significance if the target sample size had been maintained. For the remdesivir trial of Beigel et al, the difference of RMTRs between the remdesivir and placebo groups at day 30 was –2.7 days (95% CI –4.0 to –1.2; P<.001), confirming the superiority of remdesivir. The difference in the recovery time at the 25th percentile (95% CI –3 to 0; P=.65) was insignificant, while the differences became more statistically significant at larger percentiles.
Conclusions
Based on the statistical issues and lessons learned from the recent three clinical trials on COVID-19 treatments, we suggest more appropriate approaches for the design and analysis of ongoing and future COVID-19 trials.
Keywords: coronavirus, COVID-19, cure rate model, sample size adjustment, terminal event, type I error rate, restricted mean survival time
Introduction
Background
The novel coronavirus disease (COVID-19) has spread all over the world at an unprecedented rate since its outbreak in December 2019. More than 200 countries or territories have confirmed cases, and over 8.4 million individuals have been infected, leading to more than 450,000 deaths as of June 18, 2020. COVID-19 was declared a Public Health Emergency of International Concern by the World Health Organization (WHO) on January 30 and declared a pandemic on March 11, 2020.
As recommended by the WHO R&D Blueprint expert group, clinical improvements for patients with COVID-19 can be classified in a seven-category ordinal scale [1]:
1. Not hospitalized with resumption of normal activities
2. Not hospitalized, but unable to resume normal activities
3. Hospitalized, not requiring supplemental oxygen
4. Hospitalized, requiring supplemental oxygen
5. Hospitalized, requiring nasal high-flow oxygen therapy, noninvasive mechanical ventilation, or both
6. Hospitalized, requiring extracorporeal membrane oxygenation, invasive mechanical ventilation, or both
7. Death
So far, there are only eight clinical trials for COVID-19 completed with results published. Among them, two trials were for hydroxychloroquine with relatively small sample sizes (30 patients for the trial of Chen et al [2] and 36 patients for the trial of Gautret et al [3]). Although the trial conducted by Gautret et al [3] yielded a significant result, the sample size was too small to draw any convincing conclusion. The trial of Cai et al [4] compared favipiravir and lopinavir-ritonavir with a total sample size of 80 patients, leading to a significant result (P=.004). Chen et al [5] conducted a trial comparing favipiravir with arbidol, which had a total sample size of 240 patients and yielded an insignificant result. The trial of Grein et al [6] was a single-arm trial for remdesivir, and the estimated clinical improvement rate at day 18 was 0.68. To determine the efficacy of Lianhuaqingwen (LHQW) capsule, a compounded Chinese herb medicine, Hu et al [7] conducted an open-label randomized controlled trial and reported a statistically significant difference in the symptom (fever, fatigue, coughing) recovery rate between the treatment group and the control group (91.5% vs 82.4%; P=.022). However, the trial did not include a placebo in the control group to implement a double-blinding scheme. Despite the urgent nature of the pandemic, their argument for unblinding on ethical grounds seems unsound. Due to the conscious and subconscious psychological tendencies of humans, including both clinicians and patients, bias often arises in an open-label study. Not only does unblinding lead to potential selection bias, but it may also cause placebo effects for patients who took LHQW [8-11], which casts doubt on the clinical benefits of LHQW.
In particular, the rate of symptom recovery is related to disease relief or symptomatic manifestations such as fever, fatigue, and coughing (“soft” end points), for which placebo effects are known to be strong and more discernible [10]. However, the LHQW and control groups did not differ in the rate of conversion to severe cases or viral assay findings (“hard” end points), for which placebo effects are less perceptible because generally placebos can neither alter the pathophysiology of the disease nor cure it. We take the three randomized clinical trials conducted by Cao et al [12] on lopinavir-ritonavir and by Wang et al [13] and Beigel et al [14] on remdesivir as examples to illustrate statistical issues and lessons learned, as they have drawn great attention in the clinical community.
Lopinavir-Ritonavir Trial
The Lopinavir Trial for Suppression of Severe Acute Respiratory Syndrome Coronavirus 2 in China [12] was conducted with record speed from January 18 to February 3, 2020 (the date of enrollment of the last patient). Patient recruitment up to a planned sample size is often the bottleneck of trial conduct. This was not the case with severe COVID-19 due to the abundance of hospitalized patients during that period of time. In this trial, eligible patients were randomized at a 1:1 ratio to either the lopinavir-ritonavir treatment group (400 mg and 100 mg orally, twice daily) plus the standard care or the standard care alone for 14 days. No placebo was used for blinding because no placebo was prepared due to the urgency of the trial; therefore, both patients and investigators were aware of the treatment each patient received. Following the WHO seven-category ordinal scale [1], the primary end point adopted by the trial [12] was the time to clinical improvement, which was defined as the time from randomization to an improvement of two points from the status at randomization (eg, from point 6 to point 4 or from point 5 to point 3) or live discharge from the hospital, whichever came first. The sample size was increased from 160 to 199 since the result with the enrolled 160 patients did not reach statistical significance. As a final conclusion, Cao et al [12] reported no benefit with the lopinavir-ritonavir treatment beyond the standard care with a hazard ratio (HR) of 1.24 and the associated 95% CI 0.90-1.72.
Remdesivir Trial 1
Wang et al [13] conducted a randomized, double-blind, placebo-controlled, multicenter trial with remdesivir at ten hospitals in Hubei, China. Overall, 236 patients were enrolled from February 6 to March 12, 2020, and were randomly assigned to the remdesivir group (200 mg on day 1 followed by 100 mg on days 2-10) and the placebo group at a 2:1 ratio. In the original design, the trial planned to recruit 453 patients with 302 to remdesivir and 151 to placebo, but no patients were enrolled after March 12 due to no eligible patients being available in the Hubei Province. As a consequence, the statistical power of the study was reduced from 80% to 58%. The primary clinical end point was the time to improvement within 28 days. Clinical improvement was defined as a two-point improvement from an adjusted six-category ordinal scale from the WHO seven-category ordinal scale. In conclusion, remdesivir did not show statistically significant clinical benefit compared with the placebo in terms of the HR 1.23 (95% CI 0.87-1.75).
Remdesivir Trial 2
Beigel et al [14] reported a randomized, double-blind, placebo-controlled trial of intravenous remdesivir in adults hospitalized with COVID-19 and evidence of lower respiratory tract infection. This trial had a total sample size of 1059 patients (538 assigned to remdesivir and 521 to placebo). The median recovery time was 11 (95% CI 9-12) days for the remdesivir group and 15 (95% CI 13-19) days for the placebo group. The rate ratio for recovery was 1.32 (95% CI 1.12-1.55; P<.001), which was statistically significant in favor of remdesivir. The Kaplan-Meier estimates of mortality at 14 days were 7.1% with remdesivir and 11.9% with the placebo, and the HR for death was 0.70 (95% CI 0.47-1.04). Remdesivir was shown to be superior to the placebo in shortening the time to recovery in adults hospitalized with COVID-19, and, in terms of the HR for death, there was no significant difference between the two groups.
So far, only one treatment, remdesivir, has been shown to be effective by a randomized clinical trial, but the other remdesivir trial failed to demonstrate its superiority over the placebo. As the pandemic of COVID-19 will not be controlled anytime soon, the aforementioned three clinical trials [12-14] provide extremely valuable information on the treatments of COVID-19 and the corresponding trial design and analysis. However, several important issues have been identified in the statistical analysis, design, and implementation of the three trials. We point out the statistical problems that arose in the three trials [12-14] and reanalyze the data from the cumulative incidence curves for the time to improvement or recovery using more appropriate approaches. Our in-depth and comprehensive analyses yield new insights on the design and analysis for ongoing and future COVID-19 clinical trials.
Methods
Inflation of the Type I Error
The log-rank test [15] is the most commonly used method in survival analysis and clinical trial design to compare the survival benefit of two arms. Consider a randomized clinical trial with a planned sample size N1 using a two-sided log-rank test. If the hypothesis test indicates no significant survival difference between the two groups under the significance level α but the trial decides to continue to enroll more patients up to a larger sample size N2, this would inflate the overall type I error of the trial. Any adjustment to the sample size during the trial should be planned and evaluated in advance to maintain the overall type I error rate.
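Let Z1 and Z2 denote the log-rank test statistics with sample sizes N1 and N2, respectively. Under the null hypothesis [16,17], Z1 and Z2 jointly follow a bivariate normal distribution:
$$\begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \sqrt{D_1/D_2} \\ \sqrt{D_1/D_2} & 1 \end{pmatrix} \right) \tag{1}$$
where D1 = dN1 and D2 = dN2 are the expected numbers of events with sample sizes N1 and N2, and d is the proportion of patients experiencing the event. Thus, the overall type I error rate αoverall with the significance level α is:
$$\alpha_{\text{overall}} = P(|Z_1| > z_{\alpha/2}) + P(|Z_1| \le z_{\alpha/2}, |Z_2| > z_{\alpha/2}) = 1 - P(|Z_1| \le z_{\alpha/2}, |Z_2| \le z_{\alpha/2}) \tag{2}$$
where z_{α/2} is the (1 – α/2)th quantile of the standard normal distribution.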
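As a rough numerical check of the overall type I error rate described above, the following sketch integrates the joint normal density of (Z1, Z2) with Simpson's rule using only the Python standard library. It assumes that the overall rate equals 1 – P(|Z1| ≤ z, |Z2| ≤ z) and that the correlation between the two statistics is sqrt(D1/D2); the function name `overall_type1_error` and its parameters are illustrative and are not taken from the original analysis.

```python
from math import sqrt
from statistics import NormalDist


def overall_type1_error(n1, n2, event_rate, alpha=0.05, steps=2000):
    """Overall two-sided type I error when a nonsignificant test at n1
    is followed by an unplanned second test at n2 > n1 (same alpha)."""
    if not 0 < n1 < n2:
        raise ValueError("requires 0 < n1 < n2")
    std = NormalDist()
    d1, d2 = event_rate * n1, event_rate * n2  # expected numbers of events
    rho = sqrt(d1 / d2)                        # assumed corr(Z1, Z2)
    crit = std.inv_cdf(1 - alpha / 2)          # two-sided critical value
    resid_sd = sqrt(1 - rho ** 2)              # SD of Z2 given Z1

    def integrand(z1):
        # Density of Z1 times P(|Z2| <= crit | Z1 = z1).
        upper = std.cdf((crit - rho * z1) / resid_sd)
        lower = std.cdf((-crit - rho * z1) / resid_sd)
        return std.pdf(z1) * (upper - lower)

    # Composite Simpson's rule over [-crit, crit]; steps must be even.
    h = 2 * crit / steps
    total = integrand(-crit) + integrand(crit)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(-crit + i * h)
    both_nonsignificant = total * h / 3
    return 1 - both_nonsignificant


# Hypothetical usage with a planned size of 160, a final size of 199,
# and an assumed event proportion of 0.75.
print(round(overall_type1_error(160, 199, 0.75), 3))
```

Under these assumptions the result should be close to 0.07 for a nominal α of .05, and it approaches 1 – 0.95² as the two statistics become nearly independent.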
Terminal (or Cure) Rate Model
For clinical studies with a survival end point, we are interested in the distribution of event time T. In general, patients will eventually experience the event given a long enough follow-up, although the exact event time might not be observed due to censoring. However, for some diseases with long-term survivors, the event may never occur in a fraction of subjects (ie, the event time for cured subjects is infinity [18-21]). In this situation, patients can be divided into two groups: the terminal (or cure) group (the specified event would never occur) and the nonterminal group (the specified event would occur but is possibly censored at the end of the study). Thus, the distribution of the event time T has a point probability mass γ at ∞:
$$T = (1 - \eta)T^* + \eta \cdot \infty \tag{3}$$
where η is the group label taking a value of 1 if the individual is in the terminal group and 0 otherwise; γ = P(η = 1) = P(T = ∞) is the terminal rate and T* follows a proper distribution with P(T* < ∞) = 1. For the COVID-19 trials [12,13], the cumulative incidence curve of T can be expressed by
$$F_T(t) = P(T \le t) = (1 - \gamma)F_{T^*}(t) \tag{4}$$
where F_T and F_{T*} are the cumulative distribution functions of T and T*, respectively. Note that P(T < ∞) = 1 – γ < 1.
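To make the plateau of the cumulative incidence curve concrete, the sketch below simulates event times from a mixture with a point mass at infinity. It assumes the cumulative incidence takes the form (1 – γ) times the distribution function of T*, and it uses an exponential distribution for T* purely for illustration; the function names and the rate parameter are hypothetical.

```python
import random
from math import exp, inf


def cumulative_incidence(t, gamma, rate):
    """F_T(t) = (1 - gamma) * F_T*(t), with T* assumed exponential(rate)."""
    return (1 - gamma) * (1 - exp(-rate * t))


def draw_event_time(gamma, rate, rng):
    """Draw T: infinity with probability gamma (terminal group),
    otherwise an exponential time to the event (nonterminal group)."""
    return inf if rng.random() < gamma else rng.expovariate(rate)


rng = random.Random(2020)
times = [draw_event_time(gamma=0.3, rate=0.1, rng=rng) for _ in range(20000)]

# The share of subjects who never experience the event estimates gamma,
# and the curve levels off at 1 - gamma instead of reaching 1.
terminal_share = sum(t == inf for t in times) / len(times)
plateau = cumulative_incidence(1e6, gamma=0.3, rate=0.1)
```

The empirical share of infinite event times should be near the assumed terminal rate, which is why a standard survival model that forces the curve to reach 1 can be a poor fit for such data.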
Restricted Mean Survival Time
Restricted mean survival time (RMST) [16,22-26] is an alternative to the mean survival time, which is often not estimable in the presence of censoring. The RMST is the expectation of the minimum of the event time T and a specified time point τ, which equals the area under the survival curve from 0 to τ. It can be estimated by the area under the Kaplan-Meier survival curve, and it has gained popularity because of its robustness.
Although the HR is the most popular statistic to quantify the survival difference in randomized clinical trials, it is no longer an interpretable quantity if the proportional hazards (PH) assumption is violated [25]. By contrast, the RMST has the advantages of being nonparametric and model-free yet carrying clinically meaningful interpretations. Given the prespecified time point τ, the estimate of the RMST difference between two groups can be interpreted as the average survival gain over the follow-up period up to time τ.
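The area-under-the-curve definition can be sketched in a few lines of Python. This is a minimal illustration, not the survRM2 implementation: `km_rmst` is a hypothetical helper that builds the Kaplan-Meier step function from (time, event) pairs and accumulates the area up to τ, with events coded 1 and censored observations coded 0.

```python
def km_rmst(times, events, tau):
    """Area under the Kaplan-Meier curve from 0 to tau.

    times  : observed times (event or censoring)
    events : 1 if the event was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv, last_t, area = 1.0, 0.0, 0.0
    i = 0
    while i < len(data) and data[i][0] <= tau:
        t = data[i][0]
        n_events = n_leaving = 0
        # Group all subjects sharing the same observed time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        area += surv * (t - last_t)        # rectangle up to time t
        if n_events:
            surv *= 1 - n_events / at_risk  # Kaplan-Meier step down
        at_risk -= n_leaving
        last_t = t
    area += surv * (tau - last_t)           # final rectangle up to tau
    return area


# Without censoring, the result equals the mean of min(T, tau).
print(km_rmst([1, 2, 3, 4], [1, 1, 1, 1], tau=2.5))
```

When the event is a good outcome such as clinical improvement, the same area is the restricted mean time to the event, so a smaller value is better.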
Predicted Trial Outcome With Sample Size Projection
Clinical trials during the epidemic of an infectious disease might fail to reach the planned sample size due to a lack of eligible patients if the outbreak can be quickly controlled [27]. However, early termination of a clinical trial would inevitably lead to loss of power and thus unconvincing findings. Based on the collected data, the bootstrap method can be used to predict what would have happened if the trial had continued to reach the desired sample size. Let N denote the desired sample size and N0 (N0 < N) the actual number of patients enrolled. Prediction of the statistic of interest can be conducted under either a conditional or an unconditional scheme. The unconditional prediction draws N samples (sampling with replacement from the original data with N0 observations), while the conditional prediction draws N – N0 samples from the original N0 observations and keeps the original N0 samples intact. By repeating the sampling procedure a large number of times, one can estimate the predicted mean and the corresponding confidence interval for the statistic of interest if the trial had continued to reach the sample size of N.
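The two resampling schemes can be sketched as follows. This is a simplified illustration under stated assumptions: `predict_statistic` is a hypothetical helper, the data are a single pooled list rather than two arms (in a two-arm trial one would presumably resample within each arm to preserve the allocation ratio), and a percentile interval is used for the confidence interval.

```python
import random


def predict_statistic(data, target_n, statistic, conditional,
                      n_boot=1000, seed=0):
    """Bootstrap prediction of a statistic at a projected sample size.

    conditional=False: draw target_n observations with replacement.
    conditional=True : keep the observed data and draw only the
                       target_n - len(data) additional observations.
    Returns the predicted mean and a 95% percentile interval.
    """
    rng = random.Random(seed)
    n0 = len(data)
    if target_n <= n0:
        raise ValueError("target_n must exceed the observed sample size")
    estimates = []
    for _ in range(n_boot):
        if conditional:
            sample = list(data) + rng.choices(data, k=target_n - n0)
        else:
            sample = rng.choices(data, k=target_n)
        estimates.append(statistic(sample))
    estimates.sort()
    mean = sum(estimates) / n_boot
    lower = estimates[int(0.025 * n_boot)]
    upper = estimates[int(0.975 * n_boot) - 1]
    return mean, (lower, upper)


# Toy usage: project the sample mean from 50 to 100 observations.
toy = [float(i) for i in range(50)]
avg = lambda xs: sum(xs) / len(xs)
uncond = predict_statistic(toy, 100, avg, conditional=False, n_boot=2000)
cond = predict_statistic(toy, 100, avg, conditional=True, n_boot=2000)
```

Because the conditional scheme holds the observed data fixed, its interval is typically narrower than the unconditional one at the same projected sample size.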
Results
Lopinavir-Ritonavir Trial of Cao et al
In the original analysis of Cao et al [12], the time to clinical improvement was assessed after all patients had reached day 28, and patients who failed to reach clinical improvement or died before day 28 were treated as right-censored at day 28. In contrast to the usual survival analysis where death (or a bad event such as disease progression) is used as the event of interest, a good event (clinical improvement) was adopted as the end point in this trial. As a result, the shorter the time to clinical improvement, the better. Cao et al [12] concluded no benefit of using the lopinavir-ritonavir treatment beyond the standard care with an HR of 1.24 (95% CI 0.90-1.72).
We carried out an in-depth and comprehensive investigation of the trial design in Cao et al [12] and identified several key issues with the trial that might have hindered its success. First, the unplanned sample size increment from 160 to 199 would inflate the type I error rate. For this trial, we have N1=160, N2=199, d=0.75, D1 = 160 × 0.75 = 120, D2 = 199 × 0.75 = 149.25, and based on equation 2, αoverall=.071 when the nominal significance level is set as α=.05. That is, the false-positive rate for this trial increased to as high as 7.1%, in contrast to the nominal level of 5%. Any sample size alteration or re-estimation should be planned in advance to control the type I error rate and maintain the integrity of a trial. When the sample size reached 199, enrollment was halted because another treatment, remdesivir, became available. Such termination of a trial was again unplanned and premature; had no other agent been available, would the trial have continued recruitment? Notably, the remdesivir trial by Wang et al [13] (conducted by the same group of investigators as the lopinavir-ritonavir trial) started 3 days after the lopinavir-ritonavir trial was terminated.
In terms of the primary end point, clinical improvement using two-level increment on a seven-category ordinal scale from baseline is ad hoc due to uneven clinical differences between adjacent scales. For example, it is ambiguous whether the status of a patient changing from point 5 to point 3 is equivalent to that of changing from point 6 to point 4. In addition, live discharge from the hospital may occur from point 3 to point 2 or point 4 to point 2, which cannot be considered equivalent either. Thus, choosing 2-point improvement on the clinical outcome scale is not a precise end point, which ignores the 1-point improvement and the difference between 2-point and 3-point improvement. Instead, we recommend death as a single and clean end point for such trials, given the mortality rate was not low with patients who were hospitalized with severe COVID-19 (19.2% in the lopinavir-ritonavir group and 25.0% in the standard care group).
The original analysis [12] treated death before day 28 as right-censored at day 28, no matter when death had occurred. This may cause ambiguity because it cannot distinguish the situations where all deaths in one group occurred earlier while those in the other group occurred later. As death is a terminal event, a terminal (or cure) rate model would be more appropriate for analysis of such data. A terminal rate model can be viewed as the counterpart of the traditional mixture cure rate model [18-21], which can be developed by slight modifications. As death is a terminal event, patients who died during the 28-day follow-up period would never reach the clinical improvement (ie, the time to clinical improvement was infinity) denoted as ∞. Death can also be viewed as a competing risk for clinical improvement.
The upper panel of Table 1 shows that there was no significant difference between the lopinavir-ritonavir and standard care groups in either the terminal rates or the HR (after excluding the terminal subjects who would eventually be absorbed in the death state) from the mixture terminal rate model. In particular, the terminal rates (including observed deaths as well as unobserved deaths that would occur after day 28 but were censored at day 28) were 21.17% for the lopinavir-ritonavir group and 29.91% for the standard care group with P=.16, and the HR for nonterminal subjects was 1.05 (95% CI 0.78-1.42; P=.74).
Table 1.
Terminal rate model^b | Lopinavir-ritonavir | Standard care | Difference (95% CI) | P value | Hazard ratio (95% CI) | P value
Terminal rate, % (95% CI) | 21.17 (15.77-28.42) | 29.91 (24.40-36.66) | –8.74 (–21.04 to 3.55) | .16 | 1.05 (0.78-1.42) | .74
RMTI^c, days (95% CI) | | | | | |
Day 7 | 6.91 (6.79-7.00) | 6.98 (6.94-7.00) | –0.07 (–0.19 to 0.05) | .26 | N/A^d | N/A
Day 14 | 12.58 (12.11-13.04) | 13.25 (12.92-13.58) | –0.67 (–1.24 to –0.11) | .02 | N/A | N/A
Day 28 | 17.19 (15.78-18.60) | 18.86 (17.51-20.21) | –1.67 (–3.62 to 0.28) | .09 | N/A | N/A
^a Cumulative incidence curves were extracted and reconstructed from the second figure in Cao et al [12] using the “digitize” package [28] in R software (R Foundation for Statistical Computing).
^b The mixture terminal rate model was performed using the “smcure” package.
^c The RMTI (restricted mean time to improvement) was estimated by calculating the area above the cumulative incidence curve using the “survRM2” package.
^d Not applicable.
Moreover, the crossings of the cumulative event curves for the lopinavir-ritonavir and standard care groups at days 10 and 16 in the second figure of Cao et al [12] imply possible violation of the PH assumption. When the PH assumption is not satisfied, the HR from a Cox model [29] is not clinically meaningful. As an alternative, the area above the curve in the second figure of Cao et al [12] or the area under the inverted curve as shown in our Figure 1, referred to as the restricted mean time to improvement (RMTI), can be used to quantify the treatment effect without requiring assumptions such as PH [16,22-26]. As a model-free quantity, the RMTI up to 28 days can be interpreted as the average time to reach improvement within 28 days, for which shorter is better. The 28-day RMTI difference between the two groups was –1.67 days (95% CI –3.62 to 0.28; P=.09), in favor of lopinavir-ritonavir but not statistically significant. The 7-day and 14-day RMTIs are also presented in the lower panel of Table 1, where the 14-day RMTI showed some promising results for lopinavir-ritonavir, yet further confirmation is needed.
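As a quick consistency check on the reported RMTI differences, a P value can be recovered from a point estimate and its confidence interval. The sketch below assumes a symmetric Wald-type interval on the original scale (estimate ± z × SE); `wald_p_value` is a hypothetical helper and is not part of the reported analysis.

```python
from statistics import NormalDist


def wald_p_value(estimate, ci_low, ci_high, level=0.95):
    """Two-sided P value implied by a symmetric Wald confidence interval."""
    std = NormalDist()
    z_crit = std.inv_cdf(0.5 + level / 2)
    se = (ci_high - ci_low) / (2 * z_crit)  # back out the standard error
    z = estimate / se
    return 2 * (1 - std.cdf(abs(z)))


# 28-day and 14-day RMTI differences for lopinavir-ritonavir vs standard care.
p_day28 = wald_p_value(-1.67, -3.62, 0.28)
p_day14 = wald_p_value(-0.67, -1.24, -0.11)
```

Under the Wald assumption, these values should round to roughly .09 and .02, matching the P values reported for days 28 and 14.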
Tables 2 and 3 show the numbers on mortality and clinical improvement by day 28 across the two treatment groups, respectively. We carried out chi-square tests (or Fisher exact tests if some of the cell counts were smaller than 5) to examine any association between the outcomes and treatments. For Table 2 with 2×3 cells, there is no association with P=.53, and if combining deaths in both earlier and later stages, this leads to 2×2 cells with P=.32 and odds ratio 0.71 (95% CI 0.36-1.40). Patients treated with lopinavir-ritonavir had 0.71 times the odds of dying by day 28 compared with those in the standard care group. For Table 3 with 2×4 cells, there is no association with P=.11, and if combining all clinical improvement cases, this leads to 2×2 cells with P=.53 and odds ratio 1.24 (95% CI 0.64-2.40). Patients treated with lopinavir-ritonavir had 1.24 times the odds of achieving clinical improvement by day 28 compared with those in the standard care group. However, none of the results are statistically significant.
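The odds ratio for mortality can be reproduced from the combined death counts with a standard Wald interval on the log scale. This is a minimal sketch: the interval method is an assumption (the text does not state how the interval was computed), and `odds_ratio_wald` is a hypothetical helper name.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def odds_ratio_wald(events_1, nonevents_1, events_2, nonevents_2, level=0.95):
    """Odds ratio of group 1 vs group 2 with a Wald interval on the log scale."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    odds_ratio = (events_1 * nonevents_2) / (nonevents_1 * events_2)
    se_log = sqrt(1 / events_1 + 1 / nonevents_1
                  + 1 / events_2 + 1 / nonevents_2)
    lower = exp(log(odds_ratio) - z * se_log)
    upper = exp(log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Deaths by day 28: 8 + 11 = 19 of 99 with lopinavir-ritonavir,
# 13 + 12 = 25 of 100 with standard care.
or_death, lo_death, hi_death = odds_ratio_wald(19, 80, 25, 75)
```

With these counts, the sketch gives an odds ratio of about 0.71 with an interval of roughly 0.36 to 1.40, in line with the values quoted for mortality.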
Table 2.
Treatment | Deaths, earlier, n | Deaths, later, n | Survivors, n
Lopinavir-ritonavir | 8 | 11 | 80
Standard care | 13 | 12 | 75
Table 3.
Treatment | Clinical improvement, days 1-7, n | Clinical improvement, days 8-14, n | Clinical improvement, days 15-28, n | No improvement, n
Lopinavir-ritonavir | 6 | 39 | 33 | 22
Standard care | 2 | 28 | 40 | 30
Remdesivir Trial of Wang et al
Wang et al [13] reported a randomized, double-blind, placebo-controlled remdesivir trial for patients with severe COVID-19. Based on an adjusted six-point ordinal scale of clinical status, the primary end point was the time to clinical improvement, defined as a 2-level decline from randomization (similar to that in Cao et al [12]; in fact, the two trials were conducted by the same group of investigators), for which the shorter is the better. Patients were permitted concomitant use of lopinavir-ritonavir, interferons, and corticosteroids. The HR between the remdesivir and placebo groups was 1.23 (95% CI 0.87-1.75), indicating no significant difference. Overall, 237 eligible patients were enrolled, with 158 patients assigned to the remdesivir group and 78 patients to the placebo group under the intent-to-treat (ITT) scheme. The trial was stopped early and thus failed to reach the designated sample size 453 due to a lack of eligible patients.
Similar to the trial by Cao et al [12], deaths before day 28 were treated as right-censored observations at day 28, regardless of the actual occurrence time of deaths in Wang et al [13]. Moreover, a clinical improvement might not be observed due to death (ie, death is a terminal event), and thus, the terminal or cure rate model introduced earlier should be recommended for the survival analysis rather than the standard Cox model.
The upper panel of Table 4 indicates no significant difference in the terminal rates between the remdesivir and placebo groups. In particular, the terminal rates were 31.49% for the remdesivir group and 40.71% for the placebo group with P=.19. With the terminal subjects excluded, the HR from the mixture terminal rate model was 0.92 (95% CI 0.63-1.35; P=.67), which also showed no significant difference between the two groups.
Table 4.
Terminal rate model | Remdesivir | Placebo | Difference (95% CI) | P value | Hazard ratio (95% CI) | P value
Terminal rate, % (95% CI) | 31.49 (27-37) | 40.71 (32-51) | –9.22 (–22.9 to 4.45) | .19 | 0.92 (0.63-1.35) | .67
RMTI^a, days (95% CI) | | | | | |
Day 7 | 6.95 (6.90-7.00) | 6.97 (6.92-7.00) | –0.03 (–0.10 to 0.05) | .49 | N/A^b | N/A
Day 14 | 13.09 (12.78-13.40) | 13.29 (12.92-13.67) | –0.20 (–0.69 to 0.29) | .42 | N/A | N/A
Day 28 | 20.42 (19.26-21.57) | 21.31 (19.73-22.88) | –0.89 (–2.84 to 1.06) | .37 | N/A | N/A
^a RMTI: restricted mean time to improvement.
^b Not applicable.
Due to the competing risk from death, the end point might not be observed; thus, the standard hazard concept is ambiguous, and the HR no longer has a meaningful interpretation [30]. In the second figure in Wang et al [13], the curve for the cumulative improvement event of remdesivir is uniformly higher than that of the control, indicating that patients with remdesivir reached improvement faster than those in the control group. The area above the cumulative incidence curve or, equivalently, the area under the survival curve up to 28 days in our Figure 2 would be a reasonable quantity for evaluating the treatment efficacy. Using the reconstructed data from the second figure in Wang et al [13], the RMTI evaluated at day 28 was 20.42 (95% CI 19.26-21.57) days for the remdesivir group and 21.31 (95% CI 19.73-22.88) days for the placebo group. As shown in the lower panel of Table 4, the difference in RMTIs was –0.89 days (95% CI –2.84 to 1.06), numerically favoring remdesivir but not statistically significant. That is, patients treated with remdesivir reached improvement on average 0.89 days sooner within the 28-day follow-up than those in the placebo group. The 7-day and 14-day RMTIs are also presented in the lower panel of Table 4, and neither showed statistically significant results.
The trial was terminated without reaching the originally planned sample size of 453 due to a lack of eligible patients. With only 236 patients in the ITT analysis, the estimated HR was 1.23 (95% CI 0.87-1.75), numerically favoring remdesivir, but this estimate might not be reliable because the study was underpowered. Using the bootstrap method, we can predict what would have happened if the trial had continued to reach the full sample size or double the planned sample size. Table 5 shows both the unconditional and conditional predictions of the HR, similar to sample size re-estimation using conditional power [31] in a two-stage design. If the trial had reached the designated sample size, the HR from the conditional prediction would show a significant treatment effect of remdesivir with P=.02, and if the trial had enrolled twice the target sample size, both conditional and unconditional approaches would result in significant differences at the 5% significance level. Thus, a larger sample size may be needed to show a significant difference between remdesivir and placebo.
Table 5.
Sample size | Remdesivir, n | Placebo, n | Unconditional prediction, HR^a (95% CI) | Unconditional prediction, P value | Conditional prediction, HR (95% CI) | Conditional prediction, P value
Actual | 158 | 78 | 1.23 (0.87-1.75) | .24 | N/A^b | N/A
Target | 302 | 151 | 1.24 (0.96-1.60) | .10 | 1.24 (1.03-1.48) | .02
Target×2 | 604 | 302 | 1.24 (1.03-1.48) | .02 | 1.24 (1.06-1.44) | .01
^a HR: hazard ratio.
^b Not applicable.
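The pattern in Table 5 can be approximated with a back-of-envelope calculation, offered here only as a heuristic and not as the bootstrap procedure itself. The sketch assumes that the standard error of the log HR shrinks in proportion to 1/sqrt(N) under unconditional resampling, and by a further factor of sqrt((N – N0)/N) when the observed N0 patients are held fixed; `projected_ci` is a hypothetical helper.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def projected_ci(hr, ci_low, ci_high, n0, n, conditional, level=0.95):
    """Heuristic confidence interval for the HR at a projected sample size n."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se0 = (log(ci_high) - log(ci_low)) / (2 * z)  # SE of log HR at n0
    se = se0 * sqrt(n0 / n)                       # assumed 1/sqrt(N) scaling
    if conditional:
        se *= sqrt((n - n0) / n)                  # only new patients vary
    return exp(log(hr) - z * se), exp(log(hr) + z * se)


# Observed: HR 1.23 (95% CI 0.87-1.75) with 236 patients; target 453.
uncond_target = projected_ci(1.23, 0.87, 1.75, 236, 453, conditional=False)
cond_target = projected_ci(1.23, 0.87, 1.75, 236, 453, conditional=True)
```

Under these assumptions the projected intervals land close to the bootstrap intervals in Table 5, which suggests that the predicted significance is driven mainly by the narrower interval around an unchanged point estimate. The projection therefore presumes that additional patients would have resembled those already enrolled.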
Remdesivir Trial of Beigel et al
Beigel et al [14] presented a preliminary report of the NCT04280705 trial, which is a randomized, double-blind, placebo-controlled trial of intravenous remdesivir in adults hospitalized with COVID-19 and evidence of lower respiratory tract involvement. This trial enrolled 1059 patients (538 assigned to remdesivir and 521 to placebo). The primary end point of the original analysis was the recovery time, defined by either discharge from the hospital or hospitalization for infection-control purposes only. The median recovery time of the remdesivir group was 11 (95% CI 9-12) days and that of the placebo group was 15 (95% CI 13-19) days. The rate ratio of recovery for remdesivir vs placebo was 1.32 (95% CI 1.12-1.55; P<.001), which demonstrated the superiority of remdesivir. In terms of the HR for death, there was no significant difference between the remdesivir and placebo groups with an HR of 0.70 (95% CI 0.47-1.04).
The remdesivir trial of Beigel et al [14] is essential to evaluate the efficacy of remdesivir, as it had a large sample size of 1059 patients under a well-designed randomized controlled trial scheme. In terms of the data analysis, Beigel et al [14] only reported the median recovery time without a P value. From the second figure in Beigel et al [14], the Kaplan-Meier curves of cumulative recoveries are initially intertwined and then diverge, so other percentiles of the time to recovery would provide more information on the efficacy of remdesivir. Meanwhile, a global and robust measurement, the restricted mean time to recovery (RMTR), can help to quantify the treatment efficacy in a more comprehensive way [16,22-26].
The upper panel of Table 6 presents the RMTRs up to day 30 for both the remdesivir and placebo groups. The RMTRs were 14.5 days and 17.2 days for remdesivir and placebo, respectively, indicating that patients with remdesivir recovered on average 2.7 days sooner within the 30-day follow-up. The difference in RMTRs was statistically significant with P<.001, demonstrating the superiority of remdesivir. This is consistent with the original analysis in terms of the rate ratio of recovery [14]. In the bottom panel of Table 6, more percentiles of the time to recovery are reported with P values. The early difference for remdesivir vs placebo in the recovery time at the 25th percentile was –1 day (95% CI –3 to 0; P=.65), which was not statistically significant. However, the differences became statistically significant at higher percentiles; for example, the 30th to 60th percentiles of the recovery time in the remdesivir group were all significantly shorter than those in the placebo group. It is reasonable for the treatment to take effect after a certain length of follow-up.
Table 6.
Statistical measure | Remdesivir | Placebo | Difference (95% CI) | P value
RMTRa up to day 30, days (95% CI) | 14.5 (13.6-15.5) | 17.2 (16.1-18.2) | –2.7 (–4.0 to –1.2) | <.001
Percentiles of the time to recovery, days (95% CI) | | | |
25th | 5 (4-5) | 6 (6-7) | –1 (–3 to 0) | .65
30th | 6 (5-6) | 8 (7-9) | –2 (–4 to –1) | .002
40th | 8 (7-9) | 11 (9-13) | –3 (–5 to –1) | .007
50th (median) | 11 (9-12) | 15 (13-19) | –4 (–9 to –2) | .01
60th | 15 (13-19) | 22 (20-27) | –7 (–12 to –3) | .004
aRMTR: restricted mean time to recovery.
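As a minimal sketch of how the two kinds of summaries in Table 6 can be obtained from time-to-recovery data, the following Python code computes a Kaplan-Meier curve, the restricted mean time to recovery, and a percentile of the recovery time. The function names and the toy data are hypothetical and serve only as illustration; they are not the reconstructed trial data, and the sketch does not reproduce the CIs or P values reported above.

```python
from typing import List, Optional, Tuple

Curve = List[Tuple[float, float]]  # (event time, P(not yet recovered) just after it)


def km_not_recovered(times: List[float], events: List[int]) -> Curve:
    """Kaplan-Meier estimate of S(t) = P(not yet recovered by t).

    events[i] is 1 if patient i recovered at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve: Curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        recovered = 0  # recoveries observed exactly at t
        leaving = 0    # everyone (recovered or censored) leaving the risk set at t
        while i < len(data) and data[i][0] == t:
            recovered += data[i][1]
            leaving += 1
            i += 1
        if recovered > 0:
            surv *= 1.0 - recovered / n_at_risk
            curve.append((t, surv))
        n_at_risk -= leaving
    return curve


def restricted_mean(curve: Curve, tau: float) -> float:
    """Area under the step function S(t) on [0, tau]."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in curve:
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (tau - prev_t)


def percentile_time(curve: Curve, p: float) -> Optional[float]:
    """Smallest event time at which the cumulative recovery 1 - S(t) reaches p."""
    for t, s in curve:
        if 1.0 - s >= p:
            return t
    return None  # percentile not reached during follow-up


# Hypothetical toy data: days to recovery, with one censored patient (event = 0).
toy_times = [2, 4, 6, 8, 10]
toy_events = [1, 1, 0, 1, 1]
toy_curve = km_not_recovered(toy_times, toy_events)
toy_rmtr = restricted_mean(toy_curve, tau=10)
toy_median = percentile_time(toy_curve, 0.5)
```

Applying the same functions to each treatment group and subtracting gives a between-group difference in restricted means or percentiles; interval estimates would additionally require a variance estimator or resampling, which this sketch omits.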
Discussion
When designing and conducting a clinical trial for a new treatment, particularly during the COVID-19 pandemic when little is known about the clinical outcomes, many things can go wrong if the design is not well thought out, the trial is not conducted carefully according to the protocol, or the analysis is not carried out properly. Critical issues with such trials include, but are not limited to, the end point selection, the type I error rate control, the choice between double-blind and open-label designs, early termination of a trial, the validity of the PH assumption in a Cox model, and the assumptions underlying statistical tests and models. Rather than searching for a needle in a haystack, the trial design should be targeted, focused, and tailored to the specific needs of patients with COVID-19 and to particular disease characteristics and severities [32].
Given the emergency and the rapid spread of the coronavirus around the world, it is crucial to design the right clinical trial and accelerate the development of a new treatment. With the high speed of enrollment and the urgency of the trial outcome, it appears difficult to carry out any adaptation while the trial is being conducted. The trial outcomes unfold so fast that any adaptation may not be able to keep up with the speed of recruitment.
In summary, our recommendations for COVID-19 trials are as follows:
- Adopt death as the single end point for patients hospitalized with severe COVID-19, or live discharge from the hospital for patients with moderately severe COVID-19.
- Follow the gold standard trial scheme: a randomized, double-blind, controlled trial with equal randomization, or with a 1:2 or 1:3 allocation ratio for control vs treatment.
- When multiple agents are tested in one trial, allow the trial to drop certain treatments for futility or toxicity.
- Adopt the RMST as the metric to quantify the treatment effect when the PH assumption is not satisfied; otherwise, use standard approaches based on HRs and log-rank tests.
- Control the type I error rate: any sample size alteration during the trial must be planned and evaluated in advance, with strict control of the false-positive rate.
- Use ITT analysis (or its modified version) for the final analysis.
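To illustrate why an unplanned sample size extension inflates the type I error rate, the following Monte Carlo sketch tests once at the planned sample size and, only if that result is not significant, tests again after enlarging the sample. It is a simplified illustration and not the calculation used in this paper: it assumes a two-sided z test at the 0.05 level at both looks and the usual normal approximation in which the correlation between the two test statistics is the square root of the sample size fraction. The function name and its parameters are hypothetical; the sample sizes 160 and 199 mirror the lopinavir-ritonavir trial.

```python
import math
import random


def type1_error_with_extension(
    n_planned: int,
    n_extended: int,
    z_crit: float = 1.959964,  # two-sided 0.05 critical value
    n_sim: int = 200_000,
    seed: int = 1,
) -> float:
    """Estimate the overall false-positive rate under the null hypothesis
    when a nonsignificant trial is extended and tested a second time."""
    rng = random.Random(seed)
    frac = n_planned / n_extended  # information fraction at the first test
    rejections = 0
    for _ in range(n_sim):
        z_first = rng.gauss(0.0, 1.0)
        if abs(z_first) > z_crit:
            rejections += 1  # significant at the planned sample size
            continue
        # Statistic on the enlarged sample: weighted sum of the first-stage
        # statistic and an independent statistic from the added patients.
        z_added = rng.gauss(0.0, 1.0)
        z_final = math.sqrt(frac) * z_first + math.sqrt(1.0 - frac) * z_added
        if abs(z_final) > z_crit:
            rejections += 1  # significant only after the extension
    return rejections / n_sim


rate = type1_error_with_extension(160, 199)
```

Because the second test is given a further chance to reject after the first one fails, the estimated rate exceeds the nominal 0.05 level; a preplanned design would instead adjust the critical values so that the overall rate stays at 0.05.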
Although adaptive design has gained much popularity and is playing an increasingly important role in clinical trials, particularly in oncology, its advantages may be largely diminished under such fast patient enrollment because the effect of any adaptation may be too slow to manifest before the trial is completed. In such cases, the CONSORT (Consolidated Standards of Reporting Trials) statement [33,34] can provide a general guideline for the trial design and conduct. As a result, our recommendations follow the gold standard scheme of conventional trial design with few adaptive elements, which may help investigators discriminate between different treatments and identify the effective ones efficiently.
Acknowledgments
We would like to thank the referees, associate editor, and editor for their helpful comments that greatly improved the paper. The research was supported by a grant No 17307318 for GY from the Research Grants Council of Hong Kong.
Abbreviations
- CONSORT: Consolidated Standards of Reporting Trials
- COVID-19: coronavirus disease
- HR: hazard ratio
- ITT: intent-to-treat
- PH: proportional hazards
- RMST: restricted mean survival time
- RMTI: restricted mean time to improvement
- RMTR: restricted mean time to recovery
- WHO: World Health Organization
Footnotes
Conflicts of Interest: None declared.
References
- 1. R&D Blueprint and COVID-19. World Health Organization. 2020. http://www.who.int/blueprint/priority-diseases/key-action/novel-coronavirus/en/
- 2. Chen J, Liu D, Liu L, et al. A pilot study of hydroxychloroquine in treatment of patients with common coronavirus disease-19 (COVID-19). J Zhejiang Univ (Med Sci). 2020 Mar 06;49(2):215–219. doi: 10.3785/j.issn.1008-9292.2020.03.03.
- 3. Gautret P, Lagier J, Parola P, Hoang VT, Meddeb L, Mailhe M, Doudier B, Courjon J, Giordanengo V, Vieira VE, Dupont HT, Honoré S, Colson P, Chabrière E, La Scola B, Rolain J, Brouqui P, Raoult D. Hydroxychloroquine and azithromycin as a treatment of COVID-19: results of an open-label non-randomized clinical trial. Int J Antimicrob Agents. 2020 Mar 20:105949. doi: 10.1016/j.ijantimicag.2020.105949. http://europepmc.org/abstract/MED/32205204.
- 4. Cai Q, Yang M, Liu D, Chen J, Shu D, Xia J, Liao X, Gu Y, Cai Q, Yang Y, Shen C, Li X, Peng L, Huang D, Zhang J, Zhang S, Wang F, Liu J, Chen L, Chen S, Wang Z, Zhang Z, Cao R, Zhong W, Liu Y, Liu L. Experimental treatment with favipiravir for COVID-19: an open-label control study. Engineering (Beijing). 2020 Mar 18. doi: 10.1016/j.eng.2020.03.007. http://europepmc.org/abstract/MED/32346491.
- 5. Chen C, Huang J, Cheng Z, et al. Favipiravir versus arbidol for COVID-19: a randomized clinical trial. medRxiv. 2020 Apr 15. doi: 10.1101/2020.03.17.20037432.
- 6. Grein J, Ohmagari N, Shin D, Diaz G, Asperges E, Castagna A, Feldt T, Green G, Green ML, Lescure F, Nicastri E, Oda R, Yo K, Quiros-Roldan E, Studemeister A, Redinski J, Ahmed S, Bernett J, Chelliah D, Chen D, Chihara S, Cohen SH, Cunningham J, D’Arminio Monforte A, Ismail S, Kato H, Lapadula G, L’Her E, Maeno T, Majumder S, Massari M, Mora-Rillo M, Mutoh Y, Nguyen D, Verweij E, Zoufaly A, Osinusi AO, DeZure A, Zhao Y, Zhong L, Chokkalingam A, Elboudwarej E, Telep L, Timbs L, Henne I, Sellers S, Cao H, Tan SK, Winterbourne L, Desai P, Mera R, Gaggar A, Myers RP, Brainard DM, Childs R, Flanigan T. Compassionate use of remdesivir for patients with severe covid-19. N Engl J Med. 2020 Jun 11;382(24):2327–2336. doi: 10.1056/nejmoa2007016.
- 7. Hu K, Guan W, Bi Y, Zhang W, Li L, Zhang B, Liu Q, Song Y, Li X, Duan Z, Zheng Q, Yang Z, Liang J, Han M, Ruan L, Wu C, Zhang Y, Jia Z, Zhong N. Efficacy and safety of Lianhuaqingwen capsules, a repurposed Chinese herb, in patients with coronavirus disease 2019: A multicenter, prospective, randomized controlled trial. Phytomedicine. 2020 May 16:153242. doi: 10.1016/j.phymed.2020.153242. http://europepmc.org/abstract/MED/32425361.
- 8. Brody H, Miller FG. Lessons from recent research about the placebo effect--from art to science. JAMA. 2011 Dec 21;306(23):2612–3. doi: 10.1001/jama.2011.1850.
- 9. Wartolowska K, Judge A, Hopewell S, Collins GS, Dean BJF, Rombach I, Brindley D, Savulescu J, Beard DJ, Carr AJ. Use of placebo controls in the evaluation of surgery: systematic review. BMJ. 2014 May 21;348:g3253. doi: 10.1136/bmj.g3253. http://www.bmj.com/cgi/pmidlookup?view=long&pmid=24850821.
- 10. Kaptchuk TJ, Miller FG. Placebo Effects in Medicine. N Engl J Med. 2015 Jul 02;373(1):8–9. doi: 10.1056/NEJMp1504023.
- 11. Colloca L, Barsky AJ. Placebo and Nocebo Effects. N Engl J Med. 2020 Mar 06;382(6):554–561. doi: 10.1056/NEJMra1907805.
- 12. Cao B, Wang Y, Wen D, Liu W, Wang J, Fan G, Ruan L, Song B, Cai Y, Wei M, Li X, Xia J, Chen N, Xiang J, Yu T, Bai T, Xie X, Zhang L, Li C, Yuan Y, Chen H, Li H, Huang H, Tu S, Gong F, Liu Y, Wei Y, Dong C, Zhou F, Gu X, Xu J, Liu Z, Zhang Y, Li H, Shang L, Wang K, Li K, Zhou X, Dong X, Qu Z, Lu S, Hu X, Ruan S, Luo S, Wu J, Peng L, Cheng F, Pan L, Zou J, Jia C, Wang J, Liu X, Wang S, Wu X, Ge Q, He J, Zhan H, Qiu F, Guo L, Huang C, Jaki T, Hayden FG, Horby PW, Zhang D, Wang C. A trial of lopinavir–ritonavir in adults hospitalized with severe covid-19. N Engl J Med. 2020 May 07;382(19):1787–1799. doi: 10.1056/nejmoa2001282.
- 13. Wang Y, Zhang D, Du G, Du R, Zhao J, Jin Y, Fu S, Gao L, Cheng Z, Lu Q, Hu Y, Luo G, Wang K, Lu Y, Li H, Wang S, Ruan S, Yang C, Mei C, Wang Y, Ding D, Wu F, Tang X, Ye X, Ye Y, Liu B, Yang J, Yin W, Wang A, Fan G, Zhou F, Liu Z, Gu X, Xu J, Shang L, Zhang Y, Cao L, Guo T, Wan Y, Qin H, Jiang Y, Jaki T, Hayden FG, Horby PW, Cao B, Wang C. Remdesivir in adults with severe COVID-19: a randomised, double-blind, placebo-controlled, multicentre trial. Lancet. 2020 May;395(10236):1569–1578. doi: 10.1016/s0140-6736(20)31022-9.
- 14. Beigel JH, Tomashek KM, Dodd LE, Mehta AK, Zingman BS, Kalil AC, Hohmann E, Chu HY, Luetkemeyer A, Kline S, Lopez de Castilla D, Finberg RW, Dierberg K, Tapson V, Hsieh L, Patterson TF, Paredes R, Sweeney DA, Short WR, Touloumi G, Lye DC, Ohmagari N, Oh M, Ruiz-Palacios GM, Benfield T, Fätkenheuer G, Kortepeter MG, Atmar RL, Creech CB, Lundgren J, Babiker AG, Pett S, Neaton JD, Burgess TH, Bonnett T, Green M, Makowski M, Osinusi A, Nayak S, Lane HC, ACTT-1 Study Group Members. Remdesivir for the treatment of covid-19 - preliminary report. N Engl J Med. 2020 May 22. doi: 10.1056/NEJMoa2007764. http://europepmc.org/abstract/MED/32445440.
- 15. Peto R, Peto J. Asymptotically efficient rank invariant test procedures. J R Stat Soc A. 1972;135(2):185. doi: 10.2307/2344317.
- 16. Yin G. Clinical Trial Design: Bayesian and Frequentist Adaptive Methods. New York: John Wiley & Sons; 2012.
- 17. Mantel N. Evaluation of survival data and two new rank order statistics arising in its consideration. Cancer Chemother Rep. 1966 Mar;50(3):163–70.
- 18. Berkson J, Gage RP. Survival curve for cancer patients following treatment. J Am Stat Assoc. 1952 Sep;47(259):501–515. doi: 10.1080/01621459.1952.10501187.
- 19. Farewell VT. Mixture models in survival analysis: are they worth the risk? Can J Stat. 1986 Sep;14(3):257–262. doi: 10.2307/3314804.
- 20. Bejan-Angoulvant T, Bouvier A, Bossard N, Belot A, Jooste V, Launoy G, Remontet L. Hazard regression model and cure rate model in colon cancer relative survival trends: are they telling the same story? Eur J Epidemiol. 2008 Feb 9;23(4):251–259. doi: 10.1007/s10654-008-9226-6.
- 21. Yin G, Ibrahim JG. Cure rate models: a unified approach. Can J Stat. 2005 Dec;33(4):559–570. doi: 10.1002/cjs.5550330407.
- 22. Yuan Y, Yin G. Bayesian dose finding by jointly modelling toxicity and efficacy as time‐to‐event outcomes. J R Stat Soc C. 2009 Dec;58(5):719–736. doi: 10.1111/j.1467-9876.2009.00674.x.
- 23. Uno H, Claggett B, Tian L, Inoue E, Gallo P, Miyata T, Schrag D, Takeuchi M, Uyama Y, Zhao L, Skali H, Solomon S, Jacobus S, Hughes M, Packer M, Wei L. Moving beyond the hazard ratio in quantifying the between-group difference in survival analysis. J Clin Oncol. 2014 Aug 01;32(22):2380–2385. doi: 10.1200/jco.2014.55.2208.
- 24. Zhao L, Claggett B, Tian L, Uno H, Pfeffer MA, Solomon SD, Trippa L, Wei LJ. On the restricted mean survival time curve in survival analysis. Biometrics. 2015 Aug 24;72(1):215–221. doi: 10.1111/biom.12384.
- 25. Tian L, Fu H, Ruberg SJ, Uno H, Wei L. Efficiency of two sample tests via the restricted mean survival time for analyzing event time observations. Biometrics. 2018 Jun;74(2):694–702. doi: 10.1111/biom.12770. http://europepmc.org/abstract/MED/28901017.
- 26. Royston P, Parmar MK. Restricted mean survival time: an alternative to the hazard ratio for the design and analysis of randomized trials with a time-to-event outcome. BMC Med Res Methodol. 2013 Dec 07;13:152. doi: 10.1186/1471-2288-13-152. https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/1471-2288-13-152.
- 27. Kahn R, Rid A, Smith PG, Eyal N, Lipsitch M. Choices in vaccine trial design in epidemics of emerging infections. PLoS Med. 2018 Aug 7;15(8):e1002632. doi: 10.1371/journal.pmed.1002632.
- 28. Guyot P, Ades A, Ouwens MJ, Welton NJ. Enhanced secondary analysis of survival data: reconstructing the data from published Kaplan-Meier survival curves. BMC Med Res Methodol. 2012 Feb 01;12:9. doi: 10.1186/1471-2288-12-9. https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/1471-2288-12-9.
- 29. Cox DR. Regression models and life-tables. J R Stat Soc B. 1972;34(2):187–202. doi: 10.1111/j.2517-6161.1972.tb00899.x.
- 30. Zhao L, Tian L, Claggett B, Pfeffer M, Kim DH, Solomon S, Wei L. Estimating treatment effect with clinical interpretation from a comparative clinical trial with an end point subject to competing risks. JAMA Cardiol. 2018 Apr 01;3(4):357–358. doi: 10.1001/jamacardio.2018.0127. http://europepmc.org/abstract/MED/29541747.
- 31. Proschan MA, Hunsberger SA. Designed extension of studies based on conditional power. Biometrics. 1995 Dec;51(4):1315. doi: 10.2307/2533262.
- 32. Kalil AC. Treating COVID-19-off-label drug use, compassionate use, and randomized clinical trials during pandemics. JAMA. 2020 Mar 24. doi: 10.1001/jama.2020.4742.
- 33. Moher D. The CONSORT Statement: revised recommendations for improving the quality of reports of parallel-group randomized trials. JAMA. 2001 Apr 18;285(15):1987. doi: 10.1001/jama.285.15.1987.
- 34. Schulz K, Altman D, Moher D, CONSORT Group. CONSORT 2010 Statement: updated guidelines for reporting parallel group randomised trials. Trials. 2010 Mar 24;11:32. doi: 10.1186/1745-6215-11-32. https://trialsjournal.biomedcentral.com/articles/10.1186/1745-6215-11-32.