American Journal of Respiratory and Critical Care Medicine
Editorial. 2021 Dec 15;205(3):267–269. doi: 10.1164/rccm.202111-2593ED

Making Sense of Phase II Trials for Investigational Agents in COVID-19: The Case of Ilomedin in Mechanically Ventilated Patients

Fernando G Zampieri 1, Adit Ginde 2
PMCID: PMC8887006  PMID: 34910598

Feasibility trials are underappreciated, although they are an essential part of building a solid evidence-based practice (1). Such studies are designed to test important aspects of future larger clinical trials, including plausible inclusion rate, logistic procedures (drug supply and data collection), site monitoring, and early safety in the population of interest, among others. By their very nature and comparatively small sample size, feasibility trials can be challenging to interpret for preliminary efficacy results. A “positive” finding from a small trial can be inflated (“winner’s curse” [2] or publication bias), whereas a “neutral” result is frequently only a reflection of low power. Reckless interpretation of data leads to euphoria or disappointment, and the former inevitably leads to the latter.

In this issue of the Journal, Johansson and colleagues (pp. 324–329) present the results of a well-conducted feasibility/phase II trial that excels at being exactly what it was designed to be. The authors randomized 80 patients with coronavirus disease (COVID-19) on mechanical ventilation and with high (>4 ng/ml) thrombomodulin to receive intravenous prostacyclin or placebo (3). The trial’s rationale is based on the premise that prostacyclin could attenuate endotheliopathy, mainly through local vasodilatation and inhibition of platelet adhesion; the use of a serum thrombomodulin threshold for inclusion is therefore a clever predictive enrichment strategy. The primary endpoint was days alive and free of mechanical ventilation, an endpoint that is patient-centered, may maximize power owing to greater granularity than binary or ordinal endpoints, and is aligned with the expected mechanism of action of the drug. The study was powered to detect a 20% increase in days free of mechanical ventilation (from 10 to 12, assuming an SD of 3), which is a bold assumption. The trial succeeded in demonstrating that the intervention can feasibly be given to mechanically ventilated patients with COVID-19, with no apparent major safety issues, although therapy for five patients was discontinued without a clear reason described in the manuscript. Point estimates favored the intervention, with the results for the sequential organ failure assessment score crossing the heralded P < 0.05 threshold, though without adjustment for multiple comparisons.
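To make the design assumption concrete, the following is a minimal sketch of the power implied by a 2-day difference with an SD of 3. It uses a textbook normal approximation for a two-sample comparison of means. The two-sided alpha of 0.05 and the even split of 40 patients per arm are assumptions made here for illustration; the editorial does not state them, and this is not the trial's actual calculation.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def power_two_sample(delta, sd, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test on means.

    delta: assumed true difference in means (days)
    sd: assumed common standard deviation (days)
    n_per_arm: patients per arm (assumed equal allocation)
    """
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    standard_error = sd * sqrt(2 / n_per_arm)
    # Ignores the negligible chance of rejecting in the wrong direction.
    return STD_NORMAL.cdf(delta / standard_error - z_alpha)

# Design values quoted in the text: 10 vs. 12 days, SD of 3.
# 40 per arm is an illustrative assumption (80 patients randomized).
planned_power = power_two_sample(delta=2, sd=3, n_per_arm=40)
print(f"Approximate power under the design assumptions: {planned_power:.2f}")
```

Under these assumptions the design looks adequately powered, which is why the realism of the assumed SD matters so much for how the result should be read.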

These clinical trials are challenging to conduct, especially during COVID-19, and the authors should be commended for pursuing this biologically driven approach to identify a new treatment for critical COVID-19, an area that remains remarkably lacking in effective treatment options. It should also be noted, however, that several issues make it difficult to draw proper conclusions from the results. The report lacks a clear definition of mechanical ventilation–free days, including how deaths were handled in the calculation, and does not present the duration of mechanical ventilation among survivors (4). Hypothesis testing with inferential statistics for secondary endpoints (especially mortality) can be misleading to readers. The authors grossly underestimated the anticipated variability for the primary endpoint: although the observed difference in the primary endpoint was well above what was predicted (11 days, compared with 2 days in the sample size calculation), the trial was still dramatically underpowered for this massive effect size, primarily because the SD was three- to fourfold higher than predicted. Most secondary endpoints are implicitly composite with mortality (days alive without X), yet the treatment effect in survivors is not reported, which leads to an overcounting of mortality across most of these endpoints. Also, in critical care studies, how mortality is handled is critically important for many physiological and clinical secondary endpoints that have high potential for survivor bias, in this case sequential organ failure assessment scores. Using survivor average causal effects, which estimate the effect of the intervention in patients who would have survived in either group, is one way to mitigate this problem (5). Finally, the lack of biomarker endpoints such as markers of endotheliopathy and lung injury represents a missed opportunity to understand the underlying biological effect, which can be critically helpful in these smaller phase IIa trials.
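The cost of underestimating variability can be illustrated with a rough sketch: for a fixed target difference, the required sample size grows with the square of the SD. The calculation below uses a standard normal-approximation formula with an assumed two-sided alpha of 0.05 and 80% power (neither is stated in the editorial), and it takes SDs of 9 and 12 days simply as three and four times the planned SD of 3. It does not attempt to reproduce the trial's analysis, and ventilator-free days are often far from normally distributed.

```python
from math import ceil
from statistics import NormalDist

STD_NORMAL = NormalDist()

def n_per_arm(delta, sd, alpha=0.05, power=0.80):
    """Patients per arm for a two-sided, two-sample z-test on means."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)

# Planned SD of 3 days versus SDs three- and fourfold larger,
# all for the planned 2-day difference.
required = {sd: n_per_arm(delta=2, sd=sd) for sd in (3, 9, 12)}
for sd, n in required.items():
    print(f"SD = {sd:>2} days -> about {n} patients per arm")
```

A threefold larger SD multiplies the required sample size by about nine, and a fourfold larger SD by about sixteen.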

Clinicians may have difficulty making sense of the results of small clinical trials. Much of the confusion arises from misinterpretation of P values and an underappreciation of the issues that arise from small samples and secondary endpoints. Publication bias, or the higher likelihood of journals accepting “positive”-appearing clinical trials instead of null results, further increases the likelihood of readers seeing such studies and creates a greater potential for type 1 error, where promising phase II trials are often not confirmed in larger trials. The trial by Johansson and colleagues presents exciting data for primary and secondary endpoints; there is an important numerical difference for days alive and free of mechanical ventilation and for mortality (9 of 41 [21.9%] vs. 17 of 39 [43.6%], a difference that may yield P values <0.05 depending on the method used for analysis [6]). These results are by no means definitive, as acknowledged by the authors. A word of caution is needed when interpreting “positive” results, such as may be the case for mortality in this trial. With such a small sample size, a favorable result would be associated with an important magnification of the effect size (magnification, or Type M, error, shown in Figure 1A, based on Reference 2). If the data are confronted with regularizing information under a Bayesian framework that makes the model skeptical of effect sizes (on the odds ratio scale) <0.5 or >2.0 (7), the overall probability of benefit remains <0.90 (Figure 1B).
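How much the P value for mortality depends on the analytic method can be checked directly from the counts quoted above. The sketch below computes four common two-sided P values for the 2×2 table (9 deaths of 41 vs. 17 deaths of 39) using only the Python standard library. The choice of these four tests and the particular mid-P definition (half weight on tables as probable as the observed one) are assumptions made for illustration; the editorial does not say which methods it has in mind.

```python
from math import comb, erfc, sqrt

def chi_square_p(a, b, c, d, yates=False):
    """Two-sided P value from a 1-df chi-square test on a 2x2 table."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)  # Yates continuity correction
    stat = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return erfc(sqrt(stat / 2))  # upper tail of chi-square with 1 df

def fisher_p(a, b, c, d, mid=False):
    """Two-sided Fisher exact P value; mid=True gives a mid-P variant."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)
    p_obs = comb(row1, a) * comb(row2, col1 - a) / total
    less, tied = 0.0, 0.0
    for k in range(max(0, col1 - row2), min(row1, col1) + 1):
        p_k = comb(row1, k) * comb(row2, col1 - k) / total
        if abs(p_k - p_obs) <= 1e-9 * p_obs:
            tied += p_k      # tables exactly as probable as the observed one
        elif p_k < p_obs:
            less += p_k      # tables less probable than the observed one
    return less + (0.5 * tied if mid else tied)

# Deaths and survivors in each arm, from the counts quoted in the text.
table = (9, 41 - 9, 17, 39 - 17)
p_pearson = chi_square_p(*table)
p_yates = chi_square_p(*table, yates=True)
p_fisher = fisher_p(*table)
p_mid = fisher_p(*table, mid=True)
for name, p in [("Pearson", p_pearson), ("Yates", p_yates),
                ("Fisher", p_fisher), ("mid-P", p_mid)]:
    print(f"{name:>8}: P = {p:.3f}")
```

The uncorrected chi-square test falls below 0.05 while the continuity-corrected version does not, which is exactly the kind of method dependence that should temper enthusiasm for a borderline result.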

Figure 1.

(A) Design plot assuming a standard error of the difference between groups of 0.102. For different values of the possible difference in mortality (x-axis), the magnification effect is presented. Unless the actual difference in mortality is higher than 4%, the trial sample size would have magnified the effect size by a factor >1.5. (B) Trial results for hospital mortality under a Bayesian framework, assuming a neutral, moderate-strength skeptical prior for the intervention. The overall probability of benefit (area to the left of the dashed line) is ∼88%, which warrants further investigation.
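The logic of panel B can be approximated with a simple conjugate calculation on the log odds ratio scale. The sketch below is not the model used for the figure: it assumes a normal approximation to the likelihood, a normal prior centered on no effect with an SD chosen so that about 95% of prior mass lies between odds ratios of 0.5 and 2.0, and it takes the 9 of 41 arm to be the intervention arm (consistent with point estimates favoring the intervention). All function and variable names are illustrative.

```python
from math import log, sqrt
from statistics import NormalDist

def prob_benefit(deaths_trt, n_trt, deaths_ctl, n_ctl, prior_sd):
    """Posterior P(odds ratio < 1) under a normal-normal approximation."""
    a, b = deaths_trt, n_trt - deaths_trt
    c, d = deaths_ctl, n_ctl - deaths_ctl
    log_or = log((a * d) / (b * c))            # observed log odds ratio
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # its approximate standard error
    prior_precision = 1 / prior_sd ** 2
    data_precision = 1 / se ** 2
    post_var = 1 / (prior_precision + data_precision)
    post_mean = post_var * data_precision * log_or  # prior mean is 0
    return NormalDist(post_mean, sqrt(post_var)).cdf(0.0)

# Skeptical prior: ~95% of prior mass between OR 0.5 and OR 2.0 (assumed).
skeptical_sd = log(2) / NormalDist().inv_cdf(0.975)

p_benefit = prob_benefit(9, 41, 17, 39, prior_sd=skeptical_sd)
print(f"Approximate posterior probability of benefit: {p_benefit:.2f}")
```

Under these assumptions the probability of benefit comes out in the high 80s (percent), in line with the figure: the skeptical prior pulls the large observed effect toward the null without dismissing it.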

The authors should be commended for running this trial in such challenging times, for the strength of its biological rationale, and for their willingness to provide further data on an important topic. Methodological issues in the trial, especially in its analyses, preclude more conclusive findings. Although neither safety nor efficacy can be clearly concluded from a trial with such a small sample size, and interpretation should be approached with appropriate skepticism, the need for a larger trial, perhaps on acute respiratory distress syndrome lato sensu, is fully justified by the present data.

Footnotes

Originally Published in Press as DOI: 10.1164/rccm.202111-2593ED on December 15, 2021

Author disclosures are available with the text of this article at www.atsjournals.org.

References

  • 1. Bowen DJ, Kreuter M, Spring B, Cofta-Woerpel L, Linnan L, Weiner D, et al. How we design feasibility studies. Am J Prev Med. 2009;36:452–457. doi: 10.1016/j.amepre.2009.02.002.
  • 2. Gelman A, Carlin J. Beyond power calculations: assessing Type S (sign) and Type M (magnitude) errors. Perspect Psychol Sci. 2014;9:641–651. doi: 10.1177/1745691614551642.
  • 3. Johansson PI, Søe-Jensen P, Bestle MH, Clausen NE, Kristiansen KT, Lange T, et al. Prostacyclin in intubated patients with COVID-19 and severe endotheliopathy: a multicenter, randomized clinical trial. Am J Respir Crit Care Med. 2022;205:324–329. doi: 10.1164/rccm.202108-1855OC.
  • 4. Yehya N, Harhay MO, Curley MAQ, Schoenfeld DA, Reeder RW. Reappraisal of ventilator-free days in critical care research. Am J Respir Crit Care Med. 2019;200:828–836. doi: 10.1164/rccm.201810-2050CP.
  • 5. Hayden D, Pauler DK, Schoenfeld D. An estimator for treatment comparisons among survivors in randomized trials. Biometrics. 2005;61:305–310. doi: 10.1111/j.0006-341X.2005.030227.x.
  • 6. Graffelman J, Moreno V. The mid p-value in exact tests for Hardy-Weinberg equilibrium. Stat Appl Genet Mol Biol. 2013;12:433–448. doi: 10.1515/sagmb-2012-0039.
  • 7. Zampieri FG, Casey JD, Shankar-Hari M, Harrell FE Jr, Harhay MO. Using Bayesian methods to augment the interpretation of critical care trials. An overview of theory and example reanalysis of the alveolar recruitment for acute respiratory distress syndrome trial. Am J Respir Crit Care Med. 2021;203:543–552. doi: 10.1164/rccm.202006-2381CP.

Articles from American Journal of Respiratory and Critical Care Medicine are provided here courtesy of American Thoracic Society
