Abstract
Comparing median outcomes to gauge treatment effectiveness is widespread practice in clinical and other investigations. While common, such difference-in-median characterizations of effectiveness are but one way to summarize how outcome distributions compare. This paper explores properties of median treatment effects as indicators of treatment effectiveness. The paper's main focus is on decisionmaking based on median treatment effects and it proceeds by considering two paths a decisionmaker might follow. Along one, decisions are based on point-identified differences in medians alongside partially identified median differences; along the other decisions are based on point-identified differences in medians in conjunction with other point-identified parameters. On both paths familiar difference-in-median measures play some role yet in both the traditional standards are augmented with information that will often be relevant in assessing treatments' effectiveness. Implementing either approach is straightforward. In addition to its analytical results the paper considers several policy contexts in which such considerations arise. While the paper is framed by recently reported findings on treatments for COVID-19 and uses several such studies to explore empirically some properties of median-treatment-effect measures of effectiveness, its results should be broadly applicable.
Keywords: median treatment effects, treatment effectiveness, COVID-19
We might be anywhere but are in one place only…
Derek Mahon, A Garage in Co. Cork
1. Introduction
In a study published online on March 18, 2020, comparing lopinavir–ritonavir treatment for COVID-19 with standard care, Cao et al., 2020, report:
Patients assigned to lopinavir–ritonavir did not have a time to clinical improvement different from that of patients assigned to standard care alone in the intention-to-treat population (median, 16 days vs. 16 days; hazard ratio for clinical improvement, 1.31; 95% confidence interval [CI], 0.95 to 1.85; P=0.09)…
On April 29, 2020, the U.S. National Institute of Allergy and Infectious Diseases (NIAID, 2020) announced in a news release summarizing a separate study:
Hospitalized patients with advanced COVID-19 and lung involvement who received remdesivir recovered faster than similar patients who received placebo.… Specifically, the median time to recovery was 11 days for patients treated with remdesivir compared with 15 days for those who received placebo.1
On the basis of that study the U.S. Food and Drug Administration (FDA) two days later issued an Emergency Use Authorization for remdesivir (U.S. FDA, 2020), noting:
While there is limited information known about the safety and effectiveness of using remdesivir to treat people in the hospital with COVID-19, the investigational drug was shown in a clinical trial to shorten the time to recovery in some patients.
The preliminary report of this study, the Adaptive Covid-19 Treatment Trial or ACTT Study (Beigel et al., 2020a), was subsequently published online on May 22, 2020.
In a third study, published online May 8, 2020, Hung et al., 2020, report on a trial comparing combination therapy for COVID-19 with lopinavir–ritonavir alone. The authors note:
The combination group had a significantly shorter median time from start of study treatment to negative nasopharyngeal swab (7 days [IQR 5–11]) than the control group (12 days [8–15]; hazard ratio 4·37 [95% CI 1·86–10·24], p=0·0010).
Apart from their focus on treating COVID-19, a common feature of these three studies is that they summarize their respective primary endpoints as medians of each arm's time-to-event (TTE) distribution, and then judge treatments' efficacy or effectiveness by the difference between those medians.2 Such median-based comparisons are common in clinical research, so common perhaps that they and the information they convey may be taken for granted.
Much is at stake in how the effectiveness of candidate COVID-19 treatments and vaccines is assessed, whether using clinical trial data or otherwise. Loading this burden onto two parameters—median outcomes under treatment and comparator—is a high-stakes gamble. Consider for instance figure 1, whose top panel depicts by the two indicated points the results reported in the NIAID news release (NIAID, 2020); until the publication of the Beigel study over three weeks later this was the only information from the study about the treatment's effectiveness known to the public. (The bottom panel of figure 1 reproduces the results reported by Beigel.) Is this information alone sufficient for a decisionmaker to conclude that remdesivir is effective since it will "shorten the time to recovery in some patients"? Analogously, does the finding that median outcomes do not differ in the Cao study mean that decisionmakers ought not prefer one treatment over the other (see figure 2)? Or might in both instances regulators, clinicians, patients, and others wish to know more about the distributions of outcomes before arriving at such conclusions?
Figure 1 —
Time-to-Recovery Outcomes in the Beigel et al., 2020, ACTT Remdesivir Study: Outcomes reported in NIAID, 2020 (top); Reproduction of Beigel's figure 2A (bottom)
Figure 2 —
Time-to-Improvement Outcomes in the Cao et al., 2020, Lopinavir–Ritonavir Study (Reproduction of Cao's figure 2)
The perhaps-obvious points are that there are many ways subject-level outcome data can be summarized or aggregated over a population or sample3 and that these alternatives may not yield the same conclusion about treatments' effectiveness, nor should they. Understanding the nature, merits, and shortcomings of the methods chosen and how those methods stack up against alternatives should thus be instructive for COVID-19 decisionmaking and beyond.
Numerous comparisons of candidate COVID-19 vaccines and treatments are being contemplated, studied, and reported. It is thus timely to assess the properties and merits of treatment effect (TE) measures defined by distributions' medians—like those featuring in the studies described above—as summaries of treatments' effectiveness: In a nutshell, what might decisionmakers learn about treatments' effectiveness by comparing two distributions' medians?4
The paper argues that a reasonable answer to this question is: "Probably something but often little." To learn more about treatments' effectiveness the paper suggests two paths decisionmakers might follow. Along path 1 decisions are based on point-identified differences in median outcomes—as is customary—in conjunction with medians of the distribution of differences in outcomes, i.e. the treatment effect distribution (section 2). The latter medians are typically partially identified since the distribution of differences may not be observable.5 Along path 2 decisions are again based on point-identified differences in medians in conjunction with additional point-identified parameters whose nature will be discussed in section 5. On either path it is proposed that the basis of decisions may be enhanced by considering features of distributions in addition to differences in median outcomes. Importantly, in both cases conventional differences in medians play a central role so that decisionmakers accustomed to relying on such metrics should find nothing particularly foreign in the proposed strategies.
Given the urgency of discovering effective treatments and vaccines for COVID-19 it is timely to consider measures of effectiveness that align with what is important to decisionmakers and those affected by their decisions. What is proposed here may be helpful to this end, particularly since the paper strives to offer intuitive and easily implemented strategies.
Section 2 offers definitions and other preliminaries. Section 3 discusses basic elements of decisions based on median treatment effects. Sections 4 and 5 present the key issues and results for paths 1 and 2. Section 6 illustrates the applicability of these ideas using three COVID-19 clinical studies. Section 7 summarizes. While the discussion throughout is motivated and framed by attempts to discover the effectiveness of COVID-19 treatments, the issues addressed should be broadly applicable in contexts where considerations of median treatment effects arise.6
2. Definitions and Other Preliminaries
Consider two subject-specific outcomes, y0 and y1 , which might be health status under treatment 0 ( T0) and treatment 1 (T1) and which may or may not—depending on particulars— be potential outcomes where only one of y0 or y1 is observed. Suppose throughout that each yj is non-negative and has either continuous or discrete (integer) measurement. Whether or not both yj are observed for each subject, the subject-level treatment effect is y1 – y0 .7 Unless noted otherwise smaller values of y will correspond to better health outcomes.
Let Pr(…) denote a probability mass or density and F(…) its corresponding cumulative. Define Fj(y) = Pr(yj ≤ y) for j∈{0, 1}.8 Fj(y) will sometimes be abbreviated as Fj. Each yj has support Yj , which may be continuous or discrete. Y = Y0 ∪ Y1 is the common support of y0 and y1; often Y0 = Y1 . Define the α–quantile of Fj (y) as9
qj(α) = min{y ∈ Yj : Fj(y) ≥ α}, α ∈ (0,1)    (1)
mj is shorthand used henceforth for qj (.5) .10 Define the interquartile range (IQR) of each Fj as
IQRj = {qj(.25), qj(.75)}    (2)
Note that IQRj is defined as a two-element set, not as an interval as is perhaps more conventional.
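To fix ideas, here is a minimal sketch (in Python, using a hypothetical integer-valued sample) of the empirical analogues of (1) and (2): the α-quantile is taken as the smallest observed value whose empirical CDF reaches α, and the IQR is returned as the two-element set {q(.25), q(.75)}.

```python
import numpy as np

def quantile_min(y, alpha):
    """q(alpha) per (1): smallest observed y whose empirical CDF reaches alpha."""
    ys = np.sort(np.asarray(y, dtype=float))
    ecdf = np.arange(1, len(ys) + 1) / len(ys)
    return ys[np.searchsorted(ecdf, alpha)]

def iqr_set(y):
    """IQR per (2): the two-element set {q(.25), q(.75)}."""
    return (quantile_min(y, 0.25), quantile_min(y, 0.75))

# Hypothetical time-to-event sample (days)
y0 = [3, 5, 8, 12, 16, 18, 21, 25]
print(quantile_min(y0, 0.5), iqr_set(y0))   # median and the IQR set
```

For this hypothetical eight-observation sample the sketch returns a median of 12 and the IQR set (5, 18).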
It is assumed at this point that the Fj are point identified over relevant subsets of Y; requiring identification to be over all y ∈ Y or only over subsets will be considered in context.11 Whether identification results from a randomized trial or some other method is unimportant; indeed each Fj can be learned from a different data source.
Define the difference-in-y-probabilities treatment effect as
ΔF(y) = F1(y) − F0(y)    (3)
a familiar endpoint in TTE and other studies (e.g. difference in 12-month survival probabilities). Define the difference-in- α -quantiles treatment effect as
Δq(α) = q1(α) − q0(α)    (4)
A special case of (4) is the difference-in-medians treatment effect
Δm = q1(.5) − q0(.5) = m1 − m0    (5)
Δm is a prominent measure used to compare treatments' effectiveness in clinical studies, particularly albeit not exclusively for TTE outcomes. 12 Δm is point identified under the assumption that both Fj(y) are point identified at least for {y∣Fj(y)≤.5}.
The treatment effect distribution is F(y1 –y0) , which is derived from the joint distribution Pr (y0,y1). In potential-outcomes contexts Pr(y0,y1) is not point identified so that F(y1 –y0) is consequently not point identified.13 When F(y1 –y0) is not point identified, that it can generally be partially identified is an important consideration in what follows.
Quantiles of the treatment effect distribution are denoted
qΔ(α) = min{δ : Pr(y1 − y0 ≤ δ) ≥ α}, α ∈ (0,1)    (6)
with the median-difference treatment effect given by14
mΔ = qΔ(.5) = med(y1 − y0)    (7)
This paper focuses mostly on Δm and mΔ rather than general quantile treatment effects, this owing to the prominence15 of medians in empirical clinical and health research; most results generalize readily to other quantiles. Manski (2007, chapter 7) notes the inequality
m1 − m0 ≠ med(y1 − y0), i.e. Δm ≠ mΔ in general    (8)
and proceeds to suggest that "a researcher reporting a median treatment effect must be careful to state whether the object of interest is the left- or right-hand side [of (8)]."
As will be seen in section 4 understanding and computing bounds on mΔ is facilitated by reference to the inequality probabilities (IPs) Pr(y1 > y0) . Like mΔ IPs are generally not point identified but are partially identified using information on the Fj. Define
Djk = max{Fj(y) − Fk(y) : y ∈ Y}, j,k ∈ {0,1}, j ≠ k    (9)
The Djk are the largest vertical differences between Fj(y) and Fk(y) over the common support Y.16 These measures, known commonly as Kolmogorov's D-statistics, are depicted in figure 3. Note that at most one of the Djk can exceed .5, a result that will prove useful later.
Figure 3 —
Defining D01 and D10
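The Djk in (9) are simple to compute from two samples once their empirical CDFs are evaluated on the pooled support. The following sketch (with hypothetical samples; this is not the medte program referenced later) returns D01 and D10 as the largest vertical gaps in each direction.

```python
import numpy as np

def d_stats(y0, y1):
    """Directional D statistics per (9): Djk is the largest value of
    Fj(y) - Fk(y) over the pooled support of the two samples."""
    y0, y1 = np.sort(np.asarray(y0, float)), np.sort(np.asarray(y1, float))
    support = np.union1d(y0, y1)
    F0 = np.searchsorted(y0, support, side="right") / len(y0)   # empirical CDFs
    F1 = np.searchsorted(y1, support, side="right") / len(y1)
    return (F0 - F1).max(), (F1 - F0).max()                      # D01, D10

# Hypothetical integer-valued samples
D01, D10 = d_stats([3, 15, 17], [8, 11, 21])
print(D01, D10)
```

For these two hypothetical three-observation samples both directional statistics equal 1/3; per the observation above, at most one of them could ever exceed .5.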
3. How Δm May Inform Decisions
Imagine that a decisionmaker selects treatment Tp ∈ {T0,T1} where p is determined by
p = 1[g(Δm, ψ) > h], where 1[⋅] denotes the indicator function    (10)
wherein ψ are features of Pr(y0,y1) beyond Δm that may matter to the decisionmaker. For instance, with smaller values of y representing better health, conventional null-hypothesis-test-based decisionmaking that gives deference to status quo (say T0) over innovation (say T1) might set g(Δm, ψ) = −Δm and h to some strictly positive threshold. Decisions that treat T0 and T1 symmetrically (see Manski and Tetenov, 2020) might set g(Δm, ψ) = −Δm and h = 0. In either case ψ is ignored and Δm is the only feature of Pr(y0,y1) that influences choice.
Of course there are many standards by which two outcome distributions might be compared: moments, quantiles, dominance, etc. On what principles might Δm be advocated as the sole basis of decisions? While Δm may have reasonable sampling properties, this is a different matter from it possessing well-grounded conceptual properties. For instance, comparison of the medians (or other quantiles) of observed marginal distributions is uninformative about the distribution of gains and losses—i.e. F(y1 – y0)—that owe to a change from T0 to T1.
Imbens and Wooldridge, 2009, raise two issues to support their claim that Δm may be of greater interest than mΔ or other features of F(y1 –y0). First, they assert that it is natural for decisionmakers to compare policies via differences between outcome distributions, which "can often be summarized by differences in the quantiles." Second, they note that F(y1 –y0) and associated parameters like mΔ typically cannot be point identified. Each claim has merit.
Yet two counterclaims might be advanced to support consideration of mΔ. The first is statistical: While mΔ cannot generally be point identified it can generally be partially identified informatively, and in a subset of such cases can be sign identified (section 4 and appendix C). The second is conceptual: While decisionmakers may assess treatments' effectiveness by comparison of resulting marginal outcome distributions, they may also care about policies' distributional consequences, e.g. the fractions of the population that benefit or suffer from policy change and how much they benefit or suffer. The latter concerns are not informed by examination of the marginal distributions but require consideration of the joint distribution Pr(y0,y1) or, specifically, F(y1–y0). Heterogeneous response to policy in direction and magnitude may be important considerations: As articulated nicely by Koenker and Bilias, 2001, "treatment may make otherwise weak subjects especially robust, and turn the strong to jello."
Decisionmakers may or may not feel that the basis of their decisions is enhanced by information on F(y1–y0) or mΔ . Either way a revealed preference argument suggests that Δm measures of treatments' effectiveness provide them with at least some useful information about their choices (see footnote 15). Ease of computation, applicability with right-censored data, and parsimony in summarizing data are three plausible reasons for such popularity. The following two sections discuss decisionmaking that does (path 1; section 4) and does not (path 2; section 5) admit roles for partially identified parameters like mΔ working in conjunction with Δm . In both cases it is proposed that the basis of decisions may be enhanced by considering parameters beyond Δm . To implement such strategies analysts need appeal only to information contained in F0 and F1 ; how that information is used differs along the two paths.17
Finally, while the analytical results reported in sections 4 and 5 may be of interest per se, considering how they may be applicable in real-world decision and policy settings is appropriate. Rather than disrupting the flow of the paper here with lengthy discussion, appendix E describes several policy contexts wherein considerations of median or more general quantile TEs are, or at least arguably should be, prominent.18 To anticipate that discussion it is worth noting one example of regulatory language that naturally motivates such considerations. The regulatory language governing FDA's determination of biological products' effectiveness is:
Effectiveness means a reasonable expectation that, in a significant proportion of the target population, the pharmacological or other effect of the biological product, when used under adequate directions for use and warnings against unsafe use, will serve a clinically significant function in the diagnosis, cure, mitigation, treatment, or prevention of disease in man. [21 CFR 601.25(d)(2)] (emphasis added)
Note that this standard entails considerations of both population quantiles ("significant proportion") and treatment effect magnitudes ("clinically significant function"). Appendix E suggests statistical formalization of this language and suggests further how the discussion in sections 4 and 5 may usefully address such regulatory questions.
4. Path 1: Assessing Treatments' Effectiveness via Δm and Partially Identified mΔ
The essence of decision problems in this context is captured by a simple picture. Figure 4 depicts some population's discrete Pr (y0,y1) that puts probability mass 1/3 on three (y0,y1) values, {(3,8), (15,11), (17,21)}. Imagine Pr(y0,y1) is known. There is nothing particularly peculiar about the depicted probability structure (Pearson correlation .77, rank correlation 1.0). Yet Δm = −4 < 0 and mΔ = 4 > 0. A decisionmaker knowing nothing more than the pictured information must select T0 or T1. Which should be chosen? Which would you choose?
Figure 4 —
Defining Δm and mΔ with Known Pr(y0,y1)
In general one would appeal to the decisionmaker's utility or loss function to answer this question. But this is not the main issue here: What matters here is whether the assumed knowledge of mΔ might influence, even to some degree, a decisionmaker's attitudes about the relative merits of T0 and T1 given that Δm is presumed known. Were such knowledge even partially influential, then the obstacle to decisionmaking based on mΔ in settings where Pr(y0,y1) is not known is mΔ's lack of point identification, not, as Imbens and Wooldridge hint, the irrelevance of mΔ per se. In these cases partial identification may support such decisionmaking.
Partial Identification of and Bounds on mΔ
Since mΔ is generally partially but not point identified, a decisionmaker whose choice would depend to at least some degree on knowing it must accept that knowing a range of possible values is the best to be hoped for. Whether the knowable range of its possible values suffices for decisionmaking is context dependent.
A parameter θ is partially identified when it is known to reside in the closed, half-open, or open bounds interval b(θ). Define the bounds set B(θ) = {L(θ),U(θ)} where L(θ) = inf(b(θ)) and U(θ) = sup(b(θ)).19 L(θ) and U(θ) are valid bounds since θ ∈ b(θ). The bounds B(θ) are sharp when L(θ) and U(θ) are, respectively, the smallest and largest values consistent with the knowledge of θ revealed by the data. These tightest bounds and their corresponding b(θ) are denoted B*(θ) = {L*(θ),U*(θ)} and b*(θ). θ is sign identified if b*(θ) excludes zero.20,21 In passing it may be of interest to note that B*(mΔ) = {−6,6} for the data shown in figure 4, with B*(mΔ) computed here using the permutation method described in appendix C.
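To illustrate, the following sketch enumerates couplings for the figure 4 example: each y0 value is matched one-for-one with a y1 value (a permutation), and the smallest and largest medians of the implied differences are recorded. This simple enumeration reproduces the B*(mΔ) = {−6,6} reported above, though it is only a sketch of the permutation idea the paper attributes to appendix C, not a reproduction of that appendix.

```python
from itertools import permutations
from statistics import median

# Figure 4: the known joint puts mass 1/3 on (y0, y1) in {(3, 8), (15, 11), (17, 21)}
y0 = (3, 15, 17)
y1 = (8, 11, 21)

delta_m = median(y1) - median(y0)                                  # Δm = 11 - 15 = -4
m_delta = median(b - a for a, b in zip(y0, y1))                    # mΔ = med(5, -4, 4) = 4

# Using the marginals only: enumerate permutation couplings of y1 against y0
# and record the median of differences each one implies.
meds = [median(b - a for a, b in zip(y0, p)) for p in permutations(y1)]
print(delta_m, m_delta, (min(meds), max(meds)))                    # -4, 4, (-6, 6)
```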
Bounds with Partially Observed Outcome Data
Because they are relevant in the case studies appearing in section 6, a few considerations involved in computing bounds on mΔ under two forms of partially observed outcomes—right-censored data and IQR data—are sketched here and discussed more fully in appendix C.
Right-censoring of outcome data is common in TTE studies. Given noninformative right-censoring at yc (outcomes unobserved if they exceed yc ) three scenarios can be considered:
(a) both F0(yc) and F1(yc) exceed .5 (both m0 and m1 are point identified)
(b) only one of F0(yc) and F1(yc) exceeds .5 (only one of m0 and m1 is point identified)
(c) neither F0(yc) nor F1(yc) exceeds .5 (neither m0 nor m1 is point identified)
These cases, depicted in the top panel of figure 5, have different implications for the identification of Δm and mΔ:
Figure 5 —
Identifying Δm and mΔ with Right-Censored Outcomes: Three right-censoring scenarios (top); Degree of right censoring affects mΔ bounds (bottom)
(a) Δm is point identified; mΔ is partially identified with finite bounds
(b) both Δm and mΔ are partially identified with one finite bound
(c) neither Δm nor mΔ is informatively bounded (no finite bounds)
None of these results is surprising. Yet even under scenario (a) it is noteworthy that while the particular right-censoring threshold yc is of no consequence for point identifying Δm —i.e. m0 , m1 , and Δm are invariant with respect to any yc that exceeds both m0 and m1 —this is not so for computation of mΔ bounds. The extent of right-censoring affects the amount of information used to compute the mΔ bounds. Reducing right-censoring—i.e. increasing yc — can never widen mΔ bounds but may narrow them. See appendix C for discussion.22
The second form of partially observed data of interest here is interquartile-range data. IQRs are often reported alongside medians, sometimes in settings where data are also right censored. For instance, IQRs are reported for the primary endpoints both in the Cao study, which reported the full outcome data (their figure 2 and table 3), and in the Hung study, which reported only the two arms' quartiles (their table 2; figure 6 here). Indeed it is not unusual to find the only outcome information reported to be that on each outcome's three sample quartiles.
Table 2 —
Empirical Results, Cao et al., 2020, Lopinavir–Ritonavir Trial (Output from medte Stata Program)
| Median Treatment Effects and Related Parameters | |
| Outcome variable: (integer-valued) tti | |
| Group variable (g): tx | |
| Uncensored Obs. for g=0: | 70 |
| Uncensored Obs. for g=1: | 77 |
| Right-censored Obs. for g=0: | 30 |
| Right-censored Obs. for g=1: | 22 |
| Median Treatment Effects | |
| Sample median for g=0 (m0): | 16 |
| Sample median for g=1 (m1): | 16 |
| Diff. in medians m1-m0: | 0 |
| Lower bound on med(F(y1-y0)): | −9 |
| Upper bound on med(F(y1-y0)): | 7 |
| Central Aperture Measures | |
| a1(m*,.5): | 0.0000 |
| a2(.5): | 0.0000 |
| Bounds on Inequality Probabilities | |
| Lower bound on Prob(y1>y0): | 0.0000 |
| Upper bound on Prob(y1>y0): | 0.8259 |
Figure 6 —
Time-to-Negative-Test Outcomes in Hung et al., 2020, Combination-Treatment Study (Quartiles and IQR reported in Hung's table 2)
It turns out that mΔ can be bounded informatively using only the IQRj (2). When the yj are continuously distributed and the Fj everywhere-increasing the IQR-based bounds are
BIQR(mΔ) = {q1(.25) − q0(.75), q1(.75) − q0(.25)}    (11)
where the qj(.25) and qj(.75) are the observed data. When outcomes are measured as integers the corresponding IQR-based bounds are (see appendix C for details):
BIQR(mΔ) = {q1(.25) − q0(.75) − 1, q1(.75) − q0(.25) + 1}    (12)
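A small calculator for these IQR-based bounds is sketched below under the reconstructed forms of (11) and (12). Fed the quartiles reported by Hung et al., 2020 (treated: 5, 7, 11 days; control: 8, 12, 15 days), it returns the {−11,4} bounds reported for those data in section 6.

```python
def iqr_bounds_m_delta(q0, q1, integer_valued=False):
    """Bounds on mΔ from the two arms' quartiles (q(.25), q(.5), q(.75)) alone,
    following (11) for continuous outcomes and (12), which widens each end by
    one unit, for integer-measured outcomes."""
    lo = q1[0] - q0[2]
    hi = q1[2] - q0[0]
    if integer_valued:
        lo, hi = lo - 1, hi + 1
    return lo, hi

# Quartiles reported by Hung et al., 2020 (days)
q_control = (8, 12, 15)
q_treated = (5, 7, 11)
print(iqr_bounds_m_delta(q_control, q_treated, integer_valued=True))   # (-11, 4)
```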
Relationships between Δm and B* (mΔ)
While Δm and mΔ describe different quantities, understanding relationships between them may enhance understanding of treatment effectiveness when knowledge of Δm alone inadequately informs choice. If so, whether knowing B*(mΔ) instead of mΔ itself will satisfy a decisionmaker will depend on context.
This subsection presents results on relationships between Δm and B*(mΔ) . Knowledge of Δm turns out to be to some degree informative about B*(mΔ) and vice versa; since both Δm and mΔ derive from Pr(y0,y1) this is unsurprising. Appendix D explains these results.
Result 1 (R1): L*(mΔ) ≤ Δm ≤ U*(mΔ)
Result 2 (R2): Suppose Δm < 0 . Then:
(a) 0 ≤ D01 ≤ .5
(b) 0 < D10 ≤ 1
(c) L*(mΔ) ≤ 0
R2(c) means that knowing sign(Δm) suffices to sign L*(mΔ) when Δm < 0. An interpretation is that the data cannot reject sign(mΔ) = sign(Δm). Alternatively, if mΔ is sign identified—with Δm < 0 this means U*(mΔ) < 0—then its sign must be the same as Δm's, i.e. mΔ and Δm cannot be sign identified in opposite directions.
Result 3 (R3): Suppose D10 > .5. Then:
(a) Pr(y1 ≤ y0) > .5
(b) Δm < 0
(c) mΔ < 0
Finding Djk > .5 is powerful, sufficing to sign identify mΔ and offer a strong statement about Pr(yj < yk).23 Indeed, mΔ is sign identified if and only if one of the Djk exceeds .5.
Figure 7 integrates these ideas. Shown are two different joint probabilities. Pr(a)(y0,y1) , where y0 and y1 are proximate to each other, has Δm = 2 , mΔ = 2 , B*(mΔ) = {−1,4} , D01 = .4 , and B*(Pr(y1 > y0)) = {.4,1}. In contrast Pr(b)(y0,y1), where y0 and y1 are distant from each other, has Δm = 6 , mΔ = 6 , B*(mΔ) = {3,8} , D01 = 1 (zero-order dominance; Castagnoli, 1984), and B*(Pr(y1 > y0)) = {1,1}. Joint distributions placing probability mass further northwest of the y0 = y1 locus yield stronger identifying information with larger median treatment effects; in such instances the marginals F0 and F1 are "farther apart" in an important sense explored further in section 5.
Figure 7 —
Comparing Implications of Two Different Joint Distributions
Informing Decisions with Δm and Partially Identified mΔ
Imagine a decisionmaker for whom both Δm and mΔ are decision-relevant parameters. Suppose ψ = mΔ in (10) so that treatment choice is Tp1 ∈ {T0,T1} , where
p1 = 1[g(Δm, mΔ) < 0], where g(Δm, mΔ) = βΔm + (1 − β)mΔ    (13)
β ∈ [0,1] is presumed known and reflects the relative importance to the decisionmaker of Δm and mΔ . Of course g(Δm, mΔ) = βΔm + (1 – β)mΔ is only partially identified, with
βΔm + (1 − β)L*(mΔ) ≤ g(Δm, mΔ) ≤ βΔm + (1 − β)U*(mΔ)    (14)
Note from (13) that sign identification of g (Δm, mΔ) suffices for unambiguous decisionmaking. Consider three cases:
Suppose Δm = 0. Then for any β ∈ [0,1] g(Δm, mΔ) would be sign-identified only if mΔ is sign-identified. But from R1 mΔ is not sign identified when Δm = 0.
Suppose Δm ≠ 0 and mΔ is sign-identified. Then g(Δm, mΔ) is sign-identified for any β ∈ [0,1] since from R1 the signs of Δm and mΔ coincide when mΔ is sign-identified.
Suppose Δm ≠ 0 and mΔ is not sign-identified. Then g(Δm, mΔ) is sign-identified for some range of β values. E.g. suppose Δm < 0 and U*(mΔ) > 0 . Then g(Δm, mΔ) is sign identified if β∈(U*(mΔ)/(U*(mΔ)–Δm), 1].
Decisionmakers may be positioned to make defensible choices even when they know only (14) and not (13). If g (Δm, mΔ) is sign identified then the appropriate decision is unambiguous. If not then it is proper to admit reservations about whatever choice is made since its ex ante merits are uncertain; acting otherwise implies a certitude that lacks credibility (Manski, 2020a). Either way to the extent that mΔ is at least minimally important to a decisionmaker (e.g. as in (13)) its partial identification needn't hinder information on its bounds supporting or cautioning choices that must be made.24,25
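For a decisionmaker who entertains the weighted criterion in (13), the only computation needed in the third case above is the cutoff on β beyond which the bounds on g exclude zero. A minimal sketch follows, assuming the bound form in (14); the illustrative inputs Δm = −4 and U*(mΔ) = 10 match the remdesivir example discussed in section 6, for which the cutoff is 5/7.

```python
def beta_threshold(delta_m, U_star):
    """Smallest relative weight on Δm for which g = beta*Δm + (1-beta)*mΔ is
    sign identified as negative, given Δm < 0 and an upper bound U* on mΔ;
    a sketch assuming the bound form in (14)."""
    if delta_m >= 0:
        raise ValueError("this sketch covers the Δm < 0 case only")
    if U_star <= 0:
        return 0.0            # mΔ itself sign identified: any beta in [0,1] works
    return U_star / (U_star - delta_m)

# Δm = -4 and U*(mΔ) = 10, as in the remdesivir example of section 6
print(beta_threshold(-4, 10))   # 0.7142857... = 5/7; g is sign identified for beta above this
```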
5. Path 2: Assessing Treatments' Effectiveness via Δm and Other Point-Identified Parameters
Recall from section 3 that Imbens and Wooldridge, 2009, argued that typical policy choices will appropriately consider quantiles of F0 and F1, writing: "choice should be governed by preferences of the policymaker over [ F0 and F1] (which can often be summarized by differences in the quantiles)." One might thus imagine a decision function wherein various quantiles hold different importance for a decisionmaker; with smaller y being better health,
Pω,J = Σα∈J ωα Δq(α)    (15)
where J is the set of quantiles of interest and the ωα ≥ 0 are importance weights. Should all the Δq(α) in (15) be point identified then Imbens and Wooldridge's claim suggests that the value of Pω,J will determine treatment choice Tpω,J ∈ {T0,T1}. Basing choice on Δm alone is a specific version of (15).
Still in the spirit of the Imbens and Wooldridge suggestion of comparing marginal distributions, policy choices might alternatively depend on a set of probability treatment effects,
pπ,K = Σk∈K πk ΔF(k)    (16)
where K is the set of y-values of interest (e.g. 6-, 12-, and 24-month survival) and the πk ≥ 0 are importance weights. As above, pπ,K may determine treatment choice Tpπ,K ∈ {T0,T1}. Familiar decision criteria, e.g. differences in 12-month survival probabilities, are specific versions of (16).
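A sketch of such weighted aggregation follows; the quantile values and weights are hypothetical, and the same helper can be fed CDF values Fj(y) at the y-values in K to produce the probability-weighted version in (16).

```python
import numpy as np

def weighted_te(values0, values1, weights):
    """Weighted sum of treatment effects as in the reconstructed (15)/(16):
    pass quantiles qj(alpha) over J, or CDF values Fj(y) over K."""
    diffs = np.asarray(values1, float) - np.asarray(values0, float)
    return float(np.dot(np.asarray(weights, float), diffs))

# Hypothetical quartiles (q(.25), q(.5), q(.75)) with weight concentrated on the median
q0 = [6, 10, 16]
q1 = [4, 7, 12]
print(weighted_te(q0, q1, weights=[0.25, 0.5, 0.25]))   # -3.0
```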
Aperture and Weighted- Δm Measures of Treatment Effectiveness
Point-identified parameters beyond Δm may provide decisionmakers more-nuanced perspectives on treatment effectiveness than those offered by Δm alone. The search is for broadly applicable parameters that provide such perspectives while also being straightforward to implement in empirical studies. The perspective is that of a decisionmaker not prepared to entirely abandon Δm but willing to temper decisions by additional point-identified parameters. With reference to (10), consider
p2 = 1[Δm × δ ≤ γ]    (17)
where δ are various point-identified parameters to be described shortly and γ is some threshold of Δm×δ that is meaningful for a decisionmaker. Treatment choice is Tp2 ∈ {T0,T1}.26
As used in the preceding paragraph "treatment effectiveness" is intended to describe a broad sense of the extent to which one treatment makes people better off than another. In the following discussion greater treatment effectiveness means some amalgam of larger α–quantile treatment effects Δq(α) over some α∈(0, 1) and larger y–probability treatment effects ΔF(y) over some y ∈ Y. In essence, greater treatment effectiveness means F0 and F1 are "farther apart" in some directions. The practical challenge is how to summarize this notion with a single parameter, i.e. how to define decision-informative yet simple and parsimonious representations of "farther apart" and "some direction."
Define the area between F0 and F1 over some interval y ∈ (r, s) as the local aperture of F0 and F1 and denote this area A(r, s) .27 "Aperture" signifies that A(r, s) measures the "opening" between F0 and F1 over the interval (r,s). Intuitively, the larger is A(r,s) the more effective is Tj relative to Tk in the vicinity of (r,s) as F0 and F1 are locally "farther apart."28
For any point (y,α) in the region bounded by F0 and F1 over y ∈(r,s) define two approximations to A(r, s) as
a1(y, α) = (q0(α) − q1(α))(F1(y) − F0(y))    (18)
and
a2(α) = .5(q0(α) − q1(α))(F1(q0(α)) − F0(q1(α)))    (19)
Using the idea of aperture and its approximations to devise broadly applicable and point-identifiable measures of treatment effectiveness, it is natural to consider aperture at a "central" location in the data, or central aperture. Letting (r,s) = (min{m0, m1}, max{m0, m1}) = (mmin, mmax), define central aperture as A(mmin,mmax). From (18) and (19) and defining29 m* = .5(m0 + m1) it follows that two easily computed approximations to central aperture are30:
a1(m*,.5) = (m0 − m1)(F1(m*) − F0(m*))    (20)
and
a2(.5) = .5(m0 − m1)(F1(m0) − F0(m1))    (21)
a1(m*,.5) and a2(.5) are necessarily non-negative since for any y ∈ (mmin,mmax) the difference F1(y) − F0(y) has the same sign as the terms multiplying it in (20) and (21).31 For practical purposes it is useful to re-define a1(m*,.5) and a2(.5) so their signs indicate the Δm-direction of treatment effectiveness, i.e.
a1(m*,.5) ≡ Δm × |δ1|, where δ1 ≡ F0(m*) − F1(m*)    (22)
and
a2(.5) ≡ .5 × Δm × |δ2|, where δ2 ≡ F0(m1) − F1(m0)    (23)
with A (mmin, mmax) analogously signed. These revised definitions are used henceforth.
Figure 8 depicts a1(m*,.5) and a2(.5) when y0 and y1 are imagined to be gamma-distributed with m0 = 4 and m1 = 2. In the top panel −a1(m*,.5) = .70 is the area of the shaded rectangle while in the bottom panel −a2(.5) = .64 is the area of the shaded trapezoid.32
Figure 8 —
Approximate Central Aperture, m0 = 4 , m1 = 2 , yj ~ gamma (γj, δj) with γ0 = 4 and γ1 = 1 : a1(m*,.5) = −.70 (top); a2(.5) = −.64 (bottom) (Note: (mj, γj) fix δj)
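The gamma example in figure 8 can be reproduced directly from the two CDFs. The sketch below assumes the rectangle and trapezoid forms in (22) and (23) as written above and uses scipy's gamma distribution, with scales chosen so the medians match the stated m0 = 4 and m1 = 2.

```python
from scipy.stats import gamma

def a1_a2(F0, F1, m0, m1):
    """Approximate central aperture per (22)-(23): a1 = Δm*|F1(m*)-F0(m*)|
    (rectangle), a2 = .5*Δm*|F1(m0)-F0(m1)| (trapezoid)."""
    dm = m1 - m0
    m_star = 0.5 * (m0 + m1)
    a1 = dm * abs(F1(m_star) - F0(m_star))
    a2 = 0.5 * dm * abs(F1(m0) - F0(m1))
    return a1, a2

# Figure 8 setup: gamma outcomes with shapes 4 and 1 and medians m0 = 4, m1 = 2
m0, m1, shape0, shape1 = 4, 2, 4, 1
scale0 = m0 / gamma.ppf(0.5, shape0)        # pick scales so the medians match
scale1 = m1 / gamma.ppf(0.5, shape1)
F0 = lambda y: gamma.cdf(y, shape0, scale=scale0)
F1 = lambda y: gamma.cdf(y, shape1, scale=scale1)
print(a1_a2(F0, F1, m0, m1))                # roughly (-0.70, -0.64)
```

The printed values are approximately (−0.70, −0.64), matching the shaded areas reported in figure 8.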
Noteworthy from (22) and (23) is that a1(m*,.5) and a2(.5) respect but augment Δm. Both are intuitive, easily computed indicators of treatment effectiveness,33 measures that provide broader characterizations of effectiveness than does Δm on its own. That is, a1(m*,.5) and a2(.5) describe more comprehensively than does Δm the divergence of F0 and F1 in the "middle" of the data. In an important sense a1(m*,.5) and a2(.5) combine the motivations underlying (15) and (16); for instance
a1(m*,.5) = Δq(.5) × |ΔF(m*)|    (24)
The sense in which central aperture and its approximations are indicators of treatment effectiveness may be appreciated with reference to figure 9, where m0 = 4 and m1 = 2 in both panels and where F0 is the same but F1 differs (each Fj is gamma distributed). In the top panel δ1 = −.08 and a1(m*,.5) = −.16 while in the bottom panel δ1 = −.18 and a1(m*,.5) = −.35. This difference in approximate central aperture conforms to a notion that the effectiveness of T1 relative to T0 is greater in the scenario depicted in the figure's bottom panel even though Δm = −2 in both cases. If a decisionmaker must choose among the treatment options whose outcomes are pictured in figure 9, which one do they select? Might knowledge of the respective a1(m*,.5) influence or support their choice?34
Figure 9 —
a1(m*,.5), with m0 = 4, m1 = 2, yj ~ gamma(γj,δj) (Note: (mj,γj) fix δj): γ0 = .3, γ1 = .2, a1(m*,.5) = −.16 (top); γ0 = .3, γ1 = .9, a1(m*,.5) = −.35 (bottom)
One might also consider tail aperture or quartile aperture to assess treatment effectiveness, akin to how one might consider comparisons of IQRs, e.g. A(qj(α),qk(α)) for α = .25, α = .75, etc., implemented using aj(…) approximations. A vector of such measures might complement the Imbens and Wooldridge, 2009, approach described in (15) and (16). Such indicators can be instructive regarding treatment effectiveness if m0 or m1 would not be identified due to right censoring (e.g. consider time to disease onset in COVID-19 vaccine trials; see for instance figure 3 in Polack et al., 2020) or when the information conveyed to a decisionmaker by a1(m*,.5) or a2(.5) is ambiguous (as with the Cao et al., 2020, data where A(mj,mk) = a1(m*,.5) = a2(.5) = Δm = 0 (see figure 2)).
Inspection of (22) and (23) suggests that a1(m*,.5) and a2(.5) can be interpreted as weighted Δm. One might thus consider from (17) a class of weighted Δm, Δm × δ, where δ ∈ [0,1] are point-identifiable weights describing decision-relevant aspects of the divergence of F0 and F1. Conventional analysis implicitly sets δ = 1 but there is no reason to believe δ = 1 best informs a decisionmaker's choices. Such weighted Δm parameters build on Δm but downweight it the smaller is δ. Reasonable candidates for δ include |δ1| and .5 × |δ2| as in (22) and (23) as well as other parameters characterizing vertical distances between F0 and F1, e.g. the Djk.35
Tentative Conclusions
As noted above the signs of central aperture and its approximations are given once the sign of Δm is known. As such a decisionmaker whose choice relies only on the sign of Δm revealed in the data needn't bother with aperture measures. But decisionmakers whose choices hinge on treatment effectiveness magnitudes may well care about the specific nature of such magnitudes—in particular about whether more than merely Δm matters—and may thus appreciate that some weighted Δm measure usefully informs their choices. Three prominent examples where treatment effectiveness magnitudes ( γ in (17)) matter fundamentally in decisionmaking are in determining if there are clinically significant differences between treatments' outcomes, in noninferiority trials, and in defining incremental cost-effectiveness ratios Δc/Δe wherein the magnitude of the Δe denominator figures prominently. These are settings where the treatments typically do not stand ex ante in equipoise but rather where one is status quo or standard practice, the other an innovation (see Manski and Tetenov, 2020).
The preceding is not advocating a particular aperture measure or weighted Δm as a decision standard. It is instead an appeal for future research to assess the merits of easily implementable and point-identified criteria like a1(m*,.5) and a2(.5) for decisionmaking using point-identified parameters, the premise being that decisions should be informed more comprehensively than by appealing to Δm alone. Consideration of central aperture might be a useful starting point for such exploration but is unlikely to be its destination. Moving beyond the intuition that parameters akin to a1(m*,.5) and a2(.5) describe treatment effectiveness to a formal assessment of welfare differences they may describe might also be a valuable step.
Determining what values of a1(m*,.5) or a2(.5) —or indeed any other parameters advanced in such discussions—correspond to clinically significant or economically meaningful differences between F0 and F1 will also require novel consideration. (For instance if ΔCS is considered a clinically significant Δm in a particular treatment context then might a magnitude of a1(m*,.5) or a2(.5) exceeding, say, .25×ΔCS suggest clinical significance? Or .1×ΔCS ? Or … ?) While ultimately essential, such pragmatic considerations ought not postpone exploring the merits of parameters that may better inform decisionmakers about treatment effectiveness than do those conventionally used, particularly when, like here, such novel parameters are straightforward to implement in practice.
6. COVID-19: Three Case Studies
Case Study 1: Remdesivir for COVID-19 Treatment (Beigel et al., 2020a)
The Beigel study reports results of a randomized, double-blind, placebo-controlled trial of intravenous remdesivir in adults hospitalized with COVID-19. Its primary outcome is time to recovery (TTR) measured in days, with recovery determined by a patient meeting a prespecified threshold on an ordinal scale. The intention-to-treat sample analyzed here36 consists of 538 and 521 observations on treated and control patients, respectively, of which 147 and 181 observations are right-censored after 28 days. The bottom panel of figure 1 depicts these data.
Table 1 summarizes the results. As noted earlier Δm = −4, favoring treatment over control, while B*(mΔ) = {−15,10} . The bounds interval is fairly wide as can be expected from the wide IP bounds. While the Δm result might lead a decisionmaker to favor treatment over control, wide bounds on mΔ might encourage that same decisionmaker to be conservative in advancing such a recommendation should considerations beyond Δm be relevant. Using earlier arguments, however, note that pβ is sign identified (negative) for β∈[5/7,1] , so that a decisionmaker who appeals to (13) and who weighs mΔ no more than 2/7 is positioned to recommend treatment over control. Finally, for the Beigel data the central aperture measures are A(m1,m0) = −.394 , a1(m*,.5) = −.386 , and a2(.5) = −.375 .
Table 1 —
Empirical Results, Beigel et al., 2020a, Remdesivir Trial (Output from medte Stata Program)
| Median Treatment Effects and Related Parameters | |
| Outcome variable: (integer-valued) ttr | |
| Group variable (g): tx | |
| Uncensored Obs. for g=0: | 340 |
| Uncensored Obs. for g=1: | 391 |
| Right-censored Obs. for g=0: | 181 |
| Right-censored Obs. for g=1: | 147 |
| Median Treatment Effects | |
| Sample median for g=0 (m0): | 15 |
| Sample median for g=1 (m1): | 11 |
| Diff. in medians m1-m0: | −4 |
| Lower bound on med(F(y1-y0)): | −15 |
| Upper bound on med(F(y1-y0)): | 10 |
| Central Aperture Measures | |
| a1(m*,.5): | −0.3862 |
| a2(.5): | −0.3747 |
| Bounds on Inequality Probabilities | |
| Lower bound on Prob(y1>y0): | 0.0000 |
| Upper bound on Prob(y1>y0): | 0.8856 |
Case Study 2: Lopinavir-Ritonavir for COVID-19 Treatment (Cao et al., 2020)
The Cao study reports the results of an RCT involving hospitalized patients with confirmed SARS-CoV-2 infection. Treatment consisted of receipt of lopinavir-ritonavir plus standard care, while control consisted exclusively of standard care. The trial's primary endpoint is time to clinical improvement (TTI), measured in days. The trial was open label; moreover the authors report that placebo treatment in the control group was not possible due to the trial's emergency nature. The intention-to-treat sample consists of 199 patients with 99 and 100 randomized to treatment and control, respectively. By the administrative censoring time of 28 days clinical improvement was observed for 77 of the 99 treatment subjects and 70 of the 100 control subjects; thus 22 and 30 subjects' data, respectively, are treated as right-censored. Figure 2 reproduces the data depicted in Cao's figure 2.
Median TTI is 16 days in both arms of the trial so Δm = 0. A decisionmaker concerned only with marginal median outcomes would favor neither treatment. But B*(mΔ) = {−9,7} , so the magnitude of mΔ could be quite substantial in either direction. Regarding central aperture A(mj, mk) = a1(m*,.5) = a2(.5) = 0 in the Cao sample as Δm = 0 .37,38
Case Study 3: Combination Therapy for COVID-19 Treatment (Hung et al., 2020)
The Hung study reports the results of an open-label randomized phase-2 trial comparing combination treatment for COVID-19 (interferon beta-1b plus lopinavir–ritonavir plus ribavirin) with lopinavir–ritonavir alone in a sample of hospitalized patients. The study's primary outcome is time to providing a nasopharyngeal swab negative for SARS-CoV-2 by RT-PCR. 86 and 41 patients were randomly assigned to treatment and control, respectively. The time to negative test (TTNT) outcomes are reported as integers (days) and analyzed in an intention-to-treat context.39
Figure 6 depicts the medians and IQRs reported in Hung's table 2; this is the only information the study provides on the outcomes' marginal distributions. As such, the reported TTNT data are right-censored and interval-measured (e.g. q1(.25) = 5 implies F1(5)∈[.25,.5), as depicted by the closed and open symbols in figure 6). For these data Δm = −5 . Applying (12) to these data yields BIQR(mΔ) = {−11,4}.40
7. Summary
Ease of computation, broad applicability, and parsimony in summarizing outcome data are three plausible statistical reasons for the prominence of Δm in health research, whether Δm informs decisionmakers about treatments' effects on patients' health, on healthcare resource demands, or on any other endpoints that may be of concern. By a revealed preference argument, Δm on its own must routinely provide decisionmakers with useful information about the choices they face. Yet this popularity could arise either because parameters other than Δm — ψ in (10)—are not of interest or because parameters beyond Δm are of interest but the cost of using them to inform choices is perceived to be too great.
This paper has suggested that it is straightforward to augment the signals about treatment effectiveness sent by Δm with information about other features of outcomes' distributions in such ways as should more comprehensively inform decisionmakers who must choose on the basis of the data at hand, whether or not those other features are point identified. In essence the strategies presented in this paper may serve to reduce the perceived costs of appealing to parameters beyond just Δm for those decisionmakers who do find them of interest.
While one might imagine a portfolio of alternative approaches, the specific strategies developed here are designed to be broadly roadworthy since they are easily implemented and require—whether a decisionmaker follows path 1 or path 2—assumptions hardly more stringent than those needed for identification of Δm itself. That the measures of treatment effectiveness proposed here are all anchored to Δm should also be of comfort to decisionmakers who have traditionally and broadly relied on Δm in making choices. How these approaches perform in practice, whether they provide useful information to decisionmakers, and how corresponding standards for clinical significance should be defined are among the issues still to be resolved.
Finally the paper has focused on issues involving identification, ignoring considerations of inference used to understand implications of sampling variation. Such considerations may be of interest in some decisionmaking contexts and, if so, would be useful to tackle in future study.
Acknowledgments
Thanks are owed to Chris Adams, Marguerite Burns, Domenico Depalo, Mary Hamman, Chuck Manski, Dan Millimet, Rebecca Myerson, Ciaran O'Neill, Jon Skinner, Jeff Smith, participants in the UW-Madison Health Economics Workgroup, and two referees for helpful comments. NICHD grant P2CHD047873 to the Center for Demography and Ecology and NIA grant P30AG017266 to the Center for Demography of Health and Aging, both at UW-Madison, provided logistical support.
Footnotes
Mass media wasted no time reporting these findings. An April 29 headline in the Washington Post read: "Gilead’s remdesivir improves recovery time of coronavirus patients in NIH trial."
The Cao et al., 2020, Beigel et al., 2020a, and Hung et al., 2020, studies will be referenced henceforth simply as Cao, Beigel, and Hung. They will be revisited later in the paper.
See Zarin et al., 2011, who discuss data aggregation in the context of clinical-trial reporting.
It is often the case that even when median TTE is a study's primary outcome the test of differences is not specifically a test of differences in medians but rather a hazard ratio test or log-rank test, often based on Cox proportional hazards model estimates. For example, see the results of the Beigel study's final analysis (Beigel et al., 2020b) reported in clinicaltrials.gov (NCT04280705). The conceptual rationales for these approaches are generally unclear, and in any event these are essentially tests for, not measures of, differences in treatment effectiveness.
Partial identification strategies have been used recently to understand COVID-19 prevalence, consequences, and treatment (see Depalo, 2020, Manski, 2020b, and Manski and Molinari, 2020).
To streamline matters technical details and discussions appear in appendixes and footnotes.
In this paper the term "treatment effect" when unqualified has the precise meaning indicated here. The expressions "median treatment effect," "quantile treatment effect," "treatment effectiveness," and others are generic; their meaning depends on particular context.
For the most part this paper will focus only on issues of identification, leaving considerations of inference involving sampling variation for future study. (Goldman and Kaplan, 2018, offer innovative ideas on testing in contexts like those considered here.) It is thus useful to conceive of samples as being large enough that sampling variation in parameter estimates is negligible for decisionmaking yet small enough that the fundamental reason for sampling—drawing inferences about effectiveness that can subsequently be applied to treatment decisions affecting a broader population—is preserved. Distinguishing population from sample parameters as might be typical is thus of little consequence; as such notation can be streamlined as one needn't distinguish population distributions Pr(…) and Fj (…) from empirical distributions PrN (…) and Fj,N (…).
"min" suffices for most cases covered here but "inf" is technically appropriate (see Hansen, 2020, section 11.13).
See appendix A for discussion of medians' computation and measurement.
In some cases (e.g. estimating mj) point identification need hold only over some subset of Y.
Other characterizations of median TEs based on ratios or percentage changes instead of differences have been considered; see Lee and Kobayashi, 2001, and Rogawski et al., 2017.
In some instances Pr(y0,y1) is point identified even though the corresponding F(y1 – y0) would not readily admit interpretation as a distribution of counterfactual outcome differences. Examples include pre-post and crossover designs. In an ophthalmology study Fan et al., 2003, randomize individuals' left eyes and right eyes to serve as treatment and control "subjects."
Δm and mΔ are what Manski, 1997, refers to generically and respectively as "ΔD" and "DΔ" treatment effects, thus the Δm and mΔ notation used here.
- 92,373 hits using the search string: "median time to" OR "median survival" OR "median progression-free" OR "median length" OR "median duration"
- 191,805 hits using the search string: "median difference" OR "median change" OR "median percentage change" OR "median percent change" OR "median relative change" OR "difference in median" OR "difference between median"
- 261,181 hits using the union of these search strings
The corresponding y-ordinates that define the Djk may not be unique.
- When time-to-event endpoints (e.g., mortality) are used, median or mean survival alone is not usually an adequate descriptor. Survival curves (or event-free survival curves) and hazard ratios are often effective ways to display such data. Data can also be summarized at specific times (e.g., prevalence at 3, 6, 9, 12 months) or at specific event frequency (e.g., time to 25 percent, 50 percent, and 75 percent prevalence of events).
While it is easy to appreciate why IPs may be instructive (see Mullahy, 2018) it may be less obvious how knowledge of, or knowledge of bounds on, mΔ informs specific decisions. While applicability to practical questions may be more nuanced there are real-world settings where knowledge of mΔ—or more generally of q(α)Δ—should be valuable (see appendix E).
Using inf/sup instead of min/max covers cases where b (θ) may be half or fully open.
The paper does not consider monotone treatment response (Manski, 1997) or related considerations (Lee, 2000). See appendix B for a brief discussion.
Appendix C discusses strategies for computing B*(mΔ). A Stata program, medte, computes B*(mΔ) with uncensored or right-censored outcomes. A zip file containing the do-file defining the medte program is available here. medte reports Δm, B*(mΔ), the IP bounds defined in appendix C ((C.1) and (C.2)), and the approximate central aperture measures defined in section 5. The zip file also contains a readme file and the two datasets whose analysis is reported in section 6 and tables 1 and 2.
For ethical and statistical reasons stopping clinical trials "early for benefit" is controversial (Pocock, 1992). Ethical considerations aside, if the only endpoints of interest are the median TTE outcomes m0 and m1 then there is no further statistical benefit to be realized by waiting to amass more data once the events for half the subjects in each treatment arm have occurred.
With smaller values of y corresponding to better health it might be reasonable in such cases to consider Tj to be a breakthrough treatment. See appendix E.
Exploring such decisions in a minimax-regret framework may be instructive; see Manski, 2018a,b, for insights on regret and its role in decisionmaking in clinical and related contexts.
When β is unknown, a modest yet pragmatic suggestion—one that still acknowledges both the decision-relevance of mΔ as well as its partial identification—is simply to routinely report B*(mΔ) alongside point estimates of Δm when a study's results are tabulated. For instance, one might report −4 {−15,10} in a tabular summary of the results reported in table 1 (being careful to note that this is not a conventional confidence interval).
The approaches suggested in this section are seemingly new and admittedly speculative; their merits would need to be vetted more thoroughly than the scope of this paper permits.
With discrete data "area" should be interpreted as a scaled sum over a set y ∈ Q = {r,…,s}, e.g. if Q is a set of consecutive integers (with R = #Q).
Imagine at this point that F0 and F1 do not cross on the interval (r, s). Crossovers are seen below to be irrelevant for the particular approaches proposed here.
Note that m* may or may not be in the common support Y (e.g. if Y is the set of non-negative integers).
The vertical distances between F0 and F1 described by the δj not only define probability treatment effects but also correspond roughly to the degree of informativeness (or tightness) of bounds on parameters like mΔ and IP (see section 4 and appendix C). Moreover while the roles played by the δj in (22) and (23) may be intuitive per se, it might also be noted that they are respectively (if positive) Boole-Fréchet lower bounds on Pr(yj < m*,yk > m*) and Pr(yj < mk,yk > mj) (if the δj < 0 then reverse the inequality directions in these probability statements). These translate roughly as "yj is small and yk is large so yk – yj is large."
Note that F0 and F1 cannot cross over their local domains in the definitions of a1(m*,.5) and a2(.5). E.g. suppose m1 < m0 ; then F1(y) ≥ .5 and F0(y) < .5 for m1 ≤ y < m0.
The .5 multiplier in the expression for a2(.5) in (23) arises since the area of the trapezoid PQRS is the combined area of the two right triangles PSQ and RQS.
a1(m*,.5) and a2(.5) can be estimated even with right censoring; they are point identified so long as both Fj(y) are point identified for all y ≤ max{m0,m1}.
For positive y the global aperture of F0 and F1 —the (net) area between F0 and F1 over the entirety of Y—is the difference in means E0[y] – E1[y] = −ΔE[y], necessarily finite in a sample but perhaps not so in a population where Y is unbounded. If a decisionmaker uses ΔE[y] to inform choice they are appealing to the global aperture of F0 and F1 even if they do not appreciate this explicitly. While ΔE[y] could be used to gauge treatment effectiveness it will often be impractical in clinical studies to do so due to right censoring. Also note that F0 and F1 can cross over Y, making any corresponding notion of aperture as "opening" somewhat fuzzy.
Referring to Δm × δ to gauge treatment effectiveness recalls the Harberger, 1971 (eq. 2), first-order approximation to welfare change due to a "treatment": the product of marginal value and quantity change. Quantity change alone—here Δm—tells an incomplete story; a fuller story unfolds by considering as well the worth of quantity along the margin of quantity change.
The data used in this section's analyses of the first two case studies as well as in producing figure 1 (bottom panel) and figure 2 were coded from an "eyeball analysis" of Beigel's figure 2A and Cao's figure 2 then calibrated as necessary to match the reported sample medians.
As seen in figure 2 F0 first-order dominates F1 (at least up to the censoring threshold), so that a decisionmaker might have other reasons to favor T1.
While this paper has sidestepped issues of sampling variation and inference it might be noted that a1(m*,.5) and a2(.5) are themselves subject to sampling variation. Even in a case like the Cao data where the point estimates of both measures are zero (since Δm = 0) there will generally arise non-degenerate sampling distributions. A simple bootstrap with 1,000 replications shows for those data 95% central sampling intervals of [−.85,.09] for a1(m*,.5) and [−.80,.07] for a2(.5) with respective sampling distribution medians of −.04 and −.04. (For the Beigel data the corresponding results are [−1.15,−.05] for a1(m*,.5) and [−1.11,−.04] for a2(.5) with respective sampling distribution medians of −.42 and −.40.)
The Hung study reports no information on right-censoring. The analysis proceeds as if the data are uncensored, i.e. that the reported quantiles are those corresponding to the full sample.
A curiosity is that IQRs and medians are sometimes reported when right-censoring probabilities exceed .25 or .5. For example, the Cao study reports IQR0 as {15,18} even though the right-censoring fraction in the control group is .30. Other clinical studies report analogous results. Whether what is reported in such instances derives from model-based predictions (e.g. from Cox proportional hazard models), from other sources, or from erroneous calculations is not always evident. In no event, however, are right-censoring probabilities exceeding .5 or .25 logically consistent with point-identified medians or .75-quantiles.
References
- Beigel JH et al. 2020a. "Remdesivir for the Treatment of Covid-19—Preliminary Report." NEJM. DOI: 10.1056/NEJMoa2007764 (May 22, 2020).
- Beigel JH et al. 2020b. "Remdesivir for the Treatment of Covid-19—Final Report." NEJM. DOI: 10.1056/NEJMoa2007764 (October 8, 2020).
- Cao B et al. 2020. "A Trial of Lopinavir–Ritonavir in Adults Hospitalized with Severe Covid-19." NEJM 382: 1787–1799.
- Castagnoli E 1984. "Some Remarks on Stochastic Dominance." Rivista di Matematica per le Scienze Economiche e Sociali 7: 15–28.
- Depalo D 2020. "True COVID-19 Mortality Rates from Administrative Data." Journal of Population Economics. DOI: 10.1007/s00148-020-00801-6.
- Fan DSP et al. 2003. "Ocular-Hypertensive and Anti-Inflammatory Response to Rimexolone Therapy in Children." Archives of Ophthalmology 121: 1716–1721.
- Goldman M and Kaplan DM. 2018. "Comparing Distributions by Multiple Testing across Quantiles or CDF Values." Journal of Econometrics 206: 143–166.
- Hansen B 2020. Introduction to Econometrics (Vol. 1). Web textbook. University of Wisconsin-Madison, Department of Economics. (https://www.ssc.wisc.edu/~bhansen/probability/, accessed May 21, 2020) (May 2020 version).
- Harberger AC 1971. "Three Basic Postulates for Applied Welfare Economics: An Interpretive Essay." Journal of Economic Literature 9: 785–797.
- Hung IF-N et al. 2020. "Triple Combination of Interferon beta-1b, Lopinavir–Ritonavir, and Ribavirin in the Treatment of Patients Admitted to Hospital with COVID-19: An Open-label, Randomised, Phase 2 Trial." The Lancet 395: 1695–1704.
- Imbens GW and Wooldridge JM. 2009. "Recent Developments in the Econometrics of Program Evaluation." Journal of Economic Literature 47: 5–86.
- Koenker R and Bilias Y. 2001. "Quantile Regression for Duration Data: A Reappraisal of the Pennsylvania Reemployment Bonus Experiments." Empirical Economics 26: 199–220.
- Lee M-J 2000. "Median Treatment Effect in Randomized Trials." JRSS-B 62: 595–604.
- Lee M-J and Kobayashi S. 2001. "Proportional Treatment Effects for Count Response Panel Data: Effects of Binary Exercise on Health Care Demands." Health Economics 10: 411–428.
- Manski CF 1997. "Monotone Treatment Response." Econometrica 65: 1311–1334.
- Manski CF 2007. Identification for Prediction and Decision. Cambridge: Harvard University Press.
- Manski CF 2018a. "Credible Ecological Inference for Medical Decisions with Personalized Risk Assessment." Quantitative Economics 9: 541–569.
- Manski CF 2018b. "Reasonable Patient Care under Uncertainty." Health Economics 27: 1397–1421.
- Manski CF 2020a. "The Lure of Incredible Certitude." Economics and Philosophy 36: 216–245.
- Manski CF 2020b. "Bounding the Predictive Values of COVID-19 Antibody Tests." Working Paper (May 14, 2020, version). Department of Economics and Institute for Policy Research, Northwestern University.
- Manski CF and Molinari F. 2020. "Estimating the COVID-19 Infection Rate: Anatomy of an Inference Problem." Journal of Econometrics (in press). DOI: 10.1016/j.jeconom.2020.04.041.
- Manski CF and Tetenov A. 2020. "Statistical Decision Properties of Imprecise Trials Assessing COVID-19 Drugs." NBER Working Paper 27293.
- Mullahy J 2018. "Individual Results May Vary: Inequality-Probability Bounds for Some Health-Outcome Treatment Effects." Journal of Health Economics 61: 151–162.
- Pocock SJ 1992. "When to Stop a Clinical Trial." BMJ 305: 235–240.
- Polack FP et al. 2020. "Safety and Efficacy of the BNT162b2 mRNA Covid-19 Vaccine." NEJM 383: 2603–2615.
- Rogawski ET et al. 2017. "Estimating Differences and Ratios in Median Times to Event." Epidemiology 27: 848–851.
- U.S. Food and Drug Administration. 2006. Clinical Studies Section of Labeling for Human Prescription Drug and Biological Products—Content and Format: Guidance for Industry. Rockville, MD: U.S. FDA.
- U.S. Food and Drug Administration. 2020. Coronavirus (COVID-19) Update: FDA Issues Emergency Use Authorization for Potential COVID-19 Treatment. https://www.fda.gov/news-events/press-announcements/coronavirus-covid-19-update-fda-issues-emergency-use-authorization-potential-covid-19-treatment (May 1, 2020; accessed May 6, 2020).
- U.S. National Institute of Allergy and Infectious Diseases (NIAID). 2020. NIH Clinical Trial Shows Remdesivir Accelerates Recovery from Advanced COVID-19. https://www.niaid.nih.gov/news-events/nih-clinical-trial-shows-remdesivir-accelerates-recovery-advanced-covid-19 (April 29, 2020; accessed May 6, 2020).
- Washington Post. 2020. "Gilead’s remdesivir improves recovery time of coronavirus patients in NIH trial." https://www.washingtonpost.com/business/2020/04/29/gilead-says-positive-results-coronavirus-drug-remdesivir-will-be-released-by-nih/ (reported April 29, 2020; accessed May 6, 2020).
- Zarin DA et al. 2011. "The ClinicalTrials.gov Results Database—Update and Key Issues." NEJM 364: 852–860.