Significance
Tuberculosis (TB) remains a serious global health problem. A new, more accurate test for diagnosis was endorsed by the World Health Organization in 2010. However, trials showed that using the test did not yield reductions in TB-related deaths. To help understand why, we model how a clinician might decide whether to order tests for TB and whether to treat a patient for TB, with or without test results. We highlight the role of uncertainty about the prevalence of TB and the accuracy of different tests, for patients with different characteristics. We show that, given such uncertainty, a reasonable policy may be to diversify testing and treatment, randomly assigning patients with certain characteristics to different combinations of testing and treatment.
Keywords: public health, medical decision making, decision under ambiguity, tuberculosis, diagnosis and treatment
Abstract
In 2017, 1.6 million people worldwide died from tuberculosis (TB). A new TB diagnostic test—Xpert MTB/RIF from Cepheid—was endorsed by the World Health Organization in 2010. Trials demonstrated that Xpert is faster and has greater sensitivity and specificity than smear microscopy—the most common sputum-based diagnostic test. However, subsequent trials found no impact of introducing Xpert on morbidity and mortality. We present a decision-theoretic model of how a clinician might decide whether to order Xpert or other tests for TB, and whether to treat a patient, with or without test results. Our first result characterizes the conditions under which it is optimal to perform empirical treatment; that is, treatment without diagnostic testing. We then examine the implications for decision making of partial knowledge of TB prevalence or test accuracy. This partial knowledge generates ambiguity, also known as deep uncertainty, about the best testing and treatment policy. In the presence of such ambiguity, we show the usefulness of diversification of testing and treatment.
Addressing the continuing prevalence of tuberculosis (TB) in several regions of the world remains a major priority for global health policymakers and practitioners. In 2017, 1.6 million people worldwide died from TB (1), and the “End TB” strategy of the World Health Organization (WHO) is committed to reducing TB deaths by 95% during the period 2015 to 2035 (2). A key challenge in fighting TB is that of improving capacity for rapid and accurate diagnosis. A new TB diagnostic test, Xpert MTB/RIF from Cepheid, was endorsed by the WHO in 2010 (3). Trials to establish Xpert’s diagnostic effectiveness demonstrated that Xpert is faster and has greater sensitivity and specificity than smear microscopy, the most common sputum-based diagnostic test under the status quo (4, 5). Xpert is also much faster than existing culture tests at diagnosing multidrug-resistant (MDR) TB. However, subsequent trials to establish Xpert’s impact on morbidity and mortality found no statistically significant effect across a range of settings (6–12), although they were not powered to detect modest effect sizes.
Considering this apparent paradox, we present a decision-theoretic model of how a clinician might decide 1) whether to order one or more tests for TB and 2) whether to treat a patient, with or without test results. The model is prescriptive; that is, it seeks to improve the performance of actual decision making. This contrasts with a descriptive model, which would seek to understand and predict how actual decision makers behave. We begin by defining optimal decision making but then show that the clinician typically does not have the information required to assess optimality. We therefore ultimately focus on reasonable decision making, as characterized in previous work (13–15).
The model highlights 2 key features of decision making for TB diagnosis and treatment. First, evidence from the aforementioned trials suggests that an important driver of Xpert’s lack of impact on mortality is that clinicians frequently engage in empirical treatment under the status quo. That is, they treat patients for TB using observation alone, without obtaining a positive result from diagnostic tests (16, 17). Our model shows that there exist settings in which clinicians may find empirical treatment to be optimal for some groups of patients, even when one or more diagnostic tests are available. When a new, superior diagnostic is introduced, it may still be optimal to choose empirical treatment.
Second, important parameters relevant to decision making may not be known by the clinician. Moreover, there may exist no credible basis for asserting a subjective probability distribution over the possibilities, as is supposed in the Bayesian paradigm for decision making under uncertainty. Diagnosis and treatment are then a problem of decision making under ambiguity (18), also known as deep uncertainty.
There may, for example, be ambiguity regarding the prevalence of TB in specific subpopulations of patients. Reasons for this ambiguity may include underreporting, misdiagnosis, limited granularity in available data on patient characteristics, and absence of scientific consensus on the relationship between latent and active TB (see ref. 19 and accompanying rapid responses). Prevalence surveys, expert opinion, and modeling may provide credible bounds on prevalence for certain subpopulations but no basis for choosing a point estimate or asserting a subjective probability distribution between the bounds.
Similarly, there may be ambiguity about the diagnostic effectiveness of Xpert for subpopulations that differ from the populations studied in trials. That is, the external validity of the available trials may be unclear. For example, trials performed on HIV-positive adults who are receiving antiretroviral therapy (ART) may not reveal much about diagnosis and treatment of children, or of ART-naïve HIV-positive adults with advanced levels of immunosuppression.
Considering this ambiguity, it is important to ask what the policy response to the availability of a new diagnostic test should be. We build on refs. 13–15 to show that under ambiguity it is reasonable for clinicians to pursue diversification. That is, within groups of observationally identical patients, clinicians may want to randomly test and treat some fraction of patients, but not others. Diversification has the immediate benefit of eliminating gross errors. Over time, it also generates new evidence on the accuracy of new diagnostic tests and on treatment response, like the evidence produced by a randomized controlled trial.
The problem of TB diagnosis and treatment under ambiguity exemplifies a broad class of decision-making problems under ambiguity in global health. Randomized trials and observational studies of diagnostic tests and treatments may often yield credible bounds on test accuracy and treatment response in given settings, but not credible point estimates or subjective probability distributions between these bounds. Moreover, even when well-designed trials have high internal validity they may lack external validity. That is, it may be difficult to extrapolate trial findings to different patient populations or different healthcare contexts.
A Model of Optimal Diagnosis and Treatment Decisions
We first abstract from ambiguity and study decision making when the clinician knows the population parameters that determine optimal diagnosis and treatment decisions. The idealized optimization model presented here applies to a broad spectrum of medical settings. We specify how it relates to TB.
Basic Concepts and Notation.
To model diagnosis and treatment decisions, we first specify a decision maker and the feasible actions. We use the concepts and notation of ref. 14, applying the abstract setup developed there to TB. As there, we consider a clinician who cares for the patients who present for examination. We consider these patients to be predetermined, and we assume that they comply with the clinical decisions.*,† We also assume that treatment decisions for these patients do not affect disease transmission.‡
The clinician initially observes patient covariates that may include medical history, demographic attributes, measures of health status, and patient statements expressing their preferences for care and outcomes. The clinician can choose a treatment based on observing these covariates alone (empirical treatment) or order a test providing further evidence. In the latter case, the clinician chooses a treatment after observing the test result.
Let x denote the initially observed patient covariates and let t denote a treatment. We suppose that x is discrete and there are 2 feasible treatments, denoted A and B. These conditions can be weakened in principle. In the context of TB, let treatment A indicate surveillance—a decision not to prescribe antibiotics. Let treatment B indicate aggressive treatment—prescription of antibiotics, perhaps complemented by nutritional supplements.§
Let s indicate whether the clinician orders the test, with s = 1 if she orders it and s = 0 otherwise. Let r denote the test result. Suppose that r can take 2 values: p, positive, indicating the patient has the condition, or n, negative, indicating the patient does not have the condition.¶
The feasible actions, and the accompanying knowledge of patient covariates, may be expressed as a decision tree. The clinician chooses s = 1 or s = 0 with knowledge of x. If s = 0, she chooses t = A or t = B knowing x. If s = 1, she chooses t = A or t = B knowing (x, r).
When the clinician makes the testing decision, patients with the same value of x are observationally identical, while those with different values are observationally distinct. Hence, the clinician can use x to profile, making different testing decisions for patients with different values of x. The clinician cannot systematically differentiate patients with the same value of x. However, she can randomly differentiate among them, ordering testing for some fraction and not testing the remainder. We term this diversification in testing.
To formalize this idea, let δS(x) be the fraction of patients with covariates x who are tested and let 1 − δS(x) be the fraction who are not tested. The clinician can choose δS(x) to be any fraction in the interval [0, 1]. This done, she tests a randomly drawn fraction δS(x) of the patient group and does not test the remainder. Such randomization could be implemented in a similar way to random security screening at airports, or random drug testing of athletes.
Applying similar reasoning, the clinician can profile treatment across groups of patients with different observed covariates and randomly differentiate treatment among patients with the same observed covariates. When considering treatment, we distinguish 3 groups of patients. Those who are not tested have observed covariates x when treated. Those who are tested have observed covariates (x, r) when treated, with r equaling n or p. Among patients who are not tested, let δT0(x) be the fraction with covariates x who receive treatment B and let 1 − δT0(x) be the fraction who receive A. Among those who are tested, let δT1(x, r) be the fraction with covariates (x, r) receiving B and let 1 − δT1(x, r) be the fraction receiving A.
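As a minimal sketch of how such random differentiation might be carried out, the Python snippet below draws a random subset of an observationally identical patient group. The function name `diversify`, the patient identifiers, and the rule of rounding to a whole number of patients are illustrative assumptions, not part of the model.

```python
import random

def diversify(patient_ids, fraction, rng):
    """Randomly select a share `fraction` of an observationally identical group.

    Hypothetical helper: the selected patients receive the option in question
    (testing, or treatment B); the remainder receive the alternative.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    # Rounding to a whole number of patients is an illustrative assumption.
    n_selected = round(fraction * len(patient_ids))
    return set(rng.sample(list(patient_ids), n_selected))

rng = random.Random(0)  # fixed seed so the draw is reproducible
group = [f"patient-{i}" for i in range(20)]
tested = diversify(group, 0.25, rng)  # plays the role of delta_S(x) = 0.25
untested = [pid for pid in group if pid not in tested]
print(len(tested), len(untested))
```

The same helper could be reused for the treatment fractions, applied separately to the untested group and to each test-result group.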
Welfare Function.
We next specify a welfare function embodying the objective of the clinician. Rather than consider a patient in isolation, we will suppose that the objective is to optimize care on average across the patients in her practice. This sense of optimization does not require certainty about what treatment is best for each patient. It only requires knowing mean treatment response for patients with the same observed covariates.
Discussions of patient care often suppose optimization for each patient separately, without reference to care of other patients. However, a clinician can optimize care for a single patient only if she knows enough to be certain what treatment is best for this patient. Clinicians typically lack this knowledge, particularly when deciding whether to order a test. After all, the medical purpose of testing is to yield evidence on health status that may be helpful when choosing a treatment. If the clinician were already to know the best treatment, there would be no medical reason to perform a test.
It remains to specify the welfare function. As in ref. 14, we assume that the clinician aggregates the benefits and harms of making a specific testing and treatment decision into a scalar measure of welfare. Going beyond the abstract notation of ref. 14, we make the dependence of welfare on illness explicit. Let z = 1 if the patient is ill and z = 0 otherwise. The clinician does not know z when choosing (s, t). Given this, testing and treatment decisions will depend on a patient’s risk of illness rather than on realized illness outcomes.
Specifically, let U(z, s, t) summarize the clinician’s assessment of the benefits and harms that would occur if she were to make testing decision s and treatment decision t for a patient whose illness state was z. The welfare measure may express not only health outcomes but also patient preferences and financial costs. In the case of TB, the welfare U(z, s, A) from the decision not to treat a patient may also include the value of possible future treatment, based on the probability that an untreated patient will come back to be diagnosed and treated again at a later date. Patients may respond heterogeneously to testing and treatment, so U(z, s, t) may vary across patients. Given that testing for TB is noninvasive and does not affect treatment outcomes directly, welfare may have the additive form U(z, s, t) = u(z, t) – Ks, where u(z, t) measures life-cycle expected utility for a patient with illness state z and treatment t (including direct utility from health, financial benefits from being fit to work, and so on), while K measures the financial cost of testing.
Mean welfare across patients is determined by the fraction with each covariate value that the clinician assigns to each option for testing and treatment. Suppose that x lies in a finite set X of feasible values. For each x ∊ X, let P(x) denote the fraction of patients with value x. For each r ∊ {p, n}, let f(r|x) denote the fraction of the patients with value x who would have test result r if they were tested.
For each value of (s, t), let E[U(z, s, t)|x] be the mean expected welfare that results if all patients with x receive (s, t). Let E[U(z, s, t)|x, r] be the mean expected welfare that results if all those with covariate value x and test result r receive (s, t). We assume that, for each value of (s, t), the potential welfare function U(∙, s, t) varies across patients in a way that is mean independent of their illness outcomes z, conditional on x and on (x, r); that is, E[U(z, s, t)|z = 0, x] = E[U(0, s, t)|x] and E[U(z, s, t)|z = 1, x] = E[U(1, s, t)|x]. It then follows from the law of iterated expectations that
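$$E[U(z, s, t) \mid x] = P(z = 0 \mid x)\, E[U(0, s, t) \mid x] + P(z = 1 \mid x)\, E[U(1, s, t) \mid x], \tag{1a}$$

$$E[U(z, s, t) \mid x, r] = P(z = 0 \mid x, r)\, E[U(0, s, t) \mid x, r] + P(z = 1 \mid x, r)\, E[U(1, s, t) \mid x, r]. \tag{1b}$$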
Let δ = [δS(x), δT0(x), δT1(x, r), x ∊ X, r ∊ {p, n}] denote a specified testing–treatment allocation. The mean welfare W(δ) that results if the clinician chooses δ is obtained by averaging the various mean welfare values E[U(z, s, t)|x] and E[U(z, s, t)|x, r] across the groups who receive them. Hence,
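$$W(\delta) = \sum_{x \in X} P(x) \Big\{ [1 - \delta_S(x)] \big[ (1 - \delta_{T0}(x))\, E[U(z, 0, A) \mid x] + \delta_{T0}(x)\, E[U(z, 0, B) \mid x] \big] + \delta_S(x) \sum_{r \in \{p, n\}} f(r \mid x) \big[ (1 - \delta_{T1}(x, r))\, E[U(z, 1, A) \mid x, r] + \delta_{T1}(x, r)\, E[U(z, 1, B) \mid x, r] \big] \Big\}. \tag{2}$$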
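The averaging that defines mean welfare can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `mean_welfare`, the dictionary keys, and all numerical values are hypothetical.

```python
def mean_welfare(groups):
    """Average welfare W(delta) over covariate groups.

    Each group is a dict with hypothetical keys:
      "P"   : share of patients with this covariate value x
      "f"   : {result: share with that test result if tested}
      "w0"  : {"A": ..., "B": ...}            mean welfare without testing
      "w1"  : {result: {"A": ..., "B": ...}}  mean welfare with testing
      "dS"  : fraction tested
      "dT0" : fraction of untested patients given B
      "dT1" : {result: fraction of tested patients with that result given B}
    """
    total = 0.0
    for g in groups:
        # Untested patients: mix of A and B chosen on x alone.
        untested = (1 - g["dT0"]) * g["w0"]["A"] + g["dT0"] * g["w0"]["B"]
        # Tested patients: average over results of the A/B mix for each result.
        tested = sum(
            g["f"][r]
            * ((1 - g["dT1"][r]) * g["w1"][r]["A"] + g["dT1"][r] * g["w1"][r]["B"])
            for r in g["f"]
        )
        total += g["P"] * ((1 - g["dS"]) * untested + g["dS"] * tested)
    return total

# Illustrative numbers only: one covariate group, half of it tested.
example_group = {
    "P": 1.0,
    "f": {"p": 0.3, "n": 0.7},
    "w0": {"A": 0.70, "B": 0.75},
    "w1": {"p": {"A": 0.20, "B": 0.60}, "n": {"A": 0.85, "B": 0.70}},
    "dS": 0.5,
    "dT0": 1.0,
    "dT1": {"p": 1.0, "n": 0.0},
}
welfare = mean_welfare([example_group])
print(welfare)
```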
Optimal Testing and Treatment.
An optimal testing and treatment allocation maximizes W. Ref. 14 shows that an optimal allocation is
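$$\delta_S(x) = 1 \ \text{if} \ \sum_{r \in \{p, n\}} f(r \mid x) \max\{E[U(z, 1, A) \mid x, r],\, E[U(z, 1, B) \mid x, r]\} \ge \max\{E[U(z, 0, A) \mid x],\, E[U(z, 0, B) \mid x]\}; \quad \delta_S(x) = 0 \ \text{otherwise}, \tag{3a}$$

$$\delta_{T0}(x) = 1 \ \text{if} \ E[U(z, 0, B) \mid x] \ge E[U(z, 0, A) \mid x]; \quad \delta_{T0}(x) = 0 \ \text{otherwise}, \tag{3b}$$

$$\delta_{T1}(x, p) = 1 \ \text{if} \ E[U(z, 1, B) \mid x, p] \ge E[U(z, 1, A) \mid x, p]; \quad \delta_{T1}(x, p) = 0 \ \text{otherwise}, \tag{3c}$$

$$\delta_{T1}(x, n) = 1 \ \text{if} \ E[U(z, 1, B) \mid x, n] \ge E[U(z, 1, A) \mid x, n]; \quad \delta_{T1}(x, n) = 0 \ \text{otherwise}. \tag{3d}$$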
Each maximum is unique when the inequality is strict, while all allocations yield the same welfare when the values are equal.
The above derivation shows that empirical treatment (treatment with antibiotics without performing a diagnostic test) is optimal when the inequality in [3a] does not hold and the inequality in [3b] does hold. Empirical treatment is not optimal otherwise.
The analysis in ref. 14 extends immediately to settings with more than 2 possible test results. To perform the extension, one simply sums over the feasible results in [3a] and extends [3c] and [3d] to consider each feasible result. For example, r could be an ordered measure of the magnitude of a test finding, or one could undertake multiple tests, in which case r gives a combination of test results. If one undertakes multiple tests, the analysis assumes that they are ordered together, and their results are observed simultaneously. Sequential testing can in principle be accommodated, but it requires generalization of the framework.
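The optimal allocation described in this subsection can be sketched in Python as follows. The sketch assumes the mean welfare values are known and are passed in as plain dictionaries; the function name `optimal_allocation`, the key names, the tie-breaking rule, and the numbers are hypothetical. Because it sums over whatever results are supplied, it also covers tests with more than 2 possible results.

```python
def optimal_allocation(w_no_test, w_test, f):
    """Singleton optimal allocation for one covariate value x.

    w_no_test : {"A": ..., "B": ...}            mean welfare without testing
    w_test    : {result: {"A": ..., "B": ...}}  mean welfare with testing
    f         : {result: share of patients with that result if tested}
    Ties are broken in favor of testing and of treatment B (an arbitrary
    choice, since welfare is equal in that case).
    """
    # Treatment choice without testing.
    delta_t0 = 1 if w_no_test["B"] >= w_no_test["A"] else 0
    # Treatment choice after each possible test result.
    delta_t1 = {r: 1 if w_test[r]["B"] >= w_test[r]["A"] else 0 for r in f}
    # Test only if welfare with testing, averaged over results, is at least
    # the welfare of the better treatment chosen without testing.
    value_test = sum(f[r] * max(w_test[r]["A"], w_test[r]["B"]) for r in f)
    value_no_test = max(w_no_test["A"], w_no_test["B"])
    delta_s = 1 if value_test >= value_no_test else 0
    return {
        "delta_S": delta_s,
        "delta_T0": delta_t0,
        "delta_T1": delta_t1,
        "empirical_treatment": delta_s == 0 and delta_t0 == 1,
    }

shares = {"p": 0.3, "n": 0.7}
informative = optimal_allocation(
    {"A": 0.70, "B": 0.75},
    {"p": {"A": 0.20, "B": 0.60}, "n": {"A": 0.85, "B": 0.70}},
    shares,
)
uninformative = optimal_allocation(
    {"A": 0.70, "B": 0.75},
    {"p": {"A": 0.10, "B": 0.55}, "n": {"A": 0.72, "B": 0.70}},
    shares,
)
print(informative["delta_S"], uninformative["empirical_treatment"])
```

In the second call the welfare with testing falls short of the welfare from treating without a test, so the sketch returns empirical treatment.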
Risk of Illness and Treatment Decisions.
To simplify further computations, we henceforth use a more compact notation for E[U(z, s, t)|x] and E[U(z, s, t)|x, r], as follows:
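$$E[U(z, s, t) \mid x] = (1 - P_x)\, U_x(0, s, t) + P_x\, U_x(1, s, t), \tag{1a$'$}$$

$$E[U(z, s, t) \mid x, r] = (1 - P_{xr})\, U_{xr}(0, s, t) + P_{xr}\, U_{xr}(1, s, t), \tag{1b$'$}$$

where $P_x \equiv P(z = 1 \mid x)$ and $P_{xr} \equiv P(z = 1 \mid x, r)$ are the probabilities of illness, and $U_x(j, s, t) \equiv E[U(j, s, t) \mid x]$ and $U_{xr}(j, s, t) \equiv E[U(j, s, t) \mid x, r]$ are mean welfare in illness state $j$.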
We can simplify further if we assume that knowledge of a test result does not directly affect patient welfare in a given illness state: formally, Uxr(0, s, t) = Ux(0, s, t) and Uxr (1, s, t) = Ux (1, s, t). With this assumption, knowledge of the test result affects decision making purely by changing risk assessment from Px to Pxr, not for any other reason. In the context of TB, the assumption means that, if one were to know that a patient is or is not ill with the disease, the result of a microscopy or Xpert test would not affect welfare. We think this assumption is realistic and maintain it below.
With the above notation and assumption, the treatment decision criteria in [3b] through [3d] are as follows:
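$$\delta_{T0}(x) = 1 \ \text{if} \ (1 - P_x)\, U_x(0, 0, B) + P_x\, U_x(1, 0, B) \ge (1 - P_x)\, U_x(0, 0, A) + P_x\, U_x(1, 0, A); \quad \delta_{T0}(x) = 0 \ \text{otherwise}, \tag{3b$'$}$$

$$\delta_{T1}(x, p) = 1 \ \text{if} \ (1 - P_{xp})\, U_x(0, 1, B) + P_{xp}\, U_x(1, 1, B) \ge (1 - P_{xp})\, U_x(0, 1, A) + P_{xp}\, U_x(1, 1, A); \quad \delta_{T1}(x, p) = 0 \ \text{otherwise}, \tag{3c$'$}$$

$$\delta_{T1}(x, n) = 1 \ \text{if} \ (1 - P_{xn})\, U_x(0, 1, B) + P_{xn}\, U_x(1, 1, B) \ge (1 - P_{xn})\, U_x(0, 1, A) + P_{xn}\, U_x(1, 1, A); \quad \delta_{T1}(x, n) = 0 \ \text{otherwise}. \tag{3d$'$}$$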
In the medical literature on diagnostic testing, Px is called the base rate or the prevalence of the illness for patients with covariates x. Pxp is called the positive predictive value of a test and 1 − Pxn is called the negative predictive value. In general, Pxp > Pxn. An ideal test that perfectly predicts disease would have Pxp = 1 and Pxn = 0. In practice, tests are imperfect predictors, so 1 > Pxp > Pxn > 0.
A considerable part of the medical literature measures test accuracy in a different way, reporting the sensitivity and specificity of a test. Sensitivity is the probability that the test result is positive conditional on the patient’s being ill, P(r = p|x, z = 1). Specificity is the probability that the result is negative conditional on the patient being healthy, P(r = n|x, z = 0).
Sensitivity and specificity do not provide the information that a clinician would want to have to inform patient care. These measures of accuracy permit one to predict the test result conditional on patient health status, but the clinician’s problem is to predict health status conditional on the test result. Perceptive writers on diagnostic testing have long cautioned that sensitivity and specificity do not inform patient risk assessment. For example, Altman and Bland (ref. 20, p. 102) wrote:
The whole point of a diagnostic test is to use it to make a diagnosis, so we need to know the probability that the test will give the correct diagnosis. The sensitivity and specificity do not give us this information. Instead we must approach the data from the direction of the test results, using predictive values. Positive predictive value is the proportion of patients with positive test results who are correctly diagnosed. Negative predictive value is the proportion of patients with negative test results who are correctly diagnosed.
Despite the cautions expressed in articles such as ref. 20, it has remained common to measure the accuracy of diagnostic tests by their sensitivity and specificity, without providing the value of the prevalence required to derive predictive values. We discuss the potential implications of this later.
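To illustrate why prevalence is needed, the following Python sketch applies Bayes' rule to turn sensitivity and specificity into the probability of a positive result and the risks of illness after a positive or a negative result. The function name `predictive_values` and the numerical inputs are hypothetical and are not estimates for any actual test.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """Return (f_x, P_xp, P_xn) from prevalence, sensitivity, and specificity.

    f_x  : probability of a positive test result
    P_xp : probability of illness given a positive result
    P_xn : probability of illness given a negative result
    """
    # Positive results come from true positives and false positives.
    f_pos = sensitivity * prevalence + (1.0 - specificity) * (1.0 - prevalence)
    if not 0.0 < f_pos < 1.0:
        raise ValueError("both test results must have positive probability")
    p_xp = sensitivity * prevalence / f_pos
    p_xn = (1.0 - sensitivity) * prevalence / (1.0 - f_pos)
    return f_pos, p_xp, p_xn

# Same hypothetical test accuracy, two different prevalence values.
low = predictive_values(prevalence=0.05, sensitivity=0.6, specificity=0.98)
high = predictive_values(prevalence=0.20, sensitivity=0.6, specificity=0.98)
print(low)
print(high)
```

With identical sensitivity and specificity, the risk of illness after either result differs across the two prevalence values, which is why sensitivity and specificity alone do not determine predictive values.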
Threshold Risk Assessments for Choice between Surveillance and Aggressive Treatment.
It is often credible to make various assumptions about patient welfare when comparing surveillance and aggressive treatment. In particular,
1) Health is better than illness: Ux(0, s, t) > Ux(1, s, t) for all (s, t).
2) Testing is costly/harmful: Ux(z, 0, t) > Ux(z, 1, t) for all (z, t).
3) Surveillance is better than aggressive treatment when healthy: Ux(0, s, A) > Ux(0, s, B) for all s.
4) Aggressive treatment is better than surveillance when ill: Ux(1, s, B) > Ux(1, s, A) for all s.
These assumptions are realistic in the TB context: 1) a patient is better off not having TB than having TB; 2) performance of a microscopy or Xpert test does not harm patients but does incur financial costs; 3) when a patient is healthy, there is no benefit from prescription of antibiotics, but there are financial costs and possible harms to patients; and 4) when a patient is ill with TB, the health benefits of antibiotic treatment exceed the financial costs and harms.
Analysis in ref. 15 shows that under assumptions 3 and 4 the treatment criteria in [3b′] through [3d′] yield simple solutions. Aggressive treatment is the optimal decision if the risk of illness equals or exceeds a threshold that equalizes mean welfare under treatments A and B. Surveillance is better if risk is less than or equal to the threshold.
In the absence of testing, risk of illness is measured by Px and the threshold yielded by criterion [3b′] is
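$$P^*_{x0} = \frac{U_x(0, 0, A) - U_x(0, 0, B)}{[U_x(0, 0, A) - U_x(0, 0, B)] + [U_x(1, 0, B) - U_x(1, 0, A)]}. \tag{4a}$$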
With testing, risk of illness is measured by Pxp or Pxn, respectively. The threshold yielded by both the criteria in [3c′] and [3d′] is
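$$P^*_{x1} = \frac{U_x(0, 1, A) - U_x(0, 1, B)}{[U_x(0, 1, A) - U_x(0, 1, B)] + [U_x(1, 1, B) - U_x(1, 1, A)]}. \tag{4b}$$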
Under assumptions 3 and 4 both thresholds lie in the open interval (0, 1).
It is important to keep in mind that some treatment errors occur with optimal decisions. Define a type I error to be a choice of treatment B when A is optimal. Let a type II error be a choice of A when B is optimal. Suppose that a clinician makes optimal decisions as derived above. Without testing, type I errors do not occur when Px < P*x0 and type II errors do not occur when Px > P*x0. Type II errors occur with probability Px when Px < P*x0 and type I errors occur with probability (1 − Px) when Px > P*x0. Analogous results hold with testing.
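A short Python sketch may help fix ideas. It computes the risk threshold that equalizes mean welfare under A and B and reports the treatment chosen together with the resulting error probabilities. The function names and the welfare values are hypothetical; the welfare values are chosen only to satisfy assumptions 3 and 4.

```python
def risk_threshold(u_healthy_a, u_healthy_b, u_ill_a, u_ill_b):
    """Risk of illness at which treatments A and B yield equal mean welfare."""
    harm_of_overtreating = u_healthy_a - u_healthy_b  # positive by assumption 3
    gain_from_treating = u_ill_b - u_ill_a            # positive by assumption 4
    if harm_of_overtreating <= 0 or gain_from_treating <= 0:
        raise ValueError("assumptions 3 and 4 must hold")
    return harm_of_overtreating / (harm_of_overtreating + gain_from_treating)

def choose_treatment(p_ill, threshold):
    """Return the chosen treatment and the probabilities of each error type."""
    if p_ill >= threshold:
        # Treating everyone: healthy patients are treated unnecessarily.
        return "B", {"type_I": 1.0 - p_ill, "type_II": 0.0}
    # Treating no one: ill patients go untreated.
    return "A", {"type_I": 0.0, "type_II": p_ill}

# Hypothetical welfare values on an arbitrary scale.
threshold = risk_threshold(u_healthy_a=1.0, u_healthy_b=0.9, u_ill_a=0.2, u_ill_b=0.7)
print(threshold)
print(choose_treatment(0.10, threshold))
print(choose_treatment(0.30, threshold))
```

With these numbers the harm of unnecessary treatment is small relative to the gain from treating an ill patient, so the threshold is well below one half.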
How Testing Affects Treatment.
We observed earlier that empirical treatment is optimal if the inequality in [3a] does not hold and the inequality in [3b] does hold. When choosing between surveillance and aggressive treatment, the inequality in [3b] reduces to the condition Px > P*x0. We now ask how, if at all, testing affects treatment.
In general, the answer to this question is complex because the thresholds P*x0 and P*x1 in [4a] and [4b] may differ. Substantial simplification occurs if the thresholds are equal. A sufficient condition for equality is the assumption that testing imposes an additive treatment-invariant cost on welfare; that is, Ux(z, 0, t) − Ux(z, 1, t) = K > 0 for some positive K, for all z and t. This assumption is realistic in the case of TB, since treating a patient with antibiotics is generally no more or less costly depending on whether the patient has taken a diagnostic test. In contrast, the assumption may be violated for diseases where treatment is easier to perform after a test, for example if a testing procedure is invasive, and treatment can be delivered at the same time.
Suppose that P*x0 = P*x1 and let the common value be denoted P*x. Then, the implication of testing for treatment depends purely on the magnitudes relative to P*x of the pertinent probabilities of illness (Px, Pxn, Pxp). It holds algebraically that Px lies between Pxn and Pxp. Specifically,
$$P_x = f_x\, P_{xp} + (1 - f_x)\, P_{xn}, \tag{5}$$
where fx ≡ f(r = p|x) is the probability of a positive test result. We assume that a positive result indicates a higher risk of illness than does a negative one, so Pxn < Pxp. This inequality and [5] yield the inequality Pxn < Px < Pxp.
It follows that testing affects optimal treatment if and only if the inequality Pxn < P*x < Pxp holds. Given this inequality, a patient with a positive test result receives treatment B, and a patient with a negative test result receives A. In the absence of testing, the patient might receive either A or B, depending on whether the risk of illness is above or below the threshold characterized in [4a].
Testing does not affect treatment otherwise. If Pxp < P*x, treatment A is optimal with or without testing. If Pxn > P*x, treatment B is optimal with or without testing.
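The three cases can be summarized in a small Python sketch, assuming a common threshold P*x with and without testing. The function name `effect_of_testing` and the numerical values are hypothetical, and the handling of exact ties with the threshold is an arbitrary choice.

```python
def effect_of_testing(p_xn, p_xp, threshold):
    """Classify how a test result bears on the optimal treatment.

    p_xn, p_xp : risk of illness after a negative / positive result
    threshold  : common risk threshold P*_x
    """
    if not p_xn < p_xp:
        raise ValueError("a positive result should indicate higher risk")
    if p_xn < threshold < p_xp:
        return "treat according to the test result"
    if p_xp <= threshold:
        return "A with or without testing"
    return "B with or without testing"

# Hypothetical risks after a negative and a positive result.
print(effect_of_testing(0.09, 0.88, threshold=1.0 / 6.0))
print(effect_of_testing(0.09, 0.88, threshold=0.05))
print(effect_of_testing(0.09, 0.88, threshold=0.95))
```

In the second call the risk of illness exceeds the threshold even after a negative result, so the test cannot change the decision to treat.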
Testing and Treatment under Ambiguity
Sources of Ambiguity and Standard Decision Criteria.
Within the model of optimal diagnosis and treatment, optimization for patients with covariates x is feasible if one knows the mean welfare function Ux(∙, ∙, ∙), the illness probabilities (Px, Pxn, Pxp), and the probability fx of a positive test result. A clinician with incomplete knowledge may not be able to optimize and hence faces a problem of decision making under ambiguity. To formalize incomplete knowledge, let the state space, denoted Γ, list the vectors (Uγx, Pγx, fγx, Pγxp, Pγxn), x ∊ X, γ ∊ Γ that satisfy [5] for each value of x and that are deemed feasible based on available evidence and maintained assumptions.
To consider decision making under ambiguity, ref. 14 begins with the welfare function of [2], considered as a function over the state space. For each γ ∊ Γ,
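$$W(\delta, \gamma) = \sum_{x \in X} P(x) \Big\{ [1 - \delta_S(x)] \big[ (1 - \delta_{T0}(x))\, E_\gamma[U(z, 0, A) \mid x] + \delta_{T0}(x)\, E_\gamma[U(z, 0, B) \mid x] \big] + \delta_S(x) \sum_{r \in \{p, n\}} f_\gamma(r \mid x) \big[ (1 - \delta_{T1}(x, r))\, E_\gamma[U(z, 1, A) \mid x, r] + \delta_{T1}(x, r)\, E_\gamma[U(z, 1, B) \mid x, r] \big] \Big\}, \tag{6}$$

where $f_\gamma(p \mid x) = f_{\gamma x}$, $f_\gamma(n \mid x) = 1 - f_{\gamma x}$, and

$$E_\gamma[U(z, s, t) \mid x] = (1 - P_{\gamma x})\, U_{\gamma x}(0, s, t) + P_{\gamma x}\, U_{\gamma x}(1, s, t), \tag{7a}$$

$$E_\gamma[U(z, s, t) \mid x, r] = (1 - P_{\gamma x r})\, U_{\gamma x}(0, s, t) + P_{\gamma x r}\, U_{\gamma x}(1, s, t). \tag{7b}$$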
With this structure, one may in principle study decision making using standard criteria, including maximization of subjective expected welfare, maximin, and minimax-regret. Maximization of subjective expected welfare is a standard dynamic programming problem and thus is tractable. However, it requires specification of a subjective distribution on the state space, which we find difficult to motivate. Study of the other criteria appears to require complex new analysis.
Piecemeal Minimax-Regret Decision Making.
Rather than pursue any of the above approaches, we propose a piecemeal minimax-regret criterion. We consider each value of x and each of the 4 component decisions in isolation from one another. These components are the choices 1) to test or not to test, 2) between A and B without testing, 3) between A and B with testing and a positive result, and 4) between A and B with testing and a negative result. Each choice is a decision between 2 options, making piecemeal decision making relatively simple to study. Piecemeal decision making may also be realistic in settings where each component decision may be performed by a different clinician, for example if a different clinician may be on duty for the follow-up consultation to make the treatment decision once the test results have been received. In such a setting, each clinician cannot control what the clinician making the subsequent decision will do and may not even be able to communicate with him or her. Thus, a reasonable approach is to model each subsequent decision as separate.
We perform analysis that extends the study of minimax-regret decision making in ref. 13. The extension is especially simple if we suppose that Ux(∙, ∙, ∙) is known; hence, the threshold risk assessment P*x is known. Considerable scope for ambiguity remains through incomplete knowledge of (Px, Pxn, Pxp) and fx. We proceed abstractly here and characterize the ambiguity in the TB context later.
The minimax-regret analysis in ref. 13 can be applied separately to each component decision. In each case, the result is a singleton allocation of patients in the absence of ambiguity and a fractional allocation with ambiguity. In nontechnical language, a fractional allocation means diversification of treatment.
Consider decisions 3 and 4. The options are treatments A and B. For test result r, let the smallest and largest feasible values of Pxr be denoted PxrL and PxrH, respectively. We later discuss how these lower and upper bounds may be generated in practice.
Ambiguity occurs when PxrL < P*x < PxrH. Let Mxr(B) be the maximum value of the average treatment effect Eγ[U(z, 1, B)|x, r] − Eγ[U(z, 1, A)|x, r] across the state space. With ambiguity, the maximum is positive and occurs when Pxr = PxrH, that is, at the maximum probability of being ill conditional on x and r. Analogously, let Mxr(A) be the maximum value of the reverse average treatment effect Eγ[U(z, 1, A)|x, r] − Eγ[U(z, 1, B)|x, r], which occurs when Pxr = PxrL. Thus,
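$$M_{xr}(B) = (1 - P_{xrH})\, [U_x(0, 1, B) - U_x(0, 1, A)] + P_{xrH}\, [U_x(1, 1, B) - U_x(1, 1, A)], \tag{8a}$$

$$M_{xr}(A) = (1 - P_{xrL})\, [U_x(0, 1, A) - U_x(0, 1, B)] + P_{xrL}\, [U_x(1, 1, A) - U_x(1, 1, B)]. \tag{8b}$$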
The analysis in ref. 13 shows that, among patients with test result r, the minimax-regret criterion yields
$$\delta_{T1}(x, r) = \frac{M_{xr}(B)}{M_{xr}(A) + M_{xr}(B)}. \tag{9}$$
Consider decision 2. The situation is the same except that the relevant probability of illness is Px (and its bounds, PxL and PxH) rather than Pxr. With this modification, the above result in [8a] through [9] applies.
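The component treatment decisions can be illustrated with a Python sketch of the two-option minimax-regret rule. Allocating a fraction δ to B gives maximum regret (1 − δ)·M(B) in states where B is better and δ·M(A) in states where A is better, and equalizing the two gives δ = M(B)/[M(A) + M(B)]. The function name `minimax_regret_fraction`, the bounds on the risk of illness, and the welfare values are hypothetical; the welfare function is assumed known, as in the text.

```python
def minimax_regret_fraction(p_low, p_high, u_healthy_a, u_healthy_b, u_ill_a, u_ill_b):
    """Fraction of a patient group assigned to treatment B under minimax regret.

    p_low, p_high : smallest and largest feasible risk of illness for the group
    u_*           : known mean welfare by illness state and treatment
    """
    def effect_of_b(p_ill):
        # Average treatment effect of B relative to A at risk level p_ill.
        return (1.0 - p_ill) * (u_healthy_b - u_healthy_a) + p_ill * (u_ill_b - u_ill_a)

    max_gain_b = effect_of_b(p_high)   # largest at the upper bound on risk
    max_gain_a = -effect_of_b(p_low)   # largest at the lower bound on risk
    if max_gain_b <= 0:
        return 0.0  # A is at least as good in every feasible state
    if max_gain_a <= 0:
        return 1.0  # B is at least as good in every feasible state
    # Ambiguity: diversify so that maximum regret is equalized across states.
    return max_gain_b / (max_gain_a + max_gain_b)

# Hypothetical welfare values (risk threshold of 1/6) and risk bounds.
welfare = dict(u_healthy_a=1.0, u_healthy_b=0.9, u_ill_a=0.2, u_ill_b=0.7)
print(minimax_regret_fraction(0.10, 0.30, **welfare))  # bounds straddle the threshold
print(minimax_regret_fraction(0.30, 0.50, **welfare))  # bounds above the threshold
print(minimax_regret_fraction(0.05, 0.10, **welfare))  # bounds below the threshold
```

Only the first call involves ambiguity, and only there does the sketch return a fractional allocation.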
Consider decision 1. The options are to test and not to test. When making the testing decision, one should consider how decisions 2 through 4 will be made. Suppose that piecemeal minimax regret will be used to make decisions 2 through 4. It can be shown that the average effect of testing is then the average effect of treatment compared to surveillance, multiplied by the difference between the probabilities that the patient will be assigned to treatment with testing compared to without testing, minus the cost of testing. See SI Appendix for the derivation.
Applying again the analysis in ref. 13, the fraction of patients allocated to testing is similar to the result in [9]. The fraction is 0 if the maximum “average treatment effect” of testing vs. not testing (i.e., the maximum regret from not testing) is negative. It is 1 if the maximum “average treatment effect” of not testing vs. testing (i.e., the maximum regret from testing) is negative. It is a fractional allocation if both values are positive.
Adaptive Diversification.
The above provides a full description of static piecemeal decision making. Finally, as in ref. 13, consider adaptive application of the piecemeal criteria across a sequence of cohorts. Suppose that the distributions of test results and treatment response among persons with covariates x remain stable over time. Suppose as well that observation of patients eventually reveals whether they are ill. Then, complete learning eventually occurs if δS(x) > 0 for some cohort. Randomized testing of patients with covariates x reveals fx, and randomized treatment following testing reveals Pxp and Pxn. Px is revealed directly if δS(x) < 1 and indirectly by [5] if δS(x) = 1.
We caution that complete learning may not occur if observation of patients does not always reveal whether they are ill. For example, patients with the illness in question may self-cure without treatment, or patients may respond to treatment even if they have a different illness—in the case of TB, antibiotics may also cure non-TB bacterial infections. Thus, one may never learn with certainty whether a patient was ill.
Learning occurs most quickly when δS(x) = 1, that is, with universal testing. However, our model shows that universal testing may not be reasonable given the cost of testing. Complete learning does not occur if δS(x) = 0 for all cohorts. In this case, randomized treatment reveals Px. This yields only partial knowledge of (fx, Pxp, Pxn), which must satisfy [5]. To avoid this outcome, which is generally undesirable from a multicohort perspective, one might set δS(x) > 0 for some cohort.
Implications for TB Testing and Treatment
Optimal TB Testing and Treatment.
The model is useful for studying TB, first because it formalizes the conditions under which empirical treatment is optimal. This is important because the status quo diagnostic test in the absence of Xpert—microscopy—has a low positive predictive value (4, 8).
The model makes clear that a clinician should choose treatment B regardless of the test result if the inequality in [3a] does not hold and the inequality in [3b] does hold. In this case, the expected welfare following testing, considering the probabilities of negative and positive results, is no greater than the welfare of assigning the patient without testing to either no treatment or treatment. Without testing, the expected welfare of treatment is higher than that of no treatment.
The conditions for optimality of empirical treatment may hold if a patient’s probability of having TB is high even following a negative test result. For example, given the poor predictive value of microscopy among HIV-positive patients with advanced levels of immunosuppression, a clinician may find it optimal to treat such patients even following a negative test result. Empirical treatment may also be optimal if the probability of having TB after a negative test result is moderate and the welfare cost of untreated illness is high, as may occur with patients in intensive care units.
If a clinician chooses to treat a patient empirically, then a type I error is more likely to occur than in treatment following testing. The costs of type I errors may be substantial. For example, they may prevent correct diagnosis and treatment of another condition (21). Our model takes these costs into account when determining whether empirical treatment is optimal or not.
The model also allows us to formalize the possible effects of introducing Xpert on rates of empirical treatment. Xpert has a higher positive predictive value than microscopy, making condition [3a] more likely to hold. For some patient populations, it may therefore be optimal to switch out of empirical treatment and into testing with Xpert.# For other patient populations, condition [3a] may still not hold. Then empirical treatment may remain optimal.
Several of the trials examining Xpert’s impact on morbidity and mortality found only partial substitution away from empirical treatment when Xpert was introduced (6, 8, 11, 12). The studies did not find conclusive evidence of reduced morbidity and mortality, which one might expect if there were a reduction in the number of type II errors in treatment.‖ One possible reason is that the studies were generally powered to detect only relatively large effects. Another is that empirical treatment may mainly lead to type I rather than type II errors. This may occur if clinicians err on the side of overtreating, in order to reduce the risk of not treating patients who truly have TB. If introduction of Xpert mainly reduces type I errors, then it reduces unnecessary treatment for TB. However, this may not translate into significant reductions in morbidity and mortality, unless patients incorrectly treated for TB have other serious conditions and are now more quickly treated under a correct diagnosis.
Model Limitations.
The model has limitations insofar as it relies on certain simplifying assumptions. One is the assumption that patient response is individualistic. TB is an infectious disease, implying that the decision to treat a given patient may have spillovers on the illness status of other future patients and, hence, on future testing and treatment decisions. A clinician should take this into account when making individual testing and treatment decisions, if the effect of treating a given patient on future TB transmission is nonnegligible.
Whether the spillover effect is nonnegligible may depend, among other things, on the prevalence of TB and of risk factors such as HIV/AIDS, the infectivity of the specific strain of TB, and the nature of social networks. A policymaker may take spillovers into account in setting clinical guidelines, even if each individual treatment decision has a negligible effect on transmission. We cannot speculate on the magnitude of the effect, but we think it reasonable to expect that concern for spillovers would increase the public health incentive to treat TB actively. This is a key topic for future research.
When performing future research on spillovers, researchers will have to confront forms of ambiguity beyond those considered in this paper. There exists a large public health literature on optimal treatment of infectious diseases under the assumption that the mechanism of disease transmission is completely known. However, this assumption often is not credible. A vexing problem impeding progress has been the infeasibility of using large-scale randomized trials to learn how transmission varies with treatment policies. Hence, epidemiologists have had to rely on fitting mathematical models to available observational data. In the case of TB, there is an ongoing debate over the extent to which new cases of active TB are caused by recent infection or by progression of latent TB (see ref. 19 and accompanying rapid responses).
We are aware of only 2 studies that address formation of public health policy for treatment of infectious disease with partial knowledge of disease transmission. These studies (22, 23) consider reasonable choice of vaccination policy by a social planner, with specific attention to maximin and minimax-regret policies. Vaccination is somewhat distant from TB testing and treatment. Nevertheless, the general methodology used in these studies—specification of a social welfare function and a set of policy options, characterization of the available knowledge, and derivation of policies that have desirable properties—should provide some general guidance for future research.
We also made the simplifying assumption that the patient’s decision to present for examination is fixed conditional on x—which may include the patient’s symptoms, the distance from the patient’s home to the clinic, and so on—and is not affected by testing and treatment policy. Introduction of a new diagnostic could in principle influence the patient’s decision as to whether to incur the time and cost of presenting for examination at a clinic. As discussed earlier, we think that the magnitude of this effect is likely to be small. In the case of a different policy such as case-finding intervention, one would have to allow for the policy’s substantially increasing the probability of patient presentation within certain patient populations.
Yet another simplification of the model is its assumption that available evidence yields known bounds on the various probabilities needed to optimize decision making. In practice, the sources of ambiguity that have been discussed above are further exacerbated by the ordinary sampling imprecision that occurs when one uses finite-sample data to draw inferences. In principle, statistical decision theory as envisioned by ref. 24 provides a coherent framework for public health planning with sampling imprecision. This theory has been applied to study minimax-regret treatment choice with sample data from randomized trials (see, e.g., refs. 25 and 26). However, application to a problem as complex as testing and treatment of TB is a topic for future research.
Ambiguity in the TB Context.
To optimize, the clinician must know a patient’s risk of illness P(z|x) before performance of a test, the risk P(z|x, r) after observation of a test result, and the probabilities f(r|x) of positive and negative test results. There are several reasons why these parameters are subject to ambiguity in the context of TB.
First, when epidemiological studies estimate prevalence, they typically report P(z|w), where w is a subset of the attributes x that a clinician observes. For example, the WHO reports prevalence by country, HIV status, age, and sex but not by factors such as socioeconomic status or other comorbidities (1). This is convenient for reporting and monitoring purposes, but it means that a clinician will typically face ambiguity over P(z|x). Second, imperfect data quality implies there is often ambiguity in estimates of P(z|w). In the absence of prevalence surveys, estimates are based on notification rates. Underreporting is typically accounted for by expert opinion, or a constant adjustment factor, rather than data or modeling of the patient presentation decision (1, 27).
Third, when trials of diagnostic tests report predictive values, the same issue emerges that they report conditional on a subset of attributes. Thus, these studies reveal f(r|w), rather than f(r|x). Moreover, these studies do not report P(z|r, x) or even P(z|r, w). Instead, they report sensitivity, P(r = p|w, z = 1), and specificity, P(r = n|w, z = 0), as discussed above. Thus, these studies do not provide the clinician with the predictive values that she needs for decision making.
Fourth, there may also be ambiguity in the welfare function. There may be incomplete knowledge of the effectiveness of antibiotic treatment in curing patients who have TB. Again, this can arise if clinical trials to determine the effectiveness of antibiotic treatments condition on w rather than x. There may also be ambiguity about the cost of different errors in treatment. Regarding type I errors, there may be uncertainty as to what will happen to patients if they are treated for TB when they in fact have a different condition. Regarding type II errors, there may be uncertainty as to whether and when a patient will present for examination again, if they are not treated after a first visit.
Illustrative Numerical Exercises.
We perform some illustrative numerical exercises to demonstrate the quantitative importance of some of these sources of ambiguity in TB diagnosis and in understanding the impact (or lack thereof) when a superior diagnostic such as Xpert is introduced. SI Appendix details these exercises in full.
Of the published efficacy and effectiveness trials for Xpert, ref. 8 provides particularly granular and extensive reporting of data. We draw on this study as an example, to highlight the ambiguity that remains even when data and results are reported in a relatively thorough and transparent manner.
First, in SI Appendix we calculate the positive and negative predictive values for Xpert and microscopy using the data in ref. 8. This exercise underlines the potentially misleading nature of reporting sensitivity and specificity alone. Xpert offers a dramatically greater sensitivity compared to microscopy—84.3% compared to 61.2% in the Cape Town clinic which we use as a case study—and particularly so for patients who are HIV-positive—78.3% compared to 41.0%, taken across all clinics since this is not reported by clinic. The gains in predictive value, while still sizeable, are of a much smaller magnitude. The largest difference across tests is in the negative predictive value, which is 96.4% for Xpert compared to 92.4% for microscopy across all patients in the Cape Town clinic, 91.9% for Xpert compared to 80.4% for microscopy for HIV-positive patients across all clinics, and 98.1% for Xpert compared to 92.6% for microscopy for HIV-negative patients across all clinics. These predictive values, not sensitivity and specificity, are the values clinicians should use when making testing and treatment decisions. These more modest differences may thus help explain the muted impact of Xpert on rates of empirical treatment, morbidity, and mortality, especially in clinics where most patients are HIV-negative.
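The relation between accuracy measures and predictive values follows from Bayes' rule, and a short sketch may make the point concrete. The two sensitivities below are the Cape Town figures quoted above; the specificity and the prevalence are hypothetical placeholders, not values taken from ref. 8, so the resulting predictive values are illustrative only and are not expected to match those reported in SI Appendix.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity, and prevalence via Bayes' rule."""
    # Probability of a positive result: true positives plus false positives.
    p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    p_negative = 1 - p_positive
    ppv = sensitivity * prevalence / p_positive          # P(z = 1 | r = p)
    npv = specificity * (1 - prevalence) / p_negative    # P(z = 0 | r = n)
    return ppv, npv


# Sensitivities quoted in the text for the Cape Town clinic.
sens_xpert, sens_microscopy = 0.843, 0.612

# Hypothetical inputs for illustration only (assumptions, not data from ref. 8).
specificity = 0.95
prevalence = 0.20

ppv_xpert, npv_xpert = predictive_values(sens_xpert, specificity, prevalence)
ppv_micro, npv_micro = predictive_values(sens_microscopy, specificity, prevalence)

sensitivity_gap = sens_xpert - sens_microscopy
npv_gap = npv_xpert - npv_micro
print(f"sensitivity gap: {sensitivity_gap:.3f}, NPV gap: {npv_gap:.3f}")
```

Under these assumed inputs the gap in negative predictive value is considerably smaller than the gap in sensitivity, because at moderate prevalence most negative results come from patients who are not ill, whichever test is used.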
Second, in SI Appendix we illustrate how substantial ambiguity can arise from a seemingly minor lack of granularity in data. A clinician making TB testing and treatment decisions should wish to know the positive and negative predictive value of test results, conditional on a patient’s HIV status, in the context of her clinic and patient population. The data in ref. 8 are reported by clinic and HIV status separately, but not by HIV status conditional on clinic. The positive and negative predictive values conditional on HIV status and clinic needed by the clinician can therefore only be bounded.
Given that the largest improvement offered by Xpert compared to microscopy appears to be in the negative predictive value for HIV-positive patients, we focus on bounding the probability that an HIV-positive patient (w = 1) at the Cape Town (x = CT) clinic has TB conditional on a negative result (r = n). We show that with weak credible assumptions, the bounds on this probability are P(z = 1|x = CT, w = 1, r = n) ∊ [0.036, 0.566] for Xpert and P(z = 1|x = CT, w = 1, r = n) ∊ [0.076, 0.566] for microscopy. Meanwhile, P(z = 1|x = CT, w = 1) ∊ [0.181, 0.566] for an HIV-positive patient at the Cape Town clinic, in the absence of a test.
A negative test result therefore substantially reduces the lower bound on the probability that an HIV-positive patient at the Cape Town clinic has TB, and this effect is larger for an Xpert test compared to a microscopy test. However, without further assumptions or more granular data the probability that such a patient has TB conditional on a negative test result still encompasses large values. A clinician may therefore reasonably treat such a patient even conditional on a negative test result, and hence reasonably perform empirical treatment in anticipation of this, that is, not order a test. Moreover, the fact that trials observe only a partial substitution away from empirical treatment when moving from microscopy to Xpert may be reasonable using the data available to clinicians.
Conclusion
The model we have presented provides an idealized yet helpful characterization of optimal clinical decision making when testing for and treating TB. The model also highlights the role of ambiguity in such decision making. Ambiguity may arise from imperfect data quality and lack of granularity in data reporting, reporting of sensitivity and specificity rather than predictive values, and incomplete knowledge of the welfare function.
The model and numerical exercises may help shed light on the apparent paradox that the recent introduction of a superior TB diagnostic—Xpert—has had little impact on morbidity and mortality. In particular, the model shows how empirical treatment (treatment without testing) may be optimal under full information and may be reasonable under ambiguity.
Under ambiguity, we showed that a reasonable policy is to diversify treatment and testing, that is, randomly assign observationally identical patients to different treatment and testing regimes, in proportions that can be calculated from available data. The piecemeal minimax-regret procedure studied in the paper offers a specific practical way to implement diversification. A public health agency may want to consider this procedure, or another procedure with desirable properties.
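To illustrate how such proportions might be computed, the sketch below works through a simplified two-action case: treat or do not treat a patient whose risk of illness is known only to lie in an interval. It is not the piecemeal minimax-regret procedure itself, which also covers the testing decision. The interval is the bound reported above for an HIV-positive patient at the Cape Town clinic after a negative Xpert result; the welfare values in `u` and the function name are hypothetical.

```python
def minimax_regret_share(p_low, p_high, u):
    """Share of patients to treat that minimizes maximum regret.

    The risk of illness is only known to lie in [p_low, p_high].
    u maps (ill, treated) to welfare; the values used below are hypothetical.
    """
    def gain(p):
        # Expected welfare of treating minus not treating; linear in p,
        # so its extremes over the interval occur at the endpoints.
        return (p * (u[(1, True)] - u[(1, False)])
                + (1 - p) * (u[(0, True)] - u[(0, False)]))

    d_low, d_high = sorted((gain(p_low), gain(p_high)))
    if d_low >= 0:
        return 1.0  # treating is at least as good at every feasible risk
    if d_high <= 0:
        return 0.0  # not treating is at least as good at every feasible risk
    # Treating a share s gives regret (1 - s) * d_high if treatment is better
    # and s * (-d_low) if it is worse; equalizing the two minimizes the maximum.
    return d_high / (d_high - d_low)


# Hypothetical welfare values: keys are (ill, treated).
u = {(1, True): 0.9, (1, False): 0.2, (0, True): 0.8, (0, False): 1.0}

# Bounds from the text: risk of TB after a negative Xpert result.
share = minimax_regret_share(0.036, 0.566, u)
print(f"share of such patients treated: {share:.3f}")
```

With these assumed welfare values neither action dominates over the interval, so the minimax-regret rule treats a fraction of the cohort strictly between 0 and 1. A narrower interval lying entirely on one side of the treatment threshold would return 0 or 1 instead.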
As well as having reasonable decision-theoretic properties, an additional benefit of diversification is that it produces learning. Diversification mimics a trial with multiple arms, one for each possible testing and treatment decision. Thus, over time it yields information on the distribution of test results and the risks of illness, with and without testing. Adaptive diversification would update the proportion of patients assigned to each treatment and testing regime as this information became available.
Implementation of diversification may pose practical and ethical challenges. As in a randomized trial, diversification generates equal treatment of patients ex ante, but not ex post. Procedures for obtaining patients’ informed consent to participate in a diversification scheme at the level of a clinic could be based on similar procedures for randomized controlled trials. If diversification were to be implemented on a larger scale, for example at the level of a region or country, the ethical considerations would be like those faced by large-scale policy experiments.
Adaptive diversification aside, the ambiguity currently faced by clinicians could be reduced by making relatively straightforward changes to the ways in which data from trials and prevalence studies are reported. Trials of diagnostic tests could report positive and negative predictive values, rather than focus on sensitivity and specificity. Studies could report more granular data; that is, data conditional on richer covariates. This would allow clinicians to condition their decision making on a richer set of the patient characteristics that they observe.
Data Availability.
The research performed in this article involved no collection of new data, nor did we perform new secondary analysis of previously collected data. The article only performs illustrative numerical exercises that directly use empirical findings reported in ref. 8, which is a published article. The details of the numerical exercises are provided in SI Appendix.
Acknowledgments
R.C.’s work on this project was performed while she was a Postdoctoral Fellow at the Institute for Fiscal Studies (IFS) and was funded by the Economic and Social Research Council Centre for the Microeconomic Analysis of Public Policy at the IFS. We thank Kalipso Chalkidou, Michael Gmeiner, Rein Houben, and seminar audiences at the Centre for Microdata Methods and Practice at University College London and the Institute for Policy Research at Northwestern University for valuable comments.
Footnotes
The authors declare no competing interest.
*The impact of introducing a new, improved TB diagnostic test on patients’ decision to present for examination is likely to be marginal. This is because the generic nature of symptoms (persistent cough and fever) means that patients will likely present for examination suspecting a range of possible illnesses. Moreover, if treatment following new diagnostic tests substitutes for empirical treatment, patients may not perceive an increase in the overall probability of receiving treatment.
†In the case of TB treatment, arguably the largest source of patient noncompliance arises from patients not completing the course of antibiotics. It is not clear what effect, if any, improved diagnostic tests would have on compliance.
‡This assumption is a simplification given that TB is an infectious disease. The assumption is least problematic when considering testing and treatment of isolated patients, most so when considering broad public-health efforts to reduce the prevalence of TB. We discuss the implications of relaxing this assumption later.
§For simplicity we consider just 2 treatments, antibiotics versus observation only, and 2 illness states, TB versus no TB. The model can be extended to include testing for and treating MDR TB alongside regular TB, as well as testing for and treating HIV alongside TB. These extensions may be accomplished with further notation.
¶In the context of TB, the raw measurements from both microscopy and Xpert are continuous. However, the standard practice in the research literature and in clinical practice has been to set a threshold and binarize the outcome. That is, one views measurements above the threshold as a positive test result and measurements below as negative.
#Nevertheless, clinicians may fail to update their behavior, at least in the short run. The extent to which clinicians’ behavior is characterized by biases is an important consideration for descriptive modeling, but it is outside the scope of our prescriptive model.
‖As outlined earlier, one might also expect a reduction in morbidity and mortality if introduction of Xpert leads to more rapid diagnosis and hence correct treatment of MDR TB. However, most of the studies cited are conducted in sites where prevalence of MDR TB is relatively low.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1912091116/-/DCSupplemental.
References
- 1. World Health Organisation, Global tuberculosis report 2018. https://www.who.int/tb/publications/global_report/en/. Accessed 21 May 2019.
- 2. World Health Organisation, The End TB strategy. (2015). https://www.who.int/tb/End_TB_brochure.pdf?ua=1. Accessed 21 May 2019.
- 3. World Health Organisation, WHO endorses new rapid tuberculosis test. (2010). https://www.who.int/mediacentre/news/releases/2010/tb_test_20101208/en/. Accessed 21 May 2019.
- 4. Boehme C. C., et al., Rapid molecular detection of tuberculosis and rifampin resistance. N. Engl. J. Med. 363, 1005–1015 (2010).
- 5. Boehme C. C., et al., Feasibility, diagnostic accuracy, and effectiveness of decentralised use of the Xpert MTB/RIF test for diagnosis of tuberculosis and multidrug resistance: A multicentre implementation study. Lancet 377, 1495–1505 (2011).
- 6. Yoon C., et al., Impact of Xpert MTB/RIF testing on tuberculosis management and outcomes in hospitalized patients in Uganda. PLoS One 7, e48599 (2012).
- 7. Hanrahan C. F., et al., Time to treatment and patient outcomes among TB suspects screened by a single point-of-care Xpert MTB/RIF at a primary care clinic in Johannesburg, South Africa. PLoS One 8, e65421 (2013).
- 8. Theron G., et al.; TB-NEAT Team, Feasibility, accuracy, and clinical effect of point-of-care Xpert MTB/RIF testing for tuberculosis in primary-care settings in Africa: A multicentre, randomised, controlled trial. Lancet 383, 424–435 (2014).
- 9. Cox H. S., et al., Impact of Xpert MTB/RIF for TB diagnosis in a primary care clinic with high TB and HIV prevalence in South Africa: A pragmatic randomised trial. PLoS Med. 11, e1001760 (2014).
- 10. Mupfumi L., et al., Impact of Xpert MTB/RIF on antiretroviral therapy-associated tuberculosis and mortality: A pragmatic randomized controlled trial. Open Forum Infect. Dis. 1, ofu038 (2014).
- 11. Calligaro G. L., et al., Burden of tuberculosis in intensive care units in Cape Town, South Africa, and assessment of the accuracy and effect on patient outcomes of the Xpert MTB/RIF test on tracheal aspirate samples for diagnosis of pulmonary tuberculosis: A prospective burden of disease study with a nested randomised controlled trial. Lancet Respir. Med. 3, 621–630 (2015).
- 12. Churchyard G. J., et al., Xpert MTB/RIF versus sputum microscopy as the initial diagnostic test for tuberculosis: A cluster-randomised trial embedded in South African roll-out of Xpert MTB/RIF. Lancet Glob. Health 3, e450–e457 (2015).
- 13. Manski C. F., The 2009 Lawrence R. Klein Lecture: Diversified treatment under ambiguity. Int. Econ. Rev. 50, 1013–1041 (2009).
- 14. Manski C. F., Diagnostic testing and treatment under ambiguity: Using decision analysis to inform clinical practice. Proc. Natl. Acad. Sci. U.S.A. 110, 2064–2069 (2013).
- 15. Manski C. F., Credible ecological inference for medical decisions with personalized risk assessment. Quant. Econom. 9, 541–569 (2018).
- 16. Van Rie A., Should countries implement Xpert MTB/RIF when empirical treatment precludes a clinical effect? Lancet Respir. Med. 3, 591–593 (2015).
- 17. Auld A. F., Fielding K. L., Gupta-Wright A., Lawn S. D., Xpert MTB/RIF–Why the lack of morbidity and mortality impact in intervention trials? Trans. R. Soc. Trop. Med. Hyg. 110, 432–444 (2016).
- 18. Ellsberg D., Risk, ambiguity, and the Savage axioms. Q. J. Econ. 75, 643–669 (1961).
- 19. Behr M. A., Edelstein P. H., Ramakrishnan L., Revisiting the timetable of tuberculosis. BMJ 362, k2738 (2018).
- 20. Altman D. G., Bland J. M., Statistics notes: Diagnostic tests 2: Predictive values. BMJ 309, 102 (1994).
- 21. Houben R. M. G. J., et al., What if they don’t have tuberculosis? The consequences and trade-offs involved in false-positive diagnoses of tuberculosis. Clin. Infect. Dis. 68, 150–156 (2019).
- 22. Manski C. F., Vaccination with partial knowledge of external effectiveness. Proc. Natl. Acad. Sci. U.S.A. 107, 3953–3960 (2010).
- 23. Manski C. F., Mandating vaccination with unknown indirect effects. J. Public Econ. Theory 19, 603–619 (2017).
- 24. Wald A., Statistical Decision Functions (Wiley, New York, 1950).
- 25. Manski C. F., Statistical treatment rules for heterogeneous populations. Econometrica 72, 1221–1246 (2004).
- 26. Kitagawa T., Tetenov A., Who should be treated? Empirical welfare maximization methods for treatment choice. Econometrica 86, 591–616 (2018).
- 27. Glaziou P., Sismanidis C., Zignol M., Floyd K., “Methods used by WHO to estimate the global burden of TB disease” (WHO, Geneva, Switzerland, 2018), https://www.who.int/tb/publications/global_report/gtbr2018_online_technical_appendix_global_disease_burden_estimation.pdf. Accessed 21 May 2019.
