Assessing the net benefit of machine learning models in the presence of resource constraints

Karandeep Singh; Nigam H Shah; Andrew J Vickers

doi:10.1093/jamia/ocad006

2023 Feb 22;30(4):668–673. doi: 10.1093/jamia/ocad006

PMCID: PMC10018264 PMID: 36810659

Abstract

Objective

The objective of this study is to provide a method to calculate model performance measures in the presence of resource constraints, with a focus on net benefit (NB).

Materials and Methods

To quantify a model’s clinical utility, the Equator Network’s TRIPOD guidelines recommend the calculation of the NB, which reflects whether the benefits conferred by intervening on true positives outweigh the harms conferred by intervening on false positives. We refer to the NB achievable in the presence of resource constraints as the realized net benefit (RNB), and provide formulae for calculating the RNB.

Results

Using 4 case studies, we demonstrate the degree to which an absolute constraint (eg, only 3 available intensive care unit [ICU] beds) diminishes the RNB of a hypothetical ICU admission model. We show how the introduction of a relative constraint (eg, surgical beds that can be converted to ICU beds for very high-risk patients) allows us to recoup some of the RNB but with a higher penalty for false positives.

Discussion

RNB can be calculated in silico before the model’s output is used to guide care. Accounting for the constraint changes the optimal strategy for ICU bed allocation.

Conclusions

This study provides a method to account for resource constraints when planning model-based interventions, either to avoid implementations where constraints are expected to play a larger role or to design more creative solutions (eg, converted ICU beds) to overcome absolute constraints when possible.

Keywords: machine learning, net benefit, resource constraints

BACKGROUND AND SIGNIFICANCE

Improved access to large healthcare datasets paired with technical advances in algorithms has opened a Pandora’s box of potential use cases for machine learning (ML) in health care. While there have been reports of improved clinical outcomes using model-based interventions, there are many counterexamples where implementing ML models has not translated into clinical gains,¹⁻³ a phenomenon that has been referred to as the “artificial intelligence (AI) chasm.”⁴ Model implementation failures can be attributed to poor-quality models⁵,⁶ or ineffective interventions,⁷ but many models fail because of real-world constraints that limit the delivery of the intervention despite a useful model.

Determining whether a model is useful is challenging because the harms caused by false positives and false negatives are often unequal. For example, in cancer screening with mammography, a false positive may result in an unnecessary biopsy and anxiety about a potential cancer diagnosis, while a false negative may lead to cancer progression from a missed diagnosis. While neither is desirable, cancer progression is clearly more harmful.⁸ Quantifying this tradeoff can help determine whether the model is useful. Decision theory has a rich history of directly linking decisions to their risks and benefits through the lens of probability and utility. Recent decision theory work has grappled with the issue of eliciting utilities, which are critical to determining thresholds for actions. In health care, utilities are commonly elicited using the “standard gamble,” in which people are presented with 2 alternative health outcomes and asked to select the set of probabilities for each outcome (p and 1 − p, respectively) at which they would be indifferent to choosing between the 2 (see Supplementary Notes on Decision Theory and the Selection of a Threshold Probability for additional details).⁹

Net benefit (NB) is a quantity that measures whether use of a model to help decision-making results in more good than harm and thus may be useful.¹⁰ While underutilized in the ML literature, it is extremely widely used in biostatistical practice and has been recommended by the EQUATOR Network’s TRIPOD reporting guidelines¹¹ and by the British Medical Journal,¹² the Journal of the American Medical Association,¹³ Annals of Internal Medicine,¹⁴ and the Journal of Clinical Oncology.¹⁵ It has also been specifically recommended for the evaluation of ML models.¹⁶,¹⁷ When a threshold probability (Pt) is used to determine whether a patient is a candidate for an intervention (ie, intervene only on patients with predicted risk ≥ Pt), then the NB calculates whether the benefits conferred by intervening on true positives outweigh the harms conferred by intervening on false positives (Figure 1). Using decision theory, the optimal threshold Pt can be derived from the cost/benefit ratio for intervening on false positives (ie, the optimal Pt is 0.20 when the cost/benefit ratio is 0.2/[1 − 0.2], or 0.25).¹⁰ Different cost/benefit ratios can thus result in different values of NB even when the underlying model is the same. For example, the consequences of wrongly conducting a surgery on a patient are much more severe than wrongly starting a medication that has a low risk of toxicity.

Figure 1. Formula for calculating net benefit (NB) based on the number of true positives (TP), false positives (FP), sample size (N), and the threshold probability (Pt). Source: Karandeep Singh.
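As a minimal sketch of the formula that Figure 1 describes, the Python functions below compute NB from the four quantities named in the caption. The function and argument names are illustrative assumptions and are not taken from the modelrecon R package.

```python
def threshold_odds(pt: float) -> float:
    """Convert a threshold probability Pt into the odds Pt / (1 - Pt)."""
    if not 0.0 <= pt < 1.0:
        raise ValueError("pt must satisfy 0 <= pt < 1")
    return pt / (1.0 - pt)


def net_benefit(tp: int, fp: int, n: int, pt: float) -> float:
    """Net benefit: NB = TP/N - (FP/N) * Pt / (1 - Pt)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return (tp - fp * threshold_odds(pt)) / n


# A threshold of 0.20 weights each false positive at 0.2 / 0.8 of a true positive.
print(threshold_odds(0.20))
```

Dividing by N at the end mirrors the two-step calculation (NB × N first, then NB) used in the case studies.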

Models with a positive NB can also be ineffective when the intervention cannot be delivered due to resource constraints, even if the model is equally accurate among high- and low-resource settings.¹⁸ An obvious example would be if a model recommended that a patient be transported to the intensive care unit (ICU) but this did not happen because there was no ICU bed available.

The manner in which ML models are currently evaluated in the literature ignores this resource constraint issue altogether. We demonstrate the importance of this issue using an ICU referral algorithm as a case study and propose a method of quantifying NB in the presence of resource constraints, a quantity we refer to as the realized net benefit (RNB). Recognizing that resource constraints may not always be absolute, we also extend this calculation to situations where there are relative constraints, for example, when a non-ICU bed could be converted into an ICU bed, but this would be costly. We share code to calculate NB and RNB in the modelrecon R package.¹⁹

CASE STUDY 1: A MODEL TO PREDICT NEED FOR INTENSIVE CARE

Consider the problem of whether patients being admitted from the emergency department with respiratory infections should receive ICU-level care. Imagine that, at a given hospital, the current process involves multiple layers of clinical review and is therefore slow and burdensome, which has direct implications for clinical outcomes. To streamline the admissions process and provide more timely care for patients in need of the ICU, an ML model is developed. The model is trained using retrospective data and validated prospectively. The outcome is considered “positive” if a patient with a respiratory infection was directly admitted from the emergency department to the ICU, and otherwise “negative.” The threshold probability (Pt) is selected through a stakeholder engagement process that ultimately requires hospital leaders to answer the following question: “If we had a very large ICU, how many patients with respiratory infections meeting ICU-transfer criteria would the hospital be willing to admit directly to the ICU for every 1 patient who actually requires ICU-level care?” After weighing the benefits of timely ICU care for patients who need it against the harms of incorrectly transferring patients to the ICU, the committee decides that they would be willing to admit 5 patients with respiratory infections for every 1 patient who truly needs ICU-level care. Based on this, they select a threshold probability of 1/5, or 0.20. They rationalize that patients wrongly admitted to the ICU could always be transferred out of the ICU, whereas the reverse could cause harm by delaying ICU care.
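The step from the committee's answer to the threshold can be written out as a short worked derivation. It assumes that the 5 admitted patients consist of 1 patient who truly needs ICU-level care and 4 who do not, so that the harm of one false positive is valued at one quarter of the benefit of one true positive:

```latex
\begin{aligned}
\frac{P_t}{1 - P_t} &= \frac{\text{harm of one false positive}}{\text{benefit of one true positive}} = \frac{1}{4} = 0.25 \\
P_t &= \frac{0.25}{1 + 0.25} = \frac{1}{5} = 0.20
\end{aligned}
```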

Recognizing the challenges to implementing an ML model in a hospital, alternative ICU-transfer criteria are proposed as a back-up plan. Specifically, any patients ≥75 years old or requiring at least 4 L per minute of oxygen will be directly admitted to the ICU. Developed based on clinical knowledge, these criteria are easily acted upon and do not require extensive implementation within the EHR. If the model is successfully implemented, it would need to be at least as useful as these ICU-transfer criteria.

A representative example of 10 prospective patients with their predicted probabilities and actual outcomes is shown in Table 1, along with whether or not they meet the proposed ICU-transfer criteria.

Table 1.

Case study showing 10 representative patients with respiratory infections from prospective validation, sorted by the model’s predicted risk from high to low

Patient #	Actually required ICU	Model: predicted risk (sorted from high to low)	Model: predicted to require ICU (based on Pt = 0.2)	Proposed ICU-transfer criteria: meets criteria (age ≥75 or ≥4 L per minute of oxygen per nasal cannula)
1	Yes	0.8	Yes	Yes
2	Yes	0.7	Yes	Yes
3	No	0.6	Yes	Yes
4	Yes	0.5	Yes	Yes
5	Yes	0.3	Yes	Yes
6	No	0.2	Yes	No
7	No	0.1	No	Yes
8	No	0.1	No	No
9	Yes	0.05	No	Yes
10	No	0.01	No	Yes

Abbreviation: ICU: intensive care unit.

Calculating the NB of the model

Using this case study, the NB of the model can be calculated using the formula from Figure 1.¹² By multiplying both sides of the formula by N (the sample size, which in our example is 10), we will first calculate NB × N, and then will divide this quantity by N to derive NB.

\begin{aligned} \mathrm{NB} \times N &= \mathrm{TP} - \mathrm{FP} \times P_t / (1 - P_t) = 4 - 2 \times (0.2 / 0.8) = 4 - 0.5 = 3.5 \\ \text{Then, } \mathrm{NB} &= (\mathrm{NB} \times N) / N = 3.5 / 10 = 0.35 \end{aligned}
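The same calculation can be reproduced from the Table 1 values. The following is a small sketch in Python; the variable names are illustrative, and the list is transcribed by hand from Table 1.

```python
# (actually_required_icu, predicted_risk) for Patients 1-10, transcribed from Table 1.
patients = [
    (True, 0.8), (True, 0.7), (False, 0.6), (True, 0.5), (True, 0.3),
    (False, 0.2), (False, 0.1), (False, 0.1), (True, 0.05), (False, 0.01),
]
pt = 0.2  # threshold probability chosen by the committee
n = len(patients)

# A prediction is positive when the predicted risk is at or above the threshold.
tp = sum(1 for required, risk in patients if risk >= pt and required)
fp = sum(1 for required, risk in patients if risk >= pt and not required)

nb_times_n = tp - fp * pt / (1 - pt)
nb = nb_times_n / n
print(tp, fp, round(nb_times_n, 3), round(nb, 3))
```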

To put this number in perspective, we can compare this against a hypothetical perfect model (for didactic purposes) and against the alternative strategy of a simple clinical algorithm. The maximal NB achievable by a model is the proportion of patients who actually require ICU-level care, which in this example is 0.50. This is because a perfect model would identify all of the true positives and no false positives. In the absence of false positives, NB × N equals the number of true positives, and dividing this by N to solve for NB equals the proportion of patients who experienced the outcome (ie, TP/N). Unsurprisingly, the NB achieved by the model (0.35) is lower than that of a perfect model (0.50). The fact that the NB is positive means that using the model to make determinations about ICU-level care would result in more good than harm when considering our specified cost/benefit ratio. However, to understand whether this model is worth implementing, something that would take time, effort, and money, we also need to compare this NB against that of a simplified clinical algorithm.
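As a quick check of the upper bound described above, the sketch below scores a hypothetical perfect model on the Table 1 outcomes. The variable names are illustrative.

```python
# Observed outcomes for Patients 1-10 (True = actually required ICU), from Table 1.
required_icu = [True, True, False, True, True, False, False, False, True, False]
pt = 0.2
n = len(required_icu)

# A perfect model flags exactly the patients who required the ICU,
# so every positive prediction is a true positive and there are no false positives.
tp = sum(required_icu)
fp = 0

max_nb = (tp - fp * pt / (1 - pt)) / n
prevalence = sum(required_icu) / n
print(max_nb, prevalence)
```

Because the false-positive term vanishes, the result does not depend on the threshold probability.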

Calculating the NB of the proposed ICU-transfer criteria

While the hospital leadership recognizes that not all patients ≥75 years old or requiring at least 4 L per minute of oxygen will require ICU-level care, this strategy is clinically interpretable and easy to implement. We can calculate the NB of this alternative strategy using the formula from Figure 1. As in the prior example, we will first calculate NB × N and then will divide this quantity by N to derive NB. For the proposed ICU-transfer criteria strategy, we will assume that all patients actually requiring the ICU (in Table 1) met the proposed ICU-transfer criteria.

Table 2.

Case study showing 10 representative patients with respiratory infections from prospective validation, with an absolute constraint of 3 ICU beds

Patient #	Actually required ICU	Predicted risk (sorted from high to low)	Predicted to require ICU (based on Pt = 0.2 and capacity = 3)
1	Yes	0.8	Yes
2	Yes	0.7	Yes
3	No	0.6	Yes
4	Yes	0.5	No
5	Yes	0.3	No
6	No	0.2	No
7	No	0.1	No
8	No	0.1	No
9	Yes	0.05	No
10	No	0.01	No

Note: Patients 4–6 were “Yes” in Table 1 but are now “No” in Table 2 because of the resource constraint.

Abbreviation: ICU: intensive care unit.

\begin{aligned} \mathrm{NB} \times N &= \mathrm{TP} - \mathrm{FP} \times P_t / (1 - P_t) = 5 - 3 \times (0.2 / 0.8) = 5 - 0.75 = 4.25 \\ \text{Then, } \mathrm{NB} &= (\mathrm{NB} \times N) / N = 4.25 / 10 = 0.425 \end{aligned}

Hence, the proposed simplified ICU-transfer criteria strategy achieves a higher NB than the model (0.425 vs 0.35) and is the superior approach.
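The comparison can be checked against the last column of Table 1. This sketch treats "meets the ICU-transfer criteria" as the positive prediction; the variable names are illustrative, and the lists are transcribed by hand from Table 1.

```python
# Columns transcribed from Table 1 for Patients 1-10.
required_icu   = [True, True, False, True, True, False, False, False, True, False]
meets_criteria = [True, True, True, True, True, False, True, False, True, True]
pt = 0.2
n = len(required_icu)

# Patients who meet the criteria are the "positives" of this strategy.
tp = sum(1 for y, m in zip(required_icu, meets_criteria) if m and y)
fp = sum(1 for y, m in zip(required_icu, meets_criteria) if m and not y)

nb_criteria = (tp - fp * pt / (1 - pt)) / n
print(tp, fp, round(nb_criteria, 3))
```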

CASE STUDY 2: AN ABSOLUTE CONSTRAINT, WHERE ONLY 3 ICU BEDS ARE AVAILABLE

While the decision to set the threshold probability of 0.2 was based on weighing the benefits of intervening on true positives against the harms of intervening on false positives, what was not considered was that the ICU has limited capacity. In the current case involving 10 patients in the emergency department, only 3 ICU beds are available. Thus, even though 6 patients had a predicted risk ≥0.20, only 3 would be able to receive the intervention. This situation represents an absolute constraint because it cannot be overcome. Assuming that all 10 patients’ scores were available to us, we would naturally reserve the ICU beds for the 3 patients with the highest predicted risk scores. Given this absolute resource constraint, we are no longer able to intervene on Patients 4–6 even though their predicted risk is ≥0.2 (Table 2).

Calculating the RNB of the model in the presence of an absolute resource constraint

We refer to the NB that is achievable in the presence of resource constraints as the RNB, where “realized” denotes the fact that a resource constraint may prevent the entire net benefit from being achieved. Similar to our calculation of NB, we will first calculate RNB × N and will derive RNB by dividing this quantity by N. The algorithm to calculate RNB is depicted in Table 3. Because some patients’ predictions will be reclassified from positive to negative in the presence of the resource constraint, we will use the notation TPc and FPc to refer to the constrained true and false positives and distinguish them from the true and false positives that occur when there is no constraint.

Table 3.

Algorithm to calculate realized net benefit with an absolute resource constraint

1. Sort all predictions by predicted risk (highest to lowest)

2. Count predictions as positive if they are ≥ Pt up to maximum capacity, with all subsequent predictions considered negative

3. Count true positives (TPc) and false positives (FPc) using the predictions from 2

Assuming Pt = 0.2 and the capacity = 3 (as depicted in Table 2):

\begin{aligned} \mathrm{RNB} \times N &= \mathrm{TP}_c - \mathrm{FP}_c \times P_t / (1 - P_t) = 2 - 1 \times (0.2 / 0.8) = 2 - 0.25 = 1.75 \\ \text{Then, } \mathrm{RNB} &= 1.75 / 10 = 0.175 \end{aligned}
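The three steps in Table 3 can be sketched in Python as follows. This is an illustrative implementation under stated assumptions, not the modelrecon code: the function name and signature are hypothetical, `capacity=None` is used here to mean unlimited capacity, and ties in predicted risk are broken by the original patient order.

```python
def realized_net_benefit(required_icu, risks, pt, capacity=None):
    """RNB under an absolute capacity constraint (capacity=None means no limit)."""
    n = len(required_icu)
    # Step 1: sort predictions by predicted risk, highest first (stable for ties).
    ranked = sorted(zip(risks, required_icu), key=lambda pair: pair[0], reverse=True)
    # Step 2: predictions >= Pt are positive only until capacity is exhausted.
    eligible = [outcome for risk, outcome in ranked if risk >= pt]
    treated = eligible[:capacity]
    # Step 3: count constrained true positives (TPc) and false positives (FPc).
    tp_c = sum(treated)
    fp_c = len(treated) - tp_c
    return (tp_c - fp_c * pt / (1 - pt)) / n


# Values transcribed from Table 2.
required_icu = [True, True, False, True, True, False, False, False, True, False]
risks = [0.8, 0.7, 0.6, 0.5, 0.3, 0.2, 0.1, 0.1, 0.05, 0.01]

print(round(realized_net_benefit(required_icu, risks, pt=0.2, capacity=3), 3))
print(round(realized_net_benefit(required_icu, risks, pt=0.2), 3))
```

With unlimited capacity the function reduces to the ordinary NB, which is a convenient sanity check.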

While the RNB remains positive, it is half of the (unconstrained) NB of 0.35, illustrating the degree to which a resource constraint can degrade a model’s real-world utility. To determine whether we should implement the model, we also need to compare the RNB for the model against the proposed alternative strategy.

Calculating the RNB of the proposed ICU-transfer criteria in the presence of an absolute resource constraint

Eight of the 10 patients with respiratory infections meet the hospital’s ICU-transfer criteria but there are only 3 available ICU beds. Because the criteria are binary (either patients meet them or they do not), there is no way to prioritize which 3 patients (of the 8 who qualify) should be transferred to the ICU. We can assume that selection among these 8 patients would not be informative as to risk (for instance, the first to present would be transferred). To calculate the RNB of the ICU-transfer criteria strategy, we can multiply the NB without the resource constraint (0.425) by the fraction of patients who can be transferred with the constraint (3/8, or 0.375), which results in an RNB of 0.159.
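One way to see why the multiplication works is to average the NB over every possible choice of 3 patients among the 8 who qualify. The sketch below does this by brute force. It assumes that "not informative as to risk" means each set of 3 qualifying patients is equally likely to be transferred; the variable names are illustrative.

```python
from itertools import combinations
from statistics import mean

# Columns transcribed from Table 1 for Patients 1-10.
required_icu   = [True, True, False, True, True, False, False, False, True, False]
meets_criteria = [True, True, True, True, True, False, True, False, True, True]
pt, capacity = 0.2, 3
n = len(required_icu)
odds = pt / (1 - pt)

# Outcomes of the 8 patients who meet the ICU-transfer criteria.
eligible = [y for y, m in zip(required_icu, meets_criteria) if m]


def nb_of(selection):
    """NB (per N patients) when only the selected patients are transferred."""
    tp_c = sum(selection)
    fp_c = len(selection) - tp_c
    return (tp_c - fp_c * odds) / n


# Shortcut from the text: unconstrained NB times the fraction who can be transferred.
rnb_shortcut = nb_of(eligible) * capacity / len(eligible)

# Brute force: average NB over all equally likely sets of 3 qualifying patients.
rnb_expected = mean(nb_of(s) for s in combinations(eligible, capacity))

print(round(rnb_shortcut, 3), round(rnb_expected, 3))
```

The two numbers agree because NB is linear in the counts of true and false positives, so the expected counts scale by the same fraction.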

While the proposed ICU-transfer criteria appeared superior to the model-based strategy in the absence of resource constraints, we find that the model-based strategy achieves a higher RNB (0.175 vs 0.159) when we account for the 3 ICU beds available for every 10 patients. While we would not make a decision to implement a model based on only 10 patients, this case illustrates how accounting for a resource constraint can change the optimal allocation strategy.

CASE STUDY 3: A RELATIVE CONSTRAINT, WHERE OTHER BEDS CAN BE CONVERTED TO ICU BEDS IF NEEDED

The hospital recently completed building a new surgical patient tower. While the new tower does not include any dedicated ICU beds, the beds in this tower can be converted to ICU beds, which is a feature of many modern hospitals. Although the nurses and physicians in this new tower are trained to handle ICU-level care, intensive care patients require nurses to care for fewer patients, which can limit the number of patients who can be admitted to the new tower. To account for this burden of converting non-ICU beds into ICU beds, the hospital leadership decides that these beds will only be converted if a patient’s risk of requiring ICU-level care is very high (eg, ≥50%) and only after the predesignated ICU beds are filled. For the sake of demonstration, we will assume that there is no limit to the number of beds that can be converted to ICU beds (ie, the constraint is relative), but this scenario could also be handled with an absolute constraint. With the absolute constraint partially lifted by the relative constraint, we are now able to transfer Patient 4 to a converted ICU bed because their risk is ≥0.5 (Table 4).

Table 4.

Case study showing 10 representative patients with respiratory infections from prospective validation, with both an absolute and a relative resource constraint

Patient #	Actually required ICU	Predicted risk (sorted from high to low)	Predicted to require ICU: absolute constraint (Pt1 = 0.2, capacity1 = 3)	Predicted to require ICU: relative constraint (Pt2 = 0.5, capacity2 = infinity)
1	Yes	0.8	Yes	–
2	Yes	0.7	Yes	–
3	No	0.6	Yes	–
4	Yes	0.5	–	Yes
5	Yes	0.3	–	No
6	No	0.2	–	No
7	No	0.1	–	No
8	No	0.1	–	No
9	Yes	0.05	–	No
10	No	0.01	–	No

Note: Patient 4 was “No” in Table 2 but is now “Yes” in Table 4 because of the addition of a relative constraint.

Abbreviation: ICU: intensive care unit.

With a relative resource constraint, we cannot directly apply the proposed ICU-transfer criteria because they do not produce a probability. While a more stringent set of criteria could be used to capture a higher risk group (eg, age ≥85 years or requiring ≥6 L per minute of oxygen) for which the absolute constraint could be lifted, we will not calculate the RNB for this alternative strategy in this article.

Calculating the RNB in the presence of a relative resource constraint

In this case study, the absolute constraint is partially lifted through the addition of a relative constraint. The absolute constraint is that the 3 patients with the highest predicted risk ≥0.2 will be directly admitted to the ICU. The relative constraint is that after these 3 patients, any remaining patients with a predicted risk ≥0.5 will be transported to newly converted ICU beds in the surgical tower. The algorithm to calculate RNB with a relative constraint is depicted in Supplementary Table S1. Because there are 2 separate cost/benefit ratios at play depending on whether the initial constraint has been met (ie, Pt1 of 0.2 and Pt2 of 0.5), separate RNB values need to be calculated for each set of constraints. While only 2 constraints are shown, the algorithm in Supplementary Table S1 could be expanded to any number of relative constraints.

The number of patients who are evaluated up until the absolute constraint is met will be considered N1, and the remaining patients will be considered N2, where N1 + N2 = N and N is the sample size (10 in this case).

Assuming Pt1 = 0.2, capacity1 = 3, Pt2 = 0.5, and capacity2 = infinity (ie, there is no upper limit):

\begin{aligned} \mathrm{RNB}_1 \times N_1 &= \mathrm{TP}_{c1} - \mathrm{FP}_{c1} \times P_{t1} / (1 - P_{t1}) = 2 - 1 \times (0.2 / 0.8) = 2 - 0.25 = 1.75 \\ \mathrm{RNB}_2 \times N_2 &= \mathrm{TP}_{c2} - \mathrm{FP}_{c2} \times P_{t2} / (1 - P_{t2}) = 1 - 0 \times (0.5 / 0.5) = 1 - 0 = 1 \end{aligned}

\begin{aligned} \text{Because } \mathrm{RNB} \times N &= \mathrm{RNB}_1 \times N_1 + \mathrm{RNB}_2 \times N_2, \\ \mathrm{RNB} \times N &= 1.75 + 1 = 2.75 \\ \text{Then, } \mathrm{RNB} &= 2.75 / 10 = 0.275 \end{aligned}
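The two-tier calculation above can be sketched in Python. This is inferred from the description in the text rather than copied from Supplementary Table S1 or from modelrecon, so the function name, its signature, and the assumption of unlimited second-tier capacity are all illustrative.

```python
def rnb_two_tier(required_icu, risks, pt1, capacity1, pt2):
    """RNB with an absolute constraint (pt1, capacity1) followed by an
    unlimited relative constraint at a stricter threshold pt2."""
    if pt2 < pt1:
        raise ValueError("the relative threshold pt2 should not be below pt1")
    n = len(required_icu)
    ranked = sorted(zip(risks, required_icu), key=lambda pair: pair[0], reverse=True)

    # Tier 1: highest-risk patients with risk >= pt1, up to capacity1 dedicated beds.
    tier1 = [outcome for risk, outcome in ranked if risk >= pt1][:capacity1]
    # Tier 2: among everyone not placed in tier 1, convert a bed when risk >= pt2.
    remaining = ranked[len(tier1):]
    tier2 = [outcome for risk, outcome in remaining if risk >= pt2]

    def nb_times_n(selected, pt):
        tp = sum(selected)
        fp = len(selected) - tp
        return tp - fp * pt / (1 - pt)

    # Each tier is scored with its own cost/benefit ratio, then the sums are pooled.
    return (nb_times_n(tier1, pt1) + nb_times_n(tier2, pt2)) / n


# Values transcribed from Table 4.
required_icu = [True, True, False, True, True, False, False, False, True, False]
risks = [0.8, 0.7, 0.6, 0.5, 0.3, 0.2, 0.1, 0.1, 0.05, 0.01]

print(round(rnb_two_tier(required_icu, risks, pt1=0.2, capacity1=3, pt2=0.5), 3))
```

Scoring each tier with its own threshold is what gives second-tier false positives a higher penalty (odds of 1 at Pt2 = 0.5 vs 0.25 at Pt1 = 0.2).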

CASE STUDY 4: USING DECISION CURVES TO VISUALIZE RNB OVER A RANGE OF THRESHOLD PROBABILITIES

In the first case study, hospital leaders agreed to use a threshold probability of 0.20 to transfer patients to the ICU. In reality, there will not always be agreement on what is an acceptable threshold probability for a given clinical decision. When there is a lack of consensus regarding appropriate threshold probabilities, the NB can be calculated over a range of threshold probabilities to examine the extent to which the threshold probability affects NB. The range chosen is not from 1 to 99%, but is based on what are considered reasonable differences of opinion and preference for the decision under consideration. The plot of threshold probability against NB is referred to as a decision curve and forms the basis of decision curve analysis. In a decision curve analysis, the NB of 3 strategies is typically compared visually across a range of threshold probabilities: a model-based strategy, a treat-none strategy (eg, transfer no patients to the ICU), and a treat-all strategy (eg, transfer all patients to the ICU). In this case study, we will focus only on the model-based strategy.

Drawing from the predictions and outcomes in Table 1, the results from the first 3 case studies can be visualized using a decision curve. Figure 2 demonstrates the decision curves generated by varying the threshold probabilities (solid line) across different capacities (in the different panels) and shows the result of adding a relative constraint at a threshold of 0.50 (dashed line) to relax the absolute constraint. The dashed line is only shown between thresholds of 0 to 0.5 because it would not make sense to have an absolute constraint that has a more strict criterion (ie, requires a higher threshold) than a relative constraint.

When the capacity is infinite, as the solid line depicts in the bottom-right panel of Figure 2, the decision curve for the model is identical to that without resource constraints (as in the first case study). On the other extreme, when there is zero capacity to act on the model’s predictions, as in the solid line in the top-left panel, the decision curve for the model is identical to that of a treat-none strategy where no patient is transferred to the ICU. As the capacity gradually increases across the panels, the NB depicted across the curves generally increases. The addition of a relative constraint to relax the absolute constraint (dashed line) also generally increases the RNB as it did in our third case study. The points depicted on the plot correspond to results from the first 3 case studies.
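The values behind such curves can be tabulated without plotting. The sketch below computes the RNB of the model over a few thresholds and capacities using the Table 1 data. The threshold grid and the capacities are arbitrary choices for illustration and are not the panels of Figure 2; `None` stands for infinite capacity.

```python
def realized_net_benefit(required_icu, risks, pt, capacity=None):
    """RNB under an absolute capacity constraint (capacity=None means no limit)."""
    ranked = sorted(zip(risks, required_icu), key=lambda pair: pair[0], reverse=True)
    treated = [outcome for risk, outcome in ranked if risk >= pt][:capacity]
    tp_c = sum(treated)
    fp_c = len(treated) - tp_c
    return (tp_c - fp_c * pt / (1 - pt)) / len(required_icu)


# Values transcribed from Table 1.
required_icu = [True, True, False, True, True, False, False, False, True, False]
risks = [0.8, 0.7, 0.6, 0.5, 0.3, 0.2, 0.1, 0.1, 0.05, 0.01]

thresholds = [0.05, 0.1, 0.2, 0.3, 0.5]   # illustrative grid
capacities = [0, 1, 3, None]              # illustrative capacities

curves = {
    cap: [realized_net_benefit(required_icu, risks, pt, cap) for pt in thresholds]
    for cap in capacities
}
for cap, values in curves.items():
    label = "infinite" if cap is None else cap
    print(label, [round(v, 3) for v in values])
```

Zero capacity reproduces the treat-none strategy (RNB of 0 at every threshold), and infinite capacity reproduces the unconstrained NB.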

Note that the figure has some artifacts associated with using a small data set for didactic purposes. For instance, there is an increase in NB between thresholds of 0.6 and 0.7 because of the removal of a false positive (Patient 3 from Table 1). In a large sample, NB will be monotonically nonincreasing as the threshold probability increases.
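The small-sample artifact can be reproduced directly from Table 1. This sketch evaluates the unconstrained NB at a few thresholds; the helper name is illustrative.

```python
# Values transcribed from Table 1.
required_icu = [True, True, False, True, True, False, False, False, True, False]
risks = [0.8, 0.7, 0.6, 0.5, 0.3, 0.2, 0.1, 0.1, 0.05, 0.01]


def net_benefit_at(pt):
    """Unconstrained NB of the model at threshold probability pt."""
    tp = sum(1 for y, r in zip(required_icu, risks) if r >= pt and y)
    fp = sum(1 for y, r in zip(required_icu, risks) if r >= pt and not y)
    return (tp - fp * pt / (1 - pt)) / len(required_icu)


# At 0.6, Patient 3 (a false positive with risk 0.6) is still counted as positive;
# at 0.7 that patient drops out, so NB rises even though the threshold is higher.
for pt in (0.5, 0.6, 0.7, 0.8):
    print(pt, round(net_benefit_at(pt), 3))
```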

DISCUSSION

In this article, we demonstrate how ML models can lose much of their NB when implemented in a setting with resource constraints. In our case studies, we found that our ML model’s NB was halved (from 0.35 to 0.175) given the limited number of available ICU beds. Accounting for the resource constraint also changed the optimal allocation strategy. Without the constraint, using the ICU-transfer criteria achieved the highest NB (0.425 vs 0.35). With the constraint, the model-based strategy achieved the highest RNB (0.175 vs 0.159). The RNB was higher when allowing for a relative constraint (0.275), which relaxed the absolute constraint. This relative constraint has a higher penalty for false positives due to the higher burden of converting surgical beds into ICU beds. We provide a method to determine the extent to which a series of constraints affects a model’s NB. This quantity, the RNB, can be calculated in silico before the model’s output is used to guide care. This means that it can be used for planning purposes, either to avoid implementations where constraints are expected to play a larger role or to design more creative solutions (eg, converted ICU beds) to overcome absolute constraints when possible.

In the absence of the RNB, the current approach for handling too many positive predictions (ie, more than a system is equipped to handle) is to raise the threshold probability. While this superficially achieves a similar goal, doing so creates 3 problems. First, it is not clear how high to raise the threshold to avoid the resource constraint, which is even more complex in the presence of multiple resource constraints. Second, the resource constraint may change from day to day, meaning that a relatively high-risk patient might not receive appropriate care on a day with fewer available resources. Third, the optimal threshold probability cannot be selected from the data and should instead reflect the clinical context.²⁰ By delinking the selection of a threshold probability from the impact of resource constraints, the RNB allows us to understand the degree to which the presence of a resource constraint will affect the utility of a model for a given clinical context.

Our approach does have some caveats that need to be considered. Just as the intent of calculating the NB of a model is to avoid implementing models that may do more harm than good, the intent of calculating an RNB is to avoid implementing models where superior strategies exist. From a systems perspective, the gap between the NB and the RNB can be thought of as an indicator for where additional resources may be necessary to achieve benefit from model-guided care. Recent advances in digital interventions such as telehealth, virtual care, and care at home may provide a path toward closing this gap.

Resource constraints also have other implications that we did not address in this article but require further study (see Supplementary Notes on Future Directions).

CONCLUSION

Many of the decisions guided by the over 250 000 existing risk-stratification scores²¹ may be affected by resource constraints. We need to know which model predictions are worth acting upon when taking into account real-world model performance, clinical context, and resource constraints. The promise of personalized medicine hinges on accurate risk stratification to help physicians and patients make the best possible decisions.²²^,²³ An honest evaluation of the usefulness of model-guided decisions is a necessary step to realize that promise.

Contributor Information

Karandeep Singh, Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, Michigan, USA; Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, Michigan, USA; Department of Urology, University of Michigan Medical School, Ann Arbor, Michigan, USA; School of Information, University of Michigan, Ann Arbor, Michigan, USA.

Nigam H Shah, Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, California, USA.

Andrew J Vickers, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, New York, USA.

SUPPLEMENTARY MATERIAL

Supplementary material is available at Journal of the American Medical Informatics Association online.

FUNDING

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

AUTHOR CONTRIBUTIONS

KS conceived this study, analyzed the data, drafted the manuscript, gave final approval to the submitted version, and agrees to be accountable for all aspects of the study. NHS conceived this study, made substantive revisions to the manuscript, gave final approval to the submitted version, and agrees to be accountable for all aspects of the study. AJV conceived this study, made substantive revisions to the manuscript, gave final approval to the submitted version, and agrees to be accountable for all aspects of the study.

CONFLICT OF INTEREST STATEMENT

KS serves on a scientific advisory board for Flatiron Health. KS’s institution receives funding from the National Institute of Diabetes and Digestive and Kidney Diseases, Blue Cross Blue Shield of Michigan, and Teva Pharmaceuticals for unrelated work. The remaining authors have no competing interests to declare.

DATA AVAILABILITY

The predictions and outcomes from this example dataset can also be found in the modelrecon R package.¹⁹

REFERENCES

1. Obermeyer Z, Weinstein JN.. Adoption of artificial intelligence and machine learning is increasing, but irrational exuberance remains. NEJM Catal 1. doi: 10.1056/CAT.19.1090. [DOI] [Google Scholar]
2. Saria S, Butte A, Sheikh A.. Better medicine through machine learning: what’s real, and what's artificial? PLoS Med 2018; 15 (12): e1002721. [DOI] [PMC free article] [PubMed] [Google Scholar]
3. Emanuel EJ, Wachter RM.. Artificial intelligence in health care: will the value match the hype? JAMA 2019; 321 (23): 2281–2282. [DOI] [PubMed] [Google Scholar]
4. Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med 2019; 25 (1): 44–56. [DOI] [PubMed] [Google Scholar]
5. Gulati G, Upshaw J, Wessler BS, et al. Generalizability of cardiovascular disease clinical prediction models: 158 independent external validations of 104 unique models. Circ Cardiovasc Qual Outcomes 2022; 15 (4): e008487.
6. Wong A, Otles E, Donnelly JP, et al. External validation of a widely implemented proprietary sepsis prediction model in hospitalized patients. JAMA Intern Med 2021; 181 (8): 1065–1070.
7. Post B, Lapedis J, Singh K, et al. Predictive model-driven hotspotting to decrease emergency department visits: a randomized controlled trial. J Gen Intern Med 2021; 36 (9): 2563–2570.
8. Sicsic J, Pelletier-Fleury N, Moumjid N. Women’s benefits and harms trade-offs in breast cancer screening: results from a discrete-choice experiment. Value Health 2018; 21 (1): 78–88.
9. Gafni A. The standard gamble method: what is being measured and how it is interpreted. Health Serv Res 1994; 29 (2): 207–224.
10. Vickers AJ, Elkin EB. Decision curve analysis: a novel method for evaluating prediction models. Med Decis Making 2006; 26 (6): 565–574.
11. Collins GS, Reitsma JB, Altman DG, et al. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BMJ 2015; 350: g7594.
12. Vickers AJ, Van Calster B, Steyerberg EW. Net benefit approaches to the evaluation of prediction models, molecular markers, and diagnostic tests. BMJ 2016; 352: i6.
13. Fitzgerald M, Saville BR, Lewis RJ. Decision curve analysis. JAMA 2015; 313 (4): 409–410.
14. Localio AR, Goodman S. Beyond the usual prediction accuracy metrics: reporting results for clinical decision making. Ann Intern Med 2012; 157 (4): 294–295.
15. Kerr KF, Brown MD, Zhu K, et al. Assessing the clinical impact of risk prediction models with decision curves: guidance for correct interpretation and appropriate use. J Clin Oncol 2016; 34 (21): 2534–2540.
16. Vollmer S, Mateen BA, Bohner G, et al. Machine learning and artificial intelligence research for patient benefit: 20 critical questions on transparency, replicability, ethics, and effectiveness. BMJ 2020; 368: l6927.
17. Christodoulou E, Ma J, Collins GS, et al. A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. J Clin Epidemiol 2019; 110: 12–22.
18. Jung K, Kashyap S, Avati A, et al. A framework for making predictive models useful in practice. J Am Med Inform Assoc 2021; 28 (6): 1149–1158.
19. modelrecon: Assessing Prediction Models in the Presence of Resource Constraints. GitHub. https://github.com/ML4LHS/modelrecon. Accessed October 31, 2022.
20. Wynants L, van Smeden M, McLernon DJ, et al.; Topic Group ‘Evaluating diagnostic tests and prediction models’ of the STRATOS initiative. Three myths about risk thresholds for prediction models. BMC Med 2019; 17 (1): 192.
21. Challener DW, Prokop LJ, Abu-Saleh O. The proliferation of reports on clinical scoring systems: issues about uptake and clinical utility. JAMA 2019; 321: 2405–2406.
22. Overby CL, Tarczy-Hornoch P. Personalized medicine: challenges and opportunities for translational bioinformatics. Per Med 2013; 10 (5): 453–462.
23. Shah NH, Milstein A, Bagley SC. Making machine learning models clinically useful. JAMA 2019; 322 (14): 1351–1352.

Associated Data

Supplementary Materials

ocad006_Supplementary_Data

Data Availability Statement

The predictions and outcomes from this example dataset can also be found in the modelrecon R package.¹⁹
