American Journal of Health-System Pharmacy: AJHP. 2021 Apr 4;78(14):1309–1316. doi: 10.1093/ajhp/zxab152

Development and validation of a predictive model to predict and manage drug shortages

Ina Liu 1, Evan Colmenares 2,3, Casey Tak 4, Mary-Haston Vest 2,3, Henry Clark 3, Maryann Oertel 5, Ashley Pappas 2
PMCID: PMC8271205  PMID: 33821926

Abstract

Purpose

Pharmacy departments across the country are problem-solving the growing issue of drug shortages. We aim to change the drug shortage management strategy from a reactive process to a more proactive approach using predictive data analytics. By doing so, we can drive our decision-making to more efficiently manage drug shortages.

Methods

Internal purchasing, formulary, and drug shortage data were reviewed to identify drugs subject to a high shortage risk (“shortage drugs”) or not subject to a high shortage risk (“nonshortage drugs”). Potential candidate predictors of drug shortage risk were collected from previous literature. The model was trained and tested using 2 methods: k-fold cross-validation and a 70/30 partition of the data into a training dataset and a testing dataset, respectively.

Results

A total of 1,517 shortage and nonshortage drugs were included. The following candidate predictors were used to build the dataset: dosage form, therapeutic class, controlled substance schedule (Schedule II or Schedules III-V), orphan drug status, generic versus branded status, and number of manufacturers. Predictors that positively predicted shortages included classification of drugs as intravenous-only, both oral and intravenous, antimicrobials, analgesics, electrolytes, anesthetics, and cardiovascular agents. Predictors that negatively predicted a shortage included classification as an oral-only agent, branded-only agent, antipsychotic, Schedule II agent, or orphan drug, as well as the total number of manufacturers. The calculated sensitivity was 0.71; the specificity, 0.93; the accuracy, 0.87; and the C statistic, 0.93.

Conclusion

The study demonstrated the use of predictive analytics to create a drug shortage model using drug characteristics and manufacturing variables.

Keywords: drug shortage, predictive analytics, predictors, models, statistical


Key Points.

  • Various predictors related to manufacturing, economics, and drug characteristics were used to train and test a multiple logistic regression model to identify drugs subject to high shortage risk.

  • Predictors that positively predicted shortages included classification of drugs as intravenous-only, both oral and intravenous, antimicrobials, analgesics, electrolytes, anesthetics, or cardiovascular agents.

  • The resulting accuracy, discriminatory power, sensitivity, and specificity data highlight the potential utilization of the model to identify and target high risk drugs for which purchasing and contracting strategies can be applied to manage shortages.

Drug shortages, defined by the American Society of Health-System Pharmacists (ASHP) as a “supply issue that affects how the pharmacy prepares or dispenses a drug or influences patient care when prescribers must use an alternative agent,” are a national problem that can compromise the entire medication-use process and, most importantly, patient care.1,2 From 2004 to 2011, the volume of drug shortages reported nearly tripled.3 It has been estimated that drug purchasing costs have increased by $209 million annually due to a need to acquire more expensive substitutes and that pharmacy labor costs associated with managing shortages in the United States have increased by $359 million annually.2,4,5 Time and resources dedicated to managing drug shortages have steadily increased as well. In 2004, pharmacists reported spending a median of 3 hours per week managing drug shortages. By 2010, that figure had tripled, with pharmacists and technicians spending 9 hours and 8 hours per week, respectively, managing drug shortages. As of 2019, hospitals were spending an estimated 8.6 million hours of additional labor annually to manage drug shortages.3,4

In addition to their impact on costs and resources, drug shortages compromise the quality of care by adversely affecting drug therapy, delaying medication administration, and causing medication errors and patient harm.1,2 In a survey of oncologists practicing in the United States, 83% reported they were unable to prescribe key drugs in standard chemotherapy regimens, such as cytarabine, leucovorin, and liposomal doxorubicin, resulting in most providers switching regimens or substituting a drug within the regimen.6 Another survey showed that 59% of oncologists reported they were unable to prescribe preferred cancer drugs at least once over the prior 6 months.7 In light of these shortages, a survey of health-system pharmacy leaders showed that more than one-third of respondents have had to ration drugs.8 Shortages, rationing, and excluding medications from treatment regimens have devastating effects on patients, as exemplified by a recent study that showed how global shortages of Erwinia chrysanthemi asparaginase used in the treatment of acute lymphoblastic leukemia have been linked to inferior disease-free survival in patients.9

The financial implications, the inefficiencies in patient care, and the potential safety risks delineate a need to better manage drug shortages. While Congress has passed legislation that requires prescription drug manufacturers to notify the Food and Drug Administration (FDA) in advance of product discontinuations or supply interruptions affecting medically necessary drug products, these measures have been criticized for being nondefinitive and vague, given that “medically necessary” is not defined.10 For instance, a recent survey fielded by the Institute for Safe Medication Practices (ISMP) and completed by directors, pharmacy managers, purchasing agents, and clinical staff showed that 84% of respondents never or rarely received advance notice about shortages, their causes, or their durations from manufacturers or FDA.11 As such, there is a need for health systems to find ways to efficiently and proactively manage drug shortages.

One possible strategy to optimize efficiency in drug shortage management is to turn from reactive processes of drug shortage management to a proactive approach through the use of predictive analytics. The use of data and analytics is a growth area in healthcare, including use of predictive modeling.12,13 While there are multiple studies and examples of data analytics in measuring clinical impact and outcomes, there are limited examples of data analytics being utilized for drug shortage management. Moreover, the data sources for the commercialized shortage models that do exist are seeded from external hospitals’ usage patterns and may not be as specific to a health system’s formulary.14,15 The overall purpose of the study described here was to develop and validate a predictive model to identify drugs subject to a high risk of shortages using purchasing and formulary data internal to our organization.

Methods

Study design.

The overall goal of the study was to train and cross-validate a drug shortage model utilizing cross-sectional data collected from internal data on historically recorded drug shortages. First, candidate predictors that have been published in the literature regarding drug shortages were identified. Then we built a dataset based on the identified predictors utilizing internal drug formularies, historical purchasing data, and internal drug shortages recorded from 2016 to 2017.

Outcome definition.

Drugs that were on shortage were defined in this study as any drug on the study site’s formulary monitored from 2016 to 2017 by the organization’s internal strategic sourcing and shortage management (SSSM) team. The SSSM team obtains this list of drug shortages from various sources, including listservers, professional organization messages, FDA alerts, and internal stock-out and wholesaler reports.

Drug shortage risk factors.

A comprehensive literature search was completed using PubMed, Google Scholar, and other Web search engines using key search terms including drug shortage, risk factor, predictor, and causes. After reviewing identified literature for potential causes of drug shortages, a comprehensive list of potential predictors was created. The study team, including a drug shortage clinical specialist from the SSSM team, leadership members with oversight of the data analytics team, and a lead pharmacist on the data analytics team with a clinical background, convened to select the final predictors to be included in the data collection.

Dataset build.

A collection tool based on the finalized predictors was built and utilized for data collection for both drugs subject to a high risk of shortages (“shortage drugs”) and those not subject to a high shortage risk (“nonshortage drugs”). To ensure standardized data collection methods, a code book was developed to guide data collection. Drugs that were recorded to be on shortage from January 1, 2016, through December 31, 2017, were identified by review of historical drug shortage data maintained by the drug shortage management team. Data for the identified predictor variables were collected by 2 study investigators. Information that was not available from internal sources was supplemented with tertiary literature resources. The class of medication, availability of brand and generic products, and the number of manufacturers measured at the beginning of the study period were obtained from Facts & Comparisons database.16 Information on the drugs’ controlled substance scheduling was obtained from the Drug Enforcement Administration (DEA) database, and orphan drug status was determined by reviewing the FDA database on orphan drugs.17,18 Clinical experts on the team completed data cleaning. This process included looking through the dataset for nonsensical values, ensuring that ranges for all data values matched expected ranges that were preemptively set in the code book, and ensuring that values were clinically meaningful. If multiple data points related to a specific predictor were missing due to data integrity issues, the predictor variable was excluded in the final model build.
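The range checks described in this data cleaning step can be sketched as follows. This is a minimal illustration and not the study's actual tooling: the field names, allowed categories, numeric limits, and the `CODE_BOOK` structure are all hypothetical.

```python
# Hypothetical code book: each field maps either to a set of allowed
# categories or to an inclusive (min, max) numeric range. None of these
# names or limits come from the study; they only illustrate the check.
CODE_BOOK = {
    "dosage_form": {"oral_only", "iv_only", "oral_and_iv", "other"},
    "orphan_drug": {0, 1},
    "n_manufacturers": (1, 50),
}


def validate_record(record, code_book=CODE_BOOK):
    """Return a list of (field, problem) pairs for one drug record."""
    problems = []
    for field, rule in code_book.items():
        value = record.get(field)
        if value is None:
            problems.append((field, "missing"))
        elif isinstance(rule, tuple):
            low, high = rule
            if not (low <= value <= high):
                problems.append((field, "out of range"))
        elif value not in rule:
            problems.append((field, "unexpected category"))
    return problems


clean = {"dosage_form": "iv_only", "orphan_drug": 0, "n_manufacturers": 4}
dirty = {"dosage_form": "tablet", "orphan_drug": 0, "n_manufacturers": -2}

print(validate_record(clean))  # no problems reported
print(validate_record(dirty))  # flags dosage_form and n_manufacturers
```

Records that return a nonempty list would be candidates for correction or exclusion, in the spirit of the exclusions reported for the final dataset.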

Data analysis.

Descriptive statistics were analyzed for each variable to explore each predictor characteristic. Simple logistic regressions were performed to determine the individual impact of each predictor on the outcome of drug shortages; predictors that were significant at an α level of 0.05 (P < 0.05), together with predictors determined to be of high importance by clinical judgement, were included in the final predictive model. These clinical judgements were based on team discussion and prior evidence reported in the literature. For instance, if a predictor was not significant at the 0.05 level but the literature has reported on the potential impact of that predictor, it was included in the final model.
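For a single binary predictor, the odds ratio from a simple logistic regression equals the cross-product ratio of the corresponding 2×2 table, and the Wald confidence interval can be computed from the cell counts. The sketch below illustrates that screening step under stated assumptions: the counts are hypothetical and are not taken from the study, and the function name is illustrative.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% CI from a 2x2 table.

    a: predictor present, shortage     b: predictor present, no shortage
    c: predictor absent, shortage      d: predictor absent, no shortage
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study.
or_, lower, upper = odds_ratio_with_ci(a=30, b=70, c=10, d=90)

# A 95% CI that excludes 1 corresponds to P < 0.05 for the Wald test.
significant = lower > 1 or upper < 1
print(f"OR = {or_:.2f} (95% CI {lower:.2f}-{upper:.2f}), significant: {significant}")
```

A predictor whose interval includes 1 would fail the statistical criterion but could still be retained on clinical grounds, as described above.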

Multiple logistic regression was utilized to determine the combined impact of the predictors on the outcome of drug shortages. The model was trained and tested using k-fold cross-validation, with a k of 10 selected. In this method, the dataset was randomly split into 10 folds. The model was fit with k-1 folds and validated with the remaining fold. This process was automatically repeated until every fold had served as the test set, and the average of the resulting β coefficients was then calculated and converted to odds ratios (ORs). A confusion matrix was applied to this model to determine the final accuracy. The sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated and averaged across the 10 models.19 To ensure the consistency and robustness of the model developed by the k-fold cross-validation method, the model was trained and tested with a second method by randomly partitioning the data into training and testing datasets, split 70% and 30%, respectively.
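The cross-validation procedure described above can be sketched in a few dozen lines. This is an illustration under stated assumptions and not the study's code: the data are simulated, the two predictors (`iv_only`, `branded_only`) and their true effects are hypothetical, and a plain gradient-descent fit stands in for whatever statistical software the authors used.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(beta, x):
    """Predicted probability; beta[0] is the intercept."""
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))


def fit_logistic(X, y, lr=1.0, epochs=200):
    """Fit a logistic regression by batch gradient descent (illustrative only)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(beta, xi) - yi
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        beta = [b - lr * g / n for b, g in zip(beta, grad)]
    return beta


def k_fold_indices(n, k, seed=0):
    """Randomly split row indices 0..n-1 into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(X, y, k=10):
    """Fit on k-1 folds, test on the held-out fold, and average the results."""
    betas, accuracies = [], []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        beta = fit_logistic([X[i] for i in train_idx], [y[i] for i in train_idx])
        betas.append(beta)
        correct = sum((predict_proba(beta, X[i]) >= 0.5) == bool(y[i]) for i in test_idx)
        accuracies.append(correct / len(test_idx))
    mean_beta = [sum(col) / k for col in zip(*betas)]
    odds_ratios = [math.exp(b) for b in mean_beta[1:]]  # skip the intercept
    return odds_ratios, sum(accuracies) / k


def simulate(n=300, seed=1):
    """Simulated drugs with two hypothetical binary predictors."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        iv_only, branded_only = rng.randint(0, 1), rng.randint(0, 1)
        p = sigmoid(-0.5 + 1.5 * iv_only - 1.5 * branded_only)
        X.append([iv_only, branded_only])
        y.append(1 if rng.random() < p else 0)
    return X, y


X, y = simulate()
odds_ratios, mean_accuracy = cross_validate(X, y, k=10)
print("Averaged ORs (IV only, branded only):", [round(v, 2) for v in odds_ratios])
print("Mean held-out accuracy:", round(mean_accuracy, 2))
```

Sensitivity, specificity, PPV, and NPV could be averaged across folds in the same way as the accuracy; they are omitted here only to keep the sketch short.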

To assess the discrimination power of the prediction model, the receiver operating characteristic (ROC) curve was used, and the area under the ROC curve, also known as the C statistic, was calculated. C statistic values above 0.7 were used to guide our interpretation of adequate model power for characterizing drug shortage predictions.20
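The C statistic has a direct interpretation: it is the probability that a randomly chosen shortage drug receives a higher predicted risk than a randomly chosen nonshortage drug. A minimal sketch of that pairwise definition follows; the scores and labels are hypothetical, and the quadratic-time loop is chosen for clarity rather than efficiency.

```python
def c_statistic(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted probabilities and outcomes (1 = shortage).
scores = [0.9, 0.8, 0.3, 0.2]
labels = [1, 0, 1, 0]
auc = c_statistic(scores, labels)
print(auc)  # 0.75: three of the four shortage/nonshortage pairs are ranked correctly
```

A value of 0.5 corresponds to ranking no better than chance, and 1.0 to perfect separation of shortage and nonshortage drugs.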

Results

Candidate predictor variables.

A comprehensive literature review of potential predictors for drug shortages resulted in 4 major categories of risk factors, including those related to manufacturer and production, economics, specific drug characteristics, and other miscellaneous factors. Manufacturer- and production-related risks included use of antiquated machinery by manufacturers and contamination or shortages of raw materials.1,3,6,21 Candidate predictors related to economics described certain supply and demand risks related to shortages.3,22-24 For instance, a drug having a single manufacturer, or an increase in demand for a superior product—as in the recent shortage of GlaxoSmithKline’s Shingrix (zoster vaccine recombinant, adjuvanted)—may serve as potential risks for shortages.3,22 Additionally, certain drug characteristics have been associated with drug shortages. For instance, generic injectables have been highly associated with drug shortages.3,22,23 Other identified candidate predictors include natural disasters and the domino effect (ie, one shortage catalyzing further shortages).1 Certain candidate predictors were not included in the finalized list due to inability to obtain data (eg, many manufacturer-related risks are proprietary) or inability to effectively measure or predict (eg, natural disasters). Ultimately a total of 7 candidate predictors were used to build the dataset: dosage form; therapeutic class; Schedule II status; Schedule III, IV, or V status; orphan drug status; generic versus branded product availability; and number of manufacturers at the beginning of the study period.

Predictive model.

A total of 1,588 observations of shortage and nonshortage drugs were collected. Of these, 71 observations were excluded from the final dataset (eg, because of missing data or invalid values that could not be corrected). The remaining 1,517 observations comprised 421 shortage drugs (27.8%) and 1,096 nonshortage drugs (72.2%). Most of the drugs were either oral-only (37.8%) or intravenous-only (29.8%) drugs. A large percentage of the drugs (44.6%) were classified as “other” with regard to their therapeutic class, with cardiovascular agents (15.0%), antimicrobials (13.6%), and antineoplastics (10.5%) being the next 3 most common classes. The mean number of manufacturers per drug was 4.871 (Table 1). Simple logistic regressions showed that classification as an oral-only drug, orphan drug, or branded-only drug and an increasing number of manufacturers negatively predicted a drug shortage, while categorization as an intravenous-only drug, antimicrobial, analgesic, electrolyte, anesthetic, cardiovascular agent, or generic-only medication positively predicted a shortage (Table 2). In the multiple logistic regression, predictors that negatively predicted a shortage included classification as an oral-only medication, branded-only medication, antipsychotic, or Schedule II medication, as well as orphan drug status and a higher total number of manufacturers. Variables that positively predicted a shortage included classification as an intravenous-only medication, both oral and intravenous medication, antimicrobial, analgesic, electrolyte, anesthetic, or cardiovascular agent (Table 3).

Table 1.

Characteristics of Drugs Included in Final Dataset (n = 1,517)

Characteristic                                Frequency, No. (%)ᵃ
Study classification
  Nonshortage drug                            1,096 (72.2)
  Shortage drug                               421 (27.8)
Dosage form
  Oral only                                   574 (37.8)
  IV only                                     452 (29.8)
  Oral and IV                                 291 (19.2)
  Other                                       200 (13.2)
Therapeutic class
  Other                                       676 (44.6)
  Cardiovascular                              227 (15.0)
  Antimicrobial                               207 (13.6)
  Antineoplastic                              160 (10.5)
  Analgesic                                   109 (7.2)
  Electrolyte                                 74 (4.9)
  Antipsychotic                               41 (2.7)
  Anesthetic                                  23 (1.5)
Controlled substance schedule
  Schedule II                                 78 (5.1)
  Schedules III–V                             53 (3.5)
Orphan drug status                            250 (16.5)
Branded/generic product availability
  Both branded and generic                    675 (44.5)
  Generic only                                458 (30.2)
  Branded only                                384 (25.3)
No. of manufacturers per drug, mean (SD)      4.871 (4.641)

Abbreviations: IV, intravenous; SD, standard deviation.

ᵃ Unless indicated otherwise.

Table 2.

Results of Simple Logistic Regression Analysis of Association of Variables With Drug Shortage Risk

Variable                                      Odds Ratio (95% CI)
Dosage form
  Other                                       [Reference]
  Oral only                                   0.14 (0.078-0.25)
  Oral and IV                                 1.67 (1.06-2.66)
  IV only                                     2.85 (1.87-4.40)
Therapeutic class
  Other                                       [Reference]
  Antipsychoticᵃ                              0.96 (0.28-2.57)
  Antineoplasticᵃ                             1.13 (0.63-1.95)
  Cardiovascular                              3.07 (2.01-4.67)
  Antimicrobial                               3.84 (2.50-5.90)
  Analgesic                                   5.57 (3.35-9.30)
  Anesthetic                                  16.54 (5.49-61.00)
  Electrolyte                                 150.36 (45.26-932.70)
Controlled substance schedule
  Schedules III–Vᵃ                            1.28 (0.65-2.77)
  Schedule IIᵃ                                1.36 (0.78-2.32)
Orphan drug status                            0.29 (0.17-0.46)
Branded/generic product availability
  Other                                       [Reference]
  Branded only                                0.24 (0.15-0.38)
  Generic only                                2.11 (1.57-2.85)
No. of manufacturers                          0.33 (0.25-0.44)

Abbreviations: CI, confidence interval; IV, intravenous.

ᵃ P > 0.05 for comparison with reference.

Table 3.

Results of Multiple Logistic Regression to Determine Predictors of Drug Shortage Risk

Variable                                      Odds Ratio (95% CI)ᵃ
Dosage form
  Oral only                                   0.20 (0.11-0.35)
  Oral and IV                                 2.41 (1.43-4.13)
  IV only                                     3.94 (1.43-4.13)
Therapeutic class
  Antipsychotic                               0.24 (0.06-0.79)
  Antineoplasticᵇ                             0.71 (0.36-1.36)
  Cardiovascular                              1.90 (1.15-3.15)
  Antimicrobial                               3.68 (2.28-5.98)
  Analgesic                                   8.41 (4.00-18.1)
  Anesthetic                                  12.9 (3.17-65.3)
  Electrolyte                                 101.1 (30.1-481.7)
Controlled substance schedule
  Schedule II                                 0.19 (0.07-0.51)
  Schedules III–Vᵇ                            2.47 (0.96-7.05)
Orphan drug status                            0.45 (0.25-0.77)
Branded/generic product availability
  Branded only                                0.03 (0.02-0.06)
  Generic onlyᵇ                               1.12 (0.76-1.64)
No. of manufacturers                          0.11 (0.08-0.17)

Abbreviations: CI, confidence interval; IV, intravenous.

ᵃ Multiple logistic regression intercept, 0.44 (0.14-1.23).

ᵇ P > 0.05 for association with drug shortage risk.
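Because a multiple logistic regression is additive on the log-odds scale, adjusted odds ratios for several characteristics multiply. The sketch below illustrates this with values copied from Table 3. It assumes a model without interaction terms (none are reported) and compares against a drug in the reference categories; the dictionary keys and function names are hypothetical. Absolute probabilities are deliberately not computed, since the scale of the reported intercept and of the number-of-manufacturers variable is not stated.

```python
import math

# Adjusted odds ratios copied from Table 3; the key names are illustrative.
ODDS_RATIOS = {"iv_only": 3.94, "antimicrobial": 3.68, "orphan_drug": 0.45}


def combined_odds_ratio(features):
    """Multiply the ORs of the listed features (equivalently, sum their log-ORs)."""
    return math.exp(sum(math.log(ODDS_RATIOS[f]) for f in features))


def odds_to_probability(odds):
    """Convert odds to a probability."""
    return odds / (1.0 + odds)


relative_odds = combined_odds_ratio(["iv_only", "antimicrobial"])
print(round(relative_odds, 1))  # about 14.5 times the odds of the reference profile
```

Given baseline odds for a reference drug, multiplying by `relative_odds` and passing the result to `odds_to_probability` would yield a predicted shortage probability.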

The sensitivity of the cross-validated model was calculated to be 0.71, while the specificity was calculated to be 0.93. The PPV was calculated to be 0.80, and the NPV was 0.90. The accuracy of the overall model, calculated from the confusion matrix, was 0.87. The trained model with a 70/30 partition for training and testing, respectively, yielded similar results with regard to variables that negatively and positively predicted a drug shortage. Sensitivity, specificity, PPV, NPV, and accuracy values (0.71, 0.97, 0.90, 0.90, and 0.89, respectively) were similar to those for the cross-validated model. The area under the ROC curve was found to be 0.93 (Figure 1).
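The reported PPV, NPV, and accuracy are tied to sensitivity, specificity, and the proportion of shortage drugs in the dataset. As a consistency check, the sketch below derives them from the cross-validated sensitivity (0.71) and specificity (0.93) and the overall prevalence of shortage drugs (421 of 1,517). The function name is illustrative, and using the overall prevalence rather than per-fold prevalences is a simplifying assumption.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Derive PPV, NPV, and accuracy from sensitivity, specificity, and prevalence."""
    tp = sensitivity * prevalence              # true positive fraction
    fn = (1 - sensitivity) * prevalence        # false negative fraction
    tn = specificity * (1 - prevalence)        # true negative fraction
    fp = (1 - specificity) * (1 - prevalence)  # false positive fraction
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    accuracy = tp + tn
    return ppv, npv, accuracy


prevalence = 421 / 1517  # shortage drugs / all drugs in the final dataset
ppv, npv, accuracy = predictive_values(0.71, 0.93, prevalence)
print(f"PPV {ppv:.2f}, NPV {npv:.2f}, accuracy {accuracy:.2f}")
```

The derived values (roughly 0.80, 0.89, and 0.87) agree with the reported PPV, NPV, and accuracy to within the rounding of the inputs.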

Figure 1.


Results of receiver operating characteristic curve analysis.

Discussion

There are multiple bodies of literature that explain the theoretical relationships between different factors and drug shortages. To our knowledge, the study described here was the first study undertaken to assess and quantify the relationship between these variables and drug shortages. Based on the literature review, candidate predictors were divided into 4 major groups of predictors related to manufacturing and production, economics, drug characteristics, and other factors. Results of our literature overview of manufacturer- and production-related variables aligned well with a root cause analysis of drug shortages performed by FDA.1

The ORs associated with each predictor variable mostly aligned with what has been reported in the literature. For instance, the finding of intravenous medications having 3.94 times higher odds of being on shortage than other dosage forms corresponds well to the theory that intravenous medications are more likely to go on shortage due to the complexity of their manufacturing and quality issues.2,3 Additionally, branded drugs were found to have lower odds of going on shortage relative to drugs available in both branded and generic products. This finding aligns well with the literature describing branded drugs as less likely to go on shortage due to manufacturers’ ability to make higher profits and thereby sustain operations.2 Our finding of increased shortage risk for therapeutic classes such as antimicrobials, analgesics, anesthetics, and cardiovascular agents aligns with literature stating that these therapeutic classes have been associated with higher drug shortage rates. While electrolytes were shown to have 101.1 times higher odds of being on shortage, there was a wide OR confidence interval (30-482) due to relatively few observations in that class compared to other classes.

One factor for which our findings did not align with previously reported literature on shortage risk was orphan drug status. Previous literature has shown that orphan drugs may be subject to a higher risk of going on shortage due to being produced in smaller quantities for rare diseases, which poses scale-up issues.25 Our model, however, showed the opposite; orphan drugs were predicted to have significantly lower odds of going on shortage in both the simple and multiple logistic regressions. Even though orphan drugs may be produced in smaller quantities, the results of our study may be explained by potentially higher profit margins to sustain manufacturing due to branded status and FDA’s ability to offer orphan drug companies means of producing greater drug quantities to meet demand.2,25,26 Future studies should further investigate the impact of orphan status on shortages.

Similar to orphan drugs, Schedule II drugs were found to have significantly lower odds of going on shortage. This finding contradicts what has been reported in the literature, especially in light of DEA’s yearly quota for Schedule II medications, which limits the quantity of Schedule II medications that manufacturers can make through aggregate production quotas (APQs), thereby increasing the risk of or exacerbating shortages.2,27 A potential reason why our model showed the opposite may be the timing of our data collection. All shortage drugs that were included in the data collection had documented shortage dates from 2014 to 2017. During this time, the DEA quota for Schedule II drugs actually increased; for example, the APQ for fentanyl was increased from 2.1 million to 2.3 million from 2014 to 2016.28-31 While there was a sharp decrease in the APQ from 2016 to 2017 (the APQ for fentanyl was decreased from 2.3 million to 1.3 million), our collected dataset reflected an increasing quota rather than a decreasing one. This implies that relative changes in the APQs should be included in future models to more accurately capture the overall effect of controlled substance scheduling and quotas.

Overall, the final model was found to have high discriminatory power, as shown by the area under the ROC curve (0.93), and moderately high accuracy (0.87). The sensitivity of the model (0.71) indicated a moderately high true positive rate, suggesting that the model may assist with identifying drugs subject to high shortage risk. The greater specificity of the model (0.93) implies a high true negative rate, allowing shortage management teams to accurately identify drugs associated with lower shortage risk. The high PPV (0.80) and high NPV (0.90) indicate that drugs the model classifies as shortage or nonshortage drugs are likely to actually go on shortage or not go on shortage, respectively. These attributes favor the model’s applicability: drug shortage management teams primarily target drugs that go on shortage and can use the model to identify high-risk drugs for which purchasing and contracting strategies, such as increasing stock or contracting with wholesalers to obtain guaranteed supplies, can be applied. Future studies should focus on improving the ability to identify drugs subject to a risk of shortage, possibly by including other candidate predictors not evaluated in this analysis, such as changes in cost, APQs, and other time-varying predictors.

Our model and analysis had several limitations. First, there were multiple factors that were not included in the model due to lack of feasibility. For instance, the domino effect is difficult to reliably measure as a single concrete data point due to multiple effects taking place.1 While variables such as raw materials, antiquated equipment, quality issues, and scale-up issues were factors commonly associated with drug shortages, a majority of these predictors could not be quantified or included for analysis due to proprietary barriers set in place by manufacturers.1 Additionally, while drugs with no alternatives have been shown to typically be on shortage longer than those with alternatives, standardized measurements for this factor may not be feasible because practices may differ within and among institutions.22

Economic predictors (wholesale acquisition cost [WAC] of a drug 3 months prior to the date of the shortage and quarterly changes in WAC pricing 2 years prior to the date of the shortage) were considered for inclusion in the model but ultimately excluded due to a lack of consistent purchase history and a lack of integrity of internal data feeds. An evaluation by the US Department of Health and Human Services showed that drugs that have not been on shortage have had stable or increasing prices, while drugs that have been on shortage had decreasing prices prior to going on shortage.32 Given those findings, future studies should aim to identify more robust and comprehensive sources for drug pricing data and incorporate those data into predictive models.

Additionally, we used retrospective observational data and therefore cannot assume the relationships found within the model are causal. This model should therefore be further validated with prospective data that can be collected in an automated way to ensure efficiency, scalability, and applicability. Furthermore, the validity of this model and its application to other health systems or hospitals outside of our organization cannot be verified due to differences in formularies and purchasing strategies. Collaborations with external stakeholders such as group purchasing organizations may be able to mitigate both of these limitations. Group purchasing organizations may also help serve as a conduit in establishing key partnerships with other hospitals and health systems to determine whether our model can be applied to external institutions.

Conclusion

Overall, the study demonstrated that a predictive model with high discriminatory power can be created by using internal data feeds to incorporate variables such as drug characteristics and manufacturing and production variables. There is published literature that hypothesizes various predictors of drug shortages, and our predictive model validated those hypotheses, with many modeled predictors found to be significantly associated with shortages. While future studies will validate, expand on, and improve this model, this model illustrates the potential of utilizing data analytics to manage drug shortages.

Acknowledgments

The authors acknowledge the support of the pharmacy analytics and outcomes team at UNC Health in their help with obtaining data and their consultations in model review.

Disclosures

This publication was supported by grant number UL1TR002489 from the National Center for Advancing Translational Sciences (NCATS) at the National Institutes of Health (NIH). The content is solely the responsibility of the authors and does not necessarily represent the official views of NIH. The authors have declared no potential conflicts of interest.


