Journal of the American Medical Informatics Association (JAMIA). 2021 May 6;28(8):1712–1718. doi: 10.1093/jamia/ocab071

Pharmacists’ perceptions of a machine learning model for the identification of atypical medication orders

Sophie-Camille Hogue 1, Flora Chen 1, Geneviève Brassard 1, Denis Lebel 1, Jean-François Bussières 1, Audrey Durand 2,3, Maxime Thibault 1
PMCID: PMC8324239  PMID: 33956971

Abstract

Objectives

The study sought to assess the clinical performance of a machine learning model aiming to identify unusual medication orders.

Materials and Methods

This prospective study was conducted at CHU Sainte-Justine, Canada, from April to August 2020. An unsupervised machine learning model based on GANomaly and 2 baselines were trained to learn medication order patterns from 10 years of data. Clinical pharmacists dichotomously (typical or atypical) labeled orders and pharmacological profiles (patients’ medication lists). Confusion matrices, areas under the precision-recall curve (AUPRs), and F1 scores were calculated.

Results

A total of 12 471 medication orders and 1356 profiles were labeled by 25 pharmacists. Medication order predictions showed a precision of 35%, recall (sensitivity) of 26%, and specificity of 97% as compared with pharmacist labels, with an AUPR of 0.25 and an F1 score of 0.30. Profile predictions showed a precision of 49%, recall of 75%, and specificity of 82%, with an AUPR of 0.60, and an F1 score of 0.59. The model performed better than the baselines. According to the pharmacists, the model was a useful screening tool, and 9 of 15 participants preferred predictions by medication, rather than by profile.

Discussion

Predictions for profiles had higher F1 scores and recall compared with medication order predictions. Although the performance was much better for profile predictions, pharmacists generally preferred medication order predictions.

Conclusions

Based on the AUPR, this model showed better performance for the identification of atypical pharmacological profiles than for medication orders. Pharmacists considered the model a useful screening tool. Improving these predictions should be prioritized in future research to maximize clinical impact.

Keywords: machine learning, clinical pharmacy information systems, decision support systems, clinical, medical order entry systems, hospital pharmaceutical services

INTRODUCTION

Background and Significance

The development of artificial intelligence (AI) in health care is an active research topic. However, few studies have been published on the application of AI in clinical pharmacy. Two recent articles discussed theoretical concepts but did not find published studies demonstrating its applications.1,2 Yet, publications from more than 10 years ago illustrate the potential of such technology to help pharmacists automate and streamline repetitive processes such as medication order review.2–4 A retrospective study of 1 887 751 prescriptions showed that a dispensing hospital pharmacist makes an average of 4.9 errors per 100 000 prescriptions, and that errors increase with the number of prescriptions to be validated per shift.5 The review of medication orders can be quite time-consuming. AI software that identified prescriptions requiring intervention, without causing an overload of alerts, would therefore be beneficial.

Several decision support systems already exist, for example, to detect drug interactions. However, these usually evaluate patient data according to logical conditions, such as allergies, age, or kidney function, taken from a database.2 Such software can help pharmacists by identifying potential pharmacotherapeutic problems, but it generates a very high number of alerts, many of which are false positives. Moreover, pharmacists do not always act on these alerts and may dismiss them or consult other resources before intervening.6 An approach based on AI, more specifically on machine learning (ML), could improve clinical decision support software (CDSS) by learning from clinical practice patterns and thus provide tools that are more clinically impactful.7,8

A previously proposed AI-based CDSS detected potential medication errors by identifying statistically aberrant prescriptions.9 Errors detected with this approach might not have been identified by existing decision support systems, and the false positive rate remained modest. Recently, a hybrid CDSS aiming to prioritize and optimize medication order review was investigated.10 The method relied on a decision tree trained on multiple patient features, such as sex, age, medical history, and some laboratory values, combined with a rule-based CDSS whose triggered alerts informed the model about the patient's medication. The model was trained in a supervised manner using the presence of pharmacist interventions as the target. This hybrid method was shown to be more accurate than standard systems.

A few methods have been described to identify atypical medication orders with ML. One method used historical patterns that included the generic medication name, the route of administration, the dose, and the frequency of administration to determine if these parameters were statistically common or not.11 Another model was proposed to detect outlier prescriptions similarly using dose-frequency prescription patterns.12 These methods only looked at the dose, route, and frequency for a given drug and ignored concurrently prescribed drugs.

The present study aimed to evaluate the potential of ML models to identify medication orders and pharmacological profiles that are atypical. To this end, we compared several models on precollected data and conducted a real-world experiment with the best-performing model, allowing us to evaluate pharmacists' perceptions of such tools. This ML model could help identify atypical orders more efficiently and could serve as a triage system to direct pharmacist attention toward the prescriptions at a higher risk of error. Compared with other models, our approach learns prescription patterns in an unsupervised manner, directly from medication order data. It analyzes medications in relation to one another and could eventually be combined with other techniques to analyze dosage instructions.

Objectives

The primary objective of this study was to assess the ML model’s performance by comparing its medication order predictions with pharmacist perceptions. The secondary objectives were to compare the predictions for entire pharmacological profiles to pharmacist perceptions, and to explore pharmacists’ general perceptions of AI and the integration of this model in their clinical practice.

MATERIALS AND METHODS

Setting

This prospective study was conducted at CHU Sainte-Justine, Montreal, Canada, a 500-bed tertiary care mother-and-child academic hospital, from April to August 2020. The study protocol and access to the data were approved by the local research ethics board.

Machine learning models and data

A model selection dataset consisting of every medication order between 2005 and 2018 inclusive was extracted from the pharmacy database and preprocessed to reconstruct pharmacological profiles. Atypical orders were defined as deviating from usual prescription patterns of drugs in relation to one another. Profiles were defined as the list of active medications for a patient at every time point at which no order was entered in the following hour, to avoid sampling profiles while orders were actively being entered. Atypical profiles were defined as deviating from previously seen patterns of pharmacological profiles, without any rule-based decision support.
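To make this reconstruction step concrete, the following is a minimal Python sketch under the definition above; the `orders` table, its column names (`patient_id`, `drug`, `start`, `stop`), and the function itself are illustrative assumptions rather than the study's actual pipeline.

```python
import pandas as pd

def reconstruct_profiles(orders: pd.DataFrame,
                         quiet_gap: pd.Timedelta = pd.Timedelta(hours=1)):
    """Sample each patient's list of active drugs at every order-entry time
    that is not followed by another order within `quiet_gap`."""
    profiles = []
    for patient_id, group in orders.sort_values("start").groupby("patient_id"):
        starts = group["start"].tolist()
        for i, t in enumerate(starts):
            # Skip time points where another order arrives within the gap,
            # so profiles are not sampled while orders are being entered.
            if i + 1 < len(starts) and starts[i + 1] - t < quiet_gap:
                continue
            active = group[(group["start"] <= t) & (group["stop"] > t)]["drug"]
            profiles.append({"patient": patient_id, "time": t,
                             "drugs": sorted(set(active))})
    return profiles
```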

These pharmacological profiles were used to train 3 ML models for anomaly detection: an isolation forest, a neural network autoencoder, and an adversarially trained autoencoder adapted from the GANomaly model,13 which was originally developed for anomaly detection in images. The isolation forest could only predict atypical pharmacological profiles, not individual orders, while the autoencoder and GANomaly-based models predicted whether both individual orders and profiles were atypical. These models analyzed drug selection only; they did not consider the duration of treatment, dosage instructions, the patient's medical history, or any preexisting rule-based algorithm. More details and the training processes are available in the Supplementary Appendix.
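As an illustration of how such a baseline can be set up, the sketch below trains scikit-learn's isolation forest on multi-hot encodings (one binary column per drug) of the profiles produced above; the encoding choice and hyperparameters are our assumptions, not the study's (see the Supplementary Appendix for the actual architectures and training details).

```python
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import MultiLabelBinarizer

drug_lists = [p["drugs"] for p in profiles]   # output of the sketch above
mlb = MultiLabelBinarizer()
X = mlb.fit_transform(drug_lists)             # one binary column per drug

forest = IsolationForest(n_estimators=200, random_state=0).fit(X)
# score_samples returns higher values for more typical points;
# negate so that a higher score means a more atypical profile.
atypicality = -forest.score_samples(X)
```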

The GANomaly-based model was the primary focus of this study; the other models were treated as baselines. We also included a non-ML baseline for pharmacological profiles in our comparisons, which scored each profile by how frequently it had been seen in previous years.
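The frequency baseline fits in a few lines; this sketch counts exact drug-set occurrences in historical data, with `training_profiles` a hypothetical list in the format produced earlier.

```python
from collections import Counter

# training_profiles: hypothetical historical profiles in the format above.
seen = Counter(tuple(p["drugs"]) for p in training_profiles)

def frequency_score(profile) -> int:
    # Profiles seen rarely (or never) in previous years get low counts,
    # that is, they are considered more atypical.
    return seen[tuple(profile["drugs"])]
```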

During the model selection phase, all models were compared on the 2005 to 2018 dataset. Because the GANomaly-based model displayed the best performance in this preliminary evaluation, it was used in the real-world experiment to assess pharmacists' perceptions of the model as a decision support tool. In this experiment, predictions from the GANomaly-based model were presented to the pharmacists. Data resulting from the real-world experiment were further used to evaluate the predictions of all considered models.

To identify atypical pharmacological profiles, the GANomaly model uses a value called “encoder loss,” indicating how much of the information contained in the pharmacological profile is lost by the model in its internal representation of that profile. A higher value indicates a more atypical profile (see Supplementary Appendix for a technical definition). To provide a clear binary prediction to pharmacists, this continuous value needed to be thresholded into 2 classes (typical or atypical) using data not used during training of the model and without using pharmacist perception. We used Otsu’s method to calculate this threshold on the first 2 months of profile predictions generated during the study.14 These profiles were excluded from further analysis (see Supplementary Appendix for complete details).
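A minimal sketch of this thresholding step, assuming a 1-dimensional array of encoder-loss scores from the calibration window (`calibration_scores` is a placeholder name); Otsu's method selects the cutoff that best separates the score histogram into 2 classes.

```python
import numpy as np
from skimage.filters import threshold_otsu

# calibration_scores: encoder-loss values from the first 2 months of
# profile predictions (these profiles were then excluded from analysis).
scores = np.asarray(calibration_scores, dtype=float)
cutoff = threshold_otsu(scores)

def classify(encoder_loss: float) -> str:
    # A higher encoder loss means more of the profile's information is
    # lost in the model's internal representation, i.e. a more atypical profile.
    return "atypical" if encoder_loss >= cutoff else "typical"
```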

To account for the most recent changes in practice as well as new drugs, the models were retrained with the 10 years of most recent data before the study and every month during the study. We kept the best hyperparameters identified during model selection on the 2005 to 2018 dataset.

Data collection

We approached pharmacists from all clinical teams to participate in the study. For every pharmacist who accepted to participate, we collected the number of years of practice and the clinical practice department. Only pharmacological profiles for newly admitted patients that had already been validated for dispensing, but not evaluated by the clinical pharmacist, were included. Profiles that had already been evaluated in the study were not re-evaluated after changes were made, to ensure that the pharmacist's judgment was not biased by their previous perception. Profiles from the first 2 months of the study were excluded because they were used to calculate the classification thresholds for the following profiles.

Daily data collection was Web based. Once during each clinical practice day, before seeing the model predictions, the pharmacists labeled (typical or atypical) each medication order for the patients under their care that were included in the study. An atypical prescription was defined as a medication order that did not correspond to usual prescribing patterns, according to the pharmacist's expertise and experience. At this point, pharmacists labeled only medication orders, not profiles. The predictions per medication were then shown as a confidence percentage between 0% and 100% and were color-coded, from red (atypical, 0%) to green (typical, 100%), with intermediate colors (orange, yellow, light green). Pharmacists had to indicate whether they agreed or disagreed with the predictions for individual orders by maintaining or changing their classification. They then indicated whether they agreed or disagreed with the prediction for the profile. We considered a pharmacological profile as atypical if at least 1 medication order within the profile was labeled as atypical by a pharmacist. We planned an analysis of model performance as compared with pharmacist perception for individual medication orders and complete pharmacological profiles, in both cases before and after the display of predictions. It should be noted that a profile composed of highly typical medication orders could be considered an atypical profile in some cases. For example, if a patient had 2 unrelated pathologies for which they received typical drugs, the individual drugs could be judged as typical while the complete pharmacological profile represented an atypical profile.
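For illustration only, one possible mapping from the displayed confidence percentage to the colors described above; the study does not report the exact cut points, so the bands below are assumptions.

```python
def confidence_color(confidence: float) -> str:
    """Map a 0-100 confidence (0 = atypical/red, 100 = typical/green)
    to a display color. The cut points are illustrative assumptions."""
    if confidence < 20:
        return "red"
    if confidence < 40:
        return "orange"
    if confidence < 60:
        return "yellow"
    if confidence < 80:
        return "light green"
    return "green"
```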

Although this has not been previously validated in a formal study, we considered pharmacist perceptions as the ground truth for atypical orders and profiles. We believe that the on-duty pharmacist was the person best placed to label their patients' prescriptions in actual practice. Additionally, to build trust in an AI model that would identify atypical orders and profiles, these predictions should somewhat match pharmacist perceptions, and as such these perceptions are the point of reference even if they cannot be easily defined using objective criteria.

Finally, to explore the pharmacists’ perceptions of the model used in this study and of AI in health care, we conducted 2 focus groups as well as an online postparticipation survey (detailed in Supplementary Appendix). The survey explored the general perceptions of the application of AI in pharmacy practice and was previously used on a cohort of 267 Quebec pharmacy residents and pharmacists (M Thibault, BPharm, MSc, et al, unpublished data). Its objective was to evaluate if exposure to an AI model in practice modified these perceptions and to evaluate the usefulness of such models.

Statistical analysis

We used confusion matrices (ie, 2 × 2 contingency tables) to classify our binary results (typical or atypical) and measure model performance. From these tables, we computed the precision (positive predictive value), recall (sensitivity), specificity, negative predictive value, and F1 score for the comparison of the model's medication order and profile predictions with pharmacist perceptions. When applicable, we also reported the area under the receiver-operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPR), as our observations were unbalanced between categories (ie, typical medication orders were much more frequent). Exploratory subgroup analyses by AHFS pharmacological classification, department, and degree of experience as a clinical pharmacist were performed.
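A sketch of these computations with scikit-learn, assuming arrays `y_true` (pharmacist labels), `y_pred` (thresholded predictions), and `y_score` (continuous model scores), with atypical coded as 1 (the positive class).

```python
from sklearn.metrics import (average_precision_score, confusion_matrix,
                             f1_score, roc_auc_score)

# y_true: pharmacist labels; y_pred: thresholded model predictions;
# y_score: continuous scores. Atypical is coded as 1 (positive class).
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
precision   = tp / (tp + fp)   # positive predictive value
recall      = tp / (tp + fn)   # sensitivity
specificity = tn / (tn + fp)
npv         = tn / (tn + fn)   # negative predictive value
f1    = f1_score(y_true, y_pred)
auroc = roc_auc_score(y_true, y_score)
aupr  = average_precision_score(y_true, y_score)  # summary of the PR curve
```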

To our knowledge, there is no widely available method for the calculation of a sample size associated with precision-recall curves. Based on an estimated threshold of 20% of atypical medication orders from a 1-day test, if a sample size of 640 typical and 160 atypical medication orders was reached, it was possible to detect a minimum AUROC of 0.6 with a power of 95% and a bilateral alpha of 0.05. All calculations for 2 × 2 tables were made with Microsoft Excel (Microsoft Corporation, Redmond, WA). Calculations for AUROC and AUPR curves were made with Scikit-learn.15 The power calculation was made with the pROC package in R (R Foundation for Statistical Computing, v3.6.1, Vienna, Austria).16

No statistical analyses were performed with the observations collected during the focus groups.

Results from the postparticipation survey were analyzed with descriptive statistics. Responses that totally agreed and partially agreed on the Likert scale were interpreted as agreeing.

RESULTS

Between April 9 and August 7, 2020, a total of 12 624 medication orders and 2114 pharmacological profiles were displayed to 25 pharmacists from 7 departments: obstetrics-gynecology and nursery, general pediatrics, surgery, oncology, specialized pediatrics, neonatal intensive care unit (NICU), and pediatric intensive care unit (PICU). A total of 153 (1.2%) medication orders were excluded (109 without a corresponding profile, 23 miscellaneous reasons, 18 not labeled, 3 missing data). A total of 699 (33%) profiles were excluded because they were used to calculate the classification thresholds for profiles. A total of 59 (2.8%) profiles were excluded because the pharmacists did not provide a label. The distribution of profiles per department was similar, except for obstetrics-gynecology and general pediatrics, which were slightly overrepresented in included profiles (data not shown). A total of 12 471 orders and 1356 profiles were included in the analysis (Table 1).

Table 1.

Characteristics of pharmacists, ratings, and performance by department

Medical department         Pharmacists  Experience (y)  Orders        Profiles    F1 Orders  F1 Profiles
Overall                    25           10.7 ± 7.4      12 471 (100)  1356 (100)  0.30       0.59
Obstetrics-gynecology      4            15.5 ± 6.8      7119 (57)     536 (40)    0.25       0.55
General pediatrics         4            13.5 ± 9.0      2036 (16)     368 (27)    0.28       0.66
Surgery                    2            2.0 ± 1.4       1113 (9)      107 (8)     0.45       0.42
Oncology                   6            10.0 ± 6.9      731 (6)       32 (2)      0.32       0.65
Specialized pediatrics     3            6.0 ± 8.7       615 (5)       67 (5)      0.39       0.69
Neonatal intensive care    5            12.0 ± 8.5      410 (3)       99 (7)      0.13       0.35
Nursery                    N/Aa         —               267 (2)       128 (9)     —          —
Pediatric intensive care   1            13              180 (1)       19 (1)      0.33       0.72

Values are n, mean ± SD, or n (%), unless otherwise indicated.

N/A: not applicable.

a Labeled by pharmacists from obstetrics-gynecology.

The mean number of years of practice was 10.7 ± 7.4 years. Twelve pharmacists had more than 13 years of experience.

Each pharmacist labeled 499 ± 811 medication orders and 54 ± 79 profiles. Two pharmacists, from obstetrics-gynecology and nursery, together labeled 5966 (48%) medication orders and 567 (42%) pharmacological profiles.

Medication orders

The GANomaly-based model displayed low performance, with a precision of 35%, recall of 26%, specificity of 97%, and negative predictive value of 96% (Table 2), when comparing its predictions per medication against pharmacist labels (before seeing the model predictions). Overall F1 score was only 0.30. F1 scores by department were 0.45 in surgery, 0.39 in specialized pediatrics, 0.33 in PICU, 0.32 in oncology, 0.28 in general pediatrics, 0.25 in obstetrics-gynecology, and 0.13 in NICU. The AUROC and AUPR were only 0.80 and 0.25, respectively.

Table 2.

Confusion matrices of pharmacist ratings compared with predictions

Model prediction   Individual Orders     Profiles Before Seeing   Profiles After Seeing
                                         Predictions              Predictions
                   Atypical   Typical    Atypical   Typical       Atypical   Typical
Atypical           166        304        195        201           263        133
Typical            465        11 536     66         894           38         922

Rows are the model's predictions; columns are the pharmacists' labels.

Predictions from the GANomaly-based model for orders labeled by pharmacists with <13 years of experience showed a precision of 35%, recall of 28%, specificity of 97%, and F1 score of 0.31, while those labeled by pharmacists with more than 13 years of experience showed a precision of 36%, recall of 25%, specificity of 98%, and an F1 score of 0.30. The most common medication orders were central nervous system agents (n = 4305, 35%), gastrointestinal drugs (n = 2593, 21%), and skin and mucous membrane agents (n = 1293, 10%). Serums, toxoids, and vaccines showed the highest F1 score (0.67) and antihistamine drugs the lowest (0.16). Serums, toxoids, and vaccines (n = 240 medication orders) showed 100% recall, 100% specificity, and 100% negative predictive value.

After seeing the model's predictions, pharmacists changed their perception 22 times (0.18%), in half of which they changed from disagreement to agreement with the model. Reasons for the change were documented in only 10 cases; an unusual dosage form that had not been initially detected by the pharmacist was the most cited. Because changes of perception occurred so rarely, we did not perform the analysis of model performance for medication orders after the display of predictions.

Pharmacological profiles

Pharmacological profile predictions generated by the GANomaly-based model, when compared with pharmacist labels before seeing the predictions, achieved moderate performance with a precision of 49%, recall of 75%, specificity of 82%, and negative predictive value of 93%. The overall F1 score was 0.59. F1 scores by department were 0.72 in PICU, 0.69 in specialized pediatrics, 0.66 in general pediatrics, 0.65 in oncology, 0.55 in obstetrics-gynecology, 0.42 in surgery, and 0.35 in NICU. Predictions for pharmacological profiles from the GANomaly-based model obtained the best performance compared with baselines, with an AUROC and AUPR of 0.88 and 0.60, respectively, compared with 0.83 and 0.56 for the autoencoder model and 0.75 and 0.33 for the isolation forest model (Figure 1).

Figure 1. Precision-recall curves for pharmacological profile predictions from the 3 machine learning models and the statistical frequency baseline.

After the display of predictions, the model obtained a precision of 66%, recall of 87%, specificity of 87%, and negative predictive value of 96% (Table 2) for labels by pharmacists. F1 score rose to 0.75, showing that pharmacists agreed well with the predictions after seeing them. The AUROC and AUPR were 0.92 and 0.73, respectively.

Although data collection started during the initial wave of the coronavirus disease 2019 (COVID-19) pandemic, we do not believe that this affected the results, as a very limited number of COVID-19 patients were admitted during the study period. Pharmacists indicated when orders or profiles were unusual to them because of patient movement due to infection prevention and control measures. In total, only 28 (2%) pharmacological profiles and 183 (1.5%) medication orders were labeled as such.

Focus groups

Two 60-minute sessions with 15 pharmacists from all 7 medical departments were conducted.

Perceptions of AI and the model

The majority of pharmacists were interested in learning more about AI after being exposed to the model. Some pharmacists showed interest in using this software during medication order review. The majority would dislike having to use separate software to see the predictions and would prefer the integration of the predictions into pharmacy systems. The best way to integrate such a system into the pharmacist's workflow based on human factors, and to combine its predictions with the output of rule-based systems, should be evaluated in another study.

A majority (9 of the 15 participants) preferred the predictions by medication order but were not aware that performance was better for profile predictions. Only 4 participants preferred predictions by profile. Conceptually, showing pharmacists a prediction for each medication order should be better because it clearly identifies which prescription is atypical, unlike profile predictions, which only inform the pharmacist that something within the profile is atypical. One pharmacist explicitly stated that if predictions by medication order had performed better, they would have preferred those over predictions by profile. One participant was concerned about judgment of their competence, and another had concerns about the risk of losing their role in the clinical team.

Quality of the predictions

The majority of pharmacists were satisfied with the apparent threshold for identification of an atypical order, although some would have liked the model to show more or fewer atypical orders. Some pharmacists were open to the idea of allowing the user to manually adjust the classification threshold. Pharmacists from general pediatrics and obstetrics-gynecology thought the model could guide prioritization on busy days or when they had to cover more than 1 medical department. Sometimes, frequent medication orders from protocols were flagged as atypical for no apparent reason, and pharmacists would have preferred that those prescriptions not be flagged. In general, the pharmacists found the model useful for screening and prioritizing patients. They noted good performance for the identification of dosage form errors, such as dispensing vitamin D3 capsules instead of the oral solution for a young child. One pharmacist also noted the correct identification of an aberrant order of valproic acid in a pregnant woman's pharmacological profile.

General issues

Some technical limitations of the model were identified as issues by pharmacists. For example, orders that were prescribed but not yet reviewed in the central pharmacy did not appear in the predictions. Moreover, the definition of atypical was subjective and sometimes misunderstood by pharmacists. Some were disconcerted that they did not always label the same way, and some did not agree on what was atypical. Pharmacists also shared that labeling orders uniformly throughout the project was challenging. Finally, some pharmacists found the labeling process time-consuming.

Postparticipation survey

A total of 21 pharmacists completed the survey. This survey covered 2 major topics: opportunities and threats of AI in pharmacy practice.

Opportunities

Most pharmacists (n = 20, 95%) thought that this technology would be beneficial to pharmacy practice, and 13 (62%) would like to contribute to its integration. Twelve (57%) pharmacists would agree to let the software verify a medication order autonomously according to parameters set by department policy. Twenty (95%) pharmacists thought that AI could improve the quality of health care.

Threats

More than half (n = 12, 57%) were concerned that this technology could decrease pharmacist staffing. Half (n = 10, 48%) were worried about being held responsible for an adverse event resulting from the use of AI. A minority (n = 6, 30%) were concerned about the impact of AI on the pharmacist's professional role and recognition.

DISCUSSION

This study aimed to evaluate the potential of ML models to identify atypical medication orders and pharmacological profiles, as well as to evaluate the perceptions of pharmacists about AI after having been exposed to a practical application.

The GANomaly-based model outperformed the baselines for pharmacological profile predictions. Predictions by profile had a higher F1 score and recall as compared with medication order predictions. After viewing predictions from the GANomaly-based model for profiles, pharmacists agreed with the model with high recall and specificity, and moderate precision. However, most pharmacists preferred using predictions for individual medication orders. This is surprising considering the much better performance of the model at identifying atypical profiles compared with medication orders. It highlights the importance of validating with the end user, in addition to using the typical performance metrics, when evaluating such predictive models.

The pharmacists seemed to find medication order predictions more useful, even though these correlated poorly with their perception. However, comments shared in the focus groups indicated that pharmacists did not place too much trust in order predictions. Rather, they were satisfied to use them as a safeguard. This leads us to believe that even moderately improving the quality of these predictions could lead to a useful tool for pharmacists.

Surveys on the perception of pharmacists and pharmacy residents regarding the implementation of AI in hospital pharmacy in Quebec and France were conducted as part of previous projects (M Thibault, BPharm, MSc, et al, unpublished data). In this previous work, most respondents (86%, n = 145) who had no experience with an AI model believed that the use of AI could be beneficial, and expressed interest in helping to integrate this technology into practice. Our results also showed a positive interest in AI. In the previous surveys, 63% (n = 105) were concerned that AI could be detrimental to the professional role of pharmacists. After using the model, only 30% (n = 6) showed concern. Similarly, in previous surveys, the majority (64%, n = 108) were concerned about the possibility of being held responsible for an adverse event that would result from AI. In this study, after using the model, this proportion was 48% (n = 10). Overall, the pharmacists who participated in this study had a better perception of AI after using the model, as compared with the prior surveys.

The performance evaluation of such a model should account for the clinical use case of the model as well as the characteristics of the data. Recall should be favored as a performance metric because, ultimately, the clinical pharmacist would use this model as a triage tool and focus his or her attention on orders and profiles flagged as atypical. As such, false negatives should be minimized. Furthermore, in this study, the prevalence of an atypical order was 5% (n = 631) and an atypical profile was 22% (n = 301), as labeled by pharmacists. The prevalence of an atypical order was 4% (n = 470) and atypical profile 29% (n = 396), as labeled by the model. Because the prevalence of atypical medication orders is very low, precision-recall curves are more relevant than receiver-operating characteristic curves and should also be used to evaluate the model.17 We could also consider continuous, rather than dichotomized, predictions from the model, as some pharmacists indicated that they would have liked to be able to adjust the classification threshold manually.
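As a worked check using the order-level confusion matrix from Table 2 (pharmacist labels as ground truth), the arithmetic below reproduces the reported metrics and shows why precision stays low despite high specificity at this prevalence.

```python
# Order-level confusion matrix from Table 2 (pharmacist labels as truth).
tp, fp, fn, tn = 166, 304, 465, 11_536

prevalence  = (tp + fn) / (tp + fp + fn + tn)  # ≈ 0.05: atypical orders are rare
precision   = tp / (tp + fp)                   # ≈ 0.35
recall      = tp / (tp + fn)                   # ≈ 0.26
specificity = tn / (tn + fp)                   # ≈ 0.97

# Even with 97% specificity, the ~5% prevalence means false positives (304)
# nearly double the true positives (166), keeping precision low; this is why
# precision-recall analysis is more informative than ROC here.
```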

Limitations of the data and model

A characteristic of the data to account for when using this model is that medication ordering patterns may include suboptimal but common practices. A potential deleterious effect would be the reinforcement of these practices by not signaling them for review because they fit previously seen patterns. Therefore, we do not believe that this model could be used as a stand-alone decision support tool, but rather would be combined with classical rule-based approaches to detect such issues independently of practice patterns.

Being a single-center study, the model was trained only with data from our center and generalization to another institution is not possible. Because we needed a finely detailed analysis of medication orders up to the exact drug entry that was used in the system, we did not map local data to an interoperable terminology such as Unified Medical Language System vocabularies. Reproducibility is a frequent challenge with AI technologies in health care.18 We believe that training a model that would generalize results across institutions would be challenging. A first step should be to train institution-specific models using our technique and to compare results across institutions to characterize their generalization capability.

Data drift has been identified as an issue with clinical order data.19 This model is susceptible to this effect because of frequent changes in ordering patterns due to situations including drug shortages or changes in guidelines or prescribing staff. Retraining or updating the model at regular intervals to account for practice evolution is of utmost importance. This was done monthly during the study.

The GANomaly-based model can be considered a “black-box.” Because the model only uses pharmacological profiles to learn, it is not possible to give an explanation of the prediction based on the characteristics of the patient or to get the reason as to why a given drug was considered atypical. Taking into account patient features and developing interpretable predictions is an avenue for future work.

Limitations of the study

The analysis of profiles that had already been evaluated carries risks, such as judgment biased by the pharmacist's prior knowledge. To limit this bias, data were only collected from new admissions. However, it was pointed out during the focus groups that a few patients were frequently hospitalized and known to the pharmacist (eg, in oncology).

Obstetrics-gynecology pharmacists labeled almost half of the medication orders and pharmacological profiles, which led to an overrepresentation of this population in our results. The model underperformed compared with our expectations in the NICU. For these patients, atypical orders usually occur later during the hospitalization because most medications given in the first hours of life are protocolized. This may explain, at least in part, the poor performance observed in this department.

Some pharmacists indicated that they may have labeled orders based on how clinically appropriate they found them, rather than on how much they corresponded to known patterns. This may have lowered performance metrics.

During the initial wave of the COVID-19 pandemic, changes were made in the organization of care. Hence, patients were sometimes admitted to different medical departments temporarily while waiting for COVID-19 screening results. Pharmacists may have found orders or profiles unusual because they were not as experienced with other specialized therapies. However, the prevalence of this situation was very low in our data.

CONCLUSION

This study evaluated the clinical performance of an ML model capable of identifying atypical medication orders and profiles. It showed better performance for the identification of atypical pharmacological profiles than medication orders. Improving these predictions should be prioritized in future research to maximize clinical impact. The focus groups and the postparticipation survey showed that pharmacists had a positive perception of this technology, although some expressed concerns about consequences on their professional practice. The next step would be to determine whether the same technique could be used in another institution with comparable performance, and we intend to implement our model in our practice for further improvements. This model shows promise for application in hospital clinical pharmacy practice to improve patient care.

FUNDING

The study was not funded.

AUTHOR CONTRIBUTIONS

All authors meet the four ICMJE criteria for authorship and all contributors meeting these criteria are listed as authors. No other person has made a substantial contribution to this manuscript.

SUPPLEMENTARY MATERIAL

Supplementary material is available at Journal of the American Medical Informatics Association online.

Supplementary Material

ocab071_Supplementary_Data

ACKNOWLEDGEMENTS

We thank Pierre Snell, MSc, from Université Laval who provided insight and expertise during model development.

CONFLICT OF INTEREST STATEMENT

None of the authors have any conflicts of interest in relation to the present study.

DATA AVAILABILITY STATEMENT

The model training and evaluation code used for this project are available online (https://github.com/pharmai-lab/PharmAI_2). The model selection dataset can be shared upon request, under a data-sharing agreement.

REFERENCES

1. Nelson SD, Walsh CG, Olsen CA, et al. Demystifying artificial intelligence in pharmacy. Am J Health Syst Pharm 2020; 77 (19): 1556–70.
2. Flynn A. Using artificial intelligence in health-system pharmacy practice: finding new patterns that matter. Am J Health Syst Pharm 2019; 76 (9): 622–7.
3. Tribble DA. Automating order review is delegation, not abdication. Am J Health Syst Pharm 2009; 66 (12): 1078–9.
4. Poikonen J. An informatics perspective on nearly universal prospective order review. Am J Health Syst Pharm 2009; 66 (8): 704–5.
5. Gorbach C, Blanton L, Lukawski BA, Varkey AC, Pitman EP, Garey KW. Frequency of and risk factors for medication errors by pharmacists during order verification in a tertiary care medical center. Am J Health Syst Pharm 2015; 72 (17): 1471–4.
6. Bagri H, Dahri K, Legal M. Hospital pharmacists' perceptions and decision-making related to drug-drug interactions. Can J Hosp Pharm 2019; 72 (4): 288–94.
7. Bhakta SB, Colavecchia AC, Haines L, Varkey D, Garey KW. A systematic approach to optimize electronic health record medication alerts in a health system. Am J Health Syst Pharm 2019; 76 (8): 530–6.
8. Segal G, Segev A, Brom A, Lifshitz Y, Wasserstrum Y, Zimlichman E. Reducing drug prescription errors and adverse drug events by application of a probabilistic, machine-learning-based clinical decision support system in an inpatient setting. J Am Med Inform Assoc 2019; 26 (12): 1560–5.
9. Schiff GD, Volk LA, Volodarskaya M, et al. Screening for medication errors using an outlier detection system. J Am Med Inform Assoc 2017; 24 (2): 281–7.
10. Corny J, Rajkumar A, Martin O, et al. A machine learning-based clinical decision support system to identify prescriptions with a high risk of medication error. J Am Med Inform Assoc 2020; 27 (11): 1688–94.
11. Woods AD, Mulherin DP, Flynn AJ, Stevenson JG, Zimmerman CR, Chaffee BW. Clinical decision support for atypical orders: detection and warning of atypical medication orders submitted to a computerized provider order entry system. J Am Med Inform Assoc 2014; 21 (3): 569–73.
12. Santos HDPD, Ulbrich A, Woloszyn V, Vieira R. DDC-outlier: preventing medication errors using unsupervised learning. IEEE J Biomed Health Inform 2019; 23 (2): 874–81.
13. Akcay S, Atapour-Abarghouei A, Breckon TP. GANomaly: semi-supervised anomaly detection via adversarial training. arXiv:1805.06725 [preprint]. November 13, 2018.
14. Otsu N. A threshold selection method from gray-level histograms. IEEE Trans Syst Man Cybern 1979; 9 (1): 62–6.
15. Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: machine learning in Python. J Mach Learn Res 2011; 12: 2825–30.
16. Robin X, Turck N, Hainard A, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics 2011; 12: 77.
17. Saito T, Rehmsmeier M. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS One 2015; 10 (3): e0118432.
18. Payrovnaziri SN, Chen Z, Rengifo-Moreno P, et al. Explainable artificial intelligence models using real-world electronic health record data: a systematic scoping review. J Am Med Inform Assoc 2020; 27 (7): 1173–85.
19. Chen JH, Alagappan M, Goldstein MK, Asch SM, Altman RB. Decaying relevance of clinical data towards future decisions in data-driven inpatient clinical order sets. Int J Med Inform 2017; 102: 71–9.


