AMIA Annual Symposium Proceedings. 2020 Mar 4;2019:428–437.

Engaging Pharmacists to Crowdsource a Fine-grained Medication Risk Scale: An Initial Measurement Study Using Paired Comparisons of Medications

Allen J Flynn 1, Greg Farris 1, George Meng 1, Jack Allan 1, Sara Kurosu 1, Natalie Lampa 1, Koki Sasagawa 1
PMCID: PMC7153145  PMID: 32308836

Abstract

A coarse classification of medications into two risk categories, one for high-risk medications and one for all others, allows people to focus safety improvement work on medications that carry the highest risks of harm. However, such coarse categorization does not distinguish the relative risk of harm for the majority of medications. To begin to develop a more fine-grained measurement scale for the relative risk of harm spanning many medications, we performed an experiment with 18 practicing pharmacists. Each pharmacist-participant made 210 paired comparisons of 21 commonly prescribed medications to reveal a subjective scale of perceived medication worrisomeness (PMW). Statistical analyses of their collective judgments of medication pairs differentiated five levels of PMW. This study illuminates one path towards a fine-grained medication risk scale based on PMW. It also shows how the method of paired comparisons can be used to remotely crowdsource expert knowledge in support of learning health systems.

Introduction

Current approaches to medication safety typically focus on mitigating risks from using high-alert medications. According to the Institute for Safe Medication Practices (ISMP), a high-alert medication is “a drug that bears a heightened risk of causing significant patient harm when it is used in error”1,2. Based on evidence about medication errors and adverse drug events, ISMP enumerates individual high-alert medications. One example is promethazine injection, which has a low pH and therefore causes severe injuries in cases of extravasation3. ISMP also specifies that certain therapeutic drug classes are comprised only of high-alert medications, e.g., the opioid therapeutic drug class1.

It is useful to develop policies and procedures for handling high-alert medications like promethazine and opioids4. Pharmacists often lead efforts to develop such policies and procedures within provider organizations5. However, the current binary approach to medication risk categorization, where medications are classified either as high-alert or NOT high-alert, is too coarse to meet growing risk assessment needs in practice. With only these two risk categories, it is impossible to assess the relative risk of harm (RRH) of a specific medication in comparison to others. Also, the current two-category approach to risk categorization is not suitable for determining the composite relative risk of harm (cRRH) of a list of prescribed medications comprising a patient medication regimen. However, multiple studies suggest that relative risk scores for individual prescriptions and whole medication regimens are needed in practice to direct attention to potential problems and to allocate provider effort in more optimal ways6–8.

The broadest goal of this work is to improve medication safety systematically by first developing and then applying a reliable, valid, and fine-grained measurement scale to accurately indicate to prescribers and patients the RRH for every prescribed medication. In the absence of such a scale, pharmacists are devising makeshift scoring tools within electronic health record systems (EHRs) to try to identify patients at the greatest risk for adverse drug events9. This paper reports an initial experiment to realize a new approach to creating a fine-grained medication RRH measurement scale. We show that a crowdsourced scale can be built using paired comparisons and remote knowledge acquisition.

Background and Significance

Besides scoring the relative risk of individual prescriptions and whole medication regimens, there are a growing number of other uses for a reliable, numeric RRH measurement scale for medications9. For example, to implement artificial intelligence (AI) in pharmacy practice safely and incrementally, there is a need to initiate AI solutions in a limited way, beginning with those medications with a low or very low RRH10,11. Also, with a more fine-grained medication RRH scale, it would be possible to use risk scores arising from applying the scale as inputs to try to improve many holistic patient risk models, such as hospital readmission risk models12.

Ideally, large quantities of accurate and current data about the actual harm caused by medications would be used to derive a reliable and valid measure of RRH for all medications. Unfortunately, high-quality medication error and adverse drug event data are not available in sufficient quantities to build a comprehensive RRH measurement scale in this way. Estimates indicate that less than 20 percent of actual medication errors and near misses are reported13. Such sparse and incomplete data only support the identification of the most harmful, high-alert medications1,2. This paper explores the feasibility of constructing a fine-grained medication RRH measurement scale with data about perceived medication worrisomeness (PMW) collected from practicing pharmacists using the method of paired comparisons.

There are several reasons to believe that pharmacists may develop an expert sense of the relative risk of harm for a wide range of medications as they practice pharmacy. First, pharmacists are accountable for therapeutic and other outcomes stemming from the use of a vast array of drugs14. Second, pharmacists’ interest in understanding the full breadth of the medication armamentarium distinguishes their practice from specialty medical and nursing practices. Third, pharmacists often develop risk-based medication safety policies and guidelines and apply them in practice5. Fourth, pharmacists are exposed to adverse drug events, likely adding to their insight about medication risks15.

If pharmacists do develop a sense of RRH of medications through practice, then the method of paired comparisons could offer an efficient and effective way to acquire this expert knowledge from them. The analytic method of using paired comparisons has at least a 90-year history16,17. Paired comparisons are used in many scientific fields to establish reliable and valid rating scales17. The method of paired comparisons is especially useful when no natural measurement scale is available17, which is the case for a scale to measure the RRH of many medications. For this study, we chose to infer RRH from perceived medication worrisomeness (PMW) while recognizing that the RRH of medications is most likely a complex multidimensional concept deserving of further, more detailed examination.

The method of paired comparisons provides a way to rank multiple items. Via experiments, paired comparisons enable us to estimate the probabilities of true ratings of multiple items along a continuum of interest17. First, the set of all pairs of items is defined; then participants are asked to judge items against one another head-to-head in a manner that reveals a subjective continuum. Once these judgments are made, the paired comparison data provided by participants can be fit to specialized linear models, like the Bradley-Terry model18.

In the Bradley-Terry model, paired comparison data are represented in terms of initial probability estimates where

(Eq. 1)   Probability(item_i > item_j) = α_i / (α_i + α_j),

i ≠ j, and α_i and α_j are positive-valued parameters associated with items i and j, for each of the paired comparisons17. These data are then combined for an entire experiment and fitted to a hyperbolic secant probability distribution function. By fitting these data this way, researchers can compute maximum likelihood estimates of the true probability of each item’s performance, along with a corresponding degree of uncertainty17–19. In light of their uncertainties, the maximum likelihood estimates of the true probabilities can then be rank ordered on a single continuum of interest.
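To make Equation 1 concrete, here is a minimal Python sketch that computes the pairwise probability from two worth parameters; the α values and medication names below are invented for illustration and are not estimates from this study.

```python
# A minimal illustration of Eq. 1: given positive "worth" parameters alpha_i
# and alpha_j, the Bradley-Terry model gives the probability that item i is
# chosen over item j. The alpha values below are made up for illustration.

def bt_probability(alpha_i: float, alpha_j: float) -> float:
    """Probability that item i beats item j under the Bradley-Terry model."""
    return alpha_i / (alpha_i + alpha_j)

if __name__ == "__main__":
    # Hypothetical worth parameters for two medications (not study estimates).
    alpha_warfarin, alpha_amoxicillin = 6.0, 0.5
    p = bt_probability(alpha_warfarin, alpha_amoxicillin)
    print(f"P(warfarin judged more worrisome than amoxicillin) = {p:.2f}")  # ~0.92
```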

We designed and executed a paired comparison experiment in which pharmacists made repeated paired comparisons of perceived medication worrisomeness. We did this to demonstrate, in a preliminary way, the feasibility of using this type of experiment to reveal a comprehensive, fine-grained RRH scale for medications by remotely crowdsourcing pharmacists’ expert knowledge of medication concerns using a novel online data collection tool.

Medications generally have different risk profiles20. It follows that we must manage the risk of harm from medications on a medication-specific basis4. We recognize that in many but not all cases, patient-specific factors mediate or moderate the RRH of medications. However, just as we think of the likelihood of medication side effects for all people, we believe it is reasonable and equally necessary to focus on the general risks of harm from medications for everyone.

To better allocate limited clinician resources and mitigate the risk of harm from using medications, a fine-grained measure of the RRH of prescribed medications is needed6–9. To our knowledge, a broadly conceived, reliable, valid, fine-grained measure of RRH for medications does not exist. This study is significant because it begins to address this unmet need in a rigorous, systematic way. We report an initial step towards a comprehensive measure for assessing whether various prescribed medications have RRHs that are very low, low, moderate, high, or very high. To improve safety and spare provider work-time, we hope that in the future such a measure will direct people’s attention toward a wider array of significant medication risks and away from other insignificant medication issues.

Research Questions

For this study, we investigated three research questions. The first two questions, RQ1 and RQ2, represent tests of the major hypothesis proposed by Bradley and Terry for all paired comparison experiments17,18. To answer RQ1 and RQ2, we test whether or not statistically significant differences from random selection exist to rank medications on a subjective scale of perceived medication worrisomeness17,18. The third question, RQ3, addresses how to scale-up an experiment like this one to achieve a comprehensive fine-grained RRH scale for hundreds of medications.

RQ1. To what degree do a group of practicing pharmacists collectively perceive differences in worrisomeness among single medications in a collection of 21 commonly prescribed medications?

RQ2. To what degree do a group of practicing pharmacists collectively indicate differences in worrisomeness among individual therapeutic categories of medications in a collection of 7 common therapeutic categories?

RQ3. What are some key requirements for using the method of paired comparisons to crowdsource a comprehensive relative risk of harm scale spanning hundreds of medications?

Methods

We begin with a relevant but fictional worked example of our analytic method which comes primarily from the work of Bradley and Terry17,18. After the worked example, we describe more details of our specific experimental methods.

Worked Example of the Analytic Method of Paired Comparisons Applied to Fictitious Medication Comparisons

Consider the following fictional experiment where perceived medication worrisomeness (PMW) for 4 medication items is examined using the method of paired comparisons. The 4 medications are aspirin, acetaminophen, ibuprofen, and ketorolac. They are non-opioid analgesics with varying risk profiles21. The set of all pairs of these 4 medications has a cardinality of C(4, 2), or six. These six pairings appear in rows 1 to 6, columns II and III, of Table 1 below.

Table 1.

Fictional data for a worked example of the method of paired comparisons applied to medication choices

 I | II            | III           | IV                 | V
   | 1st in pair   | 2nd in pair   | 1st more worrisome | 2nd more worrisome
 1 | aspirin       | acetaminophen | 8                  | 12
 2 | aspirin       | ibuprofen     | 15                 | 5
 3 | aspirin       | ketorolac     | 2                  | 18
 4 | acetaminophen | ibuprofen     | 17                 | 3
 5 | acetaminophen | ketorolac     | 3                  | 17
 6 | ibuprofen     | ketorolac     | 0                  | 20

Now, imagine that 20 pharmacists compare the six pairs of medications in Table 1 head-to-head by indicating, independently for each pair, which medication in the pair would cause them to worry most for a patient who takes it. From these 20 repetitions, assume the fictional but realistic data shown in columns IV and V of Table 1 arise. The data in the first row show that 8 fictitious pharmacists perceived aspirin (1st in pair) to be more worrisome than acetaminophen while 12 other pharmacists perceived acetaminophen (2nd in pair) to be more worrisome than aspirin.

With the data in Columns IV and V of Table 1, Equation 1 above can be used to generate estimates of the true probabilities describing how pharmacists perceive the relative worrisomeness of medications in each pair. Then, by fitting the data in Columns IV and V to an appropriate statistical model, it is possible to test whether or not any of these six medications differ from one another in overall perceived medication worrisomeness (PMW).

To rank order the four medications, M_1 through M_4, for PMW after arriving at parameter estimates with standard errors for each one, the data in Table 1, columns IV and V, can be fitted to a statistical model for combining paired comparison data. One commonly used model for this comes from Bradley and Terry17,18. Here is an expression of their model in the context of this worked example:

(Eq. 2)   logit[Probability(M_i is more worrisome than M_j)] = λ_i − λ_j,

where λ_i = log(α_i) for all i, and λ_j = log(α_j) for all j, for the entire experiment. Equation 2 is equivalent to Equation 1 because logit(α_i / (α_i + α_j)) = log(α_i / α_j) = λ_i − λ_j. Assuming all comparisons are independent, the parameters {λ_i, λ_j} for the overall PMW of each medication can be estimated by maximum likelihood.

To estimate the parameters λ_1, ..., λ_4, which correspond to M_1, ..., M_4, a system of two maximum likelihood equations can be solved iteratively17. To solve these equations, we used the BradleyTerry2 package for the R statistical computing platform19. What results from this model-fitting procedure are maximum likelihood estimates (MLEs) with standard errors (SEs) for the PMW of aspirin, ibuprofen, and ketorolac; the MLE and SE for acetaminophen are set to zero by convention (Table 2, Columns III and IV). In Column V of Table 2, we then use Firth and de Menezes’ method of calculating quasi-standard errors for these MLEs to allow statistical inferences to be made about the degree of difference in PMW between contrasting pairs of medications22.

Table 2.

Fictional results for a worked example of the method of paired comparisons for 4 medications. Column III shows maximum likelihood estimates for perceived medication worrisomeness from the Bradley-Terry model.

 I | II            | III            | IV                  | V
   | Medication    | PMW MLE        | Standard Error (SE) | Quasi Standard Error
 1 | ketorolac     | 1.86 (highest) | 0.50                | 0.45
 2 | acetaminophen | 0.00           | 0.00                | 0.27
 3 | aspirin       | −0.47          | 0.37                | 0.26
 4 | ibuprofen     | −1.73 (lowest) | 0.45                | 0.35

Generating results like the fictional results in Table 2 above is a goal of this study. At this point, we have completed our review of a fictitious worked example detailing the analytic and statistical methods for this study.
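For readers who want to retrace the worked example, the following Python sketch refits the fictional Table 1 counts using the classic iterative (MM, or Zermelo) algorithm for Bradley-Terry maximum likelihood and reports λ = log(α) with acetaminophen as the zero reference. It is an illustrative stand-in for, not a copy of, the BradleyTerry2 workflow in R used in this study; quasi-standard errors (Table 2, Column V) would additionally require the quasi-variance calculation of Firth and de Menezes22.

```python
# A sketch (not the authors' code) that refits the fictional Table 1 data with
# the standard iterative MM algorithm for Bradley-Terry maximum likelihood,
# then reports lambda = log(alpha) with acetaminophen as the reference item.
import math
from itertools import combinations

# wins[(a, b)] = number of pharmacists judging a more worrisome than b (Table 1).
wins = {
    ("aspirin", "acetaminophen"): 8,    ("acetaminophen", "aspirin"): 12,
    ("aspirin", "ibuprofen"): 15,       ("ibuprofen", "aspirin"): 5,
    ("aspirin", "ketorolac"): 2,        ("ketorolac", "aspirin"): 18,
    ("acetaminophen", "ibuprofen"): 17, ("ibuprofen", "acetaminophen"): 3,
    ("acetaminophen", "ketorolac"): 3,  ("ketorolac", "acetaminophen"): 17,
    ("ibuprofen", "ketorolac"): 0,      ("ketorolac", "ibuprofen"): 20,
}
items = ["aspirin", "acetaminophen", "ibuprofen", "ketorolac"]

def fit_bradley_terry(items, wins, iterations=1000):
    """Return alpha estimates via the standard MM (minorization) updates."""
    alpha = {i: 1.0 for i in items}
    total_wins = {i: sum(wins[(i, j)] for j in items if j != i) for i in items}
    n = {(i, j): wins[(i, j)] + wins[(j, i)] for i, j in combinations(items, 2)}
    n.update({(j, i): v for (i, j), v in list(n.items())})
    for _ in range(iterations):
        new_alpha = {}
        for i in items:
            denom = sum(n[(i, j)] / (alpha[i] + alpha[j]) for j in items if j != i)
            new_alpha[i] = total_wins[i] / denom
        # Normalize so the product of alphas is 1 (the scale is arbitrary).
        g = math.exp(sum(math.log(a) for a in new_alpha.values()) / len(items))
        alpha = {i: a / g for i, a in new_alpha.items()}
    return alpha

alpha = fit_bradley_terry(items, wins)
ref = math.log(alpha["acetaminophen"])          # acetaminophen is the reference
for med in sorted(items, key=lambda m: -alpha[m]):
    print(f"{med:>13}: lambda = {math.log(alpha[med]) - ref:+.2f}")
```

With these fictional counts, the printed λ estimates land close to Column III of Table 2 (ketorolac highest, ibuprofen lowest), since the MM iteration converges to the same maximum likelihood solution.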

Medication and Therapeutic Category Selection

For this study, we used the two criteria listed directly below to select a total of 21 medications for comparison. The five medications marked with a diamond (♦) are designated as high-alert medications by ISMP.

  1. Each medication appears on the Top 200 most commonly prescribed drugs list for 201823

  2. Each medication is a member of one of these seven therapeutic drug categories or drug classes

    a. Antibiotics (amoxicillin, azithromycin, ciprofloxacin)
    b. Anticoagulants (apixaban♦, rivaroxaban♦, warfarin♦)
    c. Antidepressants (amitriptyline, citalopram, sertraline)
    d. Antidiabetics (glyburide♦, liraglutide, metformin♦)
    e. Antihypertensives (amlodipine, lisinopril, losartan)
    f. Benzodiazepines (alprazolam, clonazepam, lorazepam)
    g. Non-steroidal anti-inflammatories (aspirin, ibuprofen, naproxen)

To reach our complement of 21 medications, those that are listed above in parentheses, we selected three medications from each of seven different therapeutic drug categories. Only very commonly prescribed medications were included in an effort to ensure that pharmacists would be very familiar with all 21 medications. Settling on 21 as the total number of medications gave rise to a total of C(21, 2), or 210, pairs of medications for comparison by each pharmacist-participant. To show that this number of comparisons is workable, we ran several tests and confirmed that pharmacist-participants can comfortably make 210 paired medication comparisons in less than an hour.
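As a quick check on that arithmetic, the short sketch below (with placeholder medication names) enumerates the C(21, 2) = 210 unordered pairs.

```python
# A small check of the combinatorics described above: 21 medications yield
# C(21, 2) = 210 unordered pairs. Medication names here are placeholders.
import math
from itertools import combinations

medications = [f"med_{k:02d}" for k in range(1, 22)]   # stand-ins for the 21 drugs
pairs = list(combinations(medications, 2))

assert math.comb(21, 2) == 210
print(len(pairs))          # 210 head-to-head comparisons per pharmacist
```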

We intentionally included medications for acute problems (e.g., antibiotics) and others for chronic diseases (e.g., antihypertensives). We included controlled substances (i.e., benzodiazepines) and non-controlled substances. Most of the medications come in oral tablets or capsules, but we added one injectable medication (liraglutide). Also, we included concerning medications associated with many adverse drug events (e.g., anticoagulants24) but also generally safe over-the-counter medications (non-steroidal anti-inflammatory drugs called NSAIDs).

Rationale for Not Permitting Ties When Pharmacist-Participants Make Paired Comparisons

We did not permit pharmacist-participants to declare ties in PMW when making paired comparisons. Our rationale for not permitting ties is based on the assumption that no two medications have precisely the same risk profile or carry an identical risk of harm. To check this assumption, we collected timestamps for each paired comparison choice made serially by pharmacist-participants and used this timestamp information to look for any paired comparisons that were difficult or very difficult for the majority of pharmacist-participants to make.

Determination of the Number of Pharmacist-Participants to Recruit

Using the method for estimating experiment size from the second edition of David’s textbook on paired comparisons (1988, p. 109)25, we sought to determine an appropriate number of pharmacist-participants to recruit. Recall that we have a fixed number of 21 medication items (or treatments, t) to be compared and ranked. David indicates that, for t = 21, ensuring a probability of at least 0.95 of selecting the medication perceived to be most worrisome, assuming a parameter difference of at least 0.1 to 0.15 between the most worrisome medication and the next most worrisome medication(s), requires 10 to 20 subjects to make all 210 paired comparisons. On this basis, we sought to recruit between 10 and 20 pharmacist-participants for this study.

Recruitment of Pharmacist-Participants

For this study, we recruited pharmacist-participants who met the following two inclusion criteria.

  1. The pharmacist-participant is licensed and registered to practice pharmacy in at least one U.S. state

  2. As part of their current work role, the pharmacist-participant performs some direct patient care duties

To recruit pharmacist-participants who met the two criteria above, we executed two non-probability sampling strategies. Our primary recruitment strategy was referral or snowball sampling. We sent invitation messages by e-mail to 20 hospital, ambulatory care, and community pharmacists asking them to participate and to invite other pharmacists they know to participate. We deliberately sent these messages to individuals in several states, including Michigan, Ohio, Indiana, Illinois, and Connecticut. Our secondary recruitment strategy was to advertise on the Michigan Pharmacists Association’s public LinkedIn account. We posted three paid messages inviting pharmacists to participate. When pharmacists responded, we shared more information, including a consent form, and offered them a $50 gift card in exchange for an hour of their time spent as a study participant.

Design, Development, and Testing of CrowdSort – An Online Paired Comparison Data Collection Tool

For this study, we developed an online web application tool, or web app, to collect paired comparison data from remote pharmacists (Figure 1). We call this web app CrowdSort because it enables people to participate in sorting or ranking using the method of paired comparisons. With CrowdSort, we collected data when participants joined us remotely from their worksites around the U.S. for individually-scheduled, hour-long conference calls.

Figure 1. View of the CrowdSort web application for making paired comparisons online with a web browser.

CrowdSort’s source code is available at github.com/kgrid-demos/crowdsort. Three team members led CrowdSort development (AF, GM, JA). For the front end, several team members (JA, SK, NL, KS) used HTML, CSS, Vue.js, and JavaScript to establish CrowdSort’s user interface. For the back end, three team members (GF, GM, SK) deployed CrowdSort on a cloud server at Heroku (heroku.com) and then used the mLab Heroku add-on to establish a persistent connection to a database operating on the Heroku platform. Once deployed in this way, the CrowdSort web app posted each paired comparison to our cloud database with a date and timestamp. Three team members (AF, JA, KS) developed and cross-checked two software tools for transforming the raw paired comparison data into a format suitable for analysis using the BradleyTerry2 package19 for the R statistical computing platform.
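As a rough illustration of that transformation step, the sketch below aggregates raw, timestamped comparison records into per-pair win counts of the form a Bradley-Terry fit expects. The record fields shown are assumptions for illustration only, not CrowdSort's actual database schema or the team's actual tools.

```python
# A hedged sketch: raw, timestamped comparison records (field names below are
# assumptions, not CrowdSort's real schema) are aggregated into per-pair win
# counts, i.e., one row per medication pair with counts for each side.
from collections import Counter
from itertools import combinations

raw_records = [  # one record per click, as a web app might store it (illustrative)
    {"participant": "p01", "left": "warfarin", "right": "metformin",
     "chosen": "warfarin", "timestamp": "2019-04-02T14:03:11Z"},
    {"participant": "p01", "left": "aspirin", "right": "warfarin",
     "chosen": "warfarin", "timestamp": "2019-04-02T14:03:19Z"},
    {"participant": "p02", "left": "metformin", "right": "warfarin",
     "chosen": "metformin", "timestamp": "2019-04-03T09:12:40Z"},
]

wins = Counter()
for r in raw_records:
    loser = r["right"] if r["chosen"] == r["left"] else r["left"]
    wins[(r["chosen"], loser)] += 1

meds = sorted({m for r in raw_records for m in (r["left"], r["right"])})
for a, b in combinations(meds, 2):
    # win1 = times `a` was judged more worrisome; win2 = times `b` was.
    print(a, b, wins[(a, b)], wins[(b, a)])
```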

We designed CrowdSort’s user interface so that it would be simple to use. Our user interface design process began by examining the user interfaces of the comparison-making websites either.io and rrrather.com. With the designs of these two existing websites in mind, we created CrowdSort’s similar user interface (Figure 1).

For users to complete each paired comparison using CrowdSort, they simply click on one of the two text options appearing in the two onscreen boxes (Figure 1). These two boxes, highlighted by a shadow effect, are the chief feature of this user interface. For that reason, they have different colors and appear in the middle of the screen.

As an example of a paired comparison, in Figure 1 above, a single comparison is portrayed between the antibiotic ciprofloxacin in the blue box on the left and the antidiabetic drug metformin in the brown box on the right.

After the user clicks on one box or the other to make their choice, a new pair of medications is displayed until all 210 paired comparisons have been made. This first version of CrowdSort, which is accessible online at crowdsort.herokuapp.com, is designed NOT to allow users to indicate ties in perceived medication worrisomeness.

To mitigate order effects, whenever a web browser reloads CrowdSort, both the order of the 210 pairs and their left-right position onscreen are randomly shuffled. To mitigate effects of using colors to distinguish the two onscreen boxes visually, we chose the colors brown and blue because they are NOT safety related colors, like red, orange, yellow, or green. The two colors also randomly shift between left and right as the user makes paired comparisons.
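CrowdSort performs this shuffling in the browser with JavaScript; the Python sketch below, using placeholder medication names, simply illustrates the two randomizations described above (pair order and left-right position).

```python
# Illustration only: randomize the order of the 210 pairs and, independently,
# which medication appears on the left, so neither presentation order nor
# screen side is systematic. Medication names are placeholders.
import random
from itertools import combinations

medications = [f"med_{k:02d}" for k in range(1, 22)]    # placeholder names

pairs = list(combinations(medications, 2))
random.shuffle(pairs)                                   # randomize pair order
presented = [(a, b) if random.random() < 0.5 else (b, a) for a, b in pairs]

for left, right in presented[:3]:
    print(f"left box: {left:>8}   right box: {right}")
```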

To test CrowdSort, four research team members worked to develop, document, and run tests of this system, uncovering and fixing software defects in the process (JA, SK, NL, KS). Then, to confirm CrowdSort was ready to support remote data collection, we conducted two trial runs with University of Michigan student pharmacists. Because these two trial runs were both successful, data collection activities commenced.

Prompt to Elicit Pharmacists’ Thoughts About Perceived Medication Worrisomeness

For this study, we gave careful consideration to the prompt and instructions for pharmacist-participants who performed the comparisons of 210 pairs of medications. We prompted them to select which medication in each pair would cause them to worry more than the other if the medication was indicated and prescribed for one of their patients (Figure 1). This prompt was devised to elicit pharmacists’ initial reactions to each pair of medications in a way that does not suggest a correct answer. As much as possible, we wanted to avoid having the task of making comparisons feel like taking a test or exam. We deliberately avoided the term “risk” in the prompt because, unlike “worry”, which is subjective, “risk” could elicit objective or legalistic meanings in the minds of participants. We assume that the concepts of perceived medication worrisomeness and perceived medication risk are closely and positively correlated.

Results

This section begins with several overall results followed by results which address each research question, RQ1-RQ3.

A total of 18 pharmacist-participants from seven different states were recruited for this study. Eight men and 10 women were included. Seven pharmacist-participants were community pharmacists and 11 were hospital pharmacists. All 18 pharmacist-participants completed the data collection process by using the CrowdSort web app to perform 210 paired comparisons. Their work resulted in a dataset with 3,780 paired comparisons of PMW, including 360 comparisons for each of the 21 medications. After removing all comparisons of drugs in the same therapeutic category, we found that pharmacist-participants indirectly made 3,402 paired comparisons of the seven therapeutic categories listed above.

We analyzed the time required for pharmacist-participants to make 210 paired medication comparisons with the CrowdSort web app. On average, they completed this task in 16 ± 8 minutes (range: 9 to 41 minutes). To detect whether some comparisons might have been more difficult than others, we used the timestamps for each comparison to look for paired comparisons that took pharmacist-participants three or five times longer than their average comparison-making interval. No pairing required significantly more time than average for the majority of participants.
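The following sketch illustrates this timing check with invented intervals for one hypothetical participant; it flags any comparison that took at least three (or five) times that participant's average interval.

```python
# A hedged sketch of the timing check described above. The timing data are
# invented for illustration; real data would come from CrowdSort timestamps.
from statistics import mean

seconds = [4.2, 3.8, 5.1, 3.5, 4.9, 4.4, 3.9, 5.0, 4.6, 22.0]  # invented timings
pairs = [f"pair_{k:03d}" for k in range(1, len(seconds) + 1)]   # placeholder labels

avg = mean(seconds)
for label, s in zip(pairs, seconds):
    if s >= 3 * avg:
        difficulty = "very difficult?" if s >= 5 * avg else "difficult?"
        print(f"{label}: {s:.1f}s (participant average {avg:.1f}s) -> {difficulty}")
```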

RQ1. To what degree do a group of practicing pharmacists collectively perceive differences in worrisomeness among single medications in a collection of 21 commonly prescribed medications?

Results relevant to answering RQ1 are illustrated and summarized in Figure 2 below. We found that our small crowd of 18 practicing pharmacists collectively perceive significant differences in perceived medication worrisomeness (PMW) among the 21 medications compared. The anticoagulant warfarin is perceived as most worrisome overall. It was judged more worrisome in 346/360 (96%) of its head-to-head pairings. The hypertension medication amlodipine and the antibiotic amoxicillin were least worrisome overall. These two medications were judged more worrisome in only 48/360 (13%) and 58/360 (16%) of their head-to-head pairings, respectively. Using quasi standard errors to enable medication-by-medication comparisons, our results reveal five emerging PMW groups (Figure 2). Between the groups of medications perceived to be least worrisome and most worrisome, three intermediate groups can be seen.

Figure 2. PMW estimates with quasi-standard errors for the 21 medications abbreviated on the X-axis. Diamonds (♦) indicate ISMP high-alert medications. Select estimates are noted. Percentages are the empirical proportion of pairings in which amlodipine, amoxicillin, and warfarin were judged more worrisome.

As noted, warfarin♦, which is marked with a diamond because it is an ISMP high-alert medication, has the highest estimated PMW. Next, in the High Intermediate group, two other high-alert anticoagulants, apixaban♦ and rivaroxaban♦, appear (Figure 2). The Intermediate group includes all 3 benzodiazepines (alprazolam, clonazepam, lorazepam), an antidepressant (amitriptyline), an antibiotic (ciprofloxacin), and a high-alert antidiabetic (glyburide♦).

The Low Intermediate group includes all 3 NSAIDS (aspirin, ibuprofen, naproxen), an antibiotic (azithromycin), two antidepressants (citalopram, sertraline), two antidiabetic drugs (liraglutide, metformin♦), and two antihypertensives (lisinopril, losartan). Finally, amoxicillin and amlodipine have the lowest estimated PMW scores (Figure 2).

We assessed goodness of fit and found that our medication paired comparison data fit the Bradley-Terry model well. To do this, we computed standardized residuals for each of the 210 paired comparisons. Upon inspection, we found the distribution of these residuals to be platykurtic and slightly skewed compared to the normal distribution (Kurtosis = 2; Skewness = 0.25; W/S = 7.5; n = 210). In addition, we found that 171 of 210 (81%) of the fitted probabilities fell within one standard deviation of the Standardized Residual Mean (SRM), while 35 (17%) fell between 1 and 2 standard deviations of the SRM and four (2%) fell beyond 2 standard deviations from the SRM.
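As an illustration of this residual summary, the sketch below computes skewness, kurtosis, and the share of residuals within 1 and 2 standard deviations of the residual mean; the residuals here are simulated stand-ins, not the study's fitted residuals.

```python
# Residual-summary sketch with invented residuals: sample skewness and
# (Pearson) kurtosis, where a normal distribution has skewness 0 and
# kurtosis 3, plus the share of residuals within 1 and 2 SD of the mean.
import random
from statistics import mean, pstdev

random.seed(0)
residuals = [random.gauss(0, 1) for _ in range(210)]   # stand-ins, one per pair

m, s = mean(residuals), pstdev(residuals)
skewness = mean(((r - m) / s) ** 3 for r in residuals)
kurtosis = mean(((r - m) / s) ** 4 for r in residuals)          # normal ~ 3
within_1sd = sum(abs(r - m) < s for r in residuals) / len(residuals)
within_2sd = sum(abs(r - m) < 2 * s for r in residuals) / len(residuals)

print(f"skewness={skewness:.2f} kurtosis={kurtosis:.2f} "
      f"within 1 SD={within_1sd:.0%} within 2 SD={within_2sd:.0%}")
```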

RQ2. To what degree do a group of practicing pharmacists collectively indicate differences in worrisomeness among individual therapeutic categories of medications in a collection of 7 common therapeutic categories?

Results relevant to this question are illustrated and summarized in Figure 3 below. By making 210 paired comparisons of medications associated with the seven therapeutic categories, participant-pharmacists indirectly indicated that significant differences exist in their minds regarding the relative worrisomeness of these therapeutic drug categories.

Figure 3. ITCW estimates with quasi-standard errors for the 7 therapeutic categories abbreviated on the X-axis. Percentages are the empirical proportion of pairings in which antihypertensives and anticoagulants were judged more worrisome.

As shown below in Figure 3, we infer from the paired comparison data for medications that the therapeutic category of medications perceived to be most worrisome is the anticoagulants category. Pharmacist-participants indicated the anticoagulant in a pair was more worrisome than medications from all other classes 93% of the time. The therapeutic category evidently perceived to be least worrisome was antihypertensives. Pharmacist-participants indicated that the antihypertensive medication in a pair was more worrisome than medications from other therapeutic classes only 21% of the time. Using quasi standard errors to enable category-by-category comparisons, our results reveal a total of four distinct, emerging groups of inferred therapeutic category worrisomeness (ITCW).

As a class, anticoagulants have the highest estimated PMW. Next, in the High Intermediate group, the benzodiazepine therapeutic category appears by itself (Figure 3). The Low Intermediate group includes antibiotics, antidepressants, antidiabetics, and NSAIDs. The least worrisome therapeutic category in these results is the antihypertensives category.

We assessed goodness of fit and found that our inferred paired comparison data for the seven therapeutic categories fit the Bradley-Terry model moderately well. To check goodness of fit, we computed standardized residuals for each of the C(7, 2), or 21, therapeutic category comparisons. Upon inspection, we found the distribution of this small set of 21 residuals to be very platykurtic and slightly skewed compared to the normal distribution (Kurtosis = 0.2; Skewness = -0.5; W/S = 3.9; n = 21). However, we also found that all 21 of the fitted probability values from the Bradley-Terry model for the therapeutic category comparisons fell within one standard deviation of the Standardized Residual Mean (SRM).

RQ3. What are some key requirements for using the method of paired comparisons to crowdsource a comprehensive relative risk of harm scale spanning hundreds of medications?

By doing this study, we learned about three sets of requirements that must be met to scale up paired comparison experiments with the CrowdSort web app to achieve large-scale crowdsourcing of expert knowledge by this method.

Access Requirements

The first set of requirements that must be met are access requirements. The first version of the CrowdSort web app surfaced several access issues which we overcame. However, CrowdSort is currently not accessible enough to be used for a large-scale experiment. Future versions of the CrowdSort web app need to enable paired comparison data collection on most web browsers and not only on the Google Chrome web browser that we used. Also, to scale up the use of CrowdSort, it needs to be upgraded to a responsive design that can be accessed on various devices including smart phones, tablets, laptops, and desktop computers. Furthermore, CrowdSort must be secured using a secure sockets layer (SSL) certificate. Also, CrowdSort must be made accessible to multiple simultaneous users.

Authorization Requirements

The second set of requirements relate to user authorization. For this study, we collected background information and made inquiries to ensure that every participant was a licensed pharmacist working in a role with patient care duties. Because we collected data remotely but synchronously during live video conference calls, we could manually authorize pharmacists to participate. To expand the use of CrowdSort, a more robust and scalable method of validating user credentials is required. We are presently exploring options for automating the user authorization process.

Paired Comparison Task Division Requirements

The third set of requirements are intrinsic to the method of paired comparisons. Developing a fine-grained RRH scale for the majority of commonly used medications requires 300 to 400 medications to be compared. Therefore, the number of head-to-head comparisons that must be made would range from 44,850 to 79,800. So many thousands of paired comparisons are far too many for any individual pharmacist to complete. Instead, we are exploring ways to divide this sizeable comparison-making task among a large number of pharmacist-participants, such as the cyclic paired comparison experiment designs described by David25.
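The sketch below checks that arithmetic and shows one naive way to split the comparisons into session-sized blocks. It is only a toy illustration: the cyclic designs described by David25 handle balance and replication far more carefully, and each block would still need judgments from multiple pharmacists.

```python
# A quick check of the scale-up arithmetic and a naive, illustrative way to
# divide the work into session-sized blocks of 210 comparisons (the block
# size mirrors this study; the round-robin split is not a formal design).
import math
from itertools import combinations

for n_meds in (300, 400):
    print(n_meds, "medications ->", math.comb(n_meds, 2), "pairs")
# 300 medications -> 44850 pairs; 400 medications -> 79800 pairs

meds = [f"med_{k:03d}" for k in range(300)]            # placeholder names
all_pairs = list(combinations(meds, 2))
block_size = 210                                       # one session's worth
blocks = [all_pairs[i:i + block_size] for i in range(0, len(all_pairs), block_size)]
print(len(blocks), "sessions of up to", block_size, "comparisons each")  # 214
```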

Discussion

We began with a goal of demonstrating the feasibility of crowdsourcing a fine-grained scale for measuring the RRH of many medications. We confirmed that the method of paired comparisons can be used to acquire expert judgments about perceived medication worrisomeness remotely from a diverse group of practicing pharmacists. We are encouraged by the finding that our participants could complete 210 head-to-head medication comparisons in an average of 16 minutes. This is evidence that, with effective software and incentives, the approach we have taken could scale up to support a large crowdsourcing program to estimate RRH for hundreds of medications.

In terms of granularity, our results for PMW (Figure 2) and for ITCW (Figure 3) demonstrate that it is possible to rank medications into multiple risk-informed PMW categories. By systematically probing pharmacists’ mental models of medication worrisomeness, we found statistically meaningful differentiation among five groups of medications and four groups of therapeutic drug classes. Perhaps more compelling is that these credible results came from engaging just 18 pharmacists. We hypothesize that more groups of medications will emerge on the PMW scale as additional paired comparison data enable more precise estimates. In addition, we did not seek to find definite lower or upper bounds for PMW. More work is needed to determine which medications are perceived to be least and most worrisome of all.

Our PMW scale has four different levels containing high-alert medications (Figure 2). Also, PMW does NOT consistently align with the distinction between prescription-only (i.e., Legend) versus over-the-counter (OTC) drugs. In our results, two Legend medications, amlodipine and amoxicillin, had the lowest PMW estimates. Meanwhile, three OTC medications – aspirin, ibuprofen (Motrin), and naproxen (Aleve) – had remarkably higher PMW estimates than amlodipine and amoxicillin. We did not ask the pharmacist-participants why they perceive two Legend medications to be less worrisome than three OTC drugs. It could be that, taken together, the relatively wide therapeutic windows of amoxicillin and amlodipine, the typically short duration of amoxicillin use, and the fact that amlodipine (Norvasc) is generally well tolerated, all led to these two medications being ranked least worrisome of the 21 in the study set.

It is clear that the 18 pharmacist-participants perceived anticoagulants to be the most worrisome of the study set, especially warfarin. The finding of high PMW and ITCW for anticoagulants may reflect nationwide safety problems with serious but avoidable bleeding events caused by warfarin and other anticoagulants24. These findings suggest that a prescription for any oral anticoagulant may raise a mental warning flag in the minds of practicing pharmacists.

This study builds on our prior published work examining how pharmacists currently develop medication-related risk scores and patient medication regimen complexity scores. Presently, many attempts to develop medication risk and complexity scores are localized at single organizations and involve assigning points to a small number of concerning medications, often with the most points going to those medications with the narrowest therapeutic windows. Few of these locally-devised medication-related scoring mechanisms have been validated9. In contrast, our approach is to develop and validate a rigorous, comprehensive RRH scale for medications by combining the judgments of pharmacists from many organizations using the method of paired comparisons. In this manner, we seek to develop a robust, comprehensive, reliable, and valid RRH scale for assessing prescriptions and whole medication regimens.

More generally, this study highlights a potential role for the paired comparisons method in large-scale knowledge acquisition for learning health systems. To be fully successful, we believe learning health systems will need methods and mechanisms like CrowdSort to routinely and methodically source expertise from human practice on a grand scale. The new knowledge that results from crowdsourcing will need to be carefully tested and curated to make it useful.

Limitations

This study is limited in a variety of ways. First and foremost, a single experiment with 18 pharmacist-participants does not allow us to generalize our findings about the worrisomeness of medications in any meaningful way. We plan to do future work to expand, validate, and test the impact of these findings. Another limitation is that we only included 21 medications, yet 300 to 400 medications need to be rank-ordered in terms of RRH to arrive at a useful measurement instrument. Finally, we were unable to include participant interviews in this study and so we cannot explain the logic of our pharmacist-participants with respect to the task of comparing the worrisomeness of medications head-to-head.

Conclusion

By engaging 18 practicing pharmacists from around the United States in a paired comparison experiment, we ascertained their individual and collective judgments about the perceived medication worrisomeness of 21 commonly prescribed medications. We performed data collection remotely using a web application developed for this study called CrowdSort. We found that the pharmacists in our sample perceive the relative worrisomeness of medications in a systematic way, giving rise to five distinct groupings of medications. This result shows it is feasible to group medications into multiple risk-related categories, transcending the binary categories of high-alert and NOT high-alert that we use in practice today. Further, we showed that some therapeutic categories are more worrisome than others in the minds of our pharmacist-participants. In preparation for future work, we identified several key requirements for scaling up this initial experiment. This study demonstrates the feasibility of developing a comprehensive fine-grained measure of the relative risk of harm by crowdsourcing expert knowledge from pharmacists.

Acknowledgements

We thank our pharmacist-participants. We are grateful to pharmacist Bruce Chaffee for motivating this work. We thank Rachel Kuo and Melanie Johnson for helping us test the Crowdsort web app. For guidance on our statistical approach, we thank Kerby Shedden and Corey Lester. For their general help in preparation for this experiment, we recognize and thank Betty Chaffee, Peter Boisvert, Charles P. Friedman, Nate Gittlen, and Brooke Raths.


References

  • 1.Institute for Safe Medication Practices (ISMP) High-alert medications in acute care settings. Available from: www.ismp.org/recommendations/high-alert-medications-acute-list. [Google Scholar]
  • 2.Institute for Safe Medication Practices (ISMP) High-alert medications in community/ambulatory settings. Available from: www.ismp.org/recommendations/high-alert-medications-community-ambulatory-list. [Google Scholar]
  • 3.Grissinger M. Preventing serious tissue injury with intravenous promethazine (Phenergan) P&T Pharm. Ther. 2009;34:175–6. [PMC free article] [PubMed] [Google Scholar]
  • 4.Graham S, Copp MP, Kostek NE, Crawford B. Implementation of a high-alert medication program. Perm. J. 2008;12:15–22. doi: 10.7812/tpp/07-135. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Billstein-Leber M, Carrillo JD, Cassano AT, Robertson JJ. ASHP guidelines on preventing medication errors in hospitals. Am. J. Heal. Pharm. 2018;75:1439–1517. doi: 10.2146/ajhp170811. [DOI] [PubMed] [Google Scholar]
  • 6.Damlien L, Davidsen N, Nilsen M, Godo A, Moger T, Viktil KK. Drug safety at admission to emergency department: an innovative model for PRIOritizing patients for MEdication Reconciliation (PRIOMER) Eur. J. Emerg. Med. 2017;24:333–339. doi: 10.1097/MEJ.0000000000000355. [DOI] [PubMed] [Google Scholar]
  • 7.George J, Phun YT, Bailey MJ, Kong DCM, Stewart K. Development and validation of the medication regimen complexity index. Ann. Pharmacother. (2004);38:1369–1376. doi: 10.1345/aph.1D479. [DOI] [PubMed] [Google Scholar]
  • 8.Vande Griend JP, Saseen JJ, Bislip D, Emsermann C, Conry C, Pace WD. Prioritization of patients for comprehensive medication review by a clinical pharmacist in family medicine. J. Am. Board Fam. Med. 2015;28:418–424. doi: 10.3122/jabfm.2015.03.140303. [DOI] [PubMed] [Google Scholar]
  • 9.Flynn A, Mo H, Nguyen JV, Chaffee BW. Initial study of clinical pharmacy work prioritization tools. Am. J. Heal. Pharm. 2018;75:1122–1131. doi: 10.2146/ajhp170398. [DOI] [PubMed] [Google Scholar]
  • 10.Wheeler S, Patka J. Analysis of the autoverification process of medication orders placed in the emergency department. Crit. Care Med. 2018;46:600. [Google Scholar]
  • 11.Flynn AJ. Opportunity cost of pharmacists’ nearly universal prospective order review. Am. J. Health. Syst. Pharm. 2009;66:668–70. doi: 10.2146/ajhp070671. [DOI] [PubMed] [Google Scholar]
  • 12.Kansagara D, Englander H, Salanitro A, Kagen D, Theobold C, Freeman M, Kripalani S. Risk prediction models for hospital readmission: a systematic review. JAMA. 2011;306:1688–1698. doi: 10.1001/jama.2011.1515. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Levinson DR. Hospital incident reporting systems do not capture most patient harm. 2012 Jan; AHRQ. [Google Scholar]
  • 14.McBane SE, Dopp AL, Abe A, Benavides S, Chester EA, Dixon DL, et al. Collaborative drug therapy management and comprehensive medication management. Pharmacotherapy. 2015;35 doi: 10.1002/phar.1563. [DOI] [PubMed] [Google Scholar]
  • 15.Gavaza P, Brown CM, Lawson KA, Rascati KL, Wilson JP, Steinhardt M. Examination of pharmacists’ intention to report serious adverse drug events (ADEs) to the FDA using the theory of planned behavior. Res. Soc. Adm. Pharm. 2011;7:369–382. doi: 10.1016/j.sapharm.2010.09.001. [DOI] [PubMed] [Google Scholar]
  • 16.Thurstone L. Psychophysical analysis. Am. J. Pyschology. 1927;38:368–389. [PubMed] [Google Scholar]
  • 17.Bradley RA. Paired comparisons: some basic procedures and examples, in Handbook of Statistics, Volume , eds. Krishnaiah P. and Sen P. 1984:299–326. Elsevier Science Publishers. [Google Scholar]
  • 18.Bradley RA, Terry ME. Rank analysis of incomplete block designs: I. The method of paired comparisons. 1952;39:324–345. [Google Scholar]
  • 19.Turner H, Firth D. Bradley-Terry models in R: the BradleyTerry2 package. Journal of Statistical Software. 2012 May 24;48(9) [Google Scholar]
  • 20.DailyMed. U.S. National Library of Medicine Website (2019) Available from: dailymed.nlm.nih.gov/dailymed/index.cfm. [Google Scholar]
  • 21.Graumlich JF. Preventing gastrointestinal complications of NSAIDs. 2001;109:117–128. doi: 10.3810/pgm.2001.05.931. [DOI] [PubMed] [Google Scholar]
  • 22.Firth D, De Menezes RX. Quasi-variances. Biometrika. 2004;91:65–80. [Google Scholar]
  • 23.Top 200 Drugs List of 2019. Clincalc.com. Available from: clincalc.com/DrugStats/Top200Drugs.aspx. [Google Scholar]
  • 24.United States Department of Health and Human Services (USDHHS) National action plan for adverse drug event prevention. 2014 USDHHS. [Google Scholar]
  • 25.David HA. Oxford University Press; 1988. The method of paired comparisons. [Google Scholar]
