Abstract
Objective Medication-indication information is a key part of the information needed to provide decision support for, and promote the appropriate use of, medications. However, this information is not readily available to end users, and many resources contain it only in unstructured form (free text). A number of public knowledge bases (KBs) containing structured medication-indication information have been developed over the years, but these resources have not yet been compared directly.
Material and Methods We conducted a systematic review of the literature to identify all medication-indication KBs and critically appraised these resources in terms of their scope as well as their support for complex indication information.
Results We identified 7 KBs containing medication-indication data. They notably differed from each other in terms of their scope, coverage for on- or off-label indications, source of information, and choice of terminologies for representing the knowledge. The majority of KBs had issues with granularity of the indications as well as with representing duration of therapy, primary choice of treatment, and comedications or comorbidities.
Discussion and Conclusion This is the first study directly comparing public KBs of medication indications. We identified several gaps in the existing resources, which can motivate future research.
Keywords: drug therapy, off-label use, indication, appropriateness, electronic health records
BACKGROUND
With the wide adoption of electronic health records (EHRs), there has been considerable focus on developing automated solutions that promote the appropriate use of medications. Computerized clinical decision support (CDS) has been successfully used to prevent medication errors and adverse effects.1 Designing CDS solutions to promote appropriate medication use faces 2 challenges: determining the reason for the medication prescription and representing this information. Medications are prescribed to treat or prevent different signs, symptoms, diseases, or conditions, which are collectively known as indications.2 While many indications are relatively simple to represent (eg, insulin is indicated in patients with type 1 diabetes mellitus), some are more complex (eg, bismuth subsalicylate is indicated in the treatment of Helicobacter pylori infection only when combined with antibiotics). At least 1 indication is listed on the drug label for each medication (on-label indications), but many medications are also used to treat or prevent conditions that are not explicitly listed on the drug label (off-label indications). It is estimated that 21% of medication prescriptions are for off-label use,3,4 and this rate is thought to be higher in the pediatric population.5,6
Linking medications to their indications is essential to providing effective care. It has been shown that treatment outcomes and health care quality may improve once such links are made, either manually or electronically.7,8 Medication-indication information is a necessary part of the information needed to determine the appropriate use of medications. Medications can be deemed inappropriate for many other reasons, such as allergies, drug interactions, and side effects. However, it is essential to establish that a medication is being administered for an appropriate indication, which justifies that there is, at a minimum, a need for that drug. Making this determination depends on the availability of a comprehensive, structured knowledge base (KB) of medications and their indications. Such a KB can be used to support order entry, for example, through indication-based prescribing.9 It can also be used to identify cases of potential overuse of medications.10
While public resources containing medication-indication data have been available for a long time (including MedicineNet11 since 1996, DrugBank12 since 2006, and DailyMed13 since 2008), these resources only provide the indications in unstructured form (ie, free text). Several vendors have also developed proprietary datasets that include medication-indication data—MediSpan (Wolters Kluwer, Alphen aan den Rijn, Netherlands), Epocrates (Epocrates Inc, San Mateo, California), and Lexicomp (Lexi-Comp, Inc, Hudson, Ohio)—but these products are difficult to use for research due to associated costs and licensing restrictions. Consequently, there have been a number of efforts toward developing comprehensive KBs of on- and off-label indications of medications for public use.
We aimed to identify studies discussing the design and evaluation of medication-indication KBs, determine the similarities and differences of approaches used in these studies, and identify potential knowledge gaps. We accomplished this by conducting a systematic review of the literature and subsequently evaluating whether these KBs support the level of complexity necessary to comprehensively represent medication-indication information.
METHODS
Systematic review
We searched the titles and abstracts of all articles cited in PubMed, PubMed Central, EMBASE, and CINAHL through September 30, 2014 using the following search strategy: (medication$ OR drug$) AND (indication$ OR “off-label”) AND (resource$ OR knowledge$). The search strategy uses “$” as the wildcard character, and it was modified according to the specifications of each bibliographic database.
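To make the Boolean structure of the search strategy concrete, the sketch below evaluates the query against a single title/abstract string. It is an illustration under assumptions, not how PubMed, EMBASE, or CINAHL implement truncation: the helper names are hypothetical, and a trailing `$` is simply translated into a regular-expression "any word ending" after the stem.

```python
import re

# The search strategy as AND-ed groups of OR-ed terms.
# A trailing "$" marks truncation (any word ending), as in the text.
QUERY = [
    ["medication$", "drug$"],
    ["indication$", "off-label"],
    ["resource$", "knowledge$"],
]


def term_to_regex(term: str) -> str:
    """Translate one search term into a word-anchored regular expression."""
    if term.endswith("$"):
        # Truncated term: the stem followed by any word characters.
        return r"\b" + re.escape(term[:-1]) + r"\w*"
    # Exact term or phrase: word boundary on both sides.
    return r"\b" + re.escape(term) + r"\b"


def record_matches(text: str) -> bool:
    """Return True if the text satisfies at least one term in every group."""
    return all(
        any(re.search(term_to_regex(t), text, re.IGNORECASE) for t in group)
        for group in QUERY
    )
```

For example, "A knowledge base of drug indications" matches all 3 groups, whereas "Drug interactions in a clinical resource" fails the indication group.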
We used the following inclusion and exclusion criteria for study selection: (1) studies without an English abstract were excluded (we did not impose any language restrictions on the full text of the article); (2) only studies that described the design and/or evaluation of a medication-indication KB, terminology, or database were included; and (3) only studies in which the resource was publicly available were included.
Titles and abstracts were reviewed independently by 2 authors (HS and CF), and disagreements were resolved by a third author (HC) acting as arbitrator. We then obtained the full texts of all articles deemed potentially relevant and excluded articles that did not meet the above criteria. For each included article, we also reviewed the reference list to identify any other relevant studies.
Once the final list of included studies was populated, we extracted the following information from each article: name of resource discussed, methods used for developing/evaluating the resource, and the results of such evaluations. We grouped all studies that were about the same resource and analyzed them together.
Evaluation of the KBs
We obtained a copy of each KB identified in the previous step and collected basic descriptive information, including the number and nature of entries, as well as representational characteristics and the scope of knowledge included in the KB. We compared the scope and coverage of the KBs in 3 different ways: based on the number of medications, the number of indications, and the number of medication-indication pairs included. While some KBs defined medications at the level of the main ingredient (IN), others defined them at a finer level of granularity (ie, specific dose forms). To compare these KBs, we brought them all to the same level by mapping every medication concept to its main IN using the RxNorm Application Programming Interface.14 Details of this process are explained in online supplement 1.
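The ingredient-level normalization can be sketched as a lookup that collapses dose-form-level entries onto their main IN and reports the share of unique medications that could be mapped. In this sketch a small hypothetical dictionary stands in for the RxNorm API; in practice each entry would be resolved through an API call (for example, RxNav's getRelatedByType function with tty=IN), and the exact procedure used in the study is the one described in online supplement 1.

```python
from typing import Optional

# Hypothetical stand-in for an RxNorm lookup (drug string -> main ingredient).
TO_INGREDIENT = {
    "atenolol 50 mg oral tablet": "atenolol",
    "alprazolam oral solution": "alprazolam",
    "alprazolam extended release tablet": "alprazolam",
    "budesonide": "budesonide",
}


def to_ingredient(drug: str) -> Optional[str]:
    """Return the main ingredient for a drug string, or None if unmapped."""
    return TO_INGREDIENT.get(drug.strip().lower())


def normalize_kb(pairs):
    """Collapse (drug, indication) pairs to (ingredient, indication).

    Returns the normalized pair set and the fraction of unique drugs mapped.
    """
    unique_drugs = {drug for drug, _ in pairs}
    if not unique_drugs:
        return set(), 0.0
    mapped = {drug: to_ingredient(drug) for drug in unique_drugs}
    coverage = sum(ing is not None for ing in mapped.values()) / len(unique_drugs)
    normalized = {
        (mapped[drug], indication)
        for drug, indication in pairs
        if mapped[drug] is not None
    }
    return normalized, coverage
```

Two alprazolam dose forms listed with the same indication collapse into a single ingredient-level pair, which is what makes KBs defined at different granularities comparable.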
Similarly, while most resources defined the indications using Unified Medical Language System (UMLS) concepts, some only represented the indications as free text. We normalized the free-text indications to the UMLS using the methods described in online supplement 1, and we additionally mapped those concepts to the Systematized Nomenclature of Medicine—Clinical Terms (SNOMED CT). In the end, we identified the main IN for >80% of the unique medications in each KB. In 2 KBs, this number was 100%; and in another 2, it was >98%. Likewise, in all but 1 KB, we successfully normalized >80% of the indications to SNOMED CT; and in 4 KBs, this figure was above 95%.
We used SNOMED CT because we discovered, during manual exploration of the KBs, that indications may be defined at various levels of granularity. We therefore used the hierarchy of concepts in SNOMED CT to assess whether the overlap of KBs would change if some of these differences in granularity were removed. Specifically, we calculated the overlap between medication-indication pairs in 2 ways. First, we counted the number of medication-indication pairs that were shared between 2 KBs once medications and indications were mapped to RxNorm and SNOMED CT. Under this approach, if 1 resource listed only asthma as an indication for budesonide and another listed only intermittent asthma, the 2 resources would share no indications for this drug. Next, we augmented the indications in each KB by assigning the immediate children of each indication (from SNOMED CT) to all drugs associated with that indication, and then repeated the overlap analysis. Under this approach, the 2 resources would share 1 indication for budesonide, because intermittent asthma is an immediate child of asthma in SNOMED CT.
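The 2 overlap calculations can be sketched with plain set operations. In this sketch `CHILDREN` is a hypothetical one-entry fragment of the SNOMED CT hierarchy, not an extract of it, and only immediate children are added, matching the description above.

```python
# Hypothetical SNOMED CT fragment: parent indication -> immediate children only.
CHILDREN = {"asthma": {"intermittent asthma"}}


def expand_with_children(pairs, children):
    """Add (drug, child) for every (drug, indication) pair; no deeper descendants."""
    expanded = set(pairs)
    for drug, indication in pairs:
        for child in children.get(indication, ()):
            expanded.add((drug, child))
    return expanded


def shared(kb_a, kb_b):
    """Medication-indication pairs present in both KBs."""
    return kb_a & kb_b


# The budesonide example from the text.
kb_a = {("budesonide", "asthma")}
kb_b = {("budesonide", "intermittent asthma")}
```

With exact matching, `shared(kb_a, kb_b)` is empty; after expanding both KBs, the pair (budesonide, intermittent asthma) is shared. Restricting the expansion to immediate children keeps the comparison conservative, since adding all descendants could inflate the overlap with very broad parent concepts.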
Additionally, we qualitatively evaluated whether each KB could support the representation of complex indication information. To do so, we asked a group of pharmacists and clinicians to provide a list of complex or challenging indications for commonly prescribed medications. We then organized these indications by type of complexity and identified 8 different types: 4 corresponded to characteristics of the medication, and 4 to characteristics of the indication (table 1). Finally, we evaluated whether the KBs could represent these complexities.
Table 1.
List of complex or challenging indications that were used to evaluate the medication-indication KBs
| Category | Complexity | Example | Description of the example |
|---|---|---|---|
| Characteristics of the medication | Dosage forms | Vancomycin for Clostridium difficile infection | Vancomycin is used intravenously for all of its indications, with the exception of recurrent C difficile infection, in which case it is used orally. Therefore, the KB should associate the indication with the correct dose form of the medication. |
| Characteristics of the medication | Route | Heparin for prevention and treatment of venous thrombosis | The route of administration of heparin is different when it is used for prevention of venous thrombosis (subcutaneously) vs treatment of venous thrombosis (intravenously). |
| Characteristics of the medication | Strength | Heparin for venous thrombosis | The dosage of heparin used for prevention of venous thrombosis is different from (ie, lower than) the dosage used for treatment of venous thrombosis. |
| Characteristics of the medication | Duration | Cystitis vs recurrent cystitis | The appropriate duration of antibiotic therapy for the initial episode of cystitis (3 days) is different from the appropriate duration of therapy for treating recurrent cystitis (5–7 days). |
| Characteristics of the indication | Primary choice | First-line treatment for essential hypertension | Thiazide diuretics, calcium channel blockers, and angiotensin-converting enzyme inhibitors are the preferred medications as the initial therapy for essential hypertension. Other medications that can be used to lower blood pressure (such as direct vasodilators or alpha agonists) are not indicated as first-line therapy. |
| Characteristics of the indication | Comorbidities | Heart failure and asthma | The indicated therapy for heart failure in patients with asthma differs from that in patients without asthma, because nonselective beta-blockers may cause bronchospasm and aggravate asthma. |
| Characteristics of the indication | Prevention vs treatment | Aspirin for prevention of cardiovascular risk | Aspirin is indicated in adults with risk factors for cardiovascular disease. The actual indication is preventative, and aspirin is prescribed in the context of other diseases, such as hypertension or diabetes mellitus, that are risk factors for cardiovascular disease. |
| Characteristics of the indication | Comedication | Bismuth for eradication of Helicobacter pylori | Bismuth compounds are only used for eradication of H pylori in conjunction with other medications, including 2 or more antibiotics. Bismuth is not indicated as monotherapy for H pylori eradication. |
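To show what supporting these complexities would require of a KB, the sketch below attaches the table's qualifiers to a single medication-indication assertion. The schema, its field names, and the `is_supported` check are hypothetical illustrations, not the structure of any reviewed KB; duration and primary choice are stored but not checked here, and only route, required comedications, and excluded comorbidities are evaluated.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class IndicationAssertion:
    """One qualified medication-indication statement (hypothetical schema)."""
    ingredient: str
    indication: str
    relation: str = "may_treat"                      # or "may_prevent"
    route: Optional[str] = None                      # e.g. "subcutaneous"
    dose_form: Optional[str] = None                  # e.g. "oral capsule"
    duration_days: Optional[Tuple[int, int]] = None  # (minimum, maximum)
    first_line: Optional[bool] = None                # primary choice of therapy
    required_comedications: FrozenSet[str] = frozenset()
    excluded_comorbidities: FrozenSet[str] = frozenset()


def is_supported(a: IndicationAssertion, route: Optional[str] = None,
                 comedications=(), comorbidities=()) -> bool:
    """Check a prescribing context against the qualifiers on an assertion."""
    if a.route is not None and route != a.route:
        return False
    if not a.required_comedications <= set(comedications):
        return False
    if a.excluded_comorbidities & set(comorbidities):
        return False
    return True


# Illustrative assertions modeled on table 1 (not taken from any KB).
HEPARIN_PREVENTION = IndicationAssertion(
    "heparin", "venous thrombosis", relation="may_prevent", route="subcutaneous")
BISMUTH_HP = IndicationAssertion(
    "bismuth subsalicylate", "Helicobacter pylori infection",
    required_comedications=frozenset({"tetracycline", "metronidazole"}))
```

Under this sketch, bismuth alone is rejected for H pylori, while the same order accompanied by tetracycline and metronidazole is accepted; a plain (drug, indication) pair cannot express either distinction.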
RESULTS
Our electronic searches retrieved 3791 documents, of which 968 were duplicates (figure 1). From the remainder, 2459 were excluded after reviewing the titles and abstracts, and 42 were marked as potentially relevant. By reviewing the full texts, 8 articles were deemed relevant.15–22 Citation tracking yielded an additional 7 relevant articles.23–28 After applying the inclusion and exclusion criteria and consolidating multiple articles that were about the same resource, 7 public medication-indication KBs were identified and included in this review (table 2).
Figure 1:
Flow diagram of study selection.
Table 2.
Description of publicly available medication-indication KBsa
Abbreviations: KBs, knowledge bases; NDF-RT, National Drug File–Reference Terminology; SPL, structured product label; ID, identifier; EHR, electronic health record; IQR, interquartile range; UMLS, Unified Medical Language System; COSTART, Coding Symbols for a Thesaurus of Adverse Reaction Terms; ICD-9, International Classification of Disease, Ninth Revision.
aShaded cells indicate positive values (ie, SIDER is curated automatically, and NDF-RT includes both on-label and off-label indications).
Below, we first describe the resources based on the review of the literature, and then evaluate the content and structure of the resources as well as their strengths and weaknesses in representing the complexities of medication indications. KBs are described in chronological order based on the earliest publication found for each KB.
Review of the literature
The first public medication-indication KB discussed in the publications is the National Drug File–Reference Terminology (NDF-RT), developed by the US Department of Veterans Affairs.15 The medication-indication data in the NDF-RT was initialized using a list of diseases and drugs frequently co-occurring in abstracts cited in MEDLINE (the list can be obtained from http://mbr.nlm.nih.gov/MRCOC.shtml) and subsequently screened by experts. Apart from indications, the NDF-RT provides additional information about medications, including but not limited to contraindications, pharmacokinetics, physiologic effect, and therapeutic class. The NDF-RT is also included in the UMLS,29 which can facilitate linking NDF-RT concepts to concepts in other medication-related terminologies, such as RxNorm,30 or disease-related vocabularies, such as SNOMED CT.31 The NDF-RT has also been incorporated into RxNorm since June 2010.32
First introduced in 2010, the Side Effect Resource (SIDER)16 is a KB of medications and their adverse effects that also includes indication data.33 Medications are represented using their structured product label (SPL) identifiers, and side effects and indications are coded using Coding Symbols for a Thesaurus of Adverse Reaction Terms (COSTART) terms that have been normalized into the UMLS. Indication information was automatically extracted from the “indications” section of the SPLs and was limited to those COSTART terms that were assigned a semantic type of “Anatomical Abnormality,” “Finding,” or “Natural Phenomenon or Process” in the UMLS.34 The method used for identifying these concepts in the SPLs was based on straightforward string matching (eg, it did not account for negation or other contextual information). Detailed metrics about the accuracy of this method are not available, although a manual review conducted by the authors on a subset of SPLs showed that this method had a sensitivity of 79% in identifying the concepts mentioned in the “Adverse Reactions” section of the SPL (specificity was not reported).
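The consequences of context-free string matching can be illustrated with a minimal matcher. This is not SIDER's actual pipeline; it is a hypothetical sketch that performs case-insensitive whole-word matching of a term list against label text, with no negation or context handling.

```python
import re


def naive_match(text: str, terms) -> set:
    """Return every term found in the text by whole-word, case-insensitive matching.

    No negation detection or context handling is performed.
    """
    found = set()
    for term in terms:
        if re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE):
            found.add(term)
    return found
```

On "Indicated for hypertension. Not indicated for heart failure.", the matcher returns both hypertension and heart failure, so the negated mention becomes a false indication. On "Prophylaxis of rheumatic fever", it returns both rheumatic fever and the nonspecific term fever, the same pattern observed in the SIDER indication data.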
McCoy et al17 used a crowdsourcing approach to infer medication-problem relationships. They used data from an EHR system in which prescribers were required to link each medication to 1 of the patient’s problems at the time of prescription. The authors used frequently co-occurring concepts to develop a medication-indication KB, based on the assumption that frequently recorded medication-problem pairs would likely be valid medication-indication pairs. Using Lexicomp as the reference standard, they evaluated the accuracy of the KB on a subset of 100 randomly selected medication-indication pairs. They also assessed the impact of 2 covariates on the accuracy of the medication-indication pairs: patient link (ie, the number of unique patients for whom a specific medication-indication pair was recorded in the EHR) and link ratio (ie, the number of unique patients for whom a specific medication-indication pair was recorded divided by the number of unique patients for whom that medication and that problem were both recorded [but not necessarily linked in the prescription]). The authors concluded that they could acquire medication-indication pairs that were at least 95% correct by using any of the following criteria: patient link ≥10; patient link ≥2 and link ratio ≥0.2; or patient link ≥3 and link ratio ≥0.1. These pairs were included in the final KB, and they accounted for 76.47% of all medication problems found in their EHR.
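The 2 covariates and the inclusion rule reported by McCoy et al can be written as short functions. The function names and the set-based inputs (sets of patient identifiers) are assumptions made for this sketch; the thresholds are those stated in the text.

```python
def link_ratio(linked_patients: set, med_patients: set, problem_patients: set) -> float:
    """Patients with the pair explicitly linked, divided by patients with both
    the medication and the problem recorded (linked or not)."""
    both = med_patients & problem_patients
    return len(linked_patients) / len(both) if both else 0.0


def meets_threshold(patient_link: int, ratio: float) -> bool:
    """Any 1 of the 3 reported criteria is sufficient for inclusion."""
    return (
        patient_link >= 10
        or (patient_link >= 2 and ratio >= 0.2)
        or (patient_link >= 3 and ratio >= 0.1)
    )
```

For instance, a pair explicitly linked for 2 patients, among 3 patients who had both the medication and the problem recorded, has a link ratio of 2/3 and qualifies under the second criterion.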
Fung et al18 used natural language processing (NLP) to extract drug indication information from SPLs downloaded from the DailyMed website. They tuned their approach to favor sensitivity. After manually evaluating the results for 300 drugs (corresponding to approximately 3500 medication-indication pairs), the authors concluded that their approach achieved a sensitivity of 95% and a specificity of 77% in extracting indications from drug labels. The primary causes of specificity errors were identification of the wrong concept by the NLP system and the treatment of every disease mention as an indication (including diseases that the indication section of the label describes as comorbidities, exceptions, etc).
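Specificity is only defined once the set of negatives is fixed. One plausible framing, assumed here and not necessarily the one used by Fung et al, treats every disease mention in the label as a candidate, so that a mention which is neither extracted nor a true indication is a true negative. The sketch computes sensitivity, specificity, and precision (the measure reported for MEDI) under that assumption.

```python
def evaluate(candidates: set, predicted: set, gold: set) -> dict:
    """Score extraction over a closed set of candidate disease mentions.

    Assumes predicted and gold are both subsets of candidates, so every
    candidate that is neither predicted nor gold counts as a true negative.
    """
    tp = len(predicted & gold)
    fp = len(predicted - gold)
    fn = len(gold - predicted)
    tn = len(candidates - predicted - gold)

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
    }
```

A system that marks every disease mention as an indication reaches perfect sensitivity but zero specificity, which is why labeling comorbidities and exceptions as indications shows up as a specificity error.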
Wei et al19 developed an ensemble medication-indication KB called MEDI using 4 knowledge sources as input: (1) NDF-RT; (2) SIDER; (3) MedlinePlus,35 a website maintained by the National Library of Medicine that offers health information to consumers; and (4) Wikipedia,36 a collaborative encyclopedia on the Internet. The authors determined the accuracy of the KB through manual review by expert physicians and found that medication-indication pairs present in all 4 resources had the highest precision (100%) but very low sensitivity (2%), while pairs present in only 1 resource had lower precision (56%–97%). No single resource had high sensitivity (20%–51%). The authors noted that medication-indication pairs appearing in at least 2 resources had an average precision of 92% and designated them as the “high-precision subset.” This subset had a precision comparable to that of the NDF-RT but provided 66% more medication-indication pairs. In a later study, Wei et al37 used a large clinical dataset and showed that MEDI contained at least 1 indication for 97.25% of medications used in outpatient and inpatient settings (93.80% for the high-precision subset); the medications that were not covered consisted mostly of vaccines, probiotics, nutritional products, and inert INs. The authors did not report the number of prescriptions for which an indication could actually be found in the medical record.
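The "high-precision subset" rule (pairs found in at least 2 of the 4 sources) reduces to counting, for each pair, how many sources contain it. The source names below mirror the text, but the pair contents are hypothetical placeholders, not MEDI data.

```python
from collections import Counter


def source_counts(sources: dict) -> Counter:
    """Map each (drug, indication) pair to the number of sources containing it."""
    counts = Counter()
    for pairs in sources.values():
        counts.update(set(pairs))  # de-duplicate within a source first
    return counts


def high_precision_subset(sources: dict, min_sources: int = 2) -> set:
    """Pairs supported by at least `min_sources` independent sources."""
    return {pair for pair, n in source_counts(sources).items() if n >= min_sources}


# Hypothetical placeholder pairs for the 4 MEDI input sources.
SOURCES = {
    "NDF-RT": {("drug-x", "cond-a"), ("drug-x", "cond-b")},
    "SIDER": {("drug-x", "cond-a")},
    "MedlinePlus": {("drug-y", "cond-c")},
    "Wikipedia": {("drug-x", "cond-a"), ("drug-y", "cond-c")},
}
```

Raising `min_sources` trades sensitivity for precision, which is the trend Wei et al reported between pairs found in 1 source and pairs found in all 4.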
Jung et al20 recently published a study that focused on automated detection of novel off-label drug use. They used a large clinical dataset and explored comentions of drugs and diseases in the same clinical record. They used a support vector machine classifier to identify positive cases of drug usage, and subsequently removed all known on-label and off-label drug uses (ie, medication-indication pairs already listed in the NDF-RT or MediSpan) to limit the scope of their study to novel off-label uses. They also used the side effect list in the SIDER dataset to remove drug-disease pairs that likely co-occurred frequently because the disease is a known adverse effect (and not an indication) of the medication. The authors identified 6142 drug-disease relationships that were categorized as high-confidence novel off-label uses of medications. They then assessed whether the same indication had been listed in the FDA’s Adverse Event Reporting System (FAERS); although FAERS is primarily used for collecting data about adverse effects of medications, each report also includes a field where the intended use of the reported medication can be specified, and previous research had shown that these data can be leveraged to identify indications.21 The authors also assessed whether the novel off-label indications had ever appeared in MEDLINE abstracts. They reported that out of the 6142 novel off-label uses found using their method, 766 (12.5%) appeared 10 or more times in FAERS reports, and 537 (8.7%) were also comentioned in 2 or more abstracts indexed by MEDLINE. However, the final set of novel off-label uses was not manually validated by experts.
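The filtering step described by Jung et al is a set difference: classifier-positive pairs minus known uses minus known adverse effects. The sketch below uses hypothetical placeholder pairs and also reproduces the reported percentages from the counts in the text.

```python
def novel_candidates(predicted: set, known_uses: set, side_effects: set) -> set:
    """Keep only pairs that are neither known indications nor known adverse effects."""
    return predicted - known_uses - side_effects


def pct(part: int, whole: int) -> float:
    """Percentage rounded to 1 decimal place."""
    return round(100 * part / whole, 1)


# Counts reported in the text.
FAERS_SUPPORTED = pct(766, 6142)   # 12.5
MEDLINE_SUPPORTED = pct(537, 6142)  # 8.7
```

Because known uses are subtracted before anything is reported, a gap in the reference KBs (here NDF-RT and MediSpan) would surface as a spurious "novel" off-label use, one reason expert validation of the final set matters.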
Most recently, a group of researchers developed and published a new medication-indication KB called LabeledIn, which aims to address some of the limitations of previous resources.22 LabeledIn is curated using a semiautomated approach in which NLP is used to facilitate manual annotation of SPLs. The researchers used MetaMap38 to process all SPLs found in DailyMed and then presented the results to 2 professional biomedical annotators in color-coded form. They showed that this process significantly reduced the time needed for annotation and that agreement between annotators was high (κ = 88.35%). LabeledIn lists medication-indication relationships not only at the level of the active IN but also at finer levels of granularity, including dose form and drug strength. An evaluation of the completeness of LabeledIn, using SIDER as the reference standard, showed that of all indications found for a random subset of 50 drug labels, 47.5% appeared in both resources, 10.1% appeared only in LabeledIn, and 42.4% were found only in SIDER; most of the indications found only in SIDER were attributed to SIDER's use of less specific terms to describe indications (eg, “infarction” in SIDER vs “myocardial infarction” in LabeledIn).
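The agreement statistic κ corrects raw agreement for agreement expected by chance. The text does not state which κ variant was used; the sketch below assumes Cohen's κ for 2 annotators, computed from each annotator's label distribution.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b) -> float:
    """Cohen's kappa for 2 annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need 2 non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: product of each annotator's marginal label frequencies.
    expected = sum(count_a[k] * count_b[k] for k in set(count_a) | set(count_b)) / (n * n)
    if expected == 1:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)
```

For example, 2 annotators who agree on 3 of 4 binary judgments, with marginals (3, 1) and (2, 2), have 75% raw agreement but a κ of only 0.5, because half of the agreement is expected by chance.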
Analysis of the KBs
Here, we describe which version of each KB was analyzed and how it was structured. We also report any redundancy or issues with the granularity of indications that we observed. The results of the critical appraisal of the contents of the KBs are shown in Table 3.
Table 3.
Evaluation of medication-indication KBs for their capability to support complex or challenging indicationsa
| Complexity | NDF-RT | SIDER | McCoy et al | Fung et al | MEDI | Jung et al | LabeledIn |
|---|---|---|---|---|---|---|---|
| Therapy | + | + | + | + | + | + | + |
| Preventionb | + | +/– | +/– | +/– | +/– | +/– | +/– |
| Comedication | – | – | – | – | – | – | – |
| Dose forms | – | – | + | + | – | – | + |
| Route | – | – | + | + | – | – | + |
| Strength | – | – | + | – | – | – | + |
| Duration | – | – | – | – | – | – | – |
| Primary choice | – | – | – | – | – | – | – |
| Contraindication | + | – | – | – | – | – | – |
Abbreviations: KBs, knowledge bases; NDF-RT, National Drug File–Reference Terminology.
aFor definition and examples of complexity, please refer to table 1. For a description of KBs please refer to table 2.
bAlthough several resources contain examples of preventative indications in their data, only the NDF-RT specifies whether an indication is preventative or therapeutic. Those resources that do not make this delineation are marked as +/–.
Version of KBs analyzed
The National Drug File–Reference Terminology exists as a standalone dataset, but its data can also be acquired through the UMLS or through RxNorm. We used the version included in UMLS 2014AA, in which medications and indications have both been normalized to UMLS concepts. SIDER was updated once, on October 17, 2012, and we used this second version, known as SIDER 2.33 None of the other KBs had been updated by the time of our analysis; therefore, no specific version number is available for them.
Structure
All KBs defined medication-indication associations as simple binary relationships represented in tabular format, with the exception of NDF-RT, which specifies the relationship type as well (either may_treat or may_prevent).
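The practical difference between a typed relationship and a binary pair can be shown in a few lines. The relation names follow the NDF-RT convention mentioned above, but the heparin triples are illustrative, not NDF-RT content, and the helper names are hypothetical.

```python
from typing import NamedTuple


class TypedRelation(NamedTuple):
    drug: str
    relation: str  # "may_treat" or "may_prevent", following the NDF-RT convention
    indication: str


def to_binary(relations) -> set:
    """Drop the relation type, leaving the binary pairs the other KBs store."""
    return {(r.drug, r.indication) for r in relations}


def by_relation(relations, relation: str) -> set:
    """Pairs asserted with a specific relation type."""
    return {(r.drug, r.indication) for r in relations if r.relation == relation}


# Illustrative triples (not taken from NDF-RT).
RELATIONS = [
    TypedRelation("heparin", "may_treat", "venous thrombosis"),
    TypedRelation("heparin", "may_prevent", "venous thrombosis"),
]
```

The 2 typed triples collapse into a single binary pair, so a KB storing only (drug, indication) cannot tell a preventive use from a therapeutic one.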
Granularity
With the exception of the resource by McCoy et al, all resources defined medications in their generic form. While the indications for generic and brand names are typically the same, certain brand products are available only in specific dose forms or strengths, and because indications can differ between dose forms or strengths, they can also differ between brand products. For instance, Revatio (Pfizer, Inc, New York, New York) and Viagra (Pfizer, Inc, New York, New York) are both brand names for sildenafil, but the former is produced in 20 mg tablets and the latter in 50 mg tablets. Sildenafil is indicated for the treatment of pulmonary arterial hypertension and erectile dysfunction, but the typical dose used for erectile dysfunction is 50 mg, while the typical dose used for pulmonary arterial hypertension is 20 mg. Therefore, one could argue that Viagra and Revatio do not share the same indications, even though they are both brand names of the same generic substance.39
The National Drug File–Reference Terminology uses an internal unique identifier for medications and indications called the NDF-RT unique identifier (NUI). It also includes a mapping between these NUIs and UMLS Concept Unique Identifiers (CUIs) for indications, and RxNorm CUIs for medications. In the NDF-RT, all medication-indication relationships are defined at the main IN level.
SIDER defines medications at the level of the SPL. Manual review showed that SIDER frequently uses nonspecific terms as indications; for instance, wherever rheumatic fever is listed as an indication, the nonspecific concept fever is also listed (examples include but are not limited to azithromycin, cefprozil, and cefazolin).
The resource provided by McCoy et al defines medication concepts at the level of dose form and strength (eg, “Atenolol 50 MG Oral Tablet”). However, they are only represented in free-text form. Similarly, indications are also listed in free-text form.
In the resource provided by Fung et al, medications are represented using RxNorm concepts at the level of the semantic clinical drug form, which represents the combination of IN and dose form, and indications are likewise defined for each specific dose form. For example, “alprazolam disintegrating tablet,” “alprazolam extended release tablet,” “alprazolam oral solution,” and “alprazolam oral tablet” are separate entries, each associated with a different number of indications.
In MEDI, medications are represented using both their name and the corresponding RxNorm concept identifier. Indications are represented using their name, the corresponding UMLS concept identifier, and the corresponding concept(s) from the International Classification of Diseases, Ninth Revision (ICD-9).
Jung et al represented medications and indications only by their names (free text), and medications were defined at the level of the main IN. Although Jung et al used UMLS CUIs to represent the medications and indications during the curation of the KB (as described in their paper), UMLS unique identifiers were not included in the final resource. Jung et al also collected data about known uses but did not provide them, because they were outside the scope of their study.
LabeledIn contains medication-indication relationships obtained from 500 SPLs, and for each label, additional RxNorm identifiers are assigned to the medication when applicable. Medications are defined using RxNorm concepts at the level of dose form and strength. Therefore, similar to the resource by Fung et al, LabeledIn lists different indications for medications that are essentially different dose forms or strengths of the same IN. In addition, however, indications are also defined at higher levels, including the main IN.
Redundancy and other issues
Approximately 18.6% of the indication entries in SIDER were duplicates (eg, “infection” and “infections” were both listed as indications for the same SPL, even though both were mapped to the same UMLS concept identifier, C0021311).
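One way to quantify this kind of redundancy is to key each indication entry on (SPL, CUI) and count entries whose key has already been seen. This definition of a duplicate is an assumption for illustration; the text does not specify exactly how the 18.6% figure was computed, and the SPL identifiers below are placeholders.

```python
def duplicate_rate(entries) -> float:
    """Fraction of (spl_id, indication_text, cui) entries that repeat an earlier
    (spl_id, cui) key, ie, a different string for the same concept on the same label."""
    if not entries:
        return 0.0
    seen = set()
    duplicates = 0
    for spl_id, _text, cui in entries:
        key = (spl_id, cui)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates / len(entries)


# Placeholder SPL identifiers; C0021311 is the concept cited in the text.
ENTRIES = [
    ("spl-1", "infection", "C0021311"),
    ("spl-1", "infections", "C0021311"),
    ("spl-2", "infection", "C0021311"),
    ("spl-2", "Infection", "C0021311"),
]
```

Keying on the concept identifier rather than the surface string catches variants such as "infection" and "infections" that plain text comparison would treat as distinct.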
Redundancy exists in the indications listed in the resource by McCoy et al; for example, “Fever (Symptom)” and “Fever (On Exam)” are 2 separate indications, and no medication is associated with both of these. Redundancy was also observed among medications (generic and brand names were both present) and several items could not be considered medications (eg, “Rapid Bacterial Antigen Ident Kit” or “OneTouch Test in vitro Strip”).
MEDI is inconsistent in its use of lowercase and uppercase letters in the names of diseases and drugs, and some entries are blank. For each medication-indication pair, this resource also specifies how many of the 4 original resources used to create MEDI mentioned that pair, although it does not specify which resources those were.
Only 1 resource (SIDER16) clearly marked combination drugs; it contained 4567 unique SPLs for combination drugs, and these SPLs represented 1672 unique combination drugs (based on their INs). Other KBs also include combination drugs but do not clearly mark them as such; identifying them therefore requires external knowledge.
Overlap of KBs
After normalization to RxNorm and SNOMED CT, 7332 unique medications and 14 821 unique indications were found in the union of all KBs, corresponding to 107 336 unique medication-indication pairs. The numbers of medications, indications, and medication-indication pairs that appeared in all 7 KBs were only 83, 88, and 2, respectively (figure 2). The figure shows that most medications, indications, and medication-indication pairs are defined in exactly 1 KB (ie, they are unique to that KB). More details about the results of normalization can be found in table 2 and online supplement 1.
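The distribution summarized in figure 2 can be computed by counting, for every item, how many KBs contain it, and then tallying those counts. The KB contents below are hypothetical placeholders; the same function applies to medications, indications, or medication-indication pairs.

```python
from collections import Counter


def kb_membership_histogram(kbs: dict) -> Counter:
    """Map k -> number of items that appear in exactly k KBs."""
    membership = Counter()
    for items in kbs.values():
        membership.update(set(items))  # count each item once per KB
    return Counter(membership.values())


# Hypothetical KB contents.
KBS = {"A": {1, 2, 3}, "B": {2, 3}, "C": {3}}
```

The histogram's values sum to the size of the union of all KBs, and the bar for k equal to the number of KBs gives the items shared by all of them (the 83, 88, and 2 reported above).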
Figure 2:
Frequency of medications, indications, and medication-indication pairs based on the exact number of KBs in which they appear.
Figure 3 shows the overlap between KBs in terms of unique medications, indications, and medication-indication pairs. Darker cells in each heat map indicate greater overlap between the KBs in that row and column. The diagonal essentially reflects the size of each KB. Rows and columns in each heat map are ordered by KB size. The exact numbers underlying these figures are included in online supplement 2. Because heat maps show only the number of shared concepts and provide no information about the characteristics of the nonshared parts, we also depicted the overlap of KBs using chord diagrams, which are shown in figures S2–S4 and explained in online supplement 2.
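The values behind each heat map form a symmetric matrix of pairwise intersection sizes. The sketch below orders KBs by size, largest first; the direction of the ordering in figure 3 is not stated, so that choice, like the placeholder KB contents, is an assumption.

```python
def overlap_matrix(kbs: dict):
    """Return (names, matrix) with matrix[i][j] = |KB_i ∩ KB_j|.

    The diagonal holds each KB's size; rows and columns are ordered by
    KB size, largest first (an assumed ordering).
    """
    names = sorted(kbs, key=lambda name: len(kbs[name]), reverse=True)
    matrix = [[len(kbs[a] & kbs[b]) for b in names] for a in names]
    return names, matrix


# Hypothetical KB contents.
EXAMPLE_KBS = {"A": {1, 2}, "B": {1, 2, 3}, "C": {3}}
```

Because the matrix records only intersection sizes, 2 pairs of KBs with the same shared count can differ greatly in what they do not share, which is the limitation the chord diagrams are meant to address.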
Figure 3:
Overlap between the unique medications (top left), indications (top right), and medication-indication pairs (bottom left) in the 7 KBs, as well as overlap between medication-indication pairs once the SNOMED CT hierarchy has been used to include parent-child relationships between indications (bottom right). Numbers used to draw these heat maps are included in online supplement 2, along with a description of how to interpret the diagrams.
Once we used the SNOMED CT hierarchy to expand the KBs by also including the immediate children of indications, the overlap did not increase substantially (figure 3, bottom right). This suggests that the lack of overlap is caused by issues other than granularity differences at the level of immediate parents and children. Manual review of indications for a random subset of drugs also showed that granularity was not the primary reason for discordance between KBs, and many of the indications that did not overlap were not semantically related.
To further illustrate this point, online supplement 2 includes a Circos diagram showing the indications for 1 randomly selected drug, budesonide, a glucocorticoid commonly used to treat asthma and other reactive airway diseases. This medication was randomly picked from the 83 medications that had at least 1 indication in every KB. Almost every KB lists asthma as an indication for budesonide, but only some include more granular terms such as intermittent asthma or chronic obstructive asthma, or less granular terms such as reactive airway disease. Similarly, while 1 KB lists sinusitis as an indication, another lists only more granular terms (eg, chronic maxillary sinusitis and chronic frontal sinusitis). However, even if all related concepts were mapped together, a large number of indications would still not overlap between KBs (eg, some indications listed by MEDI, such as fever, dyspnea, and pain, have no counterpart in the other KBs).
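The hierarchy-based expansion can be sketched as follows, assuming a mapping from each indication concept to its immediate children. The `children` dictionary is a toy stand-in for 1 level of the SNOMED CT is-a hierarchy (its links are illustrative, not taken from SNOMED CT), and `expand_with_children` is a hypothetical helper. In the toy data, expansion recovers the shared sinusitis pair, while unrelated indications such as fever and dyspnea still do not overlap.

```python
def expand_with_children(pairs, children):
    """Add (medication, child) for each immediate child of every listed
    indication; `children` maps a concept to its immediate children."""
    expanded = set(pairs)
    for medication, indication in pairs:
        for child in children.get(indication, ()):
            expanded.add((medication, child))
    return expanded


# Toy one-level hierarchy (illustrative links, not actual SNOMED CT).
children = {
    "sinusitis": ["chronic maxillary sinusitis", "chronic frontal sinusitis"],
    "asthma": ["intermittent asthma"],
}
kb_x = {("budesonide", "sinusitis"), ("budesonide", "fever")}
kb_y = {("budesonide", "chronic maxillary sinusitis"), ("budesonide", "dyspnea")}

overlap_before = kb_x & kb_y
overlap_after = expand_with_children(kb_x, children) & expand_with_children(kb_y, children)
print(overlap_before)  # set()
print(overlap_after)   # {('budesonide', 'chronic maxillary sinusitis')}
```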
Discussion
Of the 7 KBs included in this study, independent evaluations were performed only for the NDF-RT and SIDER. Earlier evaluations of the NDF-RT showed that it had limitations in drug class information and in the alignment of its concepts with those in other terminologies such as RxNorm,24,26 but these limitations have since been addressed. Additionally, the majority of concepts marked as “Chemical & Drugs” in the NDF-RT are not associated with an indication,27 and some researchers have found the NDF-RT too complex to use for reasons such as a lack of metadata annotations and the use of unfamiliar drug classifications.28 Khare et al22 evaluated SIDER and showed that its use of nonspecific terms as indications could lead to alignment issues with other resources.
Most KBs represent medications only at the level of the main IN. This can be problematic when different dose forms or strengths of the same medication have different indications. Resources also varied in their support for medications with multiple active INs. Although KBs that define medications at the level of SPLs or with RxNorm semantic clinical drug form identifiers should, in theory, have no difficulty here (because medications with multiple INs generally have their own label identifier and RxNorm identifier), only 1 resource clearly marked combination drugs. Another challenge concerned medications used for primary or secondary prevention of disease (eg, vaccines, or atorvastatin for secondary prevention of cardiovascular disease). Although some of the KBs did contain preventive indications, only the NDF-RT distinguished between may_treat and may_prevent associations.
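The sketch below illustrates why collapsing product-level facts to the ingredient level loses information. The product records are hypothetical toy data, the relation labels simply reuse the may_treat/may_prevent names mentioned above, and `collapse_to_ingredient` is a hypothetical helper, not part of any reviewed KB's tooling. After collapsing, the indications of oral and injectable vancomycin are merged, and the combination product's indication is attributed to clavulanate on its own.

```python
from collections import defaultdict

# Hypothetical product-level facts: (product, ingredients, relation, indication).
PRODUCT_FACTS = [
    ("vancomycin oral capsule", ("vancomycin",), "may_treat",
     "Clostridium difficile infection"),
    ("vancomycin injection", ("vancomycin",), "may_treat",
     "methicillin-resistant Staphylococcus aureus bacteremia"),
    ("amoxicillin/clavulanate oral tablet", ("amoxicillin", "clavulanate"),
     "may_treat", "sinusitis"),
    ("atorvastatin oral tablet", ("atorvastatin",), "may_prevent",
     "cardiovascular disease"),
]


def collapse_to_ingredient(facts):
    """Re-key facts by ingredient: the dose form is discarded, and a
    combination product's indication is copied to each ingredient."""
    by_ingredient = defaultdict(set)
    for _product, ingredients, relation, indication in facts:
        for ingredient in ingredients:
            by_ingredient[ingredient].add((relation, indication))
    return dict(by_ingredient)


collapsed = collapse_to_ingredient(PRODUCT_FACTS)
# Oral and injectable vancomycin indications are now indistinguishable.
print(sorted(collapsed["vancomycin"]))
# Clavulanate now appears to treat sinusitis on its own.
print(collapsed["clavulanate"])
# The preventive relation survives only because it was stored explicitly.
print(collapsed["atorvastatin"])
```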
The granularity of the concepts used to describe indications depends directly on the choice of terminology. For instance, oral vancomycin is used to treat recurrent Clostridium difficile infection, and it is not indicated for nonrecurrent infection. Because terminologies such as SNOMED CT and ICD-9 do not have a separate concept for the recurrent form of this infection, resources that use these terminologies cannot accurately specify the use of vancomycin for it. Granularity issues also arise when the diverse terms used to describe a disease or condition are normalized to unique concepts. For example, most KBs listed myocardial infarction, or a synonymous term, as an indication for nitroglycerine; because the relationship was defined at this level, these KBs could not represent a very important exception: right-sided myocardial infarction, in which nitroglycerine can cause lethal hypotension. SNOMED CT and ICD-9 do have the granularity needed for this example, but the KBs used a coarser concept to represent the indication. A related problem arises for pathologies that form a spectrum, where the challenge is to include the various stages of the disease as indications. For example, Barrett esophagus is commonly treated with proton pump inhibitors (PPIs), but none of the KBs included the association between PPIs and Barrett esophagus. Some of them, however, included erosive esophagitis (the pathophysiological state that precedes Barrett esophagus), or a synonymous term, as an indication for PPIs. Table 4 summarizes the issues associated with the granularity of concepts used to represent indications.
Table 4.
Issues with granularity of the concepts used to represent the indications
| Complexity | Example | Description of the example |
|---|---|---|
| Location | Left-sided myocardial infarction | Nitrates are indicated in the treatment of myocardial infarction, except for right-sided myocardial infarction, in which they are contraindicated (because they can cause severe hypotension). If the indications for nitrates are defined at a coarse level, they may not capture this complexity. |
| Recurrence | Recurrent infection with Clostridium difficile | Initial infection with C difficile is treated with metronidazole, but recurrent infection is treated with oral vancomycin. If “recurrence” is not captured in the indication, the knowledge base may wrongly imply that vancomycin is indicated in any C difficile infection. |
| Symptoms | Acetaminophen for fever | Although fever is not a disease itself, acetaminophen is indicated in patients with fever when it is necessary to alleviate their symptom. |
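One way to capture the location and recurrence complexities in table 4 is to attach exclusions and required qualifiers to an indication and check them against a concept hierarchy. The following is a minimal sketch under that assumption; the `Indication` class, its fields, the `PARENTS` links, and the concept labels are hypothetical and do not correspond to any reviewed KB or to actual SNOMED CT concepts. It shows that a coarse myocardial infarction indication also matches right-sided myocardial infarction, whereas a refined indication with an exclusion does not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Indication:
    concept: str
    exclusions: frozenset = frozenset()           # subtypes where the drug should not be used
    required_qualifiers: frozenset = frozenset()  # e.g. {"recurrent"}


# Toy child -> parent links (labels are illustrative, not SNOMED CT concepts).
PARENTS = {
    "right-sided myocardial infarction": ["myocardial infarction"],
    "left-sided myocardial infarction": ["myocardial infarction"],
}


def is_a(concept, ancestor, parents):
    """True if concept equals ancestor or reaches it via parent links."""
    stack, seen = [concept], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return False


def matches(indication, diagnosis, qualifiers, parents=PARENTS):
    """Check a diagnosis (plus qualifiers such as "recurrent") against an indication."""
    if not is_a(diagnosis, indication.concept, parents):
        return False
    if any(is_a(diagnosis, excluded, parents) for excluded in indication.exclusions):
        return False
    return indication.required_qualifiers <= set(qualifiers)


nitrate_coarse = Indication("myocardial infarction")
nitrate_refined = Indication(
    "myocardial infarction",
    exclusions=frozenset({"right-sided myocardial infarction"}),
)
oral_vancomycin = Indication(
    "Clostridium difficile infection",
    required_qualifiers=frozenset({"recurrent"}),
)

print(matches(nitrate_coarse, "right-sided myocardial infarction", set()))   # True
print(matches(nitrate_refined, "right-sided myocardial infarction", set()))  # False
print(matches(nitrate_refined, "left-sided myocardial infarction", set()))   # True
print(matches(oral_vancomycin, "Clostridium difficile infection", set()))    # False
print(matches(oral_vancomycin, "Clostridium difficile infection", {"recurrent"}))  # True
```

The symptom row of table 4 needs no special machinery in this sketch: a symptom such as fever can simply be an indication concept like any other.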
None of the KBs reported the recency of the information they contained, and only SIDER and the NDF-RT had been updated at the time of this review. This limits their usefulness, because many of these resources were developed several years ago and may now contain outdated information. It should be noted, however, that Khare et al22 have proposed a plan for updating LabeledIn regularly and estimated that the work required would be minimal.
Finally, it would be impossible to capture every variable of medication-indication knowledge in all circumstances; as a result, any CDS system should allow exceptions to the rule. Patients with refractory diseases, those with allergies or contraindications to the “indicated” therapy, and those with rare diseases for which no drug therapy has yet been established (eg, Ebola infection, for which all treatments are in the experimental stage) may benefit from prescriptions that contradict the information in medication-indication KBs. Additionally, although medication-indication KBs are meant to provide evidence-based knowledge, there will always be new knowledge that has not yet been incorporated into these resources but may still support the appropriate use of a medication.
Recommendations for future research
One of the key limitations of many existing medication-indication KBs is their use of simple binary relationships that do not capture characteristics of the medication or indication (see table 1). This challenge is partly addressed by resources that define medication concepts at a finer level of granularity (eg, dose form and strength) or use different relationship types to distinguish preventive from therapeutic indications. Nevertheless, certain nuances of indication knowledge (such as comedications, comorbidities, or primary choice of therapy) are still not captured by any of the existing resources, indicating a gap in knowledge about how to capture this information in an automated way.
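To make the contrast with binary relationships concrete, the sketch below represents an indication as an n-ary record, using the bismuth subsalicylate example from the Background: the medication-indication pair alone cannot state that antibiotics must be co-prescribed. The `IndicationRule` fields and the `unmet_conditions` check are hypothetical, and the antibiotic set is illustrative only, not a recommended regimen.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IndicationRule:
    medication: str
    indication: str
    relation: str = "may_treat"                   # or "may_prevent"
    comedication_any_of: frozenset = frozenset()  # at least one must be co-prescribed
    line_of_therapy: Optional[int] = None         # e.g. 1 = primary choice; not checked here


def unmet_conditions(rule, indication, active_medications):
    """Return the reasons an order does not satisfy the rule (empty if none)."""
    problems = []
    if indication != rule.indication:
        problems.append(f"{indication} is not listed for {rule.medication}")
    if rule.comedication_any_of and not (rule.comedication_any_of & set(active_medications)):
        problems.append("requires one of: " + ", ".join(sorted(rule.comedication_any_of)))
    return problems


# A binary KB can store only the pair, with no room for the condition.
binary_fact = ("bismuth subsalicylate", "Helicobacter pylori infection")

# Hypothetical n-ary rule; the antibiotic set is illustrative, not a regimen.
bismuth_rule = IndicationRule(
    medication="bismuth subsalicylate",
    indication="Helicobacter pylori infection",
    comedication_any_of=frozenset({"metronidazole", "tetracycline"}),
)

alone = unmet_conditions(bismuth_rule, "Helicobacter pylori infection",
                         {"bismuth subsalicylate"})
combined = unmet_conditions(bismuth_rule, "Helicobacter pylori infection",
                            {"bismuth subsalicylate", "metronidazole", "tetracycline"})
print(alone)     # ['requires one of: metronidazole, tetracycline']
print(combined)  # []
```

Returning a list of unmet conditions, rather than a yes/no flag, leaves room for the exceptions discussed above: a CDS tool could show the reasons and let the prescriber override them.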
Various data sources that have been leveraged to identify adverse effect information may also be used to extract indication data. Harpaz et al40 recently reviewed the data sources used for extracting adverse effect information. Previous studies have leveraged patient-generated data on the Internet to identify adverse effects of medications by, for example, analyzing Internet search logs,41 Twitter (Twitter, Inc, San Francisco, California) posts,42 or user contributions to online health communities.43,44 Similar data sources and approaches could plausibly be used to identify off-label use of medications, such as their effects on various symptoms,45 or other types of information about medication indications. Additionally, although the FAERS database is primarily designed to collect and store adverse effect data, the “use” information in its adverse effect reports can be leveraged to identify true indications for medications.21
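As a rough illustration of mining “use” information from adverse event reports, the sketch below counts (drug, reported use) pairs across reports and keeps pairs reported at least a minimum number of times. The report structure (a `drug_uses` list of tuples), the threshold, and the function name are hypothetical simplifications; they are not the FAERS file layout and not the method used by Li et al.21

```python
from collections import Counter


def candidate_indications(reports, min_reports=2):
    """Count (drug, reported use) pairs across reports, once per report,
    and keep pairs seen in at least `min_reports` reports."""
    counts = Counter()
    for report in reports:
        pairs = {
            (drug.strip().lower(), use.strip().lower())
            for drug, use in report.get("drug_uses", [])
            if use and use.strip()  # skip blank "use" fields
        }
        counts.update(pairs)
    return {pair: n for pair, n in counts.items() if n >= min_reports}


# Hypothetical, simplified reports (not the FAERS file layout).
reports = [
    {"drug_uses": [("Budesonide", "Asthma")]},
    {"drug_uses": [("budesonide", "asthma"), ("acetaminophen", "pyrexia")]},
    {"drug_uses": [("acetaminophen", "")]},
    {"drug_uses": [("budesonide", "asthma"), ("budesonide", "Asthma")]},
]

print(candidate_indications(reports))  # {('budesonide', 'asthma'): 3}
```

A count threshold is a crude filter: a frequently reported use may still be off-label or miscoded, so pairs surfaced this way would be candidates for review rather than confirmed indications.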
Limitations
Our work has several limitations. First, we restricted our literature review to studies with electronic full texts in English and searched only PubMed, PubMed Central, EMBASE, and CINAHL. Other medication-indication KBs that did not meet our inclusion and exclusion criteria were not reviewed in this study, and our results may not be generalizable to those resources. Our review only includes publications about a publicly available KB for medication indications. In other words, we did not evaluate proprietary KBs that contain medication-indication information, and the strengths and weaknesses of those resources may differ from those described here. We also excluded publications that only discussed a method for developing a medication-indication KB but did not provide the resulting KB, because the scope of this review was to compare the KBs, not the methods; examples of excluded papers include works by Wright et al25 and Chen et al.23 Second, our analysis of the limitations of KBs in representing complex or challenging indications was based only on an informal query sent to a small group of clinicians and pharmacists. Other categories of complex or challenging indications may exist that were not studied in this work, which highlights the need for quality metrics for evaluating medication-indication KBs. Finally, the normalization of medication concepts to main INs was not perfect, and it did not handle combination medications properly. Further work is needed to normalize all indications and to aggregate related indications.46
Conclusions
Medication-indication KBs are important resources for data-driven research on the appropriate uses and adverse effects of medications, as well as for providing automated decision support regarding the appropriate use of medications. Each of the KBs reviewed above is a significant step toward a comprehensive, computable, public medication-indication knowledge base. This review also identifies some of the key gaps in these KBs at the level of representation and content, with the aim of motivating researchers to address these limitations in future research. The characteristics of the ideal KB depend on the task at hand, but the results of this review can help readers decide which KB best meets their needs.
Contributors
HS designed the overall methodology, conducted the research, and drafted and revised the manuscript. HS and CF conducted the study selection for the review. HS and THT contributed to data analysis and interpretation of the results. THT assisted with developing the checklist for challenging indications. All authors assisted with revising the manuscript and reviewed and approved the final draft.
COMPETING INTERESTS
None.
FUNDING
This work has been supported by the National Library of Medicine grants R01 LM010016, R01 LM010016-0S1, R01 LM010016-0S2, R01 LM008635, and 5 T15 LM007079.
SUPPLEMENTARY MATERIAL
Supplementary material is available online at http://jamia.oxfordjournals.org/.
REFERENCES
1. Weingart SN, Simchowitz B, Padolsky H, et al. An empirical model to estimate the potential impact of medication safety alerts on patient safety, health care utilization, and cost in ambulatory care. Arch Intern Med 2009;169(16):1465–1473.
2. Fitzgerald LS, Hanlon JT, Shelton PS, et al. Reliability of a modified medication appropriateness index in ambulatory older persons. Ann Pharmacother 1997;31(5):543–548.
3. Radley DC, Finkelstein SN, Stafford RS. Off-label prescribing among office-based physicians. Arch Intern Med 2006;166(9):1021–1026.
4. McIntyre J, Conroy S, Avery A, et al. Unlicensed and off label prescribing of drugs in general practice. Arch Dis Child 2000;83(6):498–501.
5. Turner S, Longworth A, Nunn AJ, et al. Unlicensed and off label drug use in paediatric wards: prospective study. BMJ 1998;316(7128):343–345.
6. Pandolfini C, Bonati M. A literature review on off-label drug use in children. Eur J Pediatr 2005;164(9):552–558.
7. Cebul RD, Love TE, Jain AK, et al. Electronic health records and quality of diabetes care. N Engl J Med 2011;365(9):825–833.
8. Roth MT, Weinberger M, Campbell WH. Measuring the quality of medication use in older adults. J Am Geriatr Soc 2009;57(6):1096–1102.
9. Falck S, Adimadhyam S, Meltzer DO, et al. A trial of indication based prescribing of antihypertensive medications during computerized order entry to improve problem list documentation. Int J Med Inform 2013;82(10):996–1003.
10. Salmasian H, Freedberg DE, Abrams JA, et al. An automated tool for detecting medication overuse based on the electronic health records. Pharmacoepidemiol Drug Saf 2012;22(2):183–189.
11. MedicineNet Website. http://www.medicinenet.com (accessed 11 Sep 2014).
12. DrugBank Website. http://www.drugbank.ca (accessed 11 Sep 2014).
13. US National Library of Medicine. DailyMed: Current Medical Information Website. http://dailymed.nlm.nih.gov (accessed 11 Sep 2014).
14. US National Library of Medicine. RxNorm API Website. http://rxnav.nlm.nih.gov/RxNormAPIs.html (accessed 11 Sep 2014).
15. Carter JS, Brown SH, Erlbaum MS, et al. Initializing the VA medication reference terminology using UMLS metathesaurus co-occurrences. Proc AMIA Symp 2002:116–120.
16. Kuhn M, Campillos M, Letunic I, et al. A side effect resource to capture phenotypic effects of drugs. Mol Syst Biol 2010;6(343):343.
17. McCoy AB, Wright A, Laxmisan A, et al. Development and evaluation of a crowdsourcing methodology for knowledge base construction: identifying relationships between clinical problems and medications. J Am Med Inform Assoc 2012;19(5):713–718.
18. Fung KW, Jao CS, Demner-Fushman D. Extracting drug indication information from structured product labels using natural language processing. J Am Med Inform Assoc 2013;20(3):482–488.
19. Wei W-Q, Cronin RM, Xu H, et al. Development and evaluation of an ensemble resource linking medications to their indications. J Am Med Inform Assoc 2013;20(5):954–961.
20. Jung K, LePendu P, Chen WS, et al. Automated detection of off-label drug use. PLoS One 2014;9(2):e89324.
21. Li Y, Salmasian H, Harpaz R, et al. Determining the reasons for medication prescriptions in the EHR using knowledge and natural language processing. AMIA Annu Symp Proc 2011;2011:768–776.
22. Khare R, Li J, Lu Z. LabeledIn: cataloging labeled indications for human drugs. J Biomed Inform 2014;52:448–456.
23. Chen ES, Hripcsak G, Xu H, et al. Automated acquisition of disease-drug knowledge from biomedical and clinical documents: an initial study. J Am Med Inform Assoc 2008;15(1):87–98.
24. Pathak J, Chute CG. Analyzing categorical information in two publicly available drug terminologies: RxNorm and NDF-RT. J Am Med Inform Assoc 2010;17(4):432–439.
25. Wright A, Chen ES, Maloney FL. An automated technique for identifying associations between medications, laboratory results and problems. J Biomed Inform 2010;43(6):891–901.
26. Pathak J, Chute CG. Further revamping VA’s NDF-RT drug terminology for clinical research. J Am Med Inform Assoc 2011;18(3):347–348.
27. Barrière C, Gagnon M. Drugs and disorders: from specialized resources to web data. Paper presented at: The 10th International Semantic Web Conference; October 23–27, 2011; Bonn, Germany. https://files.ifi.uzh.ch/ddis/iswc_archive/iswc/pps/web/iswc2011.semanticweb.org/fileadmin/iswc/Papers/Workshops/WeKEx/paper_7.pdf (accessed 3 Oct 2014).
28. Pathak J, Murphy SP, Willaert BN, et al. Using RxNorm and NDF-RT to classify medication data extracted from electronic health records: experiences from the Rochester Epidemiology Project. AMIA Annu Symp Proc 2011;2011:1089–1098.
29. US National Library of Medicine. Unified Medical Language System Website. http://www.nlm.nih.gov/research/umls/ (accessed 11 Sep 2014).
30. US National Library of Medicine. RxNorm Website. http://www.nlm.nih.gov/research/umls/rxnorm/ (accessed 11 Sep 2014).
31. US National Library of Medicine. SNOMED Clinical Terms Website. http://www.nlm.nih.gov/research/umls/Snomed/snomed_main.html (accessed 11 Sep 2014).
32. Nelson SJ, Zeng K, Kilbourne J, et al. Normalized names for clinical drugs: RxNorm at 6 years. J Am Med Inform Assoc 2011;18(4):441–448.
33. SIDER 2: Side Effect Resource Website. http://sideeffects.embl.de/ (accessed 11 Sep 2014).
34. Campillos M, Kuhn M, Gavin A-C, et al. Drug target identification using side-effect similarity. Science 2008;321(5886):263–266.
35. US National Library of Medicine. MedlinePlus Website. http://www.nlm.nih.gov/medlineplus/ (accessed 11 Sep 2014).
36. English Wikipedia Website. http://en.wikipedia.org (accessed 11 Sep 2014).
37. Wei W-Q, Mosley JD, Bastarache L, et al. Validation and enhancement of a computable medication indication resource (MEDI) using a large practice-based dataset. AMIA Annu Symp Proc 2013;2013:1448–1456.
38. Aronson AR. Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. Proc AMIA Symp 2001:17–21.
39. Grissinger M. Multiple brand names for the same generic drug can cause confusion. P T 2013;38:305.
40. Harpaz R, Callahan A, Tamang S, et al. Text mining for adverse drug events: the promise, challenges, and state of the art. Drug Saf 2014;37(10):777–790.
41. White RW, Harpaz R, Shah NH, et al. Toward enhanced pharmacovigilance using patient-generated data on the internet. Clin Pharmacol Ther 2014;96(2):239–246.
42. Freifeld CC, Brownstein JS, Menone CM, et al. Digital drug safety surveillance: monitoring pharmaceutical products in Twitter. Drug Saf 2014;37(5):343–350.
43. Mukherjee S, Weikum G, Danescu-Niculescu-Mizil C. People on drugs. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’14). New York, NY: ACM Press; 2014:65–74. http://dl.acm.org/citation.cfm?doid=2623330.2623714 (accessed 11 Sep 2014).
44. Yang CC, Yang H, Jiang L, et al. Social media mining for drug safety signal detection. In: Proceedings of the 2012 International Workshop on Smart Health and Wellbeing (SHB ’12). New York, NY: ACM Press; 2012:33. http://dl.acm.org/citation.cfm?doid=2389707.2389714 (accessed 11 Sep 2014).
45. Hughes S, Cohen D. Can online consumers contribute to drug knowledge? A mixed-methods comparison of consumer-generated and professionally controlled psychotropic medication information on the internet. J Med Internet Res 2011;13(3):e53.
46. Pivovarov R, Elhadad N. A hybrid knowledge-based and data-driven approach to identifying semantically similar concepts. J Biomed Inform 2012;45(3):471–481.




