AMIA Annual Symposium Proceedings. 2010 Nov 13;2010:607–611.

Use of Standard Drug Vocabularies in Clinical Research: A Case Study in Pediatrics

Jyotishman Pathak 1, Rachel L Richesson 2
PMCID: PMC3041282  PMID: 21347050

Abstract

Clinical and epidemiological researchers across all medical specialties need tools and knowledge representations to support the classification, aggregation, and analysis of medication data. The Veterans Affairs National Drug File Reference Terminology (NDF-RT) is a named standard for classifying medications. We describe our experience applying NDF-RT to aggregate RxNorm-encoded medications that were collected from an international cohort of over 8,000 children. We detail the researchers’ analysis objectives and subsequent requirements for a drug classification representation, and assess the ability of NDF-RT to provide classes that are meaningful to pediatric researchers. In addition, we explore the completeness of RxNorm – NDF-RT mappings (i.e., the coverage of NDF-RT) for this sample of pediatric medications. We conclude that NDF-RT is sufficient to address the knowledge representation needs for this research study, though only a small subset of NDF-RT is needed for research analyses. Researchers from all domains would benefit from tools for easily extracting a set of relevant classes from the NDF-RT knowledge structure.

Introduction

Standard knowledge representations that organize medications by various characteristics or properties can support a multitude of clinical research questions across the spectrum of health and disease. The linkages and mappings between medication entities (ingredients or products) and classes of properties can save much re-coding work for hundreds or even thousands of clinical research investigators across domains, and reduce variation and errors in these grouping activities across studies, thereby supporting interoperability and data sharing.

In this paper, we present our approach for using an existing standardized drug vocabulary (NDF-RT [1]) to classify medications reported in an ongoing longitudinal observational study of young children in four countries. Here, we investigate the categorization of 745 unique reported medications, coded in RxNorm [2], with respect to NDF-RT drug classification hierarchies for Chemical Structure and Pharmaceutical Preparations. In particular, we extract the associations between the medications reported by study participants and a set of broad drug “groupings” (suggested by a convenience sample of investigators) that would provide meaningful groups for the analysis of medication data. On this set of important medication “groupings” we explore: (i) coverage in terms of how reported medications are classified under the drug classes; and (ii) coverage in terms of identifying important (as identified by research investigators) drug classes that either cannot be represented in NDF-RT and/or do not have any drug products classified. Further, we describe unique requirements for drug classifications, particularly in the context of pediatric diabetes research, and offer recommendations for NDF-RT to address these needs.

Background

Standards for Naming and Classifying Drugs

The current U.S. standards for representing medications are RxNorm, developed and maintained by the National Library of Medicine (NLM) [2], and the National Drug File Reference Terminology (NDF-RT [1]), created by the Department of Veterans Affairs (VA). RxNorm is a vocabulary for drugs at various levels of specificity (e.g., active ingredients, dosage formulary, administration route and packaging) where each drug entity is indicated by a unique concept identifier (RxCUI). While it provides extensive coverage of drug entities, RxNorm does not at present offer clinical researchers a sensible way to aggregate or classify clinical drugs or active ingredients for analysis.

NDF-RT, on the other hand, includes information about drugs and ingredients, but also contains a multi-axial hierarchical knowledge structure that classifies various ingredients and drug products. In particular, NDF-RT uses a description logic-based formal reference model that groups drug products into the high-level drug classes for Chemical Structure (e.g., Acetanilides), Mechanism of Action (e.g., Prostaglandin Receptor Antagonists), Physiological Effect (e.g., Decreased Prostaglandin Production), drug-disease relationship describing the Therapeutic Intent (e.g., Pain), Pharmacokinetics describing the mechanisms of absorption and distribution of an administered drug within a body (e.g., Hepatic Metabolism), and legacy VA-NDF classes for Pharmaceutical Preparations (VHA Drug Class; e.g., Non-Opioid Analgesic). Figure 1 shows the structure of NDF-RT: the hexagons represent multiple-inheritance reference hierarchies, whereas the rectangles are named sets of concepts each representing a level of abstraction used to describe medications. Note that NDF-RT also augments a “legacy” classification system (called VA-NDF [3]) which classified drug products into groupings developed by VA (denoted by VHA Drug Class) to support organization and decision support for medication usage in clinical care settings.

Figure 1. Linkage Points Between NDF-RT and RxNorm Information Models (blue dotted arrows; figures adapted from [1, 2])

As shown in Figure 1, the linkage points between NDF-RT and RxNorm are at the levels of ingredients and clinical drugs (denoted by blue dotted arrows). Linking RxNorm to NDF-RT at the ingredient level versus the clinical drug level will not necessarily yield the same result sets, for several reasons. First, RxNorm and NDF-RT are different terminologies maintained by different organizations. While both contain drug ingredients and packaged drugs (and can be linked through either), the timing of updates and the boundaries of content scope differ. (The VA health system, for example, serves adults, so its terminology would not necessarily include medications used exclusively in pediatric patients.) Second, because the relationship between drugs and ingredients is developed separately in each terminology, there are many issues with the maintenance and curation of the linkages. (See the Discussion section for more details.) To the best of our knowledge, no one has formally evaluated the results, errors, or other findings produced by the two different means of linking NDF-RT and RxNorm. For researchers to use NDF-RT in a standard way, there should be consistent methods and formal guidance for linking and aggregating RxNorm-encoded data into NDF-RT classes.

Materials

We used the NDF-RT January 12, 2010 release, which has been synchronized with the RxNorm January 04, 2010 release. The mappings between RxNorm and NDF-RT entities at the “Clinical Drug” and “Pharmaceutical Ingredient” levels (denoted by blue dotted arrows in Figure 1) were obtained from the respective source files using the unique concept identifiers they contain. For example, the RxNorm drug entity Acetaminophen 160MG Oral Tablet (RxCUI=282464), with Acetaminophen (RxCUI=161) as active ingredient, maps to the NDF-RT Clinical Drug ACETAMINOPHEN 160MG TAB [VA Product] (NDF-RT code=C32112) and the NDF-RT Pharmaceutical Ingredient ACETAMINOPHEN (NDF-RT code=C9708). In the following, we describe our techniques for using these mappings to classify TEDDY medication data with NDF-RT’s Pharmaceutical Preparations (VHA Drug Class) and Chemical Structure classes.
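The two linkage points in the acetaminophen example can be sketched as simple lookups. This is a minimal illustration, not the code used in the study: the dictionary and function names are hypothetical, and the tables hold only the identifiers quoted above rather than data loaded from the RxNorm or NDF-RT release files.

```python
# Hypothetical in-memory tables; only the identifiers quoted in the text are included.
RXNORM_DRUG_INGREDIENTS = {"282464": ["161"]}   # RxNorm clinical drug -> RxNorm ingredients
RXCUI_TO_NDFRT_DRUG = {"282464": "C32112"}      # clinical-drug linkage point
RXCUI_TO_NDFRT_INGREDIENT = {"161": "C9708"}    # ingredient linkage point


def link_clinical_drug(rxcui):
    """Return (NDF-RT clinical drug code or None, list of NDF-RT ingredient codes)."""
    drug_code = RXCUI_TO_NDFRT_DRUG.get(rxcui)
    ingredient_codes = [
        RXCUI_TO_NDFRT_INGREDIENT[ing]
        for ing in RXNORM_DRUG_INGREDIENTS.get(rxcui, [])
        if ing in RXCUI_TO_NDFRT_INGREDIENT
    ]
    return drug_code, ingredient_codes


acetaminophen_link = link_clinical_drug("282464")
```

Keeping the two linkage points in separate tables makes it easy to see when one of them resolves and the other does not, which is the situation described in the Background section.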

Methods

Our methods for this investigation involve two approaches and subsequent observation/discussion for linking records from an RxNorm-coded set of medication ingredients (n=745) to NDF-RT. Additionally, we define a preliminary subset of NDF-RT classes that are appropriate for analysis of the medication data in the context of the study.

Sample of Pediatric Medications

In this study, we explore methods for the linkage and utility of RxNorm-NDF-RT for medication data reported in The Environmental Determinants of Diabetes in the Young (TEDDY) study [4] [5], an international longitudinal study exploring genetic-environmental interactions in relation to the development of Type 1 Diabetes Mellitus (T1DM) in children. Because only a minority of children enrolled in TEDDY will actually develop T1DM, these data can be considered a large corpus for a “general” pediatric population. The medication data that we explore for this study were collected on 8,111 newborns (from Finland, Germany, Sweden and North America), some of whom enrolled at the start of the study in 2004 and have been observed quarterly since. The University of South Florida serves as the Data Coordinating Center for TEDDY and coordinates the use of RxNorm for coding medication data reported in the study.

Specifically, each reported medication is coded in our database as a study-specific medication code that is linked to one or more RxNorm ingredient identifiers. Our previous research [6] using data from 2004–5 demonstrated that RxNorm included codes for virtually all of the unique active ingredients (282/284 = 99%) from over 5,000 medications reported for over 1,200 children. As of January 2010, the TEDDY study data contained 109,574 instances of reported medications on 8,111 study participants.

The medication data used for this work is not tied to individual subjects or any subject-specific data. Rather our medication data is merely a listing of the unique medications reported in the study. Approximately 12% (101 of 846) of unique drug ingredients reported in the TEDDY study did not have RxNorm codes, and were removed from our sample before linkage to NDF-RT (leaving 745 unique drug ingredients for our analysis). Most of these 101 drug entities without RxNorm codes were natural or homeopathic medications that are not within the scope of RxNorm. Further, many were underspecified terms (e.g., “unknown steroid”, “non-specified antibiotic”) that could not be mapped to specific RxNorm codes. Despite the lack of RxNorm codes, it is likely that these could be important to include in a broader class within NDF-RT. We revisit this topic in the Discussion section.

Defining NDF-RT Views Appropriate for TEDDY Researchers

NDF-RT is a large and complex vocabulary comprising approximately 43,000 drug entities (e.g., orderable clinical drugs, ingredients) and classes (e.g., mechanism of action, physiological effects). For this study, TEDDY researchers identified 20 drug classes of interest having clinical similarities (e.g., antivirals, steroids) that were compared and refined against NDF-RT drug classes. All these classes mapped to legacy VA-NDF classes for Pharmaceutical Preparations (VHA Drug Class), to Chemical Structure classes in NDF-RT, or to both, and hence we evaluated both sets of classes for this study. In essence, through this process we created a “view” of NDF-RT that is more manageable and relevant for TEDDY researchers classifying their medication data.

Mapping TEDDY Medications to NDF-RT Chemical Structure Classes

The version of NDF-RT used in this study has approximately 8,400 Chemical Structural classes. The relationship between the NDF-RT Pharmaceutical Ingredient and these classes is denoted via the has_Ingredient association in NDF-RT which our method uses for evaluating the categorization of TEDDY medications based on Chemical Structure classes. For example, Acetaminophen (RxCUI=161) is categorized under the class Acetanilides (NDF-RT code=C25074).

Mapping TEDDY Medications to NDF-RT Pharmaceutical Preparation Classes

The version of NDF-RT used in this study has approximately 485 Pharmaceutical Preparation classes (VHA Drug Class). The relationship between NDF-RT Clinical Drug and these classes is denoted via the PAR association in NDF-RT which our method uses for evaluating the categorization of TEDDY medications. In particular, for a given drug product in RxNorm, our algorithm first identifies all the RxNorm and NDF-RT ingredient concepts for the drug. The method then determines the drug product(s) in NDF-RT that contain only those NDF-RT ingredient concepts identified from the first step, and extracts the corresponding VHA Drug Classes. For example, Cimetidine 2 MG/ML Oral Solution (RxCUI=104072) contains the ingredient Cimetidine (RxCUI=2541). This ingredient is, in turn, present in seven different drug products in NDF-RT all of which are categorized under the class Histamine Antagonists. As a result, our method assigns Cimetidine 2 MG/ML Oral Solution the VHA drug class Histamine Antagonists (NDF-RT code=C9050).
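The two-step procedure above can be sketched as follows. This is a hedged illustration rather than the study's implementation: the function name, the product identifiers (`P1`, `P2`, `P3`), and the placeholder ingredient and class codes `ING_CIMETIDINE`, `ING_OTHER` and `CLASS_OTHER` are hypothetical, the toy table holds three products instead of the seven mentioned in the text, and “contain only those ingredient concepts” is read here as an exact match of ingredient sets.

```python
def vha_classes_for(rx_drug, rx_ingredients, rx_to_ndfrt_ingredient, ndfrt_products):
    """Assign VHA Drug Classes to an RxNorm drug via its ingredients.

    Step 1: collect the NDF-RT ingredient concepts for the RxNorm drug.
    Step 2: keep NDF-RT products whose ingredient set equals that collection
            (an assumed reading of "contain only those ingredients") and
            gather their VHA Drug Classes.
    """
    target = frozenset(
        rx_to_ndfrt_ingredient[ing]
        for ing in rx_ingredients.get(rx_drug, [])
        if ing in rx_to_ndfrt_ingredient
    )
    if not target:
        return set()  # no linked ingredient, so no class can be assigned
    classes = set()
    for product in ndfrt_products:
        if frozenset(product["ingredients"]) == target:
            classes.update(product["classes"])
    return classes


# Toy data: RxCUIs and class code C9050 come from the text; the rest is hypothetical.
rx_ingredients = {"104072": ["2541"]}            # Cimetidine 2 MG/ML Oral Solution
rx_to_ndfrt_ingredient = {"2541": "ING_CIMETIDINE"}
ndfrt_products = [
    {"id": "P1", "ingredients": ["ING_CIMETIDINE"], "classes": ["C9050"]},
    {"id": "P2", "ingredients": ["ING_CIMETIDINE"], "classes": ["C9050"]},
    {"id": "P3", "ingredients": ["ING_CIMETIDINE", "ING_OTHER"], "classes": ["CLASS_OTHER"]},
]

cimetidine_classes = vha_classes_for(
    "104072", rx_ingredients, rx_to_ndfrt_ingredient, ndfrt_products
)
```

Under the exact-match reading, the combination product `P3` is ignored, so the class of a multi-ingredient product does not leak into a single-ingredient drug.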

Results

Of our sample of 745 unique drugs with RxNorm ingredient codes, 86% (643 / 745) and 92% (685 / 745) had an NDF-RT Chemical Structural and Pharmaceutical Preparation class assignment, respectively. In the following, we will describe our observations by the mapping method and qualitative results.
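The percentages reported here follow directly from the counts given in the text; a short sketch of the arithmetic is shown below (the function and variable names are illustrative only).

```python
def coverage_percent(assigned, total):
    """Share of `total` medications that received a class assignment, in percent."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * assigned / total


REPORTED_INGREDIENTS = 846   # unique drug ingredients reported in TEDDY
WITHOUT_RXNORM_CODE = 101    # removed before linkage to NDF-RT
SAMPLE_SIZE = REPORTED_INGREDIENTS - WITHOUT_RXNORM_CODE  # 745 ingredients analyzed

chemical_structure_coverage = coverage_percent(643, SAMPLE_SIZE)   # about 86%
vha_drug_class_coverage = coverage_percent(685, SAMPLE_SIZE)       # about 92%
```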

Mapping TEDDY Medications to NDF-RT Chemical Structure Classes

Out of 745 TEDDY medications, approximately 86% were assigned a Chemical Structural class. Examples include Acetaminophen (RxCUI=161) and Neomycin (RxCUI=7299) categorized under the classes Acetanilides (NDF-RT code=C25074) and Aminoglycosides (NDF-RT code=C26066), respectively.

Furthermore, 241 TEDDY medications were assigned more than one Chemical Structural class (Figure 2). For example, Albuterol (RxCUI=435) was categorized under three different classes: Triiodobenzoic Acids (NDF-RT code=C25556), Phenethylamines (NDF-RT code=C25408), and Ethanolamines (NDF-RT code=C25970). Note that this is an artifact of multiple inheritance among the Chemical Structural classes in NDF-RT.
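Counting medications with more than one class assignment is a simple tally over the ingredient-to-class table. The sketch below uses only the three ingredients and class codes quoted in this section; the table and function names are hypothetical, and the real table would be derived from the NDF-RT release.

```python
# Ingredient RxCUI -> NDF-RT Chemical Structure class codes (examples from the text only).
CHEMICAL_STRUCTURE_CLASSES = {
    "161": ["C25074"],                        # Acetaminophen -> Acetanilides
    "7299": ["C26066"],                       # Neomycin -> Aminoglycosides
    "435": ["C25556", "C25408", "C25970"],    # Albuterol -> three classes
}


def multi_class_ingredients(class_table):
    """Return the RxCUIs that are assigned more than one distinct class."""
    return sorted(
        rxcui for rxcui, codes in class_table.items() if len(set(codes)) > 1
    )


multi_class = multi_class_ingredients(CHEMICAL_STRUCTURE_CLASSES)
```

Applied to the full sample, the same tally would yield the count of medications with multiple Chemical Structural classes reported above.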

Figure 2. Classification of TEDDY medications based on NDF-RT Chemical Structural classes

Mapping TEDDY Medications to NDF-RT Pharmaceutical Preparation Classes

Out of 745 TEDDY medications, approximately 92% were assigned a VHA Drug class. For example, Acetaminophen (RxCUI=161) and Lidocaine (RxCUI=6387) were categorized under the classes Non-Opioid Analgesic (NDF-RT code=C8838) and Anti-arrhythmic (NDF-RT code=C8910), respectively. Furthermore, 247 TEDDY medications were categorized under more than one VHA Drug class (Figure 3). For example, Aspirin (RxCUI=1191) was assigned two different classes: Non-Opioid Analgesic (NDF-RT code=C8838) and Salicylates, Antirheumatic (NDF-RT code=C4859814134757).

Figure 3. Classification of TEDDY medications based on NDF-RT VHA Drug classes

Discussion

Coverage

The number of reported (RxNorm) medications that were found in the NDF-RT drug classes can be interpreted as a measure of coverage of selected RxNorm drugs by NDF-RT. The NDF-RT coverage of the 745 pediatric medications was quite high using both linkage methods. The Pharmaceutical Preparations classes (VA-NDF legacy classes) had slightly better coverage than the Chemical Structure classes (92% versus 86%); one plausible reason is that the linkage between RxNorm and NDF-RT is more complete for clinical drugs than for ingredients. Regardless, the high coverage for both linkage methods using pediatric medication data was an interesting finding considering that the VA developed NDF-RT for the adult veteran population. As NDF-RT goes into wider use and is now accessible via the UMLS, we expect coverage of pediatric medications to increase. We also expect that the increasing availability of NDF-RT, together with the high coverage of pediatric medications reported here, will lead to wider use in pediatric settings and stimulate validation of the drug-class relationships in NDF-RT, which is badly needed but was not addressed in this study.

In terms of desired classes, 18 out of the 20 drug classes of interest to TEDDY researchers could be mapped to NDF-RT, although it was evident that the researchers were mostly interested in the legacy VA-NDF classes (VHA Drug Classes), which were primarily defined by clinicians for applications in clinical practice. One class that seemed reasonable from a research perspective (especially in pediatrics) but was noticeably absent in NDF-RT was a general grouping for “antibiotics.” In particular, we did not find a VHA Drug class “Antibiotic” directly (i.e., via string matching for the class name/synonyms, and via hierarchical navigation), although antibiotic classes [e.g., Lincomycins (NDF-RT code=C8748) or Tetracycline (NDF-RT code=C8748)] certainly exist in NDF-RT under the hierarchy of Antimicrobials (NDF-RT code=C8716). While we could have broadly mapped TEDDY researchers’ “Antibiotics” class to NDF-RT’s class “Antimicrobials”, doing so would obscure some of the key differences between what comprises an antibiotic and what comprises an antimicrobial. A similar observation was made for the class of “Steroids”, where NDF-RT instead had a class called “Hormones/Synthetic/Modifiers” (NDF-RT code=C9102).

Complexity and Usability

NDF-RT provides broad and comprehensive coverage of drug classes. However, from a usability perspective in epidemiological research, only a small subset of NDF-RT classes is relevant and useful for a given disease or research project. The “desired” classes are usually identified using a top-down approach, as we did for our study, i.e., identifying the classes of interest for TEDDY researchers and manually mapping them to the most appropriate NDF-RT drug class in an iterative manner. In the future, we plan to investigate tools such as vSparQL [7] that could assist in semi-automatically creating usable subsets or views of large terminologies such as NDF-RT for clinical research projects.

Data Curation Issues

Our techniques also uncovered multiple issues in the mappings between RxNorm and NDF-RT concepts. In general, RxNorm contains mappings from its concepts to one or more concepts in external drug terminologies, and in the case of mappings to NDF-RT, the RxCUI of an RxNorm concept is mapped to one or more NDF-RT concept codes. However, in many cases, the NDF-RT concept code present in the RxNorm database did not match the RxCUI entry in NDF-RT. As an example, Clemastine 0.1 MG/ML Oral Solution (RxCUI=755824) has an NDF-RT code=4006429. However, this code in NDF-RT maps to RxCUI=197513, which, according to the RxNorm database, was archived on 02/06/2008 and subsequently merged with RxCUI=755824. Similarly, Polyvinyl Alcohol 0.014 ML/ML Ophthalmic Solution (RxCUI=142004) maps to two NDF-RT concepts in RxNorm: 4009166 and 4009167. In NDF-RT, both these concept codes correspond to RxCUI=312485, which was archived on 09/09/2008 and merged with RxCUI=142004. Furthermore, we observed several “gaps” in the mappings. For example, RxNorm ingredients Thyrotropin (RxCUI=10579) and Aceorphan (RxCUI=16738) were both absent in the NDF-RT database. We believe such discrepancies are caused by different file release cycles between RxNorm and NDF-RT (NDF-RT data lags behind the RxNorm updates).
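Discrepancies of this kind can be detected with a round-trip check: follow each RxNorm-to-NDF-RT mapping and confirm that NDF-RT points back to the same RxCUI. The sketch below is an assumption about how such a check might look, not the procedure used in the study; the function name, the three lookup tables and the status labels are hypothetical, and the identifiers are the ones quoted above.

```python
def find_mapping_issues(rxnorm_to_ndfrt, ndfrt_to_rxcui, merged_into):
    """Return (rxcui, ndfrt_code, rxcui_in_ndfrt, status) for each inconsistent mapping.

    status is "missing" when NDF-RT has no entry for the code, "stale" when NDF-RT
    still points at an archived RxCUI that RxNorm merged into `rxcui`, and
    "mismatch" for any other disagreement.
    """
    issues = []
    for rxcui, codes in rxnorm_to_ndfrt.items():
        for code in codes:
            back = ndfrt_to_rxcui.get(code)
            if back is None:
                issues.append((rxcui, code, None, "missing"))
            elif back != rxcui:
                status = "stale" if merged_into.get(back) == rxcui else "mismatch"
                issues.append((rxcui, code, back, status))
    return issues


# Identifiers taken from the two examples in the text.
rxnorm_to_ndfrt = {"755824": ["4006429"], "142004": ["4009166", "4009167"]}
ndfrt_to_rxcui = {"4006429": "197513", "4009166": "312485", "4009167": "312485"}
merged_into = {"197513": "755824", "312485": "142004"}  # archived RxCUI -> current RxCUI

issues = find_mapping_issues(rxnorm_to_ndfrt, ndfrt_to_rxcui, merged_into)
```

Separating “stale” from other mismatches distinguishes problems that a release-cycle lag would explain from those that would need manual curation.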

Research Needs

Although we dropped them from our data set, two types of medication data were common in our data and are unaddressed by RxNorm and NDF-RT. A significant gap for the TEDDY medication data, and likely for other clinical research projects, is the coverage in drug terminologies of dietary supplements and related products (e.g., mineral, herbal, and homeopathic preparations). These were removed from our analysis as out of scope for RxNorm, although they are important variables to capture in many investigations. Use of these (mostly) over-the-counter products is highly prevalent in the US and elsewhere, and although they are out of scope for RxNorm and NDF-RT, there is no other recognized standard for coding them. While RxNorm does include a small percentage of these products, most of the estimated 30,000 dietary supplements sold in the USA are excluded. This raises the issue of standardized practices for drug classifications [8].

Additionally, we encountered many underspecified terms (e.g., “unknown steroid”, “non-specified antibiotic”) that were not specific enough to represent in RxNorm, but were clearly detailed enough to be represented and classified using NDF-RT classes, and arguably have research value. Future tools to code medication concepts that are out of scope or too general for RxNorm directly into NDF-RT for analysis would fulfill an important research need.

Limitations and Future Work

In this study, we limited our investigation to Chemical Structural and Pharmaceutical Preparation class hierarchies of NDF-RT. It remains to be seen how other NDF-RT axes (e.g., Mechanism of Action, Pharmacokinetics) can be used to classify and aggregate medication information to study how it relates to patient data and outcomes. Additionally, this study only evaluated drug classification from one terminology: NDF-RT. In the future, we plan to expand our investigation by incorporating other publicly available terminology resources, such as SNOMED-CT.

Significance

The goal of this work is to use NDF-RT drug classes to support the aggregation and analysis of medication data coded in RxNorm. While the data happen to have come from a large clinical research data set in pediatrics, this is a challenge with broad relevance. RxNorm has shown good coverage for medications collected in this research study and is a good choice for coding research medications, as it is the standard for clinical medication data and can enable re-use of EHR data records for research purposes.

Aggregating medication data is a common problem for any kind of research, particularly research on large populations. Methods, validation, guidance, and training for use of RxNorm and NDF-RT mappings would be useful resources for virtually all clinical research.

Conclusion

The coding of data, including medication data, supports data sharing and re-use. RxNorm codes at a very granular level and allows researchers to identify the specific medications and ingredients that were reported. NDF-RT classifies them in many ways and can likely support many or all of the medication views (or groupings) needed in research.

Acknowledgments

We wish to thank Kendra Vehik, Mike Haller, and Helena Larsson from the TEDDY project, and Christopher Chute from Mayo for their helpful reviews and cooperation. TEDDY is funded by several NIH institutes, Juvenile Diabetes Research Foundation (JDRF), and Centers for Disease Control and Prevention (CDC). This work is also funded in part by the eMERGE grant.

References

  • 1. Carter J, et al. Initializing the VA Medication Reference Terminology Using UMLS Metathesaurus Co-Occurences. AMIA Annual Symposium. 2002:116–120.
  • 2. Liu S, et al. RxNorm: Prescription for Electronic Drug Information Exchange. IT Professional. 2005;7(5):17–23.
  • 3. Nelson SJ, et al. A Semantic Normal Form for Clinical Drugs in the UMLS: Early Experiences with the VANDF. AMIA Annual Symposium. 2002:557–561.
  • 4. Rewers M, TS Group. The Environmental Determinants of Diabetes in the Young (TEDDY) Study. Annals of the New York Academy of Sciences. 2008;1150:1–13. doi: 10.1196/annals.1447.062.
  • 5. PEC TEDDY Project Website. 2006. [cited 2006]. Available from: http://teddy.epi.usf.edu/
  • 6. Richesson R, et al. Achieving Standardized Medication Data in Clinical Research Studies: Two Approaches and Applications for Implementing RxNorm. Journal of Medical Systems. 2007. doi: 10.1007/s10916-009-9278-5.
  • 7. Shaw M, et al. Generating Application Ontologies from Reference Ontologies. AMIA Annual Symposium. 2008:672–676.
  • 8. Moyers S, Richesson R, Krischer J. Trans-Atlantic Data Harmonization in the Classification of Medicines and Dietary Supplements: A Challenge for Epidemiologic Study and Clinical Research. International Journal of Medical Informatics. 2008;77(1):58–67. doi: 10.1016/j.ijmedinf.2006.12.003.

Articles from AMIA Annual Symposium Proceedings are provided here courtesy of American Medical Informatics Association