Journal of the American Medical Informatics Association : JAMIA. 2010 Nov-Dec;17(6):671–674. doi: 10.1136/jamia.2010.008607

Drug safety surveillance using de-identified EMR and claims data: issues and challenges

Prakash M Nadkarni
PMCID: PMC3000764  PMID: 20962129

Abstract

The author discusses the challenges of pharmacovigilance using electronic medical record and claims data. Use of ICD-9 encoded data has low sensitivity for detection of adverse drug events (ADEs), because it requires that an ADE escalate to major-complaint level before it can be identified, and because clinical symptomatology is relatively under-represented in ICD-9. A more appropriate vocabulary for ADE identification, SNOMED CT, awaits wider deployment. The narrative-text record of progress notes can potentially be used for more sensitive ADE detection. More effective surveillance will require the ability to grade ADEs by severity. Finally, access to online drug information that includes both a reliable hierarchy of drug families as well as structured information on existing ADEs can improve the focus and predictive ability of surveillance efforts.


In this issue, Reisinger et al1 describe the creation of a database intended to facilitate drug safety surveillance by monitoring for adverse events, using extracted data from two de-identified databases, a claims database and an electronic medical record (EMR) database provided by a large healthcare company. The proposed data model is a subset of a more detailed model specified by the Observational Medical Outcomes Partnership.2 That commercial enterprises engage in such work is highly laudable.

The proposed data model is fairly straightforward. A Persons (Patients) table records basic demographic elements and related tables list the encounters, medications, procedures, and clinical conditions for each person. The latter three tables encode the concepts being recorded using standard medical vocabularies whose contents, as well as associated hierarchical relationships, are extracted from the US National Library of Medicine's Unified Medical Language System (UMLS) Metathesaurus.3

Chronological information is essential in surveillance databases: to suspect a medication-related adverse event, a condition must follow the onset of medication, though of course a post hoc phenomenon does not by itself prove cause and effect. To create the chronological information, Reisinger et al pre-processed the raw data by coalescing consecutive records for the same patient for the same medication, clinical condition, or procedure into a single record. The resultant record represents an ‘era’ for the therapeutic intervention or condition. Each is tagged by start and end dates that denote an episode of continued medication administration or of ongoing care visits for a condition. The coalescing heuristic used was: if one encounters a sequence of records where the start date of intervention in a subsequent record follows the end date in the preceding record by 30 days or less, the sequence can be merged into a single record.
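The coalescing heuristic can be illustrated with a minimal sketch. The record layout (`Exposure`), the function name `build_eras`, and the `max_gap_days` parameter are assumptions made here for illustration; they are not taken from Reisinger et al's implementation.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Exposure:
    """Hypothetical raw record or merged era for one person and one concept."""
    person_id: int
    concept: str   # medication, condition, or procedure code
    start: date
    end: date


def build_eras(records, max_gap_days=30):
    """Merge records for the same person and concept into eras.

    A record joins the current era when its start date follows the
    era's end date by max_gap_days or less (overlaps also merge).
    """
    eras = []
    ordered = sorted(records, key=lambda r: (r.person_id, r.concept, r.start))
    for rec in ordered:
        last = eras[-1] if eras else None
        if (last is not None
                and last.person_id == rec.person_id
                and last.concept == rec.concept
                and rec.start - last.end <= timedelta(days=max_gap_days)):
            # Extend the current era; keep the later of the two end dates.
            eras[-1] = Exposure(last.person_id, last.concept,
                                last.start, max(last.end, rec.end))
        else:
            eras.append(rec)
    return eras
```

With a 30-day window, two prescriptions separated by a 21-day gap collapse into one era, while a third that begins 73 days after the second starts a new era.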

The resulting database is impressive in terms of its data volume: 43 million subjects and 1 billion drug exposures. However, both the data model and the vocabularies employed in the work bring with them significant limitations in terms of the inferences one can make with regard to medication safety. To be fair, some of these limitations only serve to illustrate the challenges inherent in the problem.

Identifying adverse effects: data source and vocabulary issues

In the above work, the only source of adverse event data drawn from the EMR/claims data was clinical-condition information encoded using the International Classification of Diseases, 9th edition (ICD-9)4: this was converted, where possible, into equivalent codes in MedDRA (Medical Dictionary for Regulatory Activities)5 using exact-correspondence information in UMLS. Because they are used for billing purposes, ICD-9 data are the most readily available structured data in EMRs for identifying clinical conditions. However, such data have several limitations.

Problems with the use of claims data for adverse event detection

The majority of adverse drug events (ADEs) are recorded in the narrative text associated with the initial post-event visit or progress note, if at all. Only if severe enough to constitute a chief complaint or a major finding will they be coded using ICD-9. The requirement that ADE findings must escalate to a major-complaint level to be picked up lowers the system's sensitivity. Under-recognition of a seemingly common problem (weight gain) was an issue with the antipsychotic risperidone, overuse of which is now the focus of federal concerns6: the prevalence of this ADE was recognized only when the weight gain escalated into pathological obesity or into obesity sufficient to cause type II diabetes.

The relatively weaker coverage of ICD-9 for (non-billable) symptomatology, in comparison to other vocabularies such as the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT),7 is well documented.8 9 For example, it would be unusual to code a complaint of dry mouth due to anticholinergic medications using ICD-9. The encoding process itself is vulnerable to inaccuracies, because it is not always performed by the care provider during the time of the clinical encounter. Depending on the healthcare organization and the specialty, a significant portion of the clinical record may be recorded in narrative text, which is then encoded a day or two later by medical records staff for billing and reporting purposes. As such, the encoding does not reflect ‘ground truth’. For example, Stein et al10 studied the phenomenon of post-operative pulmonary embolism as recorded in narrative text and in encoded form, and found not only significant discrepancies between the two, but also false positives and negatives in both.

Some groups now promote encoding of problem lists using SNOMED CT,11 because the latter captures symptomatology much better than ICD-9. However, there are significant hurdles to the intended widespread deployment of SNOMED CT. Critical aspects include the large size of the terminology and the significant redundancy in its content. Projects such as the construction by the National Library of Medicine (NLM) of a CORE Subset of SNOMED CT12 aim to address both of these issues. Nevertheless, recent work by Nadkarni and Darer13 indicates that such subsets, while undoubtedly useful, cannot provide the necessary coverage in all circumstances–access to the complete SNOMED CT content is still required.

Another concern is that encoding of fine details of the encounter, when performed by humans with software assistance, is time consuming. Consequently, busy clinicians may find this an unacceptable chore, and relegate this task to their medical records staff. Doing so would propagate the aforementioned concerns regarding accuracy. Conversely, for clinical encounters documented primarily as narrative text, a currently popular question is whether automated natural-language-processing (NLP) techniques can adequately extract all ADE-related information from the text. Wang et al14 explored the feasibility of using NLP for ADE signal detection in a recent JAMIA paper. In their proof of concept study, Wang et al evaluated patients treated with bupropion, and the results were promising. However, the field must replicate such work on a much larger scale to determine where the pitfalls lie.

The choice of MedDRA as an adverse event terminology

The FDA uses MedDRA to collect and encode reports of adverse events. Thus, mapping of ICD-9 codes to MedDRA is necessary for communication to the FDA. Using MedDRA has some advantages, notably in the area of ‘standardized MedDRA queries’. Through a knowledge base representing the findings of various syndromes using MedDRA terms, one can search for patients whose individual findings are consistent with disorders such as anaphylaxis, extrapyramidal manifestations, hemolysis, or renal failure. However, MedDRA's design deviates significantly from modern controlled-vocabulary-design principles as articulated in Cimino's classic paper15; its limitations have been discussed by other authors.16–18 Concerns about MedDRA include that it is not concept-oriented, it is non-compositional, its hierarchy is arbitrarily constrained to five levels, and, at the higher levels, it is artificially mono-hierarchical, which leads to difficulties in formulating queries.

Because the SNOMED CT concept hierarchy is significantly richer than MedDRA's, Bodenreider attempted to map, using automated approaches, MedDRA preferred terms (the equivalent of concepts) to SNOMED CT concepts.19 He found that 58% of MedDRA's preferred terms could be mapped this way. Thus, the incorporation of additional intermediate-level concepts from SNOMED may make MedDRA-encoded data easier to categorize, aggregate, and analyze meaningfully.

Grading of adverse events

Early and sensitive adverse event detection requires adverse event grading. Merely recording that a drug causes an adverse event is not enough–one must know how severe it is. In the running example of the Reisinger et al paper, acute myocardial infarction represents only the tip of the iceberg of coronary artery disease leading to occlusion. Patients with acute myocardial infarction frequently experience symptoms such as anginal pain beforehand. It is important to catch adverse events before they escalate into full-blown emergencies. While Reisinger et al mention ischemic heart disease in passing, it is not clear how their model would represent progression of given disorders along a spectrum from mild to severe such that all intermediate states would fit as recognizable components of the same disease process.

Some adverse events, by their very nature (such as anaphylaxis or toxic epidermal necrolysis), occur in a severe form. Most ADEs, however, can occur with varying grades of severity. For example, National Cancer Institute (NCI)-sponsored clinical trials of cancer therapies utilize the Common Terminology Criteria for Adverse Events (CTC AE),20 where adverse events are graded on a 1–5 scale (5 represents death), though, depending on the particular adverse event, not all points on the scale may be used. For instance, dry mouth can occur as grade 1–2 on the scale, while ‘secondary malignancy’, if present, is automatically grade 4. One motivation for grading is to enable consistent reporting of adverse events to the local Institutional Review Board, to other collaborating sites in a multi-site study, and to the study's sponsor, for example, by requiring reporting of major adverse events of grade 3 and above.

While originally developed for oncology, because CTC AE grading is ‘anchored’, it has found application in non-cancer-related studies such as stem cell transplantation21 and, in a modified form, for rheumatology.22 The use of CTC AE minimizes inter-rater variability. ‘Anchoring’ implies that rather than simply using terms like ‘mild’, ‘moderate’, or ‘severe’ without definition or qualification, CTC AE specifies a particular grade of an adverse event in unambiguous detail, often in terms of numerical ranges or the extent of functional disability. For all its strengths, however, CTC AE is not comprehensive enough to use for all drug categories or for all types of clinical studies. For example, psychiatric findings are under-represented, as are certain physical findings such as tendon rupture. The latter can occur with fluoroquinolone antimicrobial administration or after periarticular corticosteroid injections. With CTC AE, tendon rupture can only be encoded as ‘musculoskeletal, other (specify)’.

Grading of adverse events is not always possible or feasible to perform in real time. While the grades of some adverse events (eg, those based on measurable physical or laboratory findings) can be readily computed algorithmically, grading of subjective findings typically requires careful inspection of the clinical record or detailed interviewing of the patient. Electronic support in the form of check-lists can facilitate its implementation. A concern regarding the model proposed by Reisinger et al is that adverse event grade information is not easily gleaned from ICD-9 data. First, only a small proportion of clinical conditions are graded in ICD-9 as mild/minimal, moderate, or severe. More importantly, as already stated, the billing and administrative practices related to ICD-9 usage tend to leave adverse events of a low-level grade as narrative-text portions of the clinical record rather than formally encoding them.
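For findings based on laboratory measurements, anchored grading reduces to a threshold lookup. The sketch below assumes a hypothetical set of anchors expressed as multiples of the upper limit of normal; the numbers in `ILLUSTRATIVE_BOUNDS` are placeholders chosen for illustration and should not be read as the published CTC AE criteria for any particular test. The function names and the default reporting threshold parameter are likewise assumptions.

```python
from bisect import bisect_left

# HYPOTHETICAL anchors (not actual CTC AE values): upper bounds, as
# multiples of the upper limit of normal (ULN), for grades 0, 1, 2, 3.
# Any ratio above the last bound is grade 4.
ILLUSTRATIVE_BOUNDS = [1.0, 3.0, 5.0, 20.0]


def grade_lab_value(value, upper_limit_normal, bounds=ILLUSTRATIVE_BOUNDS):
    """Return an anchored grade (0-4) for a laboratory value.

    A ratio equal to a bound falls in the lower grade, so a value
    exactly at the ULN is grade 0.
    """
    if upper_limit_normal <= 0:
        raise ValueError("upper_limit_normal must be positive")
    ratio = value / upper_limit_normal
    return bisect_left(bounds, ratio)


def needs_reporting(grade, threshold=3):
    """Apply a reporting rule such as 'grade 3 and above'."""
    return grade >= threshold
```

Subjective findings have no such numeric anchor, which is why they still require inspection of the record or a patient interview.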

Drug information: choice of reference content

For hierarchical relationships among drugs, Reisinger et al chose to use the drug hierarchy of SNOMED CT. While SNOMED CT's strengths with respect to encoding much of clinical medicine are well known, SNOMED CT is a suboptimal source for information about relationships among drugs.

In the discussion below it is important to note that the relationship between drugs and drug families/categories is poly-hierarchical (ie, one drug may belong to more than one family). A given drug may have multiple therapeutic actions–for example, aspirin is both an anti-inflammatory and anti-platelet agent–or a drug may bind multiple receptors, as in the case of chlorpromazine.

The authors' choice of rofecoxib as an exemplar was fortuitous: SNOMED CT characterizes it correctly as a cox-2 inhibitor. However, in SNOMED CT chlorpromazine falls under the single category ‘phenothiazine’, a chemical classification that is not useful from the pharmacological or therapeutic perspectives. The SNOMED CT classification for the widely used drug acetaminophen is incorrect: ‘Para-aminophenol derivative anti-inflammatory agent (substance)’. Acetaminophen has antipyretic and analgesic effects, but has no clinically significant anti-inflammatory effects. The antimicrobial ciprofloxacin (a fluoroquinolone) is placed in the less useful, broader category ‘quinolones’ along with nalidixic acid, an older drug with a significantly different adverse event profile. Such classification problems can have real-world complications. For example, the fluoroquinolone drug family, which is not a distinct concept in SNOMED CT, is the focus of complaints regarding overuse from groups such as the Fluoroquinolone Toxicity Research Foundation, and the Health Research Group of Public Citizen, which has petitioned the FDA to require black-box label warnings.23

An accurate and comprehensive drug hierarchy is important for analyses of groups of related drugs. Useful drug hierarchies have been constructed, but are not always freely available. For example, the Cerner Multum Drug Lexicon database24 was freely available in its earlier versions, and correctly classified chlorpromazine both as a phenothiazine antipsychotic and a phenothiazine antihistamine. Unfortunately, its distribution has been constrained in its more recent versions, and one can now only obtain it by purchasing the content.
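A poly-hierarchical drug classification can be represented as a mapping from each drug to a set of families, so that aggregate analyses count an exposure under every family to which the drug belongs. The sketch below is a toy illustration using the examples discussed in this section; the dictionary `DRUG_FAMILIES` and the function `exposures_by_family` are hypothetical and do not reproduce the content or structure of any actual drug database.

```python
from collections import defaultdict

# Toy poly-hierarchy: one drug may belong to several families.
DRUG_FAMILIES = {
    "aspirin": {"anti-inflammatory agent", "anti-platelet agent"},
    "chlorpromazine": {"phenothiazine antipsychotic",
                       "phenothiazine antihistamine"},
    "ciprofloxacin": {"fluoroquinolone"},
}


def exposures_by_family(exposed_drugs, families=DRUG_FAMILIES):
    """Count exposures per drug family.

    Each exposure is counted once under every family of the drug;
    drugs missing from the mapping are ignored.
    """
    counts = defaultdict(int)
    for drug in exposed_drugs:
        for family in families.get(drug, ()):
            counts[family] += 1
    return dict(counts)
```

A mono-hierarchical source would force each drug into a single family, and an analysis of, say, anti-platelet agents would then miss aspirin if it had been filed only as an anti-inflammatory.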

Determining dose-related effects: challenges

Reisinger et al state explicitly that their data model does not support analyses by drug strength. More concerning, the model does not record dose information. Many adverse events occur as dose-related extensions of pharmacological actions, such as congestive heart failure with β-blockers and fluid retention with the thiazolidinedione anti-diabetic agents.

The absence of dose data again limits the model's utility. Several issues complicate dose-based analyses.

  • Many of the standard sources of drug information–such as the NLM's RxNorm25 and the drug hierarchy of SNOMED CT that the authors used–do not treat the numbers associated with a pharmaceutical preparation specially. Instead, the numbers are simply part of the string that describes a formulation. The UMLS reflects this design limitation as well. More advanced data models, such as the previously mentioned Multum Lexicon, explicitly separate the numeric part of the drug strength (as well as the units in which the strength is expressed) from the medication itself. The Multum data model is sophisticated enough to recognize that in many cases, both strength and units are expressed in two parts, numerator and denominator (eg, milligrams per 100 ml), and so these parts were modeled separately where necessary.

  • Of course, even if one knows what strength of preparation was being prescribed for a given patient, that is not enough to reliably compute the quantity of medication that the patient is actually receiving per unit time. For ambulatory patients, one may try to rely on the quantity dispensed for a given period, but that is not the same as what is ingested. For several drugs, the dose must be continually titrated based on the values of a laboratory measure (eg, the International Normalized Ratio (INR) for warfarin), so the number of tablets taken per day or per week may change frequently.
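The difference between strength embedded in a descriptive string and strength modeled as separate numeric parts can be sketched as follows. The class `Strength` and its field names are assumptions made for illustration and do not reproduce the Multum schema; the 250 mg per 100 ml preparation is an invented example.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Optional


@dataclass(frozen=True)
class Strength:
    """Hypothetical structured strength: numerator and optional denominator."""
    numerator_value: Fraction
    numerator_unit: str
    denominator_value: Optional[Fraction] = None   # None for tablets, capsules
    denominator_unit: Optional[str] = None

    def amount_in(self, volume: Fraction) -> Fraction:
        """Amount (in numerator units) contained in `volume` denominator units."""
        if self.denominator_value is None:
            raise ValueError("strength has no denominator part")
        return self.numerator_value * volume / self.denominator_value


# Invented example: a liquid preparation of 250 mg per 100 ml.
suspension = Strength(Fraction(250), "mg", Fraction(100), "ml")
# A solid form carries only the numerator part.
tablet = Strength(Fraction(5), "mg")
```

Once the parts are numeric, computing the amount in a 20 ml dose is simple arithmetic (`suspension.amount_in(Fraction(20))`), whereas a strength stored only inside a formulation string must first be parsed. As the second point above notes, even a computable strength says nothing about what the patient actually ingests.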

One practical issue is that many EMRs (eg, EpicCare) record the caregiver's orders for a given prescription only as narrative text, for example, ‘1 bid’. In principle, it should not be particularly difficult to enforce at least partial structure in the data through the use of pull-down lists and separation of the numeric part of the order from the dose frequency (although narrative text is still necessary for special instructions). Because of the considerable variation that can occur in such text, attempting to extract computable dose information can become a difficult pattern-recognition or NLP project.
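A minimal sketch shows both how simple orders such as ‘1 bid’ can be parsed and how quickly a pattern-based approach runs out. The regular expression, the abbreviation table `FREQ_PER_DAY`, and the function `units_per_day` are assumptions for illustration; they cover only a handful of common forms and are not drawn from any EMR product.

```python
import re
from typing import Optional

# Common Latin frequency abbreviations mapped to administrations per day.
FREQ_PER_DAY = {"qd": 1, "bid": 2, "tid": 3, "qid": 4}

# Matches: quantity, optional dose-form word, optional route "po", frequency.
SIG_PATTERN = re.compile(
    r"^\s*(\d+(?:\.\d+)?)\s*"
    r"(?:tabs?|tablets?|caps?|capsules?)?\s*"
    r"(?:po\s+)?"
    r"(qd|bid|tid|qid)\s*$",
    re.IGNORECASE,
)


def units_per_day(sig: str) -> Optional[float]:
    """Return dose units per day for a simple order, or None if unparsed."""
    match = SIG_PATTERN.match(sig)
    if match is None:
        return None
    quantity = float(match.group(1))
    return quantity * FREQ_PER_DAY[match.group(2).lower()]
```

Here `units_per_day("1 bid")` yields 2.0, but text such as ‘take as directed’ or a tapering schedule returns `None`, which is the point at which a fuller NLP approach becomes necessary.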

The full Observational Medical Outcomes Partnership data model allows recording of the number of refills, the number of days' supply, and the total quantity of drug, but does not try to address dosage issues explicitly. This illustrates the overall challenges related to determining actual administered drug dose information reliably.

Utilizing known adverse event information for drug surveillance

Pharmacovigilance (drug surveillance) efforts can utilize existing knowledge about adverse events in several ways:

  • Drug surveillance may resemble data mining with hypothesis generation. Formally designed studies must later confirm (or disprove) initial signals or trends detected in the raw data. A significant problem in data mining exercises is the over-abundance of signals. Such problems multiply if the software lacks information on what is already common knowledge, as in the apocryphal story of the software program that discovered that ovarian cancer only occurs in women. One way for software to reduce pharmacovigilance study noise levels is to post-process signals by checking against known adverse event information for the drugs under suspicion, so that only novel signals are considered for further exploration.

  • Existing adverse event knowledge about closely related chemical compounds can also serve to focus the surveillance. For example, programs should monitor new aminoglycoside antibiotics for adverse renal or vestibulo-cochlear effects, and new statin-class drugs for hepatotoxicity and myopathy.

  • If one knows a drug's pharmacological mechanisms of action, one can predict part of its potential adverse event profile before case reports appear in the literature. A new drug with anticholinergic side effects will likely cause urinary retention in elderly males with benign prostatic hypertrophy, and can potentially exacerbate glaucoma in those patients known to have the condition. Such patients are not commonly subjects in clinical trials of drugs, which often may not specifically target older populations.
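The first two uses listed above can be sketched as a post-processing step that labels each detected drug-event signal as already known, expected from the drug's class, or novel. All of the tables below are toy placeholders: `drug_a` and `drug_b` are hypothetical new drugs, the listed event for `drug_a` is invented, and the class-level events are the examples given in the text.

```python
# Class-level adverse events (examples taken from the text above).
CLASS_EVENTS = {
    "aminoglycoside": {"nephrotoxicity", "vestibulo-cochlear toxicity"},
    "statin": {"hepatotoxicity", "myopathy"},
}

# HYPOTHETICAL drugs and their classes.
DRUG_CLASSES = {"drug_a": {"statin"}, "drug_b": {"aminoglycoside"}}

# HYPOTHETICAL events already documented for each specific drug.
LABELED_EVENTS = {"drug_a": {"headache"}}


def classify_signal(drug, event):
    """Label a signal as 'known', 'class-expected', or 'novel'."""
    if event in LABELED_EVENTS.get(drug, set()):
        return "known"
    for drug_class in DRUG_CLASSES.get(drug, set()):
        if event in CLASS_EVENTS.get(drug_class, set()):
            return "class-expected"
    return "novel"


def novel_signals(signals):
    """Keep only (drug, event) pairs not explained by existing knowledge."""
    return [s for s in signals if classify_signal(*s) == "novel"]
```

Class-expected signals are kept separate from known ones because they identify what surveillance should watch for in a new drug, rather than noise to be discarded. In practice, the matching would have to operate on coded concepts and their hierarchies rather than on literal strings.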

While commercial drug databases store such content, they are proprietary and vary considerably in design. Since comparative descriptions of commercial content have not been published in the literature, a blanket assessment must be avoided here. However, a considerable portion of the proprietary content reproduces verbatim the prose in the FDA-mandated package insert, and the latter is now freely available through NLM's DailyMed.26 The added value of proprietary sources comes in part from categorizing the textual content into functional categories (side effects, pregnancy, and lactation), organ systems, and an occasional severity indicator, but this is not sufficient for drug surveillance purposes.

The time is now appropriate for systematic efforts (preferably combining public and private resources) to extract the information that is present in the numerous primary and secondary adverse event data repositories into a single, over-arching structured representation with standard form and content. Such structuring would likely be facilitated by the creation of a standard terminology of adverse event content that has much richer inter-relationships than are present in MedDRA, and where aspects of the same spectrum of disease (for example, angina pectoris and myocardial infarction) are correlated along a time-and-severity spectrum, as opposed to merely being related concepts. Bodenreider's pilot work on using SNOMED CT represents the starting point for such efforts. A larger consortium should build upon this work.

Acknowledgments

The author wishes to thank Randolph Miller for valuable feedback on the manuscript.

Footnotes

Competing interests: None.

Provenance and peer review: Not commissioned; not externally peer reviewed.

References


Articles from Journal of the American Medical Informatics Association : JAMIA are provided here courtesy of Oxford University Press
