Figure 3. Structured and unstructured data generate high-quality phenotypes.
Upper Left: Recent advances in Natural Language Processing (NLP) allow highly accurate reconstruction of comprehensive medication histories from unstructured clinical text. Upper Right: Structured medication data generated by computerized provider order entry software (e.g., name-value pairs such as “medication = tamoxifen”) can be easier to collate and analyze. However, structured data must be normalized across diverse systems of care. The National Library of Medicine (NLM) has developed a terminology called RxNorm [http://www.nlm.nih.gov/research/umls/rxnorm/] that links normalized drug names (ingredient, dose, and formulation) with the drug vocabularies commonly used in pharmacy management systems (e.g., First Databank, Micromedex, MediSpan, Gold Standard Alchemy, Multum). Bottom: Structured and unstructured data can be merged to yield high-quality drug exposure phenotypes that facilitate pharmacogenomic studies using EMRs.
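The normalize-then-merge workflow in the caption can be illustrated with a minimal sketch. It rests on several assumptions. `NORMALIZATION_TABLE` is a small, hypothetical, hand-written stand-in for the name-to-ingredient mapping that RxNorm would supply in practice; it does not call any real RxNorm service. The functions `normalize` and `merge_exposures` and the `(patient_id, raw_name, date)` record layout are likewise illustrative. Each source-specific drug string is mapped to a common ingredient, and mentions from order entry and from NLP are then combined into one exposure record per patient and ingredient.

```python
from datetime import date

# Hypothetical, illustrative mapping from raw drug strings (as they might appear
# in different source systems or clinical notes) to a normalized ingredient.
# In a real pipeline this mapping would be derived from RxNorm, not hand-written.
NORMALIZATION_TABLE = {
    "tamoxifen": "tamoxifen",
    "tamoxifen citrate 20 mg oral tablet": "tamoxifen",
    "nolvadex": "tamoxifen",
    "warfarin sodium 5 mg tablet": "warfarin",
    "coumadin": "warfarin",
}


def normalize(raw_name):
    """Return the normalized ingredient for a raw drug string, or None if unmapped."""
    return NORMALIZATION_TABLE.get(raw_name.strip().lower())


def merge_exposures(structured, nlp):
    """Merge (patient_id, raw_name, date) records from both sources.

    Returns a dict keyed by (patient_id, ingredient) holding the first and last
    observed dates and the set of sources that contributed evidence.
    """
    phenotype = {}
    for source, records in (("structured", structured), ("nlp", nlp)):
        for patient_id, raw_name, when in records:
            ingredient = normalize(raw_name)
            if ingredient is None:
                continue  # this sketch simply skips names it cannot map
            entry = phenotype.setdefault(
                (patient_id, ingredient),
                {"first": when, "last": when, "sources": set()},
            )
            entry["first"] = min(entry["first"], when)
            entry["last"] = max(entry["last"], when)
            entry["sources"].add(source)
    return phenotype
```

For example, a structured order for "Tamoxifen citrate 20 mg oral tablet" and an NLP-extracted mention of "Nolvadex" for the same patient both resolve to `tamoxifen`. They collapse into a single exposure whose date range spans both sources. Real systems would also need to handle dose, route, negation, and unmapped names, which this sketch deliberately omits.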