Author manuscript; available in PMC: 2011 Oct 30.
Published in final edited form as: Clin Pharmacol Ther. 2011 Jan 19;89(3):379–386. doi: 10.1038/clpt.2010.260

Figure 3. Structured and unstructured data generate high-quality phenotypes.

Upper Left: Recent advances in Natural Language Processing (NLP) allow extremely accurate reconstruction of comprehensive medication histories. Upper Right: Structured medication data generated by computerized provider order entry software (e.g., name-value pairs, such as “medication = tamoxifen”) can be easier to collate and analyze. However, structured data must be normalized across diverse systems of care. The National Library of Medicine (NLM) has developed a terminology called RxNorm [http://www.nlm.nih.gov/research/umls/rxnorm/], linking drug names (dose, ingredient, and formulation) with drug vocabularies commonly used in pharmacy management systems (e.g., First Databank, Micromedex, MediSpan, Gold Standard Alchemy, Multum). Bottom: Structured and unstructured data can be merged to yield high-quality drug exposure phenotypes that facilitate pharmacogenomic studies using EMRs.
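
The merge described in the caption can be illustrated with a minimal sketch. It assumes that both the structured order entries and the NLP-extracted mentions have already been reduced to (patient, drug name) pairs, and that a lookup table maps local drug strings to a shared concept key. The table `LOCAL_TO_CONCEPT`, the function names, and the concept keys are hypothetical placeholders for illustration; they are not real RxNorm identifiers and do not represent the RxNorm API or the authors' pipeline.

```python
from collections import defaultdict

# Hypothetical lookup from local drug strings to a shared concept key.
# The values are placeholders, NOT real RxNorm identifiers.
LOCAL_TO_CONCEPT = {
    "tamoxifen": "concept:tamoxifen",
    "nolvadex": "concept:tamoxifen",
    "warfarin": "concept:warfarin",
    "coumadin": "concept:warfarin",
}


def normalize(name):
    """Map a local drug string to its shared concept key, or None if unmapped."""
    return LOCAL_TO_CONCEPT.get(name.strip().lower())


def merge_exposures(structured, unstructured):
    """Combine (patient_id, drug_name) pairs from both sources.

    Returns {patient_id: {concept: [sources that reported it]}}.
    Drug names with no mapping are skipped.
    """
    exposures = defaultdict(lambda: defaultdict(set))
    for source, records in (("structured", structured), ("nlp", unstructured)):
        for patient_id, drug_name in records:
            concept = normalize(drug_name)
            if concept is not None:
                exposures[patient_id][concept].add(source)
    # Convert nested defaultdicts and sets to plain dicts and sorted lists.
    return {
        pid: {concept: sorted(sources) for concept, sources in drugs.items()}
        for pid, drugs in exposures.items()
    }


if __name__ == "__main__":
    structured = [("p1", "Tamoxifen"), ("p2", "Coumadin")]
    unstructured = [("p1", "Nolvadex"), ("p1", "warfarin")]
    print(merge_exposures(structured, unstructured))
```

Recording which source reported each concept is a design choice made for this sketch: an exposure seen in both the order entry data and the clinical notes could be treated as higher confidence than one seen in only a single source.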