2023 Dec 5;6(4):ooad100. doi: 10.1093/jamiaopen/ooad100

Table 4.

Main challenges and solutions of the current work.

Challenge Example Solution
Challenge: The same health event is represented in several source datasets without a clear link between them, potentially leading to duplicates.
Example: The same diagnosis code for a patient may be recorded in the EHR, claims, and prescription files. However, it may be difficult to link these records to a single event because there is no unique identifier for the case.
Solution: Transform each record as is (even if it is a duplicate), but add provenance information to the record so that it can be used when building cohorts.
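
A minimal Python sketch of this approach, assuming each transformed record carries a hypothetical `source` label (in the OMOP CDM, the `*_type_concept_id` columns, such as `condition_type_concept_id`, play this role). Every record is loaded as is; duplicates are collapsed only when a cohort is built, and each collapsed event keeps the set of sources it was seen in. Treating records with the same person, code, and date as one event is an illustrative assumption, as are the codes used.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ConditionRecord:
    person_id: int
    code: str        # hypothetical target diagnosis code
    event_date: str  # ISO date, e.g. "2020-03-01"
    source: str      # hypothetical provenance label: "EHR", "claim", "prescription"


def collapse_for_cohort(records):
    """Group records assumed to describe the same event (same person, code, date).

    The ETL keeps every record; this grouping happens only at cohort-building time,
    and the set of sources for each event is preserved.
    """
    events = defaultdict(set)
    for r in records:
        events[(r.person_id, r.code, r.event_date)].add(r.source)
    return dict(events)


records = [
    ConditionRecord(1, "DX-001", "2020-03-01", "EHR"),
    ConditionRecord(1, "DX-001", "2020-03-01", "claim"),
    ConditionRecord(1, "DX-001", "2020-03-01", "prescription"),
    ConditionRecord(2, "DX-001", "2021-07-15", "claim"),
]

events = collapse_for_cohort(records)
print(len(records), "records ->", len(events), "events")
```

Because the grouping rule lives in cohort-building code rather than in the transformation, it can be tightened or relaxed per study without reloading the data.
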
Challenge: There are no clear guidelines for choosing a target vocabulary when multiple standard OMOP vocabularies are available. Additionally, there is no roadmap indicating which vocabularies may stop being standard in the OMOP CDM in the near future.
Example: Physician Current Procedural Terminology Fourth Edition (CPT4) and SNOMED CT are both standard OMOP vocabularies for procedures; similarly, LOINC and SNOMED CT are both standard for lab tests. The National Cancer Institute Thesaurus (NCIt) was a standard OMOP vocabulary at the beginning of our study but is no longer standard.
Solution: Use the target vocabulary you are more familiar with. Keep in mind that what constitutes a standard OMOP vocabulary may change over time.
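
One way to make the "familiar vocabulary" choice explicit and repeatable is to write the team's preference order down per domain and pick only among candidates that are currently standard (`standard_concept == "S"` in the OMOP CONCEPT table). A sketch, assuming candidates are given as dicts shaped like CONCEPT rows; the preference order and concept IDs are hypothetical.

```python
# Hypothetical, team-defined preference order per OMOP domain.
VOCABULARY_PREFERENCE = {
    "Procedure": ["SNOMED", "CPT4"],
    "Measurement": ["LOINC", "SNOMED"],
}


def choose_target(candidates, domain):
    """Return the preferred standard candidate for a domain, or None.

    Each candidate is a dict shaped like a row of the OMOP CONCEPT table;
    only rows with standard_concept == "S" are eligible, so a vocabulary
    that loses its standard status simply drops out of the choice.
    """
    standard = [c for c in candidates if c["standard_concept"] == "S"]
    for vocabulary in VOCABULARY_PREFERENCE.get(domain, []):
        for c in standard:
            if c["vocabulary_id"] == vocabulary:
                return c
    return None


# Hypothetical candidate concepts for one source procedure.
candidates = [
    {"concept_id": 1001, "vocabulary_id": "CPT4", "standard_concept": "S"},
    {"concept_id": 1002, "vocabulary_id": "SNOMED", "standard_concept": "S"},
    {"concept_id": 1003, "vocabulary_id": "NCIt", "standard_concept": None},
]
print(choose_target(candidates, "Procedure")["concept_id"])  # 1002
```

Keeping the preference in one place also makes it easy to revisit when the set of standard vocabularies changes.
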
Challenge: It is hard to keep manual mapping files up to date, because standard target concepts change over time as the vocabularies are updated.
Example: Local code “9124,” used for vaccination against diphtheria and tetanus, was mapped to SNOMED CT code “73152006” (administration of diphtheria and tetanus vaccine). At some point, that target concept changed from standard to nonstandard, so we had to remap it to concept code “1657590” from the RxNorm vocabulary (diphtheria toxoid vaccine, inactivated/tetanus toxoid vaccine, inactivated injection).
Solution: Whenever you update the vocabulary, recheck the mappings in Usagi before running the transformation. Usagi automatically creates a list of nonstandard mappings so that they can be fixed before the actual data transformation.
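
The same check can be scripted as an extra safeguard in the pipeline. The sketch below is not Usagi's implementation; it assumes the updated vocabulary has been loaded into plain Python structures shaped like the OMOP CONCEPT table (`standard_concept` is "S" for standard concepts) and the CONCEPT_RELATIONSHIP table (a nonstandard concept points to its standard counterpart through a "Maps to" relationship). All concept IDs are hypothetical.

```python
def recheck_mappings(mappings, concepts, relationships):
    """Find manual mappings whose target concept is no longer standard.

    mappings:      {source_code: target_concept_id}
    concepts:      {concept_id: standard_concept}  ("S" for standard, else None or "C")
    relationships: iterable of (concept_id_1, relationship_id, concept_id_2)
    Returns {source_code: (old_target, [suggested standard targets])}.
    """
    # Collect standard "Maps to" targets for every concept.
    maps_to = {}
    for c1, rel, c2 in relationships:
        if rel == "Maps to" and concepts.get(c2) == "S":
            maps_to.setdefault(c1, []).append(c2)

    stale = {}
    for source_code, target in mappings.items():
        if concepts.get(target) != "S":
            stale[source_code] = (target, sorted(maps_to.get(target, [])))
    return stale


# Hypothetical state after a vocabulary update: concept 201 has become nonstandard.
concepts = {201: None, 301: "S", 401: "S"}
relationships = [(201, "Maps to", 301)]
mappings = {"9124": 201, "1234": 401}

print(recheck_mappings(mappings, concepts, relationships))
# {'9124': (201, [301])}
```

The "Maps to" suggestions are a starting point for review, not an automatic replacement; as the example above shows, the most suitable new target may come from a different vocabulary.
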
Challenge: It is hard to keep track of all historical coding versions of the same event so that they receive the same target mapping.
Example: The atypical squamous cells of undetermined significance (ASC-US) result of the Papanicolaou test has been recorded in our datasets as SNOMED CT code “39035006,” SNOMED morphology code “M-697102,” local codes “D,” “D1,” and “D1.1,” and in free-text format.
Solution: When working with historical codes and data, always check the most recent target code for the event and reuse it.
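
One way to keep historical codes consistent is to route every variant through a single event key that owns the current target, so a change of target is made in one place. A sketch, with the source codes taken from the example above; the free-text phrases and the target concept ID are hypothetical.

```python
# Every historical representation of the same finding points to one event key...
SOURCE_TO_EVENT = {
    ("SNOMED CT", "39035006"): "ASC-US",
    ("SNOMED morphology", "M-697102"): "ASC-US",
    ("local", "D"): "ASC-US",
    ("local", "D1"): "ASC-US",
    ("local", "D1.1"): "ASC-US",
}

# Hypothetical free-text phrases, matched after trimming and lowercasing.
FREE_TEXT_TO_EVENT = {
    "asc-us": "ASC-US",
    "atypical squamous cells of undetermined significance": "ASC-US",
}

# ...and each event key has exactly one current target (hypothetical concept ID).
EVENT_TO_TARGET = {"ASC-US": 123456}


def map_result(system, value):
    """Resolve a coded or free-text result to the current target concept ID, or None."""
    if system == "free text":
        event = FREE_TEXT_TO_EVENT.get(value.strip().lower())
    else:
        event = SOURCE_TO_EVENT.get((system, value))
    return EVENT_TO_TARGET.get(event)


print(map_result("local", "D1.1"))          # 123456
print(map_result("free text", "  ASC-US "))  # 123456
print(map_result("local", "Z"))             # None
```

When the target for ASC-US changes, only `EVENT_TO_TARGET` needs updating, and all historical variants follow automatically.
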
Challenge: Broad source codes are hard to map to specific target codes.
Example: Local code “7004” is used for all kinds of biopsies in the claims database. Likewise, “HPV test” (human papillomavirus testing) or “eGFR” (estimated glomerular filtration rate) appears in the text without details on which particular test (LOINC code) was carried out.
Solution: Try to use additional information from the same medical record to specify the target code. For example, a diagnosis code referring to the prostate may help to map the biopsy to a more specific prostate biopsy.
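
A sketch of this context-based refinement, assuming diagnoses in the same record are ICD-10 codes (C61 is malignant neoplasm of prostate, N40 is hyperplasia of prostate). The target labels are hypothetical placeholders for the actual standard concepts.

```python
GENERIC_BIOPSY = "biopsy (generic target)"             # hypothetical target
PROSTATE_BIOPSY = "prostate biopsy (specific target)"  # hypothetical target

# Assumption: diagnoses are ICD-10 codes; these prefixes indicate the prostate.
PROSTATE_DX_PREFIXES = ("C61", "N40")


def map_biopsy(procedure_code, diagnosis_codes):
    """Refine the broad local biopsy code using diagnoses from the same record."""
    if procedure_code != "7004":
        raise ValueError(f"not the broad biopsy code: {procedure_code}")
    # str.startswith accepts a tuple of prefixes.
    if any(dx.startswith(PROSTATE_DX_PREFIXES) for dx in diagnosis_codes):
        return PROSTATE_BIOPSY
    return GENERIC_BIOPSY


print(map_biopsy("7004", ["C61"]))    # prostate biopsy (specific target)
print(map_biopsy("7004", ["K29.7"]))  # biopsy (generic target)
```

Falling back to the generic target keeps the record mapped when the context is not informative, instead of guessing a more specific procedure.
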
Challenge: Which of these events should be transformed: the prescription of the drug, the purchase, or both?
Example: After a drug is prescribed, the patient may or may not purchase it, and the purchase sometimes happens several months after the prescription. Although the OMOP CDM allows both types of events to be recorded separately, the drug era calculation does not differentiate between them.
Solution: Prefer the purchase information, as it better reflects what the patient may actually have consumed.

Do not hesitate to consult the OHDSI community and other research groups working with the OMOP CDM; their experience can provide invaluable input on how to handle your data most effectively.
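
A minimal sketch of preferring purchases, assuming source drug events carry a hypothetical `kind` field that distinguishes prescriptions from purchases. The ATC codes are illustrative (C09AA04 is perindopril, A10BA02 is metformin), and the events are invented for the example.

```python
from datetime import date

# Hypothetical source events for one patient.
events = [
    {"person_id": 1, "atc": "C09AA04", "kind": "prescription", "date": date(2022, 1, 10)},
    {"person_id": 1, "atc": "C09AA04", "kind": "purchase", "date": date(2022, 4, 2)},
    {"person_id": 1, "atc": "A10BA02", "kind": "prescription", "date": date(2022, 2, 1)},  # never bought
]


def drug_exposures(events, kinds=("purchase",)):
    """Select the events to load as drug exposures; purchases only by default."""
    return [e for e in events if e["kind"] in kinds]


print(len(drug_exposures(events)))                              # 1
print(len(drug_exposures(events, ("prescription", "purchase"))))  # 3
```

Loading both kinds would give the drug era calculation two exposure starts for the same perindopril supply, about three months apart, plus an exposure for a drug that was never bought.
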

Challenge: ATC codes for drugs consisting of several ingredients map to nonstandard RxNorm codes.
Example: ATC code “C09BX01” (perindopril, amlodipine, and indapamide; systemic) refers to an angiotensin-converting enzyme (ACE) inhibitor combination drug with 3 different ingredients. The OMOP CDM has no single standard concept for this combination drug, and mapping it to 3 separate ingredients (perindopril, amlodipine, indapamide) can lead to other problems.
Solution: Use extra information about the drug, such as product information, to find standard RxNorm codes for the mapping.
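
A sketch of using product information instead of the ATC code alone, assuming a hypothetical product register keyed by a local package code and a hypothetical index of standard RxNorm drug concepts keyed by ingredient, strength, and dose form. The package code, strengths, and concept ID are placeholders, not real register or vocabulary content.

```python
# Hypothetical product information, keyed by a local package code;
# strengths in mg per tablet.
PRODUCT_INFO = {
    "PKG-001": {
        "atc": "C09BX01",
        "ingredients": {"perindopril": 5.0, "amlodipine": 5.0, "indapamide": 1.25},
        "form": "oral tablet",
    },
}

# Hypothetical index of standard RxNorm drug concepts, keyed by the same
# ingredient/strength/form signature. The concept ID is a placeholder.
RXNORM_INDEX = {
    (frozenset({("perindopril", 5.0), ("amlodipine", 5.0), ("indapamide", 1.25)}), "oral tablet"): 900001,
}


def map_product(package_code):
    """Map a dispensed package to a standard drug concept via its product information.

    Returns None when nothing matches, so the record can be reviewed manually
    rather than split into separate ingredients.
    """
    info = PRODUCT_INFO.get(package_code)
    if info is None:
        return None
    signature = (frozenset(info["ingredients"].items()), info["form"])
    return RXNORM_INDEX.get(signature)


print(map_product("PKG-001"))  # 900001
print(map_product("PKG-999"))  # None
```
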
Challenge: LOINC does not systematically specify the expected results of a lab test, which makes it difficult to decide which LOINC answer code a result should be mapped to.
Example: Depending on the lab test, a negative result can be given as “Negative,” “Not present,” “Not detected,” “Absent,” etc. For some tests, LOINC does not list the expected results at all.
Solution: Where LOINC does not specify the official result code, use the standard LOINC “Negative” answer code.
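
A sketch of this rule: all negative wordings are normalized first; the test's own negative answer is used where one is listed, and the generic "Negative" answer otherwise. The test codes and their listed answers are hypothetical stand-ins for real LOINC answer lists.

```python
# Source wordings that all express a negative result (from the example above).
NEGATIVE_SYNONYMS = {"negative", "not present", "not detected", "absent"}

# Hypothetical: the negative answer listed in LOINC for specific tests.
# Tests missing from this dict have no result answers specified in LOINC.
OFFICIAL_NEGATIVE_ANSWER = {
    "TEST-A": "Not detected",
    "TEST-B": "Absent",
}

GENERIC_NEGATIVE = "Negative"  # the standard LOINC "Negative" answer


def map_negative(test_code, raw_result):
    """Return the answer to use for a negative result, or None if the result is not negative."""
    if raw_result.strip().lower() not in NEGATIVE_SYNONYMS:
        return None
    return OFFICIAL_NEGATIVE_ANSWER.get(test_code, GENERIC_NEGATIVE)


print(map_negative("TEST-A", "Negative"))  # Not detected
print(map_negative("TEST-C", "absent"))    # Negative
print(map_negative("TEST-A", "Positive"))  # None
```
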
Challenge: It is difficult to achieve the best mapping quality without a complete understanding of the underlying medical practices.
Example: Although the local code system has a specific code for prostate biopsy, it is rarely used; the broader “biopsy” code is used instead.
Solution: Talk to medical personnel who can describe the underlying medical and data recording processes.

Challenge: Mapping is rarely, if ever, 100% complete; each new study needs some additional data to be mapped.
Example: For a cancer study, we need to extract tumor-specific TNM and stage information from free-text parts of the source data.
Solution: Build the whole mapping and transformation process as repeatable code and workflows so that each subsequent study can reuse the mappings from previous studies. Be aware of the additional mapping needed, and keep the necessary expertise in the team.
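
A minimal sketch of reusing mappings across studies, assuming they are kept in a shared CSV file with hypothetical columns (`source_vocabulary`, `source_code`, `target_concept_id`, `added_in_study`). For a new study, the existing file is loaded, the still-unmapped source codes are listed, and the study's new mappings are appended. The file is held in a string here to keep the example self-contained; the concept IDs and the "T3N1" code are hypothetical.

```python
import csv
import io

# Shared mapping file from earlier studies (hypothetical columns and content).
SHARED_CSV = """source_vocabulary,source_code,target_concept_id,added_in_study
local,9124,301,vaccination
local,7004,500,biopsy
"""


def load_mappings(text):
    """Read mappings into {(vocabulary, code): target_concept_id}."""
    reader = csv.DictReader(io.StringIO(text))
    return {(r["source_vocabulary"], r["source_code"]): int(r["target_concept_id"]) for r in reader}


def unmapped(source_codes, mappings):
    """List the (vocabulary, code) pairs the new study still needs to map."""
    return sorted(code for code in source_codes if code not in mappings)


def append_mappings(text, new_rows, study):
    """Return the shared file content with this study's new mappings appended."""
    out = io.StringIO()
    out.write(text)
    writer = csv.writer(out, lineterminator="\n")
    for (vocabulary, code), target in sorted(new_rows.items()):
        writer.writerow([vocabulary, code, target, study])
    return out.getvalue()


mappings = load_mappings(SHARED_CSV)
print(unmapped({("local", "9124"), ("local", "T3N1")}, mappings))  # [('local', 'T3N1')]
updated = append_mappings(SHARED_CSV, {("local", "T3N1"): 700}, "cancer")
print(len(load_mappings(updated)))  # 3
```

Recording the study in which each mapping was added makes it easier to trace, and later revisit, mappings made for a specific purpose.
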