AMIA Annual Symposium Proceedings. 2025 May 22;2024:693–702.

Optimizing Medication Querying Using Ontology-Driven Approach with OMOP: with an application to a large-scale COVID-19 EHR dataset

Xiaojin Li 1,3,*, Yan Huang 1,3, Licong Cui 2,3, Shiqiang Tao 1,3, Guo-Qiang Zhang 1,2,3,*
PMCID: PMC12099415  PMID: 40417523

Abstract

Efficient querying for medication information in Electronic Health Record (EHR) datasets is crucial for effective patient care and clinical research. To address the complexity and data volume challenges involved in efficient medication information retrieval, we propose an ontology-driven medication query (ODMQ) optimization approach, leveraging the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM). Integrating semantic ontology structures from the OMOP CDM can help enhance query accuracy and efficiency by broadening the scope of relevant medication terms like drug names, National Drug Codes, and generics, resulting in more comprehensive query outcomes than traditional methods. ODMQ significantly reduces manual search time and enhances query capabilities. We validate ODMQ’s efficacy using real-world COVID-19 EHR data, demonstrating improved query performance. Through a comprehensive manual review, ODMQ ensures that expanded search terms are relevant to user inputs. It also includes an intuitive query interface and visualizes patient history for result validation and exploration.

1. Introduction

Electronic Health Records (EHRs) serve as digital repositories containing comprehensive patient health information, encompassing medical histories, diagnoses, treatments, and medication records [1, 2]. These digitized records have revolutionized healthcare delivery by enabling efficient data storage, retrieval, and sharing among healthcare providers [3, 4]. However, the sheer volume and heterogeneity of EHR datasets present significant challenges in extracting actionable insights, particularly regarding medication-related information [5, 6].

To facilitate standardized data representation and analysis, several tools have been developed, such as the Unified Medical Language System (UMLS) Terminology Services, the Systematized Nomenclature of Medicine–Clinical Terms (SNOMED CT), RxNorm, and the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM). The UMLS Terminology Services, maintained by the United States National Library of Medicine, play a crucial role in facilitating interoperability and standardization in biomedical and healthcare informatics by providing access to a rich and diverse set of standardized terminologies and ontologies [7]. SNOMED CT offers a standardized and scientifically validated framework for representing clinical data collected by healthcare professionals. Its integration into EHRs enhances the potential for efficient data utilization and ensures improved documentation quality [8]. The OMOP CDM, managed by the Observational Health Data Sciences and Informatics (OHDSI) program, provides a standardized framework for organizing and harmonizing disparate healthcare data sources, thereby facilitating interoperability and enabling large-scale analytics across diverse healthcare settings [9].

Efficient querying of medication data within EHR datasets is essential for various healthcare applications, including clinical decision support, pharmacovigilance, and epidemiological studies [10, 11]. However, conventional querying methods often suffer from limitations such as lack of semantic interoperability, and difficulty in handling complex medication hierarchies [12]. As a result, there is a growing need for advanced querying methodologies capable of efficiently extracting medication-related information from EHR datasets while addressing these inherent challenges [13].

Existing approaches to medication query optimization often rely on traditional database querying techniques, such as Structured Query Language (SQL) queries or keyword-based search methods. While these methods may suffice for simple querying tasks, they often struggle to handle the complexity and heterogeneity of medication data present in EHR datasets [14]. Moreover, they may lack semantic understanding and fail to capture nuanced relationships between medications, thereby limiting their effectiveness in comprehensive medication querying tasks [14, 15]. Therefore, clinical researchers have traditionally relied on manual extraction of the necessary medication query criteria, including brand names, drug names, National Drug Codes (NDCs), or generic names, before requesting data to identify eligible patients. However, this conventional approach is not only time-consuming and labor-intensive but also prone to limitations stemming from the intricate relationships and hierarchies within medication classifications. As a result, researchers often encounter challenges in obtaining comprehensive and precise medication-related information, leading to query results that may fail to capture the full spectrum of relevant data. For instance, a single medication may be available under multiple brand names or have various dosage strengths, making it challenging to identify all relevant instances through manual query formulation alone.

Additionally, variations in medication nomenclature and coding standards further compound the difficulties in accurately retrieving medication information from EHR datasets [16]. Consequently, the reliance on manual methods for specifying query criteria not only obstructs efficiency and productivity but also introduces the risk of errors and inconsistencies in the querying process. Therefore, there is a pressing demand for advanced querying methodologies that can transcend the limitations associated with manual approaches. These methodologies are essential for harnessing the wealth of medication data embedded within EHR datasets, enabling more insightful and comprehensive analyses [14].

This study proposes an ontology-driven optimization approach to enhance medication querying capabilities within EHR datasets, named ODMQ. By integrating semantic ontology structures with the robust framework of OMOP CDM, our methodology offers a novel solution to improve the efficiency, accuracy, and semantic richness of medication queries. By performing experiments and evaluations on the real-world COVID-19 EHR dataset, the results demonstrate the effectiveness and scalability of our approach in addressing the challenges posed by medication querying in healthcare analytics. ODMQ’s effectiveness extends beyond simply covering the medication terms provided by domain experts; it also ensures that the expanded search terms remain pertinent to the user’s input. It enhances the precision and relevance of medication queries, thereby improving the overall recall and accuracy of healthcare analytics processes. The experiment results serve as evidence of ODMQ’s capability to handle the intricacies of real-world EHR data, paving the way for its widespread adoption and utilization in diverse healthcare settings. Moreover, ODMQ includes an intuitive query interface that simplifies the process of formulating queries for medication information within EHR datasets. It further enhances usability by providing visualizations of patient history, facilitating straightforward validation of query results and enabling deeper exploration of medication-related data points.

2. Background

2.1. Observational Medical Outcomes Partnership (OMOP)

The OHDSI initiative represents a promising international endeavor aimed at optimizing the secondary utilization of observational data. This effort involves harmonizing and standardizing clinical data while developing scalable analytical tools. Central to this initiative is the OMOP CDM, which is a standardized framework for organizing and analyzing healthcare data from various sources. The OMOP CDM structures patient demographics, clinical events, healthcare encounters, medications, procedures, and other data, promoting interoperability and facilitating consistent analyses across diverse healthcare databases [9]. The terminologies utilized within the OMOP CDM for diagnoses/conditions, observations, and drugs are drawn from established sources such as the International Classification of Diseases codes [17, 18], SNOMED CT [8, 19], and the normalized naming system for generic and branded drugs (RxNorm) [20, 21]. To leverage these concepts effectively, users typically retrieve the mapped tables from the Automated Terminology Harmonization, Extraction and Normalization for Analysis (ATHENA) [22] standardized vocabulary tool provided by OHDSI. Subsequently, the harmonized data, stored in the OMOP CDM format, can be employed in systematic studies, population-level estimations, drug and biomarker evaluations, and further patient-level predictions [23, 24]. One of the key strengths of the OMOP CDM is its flexibility and extensibility. The data model accommodates a wide range of data sources and healthcare vocabularies, allowing researchers to incorporate additional data elements or customize the model to suit specific research needs [25]. This adaptability makes the OMOP CDM well-suited for diverse research applications, including pharmacoepidemiology, comparative effectiveness research, and post-market drug safety surveillance. Furthermore, researchers can use common analytical tools and methodologies developed within the OMOP community to perform robust and consistent analyses across different datasets. This standardization enhances the reliability and validity of research findings, thereby advancing the scientific understanding of healthcare interventions and outcomes [26].

2.2. Ontology-based query expansion approach with EHR data

Several ontology-based query expansion approaches have been proposed recently. Martinez et al. developed an automatic query expansion method utilizing the UMLS [11]. Lu proposed a query expansion method leveraging WordNet and UMLS [27]. Chen et al. presented an approach to semantic query expansion systems grounded in the Hepatitis ontology [28]. Karunakaran et al. introduced a method focused on enriching user queries through the utilization of UMLS [29]. Malik et al. introduced a query expansion framework utilizing the Clinical Diagnosis Information Ontology [30]. These approaches leverage the structured nature of ontological frameworks to enhance the precision and recall of search queries, thereby improving the effectiveness of information retrieval systems. Through the utilization of ontologies, these methods aim to broaden the scope of search queries by incorporating relevant concepts, synonyms, hierarchies, and semantic relationships, ultimately facilitating more comprehensive and accurate retrieval of relevant information from large datasets.

2.3. Medication Attributes

Medications are complex entities consisting of various attributes that serve different purposes in healthcare systems. The medication attributes play a crucial role in medication identification, differentiation, and classification within healthcare databases and information systems. By accurately representing these attributes, healthcare providers can ensure safe and effective medication use, minimize medication errors, and optimize patient outcomes. Additionally, these attributes are essential for conducting medication-related research, including pharmacoepidemiological studies, comparative effectiveness research, and drug safety surveillance efforts.

Brand Name: Brand names, also known as trade names or proprietary names, are identifiers assigned to medications by pharmaceutical companies for marketing purposes [31]. Brand names are often distinctive and may vary between different regions or countries. They play a crucial role in medication branding, promotion, and recognition among healthcare providers and patients.

Generic Name: Generic name, also known as non-proprietary name or active ingredient, refers to the chemical compound or substance responsible for the therapeutic effects of a medication [32]. Unlike brand names, generic names are not proprietary and are assigned based on the medication’s pharmacological properties. Generic names are standardized and internationally recognized, facilitating communication among healthcare professionals and ensuring consistency in medication labeling and prescribing practices.

National Drug Code (NDC): An NDC is a unique identifier assigned to each medication product marketed in the United States [33]. An NDC comprises three segments: the labeler code, product code, and package code. The labeler code identifies the manufacturer or distributor of the medication, while the product code specifies the formulation and strength. The package code indicates the package size and type. NDCs facilitate medication tracking, inventory management, and reimbursement processes within healthcare systems.
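The three-segment structure described above can be illustrated with a minimal Python sketch. It assumes the NDC is written in its hyphenated form; the function name `parse_ndc` and the example code are made up for illustration and do not come from the ODMQ system.

```python
from typing import NamedTuple


class NDC(NamedTuple):
    labeler: str  # identifies the manufacturer or distributor
    product: str  # identifies the formulation and strength
    package: str  # identifies the package size and type


def parse_ndc(code: str) -> NDC:
    """Split a hyphenated NDC string into its three segments."""
    parts = code.strip().split("-")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected three numeric segments, got {code!r}")
    return NDC(*parts)


# "12345-6789-01" is a made-up code, not a real product.
example = parse_ndc("12345-6789-01")
print(example.labeler, example.product, example.package)
```

In practice, NDCs often appear without hyphens in EHR data, in which case the segment boundaries cannot be recovered from the digits alone; this sketch deliberately rejects such input rather than guessing.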

Drug Name: Drug Name, or Medication Name, serves as a fundamental descriptor for capturing and documenting the specific pharmaceutical agents prescribed or administered to patients within their medical histories. It encompasses a comprehensive array of information crucial for healthcare professionals, including the Clinical Drug Component, which delineates the active pharmaceutical ingredient; the Branded Drug Component, which signifies the proprietary brand name associated with the medication; Clinical Drug or Pack, denoting the standard clinical packaging of the drug; Branded Drug or Pack, representing the commercial packaging associated with the brand; Clinical Dose Form Group, outlining the standardized form in which the medication is administered clinically; and Branded Dose Form Group, indicating the commercial form in which the medication is marketed [20].

3. Methods

The workflow of querying without expansion, referred to as the simple medication query (SMQ) system, is illustrated in Figure 1.A. In this workflow, the query translator takes the user input directly as the search term and formulates the corresponding query statement. This statement is then executed within the backend database to retrieve relevant information. As a result, the returned results encompass the database records that match the user’s query criteria. In this simplified approach, the query translator serves as the primary interface between the user and the database system. By accepting user input directly, the translator initiates the query formulation process. However, while this method offers simplicity and straightforwardness, it may lack the sophistication necessary to capture the full scope of user intent. The direct use of user input as the search term can sometimes lead to limited query scope and potentially overlook relevant information. This limitation becomes apparent especially in scenarios where users may not articulate their queries in a precise or exhaustive manner. As a consequence, the query results obtained through this approach might be incomplete or fail to capture the nuanced aspects of the user’s information needs.
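The limitation of direct matching can be seen in a small Python sketch. The records, the `drug_name` field, and the function `simple_medication_query` are hypothetical stand-ins; the actual SMQ system runs against a backend database rather than an in-memory list.

```python
from typing import Dict, List

# Toy medication records with invented values.
RECORDS: List[Dict[str, str]] = [
    {"patient": "P1", "drug_name": "GILENYA 0.5 MG CAPSULE"},
    {"patient": "P2", "drug_name": "FINGOLIMOD 0.5 MG CAPSULE"},
]


def simple_medication_query(records: List[Dict[str, str]], term: str) -> List[str]:
    """Return patients whose drug name contains the user's term verbatim."""
    needle = term.lower()
    return [r["patient"] for r in records if needle in r["drug_name"].lower()]


# Searching by brand name finds only the record that spells out the brand.
print(simple_medication_query(RECORDS, "Gilenya"))  # ['P1']
```

Patient P2 received the same active ingredient but is missed, because the record is stored under the generic name and the query uses only the term the user typed.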

Figure 1:

The workflow of simple medication query (A) and ODMQ (B).

The workflow of ODMQ is shown in Figure 1.B. In contrast to the direct utilization of user input for querying, ODMQ employs a different methodology, which initiates the querying process by subjecting the user input to thorough analysis. Leveraging the inherent benefits of integrating diverse terminologies, such as RxNorm and NDC, through the OMOP framework, this approach facilitates the acquisition of supplementary information pertaining to the user input. By leveraging this combination of terminologies, ODMQ enriches the querying process with a broader array of medication search terms. These expanded search terms include not only brand names but also drug names, NDCs, and generic names, thereby encompassing a comprehensive range of medication options. Consequently, this approach extends and broadens the search scope beyond the limitations of the initial user input. The integration of multiple terminologies enables ODMQ to extract a comprehensive set of relevant information from the database. By considering various medication options and their associations within the OMOP framework, ODMQ reduces the risk of missing any relevant database records. This medication query optimization enhances result accuracy and completeness, facilitating healthcare decision-making and analysis.

3.1. Data and Data Model

As an application of ODMQ, we use the COVID-19 EHR data that comes from OPTUM® de-identified COVID-19 Electronic Health Record data set, which is drawn from dozens of healthcare providers in the United States, including more than 700 hospitals and 7,000 clinics. It includes EHR data for 7 million unique individuals who have documented clinical care with a documented diagnosis of COVID-19 or acute respiratory illness after 02/01/2020 and/or documented COVID-19 testing regardless of their results. The data incorporates a wide swath of raw clinical data, including new, unmapped COVID-specific clinical data points from both inpatient and ambulatory electronic medical records, which include patient-level information: demographics, diagnoses, procedures, lab tests, care settings, medications prescribed or administered, and mortality. These data are certified as de-identified by independent statistical experts in accordance with Health Insurance Portability and Accountability Act (HIPAA) statistical de-identification rules and are managed in accordance with the OPTUM® customer data use agreement.

We build the Event-level Inverted Index (ELII) with the OPTUM® COVID-19 data. ELII consists of 4 components [34]: 1) conventional inverted index, which contains the inverted indices of time-invariant variables, 2) timeline inverted index, which consists of inverted indices of time-dependent variables, 3) patient timeline, which includes all clinical events and related information for each patient, and 4) a global lookup table, which contains forward indices of all variables and associated inverted indices. With such a design, the performance of temporal queries can be significantly improved (26-88 times improvement) [34]. Leveraging the advantages of ELII in temporal queries, we design and implement a query engine for ODMQ and further data exporting.
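To make the four ELII components concrete, the following Python sketch builds toy versions of each from a handful of invented events. It is a simplified illustration under our own assumptions (in-memory dictionaries, made-up patient data, a lookup table that only records which index holds each variable), not the implementation described in [34].

```python
from collections import defaultdict
from datetime import date

# Invented example data: (patient id, date, record type, value).
events = [
    ("P1", date(2020, 7, 8), "medication", "FINGOLIMOD"),
    ("P1", date(2020, 9, 1), "diagnosis", "G35"),
    ("P2", date(2021, 1, 15), "medication", "FINGOLIMOD"),
]
demographics = {"P1": {"gender": "F"}, "P2": {"gender": "M"}}

# 1) Conventional inverted index: time-invariant value -> patient ids.
conventional = defaultdict(set)
for pid, attrs in demographics.items():
    for key, value in attrs.items():
        conventional[(key, value)].add(pid)

# 2) Timeline inverted index: time-dependent value -> {patient id: [dates]}.
timeline_index = defaultdict(lambda: defaultdict(list))
# 3) Patient timeline: patient id -> chronologically ordered events.
patient_timeline = defaultdict(list)
for pid, day, kind, value in sorted(events, key=lambda e: e[1]):
    timeline_index[(kind, value)][pid].append(day)
    patient_timeline[pid].append((day, kind, value))

# 4) Global lookup table (simplified): variable -> index that holds it.
lookup = {"gender": "conventional", "medication": "timeline", "diagnosis": "timeline"}


def patients_with(kind, value, start, end):
    """Patients with at least one matching event inside [start, end]."""
    index = timeline_index.get((kind, value), {})
    return sorted(
        pid for pid, days in index.items() if any(start <= d <= end for d in days)
    )


print(patients_with("medication", "FINGOLIMOD", date(2020, 1, 1), date(2020, 12, 31)))
```

Because dates are stored alongside each posting, a temporal query only touches the postings for the requested value instead of scanning every patient record.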

We have developed a database encompassing the OMOP CDM information sourced from ATHENA [22], which stands as a curated database that helps researchers easily identify and match codes to standard OMOP equivalents. The constructed database includes 3 components: 1) a concept table containing all mapped concepts, 2) a concept ancestor table establishing ancestor/descendant relationships between concepts, and 3) a concept relationship table detailing mappings between different terminologies.
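The role of the three tables can be sketched with an in-memory SQLite database. The table and column names follow the OMOP standardized vocabulary tables, but the rows (concept ids, codes, and the NDC value) are placeholders rather than real ATHENA content, and the function `expand` is a hypothetical simplification of the expansion logic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept (concept_id INTEGER PRIMARY KEY, concept_name TEXT,
                      vocabulary_id TEXT, concept_class_id TEXT, concept_code TEXT);
CREATE TABLE concept_ancestor (ancestor_concept_id INTEGER,
                               descendant_concept_id INTEGER);
CREATE TABLE concept_relationship (concept_id_1 INTEGER, concept_id_2 INTEGER,
                                   relationship_id TEXT);
""")
# Placeholder rows for illustration only.
conn.executemany("INSERT INTO concept VALUES (?,?,?,?,?)", [
    (1, "fingolimod", "RxNorm", "Ingredient", "rx-1"),
    (2, "fingolimod 0.5 MG Oral Capsule", "RxNorm", "Clinical Drug", "rx-2"),
    (3, "fingolimod 0.5 MG Oral Capsule [Gilenya]", "RxNorm", "Branded Drug", "rx-3"),
    (4, "example NDC product", "NDC", "11-digit NDC", "00000000001"),
])
# Each concept is listed as its own ancestor, plus the ingredient's descendants.
conn.executemany("INSERT INTO concept_ancestor VALUES (?,?)",
                 [(1, 1), (2, 2), (3, 3), (1, 2), (1, 3)])
# The NDC concept maps to the branded RxNorm drug.
conn.executemany("INSERT INTO concept_relationship VALUES (?,?,?)",
                 [(4, 3, "Maps to")])


def expand(term: str) -> dict:
    """Collect descendant drug names and mapped NDC codes for a search term."""
    root = conn.execute(
        "SELECT concept_id FROM concept "
        "WHERE lower(concept_name) = lower(?) AND vocabulary_id = 'RxNorm'",
        (term,),
    ).fetchone()
    if root is None:  # no matching concept, so nothing to expand
        return {"drug_names": [], "ndcs": []}
    names = conn.execute(
        "SELECT c.concept_name FROM concept_ancestor a "
        "JOIN concept c ON c.concept_id = a.descendant_concept_id "
        "WHERE a.ancestor_concept_id = ? ORDER BY c.concept_id",
        (root[0],),
    ).fetchall()
    ndcs = conn.execute(
        "SELECT n.concept_code FROM concept_relationship r "
        "JOIN concept n ON n.concept_id = r.concept_id_1 "
        "WHERE r.relationship_id = 'Maps to' AND n.vocabulary_id = 'NDC' "
        "AND r.concept_id_2 IN (SELECT descendant_concept_id FROM concept_ancestor "
        "                       WHERE ancestor_concept_id = ?) "
        "ORDER BY n.concept_code",
        (root[0],),
    ).fetchall()
    return {"drug_names": [n[0] for n in names], "ndcs": [n[0] for n in ndcs]}


print(expand("Fingolimod"))
```

A term with no matching concept yields empty lists, which mirrors the situation where query expansion cannot proceed because the vocabulary has no entry for the input.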

3.2. ODMQ Query Engine

The ODMQ query process for cohort exploration with a specific medication search term is depicted in Figure 2. This query execution process comprises five distinct steps: 1) Acquisition of search term (Figure 2.1): the system retrieves the search term from the query interface, serving as the starting point for query formulation; 2) Ontology-driven query expansion (Figure 2.2): initially, the system identifies the OMOP concept associated with the search term and subsequently explores other concepts linked through specific relationships (e.g., ancestor, descendant, and sibling). Once related concepts are identified, the system gathers all mapped concepts, such as RxNorm concepts mapped to NDC concepts. This process generates three lists: a drug name list, an NDC list, and a generic name list. Subsequently, it constructs an object in JavaScript Object Notation (JSON) format representing the query definition. The object encompasses not only the query terms derived from the user’s input but also incorporates the outcomes of query expansion. Additionally, it contains supplementary metadata that offers a descriptive overview of the query; 3) Query translation (Figure 2.3): the system translates the expanded query, embodied within the generated object, into specific MongoDB statements. This translation process involves mapping query components to three essential fields within the OPTUM® COVID-19 data: “DRUG NAME,” “NDC,” and “GENERIC DESC”; 4) Query execution (Figure 2.4): the translated MongoDB statements are transmitted to the relevant data sources for execution. This phase involves interfacing with the designated databases or repositories to retrieve the necessary information. ELII plays a crucial role in enhancing the performance and efficiency of the query execution process. ELII leverages advanced indexing and integration techniques to optimize data retrieval operations, significantly improving search speed and accuracy [34]; and 5) Result retrieval and presentation (Figure 2.5): in contrast to the conventional approach in existing query systems, and leveraging the advantages of our ELII design, ODMQ can return the list of unique patient identifiers instead of only the patient counts. This list provides additional functionality to the interface, facilitating access to row-level cohort data. Users can access detailed patient-level information with a single mouse click, increasing the ability to gain actionable insights from query results.
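Steps 2 and 3 above can be sketched in Python as follows. The structure of the query definition object, the helper names, and the example terms are assumptions made for illustration; only the three target field names (“DRUG NAME,” “NDC,” and “GENERIC DESC”) come from the text, and `$or` and `$in` are standard MongoDB query operators.

```python
import json

# Maps each expansion list to the data field named in the text.
FIELD_MAP = {
    "drug_names": "DRUG NAME",
    "ndcs": "NDC",
    "generic_names": "GENERIC DESC",
}


def build_query_definition(term, drug_names, ndcs, generic_names):
    """Bundle the user's term, the expansion results, and descriptive metadata."""
    return {
        "search_term": term,
        "expansion": {
            "drug_names": sorted(drug_names),
            "ndcs": sorted(ndcs),
            "generic_names": sorted(generic_names),
        },
        "metadata": {"description": f"Medication query expanded from '{term}'"},
    }


def to_mongo_filter(definition):
    """Translate the definition into a MongoDB filter document.

    Returns None when nothing was expanded, because an empty filter
    would match every record.
    """
    clauses = [
        {FIELD_MAP[key]: {"$in": values}}
        for key, values in definition["expansion"].items()
        if values
    ]
    return {"$or": clauses} if clauses else None


# Placeholder terms, not actual expansion output.
definition = build_query_definition(
    "Gilenya",
    drug_names=["GILENYA 0.5 MG CAPSULE", "FINGOLIMOD 0.5 MG CAPSULE"],
    ndcs=["00000000001"],
    generic_names=["FINGOLIMOD"],
)
print(json.dumps(to_mongo_filter(definition), indent=2))
```

The resulting dictionary could be passed as the filter argument of a MongoDB find operation. A record matches if any one of the three fields contains an expanded term, which is how the expanded lists broaden the search beyond the original input.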

Figure 2:

The system architecture of ODMQ query engine.

3.3. Patient-level information visualization

Patient-level information visualization involves graphically representing individual patient data in healthcare settings. This method provides healthcare professionals with an intuitive means to interpret complex patient information, helping them identify patterns, trends, and anomalies more effectively. It enables healthcare providers to gain deeper insights into patient histories, treatment responses, and health outcomes. By visualizing data such as laboratory results, medication records, procedure history, and clinical notes, clinicians can assess a patient’s medical history, track their progress over time, and make informed care decisions. Visualization techniques, including interactive dashboards, timelines, charts, and graphs, can be tailored to specific clinical contexts and user preferences, allowing professionals to focus on relevant information and explore details as needed.

Utilizing the benefits of ELII, ODMQ efficiently gathers extensive medication data for designated patients across all time frames. We’ve devised and integrated an interactive patient-level information visualization interface tailored explicitly for presenting temporal EHR data. This interface serves as a conduit, bridging the gap between data organization within the source and the temporal- and record-specific views crucial for medical and research endeavors. Key features of this interface include: 1) flexibility in defining time intervals to observe medication data, 2) seamless integration with patient demographic details to provide contextual insights, 3) interactive filtering tools for pinpointing specific medications of interest, 4) enhanced export capabilities for generating reports or sharing visualizations with collaborators, and 5) ability to superimpose medication data onto other clinical parameters, such as laboratory findings or procedure history, for comprehensive patient surveillance.

4. Experimental Results

4.1. Data repository

We efficiently processed over 30 billion records derived from the OPTUM® COVID-19 data version of 20220120, comprising data sourced from a vast pool of raw text files totaling more than 3 TB. This wealth of information was meticulously stored within our MongoDB database infrastructure. In total, the dataset encapsulated details pertaining to over 8.8 million patients, exhibiting a distribution of approximately 56.1% female, 43.6% male, and 0.3% categorized as unknown. Within our ELII framework, comprehensive patient profiles were constructed, encompassing a spectrum of demographic attributes, diagnoses, procedures, laboratory tests, care settings, and medication records.

4.2. Query expansion result

In this study, we utilized a COVID-19 research investigation as a practical case study to elucidate and validate the efficacy of the ODMQ. Our prior research focused on COVID-19 and multiple sclerosis (MS) [36], where the extraction of medication data for all patients within the cohort proved to be a laborious and time-intensive endeavor. The SMQ system, restricted to querying based solely on provided search terms, necessitated physicians to manually compile lists of both brand and generic medication names. Additionally, the acquisition of drug names and NDCs posed challenges, requiring specialized searches through external sources such as Google search or the RxNorm website. Of primary concern to physicians was whether specific components (e.g., “Interferon” or “Fingolimod”) would impact the study participants. Consequently, physicians were primarily interested in identifying the generic or brand names, rendering the manual retrieval of drug names or NDCs a considerable impediment to their research endeavors. Our investigation revealed that 80% of the time allocated was spent on collecting drug names and NDCs.

To assess the effectiveness of ODMQ, we used the brand names, drug names, and NDCs manually acquired by domain experts as the baseline and compared them with the outcomes derived from ODMQ. We conducted two tests to evaluate ODMQ: 1) using 18 brand names as the search term (Test 1), including “Betaseron,” “Rebif,” “Avonex,” “Copaxone,” “Plegridy,” “Tysabri,” “Extavia,” “Gilenya,” “Aubagio,” “Tecfidera,” “Lemtrada,” “Glatopa,” “Ocrevus,” “Mavenclad,” “Mayzent,” “Zepopsia,” “Vumerity,” and “Rituxan,” and 2) using 13 generic names as the search term (Test 2), including “Interferon,” “Glatiramer acetate,” “Natalizumab,” “Fingolimod,” “Teriflunomide,” “Dimethyl fumarate,” “Alemtuzumab,” “Ocrelizumab,” “Cladribine,” “Siponimod,” “Ozanimod,” “Diroximel fumarate,” and “Rituximab.” Table 1 shows the query results. In the table, “N*” indicates the count of terms generated by ODMQ, while “N*_expert” denotes the number of terms supplied by domain experts. For instance, “Ndrug_expert” means the number of drug names provided by the domain expert, whereas “Ndrug” represents the number of drug names generated by ODMQ. We have manually reviewed all query expansion outcomes. Recall refers to the percentage of baseline terms that are covered by the ODMQ output, while precision represents the percentage of ODMQ output that is relevant to the search term.

Table 1:

Query expansion results with different tests

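| Search Term | Test Group | Ndrug_expert | Nndc_expert | Ngeneric_expert | Ndrug | Nndc | Ngeneric | Runtime (s) |
|---|---|---|---|---|---|---|---|---|
| Betaseron | Test 1 | 6 | 6 | 1 | 278 | 27 | 2 | 3.82 |
| Rebif | Test 1 | 8 | 8 | 1 | 868 | 62 | 1 | 4.35 |
| Avonex | Test 1 | 8 | 8 | 1 | 884 | 46 | 1 | 4.37 |
| Copaxone | Test 1 | 4 | 4 | 1 | 327 | 32 | 1 | 3.94 |
| Plegridy | Test 1 | 12 | 12 | 1 | 738 | 25 | 1 | 4.24 |
| Tysabri | Test 1 | 1 | 1 | 1 | 106 | 7 | 1 | 3.72 |
| Extavia | Test 1 | 2 | 2 | 1 | 311 | 27 | 1 | 3.93 |
| Gilenya | Test 1 | 4 | 4 | 1 | 92 | 74 | 1 | 3.81 |
| Aubagio | Test 1 | 5 | 5 | 1 | 164 | 147 | 1 | 3.85 |
| Tecfidera | Test 1 | 3 | 3 | 1 | 113 | 150 | 1 | 3.77 |
| Lemtrada | Test 1 | 1 | 1 | 1 | 137 | 25 | 1 | 3.79 |
| Glatopa | Test 1 | 4 | 4 | 1 | 266 | 32 | 1 | 3.94 |
| Ocrevus | Test 1 | 1 | 1 | 1 | 38 | 2 | 1 | 3.69 |
| Mavenclad | Test 1 | 8 | 8 | 1 | 71 | 17 | 1 | 3.78 |
| Mayzent | Test 1 | 5 | 5 | 1 | 39 | 13 | 1 | 3.68 |
| Zepopsia | Test 1 | 6 | 6 | 1 | 0 | 0 | 0 | 3.66 |
| Vumerity | Test 1 | 3 | 3 | 1 | 13 | 6 | 1 | 3.72 |
| Rituxan | Test 1 | 1 | 1 | 1 | 306 | 23 | 1 | 3.95 |
| Interferon | Test 2 | 36 | 36 | 1 | 3459 | 238 | 12 | 9.47 |
| Glatiramer acetate | Test 2 | 8 | 8 | 1 | 255 | 32 | 1 | 3.86 |
| Natalizumab | Test 2 | 1 | 1 | 1 | 97 | 7 | 1 | 3.74 |
| Fingolimod | Test 2 | 4 | 4 | 1 | 105 | 74 | 1 | 3.69 |
| Teriflunomide | Test 2 | 5 | 5 | 1 | 172 | 147 | 1 | 3.90 |
| Dimethyl fumarate | Test 2 | 3 | 3 | 1 | 221 | 150 | 2 | 4.01 |
| Alemtuzumab | Test 2 | 1 | 1 | 1 | 128 | 25 | 1 | 3.79 |
| Ocrelizumab | Test 2 | 1 | 1 | 1 | 39 | 2 | 1 | 3.61 |
| Cladribine | Test 2 | 8 | 8 | 1 | 194 | 38 | 1 | 4.05 |
| Siponimod | Test 2 | 5 | 5 | 1 | 42 | 13 | 1 | 3.82 |
| Ozanimod | Test 2 | 6 | 6 | 1 | 43 | 14 | 1 | 3.86 |
| Diroximel fumarate | Test 2 | 3 | 3 | 1 | 14 | 6 | 1 | 3.75 |
| Rituximab | Test 2 | 1 | 1 | 1 | 393 | 23 | 2 | 4.22 |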

In Test 1, the majority of query expansion results (16/18) fully covered the medication terms (including drug names, NDCs, and generic names) provided by domain experts, achieving a recall of 100%. For these search terms, all expanded terms generated by query expansion were also relevant to the input, resulting in a precision of 100%. Such findings were to be expected, given the robustness and precision inherent in the OMOP CDM, meticulously formulated and curated to ensure highly accurate mappings and inter-concept relationships. Two search terms, “Betaseron” and “Zepopsia,” did not achieve 100% recall or precision. In the case of “Betaseron,” the query expansion results included the additional generic name “sodium chloride.” While “sodium chloride” is indeed one of the ingredients in “Betaseron” [37], including it in the medication query would retrieve numerous EHR medication records unrelated to “Betaseron.” Consequently, we considered this inclusion a false positive, resulting in a decreased precision of 50% for generic names. However, “Betaseron” still achieved 100% recall and precision for drug names and NDCs. For “Zepopsia,” ODMQ failed to identify a matching concept within the OMOP vocabulary, leading to recall and precision values of 0 for all medication terms. The average processing time for each search term in Test 1 was 3.89 seconds.

In Test 2, the results were similar to those of Test 1, indicating a high level of consistency. Specifically, 12 of the 13 search terms obtained results where both recall and precision reached 100%. However, there was a discrepancy for one specific search term, “Interferon,” with lower recall for drug names and NDCs, standing at 69.4%. In this case, the missed NDCs are classified within the descendants of “Interferons” in the National Drug File Reference Terminology, rather than being categorized under the descendants of “Interferon” in RxNorm. During the query expansion process, ODMQ prioritized matching within RxNorm under “Interferon,” thereby resulting in the exclusion of these NDCs. Moreover, the generic names “sodium chloride” and “ribavirin” were erroneously included and counted as false positives, reducing the precision of generic names to 83.3%. The average processing time for each search term in Test 2 was 3.99 seconds.
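The reported percentages can be checked with a short Python calculation based on the definitions of recall and precision given earlier. The totals come from Table 1; the count of 25 covered baseline terms for “Interferon” is not stated in the text and is inferred here from the reported 69.4%, so it should be read as an assumption.

```python
def recall(covered_baseline: int, baseline_total: int) -> float:
    """Percentage of expert-supplied terms that appear in the ODMQ output."""
    return 100.0 * covered_baseline / baseline_total


def precision(relevant_output: int, output_total: int) -> float:
    """Percentage of ODMQ output terms judged relevant to the search term."""
    return 100.0 * relevant_output / output_total


# Interferon, drug names and NDCs: 36 baseline terms, 25 covered (inferred).
print(round(recall(25, 36), 1))     # 69.4
# Interferon, generic names: 12 returned, 2 false positives.
print(round(precision(10, 12), 1))  # 83.3
# Betaseron, generic names: 2 returned, 1 false positive.
print(round(precision(1, 2), 1))    # 50.0
```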

4.3. Query Interface

As shown in Figure 3, the ODMQ query interface consists of three components: 1) user input (Figure 3.A), 2) a summary of expanded medication search terms (Figure 3.B), and 3) query results display area (Figure 3.C). Once the user enters the search term and adds it to the query, the expanded medication search terms become available for review on the right side of the interface. This feature offers users an opportunity to thoroughly examine the expanded terms, which include details such as drug names, NDCs, and generic names. With this overview, users can make well-informed decisions regarding the continuation of their query based on the expanded search terms. Moreover, to facilitate further utilization and analysis, users can download a summary of the expanded medication search terms by simply clicking on the designated download icon. This additional functionality empowers users to extend their exploration and investigation beyond the confines of the query interface, enabling in-depth analysis and collaboration across various research endeavors. Within the query results display area, users can find a list of returned patient IDs (as shown in Figure 3.C, with some IDs partially concealed in the figure to adhere to the OPTUM® COVID-19 data usage agreement). This list enables users to navigate through patient-level information with ease by simply clicking on the corresponding patient ID. Moreover, to accommodate diverse analytical needs and workflows, users have the added convenience of exporting the query results as a comma-separated values (CSV) file. This export feature empowers users to conduct in-depth analysis and manipulation of the data outside of the query interface, thereby enhancing their ability to derive meaningful insights from the dataset.

Figure 3: The ODMQ query interface.

4.4. Patient-level information rendering

Figure 4 depicts the patient-level information visualization interface, illustrating the EHR history of a patient identified by the patient ID “PT3816xxxxx” spanning from January 1, 2020, to March 13, 2024. Users have the flexibility to adjust these criteria in the search area, as demonstrated in Figure 4.A.

Figure 4: The patient-level information visualization interface.

The visualization area encompasses the patient’s demographic data and historical records, as showcased in Figure 4.B. Within this visualization, records are grouped by date, with each circle representing a day. The connecting rectangle summarizes the number of various record types for each day, such as diagnoses, medications, and procedures. Hovering over a circle reveals the details of the records corresponding to that date. Clicking on a circle displays all records for that particular day, as depicted in Figure 4.B2, with different colors representing distinct record types. Additionally, BMI, blood pressure, and hemoglobin data are visualized, as illustrated in Figure 4.B3.
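The per-day grouping behind this view can be sketched as follows. This is an illustrative sketch only, not the ODMQ implementation: the record fields (`date`, `type`, `text`), the function name, and the sample records are all hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date


def summarize_by_day(records):
    """Map each date to a Counter of record types seen on that date."""
    summary = defaultdict(Counter)
    for record in records:
        summary[record["date"]][record["type"]] += 1
    # Sort by date so the days can be drawn left to right on a timeline.
    return dict(sorted(summary.items()))


# Hypothetical records; real EHR rows would carry many more fields.
records = [
    {"date": date(2020, 7, 9), "type": "Procedure", "text": "Procedure Y"},
    {"date": date(2020, 7, 8), "type": "Medication", "text": "Drug A"},
    {"date": date(2020, 7, 8), "type": "Medication", "text": "Drug B"},
    {"date": date(2020, 7, 8), "type": "Diagnosis", "text": "Condition X"},
]

summary = summarize_by_day(records)
for day, counts in summary.items():
    print(day.isoformat(), dict(counts))
```

Each key of `summary` corresponds to one circle in the timeline, and its counter holds the per-type totals that the connecting rectangle would summarize.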

The record filtering area, depicted in Figure 4.C, empowers users to filter records by selecting different record types and to highlight records containing specific characters entered by the user. For instance, Figures 4.C1-C3 exemplify the process of filtering records for “Medication.” Figure 4.C2 exclusively showcases days containing medication records, while Figure 4.C3, similar to Figure 4.B2, exclusively presents medication records from July 8, 2020.
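A minimal sketch of this kind of filtering, assuming each record is a dictionary with hypothetical `date`, `type`, and `text` fields, might look like the following. It restricts records to one type and flags those whose text contains the characters entered by the user; it is not taken from the ODMQ code base.

```python
def filter_records(records, record_type=None, highlight=None):
    """Keep records of one type and flag those whose text contains `highlight`."""
    selected = [r for r in records if record_type is None or r["type"] == record_type]
    needle = highlight.lower() if highlight else None
    return [
        # Case-insensitive substring match; False when no highlight text is given.
        {**r, "highlighted": bool(needle) and needle in r["text"].lower()}
        for r in selected
    ]


# Hypothetical records for illustration only.
records = [
    {"date": "2020-07-08", "type": "Medication", "text": "Drug A 10 mg tablet"},
    {"date": "2020-07-08", "type": "Diagnosis", "text": "Condition X"},
    {"date": "2020-07-09", "type": "Medication", "text": "Drug B injection"},
]

meds = filter_records(records, record_type="Medication", highlight="drug a")
days_with_meds = sorted({r["date"] for r in meds})
print(days_with_meds)
print([(r["text"], r["highlighted"]) for r in meds])
```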

5. Discussion

The experimental results highlight the effect of ODMQ in improving medication query outcomes, and the query expansion findings affirm its effectiveness as a tool for enhancing medication queries. The expansion results show that ODMQ does more than cover the medication terms provided by domain experts: it also ensures that the expanded search terms align with the user’s input. This improves both the comprehensiveness of the search results and their relevance and applicability in clinical practice. At the core of ODMQ’s effectiveness lies its use of the comprehensive and carefully maintained OMOP framework. OMOP’s infrastructure, curated mappings, and relationships between concepts form the foundation on which ODMQ operates. By drawing on these resources, ODMQ achieves high recall and precision in medication query outcomes.

OHDSI ATLAS [38] is a comprehensive software platform developed by the OHDSI community on top of the OMOP CDM. It serves as a tool for cohort exploration and analysis, providing researchers, clinicians, and data scientists with an infrastructure to navigate and analyze large repositories of healthcare data [39]. Compared to OHDSI ATLAS, ODMQ is designed specifically to address the challenges of medication querying within a large-scale EHR database, with the primary objective of optimizing the efficiency and precision of medication queries. Unlike OHDSI ATLAS, ODMQ relies only on medication-related concepts extracted from the OMOP CDM, eliminating the need to install the entire OMOP framework. This focused approach results in a more streamlined implementation. Moreover, ODMQ is engineered to be lightweight and easy to integrate into existing query systems or applications, adding medication querying capabilities to healthcare data infrastructures without imposing significant overhead. ODMQ serves a broad range of users, including healthcare professionals such as physicians and clinical researchers, along with analytical experts such as statistical analysts, enabling them to efficiently access and analyze medication-related data within EHR datasets. In addition to facilitating research queries, ODMQ’s framework and algorithms can be adapted for integrating drug data across EHRs from various hospital systems. Furthermore, they can also be used to enhance the data capture process in medical systems to improve data standardization and uniformity.

The ODMQ system is implemented with Ruby on Rails and uses MongoDB [35] as the backend database. Ruby on Rails is a web application framework written in Ruby, designed for developing database-backed web applications quickly and efficiently. Its advantages include a strong emphasis on convention over configuration, which reduces the amount of code developers need to write, and a rich ecosystem of libraries and tools that accelerate development and improve productivity. We use MongoDB because it offers the following advantages: 1) good query performance on large-scale datasets; 2) flexible data models, which allow us to build our own models for the inverted index to improve query performance, especially for temporal queries; 3) straightforward scalability by adding commodity servers; and 4) ease of use for developers, leading to faster development and fewer bugs. Furthermore, ELII, which demonstrates superior query performance, is built on MongoDB. The ODMQ system therefore adopts MongoDB to ensure compatibility with ELII. However, this choice does not affect ODMQ’s ability to interface with other databases, as it can generate query statements (e.g., SQL) tailored to the specific database it needs to connect to.
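As an illustration of how expanded terms could be turned into a query statement for a relational backend, the sketch below builds a parameterized SQL query and runs it against an in-memory SQLite database. It is written in Python for brevity rather than in the Ruby used by ODMQ, and the table name, column names, function name, and sample rows are hypothetical; it is not the query generator used by the system.

```python
import sqlite3


def build_medication_query(terms, table="medication",
                           term_column="drug_name", id_column="patient_id"):
    """Build a parameterized SQL query matching any of the expanded terms.

    Table and column names are interpolated directly, so they must come from
    trusted configuration; the search terms are passed as bound parameters.
    """
    if not terms:
        raise ValueError("at least one search term is required")
    placeholders = ", ".join("?" for _ in terms)
    sql = (
        f"SELECT DISTINCT {id_column} FROM {table} "
        f"WHERE LOWER({term_column}) IN ({placeholders}) "
        f"ORDER BY {id_column}"
    )
    return sql, [term.lower() for term in terms]


# Hypothetical schema and rows, used only to exercise the generated statement.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE medication (patient_id TEXT, drug_name TEXT)")
conn.executemany(
    "INSERT INTO medication VALUES (?, ?)",
    [
        ("PT001", "Betaseron"),
        ("PT002", "Acetaminophen"),
        ("PT003", "INTERFERON BETA-1B"),
    ],
)

expanded_terms = ["Betaseron", "interferon beta-1b"]
sql, params = build_medication_query(expanded_terms)
rows = conn.execute(sql, params).fetchall()
print(sql)
print(rows)
```

Binding the terms as parameters keeps drug names that contain quotes or other special characters from breaking the statement, which matters when the term list is produced automatically by query expansion.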

Limitations. During our experiments, we encountered scenarios in which ODMQ is limited. For instance, in some cases the corresponding concepts cannot be located within the OMOP database, as observed with “Zepopsia”, and in others drug names are categorized under alternative concept categories, as observed with “Interferon”. We will address these issues in future work to ensure the robustness and effectiveness of the query process. Our strategy involves developing search algorithms that maximize the inclusion of pertinent concepts while minimizing extraneous or irrelevant information. In addition, ODMQ has so far been tested with only one EHR dataset; we will collaborate with diverse domain experts to construct a comprehensive testing dataset and thoroughly assess ODMQ’s performance across various EHR datasets. To enhance the query interface, we plan to enable users to select among all expanded medication terms, allowing them to craft customized queries tailored to their specific research objectives. We will conduct a usability evaluation to assess the effectiveness and user-friendliness of these interface enhancements. Currently, ODMQ is not publicly available, as we are actively improving its performance and related documentation. We plan to release it as an open-source project in the future.

6. Conclusion

We proposed ODMQ, an ontology-driven optimization approach designed to enhance medication querying capabilities using the OMOP CDM. By integrating semantic ontology structures with OMOP, ODMQ simplified the process of obtaining comprehensive medication information, including drug names, NDCs, and generic names. Our approach streamlined the querying process, reducing the time required for clinical researchers to manually search for medication data and improving query efficiency. Through experiments and evaluations conducted on a substantial real-world COVID-19 electronic health record dataset, we validated the efficacy of ODMQ in terms of significantly improving medication query performance. Our results demonstrated that ODMQ not only covers medication terms provided by domain experts but also ensures the relevance of expanded search terms to user input. ODMQ also provides an intuitive query interface and visualizes patient history, facilitating easy result validation and exploration. Overall, our study contributes to advancing data-driven techniques aimed at optimizing medication querying processes, paving the way for more efficient and accurate retrieval of medication-related information in healthcare settings.

Acknowledgement.

This work was supported in part by the National Science Foundation (NSF) grant 2047001 and the National Institutes of Health (NIH) grants R01LM013335 and R01NS126690. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NSF or NIH.

References

  • [1]. Tao S, Lhatoo S, Hampson J, Cui L, Zhang G.-Q. A bespoke electronic health record for epilepsy care (epitome): Development and qualitative evaluation. Journal of Medical Internet Research. 2021;23(2):e22939. doi: 10.2196/22939.
  • [2]. Thakkar M, Davis D. C. Risks, barriers, and benefits of ehr systems: a comparative study based on size of hospital. Perspectives in Health Information Management/AHIMA, American Health Information Management Association. 2006;3.
  • [3]. Meystre S. M, Lovis C, Bürkle T, Tognola G, Budrionis A, Lehmann C. U. Clinical data reuse or secondary use: current status and potential future progress. Yearbook of medical informatics. 2017;26(01):38–52. doi: 10.15265/IY-2017-007.
  • [4]. Walker J, Pan E, Johnston D, Adler-Milstein J, Bates D. W, Middleton B. The value of health care information exchange and interoperability: There is a business case to be made for spending money on a fully standardized nationwide system. Health affairs. 2005;24(Suppl1):W5–10. doi: 10.1377/hlthaff.w5.10.
  • [5]. Gonzalez-Hernandez G, Krallinger M, Muñoz M, Rodriguez-Esteban R, Uzuner Ö, Hirschman L. Challenges and opportunities for mining adverse drug reactions: perspectives from pharma, regulatory agencies, healthcare providers and consumers. Database. 2022;2022:baac071. doi: 10.1093/database/baac071.
  • [6]. Bellazzi R, Diomidous M, Sarkar I N, Takabayashi K, Ziegler A, McCray A. T. Data analysis and data mining: current issues in biomedical informatics. Methods of information in medicine. 2011;50(06):536–544. doi: 10.3414/ME11-06-0002.
  • [7]. Bodenreider O. The unified medical language system (umls): integrating biomedical terminology. Nucleic acids research. 2004;32(suppl1):D267–D270. doi: 10.1093/nar/gkh061.
  • [8]. Donnelly K, et al. Snomed-ct: The advanced terminology and coding system for ehealth. Studies in health technology and informatics. 2006;121:279.
  • [9]. Ahmadi N, Peng Y, Wolfien M, Zoch M, Sedlmayr M. Omop cdm can facilitate data-driven studies for cancer prediction: a systematic review. International Journal of Molecular Sciences. 2022;23(19):11834. doi: 10.3390/ijms231911834.
  • [10]. Zhu D, Carterette B. In 2012 IEEE International Conference on Bioinformatics and Biomedicine. IEEE; 2012. Improving health records search using multiple query expansion collections; pp. 1–7.
  • [11]. Martinez D, Otegi A, Soroa A, Agirre E. Improving search over electronic health records using umls-based query expansion through random walks. Journal of biomedical informatics. 2014;51:100–106. doi: 10.1016/j.jbi.2014.04.013.
  • [12]. Jain H, Thao C, Zhao H. Enhancing electronic medical record retrieval through semantic query expansion. Information systems and e-business management. 2012;10:165–181.
  • [13]. Xu B, Lin H, Lin Y. Learning to refine expansion terms for biomedical information retrieval using semantic resources. IEEE/ACM transactions on computational biology and bioinformatics. 2018;16(3):954–966. doi: 10.1109/TCBB.2018.2801303.
  • [14]. Holmes J. H, Beinlich J, Boland M. R, Bowles K. H, Chen Y, Cook T. S, Demiris G, Draugelis M, Fluharty L, Gabriel P. E, et al. Why is the electronic health record so challenging for research and clinical care? Methods of information in medicine. 2021;60(01/02):032–048. doi: 10.1055/s-0041-1731784.
  • [15]. Sarwar T, Seifollahi S, Chan J, Zhang X, Aksakalli V, Hudson I, Verspoor K, Cavedon L. The secondary use of electronic health records for data mining: Data characteristics and challenges. ACM Computing Surveys (CSUR). 2022;55(2):1–40.
  • [16]. Bennett C. C. Utilizing rxnorm to support practical computing applications: capturing medication history in live electronic health records. Journal of biomedical informatics. 2012;45(4):634–641. doi: 10.1016/j.jbi.2012.02.011.
  • [17]. Hong Y, Zeng M. L. International classification of diseases (icd). KO Knowledge Organization. 2023;49(7):496–528.
  • [18]. ICD. Available online. https://www.who.int/standards/classifications/classification-of-diseases (accessed on 15 March 2024)
  • [19]. SNOMED-CT. Available online. https://www.snomed.org/ (accessed on 15 March 2024)
  • [20]. Nelson S. J, Zeng K, Kilbourne J, Powell T, Moore R. Normalized names for clinical drugs: Rxnorm at 6 years. Journal of the American Medical Informatics Association. 2011;18(4):441–448. doi: 10.1136/amiajnl-2011-000116.
  • [21]. RxNorm. Available online. https://www.nlm.nih.gov/research/umls/rxnorm/index.html (accessed on 15 March 2024)
  • [22]. ATHENA. Available online. https://athena.ohdsi.org/search-terms/start (accessed on 15 March 2024)
  • [23]. Reps J. M, Schuemie M. J, Suchard M. A, Ryan P. B, Rijnbeek P. R. Design and implementation of a standardized framework to generate and evaluate patient-level prediction models using observational healthcare data. Journal of the American Medical Informatics Association. 2018;25(8):969–975. doi: 10.1093/jamia/ocy032.
  • [24]. Rijnbeek P, Reps J. Chapter 13 patient-level prediction. The Book of OHDSI. 2021.
  • [25]. Danese M. D, Halperin M, Duryea J, Duryea R. The generalized data model for clinical research. BMC medical informatics and decision making. 2019;19:1–13. doi: 10.1186/s12911-019-0837-5.
  • [26]. Kent S, Burn E, Dawoud D, Jonsson P, Østby J. T, Hughes N, Rijnbeek P, Bouvy J. C. Common problems, common data model solutions: evidence generation for health technology assessment. Pharmacoeconomics. 2021;39(3):275–285. doi: 10.1007/s40273-020-00981-9.
  • [27]. Lu B. Health query expansion using wordnet and umls. 2015.
  • [28]. Yunzhi C, Huijuan L, Shapiro L, Travillian R. S, Lanjuan L. An approach to semantic query expansion system based on hepatitis ontology. Journal of Biological Research-Thessaloniki. 2016;23:11–22. doi: 10.1186/s40709-016-0044-9.
  • [29]. Kumar K. S, Deepa K. Medical query expansion using umls. Indian Journal of Science and Technology. 2016.
  • [30]. Malik S, Shoaib U, El-Sayed H, Khan M. A. 2020 14th International Conference on Innovations in Information Technology (IIT) IEEE; 2020. Query expansion framework leveraging clinical diagnosis information ontology; pp. 18–23.
  • [31]. Steinman M. A, Chren M.-M, Landefeld C. S. What’s in a name? use of brand versus generic drug names in united states outpatient practice. Journal of general internal medicine. 2007;22:645–648. doi: 10.1007/s11606-006-0074-3.
  • [32]. Ross J. S, Rohde S, Sangaralingham L, Brito J. P, Choi L, Dutcher S. K, Graham D. J, Jenkins M. R, Lipska K. J, Mendoza M, et al. Generic and brand-name thyroid hormone drug use among commercially insured and medicare beneficiaries, 2007 through 2016. The Journal of Clinical Endocrinology & Metabolism. 2019;104(6):2305–2314. doi: 10.1210/jc.2018-02197.
  • [33]. Schellekens H. National drug codes. the national drug. Nature. 201:4.
  • [34]. Huang Y, Li X, Zhang G.-Q. Elii: A novel inverted index for fast temporal query, with application to a large covid-19 ehr dataset. Journal of Biomedical Informatics. 2021;117:103744. doi: 10.1016/j.jbi.2021.103744.
  • [35]. Mongodb. https://www.mongodb.com/ (Online; accessed March, 2024)
  • [36]. Pérez C. A, Zhang G.-Q, Li X, Huang Y, Lincoln J. A, Samudralwar R. D, Gupta R. K, Lindsey J. W. Covid-19 severity and outcome in multiple sclerosis: Results of a national, registry-based, matched cohort study. Multiple Sclerosis and Related Disorders. 2021;55:103217. doi: 10.1016/j.msard.2021.103217.
  • [37]. Betaseron. Available online. https://www.betaseron.com/resources/frequently-asked-questions (accessed on 15 March 2024)
  • [38]. OHDSI ATLAS. Available online. https://atlas-demo.ohdsi.org/#/home/ (accessed on 15 March 2024)
  • [39]. Bhattacharjee T, Kiwuwa-Muyingo S, Kanjala C, Maoyi M. L, Amadi D, Ochola M, Kadengye D, Gregory A, Kiragga A, Taylor A, et al. Inspire datahub: a pan-african integrated suite of services for harmonising longitudinal population health data using ohdsi tools. Frontiers in Digital Health. 2024;6:1329630. doi: 10.3389/fdgth.2024.1329630.

Articles from AMIA Annual Symposium Proceedings are provided here courtesy of American Medical Informatics Association