AMIA Annual Symposium Proceedings. 2017 Feb 10;2016:715–723.

Feasibility of Representing Data from Published Nursing Research Using the OMOP Common Data Model

Hyeoneui Kim 1, Jeeyae Choi 2, Imho Jang 3, Jimmy Quach 4, Lucila Ohno-Machado 1
PMCID: PMC5333244  PMID: 28269868

Abstract

We explored the feasibility of representing nursing research data with the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) to understand the challenges and opportunities in representing various types of health data, not limited to diseases and drug treatments. We collected 1,431 unique data items from 256 nursing articles and mapped them to the OMOP CDM. A deeper level of mapping was explored by simulating 10 data search use cases. Although the majority of the data could be represented in the OMOP CDM, potential information loss was identified in content related to patient reported outcomes, socio-economic information, and locally developed nursing intervention protocols. These areas will be further investigated in a follow-up study. We will use lessons learned in this study to inform the metadata development efforts for data discovery.

Introduction and Background

Wide adoption of Electronic Health Record (EHR) systems and the development of secure cloud-based infrastructure to store large amounts of clinical and health research data have been opening new opportunities for data-driven discovery and validation of knowledge. Observational and pragmatic studies are gaining traction as alternatives to Randomized Controlled Trials (RCTs) [1–3]. Although the highest quality of clinical evidence is still produced via RCTs [4], it is infeasible to perform an RCT for every clinical problem of interest and every possible patient situation. Pharmaco-surveillance [5,6], various comparative effectiveness studies [7,8], and cohort discoveries [9,10] are a few example areas that have benefited from the increasing availability of health data. However, these data also pose non-trivial challenges, such as the handling of missing and/or noisy data and the integration of data generated from various sources, each with its own idiosyncratic representation. The Big Data to Knowledge (BD2K) initiative by NIH aims to address such challenges and to promote the utilization of large amounts of biomedical data (i.e., “big data”) by improving the discoverability, accessibility, interoperability, and reusability of data and by advancing data analytics [11,12].

bioCADDIE – indexing for big data discovery

bioCADDIE (biomedical and healthCAre Data Discovery Index Ecosystem) is a BD2K consortium dedicated to establishing a user-friendly and robust means to discover and index biomedical data [13,14]. At its core are the bioCADDIE metadata, a minimum set of information about a dataset that needs to be made available to data seekers to facilitate dataset searches. The bioCADDIE metadata specification was developed by incorporating existing major metadata schemas in the biomedical domain (i.e., top-down) and high-priority dataset search use cases identified by the bioCADDIE leadership group and the user community. The version 1 metadata specification was released in August of 2015 and is currently under revision based on the feedback obtained from the bioCADDIE community [15]. Some of the referenced existing metadata and use cases reflect clinical and/or healthcare domains to a certain extent (mostly diagnoses, laboratory tests, and medications). Overall, however, clinical and healthcare research domains are under-represented in bioCADDIE metadata relative to omics research domains. As a first step toward augmenting the clinical and healthcare domains, the bioCADDIE team has started exploring the interoperability between the bioCADDIE metadata and the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM), a widely accepted, standardized model for healthcare data [16].

OMOP CDM – a common data model for healthcare research data

OMOP was formed as a partnership between public and private sectors to establish the safe and effective use of observational healthcare data to study the effectiveness of medical products [17]. Observational Health Data Sciences and Informatics (OHDSI) is an international collaboration that aims to promote the reusability of observational healthcare data produced in various venues by achieving data interoperability among them [18]. The OMOP CDM has been adopted and is now continuously revised by OHDSI [19]. The Patient Centered Outcomes Research network (PCORnet), a national network funded by PCORI, is another major effort that aims at utilizing large scale clinical and healthcare data research networks for patient-centered outcomes research. A CDM plays a critical role in achieving data interoperability [20–22]. The PCORnet CDM [23] and the OMOP CDM have recently been cross-mapped by the OHDSI research team [24].

Gaps in representing nursing data for big data science

Many studies have explored the use of the OMOP CDM to conduct large scale drug surveillance studies as well as to integrate clinical data repositories [22,25–29]. However, few studies have reported experience with standardizing and integrating data generated from clinical or healthcare research with the OMOP CDM, except for studies that involved drugs or medical devices.

The recent Precision Medicine Initiative (PMI) calls for collecting and utilizing a full spectrum of biomedical data to improve patient care: from microbiology to physical exams, laboratory tests, and behavioral and environmental exposures [30]. In addition, the National Academy of Medicine (NAM), formerly the Institute of Medicine (IOM), recommended the incorporation of social and behavioral domains of data in EHRs in a way that would support the reuse of those data in patient care and research [31]. Therefore, integrating different types of data in a way that supports secondary analyses has become critical. Nursing research targets a wide range of nursing interventions and problems, which often do not involve specific drugs, medical devices, or medical diagnoses. It often includes topics such as patients’ functional status, education, satisfaction with the care provided, and quality of life. Nursing and nursing research data may therefore present a unique opportunity to understand the challenges and special considerations in integrating data generated in a variety of health domains. However, few studies have investigated the feasibility of representing various types of nursing and nursing research data with a CDM, except in very specific nursing domains such as pressure ulcer risk assessment and prevention [32,33].

The purpose of this study was to test the feasibility of representing nursing research data with the OMOP CDM using data reported in published articles and subsequently indexing them with the bioCADDIE tools. Specifically, we evaluated (1) the content coverage of the OMOP CDM (version 5) toward the data content reported in 256 published nursing research articles and (2) the feasibility of conducting data searches using the use cases extracted from the 10 randomly selected research articles.

Materials and Methods

Data preparation

In a prior, separate study, we reviewed 192 articles on clinical trials published in nursing journals between 2006 and 2015. In that review, we annotated the articles with a list of metadata such as study types, target research problems, key interventions, comparison groups, sample demographics, data items used/generated, data sources, funding support, data access information, and locations and settings of the studies. To augment this collection with observational studies, we conducted an additional literature review. Using PubMed and its structured search filters, we retrieved articles classified as Observational Studies published in Nursing Journals over the past 3 years (i.e., 2013 to February 2016). This search yielded 203 articles. We randomly selected about half of the retrieved articles (N=98). We then read the full text articles and annotated each article with the same set of metadata.

During the full text review we excluded 34 articles: those for which the full text was not easily obtainable (N=3) and those that did not investigate patient problems from a nursing perspective and consequently reported no patient outcomes (N=31). For example, articles on behavioral issues among nursing students [34], assessment scale development [35], hand washing behavior among nurses observed without reporting associated patient safety outcomes [36], and nursing management topics such as nursing shifts and workload estimation [37,38] were excluded from this review. With this additional review, we generated metadata for 64 articles on nursing observational studies. Combining the new articles with the previously tagged ones, we included a total of 256 articles in this study. The article selection process is summarized in Figure 1. We also classified the data items reported in the 256 articles by mode of data collection: routinely collected clinical data, Patient Reported Outcomes (PRO), and data collected through mobile devices.

Figure 1. Article selection process.

Mapping to the OMOP CDM

A total of 2,438 data items were extracted from the 256 articles. After removing duplicate entries, we classified 1,431 unique data items into concept classes of the OMOP CDM to establish a high (i.e., general) level of mapping. This mapping was largely guided by the definition and usage description provided for each concept class in the OMOP CDM. However, we also checked the attributes of the concept classes to ensure that a given data item had a relevant attribute that could hold the information about that item. For example, one might expect that various socio-economic and demographic items, such as marital status and education level, would belong to the PERSON class, as is done in many healthcare databases. However, OMOP’s PERSON class does not provide a place to hold those items, as it was designed to hold permanent information about a person such as birth year and race. Two of the authors (HK, JC) collaboratively conducted this high level mapping. During this activity we also aimed to identify data items that did not fit into any of the OMOP CDM classes.
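The high level mapping step can be pictured as a deduplication followed by a class lookup. The sketch below is only an illustration: the lookup table is hypothetical and built from a handful of items mentioned in this paper, and the case-insensitive deduplication rule is an assumption, since the paper does not describe how duplicates were detected.

```python
# Hypothetical lookup assembled from examples mentioned in this paper;
# it is NOT the authors' actual mapping file.
ITEM_TO_CLASS = {
    "birth year": "PERSON",
    "race": "PERSON",
    "marital status": "OBSERVATION",
    "education level": "OBSERVATION",
    "ejection fraction": "MEASUREMENT",
    "admitting diagnoses": "CONDITION_OCCURRENCE",
}


def map_items(raw_items):
    """Deduplicate data items and assign each one an OMOP CDM class.

    Duplicates are detected after trimming whitespace and lower-casing,
    which is an assumed rule. Items without a known class are flagged
    as UNMAPPED instead of being forced into a class.
    """
    unique = sorted({item.strip().lower() for item in raw_items})
    return {item: ITEM_TO_CLASS.get(item, "UNMAPPED") for item in unique}


extracted = [
    "Marital status",
    "marital status ",
    "Race",
    "Ejection fraction",
    "Number of years working as a nurse",
]
mapping = map_items(extracted)
for item, omop_class in mapping.items():
    print(f"{item} -> {omop_class}")
```

Keeping an explicit UNMAPPED bucket mirrors the aim of identifying items that do not fit any class.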

Next, we conducted a small scale, deeper level mapping and investigated to what extent the semantics of the content items presented in the selected articles were represented in the OMOP CDM. We fully utilized the attributes and the constraints associated with the concept classes for this mapping. We first created 10 data search use cases from 10 articles randomly selected from the pool of 256 articles by asking ourselves, “How do you query a clinical data repository to replicate the study reported in this article?” These use cases incorporated the eligibility criteria of the studies and thus provided an opportunity to evaluate, at the deeper level (i.e., value level), additional content items not reflected in the data items. We then simulated the dataset search use cases by converting them into a data query form similar in style to SQL code (i.e., pseudo SQL code). We used the mapping established between the OMOP CDM and the data items as the database schema of a hypothetical data repository. A use case and its data query form are presented as an example in Figure 2.

Figure 2. Data search use case and data query form translation.

To simplify the conversion, we omitted certain details and employed a few of our own conventions. We did not specify primary key and foreign key relations. We used standardized concept id attributes (e.g., gender_concept_id, observation_concept_id, etc.) to represent data item names and non-numeric data item values. Instead of specifying mapped standardized concept codes, we put data item or value names in brackets to imply that the values are the concept codes of the concept names presented in the brackets. For example, the code observation_concept_id = [marital status] in Figure 2 implies that the observation_concept_id attribute takes the concept code from an OMOP-recognized standardized vocabulary of choice, for instance “125680007” when the Systematized Nomenclature of Medicine - Clinical Terms (SNOMED-CT) is used to encode values.
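As a rough illustration of the bracket convention, the sketch below resolves bracketed concept names into codes and runs the resulting query against a tiny in-memory database. Everything specific here is an assumption made for the example: the use case (female patients whose marital status is recorded as married), the numeric concept codes, and the sample rows are all hypothetical placeholders. Unlike the simplified query forms described above, the sketch spells out the join on person_id so that it can actually execute.

```python
import re
import sqlite3

# Placeholder concept codes (hypothetical). Real codes would come from an
# OMOP-recognized standardized vocabulary.
CONCEPT_CODES = {"female": 1001, "marital status": 2001, "married": 2002}


def resolve(query_form):
    """Replace every [concept name] with its placeholder concept code."""
    return re.sub(
        r"\[([^\]]+)\]",
        lambda m: str(CONCEPT_CODES[m.group(1).lower()]),
        query_form,
    )


# Query form written with the bracket convention (hypothetical use case).
QUERY_FORM = """
SELECT p.person_id
FROM person AS p
JOIN observation AS o ON o.person_id = p.person_id
WHERE p.gender_concept_id = [female]
  AND o.observation_concept_id = [marital status]
  AND o.value_as_concept_id = [married]
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER, gender_concept_id INTEGER);
CREATE TABLE observation (
    person_id INTEGER,
    observation_concept_id INTEGER,
    value_as_concept_id INTEGER
);
INSERT INTO person VALUES (1, 1001), (2, 1001), (3, 1002);
INSERT INTO observation VALUES (1, 2001, 2002), (2, 2001, 2003), (3, 2001, 2002);
""")

matches = [row[0] for row in conn.execute(resolve(QUERY_FORM))]
print(matches)
```

Separating the readable query form from code resolution keeps the use case legible while leaving the choice of vocabulary open.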

Results

The majority of the data items identified from the 256 articles were mapped to concept classes in the OMOP CDM; 17 data items remained unmapped (Table 1). Examples of unmapped items include the socio-economic and/or demographic information of healthcare providers (e.g., “number of years working as a nurse”) and items that potentially embedded multiple content items, each of which would need to be mapped to a different concept class. For example, “operating room nurses’ task list” and “advice provided by healthcare provider” might need to be mapped to different classes depending on the values they take.

MEASUREMENT was the concept class to which the most data items were mapped, followed by OBSERVATION. The number of data items mapped to each concept class in the OMOP CDM is summarized, with examples, in Table 1.

Table 1. Number of data items mapped to each concept class

| Mapped OMOP CDM Class | Count (N=1,431) | Examples |
|---|---|---|
| Measurement | 529 | Charlson comorbidity index score, Ejection fraction, Vital signs |
| Observation | 319 | Treatment history, Smoking status, Marital status, Catheter leakage |
| Condition_Occurrence | 223 | Problem list, Admitting diagnoses |
| Procedure_Occurrence | 109 | Renal transplant, Chemotherapy, Detoxification protocol, Relaxation |
| Person | 94 | Age, Race, Gender, Ethnicity |
| Drug_Exposure | 48 | Opioid dose, Type of opioid used, Pain medication |
| Device_Exposure | 25 | Mechanical ventilator use, Traction device type, CPAP use |
| Visit_Occurrence | 27 | Admission date, Discharge date |
| Care_Site | 15 | Type of hospital, Admitting service |
| Provider | 3 | APN specialty, Service provider type |
| Payer_Plan_Period | 6 | Payer, Health insurance type, Insurance status |
| Visit_Cost | 5 | Average cost per patient per patient day, Average cost per patient |
| Drug_Era | 4 | Period of hormone therapy, Period of chemotherapy |
| Death | 4 | Hospital death, Death |
| Procedure_Cost | 3 | Wound dressing cost, Overhead |
| UNMAPPED | 17 | Influence of ward condition, Counselor’s age, Hygiene compliance |
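The counts in Table 1 can be checked and turned into shares with a few lines of arithmetic. The sketch below simply re-enters the counts as listed in the table; the percentages it prints are derived here and are not reported in the paper.

```python
# Counts copied from Table 1.
counts = {
    "Measurement": 529,
    "Observation": 319,
    "Condition_Occurrence": 223,
    "Procedure_Occurrence": 109,
    "Person": 94,
    "Drug_Exposure": 48,
    "Device_Exposure": 25,
    "Visit_Occurrence": 27,
    "Care_Site": 15,
    "Provider": 3,
    "Payer_Plan_Period": 6,
    "Visit_Cost": 5,
    "Drug_Era": 4,
    "Death": 4,
    "Procedure_Cost": 3,
    "UNMAPPED": 17,
}

total = sum(counts.values())
# The listed counts add up to the stated N of 1,431 unique data items.
assert total == 1431

# Share of all unique data items that falls into each class.
for name, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<22}{n:>5}{100 * n / total:>7.1f}%")
```

Sorting by count makes it easy to see how strongly the mapping concentrates in the MEASUREMENT and OBSERVATION classes.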

PRO items collected using a questionnaire (either standardized or custom-developed) were reported in 82 articles, and data generated through mobile devices were reported in 2 articles.

Our attempt to simulate data searches from the use cases was not fully successful for all 10 use cases. We observed potential information loss in 2 cases: (1) a study whose outcomes were assessed by comparing the dates and times of a surgical procedure and (2) a device study that required the dates and times of certain clinical findings. The PROCEDURE and DEVICE classes of the OMOP CDM (i.e., Procedure_Occurrence and Device_Exposure) stored only date information, so a precise comparison was not deemed possible. The metadata annotated for the 256 articles, the full results of the CDM mapping, and the data search use case simulations are available to download from https://idash-data.ucsd.edu/download/folder/4825/AMIA2016.zip.
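The effect of storing only dates can be seen with two events that fall on the same day. The timestamps below are hypothetical and chosen purely for illustration; they are not taken from any of the reviewed studies.

```python
from datetime import datetime

# Hypothetical timestamps for two events on the same calendar day.
procedure_start = datetime(2015, 3, 2, 9, 15)
clinical_finding = datetime(2015, 3, 2, 14, 40)

# With full timestamps, the interval between the events is recoverable.
elapsed_hours = (clinical_finding - procedure_start).total_seconds() / 3600

# With date-only storage, both events collapse to the same value,
# so neither their interval nor their order can be determined.
date_only_days = (clinical_finding.date() - procedure_start.date()).days

print(f"elapsed with times: {elapsed_hours:.2f} hours")
print(f"elapsed with dates only: {date_only_days} days")
```

Any use case whose outcome depends on intervals shorter than a day is affected in the same way.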

Discussion

We attempted to represent the data content reported in 256 nursing articles on health studies using the OMOP CDM version 5. Our goal was to begin understanding the scope of representation supported by the OMOP CDM, a major information model standard, with regard to content areas not related to conventional comparative effectiveness research, using data from nursing research. It is promising that the majority of the clinical content used in nursing research was well represented in the OMOP CDM, especially through the Standardized Clinical Data domain. However, we conclude that the OBSERVATION and MEASUREMENT classes had to be stretched beyond their intended use in this mapping, serving as a place to hold many non-clinical and/or unusual clinical findings and assessment items.

Representing socio-economic and demographic information of patients, family members, and healthcare providers

PERSON and PROVIDER are the two classes of the OMOP CDM that hold information on human subjects in the healthcare domain. The former is dedicated to describing patients, and the latter describes healthcare providers. Family dynamics are expected to have a significant impact on patient outcomes and are therefore an important research topic in nursing. Many articles reviewed in our study reported information about family members, especially those who play the role of main caregiver. We mapped those items to the OBSERVATION class, although this class was designed to capture clinical facts not represented in other concept classes in the OMOP CDM. Similarly, many socio-economic items about patients (e.g., marital status, education level, employment status, etc.) had to be mapped to the OBSERVATION class, since the attributes of the PERSON class are limited to key “unchangeable” demographics such as age, gender, race, and ethnicity. Similar challenges were observed with the socio-demographic information about healthcare providers. We did not map these items to OBSERVATION, because that class is meant to capture “any clinical facts about a patient obtained in the context of examination, questioning, or a procedure” [17]. These additional socio-demographic items on healthcare providers seem to be potential areas for expansion to fully support observational studies in nursing.

Representing patient reported outcomes

A third of the articles reviewed in this study reported assessment of patient status using standardized and/or custom-built assessment scales and survey questionnaires. Some of these assessment tools are completed by patients and some by healthcare providers. Differentiating the source of information is important, as it provides additional context to consider when studying patient outcomes. Of note, the CONDITION table of the PCORnet CDM captures this information through the condition_source attribute, whose structured value set contains patient reported medical history. For outcomes measured with widely used and well documented standardized scales, it might be possible to infer the source of information, although this practice still increases the risk of information loss by requiring additional steps to obtain and associate the information. For outcomes measured with custom-built survey questionnaires, however, the source of information is still highly likely to be lost.

Representing nursing specific intervention concepts

Many articles we reviewed reported the effectiveness of specific nursing interventions, which often include new care protocols and complementary therapies. Leaders in nursing informatics have recognized as a high priority that nursing specific data be sufficiently represented in “big data” sets, in order to promote the incorporation of nursing’s holistic approaches to patient care into data-driven knowledge generation [39,40]. Despite the relatively small scale of the concept mapping for nursing interventions pursued in this study, we found that the majority of complementary therapies are covered by SNOMED-CT. However, locally developed care protocols are challenging to represent, since they are built from a collection of nursing actions. Consequently, a care protocol name used to describe an intervention concept might be considered less informative than a listing of the nursing actions it comprises. Although it seems somewhat convoluted, this challenge can be addressed by establishing a member relation between a set of nursing interventions and a specific care protocol through the FACT_RELATIONSHIP table. We will continue investigating this issue in future studies.
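One way to picture such a member relation is sketched below. The FACT_RELATIONSHIP columns follow the OMOP CDM version 5 layout (domain_concept_id_1, fact_id_1, domain_concept_id_2, fact_id_2, relationship_concept_id). Everything else is an assumption made for illustration: storing the protocol and its component actions as rows of PROCEDURE_OCCURRENCE, the free-text procedure_label column (the CDM stores concept ids instead), and the numeric domain and relationship codes, which are placeholders.

```python
import sqlite3

PROCEDURE_DOMAIN = 10  # placeholder domain concept code (hypothetical)
HAS_MEMBER = 9001      # placeholder relationship concept code (hypothetical)

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE procedure_occurrence (
    procedure_occurrence_id INTEGER PRIMARY KEY,
    person_id INTEGER,
    procedure_label TEXT  -- illustrative only; the CDM uses a concept id
);
CREATE TABLE fact_relationship (
    domain_concept_id_1 INTEGER,
    fact_id_1 INTEGER,
    domain_concept_id_2 INTEGER,
    fact_id_2 INTEGER,
    relationship_concept_id INTEGER
);
""")

# One protocol record and two component nursing actions for the same patient.
conn.executemany(
    "INSERT INTO procedure_occurrence VALUES (?, ?, ?)",
    [
        (100, 1, "local care protocol"),
        (101, 1, "nursing action A"),
        (102, 1, "nursing action B"),
    ],
)

# Each row links the protocol (fact 1) to one member action (fact 2).
conn.executemany(
    "INSERT INTO fact_relationship VALUES (?, ?, ?, ?, ?)",
    [
        (PROCEDURE_DOMAIN, 100, PROCEDURE_DOMAIN, 101, HAS_MEMBER),
        (PROCEDURE_DOMAIN, 100, PROCEDURE_DOMAIN, 102, HAS_MEMBER),
    ],
)

# Recover the nursing actions that make up protocol record 100.
members = [
    row[0]
    for row in conn.execute(
        """
        SELECT m.procedure_label
        FROM fact_relationship AS f
        JOIN procedure_occurrence AS m
          ON m.procedure_occurrence_id = f.fact_id_2
        WHERE f.fact_id_1 = ? AND f.relationship_concept_id = ?
        ORDER BY m.procedure_occurrence_id
        """,
        (100, HAS_MEMBER),
    )
]
print(members)
```

The extra join needed to recover the component actions is one reason this approach may feel convoluted in practice.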

Nursing problems cover a wide range of health domains, from disease-oriented matters to emotional wellbeing. Nursing research thus also includes concepts that are not frequently used in other clinical studies. However, we found that the OMOP CDM provides sufficient representation capability, even for unusual nursing problem concepts, through less restrictive concept classes such as OBSERVATION and MEASUREMENT, along with CONDITION_OCCURRENCE.

Limitations and future directions

This study used data items reported in a small sample of nursing research articles. Therefore, its findings may not generalize to the entire content domain of nursing research data. In addition, the OMOP CDM strives for a high level of standardization, and hence all key concepts in the CDM classes are required to be encoded with a standardized vocabulary. This means that association at the class and attribute levels does not always result in successful representation of nursing research concepts, as information loss can occur when a standardized concept code is lacking. Using the OMOP CDM as a hypothetical database model to simulate data search use cases is another limitation. Information loss can happen during the transformation of a local database model to the OMOP CDM, and our simulation approach might not have been robust enough to capture this type of loss.

Recognizing these limitations, along with the potential over-use of certain concept classes described above, we plan to continue this work by expanding its scope and revisiting the challenging cases identified here. We also plan to seek collaboration with OHDSI to discuss and substantiate the lessons learned in this study.

Conclusion

Nursing research targets a wide range of topic areas that are not limited to diseases or medical treatments. Therefore, investigating the standardized representation of the data content in published nursing research provides a unique opportunity to understand the challenges and opportunities in representing various types of health data so that they can be found and reused. In this study, we explored the feasibility of representing nursing research data with the OMOP CDM, using the data items reported in 256 nursing articles. The OMOP CDM provided a good representation for the majority of data items, but we also observed potential gaps that might lead to information loss. These gaps will be further investigated in a follow-up study. Lessons learned in this study will help inform metadata development efforts for the Data Discovery Index.

Acknowledgment

This project was supported in part by the grant 1U24AI117966 (NIH/BD2K).

References

