Mapping Local Biospecimen Records to the OMOP Common Data Model

Chelsea L Michael; Evan T Sholle; Regina T Wulff; Gail J Roboz; Thomas R Campion, Jr

. 2020 May 30;2020:422–429.


PMCID: PMC7233045 PMID: 32477663

Abstract

Research to support precision medicine for leukemia patients requires integration of biospecimen and clinical data. The Observational Medical Outcomes Partnership common data model (OMOP CDM) and its Specimen table present a potential solution. Although researchers have described progress and challenges in mapping electronic health record (EHR) data to populate the OMOP CDM, to our knowledge no studies have described populating the OMOP CDM with biospecimen data. Using biobank data from our institution, we mapped 26% of biospecimen records to the OMOP Specimen table. Records failed mapping because local time point codes were incompatible with the OMOP reference terminology. We recommend expanding allowable codes to encompass research data, adding foreign keys to leverage additional OMOP tables with data from other sources or to store additional specimen details, and considering a new table to represent processed samples and inventory.

Introduction

Leukemia poses a substantial burden to the US population, with 61,780 new diagnoses and 22,840 deaths projected for 2019.(1) About 1.5% of US men and women will be diagnosed with leukemia during their lifetime.(2) Among leukemia sub-types, acute myeloid leukemia (AML) is projected to account for 21,450 new diagnoses and 10,920 deaths in 2019.(1) Some AML patients first develop myelodysplastic syndrome (MDS) before progressing to AML.(3) While precision medicine has dramatically improved outcomes for other leukemia sub-types (e.g., Gleevec for Philadelphia chromosome-positive chronic myeloid leukemia), the current AML standard of care includes only conventional treatments.(4) Targeted therapies for AML are under investigation,(4) but more research is needed to understand the biology and molecular etiology of the disease.

To develop targeted therapies and to identify patients most likely to benefit from them, researchers must understand genetic variants, their associated pathways, and their role in oncogenesis. Such research depends in part on biospecimens collected from patients at various points in their disease course. However, biospecimens alone are not sufficient to drive innovation in translational research. To make the best use of samples, researchers need data about sample characteristics and inventory, and they also require increasingly rich, detailed clinical data about the patients whose samples are available.(5)

One potential approach to providing detailed clinical context for each biospecimen is to integrate biospecimen data with electronic health record (EHR) data. To promote EHR-data-driven research, a wide range of academic medical centers and pharmaceutical companies have adopted the Observational Medical Outcomes Partnership common data model (OMOP CDM). Maintained today by the Observational Health Data Sciences and Informatics (OHDSI) consortium, the OMOP CDM provides a common approach for mapping clinical data from both transactional/claims-derived data sets and EHRs to a universal, standardized format: a relational data model with distinct tables representing patients, visits, conditions, procedures, measurements, and specimens, among other domains.(6) The OMOP CDM also requires each site to map its terminologies (whether local or standardized) to standardized clinical reference terminologies; for example, sites that store diagnostic data using ICD-9/10 codes must map these diagnoses to the “standard” SNOMED-CT codes.(6) Several studies have reported successes and failures in mapping and loading drugs, conditions, procedures, and other clinical data from EHR and claims databases to the OMOP CDM.(7-12) However, to the best of our knowledge, no studies have described mapping biospecimen data to the OMOP CDM.

The goal of this study was to evaluate the feasibility of transforming data from a biospecimen database at our institution to the OMOP CDM to facilitate integration with EHR data to support AML and MDS research.

Methods

Setting

Weill Cornell Medicine (WCM) Physician Organization is a multi-specialty quaternary care center with 20 sites in New York City and 2 million patients. WCM’s 900 physicians have admitting privileges at its affiliated hospital, NewYork-Presbyterian Hospital (NYP). For each patient, WCM and NYP clinical systems share a unique medical record number (MRN). The WCM-NYP Leukemia Program is a field leader, treating a high volume of patients with a broad variety of hematologic malignancies and disorders, including AML and MDS. The Leukemia Program also conducts related basic, translational, and clinical research and coordinates studies with multiple participating institutions.

Leukemia Program Research Data Management

To support its robust research portfolio, the Leukemia Program maintains a central biobank with an associated Leukemia Program Database (LPDB).(13) When a biospecimen arrives at the biobank, research staff enter specimen-level data into the LPDB, including patient MRN, patient diagnosis, study protocol number, study time point, specimen source, specimen preservative, specimen collection time, specimen collection date, total specimen volume (in mL), and other data. Like many research databases, the LPDB uses local codes (e.g., “AML” and “MDS”) rather than reference terminologies.

The available data vary by specimen; for example, samples collected from external institutions do not have a corresponding WCM MRN. The most common Diagnosis field codes are “AML” and “MDS,” but other diseases (both malignant and non-malignant) are also represented. Time Point codes represent when a patient donated a sample, either in the context of the patient’s disease course (e.g., “Newly diagnosed” or “Remission”) or in terms of the study calendar of the research protocol under which the specimen was donated (e.g., “Day 100”). Study calendar time points do not represent the individual’s disease status, only where the patient falls within a specific research protocol: on “Day 100” of a protocol, some patients may still have malignant disease while others are in remission. Time point data may also be unavailable or unreliable. The most common Specimen Source codes are “Peripheral blood” and “Bone marrow,” and the most common Specimen Preservative codes for blood samples are “EDTA” and “Sodium heparin.” For controls, studies may collect samples from healthy individuals and/or healthy tissue samples from patients. Biobank staff also record data about samples after processing (e.g., number of aliquots and storage location) in the LPDB. Researchers seeking samples with specific characteristics, e.g., “all remission samples for female patients aged 60-85 with AML who have never received a bone marrow transplant, have an IDH2 mutation, and have been treated with decitabine,” must identify these samples from clinical data points captured in other research data systems and in the EHR, which complicates identifying samples for secondary analysis.

OMOP CDM Implementation at WCM

WCM has previously implemented the OMOP CDM to support secondary use of EHR data.(14) OMOP includes the CDM Specification, the Standardized Vocabularies, and various tools to support OMOP implementers and users.(6) The CDM Specification includes table definitions, such as the Specimen table described in Table 1. OMOP uses the Standardized Vocabularies for consistent data representation across implementations. Implementers use mappings to translate local codes or non-preferred terminologies to the Standardized Vocabulary codes, called Concept IDs, that are permitted for each field. The Concept IDs in each table serve as foreign keys to the OMOP Standardized Vocabularies tables. OMOP primarily uses SNOMED-CT as the Standard Vocabulary for concepts in the Specimen table. When a source code cannot be mapped to a standard concept, implementers use Concept ID = 0 to indicate that no appropriate standard concept is available. Tables also include Source ID and Source Value fields to store raw (untransformed) data as they appear in the originating data source. When multiple systems or fields populate the same OMOP table field, WCM follows a local convention of using JSON annotations, including field labels for provenance, in the relevant Source Value fields to facilitate future use. These annotations require increasing the character limits of these optional fields beyond the varchar(50) given in the specification. Finally, OMOP tools include Athena, which allows users to search and browse standard concepts by OMOP field and table, as well as by reference terminology, hierarchy, and code.
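The mapping convention described above (local code to Concept ID, with Concept ID = 0 when no standard concept applies) can be sketched as a simple lookup. This is a minimal illustration, not WCM's ETL: the dictionary name and the function `to_concept_id` are hypothetical, and only the two Concept IDs shown are taken from the Athena citations in this paper.

```python
# Hypothetical source-to-concept lookup: local LPDB code -> OMOP standard Concept ID.
# The concept IDs are those cited in this paper; the table itself is illustrative.
SOURCE_TO_CONCEPT: dict[str, int] = {
    "Peripheral blood": 4047495,  # "Peripheral blood specimen"
    "Bone marrow": 4000623,       # "Bone marrow specimen"
}

# OMOP convention: Concept ID 0 means "no appropriate standard concept".
NO_MATCHING_CONCEPT = 0


def to_concept_id(source_value: str) -> int:
    """Return the standard Concept ID for a local code, or 0 when none applies."""
    return SOURCE_TO_CONCEPT.get(source_value.strip(), NO_MATCHING_CONCEPT)
```

In a SQL-based ETL such as the one described in this paper, the same idea would typically be a join to a mapping table with a `COALESCE(..., 0)` default.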

Table 1.

OMOP SPECIMEN table description, reproduced from the OMOP CDM v6.0 Specifications.(15)

Field	Required	Type	Description
specimen_id	Yes	integer	A unique identifier for each specimen.
person_id	Yes	integer	A foreign key identifier to the Person for whom the Specimen is recorded.
specimen_concept_id	Yes	integer	A foreign key referring to a Standard Concept identifier in the Standardized Vocabularies for the Specimen.
specimen_type_concept_id	Yes	integer	A foreign key referring to the Concept identifier in the Standardized Vocabularies reflecting the system of record from which the Specimen was represented in the source data.
specimen_date	No	date	The date the specimen was obtained from the Person.
specimen_datetime	Yes	datetime	The date and time on the date when the Specimen was obtained from the person.
quantity	No	float	The amount of specimen collection from the person during the sampling procedure.
unit_concept_id	No	integer	A foreign key to a Standard Concept identifier for the Unit associated with the numeric quantity of the Specimen collection.
anatomic_site_concept_id	Yes	integer	A foreign key to a Standard Concept identifier for the anatomic location of specimen collection.
disease_status_concept_id	Yes	integer	A foreign key to a Standard Concept identifier for the Disease Status of specimen collection.
specimen_source_id	No	varchar(50)	The Specimen identifier as it appears in the source data.
specimen_source_value	No	varchar(50)	The Specimen value as it appears in the source data. This value is mapped to a Standard Concept in the Standardized Vocabularies and the original code is stored here for reference.
unit_source_value	No	varchar(50)	The information about the Unit as detailed in the source.
anatomic_site_source_value	No	varchar(50)	The information about the anatomic site as detailed in the source.
disease_status_source_value	No	varchar(50)	The information about the disease status as detailed in the source.


Data Collection

We collected LPDB data on the highest-volume specimens, specifically those with an AML or MDS diagnosis, sourced from bone marrow or peripheral blood, and linked to a local WCM MRN. For peripheral blood, we selected only records with EDTA or sodium heparin as the collection preservative. We selected the fields corresponding to these criteria, as well as the fields for specimen ID, total volume, and time point.
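The selection criteria above can be expressed as a single predicate. This sketch assumes a hypothetical record type `LpdbRecord` whose field names are invented for illustration; the actual LPDB schema is not described in the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LpdbRecord:
    """Hypothetical flattened LPDB row; field names are illustrative only."""
    specimen_id: str
    mrn: Optional[str]
    diagnosis: str
    specimen_source: str
    preservative: Optional[str]
    time_point: Optional[str]
    volume_ml: Optional[float]


def is_in_study_set(r: LpdbRecord) -> bool:
    """Apply the study's inclusion criteria to one record."""
    if not r.mrn:  # external samples lack a local WCM MRN
        return False
    if r.diagnosis not in {"AML", "MDS"}:
        return False
    if r.specimen_source == "Bone marrow":
        return True  # no preservative restriction for bone marrow
    if r.specimen_source == "Peripheral blood":
        return r.preservative in {"EDTA", "Sodium heparin"}
    return False  # other sources (e.g., oral wash) are excluded
```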

Mapping to OMOP CDM

We used Athena to identify standard concepts for the leukemia data. When one-to-one field mappings were inadequate, we considered many-to-one options and drew on biobank knowledge (both general biobanking experience and knowledge of the LPDB) to select the best data representation.

Evaluation

Based on the mapping rules we identified, we performed an extract, transform, and load (ETL) process to move LPDB data into the OMOP Specimen table using structured query language (SQL) in Microsoft SQL Server 2012. We then analyzed how well the Specimen table represented the LPDB source data. Specifically, we measured the percentage of LPDB codes and records that mapped to the Specimen table; the use of one-to-one, many-to-one, and one-to-many mappings; and where mapping lost detail or was not feasible.

Results

Mapping

When necessary, we combined multiple LPDB fields to map to the best OMOP Concept ID (Figure 1). For example, mapping the LPDB Specimen Type field alone would have yielded a general blood specimen concept, such as “Peripheral blood specimen” (Concept ID = 4047495)(16) or “Blood specimen” (Concept ID = 4001225).(17) Combining LPDB Specimen Type with LPDB Specimen Collection Preservative instead allowed us to map the Specimen Concept ID field to the more specific concepts “Blood specimen with EDTA” (Concept ID = 40482922)(18) and “Blood specimen submitted in heparinized collection tube” (Concept ID = 40486989).(19) In contrast, although specimen type and preservative were also available for bone marrow, the OMOP concepts were less granular and did not differentiate between preservatives, so we mapped the LPDB Specimen Type to “Bone marrow specimen” (Concept ID = 4000623)(20) whether the sample was collected in tubes with EDTA, sodium heparin, or another preservative.

For other fields, we drew on domain-specific knowledge to identify the best concepts. For example, the LPDB did not record an anatomic site (a required OMOP Specimen field) for blood draws, but “Peripheral blood” specimens are obtained through venipuncture. Because OMOP includes a Procedure table and SNOMED-CT codifies a variety of relationship types, we used Athena to identify “Venous structure” (Concept ID = 4104340)(21) as the direct procedure site of “Venipuncture for blood test” and chose it as the most appropriate Anatomic Site Concept ID. We used a similar approach to map bone marrow samples to “Bone marrow structure” (Concept ID = 4029619).(22)
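The many-to-one Specimen Concept ID mapping and the inferred anatomic site can be sketched as follows, using the Concept IDs reported above. The function names are hypothetical, and returning 0 for combinations outside the selected data set is an assumption of this sketch rather than a rule stated by the authors.

```python
from typing import Optional

# Blood: (source, preservative) jointly select the concept.
BLOOD_BY_PRESERVATIVE = {
    "EDTA": 40482922,            # "Blood specimen with EDTA"
    "Sodium heparin": 40486989,  # "Blood specimen submitted in heparinized collection tube"
}
# Bone marrow: no preservative-specific concept exists, so one concept covers all tubes.
BONE_MARROW_SPECIMEN = 4000623  # "Bone marrow specimen"

# Anatomic site inferred from the specimen source.
ANATOMIC_SITE = {
    "Peripheral blood": 4104340,  # "Venous structure" (site of venipuncture)
    "Bone marrow": 4029619,       # "Bone marrow structure"
}


def specimen_concept_id(source: str, preservative: Optional[str]) -> int:
    """Many-to-one mapping of two LPDB fields onto specimen_concept_id."""
    if source == "Peripheral blood":
        # Assumption: other preservatives fall outside the selected data set -> 0.
        return BLOOD_BY_PRESERVATIVE.get(preservative or "", 0)
    if source == "Bone marrow":
        return BONE_MARROW_SPECIMEN  # preservative detail is lost here
    return 0


def anatomic_site_concept_id(source: str) -> int:
    """Anatomic site derived from the same LPDB source field (one-to-many reuse)."""
    return ANATOMIC_SITE.get(source, 0)
```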

Mapping the OMOP Disease Status Concept ID was difficult with the available LPDB data. The OMOP Disease Status Concept ID field designates the disease status of the individual sample and has three allowed concepts: “Normal” (Concept ID = 4069590),(23) “Abnormal” (Concept ID = 4135493)(24) and “Malignant” (Concept ID = 4066212).(25) The data set included only AML and MDS samples, whose blood and bone marrow could be malignant or normal depending on disease status at collection. We examined LPDB Time Points and mapped common values that indicated a clear disease status aligned with the allowed OMOP concepts. We did not map other disease status time points (e.g., “CRi,” the local code for complete remission with incomplete blood count recovery, and “Remission”) because of their relatively low frequency in the database. We could not map samples with study calendar time points (e.g., “Screen,” “Day 100”) or blank time points. To facilitate later expansion to additional codes, we used LPDB Specimen Source, Diagnosis, and Time Point to map to Disease Status Concept ID (Figure 1), but within the selected data set, only Time Point was essential.
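A sketch of the Disease Status mapping helps show why so many records dropped out: any time point absent from the lookup yields no concept. The text names the five mapped time points but not which concept each received, so the pairing below is an illustrative assumption, not the authors' mapping table; the function name is likewise hypothetical.

```python
from typing import Optional

# The three allowed OMOP Disease Status concepts.
NORMAL, ABNORMAL, MALIGNANT = 4069590, 4135493, 4066212

# ASSUMPTION: plausible pairing for the five mapped time points; the paper does
# not publish which concept each time point received.
TIME_POINT_TO_STATUS = {
    "Complete response": NORMAL,
    "Newly diagnosed": MALIGNANT,
    "Baseline": MALIGNANT,
    "Relapse": MALIGNANT,
    "Residual disease": MALIGNANT,
}


def disease_status_concept_id(
    source: str, diagnosis: str, time_point: Optional[str]
) -> Optional[int]:
    """Return a Disease Status Concept ID, or None when the record cannot map."""
    # Source and diagnosis are checked so the rule can later be extended
    # (e.g., to control samples); within the selected set only time point matters.
    if source not in {"Peripheral blood", "Bone marrow"} or diagnosis not in {"AML", "MDS"}:
        return None
    if not time_point:
        return None  # blank time point
    # Study calendar points such as "Day 100" are absent from the lookup -> None.
    return TIME_POINT_TO_STATUS.get(time_point)
```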

We also combined separate LPDB Date and Time fields to populate the required Specimen Datetime field. By contrast, we mapped LPDB Date to Specimen Date and LPDB Volume to Quantity without manipulation. We generated Specimen ID using auto-numbering, and we joined the LPDB data on MRN to the OMOP Person table to obtain the previously generated Person ID for the patient who donated each sample.
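The three steps above (combining date and time, auto-numbering Specimen ID, and looking up Person ID by MRN) can be sketched in a few lines. The function names are hypothetical; defaulting a missing time to midnight and skipping MRNs with no Person row are assumptions of this sketch, since the paper does not say how such gaps were handled.

```python
from datetime import date, datetime, time
from itertools import count
from typing import Iterable, Optional


def combine_datetime(d: date, t: Optional[time]) -> datetime:
    """Merge separate LPDB date and time fields into one specimen_datetime.

    ASSUMPTION: a missing time defaults to midnight.
    """
    return datetime.combine(d, t or time(0, 0))


def assign_ids(mrns: Iterable[str], mrn_to_person_id: dict[str, int]) -> list[tuple[int, int]]:
    """Auto-number specimen_id and attach the existing person_id for each MRN.

    Behaves like an inner join: records whose MRN has no Person row are skipped.
    """
    next_specimen_id = count(start=1)
    rows = []
    for mrn in mrns:
        person_id = mrn_to_person_id.get(mrn)
        if person_id is None:
            continue
        rows.append((next(next_specimen_id), person_id))
    return rows
```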

Two OMOP Specimen fields, the required Specimen Type Concept ID and the optional Unit Concept ID (Table 1), did not correspond to LPDB fields. Specimen Type Concept ID had only one allowed concept, “EHR Detail” (Concept ID = 581378), which indicates the data source and is an OMOP-generated concept rather than one drawn from a reference terminology. Because “EHR Detail” did not accurately represent our research data, we mapped this field to Concept ID = 0. Unit Concept ID codes the unit of the sample quantity (i.e., sample volume for mapped LPDB records). Based on knowledge of the data set, we inferred that the unit for LPDB blood and bone marrow sample volumes was mL.

The last five fields in Table 1 are the optional Source ID and Source Value fields. We populated them with LPDB data without manipulation, except that we followed the local WCM JSON convention (described above) when source data came from more than one LPDB field.
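The local JSON convention for Source Value fields can be sketched as follows. The function name and the field labels ("Specimen Source", "Preservative") are hypothetical; WCM's exact annotation format is not given in the paper.

```python
import json


def annotated_source_value(fields: dict[str, str]) -> str:
    """Build a Source Value string from one or more raw LPDB fields.

    A single field is stored verbatim; several fields feeding one OMOP field are
    wrapped in a JSON object keyed by (hypothetical) field labels for provenance.
    """
    if len(fields) == 1:
        return next(iter(fields.values()))
    return json.dumps(fields, separators=(",", ":"))
```

Even a compact two-field annotation such as `{"Specimen Source":"Peripheral blood","Preservative":"EDTA"}` exceeds the specification's varchar(50), which illustrates why the character limits had to be raised.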

Population of OMOP CDM

Of the 5,453 AML and MDS records with local MRNs in the data set, we mapped 1,397 (26%) to the OMOP CDM. The remaining records (n = 4,056) did not map because of the required Disease Status field. Of 24 time point codes in the LPDB data set, we mapped only 5 (20.8%), “Complete response,” “Relapse,” “Newly diagnosed,” “Baseline,” and “Residual disease,” to allowed Disease Status Concepts; the other 19 codes were study calendar time points, blank, or low-frequency codes. For one field, Specimen Type Concept ID, we had to map to Concept ID = 0 to indicate that no appropriate Concept ID was available.
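The percentages above follow directly from the reported counts; this short check uses only numbers stated in the text (the helper `pct` is just for illustration).

```python
def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


total_records = 5453
mapped_records = 1397
unmapped_records = total_records - mapped_records  # records lost to Disease Status

time_point_codes = 24
mapped_time_point_codes = 5

record_share = pct(mapped_records, total_records)           # reported as 26%
code_share = pct(mapped_time_point_codes, time_point_codes)  # reported as 20.8%
```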

We populated all 15 OMOP fields (required and optional) for each record. Of the 15 fields, we populated 13 with LPDB data and used knowledge of the LPDB to populate two (Specimen Type Concept ID and Unit Concept ID). We hard-coded mappings for all five standard Concept ID fields (three from LPDB data, two from inferred data). We used many-to-one mappings for three fields (Specimen Concept ID, Disease Status Concept ID, and Specimen Datetime), and we did not manipulate the source data for two fields (Specimen Date and Quantity) or for the Source Value and Source ID fields, which are intended to store unmanipulated data. We used auto-numbering for the primary key field and a join to the corresponding OMOP table in the ETL to populate one foreign key field (Person ID).

Within the LPDB data, we used one field, Specimen Source, in mappings to three Concept ID fields (Specimen Concept ID, Anatomic Site Concept ID, and Disease Status Concept ID). We also used LPDB Date for two OMOP fields (Specimen Date and Specimen Datetime). While OMOP captured specimen preservative for blood samples, it did not for bone marrow. We also lost time point detail by mapping to only three available OMOP disease statuses.

Discussion

Using a biospecimen database from one laboratory at an academic medical center, we mapped 26% of source records to the OMOP Specimen table with an ETL. All remaining records failed to map because of challenges related to disease status. One field had no allowed concept appropriate for the data. Overall, the mapping was not straightforward: for some OMOP fields we combined source data fields (sometimes only for a subset of records), and we drew on the same source field to populate more than one OMOP field. The populated OMOP table was 100% complete, with data in every field for every record.

While others have described OMOP implementations using EHR and claims data, to our knowledge this is the first study to describe populating the Specimen table with local biospecimen data. Our findings suggest areas where the Specimen table could be improved. Additional Concepts may be useful; for example, Specimen Type Concept ID only allows “EHR Detail,” an OMOP-generated concept, whereas other OMOP tables offer “Research administration” (Concept ID = 44803476)(26), which would better fit our use case. Within the research and leukemia realms, additional disease statuses may be relevant (e.g., differentiating among samples at diagnosis, relapse, and residual disease rather than grouping them as “malignant”), though OHDSI must also balance the needs of specific groups against general requirements.

Alternatively, at the field level, the Specimen table could add a Visit Occurrence ID foreign key to join to the Visit Occurrence table; research specimens are often collected alongside clinical samples whose results could supply additional nuanced data (e.g., residual disease vs. relapse). Without that field, we may be able to join to the Visit Occurrence table using Person ID and Specimen Date or Specimen Datetime, leveraging linked results to map additional records (i.e., those with calendar time points) or to supply additional details needed for a project query. Ideally, we would also ETL data from other WCM biobank databases into the WCM OMOP Specimen table, which would introduce a need to identify the biobank laboratory housing each specimen. We could address this through data provenance (data from the LPDB describe samples managed by the LPDB biobank team) and use JSON to annotate source values; alternatively, each biobank could be conceptualized as a physically distinct Care Site in the corresponding OMOP table, and the Specimen table could include a Care Site ID foreign key.
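The fallback join described above (matching on Person ID and date rather than a Visit Occurrence ID foreign key) might look like the following. This is a sketch using Python's built-in sqlite3 with a hypothetical, heavily reduced subset of the Specimen and Visit Occurrence columns and made-up rows; a production query would run against the full OMOP tables in the site's own database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced, hypothetical subsets of the OMOP tables; ISO dates compare correctly as text.
conn.executescript("""
CREATE TABLE specimen (specimen_id INTEGER PRIMARY KEY, person_id INTEGER, specimen_date TEXT);
CREATE TABLE visit_occurrence (visit_occurrence_id INTEGER PRIMARY KEY, person_id INTEGER,
                               visit_start_date TEXT, visit_end_date TEXT);
INSERT INTO specimen VALUES (1, 10, '2019-03-05'), (2, 11, '2019-04-01');
INSERT INTO visit_occurrence VALUES (100, 10, '2019-03-05', '2019-03-05'),
                                    (101, 11, '2019-02-01', '2019-02-01');
""")

# Match each specimen to any visit for the same person whose dates cover the
# specimen date; LEFT JOIN keeps specimens with no matching visit (NULL visit ID).
rows = conn.execute("""
SELECT s.specimen_id, v.visit_occurrence_id
FROM specimen AS s
LEFT JOIN visit_occurrence AS v
  ON v.person_id = s.person_id
 AND s.specimen_date BETWEEN v.visit_start_date AND v.visit_end_date
ORDER BY s.specimen_id
""").fetchall()
```

Here specimen 1 links to visit 100, while specimen 2 has no same-day visit and would need another data source; a date-based join can also return several visits per specimen, which a dedicated foreign key would avoid.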

Finally, the Specimen table captures specimens collected from the patient, not what is physically stored after processing (e.g., aliquots). Another table would be needed to adequately represent both clinical samples (slides and formalin-fixed paraffin-embedded (FFPE) tissue blocks) and research aliquots, as well as additional derivatives (slides cut from FFPE blocks, DNA extractions) and their status (e.g., exhausted, sent for additional processing).
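One way to picture such an inventory table is a record per physical item that points back to the collected specimen and, for derivatives, to its parent item. The class and field names below are hypothetical and do not correspond to any existing OMOP table; they only illustrate the structure the paragraph above calls for.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    AVAILABLE = "available"
    EXHAUSTED = "exhausted"
    SENT_OUT = "sent for additional processing"


@dataclass
class InventoryItem:
    """Hypothetical inventory row for a physically stored item."""
    item_id: int
    specimen_id: int                      # the collected specimen (OMOP Specimen row)
    item_type: str                        # e.g., "aliquot", "FFPE block", "slide", "DNA extraction"
    parent_item_id: Optional[int] = None  # lineage, e.g., a slide cut from an FFPE block
    status: Status = Status.AVAILABLE


def available_items(items: list[InventoryItem], specimen_id: int) -> list[InventoryItem]:
    """List items derived from one specimen that can still be distributed."""
    return [i for i in items if i.specimen_id == specimen_id and i.status is Status.AVAILABLE]
```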

While common data models have many strengths, including standardizing data to facilitate research at the institutional or multi-institutional level, our results demonstrate implementation challenges for research data, specifically biospecimens. Biospecimen data may be incomplete, or local codes may not align well with common data model vocabularies, as with study calendar time points. Some fields could be manipulated easily (e.g., combining date and time fields and applying the appropriate format), while standard concepts required hard coding in the ETL (e.g., Specimen Concept ID, which combined two fields for blood samples).

We examined unmapped records to explore opportunities to expand the ETL. Within the selected data set, all remaining records failed mapping because of time point. Some low-frequency codes, such as “Pre-transplant” (79/5,453 records, 1%), “CRi” (76/5,453 records, 1%), and “Pre-induction” (12/5,453 records, 0.2%), could potentially be mapped. The majority of remaining time points were ambiguous and would require additional data sources (e.g., discrete pathology results for clinical samples taken on the same date, if available). Among records we did not select, we do not expect to include the 932/8,523 (11%) records without an MRN. Of the 2,138 unselected records with an MRN, 2,108 had other diagnoses, of which acute lymphoblastic leukemia (ALL) was the most common (528 records); however, only 16 of those ALL records had time points that could map to disease status. The remaining 30 AML and MDS records we did not select had other sample types (e.g., 4 blood samples with other collection preservatives and 24 oral wash samples). We could map control samples (e.g., control sample types or a normal diagnosis), which would not depend on time point; however, we would gain only 88 records. Overall, this emphasizes that the main challenge in mapping the LPDB data to the OMOP Specimen table was populating the disease status field. To extend our ETL to as many records as possible, we would need to explore additional data sources or code the disease status as Concept ID = 0. If we used Concept ID = 0, each query to identify samples would need to incorporate additional fields to determine disease status, or returned records might require manual chart review.
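The record accounting in this paragraph can be checked from the stated counts alone; the sketch below introduces no new data, and the constant names are only labels for the numbers reported above.

```python
# Counts reported in this section.
ALL_LPDB_RECORDS = 8523
NO_MRN = 932
SELECTED_AML_MDS_WITH_MRN = 5453
OTHER_DIAGNOSES_WITH_MRN = 2108

# Records with an MRN that fell outside the selected data set.
unselected_with_mrn = ALL_LPDB_RECORDS - NO_MRN - SELECTED_AML_MDS_WITH_MRN
# Of those, the AML/MDS records excluded for sample type or preservative.
unselected_aml_mds = unselected_with_mrn - OTHER_DIAGNOSES_WITH_MRN


def share(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


low_frequency = {"Pre-transplant": 79, "CRi": 76, "Pre-induction": 12}
low_frequency_shares = {
    code: share(n, SELECTED_AML_MDS_WITH_MRN) for code, n in low_frequency.items()
}
```

The one-decimal shares (1.4%, 1.4%, 0.2%) round to the whole-number figures quoted in the text.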

In the current analysis, we focused on data from a single biobank. The Leukemia Program biobank was one of the more mature efforts at WCM, and its databases resulted from prior informatics collaborations. Therefore, certain LPDB characteristics may not be representative of other biobanks, but our experience may inform efforts at other institutions. As at many biobanks, the available data depend on the study for which each sample was collected. Some samples come from outside institutions (and therefore lack a WCM MRN), and biobank staff depend on collaborating researchers to supply data. (Some collaborators may have less motivation or fewer resources to enter high-quality data for future use, as their primary goal is their current study.) The hematologic malignancy focus also brings priorities and challenges that may not align with the use cases that informed the OMOP Specimen table design. From our general biobanking experience, we suspect the OMOP Specimen table was designed from a solid tumor perspective, where tissue may be divided or macrodissected into tumor and normal samples and anatomic site is more relevant.

We implemented the OMOP Specimen table at WCM using leukemia biobank data as a novel way to improve biobank efficiency and utilization. We selected the sample characteristics most common in the database to create our study data set. We created mappings for codes and fields, using varied approaches across codes and combinations of codes. Finally, we analyzed the resulting ETL, opportunities to expand it, and considerations for future OMOP versions.

Conclusion

Our OMOP Specimen table implementation demonstrates challenges in curating clinical research data in general, and biospecimen data in particular. We expect other biobank informatics teams to face similar challenges and trade-off decisions as they implement OMOP or other common data models. Although the resulting mapping is imperfect and incomplete, combining the limited data collected for biobanking with more comprehensive, standardized EHR data in a common data model dramatically increases the utility of samples for additional studies.


References

1. Key statistics for acute myeloid leukemia (AML) [Internet]. Atlanta, GA: American Cancer Society; 2019 [cited 2019 Jul 20]. Available from: https://www.cancer.org/cancer/acute-myeloid-leukemia/about/key-statistics.html.
2. Cancer stat facts: leukemia [Internet]. National Cancer Institute; 2019 [cited 2019 Apr 3]. Available from: https://seer.cancer.gov/statfacts/html/leuks.html.
3. Myelodysplastic syndromes treatment (PDQ®)–patient version [Internet]. National Cancer Institute; [updated 2019 Mar 28; cited 2019 Jul 29]. Available from: https://www.cancer.gov/types/myeloproliferative/patient/myelodysplastic-treatment-pdq.
4. Adult acute myeloid leukemia treatment (PDQ®)–patient version [Internet]. National Cancer Institute; [updated 2019 Jul 23; cited 2019 Jul 22]. Available from: https://www.cancer.gov/types/leukemia/patient/adult-aml-treatment-pdq#_1.
5. Paradiso AV, Daidone MG, Canzonieri V, Zito A. Biobanks and scientists: supply and demand. Journal of Translational Medicine. 2018;16(1):136. doi: 10.1186/s12967-018-1505-8.
6. Hripcsak G, Duke JD, Shah NH, Reich CG, Huser V, Schuemie MJ, et al. Observational health data sciences and informatics (OHDSI): opportunities for observational researchers. Studies in Health Technology and Informatics. 2015;216:574–8.
7. Reisinger SJ, Ryan PB, O'Hara DJ, Powell GE, Painter JL, Pattishall EN, et al. Development and evaluation of a common data model enabling active drug safety surveillance using disparate healthcare databases. Journal of the American Medical Informatics Association. 2010;17(6):652–62. doi: 10.1136/jamia.2009.002477.
8. Overhage JM, Ryan PB, Reich CG, Hartzema AG, Stang PE. Validation of a common data model for active safety surveillance research. Journal of the American Medical Informatics Association. 2012;19(1):54–60. doi: 10.1136/amiajnl-2011-000376.
9. Voss EA, Makadia R, Matcho A, Ma Q, Knoll C, Schuemie M, et al. Feasibility and utility of applications of the common data model to multiple, disparate observational health databases. Journal of the American Medical Informatics Association. 2015;22(3):553–64. doi: 10.1093/jamia/ocu023.
10. Maier C, Lang L, Storf H, Vormstein P, Bieber R, Bernarding J, et al. Towards implementation of OMOP in a German university hospital consortium. Applied Clinical Informatics. 2018;9(1):54–61. doi: 10.1055/s-0037-1617452.
11. Rinner C, Gezgin D, Wendl C, Gall W. A clinical data warehouse based on OMOP and i2b2 for Austrian health claims data. Studies in Health Technology and Informatics. 2018;248:94–9.
12. Hripcsak G, Levine ME, Shang N, Ryan PB. Effect of vocabulary mapping for conditions on phenotype cohorts. Journal of the American Medical Informatics Association. 2018;25(12):1618–25. doi: 10.1093/jamia/ocy124.
13. Chen C, Wulff RT, Sholle ET, Roboz GJ, Kraemer DA, Campion TR. Evaluating generalizability of a biospecimen informatics approach: support for local requirements and best practices. AMIA Joint Summits on Translational Science Proceedings. 2018;2017:55–62.
14. Sholle ET, Kabariti J, Johnson SB, Leonard JP, Pathak J, Varughese VI, et al. Secondary use of patients' electronic records (SUPER): an approach for meeting specific data needs of clinical and translational researchers. AMIA Annual Symposium Proceedings. 2017;2017:1581–8.
15. Specimen [Internet]. Observational Health Data Sciences and Informatics; [updated 2018 Oct 11; cited 2019 Jul 29]. Available from: https://github.com/OHDSI/CommonDataModel/wiki/SPECIMEN.
16. Peripheral blood specimen [Internet]. Odysseus Data Services, Inc.; [updated 2018 Dec 28; cited 2019 Jul 29]. Available from: http://athena.ohdsi.org/search-terms/terms/4047495.
17. Blood specimen [Internet]. Odysseus Data Services, Inc.; [updated 2018 Dec 28; cited 2019 Jul 29]. Available from: http://athena.ohdsi.org/search-terms/terms/4001225.
18. Blood specimen with EDTA [Internet]. Odysseus Data Services, Inc.; [updated 2018 Dec 28; cited 2019 Jul 29]. Available from: http://athena.ohdsi.org/search-terms/terms/40482922.
19. Blood specimen submitted in heparinized collection tube [Internet]. Odysseus Data Services, Inc.; [updated 2018 Dec 28; cited 2019 Jul 29]. Available from: http://athena.ohdsi.org/search-terms/terms/40486989.
20. Bone marrow specimen [Internet]. Odysseus Data Services, Inc.; [updated 2018 Dec 28; cited 2019 Jul 29]. Available from: http://athena.ohdsi.org/search-terms/terms/4000623.
21. Venous structure [Internet]. Odysseus Data Services, Inc.; [updated 2018 Dec 28; cited 2019 Jul 29]. Available from: http://athena.ohdsi.org/search-terms/terms/4104340.
22. Bone marrow structure [Internet]. Odysseus Data Services, Inc.; [updated 2018 Dec 28; cited 2019 Aug 12]. Available from: http://athena.ohdsi.org/search-terms/terms/4029619.
23. Normal [Internet]. Odysseus Data Services, Inc.; [updated 2018 Dec 28; cited 2019 Aug 14]. Available from: http://athena.ohdsi.org/search-terms/terms/4069590.
24. Abnormal [Internet]. Odysseus Data Services, Inc.; [updated 2018 Dec 28; cited 2019 Aug 14]. Available from: http://athena.ohdsi.org/search-terms/terms/4135493.
25. Malignant [Internet]. Odysseus Data Services, Inc.; [updated 2018 Dec 28; cited 2019 Aug 14]. Available from: http://athena.ohdsi.org/search-terms/terms/4066212.
26. Research administration [Internet]. Odysseus Data Services, Inc.; [updated 2018 Dec 28; cited 2019 Aug 14]. Available from: http://athena.ohdsi.org/search-terms/terms/44803476.

[r1-3269642] 1.Atlanta, GA: American Cancer Society; 2019. Key statistics for acute myeloid leukemia (AML) [Internet]. [cited 2019 Jul 20]. Available from: https://www.cancer.org/cancer/acute-myeloid-leukemia/about/key-statistics.html. [Google Scholar]

2. Cancer stat facts: leukemia [Internet]. National Cancer Institute; 2019 [cited 2019 Apr 3]. Available from: https://seer.cancer.gov/statfacts/html/leuks.html.

3. Myelodysplastic syndromes treatment (PDQ®)–patient version [Internet]. National Cancer Institute; [updated 2019 Mar 28; cited 2019 Jul 29]. Available from: https://www.cancer.gov/types/myeloproliferative/patient/myelodysplastic-treatment-pdq.

4. Adult acute myeloid leukemia treatment (PDQ®)–patient version [Internet]. National Cancer Institute; [updated 2019 Jul 23; cited 2019 Jul 22]. Available from: https://www.cancer.gov/types/leukemia/patient/adult-aml-treatment-pdq#_1.

5. Paradiso AV, Daidone MG, Canzonieri V, Zito A. Biobanks and scientists: supply and demand. Journal of Translational Medicine. 2018;16(1):136. doi: 10.1186/s12967-018-1505-8.

6. Hripcsak G, Duke JD, Shah NH, Reich CG, Huser V, Schuemie MJ, et al. Observational health data sciences and informatics (OHDSI): opportunities for observational researchers. Studies in Health Technology and Informatics. 2015;216:574–8.

7. Reisinger SJ, Ryan PB, O'Hara DJ, Powell GE, Painter JL, Pattishall EN, et al. Development and evaluation of a common data model enabling active drug safety surveillance using disparate healthcare databases. Journal of the American Medical Informatics Association: JAMIA. 2010;17(6):652–62. doi: 10.1136/jamia.2009.002477.

8. Overhage JM, Ryan PB, Reich CG, Hartzema AG, Stang PE. Validation of a common data model for active safety surveillance research. Journal of the American Medical Informatics Association: JAMIA. 2012;19(1):54–60. doi: 10.1136/amiajnl-2011-000376.

9. Voss EA, Makadia R, Matcho A, Ma Q, Knoll C, Schuemie M, et al. Feasibility and utility of applications of the common data model to multiple, disparate observational health databases. Journal of the American Medical Informatics Association: JAMIA. 2015;22(3):553–64. doi: 10.1093/jamia/ocu023.

10. Maier C, Lang L, Storf H, Vormstein P, Bieber R, Bernarding J, et al. Towards implementation of OMOP in a German university hospital consortium. Applied Clinical Informatics. 2018;9(1):54–61. doi: 10.1055/s-0037-1617452.

11. Rinner C, Gezgin D, Wendl C, Gall W. A clinical data warehouse based on OMOP and i2b2 for Austrian health claims data. Studies in Health Technology and Informatics. 2018;248:94–9.

12. Hripcsak G, Levine ME, Shang N, Ryan PB. Effect of vocabulary mapping for conditions on phenotype cohorts. Journal of the American Medical Informatics Association: JAMIA. 2018;25(12):1618–25. doi: 10.1093/jamia/ocy124.

13. Chen C, Wulff RT, Sholle ET, Roboz GJ, Kraemer DA, Campion TR. Evaluating generalizability of a biospecimen informatics approach: support for local requirements and best practices. AMIA Joint Summits on Translational Science Proceedings. 2018;2017:55–62.

14. Sholle ET, Kabariti J, Johnson SB, Leonard JP, Pathak J, Varughese VI, et al. Secondary use of patients' electronic records (SUPER): an approach for meeting specific data needs of clinical and translational researchers. AMIA Annual Symposium Proceedings. 2017;2017:1581–8.

15. Specimen [Internet]. Observational Health Data Sciences and Informatics; [updated 2018 Oct 11; cited 2019 Jul 29]. Available from: https://github.com/OHDSI/CommonDataModel/wiki/SPECIMEN.

16. Peripheral blood specimen [Internet]. Odysseus Data Services, Inc.; [updated 2018 Dec 28; cited 2019 Jul 29]. Available from: http://athena.ohdsi.org/search-terms/terms/4047495.

17. Blood specimen [Internet]. Odysseus Data Services, Inc.; [updated 2018 Dec 28; cited 2019 Jul 29]. Available from: http://athena.ohdsi.org/search-terms/terms/4001225.

18. Blood specimen with EDTA [Internet]. Odysseus Data Services, Inc.; [updated 2018 Dec 28; cited 2019 Jul 29]. Available from: http://athena.ohdsi.org/search-terms/terms/40482922.

19. Blood specimen submitted in heparinized collection tube [Internet]. Odysseus Data Services, Inc.; [updated 2018 Dec 28; cited 2019 Jul 29]. Available from: http://athena.ohdsi.org/search-terms/terms/40486989.

20. Bone marrow specimen [Internet]. Odysseus Data Services, Inc.; [updated 2018 Dec 28; cited 2019 Jul 29]. Available from: http://athena.ohdsi.org/search-terms/terms/4000623.

21. Venous structure [Internet]. Odysseus Data Services, Inc.; [updated 2018 Dec 28; cited 2019 Jul 29]. Available from: http://athena.ohdsi.org/search-terms/terms/4104340.

22. Bone marrow structure [Internet]. Odysseus Data Services, Inc.; [updated 2018 Dec 28; cited 2019 Aug 12]. Available from: http://athena.ohdsi.org/search-terms/terms/4029619.

23. Normal [Internet]. Odysseus Data Services, Inc.; [updated 2018 Dec 28; cited 2019 Aug 14]. Available from: http://athena.ohdsi.org/search-terms/terms/4069590.

24. Abnormal [Internet]. Odysseus Data Services, Inc.; [updated 2018 Dec 28; cited 2019 Aug 14]. Available from: http://athena.ohdsi.org/search-terms/terms/4135493.

25. Malignant [Internet]. Odysseus Data Services, Inc.; [updated 2018 Dec 28; cited 2019 Aug 14]. Available from: http://athena.ohdsi.org/search-terms/terms/4066212.

26. Research administration [Internet]. Odysseus Data Services, Inc.; [updated 2018 Dec 28; cited 2019 Aug 14]. Available from: http://athena.ohdsi.org/search-terms/terms/44803476.

Mapping Local Biospecimen Records to the OMOP Common Data Model

Chelsea L Michael, MS

Evan T Sholle, MS

Regina T Wulff, MS

Gail J Roboz, MD

Thomas R Campion Jr, PhD
