Inform Med Unlocked. Author manuscript; available in PMC: 2024 May 5.
Published in final edited form as: Inform Med Unlocked. 2023 May 5;39:101259. doi: 10.1016/j.imu.2023.101259

Building the observational medical outcomes partnership's T-MSIS Analytic File common data model

Nick Williams
PMCID: PMC10249773  NIHMSID: NIHMS1901835  PMID: 37305615

Abstract

Objectives

This effort used Databricks to create an Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) for Transformed MSIS Analytic File (TAF) Medicaid records.

Materials and methods

Our process included an assessment of TAF data volume and content, translation mapping of TAF concepts to OMOP concepts, and the creation of Extract, Transform and Load (ETL) code.

Results

The final CDM contained 119,048,562 individuals and 24,806,828,121 clinical observations from 2014 through 2018.

Discussion

The transformation of TAF into OMOP can support the generation of evidence with special attention to low-income patients on public insurance. Such patients are perhaps underrepresented in academic medical center patient populations.

Conclusion

Our effort successfully used Databricks to transform TAF records into OMOP CDM. Our CDM can be used to generate evidence for OMOP network studies.

Keywords: Common Data Model, Medicaid, Observational Medical Outcomes Partnership, Centers for Medicare and Medicaid Services, Observational Health Data Sciences and Informatics

Lay Summary

Generating evidence from academic medical centers, rather than from safety net insurance programs, may bias clinical evidence. Patients who use academic medical centers may differ from safety net patients. Safety net patients have meaningful preexisting conditions, barriers to care and real world precarity, which may be lacking in patients who seek services at academic medical centers.

Changing the data source for evidence generation may improve the utility of evidence-based medicine in the United States. Using federal claims data from the Centers for Medicare and Medicaid Services (CMS) as an alternative data source has several barriers to success.

We investigate the complexity of CMS data available to researchers and further attempt to transform the current vintage of Medicaid records (TAF) into the Observational Medical Outcomes Partnership’s (OMOP) Common Data Model (CDM) for the 2014-2018 data years. The OMOP CDM is commonly used by researchers for multi-center network studies and evidence generation. A TAF OMOP CDM can provide researchers with over 119 million patients and tens of billions of clinical observations. Further development of Medicare and legacy Medicaid records could make a maximum of 297 million individuals, from 1999 onward, available to researchers.

Background and Significance

Evidence-based medicine requires recurring assessments of clinical practice across a diversity of settings and patients to maintain efficacy and respond to change1–4. The increase in demand for multi-center network studies has largely been driven by the quality of clinical evidence they provide when compared with single-institution studies5,6. Multi-center network study methodology has improved remarkably over the years, allowing complex, international studies to be delivered. These network studies work by harmonizing retrospective data from electronic health record systems around the world for a specific research study. They work across languages and are delivered at highly competitive costs when compared with bedside studies. The Observational Medical Outcomes Partnership (OMOP) offers a Common Data Model (CDM) to support the interoperability of observational research study data7–10.

The Centers for Medicare and Medicaid Services (CMS) curates identifiable records for clinical services billed to CMS from 1999 onward to support data reuse research. Made available via the Chronic Conditions Warehouse (CCW), CMS data is traditionally vended as SAS data sets for analysis using SAS Enterprise Guide and SAS Grid11. CCW recently transitioned to the ‘cloud’ and now offers SAS Cloud Studio and Databricks as alternative points of access to CMS data. Databricks is more ‘accessible’ than SAS, as it supports Scala, Python, R or Spark-SQL for analysis. Yet Databricks does not explicitly support OMOP network studies12,13. This effort sought to evaluate the capacity of CCW’s implementation of Databricks to transform CCW/CMS files into the OMOP CDM. Such an Extract, Transform and Load (ETL) program could support future network studies that produce evidence-based medicine learned from CMS records.

Methods

Data Volume and Complexity

We considered ‘Transformed MSIS Analytic Files’, or TAF records. TAF was chosen because its data model is the newest (launched in 2014). When compared with Medicare, which has only four major programs (Parts A, B, C and D), TAF is more complex, as its data ingestion spans multiple states and territories, multiple plans and multiple qualifications. If Databricks can support TAF transformation, it is highly likely that other CMS vintages such as Medicare and legacy Medicaid could also be transformed. We used a 100% sample of TAF records from our CCW instance from 2014-2018.

The complexity of our data can be understood through the contents of the SAS metadata table. The SAS metadata table was extracted from our CCW instance, and its contents are described under Results in Table 1. Note that in this study the term Medicare Research Identifiable Files, or ‘RIF’, includes Medicare Part A, B, C and D records and CCW Medicare ancillary files. In this study the term Medicaid includes Children’s Health Insurance Program (CHIP) records as well as CCW Medicaid ancillary files.

Table 1.

SAS Meta Library Results by Source Type, 1999-2020

| Measure | Series | MAX | RIF | TAF | Grand Total |
|---|---|---|---|---|---|
| Tables | Availability | 6 | – | 5 | 11 |
| Tables | Records | 1,664 | 9,828 | 576 | 12,068 |
| Tables | Demography | 46 | 110 | 6 | 162 |
| Tables | Plan | – | 2,729 | 282 | 3,011 |
| Tables | Provider | – | 12 | 54 | 66 |
| Tables | Total | 1,716 | 12,679 | 923 | 15,318 |
| Variables | Availability | 60 | – | 50 | 110 |
| Variables | Records | 93,620 | 369,350 | 43,848 | 506,818 |
| Variables | Demography | 17,816 | 10,697 | 1,128 | 29,641 |
| Variables | Plan | – | 43,892 | 9,078 | 52,970 |
| Variables | Provider | – | 528 | 1,566 | 2,094 |
| Variables | Total | 111,496 | 424,467 | 55,670 | 591,633 |
| Observations | Availability | 342 | – | 285 | 627 |
| Observations | Records | 43,401,901,190 | 182,116,907,321 | 54,014,468,190 | 279,533,276,701 |
| Observations | Demography | 1,914,414,847 | 4,839,326,379 | 468,509,866 | 7,222,251,092 |
| Observations | Plan | – | 29,076,094,144 | 3,522,794,988 | 32,598,889,132 |
| Observations | Provider | – | 11,932,223 | 874,142,573 | 886,074,796 |
| Observations | Total | 45,316,316,379 | 216,044,260,067 | 58,879,915,902 | 320,240,492,348 |
| Byte size (bytes) | Availability | 1,179,648 | – | 983,040 | 2,162,688 |
| Byte size (bytes) | Records | 8,444,199,370,752 | 23,261,504,733,184 | 13,295,086,665,728 | 45,000,790,769,664 |
| Byte size (bytes) | Demography | 1,245,394,239,488 | 1,038,439,415,808 | 144,054,288,384 | 2,427,887,943,680 |
| Byte size (bytes) | Plan | – | 2,839,727,732,736 | 389,178,523,648 | 3,228,906,256,384 |
| Byte size (bytes) | Provider | – | 3,259,105,280 | 80,271,441,920 | 83,530,547,200 |
| Byte size (bytes) | Total | 9,689,594,789,888 | 27,142,930,987,008 | 13,908,591,902,720 | 50,741,117,679,616 |

Evaluating Potential Network Study Cases

This effort evaluated the number of distinct individuals, year on year and across all records, for legacy Medicaid (Medicaid Analytic eXtract file, or MAX), TAF and RIF to detail the ‘size’ of the human population over time. This is the maximum population potentially available to OMOP network studies from CMS records as of this writing. As a general rule, CMS insures 30% of the United States population in any given year, but the actual number of distinct individuals observed is underdescribed in the public domain14. CMS beneficiaries can disenroll or die, which introduces churn effects when evaluating population size. The numbers presented below should not be considered ‘vital statistics’ or a patient census, as we do not disambiguate disenrollment duration or reason here. Table 2 describes the human populations within and across the source data sets. Attributable deaths are listed by the year of the declared date of death, taken either from the linked Social Security Death Index or from state level files, with a preference for the Social Security Death Index when both were available.
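The death attribution rule above (count each beneficiary once, dated by the Social Security Death Index when available and by the state file otherwise) can be sketched in Python. This is a minimal illustration, not the CCW or ETL logic: the record layout `(bene_id, ssdi_date, state_date)`, the function names, and the choice to keep the earliest date when one source reports several dates for the same person are all assumptions.

```python
from collections import Counter
from datetime import date
from typing import Iterable, Optional

# Hypothetical linked record: (beneficiary ID, SSDI death date, state death date).
# Either date may be None when that source has no death recorded.
DeathRecord = tuple[str, Optional[date], Optional[date]]


def resolved_death_date(ssdi: Optional[date], state: Optional[date]) -> Optional[date]:
    """Prefer the Social Security Death Index date; fall back to the state date."""
    return ssdi if ssdi is not None else state


def distinct_deaths_by_year(records: Iterable[DeathRecord]) -> dict[int, int]:
    """Count each beneficiary at most once, in the year of the resolved death date."""
    ssdi_by_bene: dict[str, date] = {}
    state_by_bene: dict[str, date] = {}
    for bene_id, ssdi, state in records:
        # Assumption: if one source reports several dates, keep the earliest.
        if ssdi is not None:
            ssdi_by_bene[bene_id] = min(ssdi, ssdi_by_bene.get(bene_id, ssdi))
        if state is not None:
            state_by_bene[bene_id] = min(state, state_by_bene.get(bene_id, state))

    counts: Counter = Counter()
    for bene_id in ssdi_by_bene.keys() | state_by_bene.keys():
        resolved = resolved_death_date(ssdi_by_bene.get(bene_id), state_by_bene.get(bene_id))
        counts[resolved.year] += 1
    return dict(counts)


records = [
    ("A", date(2015, 3, 1), date(2015, 2, 27)),   # both sources: SSDI date wins
    ("A", None, date(2015, 2, 27)),               # duplicate row for the same person
    ("B", None, date(2016, 1, 5)),                # state file only
    ("C", None, None),                            # no death recorded
    ("D", date(2017, 1, 2), date(2016, 12, 30)),  # sources disagree on the year
]
print(distinct_deaths_by_year(records))  # counts 2015: 1, 2016: 1, 2017: 1 (key order may vary)
```

Beneficiary "D" shows why the preference matters: the two sources place the death in different calendar years, so the choice of source changes which year's distinct death count it contributes to.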

Table 2.

Distinct Beneficiary IDs by Source and Year with Distinct Cases and Mortality

| Year | RIF cases (Medicare) | MAX cases (Medicaid & CHIP) | TAF cases (Medicaid & CHIP) | Distinct cases | Distinct deaths |
|---|---|---|---|---|---|
| 1999 | 41,422,602 | 43,587,106 | 0 | 82,401,813 | 2,037,383 |
| 2000 | 41,847,508 | 46,334,479 | 0 | 84,735,471 | 2,051,078 |
| 2001 | 42,274,081 | 50,078,314 | 0 | 88,483,839 | 2,062,557 |
| 2002 | 42,794,321 | 55,063,855 | 0 | 93,178,038 | 2,086,503 |
| 2003 | 43,414,525 | 57,638,888 | 0 | 96,316,503 | 2,092,981 |
| 2004 | 44,079,802 | 60,244,145 | 0 | 99,240,873 | 2,052,924 |
| 2005 | 44,770,011 | 61,429,537 | 0 | 101,344,436 | 2,098,005 |
| 2006 | 45,685,188 | 61,661,640 | 0 | 102,495,801 | 2,069,758 |
| 2007 | 46,735,669 | 61,672,723 | 0 | 103,427,283 | 2,060,419 |
| 2008 | 47,868,545 | 63,842,647 | 0 | 106,515,413 | 2,112,545 |
| 2009 | 48,916,748 | 67,689,537 | 0 | 111,213,789 | 2,089,754 |
| 2010 | 50,052,744 | 71,330,572 | 0 | 116,332,062 | 2,138,267 |
| 2011 | 51,667,138 | 74,953,622 | 0 | 121,569,938 | 2,191,609 |
| 2012 | 53,540,264 | 77,166,643 | 0 | 125,060,670 | 2,221,161 |
| 2013 | 55,206,238 | 77,914,095 | 0 | 127,156,602 | 2,274,622 |
| 2014 | 56,767,788 | 67,984,037 | 20,899,787 | 140,920,665 | 2,297,802 |
| 2015 | 58,294,195 | 50,395,155 | 47,983,004 | 152,064,152 | 2,399,150 |
| 2016 | 59,818,481 | 0 | 100,547,717 | 155,690,474 | 2,438,629 |
| 2017 | 61,405,844 | 0 | 101,036,546 | 158,089,147 | 2,512,472 |
| 2018 | 62,930,784 | 0 | 99,894,759 | 158,832,269 | 2,544,377 |
| 2019 | 64,430,729 | 0 | 98,148,053 | 159,125,593 | 2,556,661 |
| 2020 | 65,901,907 | 0 | NA | 65,901,907 | 2,753,259 |
| Distinct, all years | 110,223,936 | 158,320,187 | 126,424,544 | 297,494,554 | 49,141,916 |

Qualitative Variable Descriptive Analysis for CDM Mapping

The ETL process required mapping TAF variables to CDM variables. This was accomplished by hand-classifying TAF and CDM table-variable pairs into a custom value set. The custom value set created a subset-to-subset relationship: candidate elements for a specific subset of the OMOP CDM could be drawn from the TAF subset carrying the same classification. In this information reduction approach, all commensurately classified variables in TAF and the OMOP CDM were eligible candidates for the ETL. The final decision to map values was informed by the TAF data dictionary. We considered 1,850 distinct variable-table pairs contained in TAF along with the 650 variable-table pairs contained in the OMOP CDM. This mapping also supports data characterization of TAF. The final map is not presented here, but the variable level classification is described in Table 3 to inform data characterization of the TAF data set.
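The subset-to-subset restriction described above can be sketched as pairing source and target elements only within a shared category. This is an illustration under assumptions, not the study's mapping: the category labels follow Table 3, the OMOP side uses real CDM column names, but the TAF table and variable names are hypothetical stand-ins loosely modeled on TAF naming.

```python
from collections import defaultdict
from itertools import product

Pair = tuple[str, str]  # (table, variable)

# Illustrative hand classifications; TAF names are hypothetical.
taf_classes: dict[Pair, str] = {
    ("ip_line", "DGNS_CD_1"): "Clinical",
    ("ip_line", "BLG_PRVDR_NPI"): "Provider",
    ("de_base", "BIRTH_DT"): "Subject",
    ("ip_header", "TOT_MDCD_PD_AMT"): "Financial",
}
cdm_classes: dict[Pair, str] = {
    ("condition_occurrence", "condition_source_value"): "Clinical",
    ("provider", "npi"): "Provider",
    ("person", "year_of_birth"): "Subject",
    ("person", "gender_source_value"): "Subject",
}


def candidate_pairs(source: dict[Pair, str], target: dict[Pair, str]) -> dict[str, list[tuple[Pair, Pair]]]:
    """Pair source and target elements only when they share a category."""
    src_by_cat = defaultdict(list)
    tgt_by_cat = defaultdict(list)
    for pair, category in source.items():
        src_by_cat[category].append(pair)
    for pair, category in target.items():
        tgt_by_cat[category].append(pair)
    shared = sorted(src_by_cat.keys() & tgt_by_cat.keys())
    return {cat: list(product(src_by_cat[cat], tgt_by_cat[cat])) for cat in shared}


candidates = candidate_pairs(taf_classes, cdm_classes)
n_restricted = sum(len(v) for v in candidates.values())
n_unrestricted = len(taf_classes) * len(cdm_classes)
# "Financial" has no target in this toy CDM list, so it yields no candidates.
print(n_restricted, "of", n_unrestricted, "pairs remain for manual review")  # 4 of 16 ...
```

At the study's scale the same restriction replaces a review of all 1,850 × 650 = 1,202,500 possible pairs with reviews confined to each shared category; the final choice among candidates still rests on the TAF data dictionary.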

Table 3.

Qualitative distribution of TAF Distinct Table Type-Variables

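| Category | Demographic & Enrollment | Inpatient (IP) Files | Long Term Care (LT) Files | Other Services (OT) Files | Pharmacy (RX) Files | Total |
|---|---|---|---|---|---|---|
| Financial | 414 | 58 | 64 | 61 | 52 | 649 |
| Enrollment | 577 | 1 | 1 | 2 | 1 | 582 |
| Clinical | 94 | 63 | 33 | 32 | 21 | 243 |
| Meta | 10 | 36 | 21 | 19 | 14 | 100 |
| Subject | 63 | 7 | 7 | 7 | 5 | 89 |
| Provider | 10 | 22 | 21 | 23 | 9 | 85 |
| Place | 18 | 10 | 9 | 10 | 6 | 53 |
| Service | 30 | 6 | 5 | 5 | 0 | 46 |
| Sequence | 0 | 1 | 1 | 1 | 0 | 3 |
| Grand Total | 1,216 | 204 | 162 | 160 | 108 | 1,850 |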

The ETL

The ETL code, informed by our mapping efforts, was written in Spark-SQL and run on Databricks. The resulting CDM was assessed for diagnostic (DX), procedure (PX), and medication (RX) code volumes relative to distinct individuals. We further report the recoverable share of diagnostic, procedure, and medication codes available in OHDSI Athena, the vocabulary solution of Observational Health Data Sciences and Informatics (OHDSI). Figures 1–3 present diagnostic, procedure, and medication volume by distinct individual as scatterplots, where each individual can qualify for a specific DX, PX, or RX code only once over the multi-year study period.
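Checking whether a source code is recoverable in Athena typically means looking it up in the OMOP vocabulary tables and following the "Maps to" relationship to a standard concept, with concept_id 0 for codes that cannot be matched. The sketch below shows that join pattern using Python's built-in sqlite3 only so it runs anywhere; it is not the authors' Spark-SQL, which is in the linked repository. The column subset, the made-up concept IDs, and the `source_codes` table are assumptions, and real source codes may need reformatting (for example, ICD-10-CM decimal points) before they match `concept_code`.

```python
import sqlite3

# Minimal subset of the OMOP vocabulary tables (real tables carry more columns).
# Concept IDs are made up for illustration.
SETUP_SQL = """
CREATE TABLE concept (
    concept_id INTEGER, concept_code TEXT, vocabulary_id TEXT, standard_concept TEXT
);
CREATE TABLE concept_relationship (
    concept_id_1 INTEGER, concept_id_2 INTEGER, relationship_id TEXT, invalid_reason TEXT
);
CREATE TABLE source_codes (code TEXT, vocabulary_id TEXT);
INSERT INTO concept VALUES
    (1001, 'J45.909', 'ICD10CM', NULL),   -- non-standard source concept
    (2001, '195967001', 'SNOMED', 'S');   -- standard target concept
INSERT INTO concept_relationship VALUES (1001, 2001, 'Maps to', NULL);
INSERT INTO source_codes VALUES ('J45.909', 'ICD10CM'), ('PT624', 'HCPCS');
"""

MAP_SQL = """
SELECT s.code,
       s.vocabulary_id,
       COALESCE(src.concept_id, 0) AS source_concept_id,
       COALESCE(std.concept_id, 0) AS concept_id   -- 0 = no matching concept
FROM source_codes AS s
LEFT JOIN concept AS src
       ON src.concept_code = s.code AND src.vocabulary_id = s.vocabulary_id
LEFT JOIN concept_relationship AS cr
       ON cr.concept_id_1 = src.concept_id
      AND cr.relationship_id = 'Maps to'
      AND cr.invalid_reason IS NULL
LEFT JOIN concept AS std
       ON std.concept_id = cr.concept_id_2 AND std.standard_concept = 'S'
ORDER BY s.code
"""


def build_toy_vocabulary() -> sqlite3.Connection:
    """Create an in-memory database holding the toy vocabulary and source codes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP_SQL)
    return conn


def map_source_codes(conn: sqlite3.Connection) -> list[tuple[str, str, int, int]]:
    """Resolve each source code to its source concept and standard concept."""
    return conn.execute(MAP_SQL).fetchall()


for row in map_source_codes(build_toy_vocabulary()):
    print(row)
# ('J45.909', 'ICD10CM', 1001, 2001)
# ('PT624', 'HCPCS', 0, 0)
```

The unmatched row mirrors the state specific codes discussed in the Results: a code absent from the vocabulary passes through with concept_id 0 rather than being dropped, so its event volume can still be counted.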

Figure 1. Top 10 Diagnostic volumes with distinct patients, TAF 2014-2018.


Figure one demonstrates the relationship between distinct individuals and their diagnostic code utilization over the study period in TAF records. The x axis describes diagnostic events, and the y axis describes distinct case volume. Diagnostic codes are plotted against the x and y axes. The ten largest patient volumes are labeled by diagnostic code and appear in the legend to the right.

Figure 3. Top 10 Medication volumes with distinct cases, TAF 2014-2018.


Figure three demonstrates the relationship between distinct individuals and their national drug code utilization over the study period in TAF records. The x axis describes dispensation events, and the y axis describes distinct case volume. National drug codes are plotted against the x and y axes. The ten largest patient volumes are labeled by Athena concept name and appear in the legend to the right.

Results

The CCW SAS metadata table described 591,633 total variables across 15,318 tables, totaling 50.74 terabytes and 320.24 billion records. These values reflect MAX, TAF, RIF and associated CCW ancillary files. Tables containing patient records follow a year-month header-line file structure (or state-year-service type in the case of MAX). Variables are not distinct; they persist across tables within table classes (e.g., header-line, year-month), and table types (e.g., inpatient, long term care) are likewise not distinct. Some records are not ‘clinical’ in nature, but they inform payments and eligibility, or provide details of individual clinical providers or patient demography over time. Our metadata describes RIF from 1999 through 09/2021, TAF 2014-2019 and all of MAX, 1999-2015. Note that MAX is a closed set; it is a legacy method of describing Medicaid records. From 2016 onward, Medicaid records are described with TAF tables only; TAF does not retrospectively subsume MAX. The metadata table was classified at the table and series level in Table 1 to support interpretation. MAX, RIF and TAF are described first by table volume, then by variable volume, and finally by observation and byte volume. Note that in Table 1, ‘demography’ includes point-in-time enrollment qualification data in some CMS record vintages. ‘Plan’ describes kinds of coverage, while ‘records’ indicate clinical information.
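The classification of the metadata table into Table 1 amounts to a grouped sum over table-level rows. A minimal sketch, assuming each metadata row has already been tagged with a source (MAX, RIF, TAF) and a series; the `TableMeta` field names and the toy rows are hypothetical, not the actual SAS metadata columns or values. Only the terabyte conversion uses a reported number, and it assumes decimal terabytes (10^12 bytes), which is the unit that reproduces the 50.74 figure.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TableMeta:
    """One row of a (hypothetical) table-level metadata extract."""
    source: str   # "MAX", "RIF" or "TAF"
    series: str   # "Availability", "Records", "Demography", "Plan" or "Provider"
    n_vars: int
    n_obs: int
    n_bytes: int


def summarize(tables: list[TableMeta]) -> dict[tuple[str, str], tuple[int, ...]]:
    """Sum tables, variables, observations and bytes per (source, series),
    plus per-source totals and grand totals, in the layout of Table 1."""
    acc = defaultdict(lambda: [0, 0, 0, 0])
    for t in tables:
        for key in ((t.source, t.series), (t.source, "Total"),
                    ("Grand Total", t.series), ("Grand Total", "Total")):
            acc[key][0] += 1
            acc[key][1] += t.n_vars
            acc[key][2] += t.n_obs
            acc[key][3] += t.n_bytes
    return {key: tuple(v) for key, v in acc.items()}


def to_terabytes(n_bytes: int) -> float:
    """Decimal terabytes (10**12 bytes)."""
    return n_bytes / 10**12


toy = [
    TableMeta("TAF", "Records", 10, 100, 4096),
    TableMeta("TAF", "Records", 12, 200, 8192),
    TableMeta("TAF", "Plan", 5, 50, 1024),
    TableMeta("RIF", "Records", 20, 1000, 65536),
]
summary = summarize(toy)
print(summary[("Grand Total", "Total")])           # (4, 47, 1350, 78848)
print(round(to_terabytes(50_741_117_679_616), 2))  # 50.74
```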

Complex, relational distributions can be calculated from Table 1. For example, MAX has six tables which describe state level record availability during the year for the MAX-TAF transition period, which was staggered across submitting states over time. These six tables had sixty non-distinct variables (ten each) and 342 records, or about thirty-four records per distinct variable, occupying 1,179,648 bytes (about 1.2 megabytes) of disk space. Similar calculations show that RIF is larger than TAF when year is not disambiguated, despite Medicaid being understood to have a larger year on year patient volume. MAX stores records as state-record type-year-service table series, while TAF and RIF follow a header-line-month-year-service table series, which may explain some table volume differences.
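The ratios in the paragraph above can be reproduced from the Table 1 values. This is worked arithmetic rather than new analysis; the "per distinct variable" figure assumes that the same ten variables recur in each of the six MAX availability tables, and megabytes are taken as decimal (10^6 bytes).

```python
# Values taken from Table 1 (MAX "Availability" series and total bytes by source).
max_availability = {"tables": 6, "variables": 60, "observations": 342, "bytes": 1_179_648}
total_bytes = {"RIF": 27_142_930_987_008, "TAF": 13_908_591_902_720}

vars_per_table = max_availability["variables"] / max_availability["tables"]
obs_per_table = max_availability["observations"] / max_availability["tables"]
# Assumption: the same ten variables recur in each of the six tables.
obs_per_distinct_var = max_availability["observations"] / vars_per_table
megabytes = max_availability["bytes"] / 10**6
rif_to_taf_bytes = total_bytes["RIF"] / total_bytes["TAF"]

print(vars_per_table, obs_per_table, obs_per_distinct_var)  # 10.0 57.0 34.2
print(round(megabytes, 2), round(rif_to_taf_bytes, 2))      # 1.18 1.95
```

Across all years, RIF occupies roughly twice the bytes of TAF, which is the "RIF is larger than TAF" observation in numeric form.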

Table 2 details the number of distinct beneficiary IDs by year, by program and across the entire observation period. These IDs are constructed by CCW from identifiable data elements (i.e., social security number, first name, last name, date of birth) to allow individuals to be linked across CMS programs. Distinct cases described in Table 2 are not rolling counts. Beneficiaries with a populated date of death are included, and distinct deaths within each year are also reported. Death and disenrollment are major routes of exit (churn) for CMS. CHIP cases, which are detailed in Medicaid, are disenrolled at age 19; perhaps eight out of ten decedents nationally are Medicare beneficiaries15,16. Over 49 million distinct beneficiary attributable deaths and 297 million distinct beneficiaries are observed in the data set. The MAX to TAF transition occurred in different months in different states, and individuals in transitioning states were most likely listed in both record systems if they billed on both sides of the transition within the calendar year. Distinct cases and deaths reported in Table 2 should control for this double counting of MAX and TAF cases in transition years. If there were 331 million individual Americans in 2019, then Table 2 demonstrates that 48.03% of the population had some CMS coverage for at least part of the year17. TAF data was not available for 2020 as of this writing.
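Why a transition year needs distinct counting rather than summing per-source counts can be shown with a small sketch. It assumes rows of `(source, year, beneficiary ID)` with IDs already linked across programs, which CCW does upstream and which is not modeled here; the function name and toy rows are hypothetical. At CCW scale the equivalent would be a `COUNT(DISTINCT ...)` grouped by year in Spark-SQL rather than Python sets.

```python
from collections import defaultdict
from typing import Iterable

# Hypothetical enrollment rows: (source, year, beneficiary ID).
Row = tuple[str, int, str]


def distinct_counts(rows: Iterable[Row]):
    """Return per-(source, year) counts, per-year distinct counts across all
    sources, and the overall distinct count. Taking the union (not the sum)
    counts a beneficiary once even when they appear in both MAX and TAF."""
    by_source_year = defaultdict(set)
    by_year = defaultdict(set)
    everyone = set()
    for source, year, bene_id in rows:
        by_source_year[(source, year)].add(bene_id)
        by_year[year].add(bene_id)
        everyone.add(bene_id)
    return (
        {key: len(ids) for key, ids in by_source_year.items()},
        {year: len(ids) for year, ids in by_year.items()},
        len(everyone),
    )


rows = [
    ("MAX", 2015, "A"), ("TAF", 2015, "A"),  # state switched from MAX to TAF mid-year
    ("MAX", 2015, "B"), ("RIF", 2015, "B"),  # Medicare-Medicaid dual enrollee
    ("TAF", 2016, "A"), ("TAF", 2016, "C"),
]
per_source, per_year, overall = distinct_counts(rows)
print(per_year, overall)  # {2015: 2, 2016: 2} 3
```

Summing the 2015 per-source counts would give 4, while only 2 distinct people are present, which is the same gap seen between the per-source columns and the distinct cases column of Table 2 in 2014 and 2015.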

Table 3 describes the qualitative distribution of 1,850 distinct TAF variable-table pairs by category (rows) within table type (columns). Table types ignore year and month and consider distinct variables within five broad types. While this obscures the specifics of file size and volume, it enhances general understanding of the content within TAF and its distribution across table classes. Line and header tables are further merged within table classes for ease of presentation. Qualitative groups are intended to be general and reduce the complexity of TAF terms significantly (923 tables vs 5 table types; 55,670 variables vs 1,850 distinct variable-table pairs).
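The reduction from non-distinct variables across many year-month tables to distinct (table type, variable) pairs can be sketched as a set union that ignores segment, year and month. The tuple layout and the variable names below are hypothetical illustrations, not exact TAF column names.

```python
from typing import Iterable

# Hypothetical table descriptor: (table type, segment, year, month, variable names).
Table = tuple[str, str, int, int, list[str]]


def collapse_tables(tables: Iterable[Table]) -> tuple[int, set[tuple[str, str]]]:
    """Return the raw (non-distinct) variable count and the distinct
    (table type, variable) pairs once segment, year and month are ignored."""
    raw_count = 0
    pairs: set[tuple[str, str]] = set()
    for table_type, _segment, _year, _month, variables in tables:
        raw_count += len(variables)
        pairs.update((table_type, name) for name in variables)
    return raw_count, pairs


tables = [
    ("IP", "header", 2016, 1, ["BENE_ID", "CLM_ID", "ADMSN_DT"]),
    ("IP", "header", 2016, 2, ["BENE_ID", "CLM_ID", "ADMSN_DT"]),
    ("IP", "line", 2016, 1, ["BENE_ID", "CLM_ID", "LINE_NUM"]),
    ("RX", "header", 2016, 1, ["BENE_ID", "CLM_ID"]),
]
raw, distinct_pairs = collapse_tables(tables)
print(raw, len(distinct_pairs))  # 11 6
```

Merging header and line segments under one table type is what lets the same variable appear once per type; a variable shared across types (here `BENE_ID`) still counts once per type, as in Table 3.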

Non-intuitive groupings include ‘Meta’, which includes values like ‘CCW load date’ and ‘TAF version’ information, and ‘Sequence’, which describes ‘occurrence code’ sequences. Note that the destination CDM variable-table pairs (n = 650) were coded using identical groupings to facilitate TAF to CDM mapping by reducing potential candidates. Table 3 demonstrates that the majority of TAF variables within table type are financial and enrollment variables, not clinical variables. Most variables are also located in demographic and eligibility files rather than patient record files. TAF contains clinical data, but it is intertwined with complex enrollment and payor details, which are of low value for evidence-based medicine.

The TAF ETL produced a distinct patient volume of 119,048,562 cases. The diagnostic assessment found 8,178,460,032 diagnostic events and 89,527 distinct diagnostic codes, 364 of which (3,215,426 events) were not re-identifiable in OHDSI Athena. The procedure assessment considered 9,463,745,977 procedure events and 141,188 distinct procedure codes, 70,123 of which (761,382,000 events) were not present in OHDSI Athena. State specific procedure codes account for a high volume within the non-recoverable fraction. For example, the state of Ohio Medicaid PASSP program lists code ‘PT624’ as ‘PERSONAL CARE SERVICE-15 MINUTES’, and the ETL finds 12,438,618 events for code ‘PT624’, suggesting a potential match not found in OHDSI Athena.
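Event-level recall follows directly from the counts reported in this section: recall = (total events − unmatched events) / total events. The sketch below only restates that arithmetic; the function name is illustrative, and the medication unmatched count is derived as total minus the matched events reported in the next paragraph.

```python
def event_recall(total_events: int, unmatched_events: int) -> float:
    """Share of events whose code was found in the vocabulary."""
    return (total_events - unmatched_events) / total_events


# (total events, events with codes not found in OHDSI Athena), as reported.
reported = {
    "diagnosis": (8_178_460_032, 3_215_426),
    "procedure": (9_463_745_977, 761_382_000),
    "medication": (3_297_909_791, 3_297_909_791 - 3_255_067_575),
}
for domain, (total, unmatched) in reported.items():
    print(f"{domain}: {event_recall(total, unmatched):.2%}")
# diagnosis: 99.96%
# procedure: 91.95%
# medication: 98.70%

# Code-level view for procedures: 70,123 of 141,188 distinct codes unmatched.
print(f"procedure codes matched: {1 - 70_123 / 141_188:.2%}")  # 50.33%
```

The procedure figures show why both views matter: about half of the distinct procedure codes are unmatched, yet they carry only about 8% of procedure events, so most unmatched codes are low volume, with high-volume exceptions such as ‘PT624’.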

Medication volume considered 94,617 distinct codes and 3,297,909,791 medication events; 3,255,067,575 events are present in the OHDSI Athena National Drug Code (NDC) vocabulary, a 98.70% recall. The largest ‘non-reconcilable’ medication volume in the TAF NDC position was ‘HCPCS-NDC’, with 2,014,058 distinct individuals billing across 212,899,966 claims. These are most likely ‘procedure drugs’, as they predominantly occur on inpatient and outpatient office visit claims. To our knowledge, no comprehensive, integrated, state specific and retrospective list of Medicaid program NDCs exists. Additional sources of under-recall include multiple-NDC vaccine regimens, which use a Centers for Disease Control and Prevention curated value set, and state specific ‘preferred diabetic supply’ lists covering items such as glucose test strips and lancets18. We used a convenience sample of state specific ‘over the counter’ NDCs, combination vaccine NDCs and state specific preferred ‘diabetic supply’ lists, which further resolved recall error for at least 7,717,813 reconcilable events.
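The use of supplementary code lists can be sketched as widening the set of "known" codes before splitting event counts into matched and unmatched. This is a minimal illustration, not the ETL's logic: the function name is hypothetical, and the 11-digit strings are made up rather than real NDCs.

```python
from typing import Iterable


def reconcile(event_counts: dict[str, int], vocabulary: set[str],
              supplements: Iterable[set[str]] = ()) -> tuple[int, dict[str, int]]:
    """Split event counts into matched events and unmatched codes, treating
    codes in any supplementary list as reconcilable."""
    known = set(vocabulary)
    for extra in supplements:
        known |= extra
    matched = sum(n for code, n in event_counts.items() if code in known)
    unmatched = {code: n for code, n in event_counts.items() if code not in known}
    return matched, unmatched


# Toy data: made-up 11-digit strings, not real NDCs.
events = {"11111111111": 500, "22222222222": 40, "HCPCS-NDC": 30}
athena_ndcs = {"11111111111"}
state_supply_list = {"22222222222"}  # e.g., a state "preferred diabetic supply" list

before, _ = reconcile(events, athena_ndcs)
after, still_unmatched = reconcile(events, athena_ndcs, [state_supply_list])
total = sum(events.values())
print(f"{before / total:.1%} -> {after / total:.1%}", still_unmatched)
# 87.7% -> 94.7% {'HCPCS-NDC': 30}
```

Returning the unmatched codes with their volumes, rather than only a recall figure, makes it easy to rank the remaining gaps (here the ‘HCPCS-NDC’ placeholder) and decide which supplementary list to look for next.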

Figures 1, 2 and 3 describe the diagnosis, procedure, and medication volumes in the resulting ETL. The top ten codes by distinct patient volume are listed in the legend to the right in alphabetical order, with the remainder demarcated as ‘NA’. Note that diagnostic, procedure, and medication are study terms which span various vocabularies. Diagnostic codes could ‘officially’ be ICD10-CM, ICD9-CM, APC or DRG codes, as well as state specific diagnostic codes. Procedures include ICD10-PCS, ICD9-Proc, Healthcare Common Procedure Coding System (HCPCS) and Current Procedural Terminology (CPT) codes, as well as state specific codes. RX should be an NDC code, but NDCs with erroneously inserted zeros, over the counter medications and medical devices (empty syringes, sanitary pads, Ensure, condoms, nicotine patches) may also appear frequently in ‘RX’ records.
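One plausible route to NDCs with misplaced zeros is the conversion between NDC formats. FDA NDCs are ten digits in a 4-4-2, 5-3-2 or 5-4-1 segment layout, while claims commonly use an 11-digit 5-4-2 form made by left-padding the short segment with a zero; padding the wrong segment yields a different, possibly nonexistent code. The sketch below is a hedged illustration of that conversion, not the study's ETL logic, and the example codes are made up.

```python
from typing import Optional

# FDA 10-digit NDC segment layouts (labeler-product-package), plus the
# 11-digit 5-4-2 layout commonly used on claims.
VALID_LAYOUTS = {(4, 4, 2), (5, 3, 2), (5, 4, 1), (5, 4, 2)}
TARGET_WIDTHS = (5, 4, 2)


def ndc_to_11(ndc: str) -> Optional[str]:
    """Convert a hyphenated NDC to the 11-digit form by left-padding the short
    segment with a zero. Returns None when the layout cannot be determined."""
    parts = ndc.strip().split("-")
    if len(parts) == 3 and all(p.isdigit() for p in parts):
        if tuple(len(p) for p in parts) in VALID_LAYOUTS:
            return "".join(p.zfill(w) for p, w in zip(parts, TARGET_WIDTHS))
        return None
    if len(parts) == 1 and parts[0].isdigit() and len(parts[0]) == 11:
        return parts[0]  # already 11 digits
    # An unhyphenated 10-digit code is ambiguous: the padding zero could
    # belong in any of the three segments.
    return None


for raw in ["1234-5678-90", "12345-678-90", "12345-6789-1", "1234567890", "HCPCS-NDC"]:
    print(raw, "->", ndc_to_11(raw))
```

Returning `None` for an unhyphenated 10-digit code is a deliberate choice: guessing a segment is exactly how a wrongly placed zero enters the data.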

Figure 2. Top 10 Procedure volumes with distinct cases, TAF 2014-2018.


Figure two demonstrates the relationship between distinct individuals and their procedure code utilization over the study period in TAF records. The x axis describes procedure events, and the y axis describes distinct case volume. Procedure codes are plotted against the x and y axes. The ten largest patient volumes are labeled by procedure and appear in the legend to the right.

Figure 1 shows the relationship between the volume of diagnostic code utilization (x axis) and distinct individuals (y axis) by diagnostic code (dot), labeled by Athena concept name, the human readable diagnostic code description. Time is not considered; the events plotted cover five TAF study years. High utilization, common diagnoses are observed. Though seemingly generic, the top 10 diagnoses are agnostic to pathology and cause, and more specific codes would show greater diversity. They are perhaps emergency department chief complaints and ‘reason for office visit’ codes, which are expected among care seeking children and older adults whose care is frequently covered by Medicaid programs.
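The two axes of Figures 1–3 and their legends can be derived from event rows with a small aggregation: total events per code, distinct persons per code (each person counted once per code over the whole period), and the top N codes by distinct persons listed alphabetically, with all other codes labeled ‘NA’. The sketch below is an illustration, not the study's plotting code; the function names are hypothetical, and the ICD-10-CM codes are illustrative choices, not the study's top codes.

```python
from collections import Counter, defaultdict
from typing import Iterable


def code_volumes(events: Iterable[tuple[int, str]]) -> dict[str, tuple[int, int]]:
    """Map each code to (event count, distinct persons). A person contributes
    to a code's distinct count at most once over the whole period."""
    n_events = Counter()
    persons = defaultdict(set)
    for person_id, code in events:
        n_events[code] += 1
        persons[code].add(person_id)
    return {code: (n_events[code], len(persons[code])) for code in n_events}


def legend_labels(volumes: dict[str, tuple[int, int]],
                  top_n: int = 10) -> tuple[list[str], dict[str, str]]:
    """Pick the top_n codes by distinct persons (ties broken by code), list them
    alphabetically for the legend, and label every other code 'NA'."""
    ranked = sorted(volumes, key=lambda code: (-volumes[code][1], code))
    top = set(ranked[:top_n])
    labels = {code: (code if code in top else "NA") for code in volumes}
    return sorted(top), labels


# Toy (person_id, ICD-10-CM code) events; codes are illustrative only.
events = [(1, "R05"), (1, "R05"), (2, "R05"), (3, "R05"),
          (1, "J06.9"), (2, "J06.9"), (4, "Z00.129"), (4, "Z00.129")]
volumes = code_volumes(events)
legend, labels = legend_labels(volumes, top_n=2)
print(volumes)                    # {'R05': (4, 3), 'J06.9': (2, 2), 'Z00.129': (2, 1)}
print(legend, labels["Z00.129"])  # ['J06.9', 'R05'] NA
```

Each `(events, persons)` tuple is one dot on the scatterplot; ranking by distinct persons rather than events keeps a few heavy utilizers from dominating the legend.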

Figure 2 demonstrates the relationship between the volume of procedure code utilization (x axis) and distinct individuals (y axis) by procedure code (dot), labeled by Athena concept name. The top ten kinds of procedure codes include common Emergency Department (ED) visit types, office visits with explicit time intervals, and dental visits (oral evaluations). Blood venipuncture and metabolic panels are also highly common procedures in the TAF population, likely because they are agnostic to disease condition.

Figure 3 shows the relationship between the volume of medication codes (National Drug Codes) used (x axis) and distinct individuals (y axis) by medication code (dot), labeled by Athena concept name. Two concept names each correspond to two different NDC codes, and all four codes are amoxicillin, varying in dosage and route of administration. Asthma inhalers are the most common medication in TAF over the study period by patient and dispensation volume. Chronic medications with large patient populations (asthma) and common childhood antibiotics are frequent in TAF, perhaps because TAF contains Children’s Health Insurance Program records.

Discussion

This study found that it is computationally possible to produce an OMOP CDM of TAF records. Given TAF’s greater complexity, it is highly likely that Databricks could support similar work for MAX and RIF. Table 1 demonstrates that CCW contains 50.74 terabytes of data for 22 data years. This could be used to benchmark the needs of local instances and CDM expansion. Table 3 demonstrates that the largest share of TAF variables describe financial transactions (649/1,850), followed by enrollment (582/1,850). As enrollment qualification and duration are, on some level, a financial variable class, one could say that roughly two thirds of TAF variables (1,231/1,850) are financial or finance related, rather than clinical. The true clinical value of the data volume within TAF is perhaps much smaller than the TAF byte footprint.
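The shares cited in the Discussion can be checked against the Table 3 row totals. This is plain arithmetic on the published table; counting enrollment as finance related is the interpretive step the paragraph above makes, not a property of the data.

```python
# Row totals from Table 3 (distinct TAF variable-table pairs by category).
table3_totals = {
    "Financial": 649, "Enrollment": 582, "Clinical": 243, "Meta": 100,
    "Subject": 89, "Provider": 85, "Place": 53, "Service": 46, "Sequence": 3,
}
total = sum(table3_totals.values())
finance_related = table3_totals["Financial"] + table3_totals["Enrollment"]

print(total)                                       # 1850
print(f"{finance_related / total:.1%}")            # 66.5%
print(f"{table3_totals['Clinical'] / total:.1%}")  # 13.1%
```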

Table 2 demonstrates multiple methods for considering size. With 297 million distinct individuals described in CCW data, the distance between CMS data and a ‘national’ electronic health record is not as vast as is generally assumed. There are perhaps 331 million people in the United States on any given day, or at best 87% case capture21,22. The length of record was not evaluated in this study; note, however, that disenrollment from CHIP at age 19 does not bar later re-enrollment in Medicaid proper or Medicare.

The clinical data within TAF is described in Figures 1–3 in broad, descriptive terms. Large data sets should have high volumes of generic diagnoses, procedure visits and routine medications, unlike specialized patient data sets. Clinical data was not transformed or ‘cleaned’ for rational responses. Rational, ‘out of vocabulary’ responses are described in the Results section. This is notable when attempting to reconcile NDC events as well as state specific procedure codes. Recall may improve if a comprehensive index of state specific procedures and NDCs could be maintained. Though the issue of legacy and incomplete NDC lists is well known, there is no readily available, comprehensive solution. With some ingenuity, however, near complete recall was possible, as demonstrated above, by using state specific convenience samples of code lists19,20.

Towards use cases, Real World Evidence generation could be thought of as a synthetic trial arm with controls23–25. The CDM could support this, provided the trial being reproduced had suitable diagnostic, procedure and NDC terms. The ability of a 100% sample of CMS records to segment errors by facility, demography, provider or individual within a sensitivity analysis is of high value for evidence-based medicine.

Towards future work, the best test of the CDM is its ability to participate in OHDSI network studies in ways which add unique value, rather than simply offering confirmatory findings. Network studies should consider CMS data, as it is perhaps the closest approximation of a national electronic health record for the United States. This study finds over 297 million individuals described (at varying record lengths) and 49 million observed deaths. CMS records therefore most likely offer population recall superior to that of hospital settings or private insurance providers. While there is no guarantee that the care of interest is observed within the observation period or record length, patients re-enroll throughout the life course when clinical severity, financial need and eligibility necessitate and permit. Despite this limitation, the data reuse value of a Medicaid OHDSI CDM is most likely high. The ETL code can be accessed at: https://github.com/lhncbc/CRI/tree/master/EtlOmopMedicaid

Conclusions

The TAF OMOP CDM was a success, though the true test should be participation in an OMOP network study. Mastering source data specific considerations, including state specific, non-internationalized coding conventions, will be key to a successful international study.

Acknowledgements

This study was carried out by staff of the National Library of Medicine (NLM), National Institutes of Health, with support from NLM. This research was supported by the U.S. Department of Health and Human Services (HHS) Office of the Secretary Patient Centered Outcomes Research Trust Fund (PCORTF) under Award ID: 21-002-T-MSIS CDM-NIH (ASPE IAA: HP-750121PE080005).

Special thanks to Vojtech Huser for his contributions and early leadership of this project.

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Declaration of interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Nick Williams reports financial support was provided by US Department of Health and Human Services Office of the Assistant Secretary for Planning and Evaluation.

References

1. Davidoff F, Haynes B, Sackett D, Smith R. Evidence based medicine. BMJ. 1995;310(6987):1085–1086. doi: 10.1136/bmj.310.6987.1085
2. Djulbegovic B, Guyatt GH. Progress in evidence-based medicine: a quarter century on. The Lancet. 2017;390(10092):415–423. doi: 10.1016/S0140-6736(16)31592-6
3. Greenhalgh T, Howick J, Maskrey N. Evidence based medicine: a movement in crisis? BMJ. 2014;348:g3725. doi: 10.1136/bmj.g3725
4. Sackett DL, Rosenberg WMC. On the need for evidence-based medicine. Journal of Public Health. 1995;17(3):330–334. doi: 10.1093/oxfordjournals.pubmed.a043127
5. Higgins JP, Giovane CD, Chaimani A, Caldwell DM, Salanti G. Evaluating the Quality of Evidence from a Network Meta-Analysis. Value in Health. 2014;17(7):A324. doi: 10.1016/j.jval.2014.08.572
6. Faltinsen EG, Storebø OJ, Jakobsen JC, Boesen K, Lange T, Gluud C. Network meta-analysis: the highest level of medical evidence? BMJ Evidence-Based Medicine. 2018;23(2):56–59. doi: 10.1136/bmjebm-2017-110887
7. Hripcsak G, Ryan PB, Duke JD, et al. Characterizing treatment pathways at scale using the OHDSI network. PNAS. 2016;113(27):7329–7336. doi: 10.1073/pnas.1510502113
8. Hripcsak G, Schuemie MJ, Madigan D, Ryan PB, Suchard MA. Drawing Reproducible Conclusions from Observational Clinical Data with OHDSI. Yearb Med Inform. 2021;30(1):283–289. doi: 10.1055/s-0041-1726481
9. Seong Y, You SC, Ostropolets A, et al. Incorporation of Korean Electronic Data Interchange Vocabulary into Observational Medical Outcomes Partnership Vocabulary. Healthc Inform Res. 2021;27(1):29–38. doi: 10.4258/hir.2021.27.1.29
10. Reinecke I, Zoch M, Reich C, Sedlmayr M, Bathelt F. The Usage of OHDSI OMOP - A Scoping Review. Stud Health Technol Inform. 2021;283:95–103. doi: 10.3233/SHTI210546
11. MacTaggart P, Foster A, Markus A. Medicaid Statistical Information System (MSIS): a data source for quality reporting for Medicaid and the Children’s Health Insurance Program (CHIP). Perspect Health Inf Manag. 2011;8:1d.
12. Etaati L. Azure Databricks. In: Etaati L, ed. Machine Learning with Microsoft Technologies: Selecting the Right Architecture and Tools for Your Project. Apress; 2019:159–171. doi: 10.1007/978-1-4842-3658-1_10
13. Zaharia M. Lessons from Large-Scale Software as a Service at Databricks. In: Proceedings of the ACM Symposium on Cloud Computing. SoCC ’19. Association for Computing Machinery; 2019:101. doi: 10.1145/3357223.3365870
14. 50-Facts-in-50-Days-Pt1.pdf. Accessed April 7, 2022. https://www.cms.gov/Outreach-and-Education/Look-Up-Topics/50th-Anniversary/50-Facts-in-50-Days-Pt1.pdf
15. Cubanski J, Neuman T, Griffin S, Damico A. Medicare Spending at the End of Life: A Snapshot of Beneficiaries Who Died in 2014 and the Cost of Their Care. Published online 2014:12.
16. Griffin S, et al. Medicare Spending at the End of Life: A Snapshot of Beneficiaries Who Died in 2014 and the Cost of Their Care. KFF. Published July 14, 2016. Accessed February 3, 2022. https://www.kff.org/medicare/issue-brief/medicare-spending-at-the-end-of-life/
17. U.S. Census Bureau QuickFacts: United States. Accessed April 7, 2022. https://www.census.gov/quickfacts/US
18. IIS | NDC Crosswalk tables | Code Sets | HL7 Data | Vaccines | CDC. Published September 25, 2018. Accessed April 7, 2022. https://wcms-wp-test-br.cdc.gov/php-app-template/index.php
19. Simonaitis L, McDonald CJ. Using National Drug Codes and Drug Knowledge Bases to Organize Prescription Records from Multiple Sources. Am J Health Syst Pharm. 2009;66(19):1743–1753. doi: 10.2146/ajhp080221
20. Peters LB, Bodenreider O. Approaches to Supporting the Analysis of Historical Medication Datasets with RxNorm. AMIA Annu Symp Proc. 2015;2015:1034–1041.
21. Fragidis LL, Chatzoglou PD. Implementation of a nationwide electronic health record (EHR): The international experience in 13 countries. International Journal of Health Care Quality Assurance. 2018;31(2):116–130. doi: 10.1108/IJHCQA-09-2016-0136
22. Gunter TD, Terry NP. The Emergence of National Electronic Health Record Architectures in the United States and Australia: Models, Costs, and Questions. Journal of Medical Internet Research. 2005;7(1):e3. doi: 10.2196/jmir.7.1.e3
23. Beaulieu-Jones BK, Finlayson SG, Yuan W, et al. Examining the Use of Real-World Evidence in the Regulatory Process. Clinical Pharmacology & Therapeutics. 2020;107(4):843–852. doi: 10.1002/cpt.1658
24. Burcu M, Dreyer NA, Franklin JM, et al. Real-world evidence to support regulatory decision-making for medicines: Considerations for external control arms. Pharmacoepidemiology and Drug Safety. 2020;29(10):1228–1235. doi: 10.1002/pds.4975
25. Fang Y, He W, Wang H, Wu M. Key considerations in the design of real-world studies. Contemporary Clinical Trials. 2020;96:106091. doi: 10.1016/j.cct.2020.106091
