Circ Cardiovasc Qual Outcomes. Author manuscript; available in PMC: 2019 Dec 1.
Published in final edited form as: Circ Cardiovasc Qual Outcomes. 2018 Dec;11(12):e005247. doi: 10.1161/CIRCOUTCOMES.118.005247

Data and Information in the Sea of Electronic Health Records

Rashmee U Shah, Michael E Matheny
PMCID: PMC6301080  NIHMSID: NIHMS1512894  PMID: 30562078

Despite the abundance of electronic healthcare data, many important clinical investigations are limited to administrative claims data because such data are widely available and cover a broad share of health care encounters. Medicare data, for example, include millions of patients with demographic data, International Classification of Diseases (ICD) codes, medication data, and cost data over an extended period of time. Such data are advantageous because they are easily accessible, use standard variable definitions, and require minimal cleaning. They lend themselves to large population and trend analyses, but they lack granular clinical detail, a perpetual challenge for nuanced analyses. Medicare data often omit key features of clinical decision making, comorbidities, and treatments, and these omissions result in misclassification and unmeasured confounding.

In this issue of Circulation: Cardiovascular Quality and Outcomes, Desai et al. address a limitation of Medicare data by creating a model to predict left ventricular ejection fraction (LVEF) among patients with heart failure.1 Treatment decisions in cardiovascular care (e.g., use of medications or an implantable defibrillator) are often based on LVEF, and clinical outcomes (e.g., hospital readmissions) are associated with lower LVEF, yet Medicare data do not include this critical data element.2,3 Although ICD-9 includes codes that specify reduced or preserved LVEF, these codes are used infrequently and their accuracy is variable. In the current study, for example, 75% of patients were included on the basis of an unspecified heart failure code.
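To make the distinction between specific and unspecified codes concrete, the sketch below labels ICD-9-CM heart failure codes by whether they carry LVEF information and computes the share that do not. It assumes codes arrive as strings (with or without the decimal point) and uses the 428.x subcategories for systolic (428.2x), diastolic (428.3x), and combined (428.4x) heart failure as proxies for LVEF type; reading "systolic" as reduced and "diastolic" as preserved LVEF is a simplifying assumption, and the function names and example codes are hypothetical.

```python
from collections import Counter

# Assumed mapping from ICD-9-CM 428.x subcategories to LVEF type.
# Keys are codes with the decimal point removed (e.g. "428.22" -> "42822").
LVEF_SPECIFIC_PREFIXES = {
    "4282": "systolic (reduced LVEF)",
    "4283": "diastolic (preserved LVEF)",
    "4284": "combined systolic and diastolic",
}


def classify_hf_code(code: str) -> str:
    """Label an ICD-9-CM heart failure code by the LVEF information it carries."""
    normalized = code.replace(".", "").strip()
    if not normalized.startswith("428"):
        raise ValueError(f"not an ICD-9-CM 428.x code: {code!r}")
    for prefix, label in LVEF_SPECIFIC_PREFIXES.items():
        if normalized.startswith(prefix):
            return label
    # 428, 428.0, 428.1, and 428.9 do not indicate LVEF type.
    return "unspecified"


def share_unspecified(codes: list) -> float:
    """Fraction of codes that give no indication of LVEF type."""
    if not codes:
        raise ValueError("codes must be non-empty")
    counts = Counter(classify_hf_code(c) for c in codes)
    return counts["unspecified"] / len(codes)


if __name__ == "__main__":
    claims = ["428.0", "428.22", "428.30", "428.9", "428.0", "428.43", "428.0", "428.0"]
    print(share_unspecified(claims))  # 5 of 8 codes are unspecified -> 0.625
```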

In this study, the authors linked Medicare data with LVEF values from the electronic health records (EHRs) of two Boston-area healthcare systems, extracting LVEF values from the text of echocardiogram reports. The study included 11,073 patients with ICD-9 heart failure codes and echocardiographic LVEF data: 7,105 for training and 3,968 for testing. LVEF prediction models were constructed from 57 candidate variables in the Medicare data. In the testing data, a model using 35 variables classified patients as having preserved (LVEF ≥45%) or reduced (LVEF <45%) LVEF with 83% accuracy. The model was less accurate when estimating a continuous LVEF value or assigning one of three levels (LVEF <40%, 40% to 49%, or >49%); for the three-level classifier, accuracy in the testing data was 79%. The authors propose using this algorithm in retrospective, observational analyses to predict the clinically relevant LVEF category for Medicare patients.
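The classification targets described above can be written down directly, along with the accuracy measure used to evaluate them. This is a minimal sketch, assuming LVEF is recorded as a percentage and that integer-valued reports make the 40% to 49% and >49% boundaries unambiguous; the function names, echo values, and model predictions are hypothetical and are not taken from the study.

```python
def binary_lvef_class(lvef: float) -> str:
    """Binary split: reduced (LVEF < 45%) vs preserved (LVEF >= 45%)."""
    return "reduced" if lvef < 45 else "preserved"


def three_level_lvef_class(lvef: float) -> str:
    """Three-level split: <40%, 40% to 49%, >49%."""
    if lvef < 40:
        return "<40%"
    if lvef <= 49:
        return "40-49%"
    return ">49%"


def accuracy(predicted: list, observed: list) -> float:
    """Proportion of patients whose predicted class equals the echo-derived class."""
    if not observed or len(predicted) != len(observed):
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)


if __name__ == "__main__":
    echo_lvef = [30, 42, 55, 60]  # hypothetical echocardiogram values
    observed = [binary_lvef_class(v) for v in echo_lvef]
    predicted = ["reduced", "preserved", "preserved", "preserved"]  # hypothetical model output
    print(accuracy(predicted, observed))  # 3 of 4 correct -> 0.75
```

The single error in this toy example is a reduced LVEF labeled as preserved, and accuracy alone does not reveal which direction the errors run.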

This study raises several important questions about the limitations of healthcare data and what stakeholders (clinicians, patients, healthcare system leaders, and data scientists) should expect from our electronic data and the analytics built on them. What degree of accuracy in electronic phenotyping is sufficient, and for which categories of tasks? The stated accuracy of the binary LVEF algorithm was 83%, with errors weighted toward misclassifying reduced LVEF as preserved LVEF. Misclassification at this level could substantially affect future analyses that rely on this algorithm to assign LVEF class, particularly if its potential impact is not disclosed. Inaccuracies could influence both the direction and the strength of associations, so rigorous sensitivity analyses should accompany any use of this algorithm.
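One common form of sensitivity analysis is simple quantitative bias analysis for a misclassified binary exposure. The sketch below assumes the algorithm's sensitivity (Se) and specificity (Sp) for reduced LVEF are known and nondifferential, that is, the same in patients with and without the outcome. The study reports overall accuracy rather than these values, so Se = 0.70 and Sp = 0.90 are hypothetical, chosen only to mimic errors weighted toward labeling reduced LVEF as preserved. Within each outcome stratum of N patients, of whom a* are labeled reduced, the actual count is back-calculated as A = (a* - (1 - Sp) * N) / (Se + Sp - 1). All names and counts are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Patients in one outcome stratum, split by LVEF class."""
    reduced: float
    preserved: float

    @property
    def total(self) -> float:
        return self.reduced + self.preserved


def misclassify(actual: Counts, se: float, sp: float) -> Counts:
    """Expected counts after classification by an algorithm with the given Se/Sp."""
    labeled_reduced = se * actual.reduced + (1 - sp) * actual.preserved
    return Counts(labeled_reduced, actual.total - labeled_reduced)


def correct(observed: Counts, se: float, sp: float) -> Counts:
    """Back-calculate actual counts: A = (a* - (1 - Sp) * N) / (Se + Sp - 1)."""
    if se + sp <= 1:
        raise ValueError("Se + Sp must exceed 1")
    actual_reduced = (observed.reduced - (1 - sp) * observed.total) / (se + sp - 1)
    if not 0 <= actual_reduced <= observed.total:
        raise ValueError("assumed Se/Sp are incompatible with the observed counts")
    return Counts(actual_reduced, observed.total - actual_reduced)


def risk_ratio(events: Counts, non_events: Counts) -> float:
    """Outcome risk in reduced LVEF divided by outcome risk in preserved LVEF."""
    risk_reduced = events.reduced / (events.reduced + non_events.reduced)
    risk_preserved = events.preserved / (events.preserved + non_events.preserved)
    return risk_reduced / risk_preserved


if __name__ == "__main__":
    SE, SP = 0.70, 0.90  # hypothetical; not reported by the study
    true_events, true_non_events = Counts(300, 200), Counts(700, 1800)
    obs_events = misclassify(true_events, SE, SP)
    obs_non_events = misclassify(true_non_events, SE, SP)
    print("true RR:", risk_ratio(true_events, true_non_events))
    print("observed RR:", risk_ratio(obs_events, obs_non_events))
    print("corrected RR:", risk_ratio(correct(obs_events, SE, SP),
                                      correct(obs_non_events, SE, SP)))
```

With these hypothetical counts, a true risk ratio of 3.0 is attenuated to roughly 2.0 by misclassification, and the correction recovers 3.0. Because the true Se and Sp are uncertain, and may differ by outcome status, a sensitivity analysis would repeat the correction over a plausible range of values rather than a single pair.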

Furthermore, even if the algorithm were 100% accurate, a binary LVEF classification from a single point of care does not capture the heterogeneity of heart failure phenotypes. For example, patients with mildly reduced LVEF (≥40%) include a distinct group whose ventricular function has recovered.4 To understand the pathophysiology and determinants of recovery, a worthy treatment target, stakeholders should expect data systems that directly capture discrete LVEF values as structured data elements over time. In the current article, LVEF was captured with a text-based approach that matched name-value pairs using regular expressions. Echocardiogram reports are most often unstructured free text, and the need to extract these data is common across health care systems. The national Veterans Health Administration data infrastructure, for example, also requires text-processing algorithms to extract LVEF.5 Given the importance of LVEF in the care of patients with cardiovascular disease, stakeholders should expect LVEF to be captured in a structured format using a controlled vocabulary, as laboratory and vital sign data are. Such capture would provide the data infrastructure needed to answer important cardiovascular treatment questions in real-world, learning health care systems.
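A name-value regular expression extractor of the general kind described above might look like the following. This is a hypothetical sketch, not the pattern used by Desai et al. or by the Veterans Health Administration: the pattern, the midpoint rule for reported ranges, the `LvefObservation` record, and the recovery rule (any earlier LVEF <40% followed by a most recent LVEF ≥40%) are all assumptions made for illustration. Storing each extracted value as a dated, structured record is what makes longitudinal questions such as recovery answerable.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Hypothetical name-value pattern: an LVEF name, a short non-numeric gap
# (":", " is ", " of "), then a value or range followed by "%".
LVEF_PATTERN = re.compile(
    r"\b(?:LVEF|LV\s+EF|EF|ejection\s+fraction)\b"
    r"[^0-9]{0,20}?"
    r"(\d{1,2})(?:\s*(?:-|to)\s*(\d{1,2}))?\s*%",
    re.IGNORECASE,
)


def extract_lvef(report_text: str) -> Optional[float]:
    """Return the first LVEF in a report; a range such as 55-60% becomes its midpoint."""
    match = LVEF_PATTERN.search(report_text)
    if match is None:
        return None
    low = int(match.group(1))
    high = int(match.group(2)) if match.group(2) else low
    return (low + high) / 2


@dataclass(frozen=True)
class LvefObservation:
    """Hypothetical structured record: one dated LVEF value per echocardiogram."""
    patient_id: str
    measured_on: date
    value_percent: float


def has_recovered(observations: List[LvefObservation]) -> bool:
    """Hypothetical rule: an earlier LVEF < 40% and a most recent LVEF >= 40%."""
    ordered = sorted(observations, key=lambda o: o.measured_on)
    if len(ordered) < 2:
        return False
    earlier, latest = ordered[:-1], ordered[-1]
    return latest.value_percent >= 40 and any(o.value_percent < 40 for o in earlier)


if __name__ == "__main__":
    reports = [
        (date(2016, 3, 1), "Impression: LVEF: 30%. Mild mitral regurgitation."),
        (date(2017, 9, 15), "The estimated ejection fraction is 50-55 %."),
    ]
    history = []
    for measured_on, text in reports:
        value = extract_lvef(text)
        if value is not None:
            history.append(LvefObservation("patient-1", measured_on, value))
    print([o.value_percent for o in history], has_recovered(history))
```

Free-text patterns like this one miss phrasings they were not written for (e.g., "EF normal") and can match unrelated numbers, which is part of the case for capturing LVEF as structured data at the source.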

More broadly, the article by Desai et al. is a consequence of the lack of interoperability in our healthcare data systems. Medicare providers lack an agreed-upon standard for capturing and storing LVEF and many other essential data elements. As a result, these data elements cannot be shared between providers or with the central repository, and we cannot use them to improve patient care. The National Academy of Medicine recently outlined the critical need for interoperability to ensure safety, reduce costs, and improve patient-centered outcomes. The report highlights how little other industries tolerate non-interoperable data systems: banks and customers insist that ATM cards work throughout the world, for example.6 Thus far, clinicians have not insisted on interoperability, and it is uncertain whether we will be the driving force for changing the healthcare data landscape. Patients may instead be best suited to push the system toward interoperability. Imagine this scenario: a patient with heart failure is seen at one hospital and owns her healthcare data, much as she owns her bank card. If she went to a different hospital for treatment, she would expect the second hospital to be able to use the data from the first, just as she expects her bank card to work at a different ATM. When the patient controls the data and healthcare systems must compete for her business, interoperability becomes a much more attractive business proposition.

In summary, the work by Desai et al. is a useful demonstration of using inference to account for missing data, and it has potential utility for population-level assessment among patients with heart failure. In keeping with recent expectations for reproducibility, the authors have provided the model coefficients so that others can reproduce the predictions in different data sets. Nevertheless, future users of this algorithm to assign LVEF should remain skeptical of the results and, depending on the use case, pursue sensitivity analyses. Ultimately, healthcare enterprise stakeholders should not have to infer values that are key determinants of healthcare outcomes. Clinicians should lead the difficult work of building consensus on a list of essential data elements and an agreed-upon representation for each, and should insist that our data systems incorporate them. Most importantly, patients should demand interoperability to improve the quality and reduce the cost of the healthcare they receive.
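Applying published coefficients to new data is mechanical once the model form is known. The sketch below assumes a logistic regression model for the probability of reduced LVEF and a 0.5 classification threshold; the intercept, predictor names, predictor coding, and coefficient values are hypothetical placeholders, not the values published by Desai et al.

```python
import math

# Hypothetical placeholders; NOT the coefficients published by Desai et al.
INTERCEPT = -1.0
COEFFICIENTS = {
    "prior_myocardial_infarction": 0.8,  # 1 if present in claims, else 0
    "loop_diuretic_use": 0.5,            # 1 if dispensed, else 0
    "age_per_10_years": -0.2,            # (age - 65) / 10
}


def probability_reduced(features: dict) -> float:
    """Assumed logistic form: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    missing = sorted(set(COEFFICIENTS) - set(features))
    if missing:
        raise KeyError(f"missing predictors: {missing}")
    linear = INTERCEPT + sum(b * features[name] for name, b in COEFFICIENTS.items())
    return 1.0 / (1.0 + math.exp(-linear))


def predicted_class(features: dict, threshold: float = 0.5) -> str:
    """Assign reduced LVEF when the predicted probability reaches the threshold."""
    return "reduced" if probability_reduced(features) >= threshold else "preserved"


if __name__ == "__main__":
    patient = {
        "prior_myocardial_infarction": 1,
        "loop_diuretic_use": 1,
        "age_per_10_years": 0.5,  # age 70 under the assumed coding
    }
    print(round(probability_reduced(patient), 3), predicted_class(patient))
```

Because predictors must be coded exactly as in the derivation data, checking for missing or differently coded variables (the `KeyError` catches only missing ones) is a sensible first step before any sensitivity analysis.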

Acknowledgments

RUS is supported by a grant from the National Heart, Lung, and Blood Institute, K08HL136850. MEM is supported by the following grants: PCORI CDRN 1306–04819, NIH BCHI R01-HL-130828, VA HSR&D IIR 13–052, NIH-DK-R01–113201, and VA HSR&D RES-13–457.

References

1. Desai RJ, Lin KJ, Patorno E, Barberio J, Lee M, Levin R, Evers T, Wang SV, Schneeweiss S. Development and preliminary validation of a Medicare claims based model to predict left ventricular ejection fraction class in patients with heart failure. Circ Cardiovasc Qual Outcomes 2018;11:e004700.
2. Sutton NR, Li S, Thomas L, Wang TY, De Lemos JA, Enriquez JR, Shah RU, Fonarow GC. The association of left ventricular ejection fraction with clinical outcomes after myocardial infarction: Findings from the Acute Coronary Treatment and Intervention Outcomes Network (ACTION) Registry-Get with the Guidelines (GWTG) Medicare-linked database. Am Heart J 2016;178:65–73.
3. Russo AM, Stainback RF, Bailey SR, Epstein AE, Heidenreich PA, Jessup M, Kapa S, Kremers MS, Lindsay BD, Stevenson LW. ACCF/HRS/AHA/ASE/HFSA/SCAI/SCCT/SCMR 2013 appropriate use criteria for implantable cardioverter-defibrillators and cardiac resynchronization therapy: a report of the American College of Cardiology Foundation appropriate use criteria task force, Heart Rhythm Society, American Heart Association, American Society of Echocardiography, Heart Failure Society of America, Society for Cardiovascular Angiography and Interventions, Society of Cardiovascular Computed Tomography, and Society for Cardiovascular Magnetic Resonance. J Am Coll Cardiol 2013;61:1318–1368.
4. Wilcox JE, Yancy CW. Heart Failure-A New Phenotype Emerges. JAMA Cardiol 2016;1:507–509.
5. Patterson OV, Freiberg MS, Skanderson M, Fodeh SJ, Brandt CA, DuVall SL. Unlocking echocardiogram measurements for heart disease research through natural language processing. BMC Cardiovasc Disord 2017;17:151.
6. Pronovost P, Johns MME, Palmer S, Bono RC, Fridsma DB, Gettinger A, Goldman J, Johnson W, Karney M, Samitt C, Sriram RD, Zenooz A, Wang YC. Procuring Interoperability: Achieving High-Quality, Connected, and Person-Centered Care. 2018. Accessed 2018 Oct 26. Available from: https://nam.edu/wp-content/uploads/2018/10/Procuring-Interoperability_web.pdf
