Author manuscript; available in PMC: 2021 Jun 1.
Published in final edited form as: Circ Cardiovasc Qual Outcomes. 2020 May 4;13(6):e006908. doi: 10.1161/CIRCOUTCOMES.120.006908

Data Quarantine in the Time of the COVID-19 Pandemic

Rashmee U Shah 1, Lesley H Curtis 2
PMCID: PMC7373373  NIHMSID: NIHMS1590300  PMID: 32364764

Our healthcare information is trapped. It is trapped in the proprietary data models of the electronic medical record and in our healthcare systems’ data warehouses. This reality has become strikingly clear as the COVID-19 pandemic has swept across the globe, killing more than 80,000 people in the United States alone. We need answers but struggle to address even the simplest questions. How many individuals are infected? Who is at highest risk for developing severe infection? What therapies are being used to treat hospitalized patients? This crisis is testing the limits of our public health and healthcare systems in many ways, including a “quarantined” health information system. This perspective reviews several deficiencies in healthcare information technology that currently limit our ability to deal with the pandemic and suggests solutions that are available now.

In an ideal world, healthcare systems would speak the same language, communicate with public health agencies, and engage directly with the community. This type of system would allow us to track, learn, and innovate during the current crisis. The COVID-19 pandemic lays bare just how far we are from this vision and, sadly, the deficiency will have dire consequences. For example, pooling data across multiple institutions is critical for scientific discovery and community surveillance. No single healthcare system reflects the status of a community or region, and individually each lacks the sample size and diversity for robust, generalizable results. Yet bringing data together for pooled analyses is currently too difficult. Different healthcare systems essentially speak different languages. Even if two systems use the same electronic medical record (EMR) software, each build is individualized such that the same concept may be “hidden” in different places in the data. For example, hydroxychloroquine has emerged as a potential treatment option for COVID-19 but also has known QT-prolonging effects and can cause ventricular arrhythmias. Because these complications are rare, healthcare systems and researchers must pool data to identify these events among treated patients. When a clinician researcher says, “Find me the patients treated with hydroxychloroquine,” a data scientist hears, “Find me the patients with one of the drug codes that represents hydroxychloroquine.” Yet one healthcare system may represent medications with the National Drug Code directory while another uses Medi-Span. The systems speak different languages, which is a barrier to pooling data for rapid analyses.

Common data models (CDMs) address the interoperability issue to some degree, but they are an imperfect solution. Using our example above, the two different hydroxychloroquine representations could be mapped to a single data format to facilitate pooling. But systems must still map their data accurately to the CDM, and the upfront cost of this labor-intensive process is generally steep. The CDM is essentially a “middle man” and, currently, the transformation process is neither automated nor performed in real time. After data are mapped to the CDM, the tables in the CDM must be regularly repopulated with the latest data feeds, and data are changing minute to minute during pandemics like COVID-19. Ideally, data would be captured in a standard way at the source, eliminating the need for a middle man to re-map data in another step. In other words, all EMR data would use the same vocabularies to represent medications and other medical concepts upfront. We are far from that ideal today and must continue to rely on CDMs for the foreseeable future.
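As a minimal sketch of the mapping step described above, the following Python example assumes two hypothetical sites that record the same drug under different local vocabularies. The site names, the code values, and the `to_common_format` helper are illustrative placeholders rather than real National Drug Code or Medi-Span entries, and the common format shown here is far simpler than any actual CDM.

```python
from dataclasses import dataclass

# Hypothetical lookup from (vocabulary, local code) to a shared concept name.
# The codes are made-up placeholders, not real NDC or Medi-Span identifiers.
CONCEPT_MAP = {
    ("NDC", "ndc-placeholder-001"): "hydroxychloroquine",
    ("MEDISPAN", "ms-placeholder-A"): "hydroxychloroquine",
    ("NDC", "ndc-placeholder-002"): "azithromycin",
}


@dataclass(frozen=True)
class CommonRecord:
    """One medication exposure expressed in the shared (assumed) format."""
    site: str
    patient_id: str
    concept: str


def to_common_format(site, vocabulary, rows):
    """Map one site's (patient_id, local_code) rows to the shared format.

    Rows whose code has no mapping are returned separately so they can be
    reviewed by hand instead of being silently dropped.
    """
    mapped, unmapped = [], []
    for patient_id, code in rows:
        concept = CONCEPT_MAP.get((vocabulary, code))
        if concept is None:
            unmapped.append((patient_id, code))
        else:
            mapped.append(CommonRecord(site, patient_id, concept))
    return mapped, unmapped


# Two sites describing their patients in different "languages".
site_a_rows = [("a1", "ndc-placeholder-001"), ("a2", "ndc-placeholder-002")]
site_b_rows = [("b1", "ms-placeholder-A"), ("b2", "ms-unknown")]

mapped_a, unmapped_a = to_common_format("site_a", "NDC", site_a_rows)
mapped_b, unmapped_b = to_common_format("site_b", "MEDISPAN", site_b_rows)

# Once both sites share one representation, pooling is a simple concatenation.
pooled = mapped_a + mapped_b
treated = sorted(r.patient_id for r in pooled if r.concept == "hydroxychloroquine")
print(treated)      # patients treated with hydroxychloroquine across both sites
print(unmapped_b)   # codes that still need manual mapping at site B
```

The unmapped list is the part that reflects the labor-intensive cost discussed above: every local code without an entry in the lookup has to be resolved by someone who knows that site's build.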

Communication between healthcare information systems and public health departments is another critical step, but current communication channels are antiquated and inefficient – akin to two kids talking with paper cups attached by a string. Many states, counties, and cities use case report forms for epidemiologic surveillance. These forms ask for items such as demographics and basic lab results, data that are readily available in the EMR but often quarantined from others. EMRs often do not communicate directly with the case reporting systems of public health departments. The Los Angeles County Department of Public Health, for example, requires case and death reporting by phone or e-mail.1 A direct line of communication between healthcare system data and public health agencies would be a drastic improvement but is generally unavailable. The redundant data collection systems breed errors and result in incomplete data collection, both of which are unacceptable in the current crisis. Public health workers are struggling to collect and clean data that should already be available, rather than planning and implementing strategies to address new data collection needs (e.g., seroprevalence of viral antibodies) and public well-being.

Freeing healthcare system data to connect with other players outside of traditional medical and public health organizations in the pandemic creates substantial opportunities. For example, in Taiwan, the national health insurance data were linked to immigration and customs data, so clinicians were alerted in real time if their patient had recently travelled to a high-risk region.2 Testing and subsequent quarantine of individuals were more effective early in the pandemic because of this data linkage. This type of system requires a unique identifier for each person that allows linkage across disparate systems, which has, thus far, been missing from our healthcare information system. Aside from the infrastructure needs, a national identifier, and the resulting “surveillance,” requires a high level of trust in the government, which may be absent in some segments of the population.
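The role of the unique identifier can be sketched in Python under strong simplifying assumptions. The shared `person_id` field, the region names, the 14-day lookback window, and the `flag_recent_travel` helper are all invented for illustration; this is not a description of how the Taiwanese system is built.

```python
from datetime import date, timedelta

HIGH_RISK_REGIONS = {"region_x"}   # hypothetical region label
LOOKBACK = timedelta(days=14)      # assumed window, chosen for illustration


def flag_recent_travel(patients, travel_records, today):
    """Return {person_id: [regions]} for patients with recent high-risk travel.

    The join only works because both data sets carry the same person_id;
    without a shared identifier the two sources could not be linked this way.
    """
    recent = {}
    for rec in travel_records:
        days_since_return = today - rec["return_date"]
        in_window = timedelta(0) <= days_since_return <= LOOKBACK
        if rec["region"] in HIGH_RISK_REGIONS and in_window:
            recent.setdefault(rec["person_id"], []).append(rec["region"])
    return {
        p["person_id"]: recent[p["person_id"]]
        for p in patients
        if p["person_id"] in recent
    }


# Clinical data (one source) and travel data (another source).
patients = [{"person_id": "p1"}, {"person_id": "p2"}, {"person_id": "p3"}]
travel_records = [
    {"person_id": "p1", "region": "region_x", "return_date": date(2020, 3, 1)},
    {"person_id": "p2", "region": "region_y", "return_date": date(2020, 3, 5)},
    {"person_id": "p3", "region": "region_x", "return_date": date(2020, 1, 1)},
]

alerts = flag_recent_travel(patients, travel_records, today=date(2020, 3, 10))
print(alerts)  # only p1 returned from a high-risk region within the window
```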

Despite all of these challenges, COVID-19 is forcing innovation and pushing us to improve our data systems – an unlikely benefit of this crisis. Efficient data exchange with local, regional, and national public health departments has become a top priority in recent weeks. Recent federal regulations opened the door for business associates like EMR vendors to disclose protected health information in support of health oversight agencies.3 New FHIR (Fast Healthcare Interoperability Resources) apps to facilitate data exchange from EMRs are under development;4,5 combined with new regulations, we may see progress in the short term. In the long term, however, we must move away from the proprietary, siloed data models that dominate health information technology. A FHIR app addresses the data exchange issue but does not always address the accuracy and uniformity of the data itself. Standardized vocabularies, terminologies, and definitions at the source are needed. In addition, we must help healthcare systems that treat poor, minority, and rural populations to participate in these data exchange and pooling efforts. As with many aspects of this pandemic, data quarantine will disproportionately hurt these populations; the systems that serve them lack basics such as COVID-19 testing, let alone advanced information technology teams to map data to CDMs or “stand up” real-time registries. Answers generated from more privileged healthcare systems will not accurately reflect and help the patients who need them most.

Each day of the pandemic means more deaths and untold morbidity. Individually, each healthcare system has enough data to count cases, deaths, and critical resource needs. But only collectively can we foster scientific discovery to help the entire nation and world. Urgent questions demand answers that require efficient, nimble, and rapid data aggregation; quarantined data will kill us.

Supplementary Material

New manuscript file

Disclosures:

RUS is supported by grants from the National Heart, Lung, and Blood Institute (K08HL136850 and R03HL148372) and a donation from Women As One, and is an Associate Editor for Circulation: Cardiovascular Quality and Outcomes.

References
