Author manuscript; available in PMC: 2020 Aug 3.
Published in final edited form as: Arthritis Care Res (Hoboken). 2020 Jan 13;72(2):163–165. doi: 10.1002/acr.24102

Reimagining Rheumatology: Big Data and the Future of Clinical Practice and Research

Jinoos Yazdany 1, Marian T Hannan 2
PMCID: PMC7398717  NIHMSID: NIHMS1612859  PMID: 31651101

In this issue of Arthritis Care & Research, we present another set of themed papers which are relevant to rheumatology clinical practice and research. Our themed issues are designed to spark attention and influence knowledge growth in rheumatology. The topic for this themed issue is the pertinent use of Rheumatology Registries, ‘Big Data’, and use of very large patient or administrative data sources to inform patient or individual aspects of key outcomes in the rheumatic diseases. Manuscripts representing a broad range of topics across the lifespan were considered, including treat-to-target, disease burden, costs, and others using a wide variety of ‘Big Data’ sources. Submitted manuscripts for the themed issues of Arthritis Care & Research undergo the same peer-review procedures as those considered for our regular pages of the journal, and therefore meet the same rigorous standards.

The call for papers for this themed issue on Big Data resulted in 76 submissions. From these papers, we are proud to publish 12 articles covering important topics and issues in the rheumatic diseases, including osteoarthritis, rheumatoid arthritis, lupus, juvenile dermatomyositis and patient care costs. The tremendous response and the wide array of topics addressed by registries, administrative data and large patient data sources furthered our thoughts on using Big Data in rheumatology and the expected exponential growth in our utilization of these data to increase our understanding of rheumatic diseases. Big Data is an exciting avenue that captures nuance and detail, putting the evidence into evidence-based medicine for many aspects of disease management that have not been fully appreciated by past study designs or data collection.

The volume of Big Data in healthcare is growing exponentially, outstripping data growth in other sectors and creating both new opportunities and significant challenges for medicine and research. Billions of dollars have been invested in health IT infrastructure. Electronic health records have fundamentally changed the work of physicians and health systems. To be a rheumatologist today means spending as many or more hours interfacing with an electronic health record (EHR) as with patients. These seismic shifts in medicine have been accompanied by promises of safer, better, and even more cost-efficient care, but to a large extent, these promises remain unrealized. Yet, there is reason to be hopeful. Here we discuss the ways the field of rheumatology can harness Big Data to advance our specialty and improve health outcomes for people with rheumatic diseases.

What is Big Data? Big Data typically refers to databases or registries with high volume, rapid velocity, much variety, and high veracity. These 4 “v’s” characterize the vast amounts of data that can be aggregated from often disparate sources for analysis. The ACR’s RISE registry, a focus for one of the themed articles, is a great example of Big Data. With widespread participation, the RISE registry has aggregated EHR data generated by more than one third of all U.S. rheumatologists. Volume is the main characteristic of the Big Data in RISE, which now includes information on over 2 million patients seen by rheumatologists and over 20 million encounters. Velocity refers to the speed at which data flows. In the case of RISE, data is uploaded nightly, processed centrally, and fed back to a web-based dashboard that displays quality measure performance on an ongoing basis. High-velocity data transfer and processing allow RISE to support practice improvement, quality reporting and research. Variety is a key strength of RISE; the registry’s clinical data warehouse includes both structured data (e.g. vital signs, ICD-10 codes, medications, lab results) and unstructured data (clinical notes that undergo text mining and natural language processing), and the registry will soon be linked to outside data sources such as insurer claims. Finally, veracity refers to the validity or accuracy of the data. Like many Big Data sources, RISE data is heterogeneous given varied provider documentation patterns and inherent differences in EHR products and their interoperability with sources of laboratory or radiology data. Preparing this data for quality reporting and for research takes significant effort and collaboration between the registry’s staff, data scientists, programmers, statisticians and rheumatologists. As repositories like RISE grow, it is exciting to think about some of the ways that Big Data can shape the future of rheumatology. We outline a few of these below.

Generating real-world evidence

Big Data should play a central role in accelerating evidence generation. Despite fast-paced drug development for some rheumatic diseases, traditional research methods such as large (and very expensive) randomized trials have been unable to address many of the important research questions in our field. Large data networks like RISE can generate real-world evidence not only on drug effectiveness but also on special populations such as the elderly, pregnant women, racial/ethnic minorities, or those with comorbidities, for which clinical trial results are scant. Big Data can provide important insights into practice patterns, as in the article by Curtis et al. in this issue. There is also significant interest in using RISE to accelerate recruitment of patients with less common conditions or phenotypes into clinical trials, and the first demonstration project to test this concept is underway.

Phenotyping

Rheumatic diseases are often characterized by complex and nuanced phenotypes. Our key cognitive skill as specialists is recognizing the patterns and subtleties of these phenotypes and using this information to guide management. To date, it has been challenging to harness this collective wisdom to advance our understanding of rheumatic diseases and their outcomes. In fact, most research studies still use crude disease phenotypes such as ICD codes or single physician ascertainment based on classification criteria. However, computational methods that allow more granular phenotype extraction from electronic health records are advancing rapidly. For example, in recent work, we applied an artificial intelligence algorithm to recognize patterns of lupus and assign probabilities of disease (1). Work is ongoing to scale such algorithms in repositories like RISE to understand the full spectrum of phenotypes across a population, to track outcomes, and to conduct discovery research.

Prognostic Modeling

Individual risk prediction in rheumatology has been notoriously difficult, and few predictive models have been well-established for rheumatic diseases. For the most part, rheumatologists rely on their longitudinal clinical experience to forecast the future and advise patients about therapy. However, Big Data has potential to significantly accelerate predictive modeling. Given the high-dimensional and heterogeneous nature of EHR data, methods such as deep learning, a branch of artificial intelligence, have potential to improve our ability to predict risk and therefore to prognosticate for our patients. For example, in recent work, we used deep learning to successfully forecast RA outcomes across two health systems (2). Efforts are ongoing to extend this approach across RISE. Such algorithms create the foundation for developing more personalized prognostic models as well as treatment simulations that could potentially aid clinicians both in counseling patients and in making data-driven treatment decisions.

Precision Medicine

Prognostic models that use large EHR data repositories are one way to bring the potential of precision medicine to the bedside. But such models would likely perform better if they also drew from -omics data. Significant resources are currently being invested in defining molecular phenotypes of disease through collaborations such as the NIAMS Accelerating Medicines Partnership, a topic addressed by a review article in our themed issue by Davison et al. Big Data arising from these innovative projects has potential to inform more personalized diagnosis and treatment approaches in rheumatic diseases. Biological and clinical Big Data can also inform precision medicine approaches to drug safety. Currently, there remains a significant disconnect between safety observed in trials and real-world experience in populations that are older, have more comorbidities, and are more diverse. Developing prognostic models that draw on clinical and biological data to predict risk of adverse events in real-world populations is a worthy goal. Big Data has also shown promising results in drug re-purposing and in examining off-label use of drugs. These latter areas are critical in rheumatology given the high number of orphan diseases and the lack of adequate therapeutic options for many patients.

Population and Public Health

While Big Data has potential to deliver precision medicine to individual patients, equally important is using such data to improve the health of populations. Such data will continue to play a critical role in disease surveillance and monitoring outcomes, as illustrated by articles in our themed issue addressing disease burden as well as nationwide osteoarthritis implementation programs. A critical next step is to use Big Data to better target disease prevention as well as resource allocation in ways that improve health. The RISE registry has taken an important step in this direction through its quality dashboard. The dashboard allows rheumatologists to identify gaps in evidence-based practice and to track improvement as they institute quality improvement initiatives.

Patient engagement

Rheumatic diseases are often chronic, and self-management strategies can help patients maintain their health. Many patients are interested in using devices and applications to record health items such as their symptoms, diet, exercise, and sleep. In some cases, Big Data from devices has helped identify potential health risks or helped patients with arthritis understand how their pain responds to factors such as the weather. Connecting such personalized data from patients to other Big Data sources, like EHR or environmental data, can give a fuller picture of the impacts of disease and treatments. This will help patients with chronic disease management while also helping physicians and researchers advance patient-centered care.

Two additional points are worth considering as we think about Big Data in rheumatology. First, the value of Big Data will be highly dependent on the specificity with which it is collected. For example, administrative data is plagued by faulty coding and EHR data by heavy duplication and non-standardized collection of information. As a community, defining which meaningful data elements we want to collect, independent of billing requirements, and building robust and feasible systems to enable this collection will be important. Second, we are now entering a period where data are plentiful but the time and expertise to analyze them are limited. Therefore, a key factor to success in this new era will be collaboration to define priorities and prevent redundant efforts.

The papers in our themed issue highlight the diverse contributions of Big Data to advancing rheumatology, and much of this work already has important implications for clinical practice. It is exciting to think about the next levels of data integration for many of the research datasets presented, and the potential for such integrations to help us fully understand and treat our patients in all aspects in which their disease affects them. While there is much work to be done, we feel optimistic that Big Data will allow us to reimagine and improve the diagnosis and management of rheumatic diseases in the years to come.

Footnotes

Disclosures: Dr. Yazdany chairs the Registries and Health IT Committee for the American College of Rheumatology. She has received an investigator-initiated research award from Pfizer and has performed consulting for Eli Lilly and Astra Zeneca.

References

  • 1. Murray SG, Avati A, Schmajuk G, Yazdany J. Automated and flexible identification of complex disease: building a model for systemic lupus erythematosus using noisy labeling. J Am Med Inform Assoc. 2019 Jan 1;26(1):61–65.
  • 2. Norgeot B, Glicksberg BS, Trupin L, Lituiev D, Gianfrancesco M, Oskotsky B, Schmajuk G, Yazdany J, Butte AJ. Assessment of a Deep Learning Model Based on Electronic Health Record Data to Forecast Clinical Outcomes in Patients With Rheumatoid Arthritis. JAMA Netw Open. 2019 Mar 1;2(3).
