Author manuscript; available in PMC: 2017 Nov 8.
Published in final edited form as: Circ Cardiovasc Qual Outcomes. 2016 Nov 8;9(6):616–617. doi: 10.1161/CIRCOUTCOMES.116.003366

The Promise of Big Data: Opportunities and Challenges

Harlan M Krumholz
PMCID: PMC5459377  NIHMSID: NIHMS856686  PMID: 28263935

Medicine is on the cusp of a remarkable, transformative change, fueled by the prospect of accelerated and augmented knowledge generation and application.1 Medicine improved as it moved from being based on belief to being anchored in science. We were limited, however, in our ability to learn from clinical practice, even though natural experiments were occurring every day. Our methods tended to be reductionist, our data crude, our samples constrained, and our inferences speculative. This inability to learn quickly, easily and meaningfully from everyday clinical practice is now yielding to new capabilities and opportunities.

The shift is occurring as a result of advances in computational power and mathematical applications, in concert with rapid growth in the availability of data. The changes of the last 5 years alone have produced breathtaking capabilities. We now have the computational heft to handle large, complex, high-dimensional data. We have mathematical applications that spare us from having to reduce the complexity of the data, keeping the results as relevant as possible to the patients seen in practice. And we have just experienced a remarkable advance in data availability through the transformation of medical data and documentation into digital format.

A precision medicine future, with personalized estimates of the risk of outcomes and of the response to various therapies, paradoxically depends on the ability to analyze data from massive numbers of people. Only with such numbers can the heterogeneity of risk and response be revealed, and can each individual be understood in light of the experience of similar people who previously faced the same situation. Knowledge of individual differences, and similarities, will provide opportunities for truly informed choices based on information that is more relevant, accurate and meaningful.

Fully leveraging this opportunity will require several key steps.

Training

Most researchers are not adequately trained for a big data era. The basic principles of epidemiology, study design and biostatistics, which form the core foundation for most clinical researchers, will remain important but are not sufficient. Researchers will also need grounding in additional areas, including digital data, informatics, and machine learning. Programming will become a core competency, needed to understand the logic of the programs used to extract information.

Application

Producing new knowledge is not enough to ensure that benefit is derived from it. The application of the knowledge produced from these data sources and methods will need to be optimized to promote benefit and minimize risk. For example, the culture of medicine will need to evolve from reliance on older methods of risk stratification to embrace computer-driven estimates that are linked to methods of producing information about the net benefits of different strategies.

Replication

A major challenge of data-driven approaches is to ensure that the findings can be replicated. Relying on these estimates will require confidence that they can be reproduced time and again in different datasets, particularly for different subpopulations. The possibility of false-positive inferences, or of risk estimates that are not stable across datasets, is substantial.

Dissemination

The dissemination and communication of the knowledge and methods merit consideration. Our journals are ill-prepared to vet advanced analytics applied to massive data resources. In many cases, the approach may be difficult for reviewers and editors to evaluate. Of note, Circulation: Cardiovascular Quality and Outcomes is committed to finding reviewers, including technical experts, to evaluate novel technical approaches. Moreover, the results may be better suited to a dynamic presentation than to the static Portable Document Format (PDF). In addition, the visualizations may not lend themselves to the two-dimensional page. The results of research in this area may be better disseminated as a product, accompanied by a paper that describes the methods. There will be a need to develop platforms that enable this type of dissemination as part of a scientific contribution. Transparency is also important, even though proprietary algorithms and approaches may complicate it.

Interoperability

Although data increasingly exist in digital format and provide remarkable opportunities for knowledge generation, obstacles still abound in coalescing data from different sources. Although there are common data standards for many data types, adherence to the standards varies. Moreover, data holders, quite apart from concerns about privacy, have often seen data as assets to be hoarded rather than shared, even among those participating in research consortia and public health initiatives. Continued efforts to combine data, with attention to privacy and security, will be needed if these data are to fulfill their promise of producing knowledge that will help the next person who needs assistance.

Funding

Funders are not accustomed to applications that focus on big data as a means of knowledge generation. These projects require infrastructure to make data available, method development to produce tools customized to the particular challenges in medicine, and support for empiric work that may not be hypothesis-driven but is just as capable of producing valuable insights. In addition, funders need reviewers who transcend traditional medical research paradigms if this work is to get the support it needs.

Collaboration

Finally, and perhaps most importantly, this work will require new forms of collaboration in clinical research. Advances in medicine are likely to derive from work with mathematicians, statisticians and computer scientists. Such teams will bring together data scientists and clinical scientists. The recognition that medicine is an information science will begin to spread. These teams will need to invest in learning each other's culture, language and capabilities in order to forge collaborations in which the whole is truly greater than the sum of its parts.

Medicine already lags behind many other fields in its use of digital data. Our research and research funding are largely anchored in a model that has been stable for decades. Our meetings and journals largely remain bastions of traditional research and its presentation. Our practices on the wards are moving slowly toward the adoption of truly learning health system approaches.

And yet, we are at one of those amazing junctures in medicine that will define a new future as profoundly different from today as the era of microbiology was from what preceded it. We may have the chance to see what was formerly undecipherable. But the hazards of change are also great, and our ability to tackle these challenges is far from guaranteed. We will need ways to prove the value of these new approaches and tools, which should be subjected to the same scrutiny as any new diagnostic or therapeutic intervention.

In this issue, we feature an array of insightful articles that address different facets and applications of big data methods.2–10 The studies are not just about data size, but about approaching the data in new ways. The publication of these contributions signals a receptivity to research done with massive data analyzed in creative ways, with important implications for practice and policy. We are all hopeful that these new directions will yield benefits for our patients and the public, and inspire many of our readers to consider similar approaches in their own work. Be assured, research and clinical care are about to join the digital, mobile, mathematical, personalized revolution. We all need to ensure that the changes produce progress for people and society.

Acknowledgments

I would like to thank Sharon-Lise Normand for her partnership in co-editing this issue.

Footnotes

Conflict of Interest Disclosures: Dr. Krumholz is a recipient of research agreements from Medtronic and from Johnson & Johnson (Janssen), through Yale University, to develop methods of clinical trial data sharing; is the recipient of a grant from the Food and Drug Administration and Medtronic to develop methods for post-market surveillance of medical devices; works under contract with the Centers for Medicare & Medicaid Services to develop and maintain performance measures; chairs a cardiac scientific advisory board for UnitedHealth; and is the founder of Hugo, a personal health information platform.

References

1. Krumholz HM. Big data and new knowledge in medicine: the thinking, training, and tools needed for a learning health system. Health Aff (Millwood). 2014;33(7):1163–1170. doi: 10.1377/hlthaff.2014.0053.
2. Ng K, Steinhubl SR, deFilippi C, Dey S, Stewart WF. Early detection of heart failure using electronic health records: practical implications for time before diagnosis, data diversity, data quantity and data density. Circ Cardiovasc Qual Outcomes. 2016;9:xxx-xxx. doi: 10.1161/CIRCOUTCOMES.116.002797.
3. Deo R, Nallamothu BK. Learning about Machine Learning: The Promise and Pitfalls of Big Data and the Electronic Health Record. Circ Cardiovasc Qual Outcomes. 2016;9:xxx-xxx. doi: 10.1161/CIRCOUTCOMES.116.003308.
4. Bhavnani SP, Muñoz D, Bagai A. Data science in healthcare: implications for early career investigators. Circ Cardiovasc Qual Outcomes. 2016;9:xxx-xxx. doi: 10.1161/CIRCOUTCOMES.116.003081.
5. Spertus JV, Normand S-LT, Wolf R, Cioffi M, Lovett A, Rose S. Assessing hospital performance following percutaneous coronary intervention using big data. Circ Cardiovasc Qual Outcomes. 2016;9:xxx-xxx. doi: 10.1161/CIRCOUTCOMES.116.002826.
6. Carlson MB, Scholtens DM, Frailey CN, Gravenor SJ, Powell ES, Wang AY, Kricke GS, Ahmad FS, Mutharasan RK, Soulakis ND. Characterizing teamwork in cardiovascular care outcomes: a network analytics approach. Circ Cardiovasc Qual Outcomes. 2016;9:xxx-xxx. doi: 10.1161/CIRCOUTCOMES.116.003041.
7. Groeneveld PW, Rumsfeld JS. Can big data fulfill its promise? Circ Cardiovasc Qual Outcomes. 2016;9:xxx-xxx. doi: 10.1161/CIRCOUTCOMES.116.003097.
8. Hansen PW, Clemmensen L, Sehested TSG, Fosbøl L, Torp-Pedersen C, Køber L, Gislason GH, Andersson C. Identifying drug-drug interactions by data mining: a pilot study of warfarin associated drug interactions. Circ Cardiovasc Qual Outcomes. 2016;9:xxx-xxx. doi: 10.1161/CIRCOUTCOMES.116.003055.
9. Hollingsworth JM, Funk RJ, Garrison SA, Owen-Smith J, Kaufman SA, Pagani FD, Nallamothu BK. Association between physician teamwork and health system outcomes following coronary artery bypass grafting. Circ Cardiovasc Qual Outcomes. 2016;9:xxx-xxx. doi: 10.1161/CIRCOUTCOMES.116.002714.
10. Mortazavi BJ, Downing NS, Bucholz EM, Dharmarajan K, Manhapra A, Li S-X, Negahban SN, Krumholz HM. Analysis of machine learning techniques for heart failure readmissions. Circ Cardiovasc Qual Outcomes. 2016;9:xxx-xxx. doi: 10.1161/CIRCOUTCOMES.116.003039.
11. Montori VM. Big Science: research collaboration for evidence-based care. Circ Cardiovasc Qual Outcomes. 2016;9:xxx-xxx. doi: 10.1161/CIRCOUTCOMES.116.003358.
