Author manuscript; available in PMC: 2022 May 13.
Published in final edited form as: Heart Rhythm. 2020 May;17(5 Pt B):840–841. doi: 10.1016/j.hrthm.2020.02.014

Emergent design principles for prediction algorithms in health care

Kevin Wheelock, Joyce M Lee, Hamid Ghanbari
PMCID: PMC9101585  NIHMSID: NIHMS1695499  PMID: 32354447

There has been a rise in the number of prediction algorithms developed for health care using machine learning (ML) technology. However, aside from a few exceptions, these novel prediction algorithms have not made it to the frontlines of health care.1

This slow rate of adoption is not unusual in health care, which has a long tradition of skepticism about novel technology. This frustrates techno-optimists who are accustomed to “disrupting” existing legacy systems. However, there are good reasons for this skepticism. The history of novel technology in health care is one of great promises to “transform care,” followed by failure to deliver meaningful outcomes for patients.

There is an existing system in place for evaluation of new health care technologies. A core principle of this system is the general unwillingness to trust anything until it has gone through rigorous clinical testing. Widespread adoption occurs after expert panels have examined the available evidence and made summary recommendations. This method is imperfect but protects patients against harm posed by new technologies.

ML technologies make prediction cheap and widely available, but they are not fundamentally different from older, “rudimentary” prediction algorithms.2 “Rudimentary” algorithms go through cycles of testing, iteration, and improvement; are evaluated by clinical trials; and, if successful, are incorporated into expert guidelines. Only after undergoing this process are they deployed in the real world. Their performance can improve over time as they evolve via interaction with the health care system. Most prediction algorithms do not survive this process.

The few “rudimentary” algorithms that have survived and have been widely adopted contain emergent design principles we can learn from. The CHA2DS2-VASc score is an example of such an algorithm. We believe that studying the emergent properties of CHA2DS2-VASc reveals design principles that can be applied to ML algorithms. These principles include rigorous scientific foundation, purpose beyond prediction, transparency, efficiency, and fairness.

Rigorous scientific foundation

A successful prediction algorithm should have rigorous, peer-reviewed scientific evidence that establishes its safety, validity, reproducibility, usability, and reliability.

The CHA2DS2-VASc score and its forerunner, the CHADS2 score, have been studied for more than 2 decades. Because the algorithm is openly published, any investigator can evaluate its effectiveness. Numerous articles have examined the performance of the risk score and its generalizability.3 This extensive validation and strong expert consensus make clinicians comfortable trusting the algorithm in a variety of clinical situations. ML algorithms will need to go through a similar degree of validation before they become well integrated into clinical practice.

Purpose beyond prediction

The current CHA2DS2-VASc score arose from the need to evaluate the risk of stroke as interventions became available to modify that risk. The ultimate purpose of the score is not just to make a prediction but to help clinicians with clinical decision-making. It is important to show that improved prediction and added complexity translate into improved clinical outcomes. This requires clinical trials to evaluate the risks and benefits associated with the intervention recommended by the prediction algorithm.4 For CHA2DS2-VASc, the recommended intervention is starting anticoagulation. Numerous clinical trials have demonstrated that the benefits of using these medications outweigh the harms when the correct patient is selected based on the CHA2DS2-VASc score.3 If a prediction algorithm is to be widely adopted, it must demonstrate that its use results in better clinical outcomes.4

Transparency

Clinicians need to understand the complexities of any algorithm before they trust it. Factors that influence the decisions made by an algorithm must be visible to the people who use, regulate, and are affected by it.

The CHA2DS2-VASc score provides a great example of how transparency can facilitate adoption. The algorithm is simple, the components are visible to the people who use and are affected by it, and what the user should do given a specific risk score is clear. The results are easy to share with a patient and allow for shared decision-making. In short, the algorithm is designed to facilitate trust.
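To make the point about visible components concrete, the following is a minimal Python sketch of the score. The item weights are not listed in this article; they follow the commonly published definition of CHA2DS2-VASc, and the `Patient` class, its field names, and the function name are hypothetical. The function returns the itemized contributions alongside the total, so the reason for any given score can be read off directly. It is an illustration only, not clinical software.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Patient:
    """Hypothetical container for the score's inputs (names are illustrative)."""
    age: int
    female: bool
    heart_failure: bool
    hypertension: bool
    diabetes: bool
    prior_stroke_tia: bool   # prior stroke, TIA, or thromboembolism
    vascular_disease: bool


def cha2ds2_vasc(p: Patient) -> Tuple[int, Dict[str, int]]:
    """Return (total, per-item points) using the commonly published weights."""
    # Age contributes 2 points at 75 or older, 1 point at 65-74, else 0.
    if p.age >= 75:
        age_points = 2
    elif p.age >= 65:
        age_points = 1
    else:
        age_points = 0

    items = {
        "congestive heart failure": int(p.heart_failure),
        "hypertension": int(p.hypertension),
        "age (65-74: 1, >=75: 2)": age_points,
        "diabetes": int(p.diabetes),
        "prior stroke/TIA/thromboembolism": 2 * int(p.prior_stroke_tia),
        "vascular disease": int(p.vascular_disease),
        "sex category (female)": int(p.female),
    }
    return sum(items.values()), items


# Made-up example: a 78-year-old woman with hypertension and no other items.
patient = Patient(age=78, female=True, heart_failure=False, hypertension=True,
                  diabetes=False, prior_stroke_tia=False, vascular_disease=False)
total, items = cha2ds2_vasc(patient)
print(f"total = {total}")
for name, points in items.items():
    print(f"{points}  {name}")
```

Every point in the total traces back to a named clinical fact, which is the kind of visibility that is hard to reproduce with a deep learning model.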

Modern ML algorithms, particularly neural networks, are often referred to as “black boxes.” Use of these technologies raises significant concerns in health care.5 Deep learning algorithms in particular have limited transparency, yet they demonstrate great promise and are beginning to outperform humans in many tasks.6 We need to find ways to understand and explain the performance of these algorithms to better facilitate their adoption by the medical community.

The desired level of transparency will vary depending on the user and the situation. Therefore, algorithm developers must engage stakeholders from the beginning to see what degree of transparency is required and build their algorithms accordingly.

Efficiency

In almost all aspects of our lives, digital tools have created material and time abundance. This is not the case in health care, where electronic medical records rank as one of the leading causes of burnout for clinicians.7 Physicians are reluctant to adopt new tools that may tax their scarcest resource: time.

Prediction algorithms may improve efficiency within the clinical setting by automating computationally intensive tasks that are difficult or impossible for humans to perform. However, efficiency is determined by the combination of all the steps required to perform a task. This includes prediction but also the time it takes to act on the prediction.2 The CHA2DS2-VASc score is a highly efficient algorithm. Its simplicity allows clinicians to quickly apply it and act on the results. The system is already set up to utilize the “output” of CHA2DS2-VASc.

ML algorithms are computationally intensive, which can lead to a decrease in efficiency. Creators of prediction algorithms must consider how an algorithm will be incorporated into clinical workflow from inception. Efforts should be made to design algorithms that operate efficiently within existing systems.

Fairness

The “No Free Lunch” theorem states that if an algorithm performs particularly well on one class of problems, it must perform worse on average over the remaining problems.8 The optimization priorities of an algorithm result in outcomes that may be viewed as unfair depending on societal norms or vantage points. ML algorithms that undergo widespread adoption in health care should be designed to suit the values of the users who will depend on them. The fairness of the CHA2DS2-VASc score is ensured by assessing its effectiveness and tuning its performance in vulnerable populations. An example of this is the evolution from the CHADS2 score to the CHA2DS2-VASc score, which added female sex as a risk factor after doing so was shown to improve performance.9

Algorithm developers use metrics to quantify fairness and tune algorithms to satisfy different fairness definitions.10 In effect, fairness is expressed as an optimization problem. However, quantitative techniques can only partially ensure fairness. We build prediction algorithms using data drawn from a health care system rife with inherent inequalities. Ensuring fairness will require transparency, deep understanding of the domain, and the capacity and willingness to iterate and experiment.
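As a hedged illustration of what quantifying fairness can look like, the sketch below computes two group-level quantities that appear in the fairness literature: the rate of positive predictions per group (compared under statistical parity) and the true-positive rate per group (compared under equal opportunity). The data are made up, and the function names `group_rates` and `gap` are hypothetical; this is a sketch of the general idea, not a method taken from this article.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, group):
    """Per-group selection rate and true-positive rate for binary labels (0/1)."""
    counts = defaultdict(lambda: {"n": 0, "pred_pos": 0, "pos": 0, "true_pos": 0})
    for t, p, g in zip(y_true, y_pred, group):
        c = counts[g]
        c["n"] += 1
        c["pred_pos"] += p        # predicted positive
        c["pos"] += t             # actually positive
        c["true_pos"] += t * p    # positive and predicted positive
    rates = {}
    for g, c in counts.items():
        rates[g] = {
            "selection_rate": c["pred_pos"] / c["n"],
            # TPR is undefined for a group with no actual positives.
            "tpr": c["true_pos"] / c["pos"] if c["pos"] else float("nan"),
        }
    return rates


def gap(rates, key):
    """Largest between-group difference for one metric (0 means parity)."""
    values = [r[key] for r in rates.values()]
    return max(values) - min(values)


# Made-up outcomes and predictions for two groups of four patients each.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = group_rates(y_true, y_pred, group)
print("statistical parity gap:", gap(rates, "selection_rate"))
print("equal opportunity gap:", gap(rates, "tpr"))
```

A gap near zero on one metric does not imply a gap near zero on another, which is one reason a single number cannot settle whether an algorithm is fair.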

Conclusion

Developing ML algorithms without considering how they will be utilized will decrease the chances of adoption, even when they outperform existing systems. Algorithm designers would be well served to study the successful “rudimentary” algorithms that are currently used in health care. These algorithms contain emergent principles that have been selected through iteration and experimentation and can be used as a foundation for the development of future ML algorithms.

Acknowledgments

Dr Ghanbari has received research support from Toyota, Biotronik, Medtronic, Abbott, and St Jude Corp; and has performed consulting for Preventice Solutions, Inc. All other authors have reported that they have no conflicts relevant to the contents of this paper to disclose.

References

1. Wiens J, Saria S, Sendak M, et al. Do no harm: a roadmap for responsible machine learning for health care. Nat Med 2019;25:1337–1340.
2. Agarwal A, Gans J, Goldfarb A. Prediction Machines: The Simple Economics of Artificial Intelligence. Boston: Harvard Business Review Press; 2018.
3. Steffel J, Verhamme P, Potpara TS, et al. The 2018 European Heart Rhythm Association practical guide on the use of non-vitamin K antagonist oral anticoagulants in patients with atrial fibrillation. Eur Heart J 2018;39:1330–1393.
4. Emanuel EJ, Wachter RM. Artificial intelligence in health care: will the value match the hype? JAMA 2019;321:2281–2282.
5. Price WN. Big data and black-box medical algorithms. Sci Transl Med 2018;10:eaao5333.
6. Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med 2019;25:44–56.
7. Shanafelt TD, Dyrbye LN, Sinsky C, et al. Relationship between clerical burden and characteristics of the electronic environment with physician burnout and professional satisfaction. Mayo Clin Proc 2016;91:836–848.
8. Wolpert DH, Macready WG. No free lunch theorems for optimization. IEEE Trans Evol Comput 1997;1:67–82.
9. January CT, Wann LS, Alpert JS, et al. 2014 AHA/ACC/HRS guideline for the management of patients with atrial fibrillation: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines and the Heart Rhythm Society. J Am Coll Cardiol 2014;64:e1–e76.
10. Verma S, Rubin J. Fairness definitions explained. In: FairWare '18: Proceedings of the International Workshop on Software Fairness. New York: Association for Computing Machinery; 2018. p. 1–7.
