Abstract
There is little debate about the importance of ethics in health care, and clearly defined rules, regulations, and oaths help ensure patients’ trust in the care they receive. However, standards are not as well established for the data professions within health care, even though the responsibility to treat patients in an ethical way extends to the data collected about them. Increasingly, data scientists, analysts, and engineers are becoming fiduciarily responsible for patient safety, treatment, and outcomes, and will require training and tools to meet this responsibility. We developed a data ethics checklist that enables users to consider the ethical issues that may arise from the development and use of data products. The combination of ethics training for data professionals, a data ethics checklist embedded in project management, and a data ethics committee provides a framework for initiating dialogues about data ethics and can serve as an ethical touchstone for rapid use within typical analytic workflows. We recommend the use of this or equivalent tools when deploying new data products in hospitals.
Keywords: data ethics, checklist, ethical analysis, business ethics, hospital ethics
INTRODUCTION
Ethics, charity, and egalitarianism form the foundational framework of healthcare delivery, with roots going back to antiquity.1 Providing care to the sick, the elderly, and the poor is a core moral practice in nearly every culture; despite many modern disagreements about how to deliver such care, the moral imperative to treat those who are sick persists across the spectrum of religious, cultural, and political traditions. Modernity has presented increasingly sophisticated challenges to this moral framework, even for those devoted to advancing the efficiency, universality, and effectiveness of healthcare delivery.
The ethical bedrock of beneficent care is arguably murkiest in the use of patients’ data to develop software designed to support, improve, or predict the needs and outcomes of care.2,3 In many ways, healthcare data science resembles medical research, as it aims to develop new insights and tools that can contribute to generalizable knowledge or improve treatment. However, the ethical demands of hospital-based data science are sui generis, and the robust research ethics framework of the Belmont Report and institutional review board (IRB) oversight, while foundational and informative, is insufficient.4,5
There is growing recognition that data can present a host of ethical challenges.6–8 Numerous instances of algorithmic bias have been documented in which a seemingly objective data product exhibited biases based on gender, race, ethnicity, nationality, payor mix, or other sensitive features. These biases can persist even if demographic information is not included in the model, due to various kinds of confounding.9,10 Even in the best of cases, if outcome base rates differ across groups in the data, it is impossible for models to simultaneously satisfy all definitions of fairness11 (the brief numeric sketch below makes this concrete). Further, the potential for commercializing data products is new12 and may force difficult decisions and tradeoffs between patient protection and institutional (financial) benefit. Broadly speaking, despite localized attempts to instill ethical values at individual healthcare organizations, few data professionals receive formal applied ethics training, and few have the time and support to conduct large-scale ethical analyses for every data project. Given those challenges, a simple tool, modeled after the success of quick checklist-style methods in medicine and human-centered computing,13,14 might help infuse ethical practice into everyday analytics work.
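To make this trade-off concrete, here is a minimal numeric sketch (all numbers hypothetical) of the impossibility result:11 fixing the positive predictive value (PPV) and false negative rate (FNR) to be equal across two groups with different base rates algebraically forces their false positive rates (FPRs) apart.

```python
# Hypothetical illustration of the fairness trade-off: when two groups
# have different base rates, a classifier cannot equalize PPV, FNR, and
# FPR simultaneously. The identity below follows directly from the
# definitions of PPV, FNR, and FPR.

def implied_fpr(prevalence: float, ppv: float, fnr: float) -> float:
    """FPR forced by a group's prevalence once PPV and FNR are fixed."""
    return (prevalence / (1 - prevalence)) * ((1 - ppv) / ppv) * (1 - fnr)

ppv, fnr = 0.8, 0.2  # hold these equal across both groups
for group, prevalence in [("group A", 0.3), ("group B", 0.6)]:
    print(f"{group}: base rate {prevalence:.0%} -> FPR {implied_fpr(prevalence, ppv, fnr):.3f}")

# group A: base rate 30% -> FPR 0.086
# group B: base rate 60% -> FPR 0.300
# Equal PPV and FNR force unequal FPRs whenever base rates differ.
```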
In order to address the need for systematic ethical consideration of data projects at Seattle Children’s Hospital—a 361-bed tertiary pediatric hospital and specialty outpatient clinic system serving the U.S. states of Washington, Montana, Idaho, and Alaska—we formed a Data Ethics team and tasked it with promoting ethics awareness and ethical thinking. This group was dedicated to helping healthcare analytics teams place their data science on an ethical foundation, one grounded in a mission to promote care for persons at their most vulnerable while protecting their rights, dignity, and identity.
BACKGROUND AND CHECKLIST DEVELOPMENT
Beginning in 2016, we identified concerns in the predictive analytics community at Seattle Children’s Hospital that traditional oversight mechanisms—IRB and regulatory controls—were insufficient to address. In response, we formed a Data Ethics team to serve as a sounding board and provide guidance and support across the lifecycle of data science projects, and we sought direction on how to assess the ethical dimensions of such projects, particularly for data professionals with little applied ethics education. High-level ethics principles govern behavior as an employee or a professional society member but are not specific enough for project-level guidance. In contrast, existing project-level tools such as “fairness checklists” or “ethical impact assessments”14,15 are extremely detailed, time-consuming to complete, and presuppose an understanding of applied ethical reasoning. We found no simple tools that would allow data professionals to quickly yet thoughtfully evaluate ethical concerns related to healthcare data projects. The simplicity and brevity of such a tool were considered paramount; we did not want to create a perceived burden or barrier to project initiation. Making adoption of ethical principles easy required providing ethical tools that are neither taxing nor tedious.
We therefore chose to develop a brief evaluation checklist (Supplementary Appendix) designed to provide insight into whether a specific data project is likely to involve significant ethical implications, and that could be used in typical analytics workflows (eg, agile project management, scrum, etc.). The checklist concentrates on issues of privacy, consent, bias, and transparency, and consists of four pairs of multiple-choice questions. One pair focuses on an overall assessment of ethical risk; the three other sections (a minimal encoding sketch follows this list) respectively reflect:
Potential Privacy Issues: Patients and their families are largely unaware that their data go beyond “mere” records: data can be aggregated across multiple sources and used to predict or infer seemingly unrelated information. These questions aim to help determine whether those inferences may be problematic.
Potential Bias and Equity Issues: Analyses of data may have differential impact on some groups, perhaps through more precise and targeted group-specific models. Biases in data collection and measurement can thus ramify into biased or inequitable data products. For example, if data are collected from clinics that serve high-income patients, one must ask whether the resulting model generalizes to low-income populations, or whether processes developed using data from higher-income populations might be used to create systems that further disadvantage low-income populations. Race, ethnicity, language, and payor type are all common indicators of underprivileged conditions that need to be considered as confounders. Data analysts should also not assume that patients, families, and other employees share their values and interests. For example, an analyst might believe that increased testing is a positive (as it can speed detection and diagnosis), but a patient might find it intrusive, costly, and inconvenient.
Potential Transparency and Measurement Issues: Patients often want to know (or have a moral right to know) why some decision, prediction, or judgment was made, so we should be able to determine how models generate their output. In addition, some analytic methods are highly influenced by a few datapoints and so can yield fragile models that may not generalize as data are added or revised; such models may perform well in the short run but fail in the long term. Of course, there might nonetheless be good reasons to use an opaque or somewhat-fragile model, but that choice should be made in recognition of its potential ethical and practical drawbacks.
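To illustrate how these sections might plug into a typical analytics workflow, the sketch below encodes the checklist as a small data structure with a simple escalation rule. It is only a sketch: the question wording, scoring scale, and threshold are all hypothetical, and the actual instrument is provided in the Supplementary Appendix.

```python
# A minimal, hypothetical encoding of the checklist for a project-tracking
# tool; wording, scores, and threshold are illustrative only.
from dataclasses import dataclass

SECTIONS = ("overall", "privacy", "bias_equity", "transparency")

@dataclass
class Response:
    section: str   # one of SECTIONS
    question: str
    score: int     # 0 = no concern, 1 = possible concern, 2 = likely concern

def needs_review(responses: list[Response], threshold: int = 2) -> bool:
    """Flag a project for Data Ethics committee review if any section's
    total concern score reaches the (illustrative) threshold."""
    totals = {s: 0 for s in SECTIONS}
    for r in responses:
        totals[r.section] += r.score
    return any(total >= threshold for total in totals.values())

project = [
    Response("privacy", "Could outputs reveal information patients did not expect to share?", 2),
    Response("bias_equity", "Were training data drawn from a non-representative population?", 1),
    Response("transparency", "Can we explain how the model produces its output?", 0),
]
print(needs_review(project))  # True -> engage the Data Ethics committee
```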
This rubric allows data scientists, analysts, and engineers to score each data project themselves and to self-assess whether questions relating to transparency, consent, bias, or privacy are likely to cause concern for patients or staff. If they perceive the need, they can engage the Data Ethics committee, which serves as an information fiduciary, to help design and implement mitigation strategies. These strategies can include data blinding, automated alerting for equity biases, or monitoring apparatuses that ensure ongoing ethical compliance. We additionally recommend that such tools be evaluated for interrater reliability (a brief sketch follows) and that appropriate factor or psychometric analyses be performed to establish confidence in and validate the checklist.
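As a starting point for the reliability evaluation recommended above, one might measure agreement between two raters who independently score the same projects. Here is a brief sketch using scikit-learn’s weighted Cohen’s kappa; the ratings are hypothetical.

```python
# Hypothetical interrater-reliability check: two raters independently
# score the same projects with the checklist, and we measure agreement.
from sklearn.metrics import cohen_kappa_score

rater_a = [0, 1, 2, 1, 0, 2, 1, 0]  # rater A's risk ratings per project
rater_b = [0, 1, 2, 0, 0, 2, 1, 1]  # rater B's ratings of the same projects

# Quadratic weighting respects the ordinal nature of the ratings.
kappa = cohen_kappa_score(rater_a, rater_b, weights="quadratic")
print(f"weighted Cohen's kappa = {kappa:.2f}")
```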
A PROPOSAL FOR DATA ETHICS BEST PRACTICES
The checklist serves multiple purposes. First, it provides a reminder that ethical considerations exist and must be weighed even for internal projects that may not directly impact patients or put them at obvious risk. Second, it provides a standardized means of comparing ethics, risk, and associated factors across distinct projects, and supports development of a knowledge base to guide future work. Third, it empowers peers to provide guidance, recommendations, and mitigation strategies should a project be determined to need ethics oversight. Fourth, it provides transparency and visibility into projects and their ethical implications. Fifth, it acts as a standardized guide for conversations with multidisciplinary teams evaluating the adoption of external tools as well as considering internal projects. And finally, it prompts escalation of projects to formal ethical or legal oversight, such as the IRB or General Counsel, when the need for such action is apparent.
We inhabit a data environment in which technology outpaces our ability to develop cogent, universal, and practical ethical frameworks for emerging capabilities. It is critical that we enculturate reflection, accountability, and mindfulness of the effects and consequences data products may have on individuals and communities. Increasingly, data professionals are becoming fiduciarily responsible for patient safety, treatment, and outcomes, and will require training and support to meet this responsibility. To that end, we propose the following:
- All data personnel undergo specialized ethics training devoted to understanding the risks and potential harms inherent in data product development.
- All new data product development be accompanied by a standardized ethical review, such as the data ethics checklist demonstrated here.
- Legacy data products be evaluated for ethical implications upon review, upgrade, or revision.
- Specific, project-relevant ethical guidance be provided when rating the development of new data products, encouraging personnel to consider their own implicit biases and the potential for algorithmic bias (ie, the potential for disparate outcomes).
- Hospitals empanel Data Ethics Committees that function as information fiduciaries and provide support for ethical questions. These committees can escalate issues to IRBs, Bioethics teams, or other formal ethical oversight systems.
While much literature exists regarding ethical concerns about health data in research, technology, commercialization of healthcare delivery, and machine learning or artificial intelligence, we found no public work that simply and practically describes the ethical issues associated with the use of patient or employee data within institutions. Yet it is crucial that internal data science projects continuously consider the ethics of their data governance and usage, as the consequences of implementing predictive algorithms may be significant, serious, and irremediable. We believe that a simple checklist, combined with basic training and the empanelment of a data fiduciary committee, could help mitigate many of these concerns.
FUNDING
All authors report no conflict of interest or competing interests. All support for this effort was provided through programmatic prioritization by Seattle Children’s Enterprise Analytics and an Andrew Carnegie Fellowship (Carnegie Corporation of New York) to DDanks. The statements made and views expressed are solely the responsibility of the authors.
AUTHOR CONTRIBUTIONS
DDanks designed the original checklist; all authors contributed to revisions of the checklist. All authors drafted, revised, and approved the final manuscript.
SUPPLEMENTARY MATERIAL
The Data Ethics Checklist is available as Supplementary material at Journal of the American Medical Informatics Association online.
DATA AVAILABILITY STATEMENT
No new data were generated or analysed in support of this research.
ACKNOWLEDGMENTS
We would like to thank the Treuman Katz Center for Pediatric Bioethics at Seattle Children’s, particularly Seema Shah, JD, for providing an initial consult over the need for ethics in data management and analytics in hospital settings, and to all of the data professionals at Seattle Children’s who participated in the testing and evaluation of the checklist. This project was approved by the Institutional Review Board of Seattle Children’s Hospital (#00001849).
CONFLICT OF INTEREST STATEMENT
None declared.
REFERENCES
1. Jonsen AR. A Short History of Medical Ethics. New York, NY: Oxford University Press; 2000.
2. Copeland R. Google’s “Project Nightingale” gathers personal health data on millions of Americans. Wall Street J 2019. https://www.wsj.com/articles/google-s-secret-project-nightingale-gathers-personal-health-data-on-millions-of-americans-11573496790 Accessed May 18, 2020.
3. Schneble CO, Elger BS, Shaw DM. Google’s Project Nightingale highlights the necessity of data science ethics review. EMBO Mol Med 2020; 12 (3): e12053.
4. Metcalf J, Crawford K. Where are human subjects in big data research? The emerging ethics divide. Big Data Soc 2016; 3 (1): 2053951716650211.
5. Bozeman B, Hirsch P. Science ethics as a bureaucratic problem: IRBs, rules, and failures of control. Policy Sci 2006; 38 (4): 269–91.
6. Stahl BC, Rainey S, Harris E, Fothergill BT. The role of ethics in data governance of large neuro-ICT projects. J Am Med Inform Assoc 2018; 25 (8): 1099–107.
7. Kim MO, Coiera E, Magrabi F. Problems with health information technology and their effects on care delivery and patient outcomes: a systematic review. J Am Med Inform Assoc 2017; 24 (2): 246–50.
8. Veinot TC, Mitchell H, Ancker JS. Good intentions are not enough: how informatics interventions can worsen inequality. J Am Med Inform Assoc 2018; 25 (8): 1080–8.
9. Benn EK, Goldfeld KS. Translating context to causality in cardiovascular disparities research. Health Psychol 2016; 35 (4): 403–6.
10. Vyas DA, Eisenstein LG, Jones DS. Hidden in plain sight—reconsidering the use of race correction in clinical algorithms. N Engl J Med 2020; 383 (9): 874–82.
11. Kleinberg J, Mullainathan S, Raghavan M. Inherent trade-offs in the fair determination of risk scores. In: Papadimitriou CH, ed. 8th Innovations in Theoretical Computer Science Conference, Leibniz International Proceedings in Informatics. Saarbrücken/Wadern, Germany: LIPIcs; 2017; 67 (43): 1–23.
12. Shortliffe EH. AMIA president’s column: AMIA’s corporate relations activities. J Am Med Inform Assoc 2011; 18 (5): 727–8.
13. Gawande A. The Checklist Manifesto: How to Get Things Right. New York, NY: Metropolitan Books; 2009.
14. Madaio MA, Stark L, Wortman Vaughan J, Wallach H. Co-designing checklists to understand organizational challenges and opportunities around fairness in AI. In: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems; 2020: 1–14.
15. Wright D. A framework for the ethical impact assessment of information technology. Ethics Inf Technol 2011; 13 (3): 199–226.