ABSTRACT
Biomedical research is increasingly capitalizing on an array of data to illuminate the interplay between “omics,” lifestyle, and health. Leveraging this information presents opportunities to advance knowledge but also poses risks to research participants. In interviews with thought leaders, we asked which data type associated with a hypothetical precision medicine research endeavor was riskiest: 42% chose ongoing access to electronic health records, 17% chose genomic analyses of biospecimens, and 15% chose streaming data from mobile devices. Other responses included “It depends” (15%), the three types are equally risky (8%), and the combination of data types together is riskiest (3%). When asked to consider the hypothetical study overall, 60% rated the likelihood of the risks materializing as low, but 20% rated the potential consequences as severe. These results have implications for study design and informed consent, including placing appropriate emphasis on the risks and protections for the full range of data.
Keywords: research ethics, informed consent, genomic research, electronic health records, big data
Biomedical research is increasingly capitalizing on an array of data to illuminate the complex interplay between “omics,” lifestyle, and health outcomes. 1 Collecting, generating, and leveraging these rich sources of information present opportunities to advance scientific knowledge and improve health, but these efforts also pose risks to research participants. We asked about these risks in our qualitative research with a diverse group of nationally recognized thought leaders. They identified four broad categories of risk in a hypothetical large‐scale precision medicine research endeavor: unintended access to identifying information, permitted but potentially unwanted use of information, risks based on the nature of genomic information, and risks arising from longitudinal study designs 2 —with attendant prospects for physical, dignitary, group, economic, psychological, and legal harms. 3
To understand more about the specific sources of these risks and harms, we also asked these thought leaders to identify which they perceived to be the riskiest aspect of the endeavor: genomic analyses of biospecimens, ongoing access to electronic health record (EHR) data, or streaming data from mobile devices. Additionally, we asked them to rate the overall likelihood and severity of harm to participants, given this combination of data.
STUDY METHODS
Methodologic details are available elsewhere. 4 In addition, table 1, box A, the appendix, and the figures are available online (see the “Supporting Information” section below). We conducted in‐depth interviews with distinguished experts (n = 60) representing a range of perspectives, including the following:
ELSI research (“ELSI” for illustrative quotations): scholars who study ethical, legal, and social issues in genome science;
ethics (“Ethics”): e.g., directors of centers for bioethics;
federal government (“Government”): individuals in relevant positions in the federal government;
genome research (“Research”): bench science and medical genomics researchers;
health law (“Law”): e.g., directors of centers for health law;
historically disadvantaged populations (“Historically Disadvantaged”): scholars who study issues related to historically disadvantaged populations;
human subjects protections (“Human Subjects”): e.g., leaders of national organizations related to human subjects protections;
informatics (“Informatics”): bioinformatics and clinical and medical informatics experts; and
participant‐centric approaches (“Participant‐centric”): leaders in participant‐centric approaches to research.
We used stratified purposive sampling to interview at least six thought leaders per group, the minimum expected to reach saturation. 5 Interviewees were identified based on leadership positions in prominent organizations, institutions, and studies across the United States; authorship of seminal papers; and nominated expert sampling. 6
Our semistructured interview guide centered around a hypothetical “Million American Study” (see box A). Although the hypothetical study has similarities to the National Institutes of Health's All of Us Research Program, which is collecting biospecimens and health data from up to one million Americans, 7 it was not designed to be identical. Interview topics focused on potential benefits, risks, and harms, confidentiality protections, and informed consent. The final interview guide (available upon request), after refinements based on pilot testing, consisted of 19 questions. The Duke University Health System and the Vanderbilt University Medical Center Institutional Review Boards deemed this research exempt under 45 C.F.R. 46.101(b)(2).
The interviews were conducted by telephone between September 2015 and July 2016 by three members of the research team. At the beginning of each interview, we reviewed a study information sheet and obtained the participant's verbal agreement to participate and for audio recording. Interviews ranged in length from 30 to 120 minutes, averaging approximately one hour. Participants were offered $100 compensation for their time.
Answers to closed‐ended questions were entered in Microsoft Excel, checked for accuracy by multiple team members, and analyzed descriptively. Professionally transcribed interviews were uploaded into NVivo 11 for coding and analysis using standard iterative processes. 8 Narrative segments presented here (along with participant identification) are exemplary of frequently mentioned ideas; for additional examples, see the appendix.
STUDY RESULTS
When asked which data type collected or generated for the hypothetical Million American Study was riskiest, 42% of the thought leaders chose EHR data, 17% chose genomic data, and 15% chose streaming data from mobile devices (see figure 1). The remainder gave other responses, including that which type was riskiest depended on several factors (15%), that the three types were equally risky (8%), and that the combination of data types together was the greatest source of risk (3%).
EHR data as most risky
Among thought leaders who identified EHR data as riskiest, most pointed to the highly personal and potentially sensitive nature of the information. “That's where I've disclosed lots of personal information to the clinician—and it's been recorded in ways that, if they're revealed outside a confidential relationship, could be embarrassing or harmful,” explained one (1, Human Subjects). In particular, these thought leaders emphasized the “actual” nature of clinical data—in contrast to, for example, the probabilistic nature of much genomic information. One said, “The data that's in a health record is very proximal to the real world; it is what's actually happened in the past or is going on right now. So in that way, it's not speculative, it's not about likelihood. It's about the actual present health of an individual. I think that makes its impact a lot higher compared with genomic information” (42, Ethics).
Many expressed concerns about identifiability, given the comprehensiveness of EHR data. Several described possible harms if such data were misused, ranging from embarrassment to stigmatization and discrimination. A few noted that the longitudinal study design and ongoing access to EHR data magnified the risk, with one stating, “You don't know what's going to happen to yourself, or what those clinical data are going to look like, in the future” (45, Participant‐centric).
Despite having identified EHR data as the riskiest of the three data types, most of these interviewees still characterized the associated risk as low or moderate (see figure 1)—often anticipating heightened security measures to reduce the probability of unintended access. “The fact that it's the most sensitive information means that we probably have more safeguards built up around EHR data than virtually any other data type,” one commented (58, Research).
Genomic data as most risky
Among thought leaders who identified genomic data as riskiest, unanticipated use was a common theme. This included nonresearch use by law enforcement and insurance companies, as well as research uses that some participants might find morally objectionable.
Interviewees also commonly referred to specific attributes of genomic data, including its status as a unique identifier and implications for family members. Genomic data were also described as new to most people; thus, generating genomic data for research raised novel risks not otherwise encountered. One interviewee noted, “The electronic medical record data is not being collected for the purpose of this study; it already exists. So any risks associated with it probably exist to a large extent outside the study. But the genomic data is something that most Americans are not yet having collected for routine care” (57, Informatics).
A few interviewees discussed risks associated with offering individual genomic research results, including uncertainties associated with the information, and the lack of resources and expertise needed to return results responsibly. “In the research world,” one asserted, “we do what is necessary, but we don't necessarily do what is sufficient to care for people…. So I think things would go back to individuals that they do not know how to process and that have large implications for things downstream, whether it's reproductive choices or other things that influence people's own self‐identities” (5, Research).
Notably, several interviewees who did not choose genomic data as riskiest anticipated that many others might. “I think many people would view the genomic risk as the greatest, because there is still a great deal of genetic exceptionalism,” stated one thought leader (12, Law). Among interviewees who did choose genomic data as riskiest, half characterized the associated risk as high (see figure 1). The remainder pointed to robust security and available legal protections as reasons that the risk was low or moderate.
Mobile device data as most risky
Thought leaders who identified streaming data from mobile devices as riskiest commonly discussed the volume and granularity of the data—especially with regard to tracking movements and activities. One pointed out that “[i]f you use mobile health devices to the fullest possible extent … they contain a lot of information about you and what you do and where you go, much more than the genome” (38, Government).
Many voiced privacy and identifiability concerns due to security vulnerabilities in the transmission of data. These kinds of perceptions prompted some to predict unwanted use by law enforcement, the government, and malicious actors in general. One said, “I think real‐time monitoring of lifestyle and behavioral information—you know, you don't have to be a black‐helicopter person to start to think, yikes…. You're talking about a very intrusive kind of thing there. You're talking about knowing where people are in a real‐time way; what they're doing. That's a big deal” (44, Research).
Nearly half of interviewees who chose streaming data as riskiest characterized the associated risk as high (see figure 1). The remainder often highlighted the ubiquity of mobile devices and everyday disclosure of lifestyle information via social media as reasons that the risk was low or moderate. “In an era where people are posting what they eat on Facebook and Twitter messages about who they went out to the bars with at night, for the most part this kind of information—although I ranked it as riskier or potentially more harmful or concerning than [EHR and genomic data], I still don't think it's all that invasive or risky,” an interviewee clarified (26, Human Subjects).
Most risky data: Other responses
Among interviewees who did not identify a single data type as riskiest, many said their answer would depend on factors associated with the study design, including the robustness of data protections and access controls, as well as factors associated with the individual. For example, one interviewee, citing religious and cultural considerations, commented,
It's hard to say which is more risky. It really depends on who you are asking, what their values are in terms of the type of data they're giving, and where they are socially positioned in society. And just to that last point, there are some groups [for which] the idea that it might be used for criminal justice reasons or shared for purposes of forensics—that risk is more palpable and much more of a realistic risk to some groups than others. So it's very hard to put them in rank order without considering some of these other factors. (32, Ethics)
Some perceived the three data types to be equally risky, saying either that they were “all fairly comparable in terms of risks” (27, Government) or that each involved distinctive risks that might depend on individual circumstances. One stated, “I don't think one is more [risky] than the other. I think they have implications for different things…. They are different risks, there are different potential possibilities of things going wrong, there are different potential benefits from it, but to me it's all equal. But other people may feel very, very differently” (50, Human Subjects). A few said the combination of the three data types together was most risky for participants but also most valuable for research. One interviewee explained,
Genome sequence without connection to clinical data seems much less problematic to me. When you combine them, then you get a different picture. Then adding real time monitoring adds the behavior. So you're getting a much clearer depiction of what's going on in a person's life, connected to what clinical issues they're facing, connected to what's going on genomically. For me, the combination is much more powerful—which is, of course, why it is being proposed at all. (11, ELSI)
Likelihood of risks
When asked to consider the hypothetical Million American Study overall, 60% of thought leaders rated the risks as unlikely to materialize (see figure 2), commonly citing the “strong track record of safe research in this domain” (1, Human Subjects) and robust safeguards. Only 10% rated the risks as likely to occur, while the remainder gave another response, often explaining that it would depend on factors such as who was conducting the study, the specific safeguards in place, and decisions regarding return of results.
Severity of harms
When asked to characterize the consequences should the risks materialize, 40% of interviewees anticipated that they would not be severe (see figure 2). Twenty percent—including nearly all thought leaders representing historically disadvantaged perspectives—rated the consequences as severe. Noted one,
People who know nothing about the history of native people always read or hear the risks as me being hypersensitive, overly sensitive, being suspicious, all these things. You have to look at the history and you have to understand where we're sitting as people who have been colonized and who have been overthrown and are fighting to keep our rights…. My concern is that the people who move forward in these things will, in their ignorance, not consider the harms that can occur to a certain population. (36, Historically Disadvantaged)
The remainder of interviewees (40%) gave another response, typically explaining that severity would depend on individual‐level and study design factors. Several echoed the belief that “for most people [the consequences] would not be that big a deal, but for a certain subset of people, it would be a huge deal” (2, Government).
The majority of interviewees rated likelihood and severity similarly (within one point of one another). Among those with contrasting answers, most perceived the likelihood that the risks would materialize as low, but the consequences as more severe.
DISCUSSION
The continuum of research required to advance the vision of precision medicine involves not only extensive genomic characterization but also comprehensive, highly granular phenotypic data. 9 Enabling people to make informed decisions about participation in such research requires that reasonably foreseeable risks be described accurately and to a level of detail a reasonable person would want to know. 10 Although genomic data entail special features and important risks, 11 disproportionate emphasis on this aspect of the research in consent processes may reflect lingering genetic exceptionalism. 12
Our results suggest that evaluating and communicating other kinds of risks are at least equally important. Further, the likelihood of many of the risks and the severity of ensuing harms depend on study design decisions under the control of the investigator, as well as participant‐level factors. 13 Establishing robust protections against the full range of risks, as well as clearly conveying the value‐laden nature of many of the potential harms, is essential.
In another part of the interview, we asked these thought leaders about possible measures to mitigate the risks and harms. As reported elsewhere, 14 they described technical data security measures as necessary but insufficient due to challenges in human involvement and widespread data sharing. They saw efforts to restrict access to research data—including Data Access Committees and Certificates of Confidentiality—as either weak or useful but not foolproof. They held a similar view of efforts to prevent misuse (including Data Use Agreements and the Genetic Information Nondiscrimination Act), although they noted additional issues such as lack of enforcement. Using a combination of measures to create a multilayered “web” of protections may be most effective, but additional research is needed on how to realize the full scope of protections that laws, rules, and procedures are intended to provide. 15
Interpretation of our results is subject to several limitations. First, essentially none of the prominent individuals we interviewed could be categorized as representing only one stakeholder group. Table 1 lists the perspective for which we identified them as thought leaders, but each interviewee could easily have been recognized in more than one category. For this reason, as well as the qualitative nature of our study, we did not attempt to assess similarities and differences between stakeholder groups. Further investigation of the extent to which perspectives differ between groups, as well as the origins and prevalence of significant differences, may be an area for future research.
Second, we carried out these interviews from 2015 to 2016, primarily in the United States. For the most part, we believe our results reflect fundamental ethical considerations that endure across time and location. Even so, changing sociopolitical environments and the swiftly evolving research landscape require ongoing vigilance. As just one example, mobile applications and devices intended to monitor and promote health are rapidly proliferating. 16 As the volume and nature of the data they collect expand, the risks stakeholders perceive when such data are integrated into research will likely grow.
The findings reported here are just one part of a complex interview with a diverse group of thought leaders. Even so, these results highlight important considerations for both study design and informed consent that require careful attention to maintain trust in the research enterprise.
ACKNOWLEDGMENTS AND DISCLAIMER
This work was supported by a grant from the National Human Genome Research Institute (NHGRI) (R01‐HG‐007733). The content is solely the responsibility of the authors and does not necessarily represent the official views of NHGRI or the National Institutes of Health. Thanks to our colleagues Leslie E. Wolf, Erin C. Fuse Brown, and Kevin C. McKenna.
Supporting information
The table, box A, appendix, and figures are available in the “Supporting Information” section for the online version of this article and via Ethics & Human Research's “Supporting Information” page: https://www.thehastingscenter.org/supporting-information-ehr/.
REFERENCES
- 1. Toga, A. W., et al., “Big Biomedical Data as the Key Resource for Discovery Science,” Journal of the American Medical Informatics Association 22, no. 6 (2015): 1126–31; Bui, A. A. T., and J. D. Van Horn, “Envisioning the Future of ‘Big Data’ Biomedicine,” Journal of Biomedical Informatics 69 (2017): 115–17.
- 2. Beskow, L. M., et al., “Thought Leader Perspectives on Risks in Precision Medicine Research,” in Big Data, Health Law, and Bioethics, ed. I. G. Cohen et al. (Cambridge: Cambridge University Press, 2018), 161–74.
- 3. Beskow, L. M., C. M. Hammack, and K. M. Brelsford, “Thought Leader Perspectives on Benefits and Harms in Precision Medicine Research,” PLoS One 13, no. 11 (2018): e0207842.
- 4. Ibid.
- 5. Guest, G., A. Bunce, and L. Johnson, “How Many Interviews Are Enough? An Experiment with Data Saturation and Variability,” Field Methods 18, no. 1 (2006): 59–82.
- 6. Namey, E. E., and R. T. Trotter II, “Qualitative Research Methods,” in Public Health Research Methods, ed. G. S. Guest and E. E. Namey (Los Angeles: SAGE Publications, 2015), 443–82.
- 7. Denny, J. C., et al., “The ‘All of Us’ Research Program,” New England Journal of Medicine 381, no. 7 (2019): 668–76.
- 8. MacQueen, K. M., et al., “Codebook Development for Team‐Based Qualitative Analysis,” Cultural Anthropology Methods 10, no. 2 (1998): 31–36; Guest, G., et al., Applied Thematic Analysis (Los Angeles: SAGE Publications, 2012).
- 9. Denny et al., “The ‘All of Us’ Research Program.”
- 10. U.S. Department of Health and Human Services, “Federal Policy for the Protection of Human Subjects—Final Rule,” Federal Register 82, no. 12 (2017): 7149–7274.
- 11. Sariyar, M., et al., “How Sensitive Is Genetic Data?,” Biopreservation and Biobanking 15, no. 6 (2017): 494–501; Garrison, N. A., et al., “Genomic Contextualism: Shifting the Rhetoric of Genetic Exceptionalism,” American Journal of Bioethics 19, no. 1 (2019): 51–63.
- 12. Sabatello, M., and E. Juengst, “Genomic Essentialism: Its Provenance and Trajectory as an Anticipatory Ethical Concern,” in Looking for the Psychosocial Impacts of Genomic Information, ed. E. Parens and P. S. Appelbaum, special report, Hastings Center Report 49, no. S1 (2019): S10–S18.
- 13. Beskow et al., “Thought Leader Perspectives on Benefits and Harms in Precision Medicine Research.”
- 14. Hammack, C. M., K. M. Brelsford, and L. M. Beskow, “Thought Leader Perspectives on Participant Protections in Precision Medicine Research,” Journal of Law, Medicine & Ethics 47, no. 1 (2019): 134–48.
- 15. Wolf, L. E., et al., “The Web of Legal Protections for Participants in Genomic Research,” Health Matrix 29, no. 1 (2019): 3.
- 16. Kao, C. K., and D. M. Liebovitz, “Consumer Mobile Health Apps: Current State, Barriers, and Future Directions,” in Clinical Informatics in Psychiatry, supplement, PM&R: The Journal of Injury, Function, and Rehabilitation 9, no. 5S (2017): S106–S115; Rathbone, A. L., and J. Prescott, “The Use of Mobile Apps and SMS Messaging as Physical and Mental Health Interventions: Systematic Review,” Journal of Medical Internet Research 19, no. 8 (2017): e295.
