Are Public Repository Requirements Exacerbating Lack of Diversity?

Thomas May

doi:10.1016/j.tig.2020.03.004

Author manuscript; available in PMC: 2021 Jun 1.

Published in final edited form as: Trends Genet. 2020 Apr 14;36(6):390–394. doi: 10.1016/j.tig.2020.03.004

PMCID: PMC7372716 NIHMSID: NIHMS1584914 PMID: 32396832

Abstract

Although Public Repository requirements are aimed at researchers and designed to ensure that the utility of the limited data we have is optimized, these policies also have ramifications for research participants. In this Opinion, I discuss how the nature of such repositories can subject participants whose data is “banked” to unwitting participation in scientific projects they might find objectionable. In addition, concerns about the privacy of “banked” genomic data are exacerbated by recent projects that demonstrate the ability to re-identify genomic data, raising the specter of discriminatory or oppressive use of this information. These concerns are most likely to discourage participation in research that requires data sharing among those who have experienced these phenomena and are less likely to discount their likelihood.

Keywords: Ethics, Data Sharing

The Continuing Challenge of Participant Diversity in Genomic Research

Lack of diversity among genomic research participants is a problem from both a moral and a practical perspective. From a moral perspective, failure to be included among research participants can unjustly deny participation in the benefits of research, a well-recognized problem, especially for women and minorities, within clinical trials. [https://www.fda.gov/media/84982/download] From a practical perspective, lack of diversity among research participants skews the data concerning disease-variant associations, and hinders the ability to identify important genomic characteristics pertinent to health. [1]

A number of initiatives have attempted to address these problems from different directions. The National Human Genome Research Institute (NHGRI) has recognized this problem for more than a decade, addressing diversity among career scientists through a “Plan for Increasing the Number of Underrepresented Minorities Trained in Genomics and ELSI Research” developed in 2008 (the idea being that greater diversity in the scientific workforce will spur greater diversity in research participants). More recently, it addressed diversity among research participants through a “Minority and Special Populations Initiative” that includes revisions to eligibility for funding under the Diversity Action Plan (DAP); increased funding of collaborative research focused on “diseases and conditions affecting minority populations”; and Community Outreach and Public Education. [https://www.genome.gov/leadership-initiatives/Minority-and-Special-Populations-initiative] In addition, the “All of Us” initiative has seen some success [https://www.researchallofus.org/data/data-snapshots/] through an emphasis on trust, transparency and partnership [https://allofus.nih.gov/about/program-overview/precision-medicine-initiative-privacy-and-trust-principles], as have some limited initiatives surrounding discrete research projects. [2]

Despite these efforts, lack of diversity remains a serious challenge as the distribution of the participants in genomic research continues not to reflect a representative population. Much of the progress made in diversifying genomic research participation has been achieved within a very limited set of sub-populations: For example, while the proportion of participants in genome-wide association studies (GWAS) of non-European ancestry has increased from 4% to 19% between 2009 and 2016, almost all of this gain in diversity has been achieved among participants of Asian ancestry, whose representation has increased from 3% to 14% of GWAS participants. Participation among African and other ancestry groups, for example, has not reflected similar progress, if any. [3]
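The arithmetic behind these percentages can be made explicit. The short sketch below uses only the four figures quoted above and assumes (this is an assumption of the sketch, not a statement from the cited study) that participants of Asian ancestry are counted within the non-European total, so that the two series can be subtracted. Variable names are illustrative.

```python
# Shares of GWAS participants, in percent, as quoted in the paragraph above.
non_european = {2009: 4, 2016: 19}
asian = {2009: 3, 2016: 14}


def gain(series):
    """Change in share between 2009 and 2016, in percentage points."""
    return series[2016] - series[2009]


total_gain = gain(non_european)   # overall gain in non-European share
asian_gain = gain(asian)          # part of that gain from Asian ancestry

# Assumption: Asian ancestry is a subset of the non-European total,
# so the remainder covers all other non-European ancestry groups.
other_2009 = non_european[2009] - asian[2009]
other_2016 = non_european[2016] - asian[2016]
other_gain = total_gain - asian_gain

asian_fraction_of_gain = asian_gain / total_gain

print(f"Total gain: {total_gain} points; Asian ancestry: {asian_gain} points")
print(f"All other non-European groups: {other_2009}% -> {other_2016}%")
print(f"Fraction of gain from Asian ancestry: {asian_fraction_of_gain:.0%}")
```

Under that assumption, all remaining non-European ancestry groups combined account for only a few percentage points of GWAS participants in either year.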

With the number of participants from some minority populations continuing to reflect a significant lack of representation, it might seem as if even greater emphasis – and fewer exceptions – should be placed on efforts to assure that what data is collected among these populations is fully released to approved public repositories so that this data can be used by multiple investigators, and for multiple studies. However, I argue below that such a strategy, while at first blush seeming to optimize the utility of the limited data that is produced for participants from these populations, might actually undermine efforts to increase enrollment among these populations (particularly African Americans). Although studies of attitudes toward biobanking and public repositories (discussed below) have shown differential preferences based upon race, little has been done to cash out the root of these differences in terms of what specifically motivates lack of participation in research: this is likely due to the inherent difficulty of studying individuals reluctant to participate in research studies in the first place. Nonetheless, careful consideration of what is known about trust and research participation, along with consideration of recognized behavioral phenomena, can identify plausible explanations for underrepresentation, even if these cannot be empirically established as causal for the reasons just described. It is in this vein that I argue below that a preferred strategy may well be less stringent requirements for depositing data into public repositories, at least for research where minority populations are targeted. At the very least, we must consider how these recognized phenomena might – separately and together – exacerbate underrepresentation of minority communities when combined with public repository requirements.

The Ideals and Pitfalls of Public Repositories

With the growth of both Precision and Evidence-Based Medicine, and the increasing role of large datasets for identifying best practices and advancing biomedical research, there has been a sustained effort across NIH to promote the collection of data into public repositories that facilitate shared use of generated data. [www.nlm.nih.gov/NIHbmic/nih_data_sharing_policies.html] This has been recognized as of particular importance to genomic research, which requires very large datasets, resulting in NHGRI adopting policies to encourage data producers to quickly deposit data collected into approved public repositories (e.g. GenBank, ENCODE, modENCODE). These policies, which go to great lengths to balance the interests of data producers with the ability of other investigators to use the data generated to “design and carry out their own research programs” [https://www.genome.gov/Pages/Research/ENCODE/Mod-ENCODE_Consortia_Data_Release_Policy_revised_11-22-09.pdf, p.1], require funded investigators to release data as soon as it is verified (before analysis and prior to publication). The reason for this is to “accelerate access to and use of…data by the entire scientific community.” [https://www.genome.gov/Pages/Research/ENCODE/Mod-ENCODE_Consortia_Data_Release_Policy_revised_11-22-09.pdf, p.1]

NHGRI justifies this requirement because it is recognized as one of the most effective ways to promote “the use of the human genome sequence and subsequent genomic data sets to advance scientific knowledge and application to human health.” [https://www.genome.gov/Pages/Research/ENCODE/Mod-ENCODE_Consortia_Data_Release_Policy_revised_11-22-09.pdf, p.2]. This justification seems plausible on its surface: Because genomic data is expensive to generate and often requires ultra-large datasets, “pooling” data generated across funded research projects allows for greater numbers, and would seem to also help to diversify data by aggregating pools of participants from different regions, institutions, and specialties.

Although biorepository policies like these are aimed at researchers and designed to ensure that data and knowledge are shared, these policies also have ramifications for research participants whose personal information constitutes the data shared, albeit in de-identified form. Not only does this raise well-recognized concerns about privacy, but it also potentially raises concerns about the specific purposes for which the data collected might be employed. While an individual may be willing or even eager to participate in research exploring potential therapies for cancer, diabetes, or heart disease, they may object to the very enterprise of, for example, attempts to identify a “gay gene” [4] or, as illustrated in the Havasupai case [5], to the confirmation or rebuttal of cultural history, legends or beliefs.

The distinction between overall favorable attitudes toward research and possible objection to specific projects is at the center of challenges to enroll greater numbers of minorities in genomic research protocols. Current biorepository policies rely on overall favorable attitudes toward biomedical research in general, and in certain contexts toward data banking in particular. Closer examination would indicate a more subtle, or complex approach to public attitudes toward data sharing is warranted. First, consider the degree to which control is identified as important by surveys where participants were favorable toward data sharing in general: A study published in Genetics in Medicine in 2011 indicated that while most research participants were favorably disposed toward public release of their genomic data, a significant portion favored restricted release. [6]. Such restriction is not necessarily indicative of preferences for study-specific consent, but is likely a reflection of concerns about privacy and potential misuse of data that is shared: several studies have found willingness to share, for example, is greater where logistics are explained, data is de-identified, and privacy protection is emphasized — more so than when data sharing policies reflect tiered or study-specific consent. [7,8].

Second, a 2016 review of literature surrounding individuals’ perspectives on broad consent and data sharing, also published in Genetics in Medicine [8], found that while willingness to share data was high in general, it was lower (a) among people who had privacy concerns, (b) when pharmaceutical companies would have access to data, and most importantly for our purposes here, (c) among underrepresented minorities. This should raise concerns about how minority participation in genomic research is affected by data sharing requirements.

Although NIH data sharing policies do provide for a tiered system of data sharing and also subject data sharing to limitations based upon IRB requirements and informed consent [https://osp.od.nih.gov/wp-content/uploads/NIH_GDS_Policy.pdf], these do not address the “optics” of public repository requirements. The nature of data repositories —and indeed some of the very goods they are designed to foster — removes control over the types of scientific endeavors an individual might wish to participate in and promote, and thus subjects participants whose data is “banked” to unwitting participation in scientific projects they might potentially find objectionable. This is particularly salient for populations who have experienced discrimination or stigmatization, warranting protection against use of data for projects that would further disadvantage them.

Race, Trust, and Participation in Data Sharing

There arise at least two practical problems for minority participation in data sharing that must be considered when attempting to increase participation among minority populations:

1. The inadequacy of privacy protections in areas of particular concern to minority populations (e.g. employment discrimination or forensic use of genomic data to target law enforcement in ways detrimental to minority communities [9]).
2. Distrust of research purposes beyond very narrow parameters (as reflected in concerns about pharmaceutical company access to data, as well as lower minority willingness to share data in general, as I will discuss below).

This latter point, distrust of research purposes, is perhaps both the easiest factor to understand and the most straightforward to justify. The Tuskegee Syphilis Study, sponsored by the US Public Health Service, infamously denied treatment to African American men diagnosed with syphilis in order to advance a research “study in nature” of the disease, creating clear reason for this population to eye any research endeavor with suspicion. This suspicion is justified precisely because of (not in spite of) the fact that research abuses are often a reflection of inadequate attention to risks due to a narrow focus on advancement of science, rather than intentional infliction of harm: the more recent scandal involving genetic research on members of the Havasupai Tribe by Arizona State University researchers demonstrates the very real (unintended) harms that can follow from such inattention to particular cultural contexts when identifying risks. [5] In addition, the goods resulting from research often fail to be shared by many in minority communities, a problem identified in the story of HeLa cell research and the inability of the family of Henrietta Lacks (from whom these cells were obtained) to benefit due to lack of insurance coverage. Indeed, a recent qualitative study of minority perspectives found fears of becoming the “next Henrietta Lacks” to be common. [10]

Mistrust of research purposes for often culturally-specific reasons contributes to privacy concerns (#1 above) among many minority populations, and again for good reason (albeit more subtle). For example, recent use of genetic databanks as a forensic tool for law enforcement raises questions about the potential for this tool to oppress, providing disincentives to participate for some minority populations [5]. Beyond this, concerns about privacy are commonly related to potential for discrimination in employment and insurability across all populations, including minority populations. [6,7]. Concerns about potential oppression or discrimination seem more likely to discourage participation among those who have experienced these phenomena, as we know experience of sociopolitical vulnerability / discrimination increases perceptions of, and attention to, perceived risk. [11]

Concerns about use or misuse of “banked” genomic data are exacerbated by recent projects that demonstrate the ability to re-identify genomic data that was thought to be de-identified. First demonstrated as possible in a study published in Science in 2013 [12], the possibility of re-identification has become ever more plausible since, with a 2019 study demonstrating that re-identification is possible even for data thought to be secured from such efforts —that is, even if informative SNPs are hidden and only yes/no queries are allowed on the presence of specific alleles in the dataset. [13]. Re-identification is not a threat posed by the general population: because the studies that have shown this possible have accomplished re-identification by cross-referencing multiple datasets, only persons who have access to multiple data sets, containing data detailed enough to distinguish a specific individual, could successfully do so. However, while the resources to re-identify data are not easily accessible to the general population, the proliferation of big data threatens to increase the ability to undertake such projects and highlights perceived threats of unauthorized use of an individual’s genetic data in ways that might be detrimental to their personal interests, or the interests of their community. In some cases, these threats of “group harm” to one’s community may be perceived as even greater risks than threats to one’s personal interests. Indeed, the European Union, which once considered privacy risks for de-identified data banked in repositories as purely theoretical, now considers the risk of re-identification to be very real, resulting in questions about whether U.S. data sharing policies are consistent with the European Union’s data privacy protections. [14].
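To make the cross-referencing step concrete, the following is a minimal sketch of a generic linkage exercise on synthetic data. It is not the method of the studies cited above; every record, field name, and value is hypothetical and invented for illustration. It assumes a "de-identified" research table and a separate named table that happen to share a few descriptive fields, and it treats a research record as re-identified only when exactly one named person matches on all of those fields.

```python
from collections import defaultdict

# Hypothetical, synthetic records: nothing here comes from a real dataset.
deidentified_research = [
    {"record_id": "R1", "zip3": "358", "birth_year": 1961, "sex": "F", "genotype_ref": "G1"},
    {"record_id": "R2", "zip3": "358", "birth_year": 1975, "sex": "M", "genotype_ref": "G2"},
    {"record_id": "R3", "zip3": "352", "birth_year": 1975, "sex": "M", "genotype_ref": "G3"},
]
named_public = [
    {"name": "Person A", "zip3": "358", "birth_year": 1961, "sex": "F"},
    {"name": "Person B", "zip3": "352", "birth_year": 1975, "sex": "M"},
    {"name": "Person C", "zip3": "352", "birth_year": 1975, "sex": "M"},
]

# Fields present in both tables (an assumption of this sketch).
SHARED_FIELDS = ("zip3", "birth_year", "sex")


def shared_key(record):
    """Tuple of the shared descriptive fields for one record."""
    return tuple(record[field] for field in SHARED_FIELDS)


def link(research, public):
    """Map research record IDs to a name when exactly one person matches."""
    names_by_key = defaultdict(list)
    for person in public:
        names_by_key[shared_key(person)].append(person["name"])

    matches = {}
    for rec in research:
        candidates = names_by_key.get(shared_key(rec), [])
        if len(candidates) == 1:  # ambiguous or absent matches are skipped
            matches[rec["record_id"]] = candidates[0]
    return matches


reidentified = link(deidentified_research, named_public)
print(reidentified)
```

In this toy example only the record whose combination of fields is unique in the named table is linked; the record with no counterpart and the record with two equally plausible counterparts are not. This mirrors the point in the text that re-identification depends on access to multiple datasets detailed enough to distinguish a specific individual.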

Concluding Remarks

A focus on public repository requirements may discourage minority populations from participating in genomic research completely. Public data sharing policies make it possible for an individual’s genomic data to be used for purposes that the individual participant might find objectionable, or that might be detrimental to the interests of that person or her community. Despite emphasis on trust and transparency, past research abuses and the “optics” of data sharing may combine to discourage minority participation in genomic research where data sharing through public repositories is required. Instead, we should consider more lenient opt-out procedures for public repository requirements in studies where minority participants are targeted, and re-direct our efforts toward mechanisms designed to increase a sense of control and to engender trust. Examples include counseling about opt-out rights and increased use of community engagement techniques and oversight boards that reflect broader community values. Several questions remain concerning how best to establish trust in biomedical research among minority populations; how to incorporate community oversight of research utilizing public repository data; and how best to handle the growing potential for re-identification of data within public repositories (see also Outstanding Questions). However, it is imperative that we handle these issues head-on if we wish to gain the confidence required for significant minority participation in genomic research. Rather than citing generally favorable public attitudes toward biomedical research (and, in proper context, biobanks) to justify strict policies toward release of data to public repositories, we should take confidence from these favorable attitudes that if we can engender a sense of control and trust, voluntary participation in data sharing will ensue.

Outstanding Questions.

How might trust in biomedical research endeavors most effectively be established within minority populations?

How might community oversight be incorporated into the design of research projects stemming from public repository data?

How might the potential for re-identification be addressed in protections for research participants?

Highlights.

Data sharing subjects participants to unwitting participation in scientific projects they might potentially find objectionable.

Concerns about use or misuse of “banked” genomic data are exacerbated by recent projects that demonstrate the ability to re-identify genomic data.

Concerns about potential oppression or discrimination are more likely to discourage participation among those populations who have experienced these phenomena.

Acknowledgements

This work was supported under a grant from the National Library of Medicine and the National Human Genome Research Institute, 1G13LM012445-01A1 (PI: Thomas May).

G13 Scholarly Works in Biomedicine and Health: New Frameworks for Informed Consent in Genomic and Precision Medicine

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References

1. Hindorff L, Bonham V, Brody L, et al., 2018. “Prioritizing diversity in human genomics research,” Nature Reviews Genetics 19:175–185.
2. May T, Cannon A, Moss I, et al., “Recruiting Diversity Where it Exists: The Alabama Genomic Health Initiative,” Journal of Genetic Counseling, (forthcoming).
3. Popejoy A, and Fullerton S, 2016. “Genomics is Failing on Diversity,” Nature 538:161–164.
4. Yoder J, 2019. A new age of gay genomics is here. Are we ready for the consequences? Slate. August 29, 2019.
5. Mello M, and Wolf L, 2010. The Havasupai Indian Tribe Case: Lessons for research involving stored biological samples. New England Journal of Medicine 363: 204–207.
6. McGuire A, Oliver J, et al., 2011. To share or not to share: a randomized trial of consent for data sharing in genome research. Genetics in Medicine 13(11):948–955.
7. Lemke A, Wolf W, et al., 2010. Public and biobank participant attitudes toward genetic research participation. Public Health Genomics 13(6):368–77.
8. Garrison N, Sathe N, Antommania A, et al., 2016. “A systematic review of individuals’ perspectives on broad consent and data sharing in the United States.” Genetics in Medicine 18(7):663–71.
9. May T, 2018. “Sociogenetic risks: ancestry DNA testing, third party identity, and protection of privacy,” New England Journal of Medicine 370: 410–412.
10. Lee S, Cho M, Kraft S, 2019. “I don’t want to be Henrietta Lacks”: diverse patient perspectives on donating biospecimens for precision medicine research. Genetics in Medicine 21(1):107–113.
11. Satterfield T, Mertz C, and Slovic P, 2004. Discrimination, Vulnerability, and Justice in the Face of Risk. Risk Analysis 24(1): 115–129.
12. Altman Russ B., Clayton Ellen Wright, Kohane Isaac S., Malin Bradley A., and Roden Dan M. 2013. “Data re-identification: societal safeguards.” Science 339(6123): 1032–1033.
13. Thenan E, Ayday E, and Cicek A, 2019. Re-identification of individuals in genomic data-sharing beacons via allele inference. Biotechnology 35(3):365–371.
14. Mascalzoni D, Bentzen H, Budin-Ljosne I, et al., 2019. Are requirements to deposit data in research repositories compatible with the European Union’s general data protection regulation? Annals of Internal Medicine doi: 10.7326/M18-2854.
