Author manuscript; available in PMC: 2025 Apr 8.
Published in final edited form as: Proc (IEEE Int Conf Healthc Inform). 2024 Aug 22;2024:638–641. doi: 10.1109/ichi61247.2024.00102

An Ethical Approach to Genomic Privacy Preserving Technology Development

Lynette Hammond Gerido 1, Erman Ayday 2
PMCID: PMC11976530  NIHMSID: NIHMS2071050  PMID: 40200994

Abstract

Demand for genomic research data and genetic testing results from cancer patients has grown exponentially. When a patient is diagnosed with a hereditary cancer syndrome, standard practice is for providers to encourage the patient to discuss the results with relatives and to encourage those relatives to have clinical genetic testing and possibly participate in genetic research. Genomic research data and genetic testing results are being shared and connected in ways never imagined. Genomic data sharing is critical for advancing precision health and increasing diversity in global genome databases. However, these advancements often come with undesirable consequences, which call for additional privacy safeguards and research practices to protect hereditary cancer patients and their families: relatives may have genomic information in common with the patient, so privacy risks can ripple throughout a kinship network. We propose to address this gap using an interdisciplinary approach integrating bioethical principles (autonomy, non-maleficence, beneficence, respect for persons, and equity) with data science techniques to mitigate privacy risk challenges.

Keywords: Human genome, privacy, hereditary cancer, data sharing

I. Introduction

Demand for genomic research data and genetic testing results from cancer patients has grown exponentially. Hereditary cancer syndromes are responsible for 5–10% of all diagnosed cancer cases. Researchers using data from exome studies estimate the prevalence of hereditary breast and ovarian cancer (HBOC) in the general population to be 1 in 139 [1]. The prevalence of Lynch syndrome (LS) is estimated at 1 in 279 [2]. Both HBOC and LS are highly penetrant, autosomal dominant genetic conditions. When a patient is diagnosed with HBOC or LS, standard practice is for providers to encourage the patient to discuss the results with relatives and to encourage those relatives to have clinical genetic testing [3], [4] and possibly participate in genetic research [5], [6]. Genomic research data and genetic testing results are being shared and connected in ways that were never imagined before. Genomic data sharing is critical for advancing precision health and increasing diversity in global genome databases. However, these advancements often come with undesirable consequences, which call for additional privacy safeguards and research practices to protect hereditary cancer patients and their families: relatives may have genomic information in common with the patient, so privacy risks can ripple throughout a kinship network. There is little research on reidentification risk mitigation strategies and technologies related to cascade testing, or on how researchers should address concerns associated with inferred kinship and group harm.

II. Human Genome Data Sharing

Mapping of the human genome set a course for open research and data sharing among scientists that accelerated discovery and advancement. Since the introduction of next-generation sequencing technologies, the amount of genomic research data has proliferated. Researchers are incentivized to share genomic data to support effective translation of study findings and improve clinical application and health outcomes. Researchers need large genomic datasets to study the origins of individuals and identify associations between traits and specific parts of DNA. However, as shown by earlier work, public availability of genomic data for research (even in anonymized form) raises serious privacy concerns [7], [8]. Furthermore, several studies report that although most individuals show a positive attitude towards genomic research and participation in such studies, the overwhelming majority rank privacy as one of their top concerns [9], [10], [11]. “Although such large datasets are promising a revolution in medicine, it has been shown in numerous studies that it is not straightforward to ensure anonymity of the participants in such datasets” [12]. The human genome is the utmost personal identifier, and sharing genomic data for research while preserving the privacy of individuals has long challenged many fields (e.g., medicine, bioinformatics, computer science, law, and ethics) because of possibly dire ethical, monetary, and legal consequences.

III. Genomic Research Collaboration

Researchers need to share data to conduct collaborative analyses on genomic datasets. Studies of small and specific subpopulations (e.g., rare disease analysis or genomic studies of minorities) would benefit significantly from collaborations between researchers. Collaborative analyses typically require a lengthy institutional review board (IRB) process before raw datasets that include individuals’ personal genomes can be exchanged. Before such collaborative analyses, however, the exploration phase of genomic research often requires analyzing only summary statistics and selecting the important genotypes to be investigated. When data are collected from distributed sources, even this step can be privacy-sensitive and therefore needs to be handled carefully. Ayday’s team is developing methods to carry out preliminary computations among several researchers (potential collaborators) in a privacy-preserving way before the full IRB process.

This allows for more accurate and efficient results, as well as an increased understanding of the data. By sharing data, researchers can work together to identify patterns and locate genomic variants that may be associated with specific diseases or conditions to identify new biomarkers for diagnostics and therapeutics. However, privacy concerns are a significant barrier to data sharing and collaboration in genomic analysis. The sensitive nature of genomic data raises concerns about the potential for misuse and abuse if it is not properly protected. These concerns have led to a number of restrictions on how genomic data can be shared, making it difficult for researchers to access the data they need to conduct their work.
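As one illustration of how a preliminary summary statistic could be pooled across sites without exchanging raw counts, the sketch below uses additive secret sharing, a standard secure-aggregation technique. It is not the method under development by the authors, which the text does not specify; the modulus, the function names, and the allele counts are all assumptions made up for illustration.

```python
import secrets

# Public modulus (an assumption); it only needs to exceed any possible pooled count.
MODULUS = 2**61 - 1


def make_shares(value, n_parties):
    """Split a non-negative integer into n_parties additive shares mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def pooled_count(site_counts):
    """Sum the sites' private counts using only exchanged random-looking shares."""
    n = len(site_counts)
    # Row i holds the shares site i sends out; column j is what site j receives.
    share_matrix = [make_shares(count, n) for count in site_counts]
    # Each site publishes only the sum of the shares it received.
    partial_sums = [
        sum(share_matrix[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    return sum(partial_sums) % MODULUS


# Hypothetical minor-allele counts for one variant at three research sites.
site_counts = [12, 7, 30]
print(pooled_count(site_counts))  # 49
```

Each individual share is uniformly random, so a single site's count is not revealed by any one message; only the pooled total is recovered. This simulation runs all parties in one process and ignores collusion, dropouts, and network transport, which a real deployment would need to address.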

IV. Reidentification

Although large genomic datasets show great promise for revolutionizing medicine, many studies point out that ensuring the anonymity of individuals whose data are included is not as straightforward as it may seem. The human genome is the utmost personal identifier, and sharing genomic data for research while preserving the privacy of individuals has long challenged many fields (e.g., medicine, bioinformatics, computer science, law, and ethics) because of possibly dire ethical, monetary, and legal consequences. Reidentification can be carried out in several ways: (1) triangulation of publicly available data (e.g., voter records) with open research data (e.g., Personal Genome Project profiles) [7], (2) matching high-quality three-dimensional (3D) face maps [13], (3) inference from kinship relationships [12], and (4) exploiting the rarity of variants [14]. Reidentification of an individual jeopardizes their privacy and can have dire ethical consequences, such as discrimination (e.g., in employment or insurance). In addition, relatives may share genomic information with an individual who has been reidentified, which can cascade the risks throughout a kinship network. Despite such knowledge of reidentification risks, significant research gaps remain, due in part to the influx of automated technologies for assessing privacy risk, which are often developed without the backbone of an ethical framework to ensure equitable distribution of protections and benefits. We propose to address this gap using an interdisciplinary approach integrating bioethical principles (autonomy, non-maleficence, beneficence, respect for persons, and equity) with data science techniques to mitigate privacy risk challenges.
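The triangulation route described above depends on how many records share the same combination of quasi-identifiers. The following is a minimal sketch of that uniqueness check, in the spirit of k-anonymity; the field names (`zip`, `birth_year`, `sex`) and the four profiles are invented for illustration and are not taken from any cited dataset or from the authors' algorithms.

```python
from collections import Counter


def equivalence_class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def unique_records(records, quasi_identifiers):
    """Return records whose quasi-identifier combination appears exactly once."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return [
        r for r in records
        if sizes[tuple(r[q] for q in quasi_identifiers)] == 1
    ]


# Hypothetical, made-up profiles; field names are assumptions for illustration.
profiles = [
    {"id": "P1", "zip": "44106", "birth_year": 1970, "sex": "F"},
    {"id": "P2", "zip": "44106", "birth_year": 1970, "sex": "F"},
    {"id": "P3", "zip": "44106", "birth_year": 1982, "sex": "M"},
    {"id": "P4", "zip": "44120", "birth_year": 1970, "sex": "F"},
]
qi = ["zip", "birth_year", "sex"]

at_risk = [r["id"] for r in unique_records(profiles, qi)]
k = min(equivalence_class_sizes(profiles, qi).values())
print(at_risk, k)  # ['P3', 'P4'] 1
```

Records that are unique on their quasi-identifiers (here P3 and P4) are the ones an external list with the same fields could single out. The smallest class size `k` summarizes the worst case for the dataset as a whole.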

V. Ethical Framework

In 2018, Claw and colleagues presented an ethical framework informed by community-based participatory research approaches to engage Indigenous people and communities in genomic research [16]. We selected this framework to guide our work because Indigenous cultures emphasize collectivism, which may help us prioritize the needs of networks of people, such as groups or families, and better understand ways to prevent harm. In addition, some Indigenous communities have been excluded from or faced barriers to participation due to lack of study transparency, research malpractice, or subsequent misuse of genomic data. There are cultural and political sensitivities related to appropriate data use, which should be taken into account to reduce harms and rebuild trust. Claw’s ethical framework extends beyond the current US federal requirements for biomedical and behavioral research. It includes six principles: (1) understand existing regulations, (2) foster collaboration, (3) build cultural competency, (4) improve transparency, (5) support capacity, and (6) disseminate research findings. The goals of the framework are to build trust, increase inclusion of diverse groups in genomic research, and enhance ethical research practices that promote community oversight (e.g., tribal oversight and consultation) and benefits to participants and their communities.

Gerido’s previous work includes community-based participatory research to identify patients’ privacy concerns related to genetic testing for susceptibility to hereditary cancer syndromes [15]. Patients described concerns related to communicating with family members, privacy, and the subsequent use of their genetic data by researchers and third parties. Ayday has developed algorithms to assess reidentification risks by evaluating research study designs, environments, and population characteristics. We come together to (1) capture researcher and IRB reviewer perspectives on these algorithmic reidentification risk mitigation strategies for genomic research and (2) develop a conceptual framework for leveraging reidentification risk assessment technologies to address hereditary cancer patient priorities. We hypothesize that hereditary cancer patients and their families will have varying priorities and concerns that have not yet been incorporated into the privacy risk assessment algorithms, and that scientists will require specific instructions and risk communications to address these concerns when seeking IRB approval.

A. Perceptions of Privacy Risk Mitigation Strategies for Hereditary Cancer Patients

Using the resources of the Case Western Reserve University Comprehensive Cancer Center, we will conduct focus group interviews with geneticists, genetic counselors, scientists, patient advocates, and IRB members involved in hereditary cancer research. The focus groups will explore the range of individual opinions and experiences related to privacy risk mitigation strategies. Questions will include:

  1. Have you participated in or reviewed a GWAS within the last 5 years?

  2. How would you describe the potential reidentification risks for hereditary cancer patients who participate in GWAS research?

  3. What features of the privacy preserving technology should address the privacy considerations that are unique to hereditary cancer patients?

  4. Would your response change based on type of cancer?

  5. Are you more likely to participate in future research that employs privacy preserving technology?

  6. What concerns do you have?

We will use Claw’s framework to organize the feedback collected during the focus groups. Using the framework in this way provides reassurance to communities that have been underserved by genomic research and has the potential to expand protections for individuals participating in genomic research and clinical genetic testing.

B. Guidance and Feature Recommendations for Algorithms Assessing Reidentification Risk for Hereditary Cancer Syndromes

We will map insights from the family focus groups to technical requirements for enhancing the algorithm and propose updates to the existing policies and guidance for genomic data sharing practices. Input measures such as study sample size, rarity of variants or disease, phenotypic associations, and other potentially identifiable characteristics of the study participants will be considered. For instance, the privacy risk will be higher for small studies that include only underrepresented populations. Output measures to be considered relate to the return of results and dissemination of the findings (with particular attention to the specificity of the statistical analyses and data that are shared).
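To make the role of the input measures concrete, the sketch below shows how sample size and variant rarity might be combined into a simple small-cell flag. It is a hypothetical heuristic, not the authors' risk assessment algorithm: the function names and the threshold of 5 expected carriers are assumptions, and the general-population prevalences quoted in the Introduction (1 in 139 for HBOC, 1 in 279 for LS) are used only as example inputs, even though a real hereditary cancer cohort would likely be enriched for carriers.

```python
def expected_carriers(sample_size, carrier_prevalence):
    """Expected number of carriers in a study, assuming independent sampling."""
    return sample_size * carrier_prevalence


def flag_small_cell(sample_size, carrier_prevalence, threshold=5.0):
    """Flag a study when so few carriers are expected that they could be singled out.

    The threshold is a hypothetical cutoff chosen for illustration only.
    """
    return expected_carriers(sample_size, carrier_prevalence) < threshold


# Prevalence estimates quoted in the Introduction, used here as example inputs.
PREVALENCE = {"HBOC": 1 / 139, "LS": 1 / 279}

for n in (500, 5000):
    for syndrome, prevalence in PREVALENCE.items():
        carriers = expected_carriers(n, prevalence)
        flagged = flag_small_cell(n, prevalence)
        print(f"n={n} {syndrome}: expected carriers={carriers:.1f} flagged={flagged}")
```

Under these assumptions, a 500-person study expects roughly 3.6 HBOC carriers and is flagged, while a 5,000-person study is not, which mirrors the observation that smaller studies carry higher privacy risk. Other input measures named above, such as phenotypic associations, are not modeled here.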

VI. Position

Health informatics and data science need to incorporate ethical frameworks during design and development to adequately address privacy concerns related to human genome data sharing. We hypothesize that when researchers and reviewers receive tailored risk communications, they will be empowered to improve their study designs and feedback to scientists seeking IRB approval.

Figure 1: Claw’s ethical framework for enhancing genomic research with Indigenous communities.

TABLE I. Six principles for engaging in ethical research.

Key Considerations                  | Significance
1. Understand existing regulations  | Respect
2. Foster collaborations            | Reciprocity
3. Build cultural competency        | Respect - persons, community traditions, knowledge, and values
4. Improve transparency             | Beneficence
5. Support capacity                 | Beneficence, equity
6. Disseminate research findings    | Beneficence, equity, respect for persons

Acknowledgment

This research is supported by the National Library of Medicine (1R01LM014520-01A1, Accelerating Genomic Data Sharing and Collaborative Research with Privacy Protection). The views expressed are our own and do not reflect an endorsement from the NIH.

Contributor Information

Lynette Hammond Gerido, Department of Bioethics, Case Western Reserve University School of Medicine, Cleveland, Ohio, US.

Erman Ayday, Department of Computer and Data Sciences, Case Western Reserve University, Cleveland, Ohio, US.

References

  • [1] Abul-Husn NS et al., “Exome sequencing reveals a high prevalence of BRCA1 and BRCA2 founder variants in a diverse population-based biobank,” Genome Medicine, vol. 12, no. 1, p. 2, Dec. 2019, doi: 10.1186/s13073-019-0691-1.
  • [2] Maratt JK and Stoffel E, “Identification of Lynch Syndrome,” Gastrointestinal Endoscopy Clinics, vol. 32, no. 1, pp. 45–58, Jan. 2022, doi: 10.1016/j.giec.2021.09.002.
  • [3] Ballard LM, Band R, and Lucassen AM, “Interventions to support patients with sharing genetic test results with at-risk relatives: a synthesis without meta-analysis (SWiM),” Eur J Hum Genet, vol. 31, no. 9, pp. 988–1002, Sep. 2023, doi: 10.1038/s41431-023-01400-1.
  • [4] Dimond R, Doheny S, Ballard L, and Clarke A, “Genetic testing and family entanglements,” Social Science & Medicine, vol. 298, p. 114857, Apr. 2022, doi: 10.1016/j.socscimed.2022.114857.
  • [5] Katz S, Hawley ST, Tocco R, and Kurian AW, “A pilot study to increase cascade genetic risk education and testing in families with hereditary cancer syndromes,” JCO, vol. 40, no. 28_suppl, pp. 378–378, Oct. 2022, doi: 10.1200/JCO.2022.40.28_suppl.378.
  • [6] Memorial Sloan Kettering Cancer Center, “Genetic Counseling and Genetic Testing for Hereditary Cancer at MSK: The PROMPT Study | Memorial Sloan Kettering Cancer Center.” Accessed: Apr. 22, 2024. [Online]. Available: https://www.mskcc.org/cancer-care/risk-assessment-screening/genetic-counseling-and-testing/clinical-trials/prompt-study
  • [7] Sweeney L, Abu A, and Winn J, “Identifying Participants in the Personal Genome Project by Name (A Reidentification Experiment).” arXiv, Apr. 29, 2013. doi: 10.48550/arXiv.1304.7605.
  • [8] Venkatesaramani R, Malin BA, and Vorobeychik Y, “Reidentification of individuals in genomic datasets using public face images,” Science Advances, vol. 7, no. 47, p. eabg3296, Nov. 2021, doi: 10.1126/sciadv.abg3296.
  • [9] Sanderson SC et al., “Motivations, concerns and preferences of personal genome sequencing research participants: Baseline findings from the HealthSeq project,” European Journal of Human Genetics, vol. 24, no. 1, Art. no. 1, Jan. 2016, doi: 10.1038/ejhg.2015.118.
  • [10] McGuire AL et al., “To Share or Not to Share: A Randomized Trial of Consent for Data Sharing in Genome Research,” Genet Med, vol. 13, no. 11, Art. no. 11, Nov. 2011, doi: 10.1097/GIM.0b013e3182227589.
  • [11] Duenas DM et al., “Motivations and concerns of patients considering participation in an implementation study of a hereditary cancer risk assessment program in diverse primary care settings,” Genetics in Medicine, vol. 24, no. 3, pp. 610–621, Mar. 2022, doi: 10.1016/j.gim.2021.11.017.
  • [12] Ayoz K, Ayday E, and Cicek AE, “Genome Reconstruction Attacks Against Genomic Data-Sharing Beacons,” Proc Priv Enhanc Technol, vol. 2021, no. 3, pp. 28–48, 2021, doi: 10.2478/popets-2021-0036.
  • [13] Durmaz B and Ayday E, “Entering Watch Dogs: Evaluating Privacy Risks Against Large-Scale Facial Search and Data Collection,” in IEEE INFOCOM 2021 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), May 2021, pp. 1–6. doi: 10.1109/INFOCOMWKSHPS51825.2021.9484550.
  • [14] Hansson MG et al., “The risk of re-identification versus the need to identify individuals in rare disease research,” Eur J Hum Genet, vol. 24, no. 11, Art. no. 11, Nov. 2016, doi: 10.1038/ejhg.2016.52.
  • [15] Gerido LH et al., “Big advocacy, little recognition: the hidden work of Black patients in precision medicine,” J Community Genet, Sep. 2023, doi: 10.1007/s12687-023-00673-9.
  • [16] Claw KG, Anderson MZ, Begay RL, Tsosie KS, Fox K, and Garrison NA, “A framework for enhancing ethical genomic research with Indigenous communities,” Nat Commun, vol. 9, no. 1, Art. no. 1, Jul. 2018, doi: 10.1038/s41467-018-05188-3.
