Abstract
The rapidly growing scale and variety of biomedical data repositories raise important privacy concerns. Conventional frameworks for collecting and sharing human subject data offer limited privacy protection, often necessitating the creation of data silos. Privacy-enhancing technologies (PETs) promise to safeguard these data and broaden their usage by providing means to share and analyze sensitive data while protecting privacy. Here, we review prominent PETs and illustrate their role in advancing biomedicine. We describe key use cases of PETs and their latest technical advances, and highlight recent applications of PETs in a range of biomedical domains. We conclude by discussing outstanding challenges and social considerations that need to be addressed to facilitate a broader adoption of PETs in biomedical data science.
Keywords: biomedical data privacy, genomic privacy, privacy-enhancing technologies, secure computation, data sharing, collaborative studies
1. Introduction
Data sharing is a vital force in biomedical innovation. Public data repositories and biobanks allow researchers at various organizations to analyze vast arrays of human subject data beyond what they could collect themselves. Many academic labs, commercial enterprises, and hospitals have joined collaborative consortia to share biomedical data, hoping to extract insights that are inaccessible to individual entities because of limited data sizes. Policies and guidelines that promote the public dissemination of research data, established by government entities (e.g., the NIH Data Management and Sharing Policy1) and by international standard-setting organizations such as the Global Alliance for Genomics and Health (GA4GH) (1), have played pivotal roles in preserving the culture of data sharing in the biomedical community, a tradition rooted in landmark collaborative efforts such as the Human Genome Project.
As we enter the era of personalized medicine, broader sharing of biomedical data is becoming more essential than ever. The limited diversity of human populations represented by existing biomedical datasets has reinforced inequities in how different groups benefit from biomedical advances (2). Studying rare diseases often requires merging small patient cohorts across organizations to enhance statistical power (3). Furthermore, accurately inferring health-related insights for each unique individual requires access to computational models trained on large and multi-modal datasets capturing the wide spectrum of individual variation in health and disease. Although recently created biobanks (e.g., the All of Us Research Program (4)) have taken a significant step toward recruiting diverse study participants, these resources are increasingly stored within siloed computing environments, limiting the scope and use of these datasets.
To expand data sharing efforts in biomedicine further, growing concerns about privacy risks must be addressed with robust mitigation measures. In the absence of such measures, we may see greater dependence on restricted data silos, a trend compounded by the recent surge of more stringent privacy regulations (e.g., the General Data Protection Regulation [GDPR] in the European Union) and by the escalating risks that accompany the increasing scale of biomedical data. Moreover, a major data breach could erode public trust in the scientific enterprise. Such a loss of trust could not only hinder efforts to gather large datasets but also worsen inequities by disproportionately reducing the willingness of certain populations to participate in studies.
Privacy-enhancing technologies (PETs) offer promising technical solutions to these challenges, employing a variety of mathematical, algorithmic, and hardware design approaches to enable the sharing and analysis of sensitive data while protecting privacy (Figure 1). PETs encompass a broad range of techniques that address different data sharing scenarios and involve different trade-offs in the types of analyses supported, computational cost, and the degree of privacy protection offered. In this review, we focus on the technologies most widely studied in the literature: homomorphic encryption (HE), secure multiparty computation (MPC), trusted execution environments (TEEs), differential privacy (DP), and federated learning (FL). As we illustrate in this review, recent advances have greatly increased the applicability of each of these technologies in biomedicine. Unlike existing reviews that describe PETs as a potential solution to data-sharing challenges in biomedicine (5, 6, 7, 8, 9), we provide an accessible summary of the latest advances in PETs, examining both their technical foundations and their biomedical applications.
Figure 1. Privacy-enhancing technologies (PETs) provide a range of mathematical, algorithmic, and hardware-based solutions to enable the sharing and analysis of sensitive biomedical data in various settings while protecting data privacy.
The rest of our article is organized as follows. Section 2 presents background on biomedical data privacy, covering its historical context and key concepts. Section 3 outlines the scenarios in biomedical research that involve data sharing. Section 4 examines each PET in turn, giving an overview of its techniques, recent advances, and limitations, and surveying recent publications that apply it to biomedical tasks. Section 5 reviews related techniques that help facilitate the sharing of biomedical data. Finally, in Section 6, we conclude by discussing open challenges and highlighting key directions for future work.
2. Biomedical Data Privacy: Challenges and Existing Safeguards
Data privacy challenges in biomedicine have continually evolved over the past several decades, shaped by technological advances, increasing public awareness, and changes in policies and laws. From the 1960s to the 1980s, the biomedical community saw the establishment of ethical principles in human subject research, exemplified by documents such as the Belmont Report (10). At the same time, concerns about patient privacy increased with the digitization of medical records. The 1990s saw the establishment of the Common Rule and the Health Insurance Portability and Accountability Act (HIPAA) (11), marking initial efforts to create legal frameworks for safeguarding biomedical data in both research and healthcare. The completion of the Human Genome Project and the rapid growth of genetic research in the 2000s intensified concerns about the protection of human subjects and their genetic privacy. The 2010s witnessed a surge in privacy concerns in broader societal contexts, fueled by the rise of social media, major data breaches, and controversy surrounding government surveillance (12, 13). The international community responded to these concerns by strengthening oversight of the collection, sharing, and use of personal information, for example, through the GDPR, enacted in 2018. More recent examples include Arizona and California’s 2021 genetic privacy laws (14), which strengthened privacy requirements for storing and sharing genetic data, as well as NIST’s Genomic Cybersecurity initiative, which called for new security standards in genomics (15). Currently, privacy concerns persist and extend to new biomedical domains, such as digital health (e.g., electronic health records, mobile applications, and wearable devices), multi-omics, and epidemic responsiveness in the wake of the COVID-19 pandemic (16). The growing number of large-scale biobanks and data repositories further amplifies these challenges.
The risks posed by biomedical data breaches are multifaceted. Although individuals have different notions of an acceptable privacy risk, unauthorized exposure of private information related to one’s biology and health may result in emotional distress, stigmatization, and discrimination in employment, education, and insurance opportunities. Perhaps more concerning, these harms could extend to families and demographic groups, for example, by disclosing genetic relationships between individuals or elevated health risks within groups, respectively. For organizations that manage sensitive biomedical data, breaches can lead to financial penalties, legal consequences, operational disruptions, and reputational damage. These risks are not merely hypothetical: as of this writing, a lawsuit has been filed against 23andMe over a data breach that compromised the data of nearly 1 million customers, including full names, birthdates, and DNA profiles, which were being sold on the dark web for up to $10 per individual (17). Repeated leaks of this kind would erode trust in the biomedical research enterprise and health systems, further setting back future research efforts and impeding scientific progress.
Biomedical privacy breaches can occur through multiple routes, each posing unique challenges (8, 18). Phenotype inference aims to deduce an individual’s traits or health conditions from different types of biomedical data, often in an unexpected manner. Reidentification occurs when inadequately anonymized data is linked back to an individual. This may involve combining multiple datasets, utilizing auxiliary information, or exploiting vulnerabilities in the de-identification process. Data linkage integrates information from different datasets to construct a more comprehensive profile of an individual, allowing attackers to reveal identity, health status, or other sensitive information about the individual. Even when the complete dataset including individual-level information is not directly shared, privacy may still be breached: data reconstruction attacks attempt to assemble fragments of available information to reveal a portion of the original data; and membership inference attacks focus on determining whether an individual is part of a specific dataset, which can disclose whether the person belongs to a sensitive or stigmatized group, potentially leading to privacy violations.
Conventional approaches for protecting the privacy of biomedical data include policies and laws, technical security measures, and contractual agreements. Legal and financial penalties serve as deterrents to misuse of biomedical information; both HIPAA and GDPR prescribe such penalties for non-compliance. Standard practice for securing biomedical data involves encrypting data at rest, using secure computing infrastructure, and applying de-identification strategies (19). Access control and user authentication mechanisms are also widely used to ensure that only authorized individuals can access sensitive data. Furthermore, researchers and organizations that share access to sensitive data typically establish data use agreements (DUAs) to define the permitted use of the data and guidelines for data management. Similarly, business associate agreements (BAAs), which are contracts between HIPAA-covered entities (e.g., hospitals) and their business associates (e.g., third-party service providers or collaborators), are an important tool for ensuring that entities that handle protected health information (PHI) comply with data protection standards.
Although these approaches offer useful safeguards for biomedical data, they do not eliminate the possibility of data breaches or re-identification, and they achieve security primarily by limiting access to data, which has left many datasets isolated and beyond the reach of most researchers. In addition, the ability to detect breaches and enforce penalties can be limited in practice, and the legal and policy standards that determine what data qualify as private or sufficiently de-identified for sharing lack clear definitions. As a result, many existing datasets are either shared insecurely on the basis of trust or deemed ineligible for sharing due to privacy risks.
3. Data Sharing Scenarios and Limitations
Privacy risks and data-sharing constraints vary across different scenarios in biomedical research and practice. Here, we outline the typical data sharing scenarios (as illustrated in Figure 1) as well as the challenges faced by stakeholders in each context.
Study participation.
A private individual may voluntarily contribute health data and other personal information to studies (either clinical or non-clinical), data repositories, and third-party services. Participants typically provide informed consent, which outlines the details of data sharing, including the purpose, scope, and potential risks involved. Data sharing benefits participants by improving the collective understanding of health and disease, which could lead to better treatments or other health-related decisions. In the context of some services, participants may also receive personalized data insights. Privacy concerns are a key factor in an individual’s decision to participate in a study (20). The adequacy of informed consent remains a topic of ongoing debate, particularly with regard to the ethics of obtaining broad consent for the secondary use of data (21). Communicating privacy risks effectively can also be challenging, as some risks require technical expertise to understand fully or are poorly understood even by researchers.
Queryable databases.
These databases are designed to let users retrieve information through queries, providing a structured and efficient way to access biomedical data. Examples include patient registries used for study recruitment and databases of annotated genetic variants (22), electronic health records (EHRs) (23), and clinical trial data (24). Although restricting the form of queries limits information leakage, studies have shown that unintended disclosures can still occur (25). Such concerns may discourage individuals from contributing to these databases and pose data management challenges for database providers.
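One classic way an aggregate-only interface can leak information is a differencing attack. The minimal sketch below uses an entirely synthetic table and a hypothetical `count_query` interface (it is not necessarily the attack studied in (25)) to show how two individually innocuous counts, combined with auxiliary knowledge, reveal one person's status.

```python
# Entirely synthetic records, for illustration only.
patients = [
    {"id": "p1", "age": 34, "hiv_positive": False},
    {"id": "p2", "age": 41, "hiv_positive": True},
    {"id": "p3", "age": 52, "hiv_positive": True},
    {"id": "p4", "age": 29, "hiv_positive": False},
]

def count_query(predicate):
    """Aggregate-only interface: returns how many records match, never the records."""
    return sum(1 for row in patients if predicate(row))

# Auxiliary knowledge: the attacker knows the target is the only 52-year-old enrolled.
total = count_query(lambda r: r["hiv_positive"])
without_target = count_query(lambda r: r["hiv_positive"] and r["age"] != 52)

# The difference of two "harmless" counts is exactly the target's status.
target_is_positive = (total - without_target) == 1
print(target_is_positive)  # True
```

Neither query returns individual-level data, yet their difference isolates a single record; defenses such as the DP mechanisms discussed in Section 4.4 are designed to bound exactly this kind of leakage.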
Analytic services.
This refers to the practice of delegating computational tasks to external entities that have access to restricted models, data resources, or greater computing resources. The increasing computational demands of biomedical analysis workflows are leading researchers to rely increasingly on such third-party services (26), many of which are hosted in cloud environments. However, privacy concerns or regulations can limit the use of these services (e.g., an EU researcher wishing to upload data to a service operating in a non-EU country). Moreover, service providers may need to protect the auxiliary models and data used by the server, which could be leaked through the analysis results returned to users (27).
Collaborative studies.
There is an increasing need for researchers from different institutions or countries to collaborate on shared research goals by combining their data to obtain a more complete understanding of the biomedical phenomenon of interest. This scenario often involves the establishment of consortia focused on specific health conditions or research areas. However, organizations may need to comply with policies that limit or prohibit external sharing of data, a problem that is exacerbated when entities operate in different regulatory environments or countries.
Public data release.
This involves making biomedical datasets and analysis results openly accessible to the broader scientific community and, in some cases, the public. This practice supports the transparency and reproducibility of scientific studies and promotes collaborative efforts such as data science competitions (e.g., Kaggle2). It also extends the utility of collected datasets by allowing researchers around the world to reanalyze existing data. However, releasing a dataset containing individual-level information poses a substantial privacy risk and is feasible only in rare circumstances. Care must also be taken when releasing simulated or redacted datasets derived partially from private data, as these may still leak private information.
Public health monitoring.
The COVID-19 pandemic motivated the design of public health systems capable of monitoring disease outbreaks and facilitating responses (e.g., exposure notification apps). The effective operation of these systems may require the collection of a broad range of personal information beyond health status, including demographic, geolocation, and social activity data. In addition, sharing these data across jurisdictions may be essential for a more accurate understanding of infectious agents. However, the possibility of harming individuals due to the disclosure of private information remains a significant concern (16), which can prevent the widespread adoption of these systems.
4. Privacy-Enhancing Technologies and Their Biomedical Applications
Privacy-enhancing technologies (PETs) represent a collection of computational techniques to safeguard sensitive biomedical data. Collectively, these technologies enable the development of privacy-by-design methods for sharing and analyzing biomedical data. These improve both the privacy and utility of biomedical data beyond what is feasible given existing security practices and contractual safeguards, such as data use agreements. We describe each technology in detail and discuss recent methodological advances and applications within the biomedical domain.
4.1. Secure Multiparty Computation (MPC)
MPC allows multiple parties to jointly perform computations on their private inputs without revealing those inputs to one another. The two main techniques used for MPC are garbled circuits (GCs) and secret sharing (SS).
First introduced by Yao in 1986 (28), a GC enables two parties, each holding a private input, to securely evaluate a function represented as a Boolean circuit. One party (the garbler) replaces the input and output values of each logical gate with random labels and encrypts each gate’s truth table, so that the other party (the evaluator) gains no information about intermediate values during circuit evaluation. The evaluator obtains the labels corresponding to its own input through the cryptographic primitive of oblivious transfer (OT), which reveals only the requested labels to the evaluator without revealing its choice to the garbler; the evaluator can then evaluate the circuit without either party seeing the other’s raw input. Although circuit size grows rapidly with the complexity of the analysis task, often leading to high communication and computational costs, several enhancements (29, 30, 31, 32) have improved the efficiency of these schemes (33, 34, 35).
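To make the garbling step concrete, here is a minimal sketch of a single garbled AND gate. It assumes SHA-256 as the key-derivation function and uses zero-byte padding so the evaluator can recognize the one row it can decrypt. It simulates oblivious transfer by handing the evaluator its label directly and omits standard optimizations such as point-and-permute and free-XOR, so it illustrates the idea rather than a secure implementation.

```python
import hashlib
import os
import secrets

LABEL_LEN = 16           # bytes per wire label
TAG = bytes(16)          # zero padding that marks a correct decryption

def kdf(label_a: bytes, label_b: bytes) -> bytes:
    # 32-byte one-time pad derived from the two input labels
    return hashlib.sha256(label_a + label_b).digest()

def xor(x: bytes, y: bytes) -> bytes:
    return bytes(i ^ j for i, j in zip(x, y))

def new_wire():
    # One random label per bit value; a label by itself says nothing about its bit.
    return {0: os.urandom(LABEL_LEN), 1: os.urandom(LABEL_LEN)}

def garble_and(wire_a, wire_b, wire_out):
    """Garbler: encrypt the correct output label under each pair of input labels."""
    table = [
        xor(kdf(wire_a[a], wire_b[b]), wire_out[a & b] + TAG)
        for a in (0, 1)
        for b in (0, 1)
    ]
    secrets.SystemRandom().shuffle(table)   # hide which row belongs to which inputs
    return table

def evaluate(table, label_a, label_b):
    """Evaluator: with one label per input wire, exactly one row decrypts correctly."""
    pad = kdf(label_a, label_b)
    for row in table:
        candidate = xor(pad, row)
        if candidate[LABEL_LEN:] == TAG:
            return candidate[:LABEL_LEN]
    raise ValueError("no row decrypted correctly")

wire_a, wire_b, wire_out = new_wire(), new_wire(), new_wire()
table = garble_and(wire_a, wire_b, wire_out)

garbler_bit, evaluator_bit = 1, 1
# In a real protocol the evaluator obtains wire_b[evaluator_bit] via oblivious
# transfer, so the garbler never learns evaluator_bit; here it is simply handed over.
out_label = evaluate(table, wire_a[garbler_bit], wire_b[evaluator_bit])
decode = {wire_out[0]: 0, wire_out[1]: 1}   # revealed by the garbler at the end
print(decode[out_label])  # 1
```

A full circuit chains such gates, feeding output labels of one gate into the next, which is why circuit size directly drives communication cost.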
SS schemes (36, 37) allow a group of parties to jointly encode a private number by dividing it into random shares, each held by a different party. The private number can be reconstructed only when a predefined number of shares are combined. For example, in additive SS schemes, which are the most commonly used in practice, secret shares are random elements of a ring (an algebraic structure exemplified by the integers modulo a prime number) that add up to the private value. Thus, all parties’ shares must be combined to reveal the secret; any proper subset of the shares reveals no information about it. To securely add two secret-shared numbers, x and y, each party adds its own shares of x and y, obtaining shares of x+y without any communication. Secure multiplication requires interaction between parties (38) but preserves the confidentiality of the private inputs by masking the values exchanged. Other operations, such as division, square root, and comparison, are built from addition, multiplication, and special routines that exploit the bitwise representation of private values. These operations can be combined to securely perform various analyses on private data held by multiple parties.
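The additive scheme described above can be sketched in a few lines. This is a single-process simulation under stated assumptions: shares live in the integers modulo the prime 2^61 - 1 (an arbitrary choice), and multiplication uses Beaver triples, one widely used interactive technique, supplied by a hypothetical trusted dealer rather than a real offline phase.

```python
import secrets

P = 2**61 - 1  # prime modulus; shares are elements of the ring Z_P

def share(x, n_parties=3):
    """Split x into n random shares that sum to x mod P."""
    shares = [secrets.randbelow(P) for _ in range(n_parties - 1)]
    shares.append((x - sum(shares)) % P)
    return shares

def reconstruct(shares):
    return sum(shares) % P

def add(xs, ys):
    # Local: each party adds its own two shares; no communication needed.
    return [(a + b) % P for a, b in zip(xs, ys)]

def beaver_triple(n_parties):
    # Assumption: a trusted dealer hands out shares of random a, b and c = a*b.
    a, b = secrets.randbelow(P), secrets.randbelow(P)
    return share(a, n_parties), share(b, n_parties), share(a * b % P, n_parties)

def mul(xs, ys):
    a, b, c = beaver_triple(len(xs))
    # Interaction: parties open the masked values d = x - a and e = y - b,
    # which reveal nothing about x and y because a and b are uniformly random.
    d = reconstruct([(x - ai) % P for x, ai in zip(xs, a)])
    e = reconstruct([(y - bi) % P for y, bi in zip(ys, b)])
    # x*y = c + d*b + e*a + d*e; the public term d*e is added by one party only.
    zs = [(ci + d * bi + e * ai) % P for ci, ai, bi in zip(c, a, b)]
    zs[0] = (zs[0] + d * e) % P
    return zs

xs, ys = share(42), share(17)
print(reconstruct(add(xs, ys)), reconstruct(mul(xs, ys)))  # 59 714
```

Addition is free, while each multiplication costs one round of opening values, which is the source of the round complexity discussed below.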
Many frameworks and compilers have been developed to ease the implementation of MPC algorithms leveraging various building block protocols (39). Hybrid schemes (40, 41) and compilers (39, 33, 42), which combine different MPC methodologies to improve efficiency, have also been proposed. For example, ABY (42) proposed switching between GCs and different types of SS (integer or Boolean) to perform each operation in the domain where it is most efficient (e.g., evaluating comparisons in a two-party setting using GCs or evaluating multiplexors or other bitwise operations with more than two parties using Boolean SS). These frameworks have been extended and optimized for applications in machine learning (ML), such as training and inference of neural network models (43, 44, 45, 46). Recent enhancements of core operations such as secure comparison (47) have further improved the performance and versatility of MPC frameworks.
The primary limitation of MPC is its substantial communication cost. While GCs allow most of the computation to be performed non-interactively by transferring the entire circuit in a single round of communication, the circuit is typically limited to Boolean operations, and its size can become impractically large for sophisticated numerical calculations. SS generally offers greater analytic flexibility and efficiency than GCs, but SS-based MPC typically requires many rounds of interaction for complex tasks, a potential bottleneck in limited communication settings (e.g., wide-area networks with large round-trip delays). Furthermore, the requirement that the entire input dataset be secret-shared among the parties can be a hurdle for large-scale biomedical datasets.
Several recent works have developed MPC protocols for a range of analysis tasks in biomedicine (48, 49, 50, 51, 52, 53). A common goal of these works is to improve the efficiency of MPC by redesigning the analysis task in a way that is more amenable to efficient computation using MPC operations. For example, Cho et al. (48) introduced a generalization of SS techniques aimed at minimizing redundant computation, which led to an efficient algorithm for genome-wide association studies (GWAS), involving sophisticated linear algebra tasks such as principal component analysis (PCA). This work was extended to address collaborative prediction of drug-target interactions using a neural network model (54). Jagadeesh et al. (50) used GCs to efficiently perform Boolean operations (such as set intersection and difference) to identify genetic variants of interest in patient genomes. Von Maltitz et al. introduced an MPC protocol for survival analysis based on the Kaplan-Meier estimator (55). A different approach was taken by Smajlovic et al. (56), who developed a Python-based compiler that transforms high-level analysis code into MPC executables, incorporating automated optimization based on static code analysis. Such tools can help accelerate the development of MPC applications for various biomedical tasks by making the techniques more accessible to biomedical practitioners.
4.2. Homomorphic Encryption (HE)
HE refers to a form of encryption that allows computations to be performed directly on encrypted data. Earlier schemes, such as RSA (57), ElGamal (58), Paillier (59), and Goldwasser-Micali (60), are partially homomorphic, supporting only a single type of operation (e.g., multiplication for RSA and ElGamal, addition for Paillier, and XOR for Goldwasser-Micali). Somewhat or leveled HE (SWHE) schemes support both additions and multiplications, but only up to a limited number of operations. In 2009, Gentry (61) introduced the first construction of a fully HE (FHE) scheme, which allows arbitrary arithmetic computations through a bootstrapping technique that refreshes a ciphertext (encrypted data) to support additional operations. To address the limited concrete efficiency of Gentry’s initial scheme, which required several minutes of runtime for each bit operation (62), several schemes were later proposed (63, 64, 65, 66) to reduce the overall computational cost of FHE and thereby enable its use in practical applications.
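As a concrete example of a partially homomorphic scheme, textbook (unpadded) RSA is multiplicatively homomorphic: the product of two ciphertexts decrypts to the product of the plaintexts. The sketch below uses tiny, insecure toy parameters purely for illustration; practical RSA uses padding, which deliberately removes this property.

```python
# Textbook RSA with tiny, insecure parameters (illustration only).
p, q = 61, 53
n = p * q                  # public modulus, 3233
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent
d = pow(e, -1, phi)        # private exponent (modular inverse; Python 3.8+)

def encrypt(m):
    return pow(m, e, n)

def decrypt(c):
    return pow(c, d, n)

c1, c2 = encrypt(7), encrypt(11)
c_prod = (c1 * c2) % n     # multiply ciphertexts without decrypting them
print(decrypt(c_prod))     # 77
```

There is no corresponding way to add two RSA ciphertexts, which is exactly the single-operation limitation of such schemes.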
Akin to standard encryption schemes, the security of HE rests on the hardness of well-studied mathematical problems. Many HE schemes are based on the Ring Learning with Errors (RLWE) problem (67, 68), a lattice-based problem in which the goal is to distinguish whether a set of ring elements was constructed using a common secret element or sampled uniformly at random. This problem is believed to be extremely difficult to solve without knowing the secret (its hardness is supported by reductions from worst-case lattice problems) but is easy with it, which translates into the guarantee that an entity can decrypt a ciphertext only if it knows the decryption key. To maintain the difficulty of this problem, random noise must be introduced into the ciphertext, and this noise grows with each homomorphic operation. Unlike schemes that perform computation exactly at the expense of restricting the range of encoded values (63, 64, 65), the CKKS scheme (66) adds noise directly to the data values, enabling efficient operations at a small loss in precision. CKKS has been widely adopted in scientific applications where a small amount of noise can be tolerated. In all RLWE-based schemes, a single ciphertext encodes multiple values, and homomorphic operations such as addition and multiplication are performed simultaneously on all values in a ciphertext, a property known as single-instruction, multiple-data (SIMD). Exploiting this property can improve the scalability of these schemes.
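The role of noise can be seen in a toy symmetric scheme based on plain LWE, the non-ring variant of the problem, chosen here for brevity. In this sketch, which assumes small, insecure parameters and encrypts single bits, adding ciphertexts homomorphically XORs the bits while their noise terms accumulate; decryption stays correct only while the accumulated noise stays below Q/4. It does not implement RLWE, SIMD packing, or multiplication.

```python
import random

Q = 2**16      # ciphertext modulus
N = 32         # dimension of the secret vector
B = 4          # fresh noise is drawn uniformly from [-B, B]

rng = random.Random(0)   # fixed seed for reproducibility; NOT cryptographically secure
secret = [rng.randrange(Q) for _ in range(N)]

def encrypt(bit):
    a = [rng.randrange(Q) for _ in range(N)]
    e = rng.randint(-B, B)
    b = (sum(x * s for x, s in zip(a, secret)) + e + bit * (Q // 2)) % Q
    return a, b

def add(ct1, ct2):
    # Component-wise sum: encrypts bit1 XOR bit2, with noise e1 + e2.
    (a1, b1), (a2, b2) = ct1, ct2
    return [(x + y) % Q for x, y in zip(a1, a2)], (b1 + b2) % Q

def phase(ct):
    a, b = ct
    return (b - sum(x * s for x, s in zip(a, secret))) % Q   # = e + bit*Q/2 (mod Q)

def decrypt(ct):
    # Round to whichever of 0 or Q/2 is closer; correct while |noise| < Q/4.
    return 1 if Q // 4 <= phase(ct) < 3 * Q // 4 else 0

def noise(ct):
    # Distance from the nearest valid encoding (needs the secret; for illustration).
    p = phase(ct)
    return min(p, abs(p - Q // 2), Q - p)

bits = [1, 0, 1, 1]
ct = encrypt(bits[0])
for bit in bits[1:]:
    ct = add(ct, encrypt(bit))
print(decrypt(ct), noise(ct))   # XOR of the bits is 1; noise is at most 4 * B
```

Each addition can add up to B to the noise, so this toy budget allows thousands of additions; multiplication makes noise grow far faster, which is what bootstrapping is designed to reset.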
Notable recent developments include more efficient bootstrapping techniques (69, 70) and alternative constructions that offer a trade-off between different types of operations. For example, TFHE (71) is constructed based on the mathematical structure of the torus and permits efficient bootstrapping yet is limited to Boolean or bitwise operations. Furthermore, several HE compilers have been proposed (72) to streamline the development and optimization of HE algorithms, for example, to simplify the management of ciphertext noise. Frameworks have also been developed for the secure training of predictive ML models (73, 74).
In the biomedical domain, HE has mainly found applications in the outsourcing of computational tasks involving sensitive data. These computations may be challenging for individual users to perform because of the scale of the problem (in terms of both dataset size and computational complexity), or because of limited access to additional data or models required for the analysis. HE helps to ensure that the user’s data remains private when analysis is delegated to a third party. For example, HE-based solutions have been proposed for privately outsourcing the detection of heart conditions in ECG data (75), as well as cardiovascular risk prediction based on health records (76). Many works have tackled the computation of GWAS statistics on encrypted data, addressing a range of statistics and application settings (77, 78, 79, 80, 81). Other tasks explored in the literature include count queries on genomic and medical databases (e.g., for cohort exploration) (82, 83), detection of genetic parent-child relationships (84), and disease risk prediction using both clinical and genomic information (75, 85). Finally, Kim et al. (86) and Gürsoy et al. (87) recently illustrated secure imputation of an encrypted private genome.
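The outsourcing pattern can be illustrated with the additively homomorphic Paillier scheme: a client encrypts its features, and a server evaluates a linear risk score on the ciphertexts using only ciphertext multiplication and exponentiation by plaintext weights. The model weights, features, and small primes below are hypothetical; real deployments would use large keys and fixed-point encodings of real-valued weights, and this is not the method of any cited work.

```python
import math
import secrets

# Toy Paillier keys from small primes (insecure; real keys use much larger primes).
P, Q = 1009, 1013
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # lcm(P-1, Q-1)
MU = pow(LAM, -1, N)                               # valid because g = N + 1

def encrypt(m):
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    # (N + 1)^m = 1 + m*N (mod N^2)
    return (1 + (m % N) * N) * pow(r, N, N2) % N2

def decrypt(c):
    m = (pow(c, LAM, N2) - 1) // N * MU % N
    return m - N if m > N // 2 else m   # map back to signed integers

def he_add(c1, c2):   # Enc(m1) * Enc(m2) = Enc(m1 + m2)
    return c1 * c2 % N2

def he_scale(c, k):   # Enc(m)^k = Enc(k * m) for a plaintext integer k
    return pow(c, k % N, N2)

# Client: encrypt features and send only ciphertexts.
features = [1, 0, 3, 120]
enc_features = [encrypt(x) for x in features]

# Server: hypothetical integer-weighted linear risk model, never sees plaintexts.
weights, intercept = [20, 15, 2, 1], -50
enc_score = encrypt(intercept)
for c, w in zip(enc_features, weights):
    enc_score = he_add(enc_score, he_scale(c, w))

# Client: only the key holder can decrypt the result.
score = decrypt(enc_score)
print(score)  # 96
```

A linear score needs only additions and plaintext scalings, which is why it fits an additive scheme; non-linear models require the FHE schemes and approximations discussed next.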
These advances have brought HE-based solutions closer to meeting the requirements of biomedical applications. Nevertheless, the scope of these applications remains restricted due to several factors, including: the substantial computational overhead of homomorphic operations compared to unencrypted analysis; the need to approximate non-linear operations using additions and multiplications; and the practical limits on the complexity of the analysis task due to the high cost of bootstrapping. Moreover, most of the aforementioned solutions require that all input data be encrypted and transferred to the entity performing the computation, which can be a significant burden for large datasets. In Extending HE to Collaborative Analysis Settings: Multiparty Homomorphic Encryption, we describe a recent technical advance that helps address these limitations.
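Because HE natively supports only additions and multiplications, non-linear functions such as the logistic sigmoid are typically replaced by low-degree polynomials fitted over a bounded input range. The sketch below, which assumes NumPy and an input range of [-8, 8] chosen for illustration, fits least-squares polynomials and evaluates them with Horner’s rule using only those two operations; it runs on plaintext and only shows the approximation trade-off.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fit_sigmoid_poly(degree, bound=8.0, num_points=2001):
    """Least-squares polynomial fit of the sigmoid on [-bound, bound]."""
    xs = np.linspace(-bound, bound, num_points)
    return np.polyfit(xs, sigmoid(xs), degree)   # highest-degree coefficient first

def horner(coeffs, x):
    """Evaluate the polynomial using additions and multiplications only,
    the two operations an HE scheme provides natively."""
    acc = 0.0
    for c in coeffs:
        acc = acc * x + c
    return acc

xs = np.linspace(-8.0, 8.0, 2001)
for degree in (3, 7):
    coeffs = fit_sigmoid_poly(degree)
    max_err = np.max(np.abs(horner(coeffs, xs) - sigmoid(xs)))
    print(f"degree {degree}: max abs error on [-8, 8] = {max_err:.4f}")
```

Higher degrees reduce the approximation error but deepen the multiplicative circuit, and inputs outside the fitted range can produce large errors, which is why the input range must be bounded in advance.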
4.3. Trusted Execution Environments (TEE)
A TEE is a secure area within the main processor, often called an enclave, that ensures the safe and isolated execution of software. This isolation guarantees that the memory content, end-to-end communication with external parties, and control flow of the application are protected from untrusted or malicious processes running on the same hardware, including a malicious operating system or hypervisor (104, 105). In certain TEE architectures, the binary executable of an application can also be verified through a process called remote attestation (106). To achieve these security properties, TEEs rely on core hardware security components built into the processor that cannot be manipulated by software. These components typically comprise a memory encryption engine and controller to isolate memory access, and integrated circuits for cryptographic key storage and operations.
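In broad strokes, remote attestation lets a data owner check which code is running before sending data: the platform hashes the code loaded into the enclave (its measurement), binds that measurement to a fresh nonce from the verifier, and authenticates the result with a key held by the hardware. The sketch below is a simplified, hypothetical model of that flow; it uses a shared HMAC key (`PLATFORM_KEY`) in place of the asymmetric, vendor-certified keys that real platforms use, so the verifier here would not exist in this form in practice.

```python
import hashlib
import hmac
import secrets

# HYPOTHETICAL stand-in for a key embedded in the processor. Real TEEs sign
# quotes with an asymmetric attestation key certified by the vendor; a shared
# HMAC key is used here only to keep the sketch self-contained.
PLATFORM_KEY = secrets.token_bytes(32)

def measure(enclave_code: bytes) -> bytes:
    """Measurement: a cryptographic hash of the code loaded into the enclave."""
    return hashlib.sha256(enclave_code).digest()

def generate_quote(enclave_code: bytes, nonce: bytes) -> dict:
    """On the TEE platform: bind the measurement to the verifier's fresh nonce."""
    m = measure(enclave_code)
    tag = hmac.new(PLATFORM_KEY, m + nonce, hashlib.sha256).digest()
    return {"measurement": m, "nonce": nonce, "tag": tag}

def verify_quote(quote: dict, expected_measurement: bytes, nonce: bytes) -> bool:
    """On the data owner's side, before any data is sent to the enclave."""
    expected_tag = hmac.new(PLATFORM_KEY, quote["measurement"] + quote["nonce"],
                            hashlib.sha256).digest()
    return (hmac.compare_digest(quote["tag"], expected_tag)
            and hmac.compare_digest(quote["nonce"], nonce)
            and hmac.compare_digest(quote["measurement"], expected_measurement))

approved = b"approved analysis binary v1"
expected = measure(approved)
nonce = secrets.token_bytes(16)

print(verify_quote(generate_quote(approved, nonce), expected, nonce))            # True
print(verify_quote(generate_quote(b"modified binary", nonce), expected, nonce))  # False
```

The nonce prevents replay of an old quote, and the measurement check rejects any binary other than the one the data owner approved.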
Recent TEE developments have focused on supporting third-party software deployment in an untrusted cloud environment, addressing the deployment of both user-level applications and virtual machines (VMs). Popular TEE platforms include Intel Software Guard Extensions (SGX) (107) for user-level applications, and Intel Trust Domain Extensions (TDX) (108) and AMD Secure Encrypted Virtualization (SEV) (109) for VMs. Nvidia recently introduced an update to its GPU architecture that enables GPU computation in a TEE (110). In a mobile setting, ARM TrustZone (105) is a ubiquitous TEE platform on ARM CPUs, although generally only a limited set of TEE functionalities is available to mobile applications.
Although TEEs offer the capability to confidentially analyze private data with computational efficiency and functionality similar to conventional computing environments, their major drawback lies in the complexity of achieving hardware-based security. Unlike MPC and HE, which rely on minimal and well-established cryptographic primitives, the hardware-based approach of TEEs introduces unique vulnerabilities. Some vulnerabilities result from CPU architectural bugs that allow a malicious process to extract protected data from an enclave (111); manufacturers typically patch these problems promptly once they are discovered. Other limitations are inherent in the TEE architecture and lead to side channels, indirect pathways for information leakage (112, 113). For example, a TEE enclave’s access patterns to memory pages can inadvertently reveal sensitive information stored in the secure area to an attacker. Although such attacks require significant effort, software-level mitigation is necessary when the highest level of security is required. One strategy is to ensure that the memory access or timing patterns of the program do not depend on sensitive information (114). However, such mitigation can incur additional computational cost and requires relevant expertise during algorithm development.
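The mitigation of making memory accesses independent of secrets can be illustrated with a table lookup. In the sketch below (Python for consistency with the other examples; real mitigations are written in C or assembly, since Python gives no timing guarantees), the oblivious version reads every entry and selects the target arithmetically, so its access trace is the same for every secret index, at the cost of O(n) work instead of O(1).

```python
def leaky_lookup(table, secret_index):
    # Reads one location only, so the memory-access pattern reveals secret_index.
    return table[secret_index]

def oblivious_lookup(table, secret_index, trace=None):
    """Reads every entry and keeps the target via arithmetic masking, so the
    sequence of accessed positions does not depend on secret_index."""
    result = 0
    for i in range(len(table)):
        if trace is not None:
            trace.append(i)                 # record which positions were read
        mask = int(i == secret_index)       # 1 at the target position, 0 elsewhere
        result += mask * table[i]
    return result

table = [10, 20, 30, 40]
trace_a, trace_b = [], []
print(oblivious_lookup(table, 0, trace_a), oblivious_lookup(table, 3, trace_b))  # 10 40
print(trace_a == trace_b)  # True: identical access patterns for different secrets
```

The linear scan is the "additional computational burden" mentioned above; algorithm-specific designs aim to keep such overhead small.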
Despite these drawbacks, TEEs have a promising future: major CPU producers such as Intel, AMD, and ARM have a vested interest in the technology and continue to address security issues and improve their TEE platforms. Cloud service providers such as Google Cloud Platform and Microsoft Azure offer TEE-enabled computing infrastructure. In addition, initiatives such as the Confidential Computing Consortium (CCC)3 and the Trusted Execution Environment Provisioning (TEEP) Working Group4 of the Internet Engineering Task Force (IETF) have been formed to support open-source projects and the development of standards related to TEEs. In the research community, many software tools have been introduced in recent years to ease the translation of existing software to run securely on TEE platforms (113).
In the biomedical domain, TEEs have gained significant traction because they can securely outsource the analysis of biomedical data and facilitate the development and deployment of health artificial intelligence (AI) tools at scale. Notable real-world examples include BeeKeeperAI (115), a privacy-preserving healthcare AI company, and AOK, a network of eleven regional health insurers in Germany. These organizations use Intel SGX to protect confidential patient data in compliance with regulations such as HIPAA, GDPR, and Germany’s Patient Data Protection Act (PDSG) (116). Applications of TEEs in genomics are also emerging. One example is a federated GWAS service based on Intel SGX, which securely aggregates data from multiple sites and incrementally updates the statistics as study participants are added or removed (117). Data sketching techniques for improving the efficiency of GWAS computation in Intel SGX have also been proposed (118). Beyond GWAS, Widanage et al. demonstrated read mapping in Intel SGX and described a generalization of their tool to other workflows (119), and Dokmai et al. proposed a TEE-based service for secure genotype imputation, introducing techniques that achieve resilience against side channels while maintaining accurate imputation performance (114).
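Incremental updates are possible when a statistic depends only on running counts. The sketch below, a hypothetical illustration rather than the design of the service in (117), keeps per-variant allele counts for cases and controls, so a participant can be added or removed in constant time and the allelic chi-square statistic recomputed from the updated counts. Inside a TEE these counts would be the protected state; that part is not modeled here.

```python
from dataclasses import dataclass, field

@dataclass
class AlleleCounts:
    """Running allele counts for one variant, split by case/control status."""
    alt: list = field(default_factory=lambda: [0, 0])   # [cases, controls]
    ref: list = field(default_factory=lambda: [0, 0])

    def update(self, genotype: int, is_case: bool, sign: int = 1) -> None:
        # genotype = number of alternate alleles (0, 1, or 2);
        # sign=+1 adds a participant, sign=-1 removes one.
        row = 0 if is_case else 1
        self.alt[row] += sign * genotype
        self.ref[row] += sign * (2 - genotype)

    def chi_square(self) -> float:
        """Allelic chi-square statistic (1 df, no continuity correction).
        Assumes every row and column total is nonzero."""
        table = [[self.alt[0], self.ref[0]], [self.alt[1], self.ref[1]]]
        total = sum(sum(row) for row in table)
        row_tot = [sum(row) for row in table]
        col_tot = [table[0][j] + table[1][j] for j in range(2)]
        stat = 0.0
        for i in range(2):
            for j in range(2):
                expected = row_tot[i] * col_tot[j] / total
                stat += (table[i][j] - expected) ** 2 / expected
        return stat

# (genotype, is_case) pairs for one variant; synthetic data.
cohort = [(2, True), (1, True), (2, True), (0, False), (1, False), (0, False)]
counts = AlleleCounts()
for g, case in cohort:
    counts.update(g, case)
stat_before = counts.chi_square()
print(stat_before)   # 16/3, approximately 5.333

counts.update(2, True, sign=-1)   # a participant withdraws: O(1) update, no re-scan
```

Because only counts are stored, withdrawal never requires revisiting other participants’ genotypes.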
4.4. Differential Privacy (DP)
DP is a mathematical definition of privacy that provides rigorous privacy protection by ensuring that the removal or addition of a single individual in a dataset does not lead to a distinguishable change in the analysis results (120, 121). Formally, given $\varepsilon > 0$, a randomized mechanism $\mathcal{M}$ satisfies $\varepsilon$-DP if, for all datasets $D$ and $D'$ that differ in one record, and for any subset $S$ of all possible outputs of $\mathcal{M}$, we have $\Pr[\mathcal{M}(D) \in S] \le e^{\varepsilon} \Pr[\mathcal{M}(D') \in S]$; intuitively, this means that any result is similarly likely under neighboring datasets. The parameter $\varepsilon$ is called the privacy budget and specifies the level of privacy protection. DP mechanisms generally achieve this guarantee by adding random noise to the data or to the analysis results, where a smaller $\varepsilon$ provides more privacy at the cost of a greater loss in accuracy due to the additional noise. Standard DP techniques include the Laplace, Gaussian, and exponential mechanisms, which represent different approaches to sampling the noisy analysis result.
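To make the definition concrete, the following minimal Python sketch applies the Laplace mechanism to a counting query. It assumes a hypothetical cohort of binary carrier indicators: because adding or removing one participant changes the count by at most 1, adding noise drawn from a Laplace distribution with scale $1/\varepsilon$ yields $\varepsilon$-DP. The helper names are illustrative, and Python's `random` module is neither a cryptographically secure noise source nor free of known floating-point pitfalls, so this is not a production implementation.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    # expovariate takes a rate; an exponential with rate 1/scale has mean `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float,
                      rng: random.Random) -> float:
    """Release true_value plus Laplace noise with scale sensitivity / epsilon."""
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    return true_value + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical cohort: 1 = carrier of a variant of interest, 0 = non-carrier.
carrier_status = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0]
true_count = sum(carrier_status)  # the sensitivity of a count is 1

rng = random.Random(42)
for eps in (0.1, 1.0, 10.0):
    noisy = laplace_mechanism(true_count, sensitivity=1.0, epsilon=eps, rng=rng)
    print(f"epsilon={eps}: noisy count = {noisy:.2f} (true count = {true_count})")
```

Smaller values of $\varepsilon$ produce visibly noisier counts, which is the privacy-accuracy trade-off described above.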
Various techniques have been developed to minimize noise addition and obtain a more desirable trade-off between privacy and utility. For example, some DP formulations relax the notion of privacy for better utility: $(\varepsilon, \delta)$-DP, also known as approximate DP (121), requires that $\Pr[\mathcal{M}(D) \in S] \le e^{\varepsilon} \Pr[\mathcal{M}(D') \in S] + \delta$, which can be loosely interpreted as allowing the $\varepsilon$-DP guarantee to fail with a small probability $\delta$. Concentrated DP (122), zero-concentrated DP (123), and Rényi DP (124) view privacy loss as a random variable and bound the average loss instead of the worst-case loss.
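Approximate DP is what makes the Gaussian mechanism possible. The sketch below uses the classic calibration from Dwork and Roth's textbook treatment, $\sigma = \Delta_2 \sqrt{2 \ln(1.25/\delta)} / \varepsilon$, which is known to give $(\varepsilon, \delta)$-DP for $0 < \varepsilon < 1$, where $\Delta_2$ is the L2 sensitivity of the released vector. The released counts and parameter values are hypothetical, and tighter calibrations exist.

```python
import math
import random


def gaussian_sigma(l2_sensitivity: float, epsilon: float, delta: float) -> float:
    """Classic Gaussian-mechanism noise scale; valid only for 0 < epsilon < 1."""
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("this calibration requires 0 < epsilon < 1 and 0 < delta < 1")
    return l2_sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon


def gaussian_mechanism(values, l2_sensitivity, epsilon, delta, rng):
    """Add i.i.d. N(0, sigma^2) noise to each coordinate of a statistic vector."""
    sigma = gaussian_sigma(l2_sensitivity, epsilon, delta)
    return [v + rng.gauss(0.0, sigma) for v in values]


# Hypothetical release: counts of k binary attributes. Adding or removing one
# person changes each count by at most 1, so the L2 sensitivity is sqrt(k).
counts = [120, 45, 300, 8]
noisy_counts = gaussian_mechanism(counts, l2_sensitivity=math.sqrt(len(counts)),
                                  epsilon=0.5, delta=1e-5, rng=random.Random(7))
print([round(c, 1) for c in noisy_counts])
```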
Key properties of DP include post-processing, which ensures that any further analysis of a DP output remains DP, and composition, which allows multiple mechanisms operating on the same data to be combined into a joint DP guarantee (under basic sequential composition, for instance, the privacy budgets of the individual mechanisms add up). These properties allow DP noise to be introduced at different components of an analysis pipeline, e.g., the input, the output, optimization objectives (125, 126), or gradients (127, 128), and the choice of where to add noise can significantly affect the overall accuracy, depending on the analysis task. Another key factor that influences the amount of noise is sensitivity, which measures the maximum change in the analysis output due to a single-record change in the data. Different approaches have been proposed to analyze the sensitivity of a given function (e.g., global, local, or smooth sensitivity (129)). Because of these considerations, DP mechanisms often need to be carefully designed for specific applications to optimize their performance.
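The following sketch illustrates how sensitivity, composition, and post-processing interact in a simple task: releasing the mean of a bounded attribute. It assumes values are clipped to a public range `[lower, upper]`, which bounds the sensitivity of the sum; it splits the budget evenly between a noisy sum and a noisy count (basic sequential composition, so the total cost is $\varepsilon$); and it divides the two noisy outputs, which consumes no additional budget by post-processing. The even split, the clipping range, and the data are illustrative assumptions, not recommendations.

```python
import random


def laplace_noise(scale, rng):
    # Difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_mean(values, lower, upper, epsilon, rng):
    """epsilon-DP estimate of the mean of `values` clipped to [lower, upper]."""
    clipped = [min(max(v, lower), upper) for v in values]
    eps_sum = eps_count = epsilon / 2  # sequential composition: total = epsilon
    # Adding or removing one record changes the clipped sum by at most this much.
    sum_sensitivity = max(abs(lower), abs(upper))
    noisy_sum = sum(clipped) + laplace_noise(sum_sensitivity / eps_sum, rng)
    noisy_count = len(clipped) + laplace_noise(1.0 / eps_count, rng)
    # Post-processing: combining and clamping DP outputs consumes no extra budget.
    estimate = noisy_sum / max(noisy_count, 1.0)
    return min(max(estimate, lower), upper)


# Hypothetical ages of study participants.
data_rng = random.Random(3)
ages = [data_rng.randint(20, 80) for _ in range(10_000)]
print(dp_mean(ages, lower=0, upper=100, epsilon=1.0, rng=random.Random(5)))
```

Tightening the clipping range reduces the sensitivity, and therefore the noise, but risks biasing the estimate if real values fall outside it.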
In a multiparty setting, DP can be implemented either locally, by individual data providers, or globally, by a central server that aggregates analysis results; these approaches are called local DP (LDP) and centralized DP (CDP), respectively. Although CDP typically requires less noise because the noise is added once to the aggregated data, it may be more vulnerable to privacy leakage because it relies on a trusted third party for data aggregation. In contrast, LDP provides DP at the level of individual data providers at the cost of a larger overall amount of noise. Common data perturbation techniques to achieve LDP include randomized response and its variants (130, 131, 132).
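A minimal sketch of binary randomized response, assuming each hypothetical survey respondent holds one sensitive bit: the respondent reports the true bit with probability $e^{\varepsilon}/(1+e^{\varepsilon})$ and the flipped bit otherwise, so the ratio of reporting probabilities is at most $e^{\varepsilon}$ ($\varepsilon$-LDP). The aggregator never sees true bits but can still debias the observed frequency. The function names and data are illustrative.

```python
import math
import random


def randomized_response(bit: int, epsilon: float, rng: random.Random) -> int:
    """Report the true bit w.p. e^eps / (1 + e^eps), otherwise the flipped bit."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < p_truth else 1 - bit


def estimate_proportion(reports, epsilon: float) -> float:
    """Unbiased estimate of the true fraction of 1s from randomized reports."""
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    # E[observed] = pi * p + (1 - pi) * (1 - p); solve for pi.
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


# Hypothetical survey: 50,000 users each hold a sensitive bit
# (e.g., a self-reported diagnosis); 30% of them hold a 1.
rng = random.Random(0)
true_bits = [1 if i < 15_000 else 0 for i in range(50_000)]
reports = [randomized_response(b, epsilon=1.0, rng=rng) for b in true_bits]
print(f"estimated proportion: {estimate_proportion(reports, 1.0):.3f} (true 0.300)")
```

Each individual report is plausibly deniable, yet the population-level estimate stays accurate because the perturbation is known and can be inverted in aggregate.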
DP has recently been deployed by various entities for the private collection of statistics and the publication of privatized datasets. RAPPOR (133) uses randomized response and Bloom filters to privately collect usage statistics from the Chrome browser. LDP has been deployed by Apple to collect information about emojis and search queries from its devices (134), and by Microsoft for application-level telemetry in Windows 10 (135). For the 2020 Census, the US Census Bureau protected its published data with DP using the TopDown algorithm (136), which hierarchically aggregates statistics based on geographic units.
A key focus of DP applications in biomedicine has been on the release of GWAS statistics. Uhler et al. introduced DP mechanisms for releasing minor allele frequencies (MAF) and χ2 statistics for case-control GWAS (137). This work was later extended by Johnson and Yu et al. to handle larger cohorts (138) and logistic regression (139). An alternative approach based on the exponential mechanism has also been proposed (140). Simmons and Berger (141) introduced an optimization framework for privately reporting a fixed number of the most significant associations. In subsequent work, Simmons et al. developed DP methods for GWAS with correction for population stratification (142). Other notable applications of DP include the sharing of genotypic data (143), clinical trial data (144), and tabular medical records (145). DP has also been applied in interactive database settings, e.g., for count or membership queries (146, 147) and genetic matching of patients (148). In the public health domain, DP has been used to support the development of the COVID-19 Real-Time Information System for Preparedness and Epidemic Response (CRISPER) (149) and a mobile diagnostic system for coronary heart disease (150).
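As a rough illustration of the exponential-mechanism approach mentioned above, the sketch below privately selects one SNP with probability proportional to $\exp(\varepsilon u / (2\Delta u))$, where $u$ is a utility score and $\Delta u$ its sensitivity. The score used here, the absolute difference between alternate-allele counts in cases and controls, is a hypothetical simplification with an assumed sensitivity of 2 under the addition or removal of one diploid individual; it is not the scoring function of the cited methods, and the SNP identifiers and counts are invented.

```python
import math
import random


def exponential_mechanism_probs(scores, epsilon, sensitivity):
    """Selection probabilities proportional to exp(eps * score / (2 * sensitivity))."""
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(epsilon * (s - top) / (2.0 * sensitivity)) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def exponential_mechanism(candidates, scores, epsilon, sensitivity, rng):
    """Sample one candidate according to the exponential-mechanism probabilities."""
    probs = exponential_mechanism_probs(scores, epsilon, sensitivity)
    return rng.choices(candidates, weights=probs, k=1)[0]


# Hypothetical case/control alternate-allele counts for four SNPs.
snps = ["rs0001", "rs0002", "rs0003", "rs0004"]
case_alleles = [410, 395, 520, 402]
control_alleles = [400, 401, 430, 398]
scores = [abs(c - k) for c, k in zip(case_alleles, control_alleles)]  # assumed utility
print(exponential_mechanism(snps, scores, epsilon=1.0, sensitivity=2.0,
                            rng=random.Random(0)))
```

Higher-scoring SNPs are exponentially more likely to be chosen, but no SNP is selected deterministically, which is what protects any single participant's contribution to the scores.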
Despite these advances, the practical adoption of DP faces several technical challenges. Privacy parameters (e.g., $\varepsilon$) associated with DP methods are an important factor controlling the trade-off between privacy and utility; however, there are no rigorous methods or standards for choosing an acceptable value of these parameters for a given task. Since biomedical data are typically high-dimensional, a large number of statistics need to be shared privately. Moreover, these data are often analyzed using sophisticated algorithms composed of many steps where DP could be incorporated. As a result, designing effective DP mechanisms that optimally distribute the privacy budget can be difficult. Another limitation is that DP cannot effectively protect every dataset; e.g., small datasets likely require an overwhelming amount of noise to achieve DP and need to be protected using other strategies.
4.5. Federated Learning (FL)
FL allows multiple parties to collaboratively train machine learning (ML) models in a distributed manner (151). The parties share the model parameters or updates (e.g., gradients) during training, but do not directly share the training data, hence mitigating privacy risks. FL use cases fall into two main categories: (i) cross-silo, in which a small number of parties hold a substantial fraction of the data, and (ii) cross-device, where a large number of devices (possibly millions) hold a small amount of data (151). The former is more similar to traditional MPC settings, in which parties may represent different institutions, each of which has collected data from many individuals, while the latter is often found in consumer applications, where, for instance, millions of mobile phones may collect personal user data.
In FL, each party is limited to their local share of the data in evaluating and updating the model; thus, several approaches exist for synchronizing the state of the model across the parties. The federated averaging technique asks each party to locally compute model updates that are sent to a central server to be averaged and applied globally (152). The weights used to average these updates are typically chosen as a function of the size and quality of each party’s data (153, 154). More advanced methods such as federated matched averaging (FedMA) synchronize weights layer-wise via matching to cope with permutation invariance in neural networks (155). Other approaches avoid global synchronization and instead iteratively pass weights from party to party (156). Personalized FL is another approach, whereby each party learns a different local model that incorporates both the information from other parties and local data characteristics (157, 158).
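A minimal sketch of federated averaging, assuming three hypothetical parties that each fit a one-parameter linear model $y \approx wx$ by local gradient descent. In each round, the server averages the returned weights in proportion to local sample sizes (one of the weighting choices mentioned above); only model weights leave each party. The learning rate, number of local epochs, and number of rounds are arbitrary illustrative values.

```python
import random


def local_update(w, data, lr=0.01, epochs=5):
    """Run a few epochs of SGD on the party's own (x, y) pairs."""
    for _ in range(epochs):
        for x, y in data:
            grad = 2.0 * (w * x - y) * x  # derivative of (w*x - y)^2 w.r.t. w
            w -= lr * grad
    return w


def fed_avg_round(w_global, parties):
    """One round: local training at each party, then size-weighted averaging."""
    local_weights = [local_update(w_global, data) for data in parties]
    sizes = [len(data) for data in parties]
    return sum(w * n for w, n in zip(local_weights, sizes)) / sum(sizes)


# Hypothetical parties with different amounts of data drawn from y = 3x + noise.
rng = random.Random(0)


def make_party(n):
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 3.0 * x + rng.gauss(0.0, 0.1)) for x in xs]


parties = [make_party(n) for n in (20, 50, 200)]
w = 0.0
for _ in range(20):
    w = fed_avg_round(w, parties)
print(f"global weight after 20 rounds: {w:.3f}")
```

Weighting by sample size lets the largest party dominate each update; the fairness-oriented reweighting schemes discussed below change exactly this step.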
The robustness of FL is a major challenge in practice. Issues such as network connectivity, communication constraints, and resource constraints can prevent certain parties from fully participating in every round of the protocol (159, 160). Heterogeneity across data silos or devices may also introduce concerns about inequity and limited generalization of trained models; for instance, simple averaging techniques have been shown to lead to inaccurate results in small sub-populations (161, 162, 163).
Another challenge is that FL may provide limited privacy and security protection. For example, by inspecting the model updates from other parties during multiple rounds of the protocol (164), one hospital may be able to infer the characteristics of patients in another hospital. This could reveal information such as the distribution of clinical labels, individual coordinates of feature vectors, and sometimes even entire training inputs (165, 166, 167). Furthermore, a malicious adversary could manipulate the data or the model to further their own goals at the expense of others (168).
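The following sketch illustrates why shared gradients can expose training inputs. For a hypothetical linear model $z = w^\top x + b$ trained with squared error on a single example, the gradients are $\partial L/\partial w = 2(z - y)\,x$ and $\partial L/\partial b = 2(z - y)$, so anyone who observes both can recover $x$ exactly by division. Published attacks on deep networks and larger batches are considerably more involved; the feature names and values below are invented.

```python
def forward_and_gradients(w, b, x, y):
    """Squared-error loss (z - y)^2 for z = w.x + b, and its gradients."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    residual = 2.0 * (z - y)              # dL/dz
    grad_w = [residual * xi for xi in x]  # dL/dw_i = dL/dz * x_i
    grad_b = residual                     # dL/db = dL/dz
    return grad_w, grad_b


def reconstruct_input(grad_w, grad_b):
    """Recover x from shared gradients: x_i = (dL/dw_i) / (dL/db)."""
    if grad_b == 0:
        raise ValueError("bias gradient is zero; input cannot be recovered this way")
    return [g / grad_b for g in grad_w]


# Hypothetical single training example held by one hospital:
# [age, systolic blood pressure, BMI] with a binary outcome label.
x_private = [54.0, 132.0, 27.5]
y_private = 1.0
w_shared, b_shared = [0.01, 0.02, -0.03], 0.1  # current global model

grad_w, grad_b = forward_and_gradients(w_shared, b_shared, x_private, y_private)
print(reconstruct_input(grad_w, grad_b))  # matches x_private up to rounding
```

With a mini-batch the shared gradient is an average over examples, so this exact division no longer works directly, but it shows why raw updates should not be treated as harmless.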
The recent literature on FL introduces a wide range of techniques to address these limitations. Combining FL with DP can provide rigorous bounds on privacy leakage (169, 170), although doing so while maintaining the accuracy of the model can be challenging. If the central aggregator is not trusted, parties may choose to use LDP to add noise to their local gradients before aggregation (171). Alternatively, MPC, HE, or TEE can also support secure aggregation of model weights, so that no additional information is leaked other than aggregated results (164). Solutions that protect model parameters throughout the entire training procedure using encryption techniques have also been proposed (97, 98, 99). Robustness is often addressed by adapting the protocol based on the qualities of each party. For example, some methods propose to detect and remove outliers to learn from a core set of reliable parties. Others propose to alter the averaging weights to produce fairer global models that perform comparably well on each party (163). Although existing FL applications focus mainly on supervised learning, recent work extends FL to address other ML tasks, including semi-supervised, unsupervised, and reinforcement learning (172, 173, 174).
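As a simplified illustration of MPC-style secure aggregation, the sketch below uses pairwise additive masking: each pair of parties shares a random mask that one party adds and the other subtracts, so each masked update looks random on its own while the masks cancel in the sum. For brevity, the masks are drawn from a single random generator, standing in for the pairwise key agreement a real protocol would use, and the sketch does not handle parties that drop out mid-protocol. The fixed-point encoding parameters are arbitrary assumptions.

```python
import random

MODULUS = 2**32  # all arithmetic is done modulo this value
SCALE = 2**16    # fixed-point scaling factor for real-valued updates


def encode(x: float) -> int:
    return round(x * SCALE) % MODULUS


def decode(v: int) -> float:
    if v >= MODULUS // 2:  # interpret the upper half of the ring as negative
        v -= MODULUS
    return v / SCALE


def pairwise_masks(n_parties, dim, rng):
    """Stand-in for pairwise shared secrets: one random vector per pair i < j."""
    return {(i, j): [rng.randrange(MODULUS) for _ in range(dim)]
            for i in range(n_parties) for j in range(i + 1, n_parties)}


def mask_update(i, update, masks, n_parties):
    """Party i adds masks shared with higher-indexed parties and subtracts the rest."""
    out = [encode(v) for v in update]
    for j in range(n_parties):
        if i < j:
            out = [(o + m) % MODULUS for o, m in zip(out, masks[(i, j)])]
        elif j < i:
            out = [(o - m) % MODULUS for o, m in zip(out, masks[(j, i)])]
    return out


def aggregate(masked_updates):
    """The server sums masked vectors; pairwise masks cancel, revealing only the sum."""
    dim = len(masked_updates[0])
    return [decode(sum(u[k] for u in masked_updates) % MODULUS) for k in range(dim)]


# Hypothetical model updates from three hospitals.
updates = [[0.5, -1.25], [0.25, 0.75], [-0.1, 2.0]]
masks = pairwise_masks(len(updates), dim=2, rng=random.Random(0))
masked = [mask_update(i, u, masks, len(updates)) for i, u in enumerate(updates)]
print(aggregate(masked))  # approximately the element-wise sum of the updates
```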
FL has been applied to numerous biomedical problems. In the cross-silo setting, FL can improve analytics and care for patients at various stages of healthcare by allowing ML models to learn from more extensive training data than any single institution holds. Notable uses include rare disease analysis (175, 176), multi-hospital collaboration for medical image analysis (177, 178, 179), and automated phenotyping and risk prediction from clinical notes (180, 181, 182). In the cross-device setting, FL has the potential to transform mobile health (183). For example, FL can allow wearable devices, such as Fitbits or Apple Watches, to adapt over time to the individual’s unique health and lifestyle characteristics, such as resting heart rate, steps per day, and blood oxygen levels. These models can enable more accurate health monitoring for individuals, e.g., for gait identification and fall detection (184, 185).
5. Other Related Techniques
Several workflows involving the exchange of private data have received special attention from the privacy and security community to develop targeted methods that extend beyond the scope of PETs described in Section 4. We highlight some of these techniques.
Private information retrieval (PIR).
In PIR, a client retrieves specific items of interest from a database stored on a server without revealing the identity of the accessed items (186, 187). A naïve approach of downloading the entire database and querying it locally is impractical for large datasets. In an HE-based solution, the client uploads an encrypted query, based on which the server searches the database homomorphically and then returns the result to the client for decryption. With practical lattice-based HE (Section 4.2) and database pre-processing and amortization techniques (188, 189), recent PIR protocols have been shown to scale to databases with billions of entries (190, 191, 192, 193). Other works have extended PIR to more sophisticated queries, such as keyword search in sparse databases (194, 195) and batch querying (196, 197). In the biomedical context, PIR can enhance the utility of public data resources that require users to either download the entire database or disclose private data (e.g., genetic mutations or patient records) to the server in order to query the database. For example, PIR solutions have been proposed for outsourced storage of genomic data, which support secure retrieval of variants of interest (198, 199).
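The HE-based approach can be illustrated with any linearly homomorphic scheme. The sketch below substitutes textbook Paillier encryption for the lattice-based schemes used by modern PIR protocols, and its primes are deliberately tiny, so it offers no security. The client encrypts a one-hot selection vector; the server raises each ciphertext to the corresponding database value and multiplies the results, which homomorphically computes an encryption of the selected entry. This naive version sends one ciphertext per database entry, which is exactly the communication cost that the compression and pre-processing techniques cited above aim to reduce.

```python
import math
import random


def keygen(p: int, q: int):
    """Textbook Paillier key generation with g = n + 1 (toy primes, NOT secure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+)
    return n, (lam, mu)


def encrypt(n: int, m: int, rng: random.Random) -> int:
    """Enc(m) = (n + 1)^m * r^n mod n^2 with random r coprime to n."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n: int, sk, c: int) -> int:
    lam, mu = sk
    u = pow(c, lam, n * n)
    return ((u - 1) // n) * mu % n


def pir_query(n, index, db_size, rng):
    """Client: encrypt a one-hot selection vector for the wanted position."""
    return [encrypt(n, 1 if j == index else 0, rng) for j in range(db_size)]


def pir_answer(n, query, database):
    """Server: compute Enc(sum_j sel_j * db_j) without learning the selected index."""
    n2 = n * n
    result = 1  # 1 is a (trivial) encryption of 0
    for c, value in zip(query, database):
        result = (result * pow(c, value, n2)) % n2  # adds sel_j * db_j under encryption
    return result


n, sk = keygen(1009, 1013)
database = [12, 7, 0, 31, 5]  # hypothetical small integer records
rng = random.Random(0)
query = pir_query(n, index=3, db_size=len(database), rng=rng)
print(decrypt(n, sk, pir_answer(n, query, database)))
```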
Private set intersection (PSI).
PSI addresses a problem closely related to PIR, where two parties, each holding a set of items, wish to learn the intersection between the two sets without revealing any other information to each other. PSI-size is a notable variant of PSI, where only the size of the intersection is revealed. PSI has been extensively studied, leading to practical protocols for billions of items and several variants addressing different trust assumptions, trade-offs between communication and computation, and number of parties (200, 201, 202). Several works have proposed PSI protocols for the computation of genome similarity, viewing each genome as a set of variants. Baldi et al. (203) introduced paternity testing based on PSI techniques (204, 205). Wang et al. (206) developed a PSI-based protocol for securely calculating the edit distance between genomes.
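A classic way to build PSI relies on commutative blinding with Diffie-Hellman-style exponentiation: because $(H(x)^a)^b = (H(x)^b)^a$, the parties can compare doubly blinded values without seeing each other's raw items. The sketch below is a toy, semi-honest version of this idea over the multiplicative group modulo the Mersenne prime $2^{127} - 1$, using hypothetical variant identifiers as set elements, and it includes a PSI-size option in which Bob shuffles the re-blinded values. It is not the protocol of the cited works; a real deployment would use a vetted elliptic-curve group, a proper hash-to-group function, and cryptographically secure randomness.

```python
import hashlib
import math
import random

# Toy parameters (NOT a vetted PSI group): the multiplicative group modulo
# the Mersenne prime 2^127 - 1.
P = 2**127 - 1


def hash_to_group(item: str) -> int:
    """Map an item to a nonzero element of Z_P^* (toy hash-to-group)."""
    digest = hashlib.sha256(item.encode()).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def random_exponent(rng: random.Random) -> int:
    """Secret key k with gcd(k, P - 1) = 1, so x -> x^k is a bijection."""
    while True:
        k = rng.randrange(2, P - 1)
        if math.gcd(k, P - 1) == 1:
            return k


def run_psi(alice_items, bob_items, rng, size_only=False):
    # Simulation only: in practice each party samples its own key locally.
    a, b = random_exponent(rng), random_exponent(rng)
    # Alice -> Bob: Alice's items blinded with her key a.
    alice_blinded = [pow(hash_to_group(x), a, P) for x in alice_items]
    # Bob -> Alice: Alice's values re-blinded with b, and Bob's items blinded with b.
    alice_double = [pow(v, b, P) for v in alice_blinded]
    if size_only:
        rng.shuffle(alice_double)  # PSI-size: Alice cannot tell which item matched
    bob_blinded = [pow(hash_to_group(y), b, P) for y in bob_items]
    # Alice: H(y)^(b*a) equals H(x)^(a*b) exactly when the hashes of x and y agree.
    bob_double = {pow(v, a, P) for v in bob_blinded}
    if size_only:
        return sum(1 for v in alice_double if v in bob_double)
    return [x for x, v in zip(alice_items, alice_double) if v in bob_double]


# Hypothetical variant sets held by two parties.
alice = ["chr1:1000:A>G", "chr1:2000:T>G", "chr2:3000:C>T", "chr7:5000:G>A"]
bob = ["chr1:2000:T>G", "chr7:5000:G>A", "chr9:9000:C>T"]
print(run_psi(alice, bob, random.Random(1)))
print(run_psi(alice, bob, random.Random(2), size_only=True))
```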
Zero-knowledge proofs (ZKP).
Verification of computation, an essential component of trustworthy data analytics systems, can be challenging when sensitive biomedical data are involved. A zero-knowledge proof (ZKP) (207) is a cryptographic primitive, related to MPC and digital signatures (208, 57), that allows one to prove the truthfulness of a statement about the data or the computation without disclosing sensitive information. For example, Goldreich et al. (209) showed that generic ZKPs (210) can be used to prove that a secure computation protocol (e.g., MPC) is carried out honestly without decrypting any intermediate value. Although generic constructions typically incur impractical computational overheads, recent advances have improved the efficiency of ZKPs under a variety of security and model assumptions (211, 212, 213, 214). Froelicher et al. (215) demonstrated that ZKPs for discrete logarithms (216) can ensure the integrity of certain HE computations in a distributed health analytics system. Chatel et al. (217) introduced a ZKP scheme based on the MPC-in-the-head paradigm (218, 219), allowing direct-to-consumer analytic service providers to verify that the user’s uploaded data are from a trusted source, thus preventing a malicious user from tampering with the analysis result.
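For intuition about ZKPs of discrete logarithms, the sketch below implements a Schnorr proof, made non-interactive with the Fiat-Shamir heuristic: the prover convinces a verifier that it knows $x$ with $y = g^x \bmod p$ without revealing $x$. The group parameters ($p = 2039$, $q = 1019$, $g = 4$) are toy values chosen so the example runs instantly and provide no security; the cited systems rely on different, carefully parameterized constructions.

```python
import hashlib
import random

# Toy group parameters (NOT secure): P = 2Q + 1 with P and Q prime, and
# G = 2^2 generates the subgroup of order Q (the quadratic residues mod P).
P, Q, G = 2039, 1019, 4


def fiat_shamir_challenge(*values: int) -> int:
    """Derive the verifier's challenge by hashing the public transcript."""
    data = ",".join(str(v) for v in values).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def prove(x: int, rng: random.Random):
    """Prover: show knowledge of x such that y = G^x mod P, without revealing x."""
    y = pow(G, x, P)
    r = rng.randrange(1, Q)  # one-time secret nonce
    t = pow(G, r, P)         # commitment
    c = fiat_shamir_challenge(G, y, t)
    s = (r + c * x) % Q      # response
    return y, (t, s)


def verify(y: int, proof) -> bool:
    """Verifier: accept iff G^s == t * y^c (mod P)."""
    t, s = proof
    c = fiat_shamir_challenge(G, y, t)
    return pow(G, s, P) == (t * pow(y, c, P)) % P


# Hypothetical secret, e.g., a private key share held by one analysis node.
y, proof = prove(777, random.Random(0))
print(verify(y, proof))
```

The verifier learns only $y$, the commitment $t$, and the response $s$; because $r$ is uniformly random, $s$ reveals nothing useful about $x$ on its own.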
Blockchain.
Blockchain provides a decentralized framework for securely recording and verifying transactions in a distributed network. It uses cryptographic techniques to create a chain of blocks, each containing a time-stamped list of transactions and the cryptographic hash of the preceding block, providing transparency, immutability, and accountability in data management. Beyond well-known applications in finance (e.g., Bitcoin), blockchain has become increasingly relevant in biomedical domains (220). A key use case is to create a secure and decentralized Health Information Exchange (HIE) to improve the management of medical records and insurance claims among various stakeholders (221). It can also be used to create a data sharing platform to support biomedical research while providing data provenance and accountability (222). Privacy protection of data exchanged through blockchains is a key challenge that often requires blockchains to be carefully combined with other encryption techniques or PETs. Another focus of ongoing research is on improving the scalability and robustness of blockchain networks, which is necessary for their deployment across a large network of institutions.
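The tamper evidence of a blockchain comes from hash-linking its blocks. The following sketch shows only that mechanism, for a hypothetical log of data-access events; it omits consensus, digital signatures, and networking, all of which real blockchain platforms require. Timestamps are fixed integers so the example is reproducible.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """SHA-256 over the block's contents (excluding its own stored hash)."""
    payload = {k: block[k] for k in ("index", "timestamp", "transactions", "prev_hash")}
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()


def make_block(index: int, timestamp: int, transactions: list, prev_hash: str) -> dict:
    block = {"index": index, "timestamp": timestamp,
             "transactions": transactions, "prev_hash": prev_hash}
    block["hash"] = block_hash(block)
    return block


def verify_chain(chain: list) -> bool:
    """Check each block's stored hash and its link to the previous block."""
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block):
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True


# Hypothetical log of data-access events.
chain = [make_block(0, 1_700_000_000, [], "0" * 64)]
for i, events in enumerate([["site A shared dataset D1 with site B"],
                            ["site B queried D1 for study S7"]], start=1):
    chain.append(make_block(i, 1_700_000_000 + i, events, chain[-1]["hash"]))

print(verify_chain(chain))  # intact chain
chain[1]["transactions"][0] = "site A shared dataset D1 with site C"
print(verify_chain(chain))  # the edit is detected
```

Rewriting any past block changes its hash and breaks every later link, so an attacker would have to recompute the rest of the chain; in a real network, consensus among many nodes is what makes that infeasible.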
Synthetic data generation.
Creating synthetic data that resemble real data without being directly linked to private individuals has become a useful privacy-aware data sharing strategy (223). Public sharing of synthetic data can support collaborative efforts, such as data analysis competitions and validation of computational models across institutions. It can also support various academic and educational activities, for instance, by creating realistic patient profiles for use in training or public communication. Techniques for generating synthetic data have evolved alongside ML advances, particularly in deep generative models. The introduction of generative adversarial networks (GANs) and diffusion models greatly improved the synthesis of various types of biomedical data, including medical images (224, 225) and EHRs (226). However, the possibility that synthetic data can leak private information about the original data used to train the models remains a major concern (227). Recent studies have suggested that the greater expressiveness of modern generative models, in fact, increases the likelihood of private training data being reconstructed (228). Although incorporating DP into model training can help mitigate these risks (229), it can degrade the quality of the generated data, especially for high-dimensional data such as images and genomes. Future improvements in both the quality and privacy of synthetic data will be crucial for expanding its use to settings that currently require direct sharing of real data.
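As a deliberately simple contrast to deep generative models, the sketch below generates synthetic values for a single categorical attribute from a DP histogram: Laplace noise is added to each category count (sensitivity 1 under the addition or removal of one record), negative counts are set to zero, and synthetic records are sampled from the normalized noisy histogram. The category list is assumed to be public. Because the sketch models only one attribute, it captures none of the correlations that make GAN- or diffusion-based synthesis useful; the blood-type data are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    # Difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_synthetic_categorical(records, categories, epsilon, n_synthetic, rng):
    """Sample synthetic records from an epsilon-DP histogram over public categories."""
    counts = {c: 0 for c in categories}
    for r in records:
        counts[r] += 1
    # One record affects one count by 1, so Laplace(1/epsilon) noise gives epsilon-DP.
    noisy = [max(0.0, counts[c] + laplace_noise(1.0 / epsilon, rng)) for c in categories]
    if sum(noisy) == 0:
        noisy = [1.0] * len(categories)  # fall back to uniform (post-processing)
    # Sampling from the noisy histogram is post-processing and uses no extra budget.
    return rng.choices(categories, weights=noisy, k=n_synthetic)


blood_types = ["O"] * 6000 + ["A"] * 3000 + ["B"] * 800 + ["AB"] * 200
synthetic = dp_synthetic_categorical(blood_types, ["O", "A", "B", "AB"],
                                     epsilon=1.0, n_synthetic=10_000,
                                     rng=random.Random(0))
print({c: synthetic.count(c) for c in ["O", "A", "B", "AB"]})
```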
6. Open Challenges and Outlook
As the field of biomedical data science expands to encompass a wider variety of data modalities, more complex statistical models, and evolving computing environments, our understanding of privacy risks must also change. Studies that uncover novel privacy risks in emerging data types (e.g., transcriptomics (230, 85, 231, 232), proteomics (233), and wearable devices (234)) and computational models (e.g., diffusion and large language models (235, 228)) will be particularly valuable. Integrating these findings into practical guidelines and policies will require a thoughtful examination of the evolving incentives and capabilities of potential adversaries (236, 237, 238).
A critical aspect of PETs is the varying degrees of privacy protection they offer and how they can be aligned with our social values and the needs of practitioners. While acknowledging the value of PETs that offer the strongest, formal notions of privacy (i.e., the cryptographic techniques HE and MPC, and the statistical framework of DP), we must also be aware of potential pitfalls in the practical implementation of these techniques, such as software flaws (239) or violations of model assumptions (240). Technologies that offer less formal but more widely applicable privacy enhancements (i.e., TEE and FL) can be useful alternatives in some settings. A promising future direction is to explore the joint use of PETs to combine their strengths while mitigating their weaknesses, as illustrated by multiparty HE (see Extending HE to Collaborative Analysis Settings: Multiparty Homomorphic Encryption). Future policies and regulations have a crucial role to play in translating the complex privacy properties of emerging tools based on PETs into concrete guidelines for the biomedical community.
The social impact of PETs involves another key consideration: equity (241). Many studies have shown that there are inequities in emerging clinical applications of computational tools (242, 243). Rectifying these issues requires greater data sharing to create more diverse datasets, which in turn introduces new privacy challenges (244, 245). On the other hand, those whose data are most needed to improve equity in biomedicine (e.g., underrepresented groups) may also have the most to lose in the event of a privacy breach (246). Furthermore, certain PETs, such as DP, may disproportionately reduce the accuracy of ML models in populations with limited representation in the dataset (247). Navigating this complex trade-off between privacy and equity remains an important challenge.
We expect trust and transparency to play a crucial role in aligning PETs with the interests of stakeholders in organizational settings (248). PETs can be viewed as a tool to strengthen trust between stakeholders by increasing transparency and mitigating various privacy and security risks that emerge in collaborative partnerships. This perspective contrasts with a common focus in the PETs community on preventing malicious actors from breaching systems and gaining access to sensitive data. Integrating contextual values such as trust and human-centered design principles into PETs could foster the creation of tools that more effectively address the needs of the biomedical community.
As PETs continue to mature and become more broadly applicable, as demonstrated in this review, there will be a growing need to tailor these techniques to create effective algorithms and tools that address diverse biomedical workflows. Closer collaboration among PETs developers, biomedical practitioners, and policymakers, as well as patients and study participants, could help prioritize efforts that address the most pressing challenges. Furthermore, software development and deployment tools designed to assist researchers in incorporating PETs into their existing workflows could help ensure that these techniques are broadly accessible. The combination of advances in foundational techniques, effective algorithm design, and the establishment of social frameworks to safeguard the use of these technologies will be key to unlocking the potential of PETs in biomedical data science.
Extending HE to Collaborative Analysis Settings: Multiparty Homomorphic Encryption.
A recent line of work explores a novel use of HE to facilitate collaborative studies. Conventional HE schemes, described in Section 4.2, allow any party with the decryption key to access the private data. In contrast, threshold HE schemes (88, 89, 90, 91) use a decryption key that is secret-shared among a group of parties, allowing them to individually operate over encrypted data while ensuring that only the data values that the parties agree upon can be decrypted. Similarly, multi-key schemes (92, 93) allow each party to use its own key and adapt homomorphic operations to support computation on data encrypted under different keys.
These multiparty HE (MHE) schemes open the door to HE-based algorithms that can analyze private data distributed among multiple parties, analogously to the MPC setting. Recent studies (94, 95) have shown that these schemes can enable seamless integration of HE operations with efficient interactive routines, including MPC protocols, to reduce the cost of challenging operations such as bootstrapping (90). Importantly, these schemes allow each party to leverage efficient local computations using the locally available unencrypted data. MHE can thus help address the scalability limitations of standalone applications of HE or MPC by offering a federated analysis paradigm in which it is neither necessary to secret-share the entire private dataset among the parties nor to encrypt and centralize all data at a single site for analysis.
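The key idea of threshold decryption can be sketched with a much simpler cryptosystem than the lattice-based schemes cited above. The code below uses additively homomorphic ("exponential") ElGamal in a toy group, with the secret key split additively among three hypothetical sites so that every site must contribute a partial decryption before the aggregate can be read (an n-out-of-n threshold; t-out-of-n variants typically rely on Shamir secret sharing instead). The group parameters are far too small to be secure, and decryption recovers the plaintext by brute-force search, which only works for small values.

```python
import random

P, Q, G = 2039, 1019, 4  # toy group: P = 2Q + 1, G generates the order-Q subgroup


def keygen_shares(n_parties, rng):
    """Each party picks a secret share x_i; the joint public key is G^(sum x_i)."""
    shares = [rng.randrange(1, Q) for _ in range(n_parties)]
    public_key = 1
    for x in shares:
        public_key = (public_key * pow(G, x, P)) % P
    return shares, public_key


def encrypt(m, public_key, rng):
    """Exponential ElGamal: (G^r, G^m * pk^r)."""
    r = rng.randrange(1, Q)
    return pow(G, r, P), (pow(G, m, P) * pow(public_key, r, P)) % P


def add(c1, c2):
    """The componentwise product of ciphertexts encrypts the sum of plaintexts."""
    return (c1[0] * c2[0]) % P, (c1[1] * c2[1]) % P


def partial_decrypt(ciphertext, share):
    return pow(ciphertext[0], share, P)


def combine(ciphertext, partials, max_plaintext=Q):
    """Remove the mask using all partial decryptions, then solve a small discrete log."""
    mask = 1
    for d in partials:
        mask = (mask * d) % P
    g_m = (ciphertext[1] * pow(mask, -1, P)) % P
    for m in range(max_plaintext):
        if pow(G, m, P) == g_m:
            return m
    return None


rng = random.Random(0)
shares, pk = keygen_shares(3, rng)
local_counts = [12, 30, 7]  # hypothetical per-site counts
total = encrypt(local_counts[0], pk, rng)
for count in local_counts[1:]:
    total = add(total, encrypt(count, pk, rng))
partials = [partial_decrypt(total, x) for x in shares]  # one per site
print(combine(total, partials))
```

Only the aggregate is ever decrypted: each site computes its partial decryption on the combined ciphertext, so no individual site's count needs to be revealed.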
Applications of MHE are being explored in various domains, including distributed ML and linear algebra (96, 97, 98, 99) as well as collaborative biomedical analyses, such as GWAS (89, 94, 95, 100) and cell type classification (101). Recent results (95) demonstrate the practicality of this approach in handling complex biomedical tasks on the scale of modern biobanks that include hundreds of thousands of individuals. However, addressing each application currently requires time-consuming effort to design and optimize algorithms to achieve practical runtimes. Ongoing work on streamlining the development and use of these solutions, e.g., through cloud-based analysis platforms (102) and easy-to-use programming frameworks or libraries (56, 103), can help make these tools more widely available.
ACKNOWLEDGMENTS
Posted with permission from the Annual Review of Biomedical Data Science, Volume 7; copyright 2024 the author(s), https://www.annualreviews.org. This work was supported by NIH R01 HG010959 (to B.B.), RM1 HG011558 and DP5 OD029574 (to H.C.), the National Science Foundation Graduate Research Fellowship under Grant No. 2141064 and the Hertz Foundation Fellowship (to S.S.). We thank Simon Mendelsohn for comments.
Footnotes
DISCLOSURE STATEMENT
The authors are not aware of any affiliations, memberships, funding, or financial holdings that might be perceived as affecting the objectivity of this review.
LITERATURE CITED
1. Rehm HL, Page AJ, Smith L, Adams JB, Alterovitz G, et al. 2021. GA4GH: International policies and standards for data sharing across genomic research and healthcare. Cell Genomics 1(2).
2. Fatumo S, Chikowore T, Choudhury A, Ayub M, Martin AR, Kuchenbäcker K. 2022. Diversity in genomic studies: A roadmap to address the imbalance. Nature Medicine 28(2):243.
3. Philippakis AA, Azzariti DR, Beltran S, Brookes AJ, Brownstein CA, et al. 2015. The matchmaker exchange: a platform for rare disease gene discovery. Human Mutation 36(10):915–921.
4. All of Us Research Program Investigators. 2019. The “All of Us” research program. New England Journal of Medicine 381(7):668–676.
5. Arellano AM, Dai W, Wang S, Jiang X, Ohno-Machado L. 2018. Privacy policy and technology in biomedical data science. Annual Review of Biomedical Data Science 1:115–129.
6. Gürsoy G. 2022. Genome privacy and trust. Annual Review of Biomedical Data Science 5:163–181.
7. Wan Z, Hazel JW, Clayton EW, Vorobeychik Y, Kantarcioglu M, Malin BA. 2022. Sociotechnical safeguards for genomic data privacy. Nature Reviews Genetics 23(7):429–445.
8. Bonomi L, Huang Y, Ohno-Machado L. 2020. Privacy challenges and research opportunities for genomic data sharing. Nature Genetics 52(7):646–654.
9. Berger B, Cho H. 2019. Emerging technologies towards enhancing privacy in genomic data sharing.
10. Beauchamp TL, et al. 2008. The Belmont Report. The Oxford Textbook of Clinical Research Ethics:149–155.
11. Nosowsky R, Giordano TJ. 2006. The Health Insurance Portability and Accountability Act of 1996 (HIPAA) privacy rule: implications for clinical research. Annual Review of Medicine 57:575–590.
12. Isaak J, Hanna MJ. 2018. User data privacy: Facebook, Cambridge Analytica, and privacy protection. Computer 51(8):56–59.
13. Greenwald G. 2014. No place to hide: Edward Snowden, the NSA, and the US surveillance state. Macmillan.
14. HealthITSecurity. 2023. Growing Number of States Enact New Genetic Data Privacy Laws. https://healthitsecurity.com/news/growing-number-of-states-enact-new-genetic-data-privacy-laws. Accessed 30 November 2023.
15. NCCoE. 2023. Cybersecurity of Genomic Data. https://www.nccoe.nist.gov/projects/cybersecurity-genomic-data. Accessed 30 November 2023.
16. Cho H, Ippolito D, Yu YW. 2020. Contact tracing mobile apps for COVID-19: Privacy considerations and related trade-offs. arXiv preprint arXiv:2003.11511.
17. Adler S. 2023. First Lawsuit Filed Over 23andMe Data Breach. HIPAA Journal. https://www.hipaajournal.com/first-lawsuit-filed-over-23andme-data-breach/. Accessed 29 November 2023.
18. Erlich Y, Narayanan A. 2014. Routes for breaching and protecting genetic privacy. Nature Reviews Genetics 15(6):409–421.
19. Garfinkel S. 2015. De-identification of Personal Information. National Institute of Standards and Technology.
20. Clayton EW, Halverson CM, Sathe NA, Malin BA. 2018. A systematic literature review of individuals’ perspectives on privacy and genetic information in the United States. PLoS One 13(10):e0204417.
21. Steinsbekk KS, Kåre Myskja B, Solberg B. 2013. Broad consent versus dynamic consent in biobank research: is passive participation an ethical problem? European Journal of Human Genetics 21(9):897–902.
22. Fiume M, Cupak M, Keenan S, Rambla J, de la Torre S, et al. 2019. Federated discovery and sharing of genomic data using beacons. Nature Biotechnology 37(3):220–224.
23. Fleurence RL, Curtis LH, Califf RM, Platt R, Selby JV, Brown JS. 2014. Launching PCORnet, a national patient-centered clinical research network. Journal of the American Medical Informatics Association 21(4):578–582.
24. Zarin DA, Tse T, Williams RJ, Califf RM, Ide NC. 2011. The ClinicalTrials.gov results database: update and key issues. New England Journal of Medicine 364(9):852–860.
25. Shringarpure SS, Bustamante CD. 2015. Privacy risks from genomic data-sharing beacons. The American Journal of Human Genetics 97(5):631–646.
26. Das S, Forer L, Schönherr S, Sidore C, Locke AE, et al. 2016. Next-generation genotype imputation service and methods. Nature Genetics 48(10):1284–1287.
27. Mosca MJ, Cho H. 2023. Reconstruction of private genomes through reference-based genotype imputation. Genome Biology 24(1):271.
28. Yao ACC. 1986. How to generate and exchange secrets. In 27th Annual Symposium on Foundations of Computer Science (SFCS 1986), pp. 162–167. IEEE.
29. Malkhi D, Nisan N, Pinkas B, Sella Y, et al. 2004. Fairplay-Secure Two-Party Computation System. In USENIX Security Symposium, vol. 4, p. 9. San Diego, CA, USA.
30. Kolesnikov V, Schneider T. 2008. Improved garbled circuit: Free XOR gates and applications. In Automata, Languages and Programming: 35th International Colloquium, ICALP 2008, Reykjavik, Iceland, July 7–11, 2008, Proceedings, Part II, pp. 486–498. Springer.
31. Pinkas B, Schneider T, Smart NP, Williams SC. 2009. Secure two-party computation is practical. In Advances in Cryptology – ASIACRYPT 2009: 15th International Conference on the Theory and Application of Cryptology and Information Security, Tokyo, Japan, December 6–10, 2009. Proceedings, pp. 250–267. Springer.
32. Huang Y, Evans D, Katz J, Malka L. 2011. Faster secure two-party computation using garbled circuits. In 20th USENIX Security Symposium (USENIX Security 11).
33. Songhori EM, Hussain SU, Sadeghi AR, Schneider T, Koushanfar F. 2015. TinyGarble: Highly compressed and scalable sequential garbled circuits. In 2015 IEEE Symposium on Security and Privacy, pp. 411–428. IEEE.
34. Liu C, Wang XS, Nayak K, Huang Y, Shi E. 2015. ObliVM: A programming framework for secure computation. In 2015 IEEE Symposium on Security and Privacy, pp. 359–376. IEEE.
35. Rastogi A, Hammer MA, Hicks M. 2014. Wysteria: A programming language for generic, mixed-mode multiparty computations. In 2014 IEEE Symposium on Security and Privacy, pp. 655–670. IEEE.
36. Shamir A. 1979. How to share a secret. Communications of the ACM 22(11):612–613.
37. Blakley GR. 1979. Safeguarding cryptographic keys. In International Workshop on Managing Requirements Knowledge, p. 313. IEEE Computer Society.
38. Beaver D. 1992. Efficient multiparty protocols using circuit randomization. In Advances in Cryptology – CRYPTO ’91: Proceedings, pp. 420–432. Springer.
39. Hastings M, Hemenway B, Noble D, Zdancewic S. 2019. SoK: General purpose compilers for secure multi-party computation. In 2019 IEEE Symposium on Security and Privacy (S&P), pp. 1220–1237. IEEE.
40. Keller M. 2020. MP-SPDZ: A versatile framework for multi-party computation. In Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security, pp. 1575–1590.
- 41.Zhang Y, Steele A, Blanton M. 2013. PICCO: a general-purpose compiler for private distributed computation. In Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security, pp. 813–826
- 42.Demmler D, Schneider T, Zohner M. 2015. ABY – A framework for efficient mixed-protocol secure two-party computation. In Network and Distributed System Security (NDSS) Symposium
- 43.Liu J, Juuti M, Lu Y, Asokan N. 2017. Oblivious neural network predictions via MiniONN transformations. In Proceedings of the 2017 ACM SIGSAC conference on computer and communications security, pp. 619–631
- 44.Mohassel P, Zhang Y. 2017. SecureML: A system for scalable privacy-preserving machine learning. In 2017 IEEE Symposium on Security and Privacy (S&P), pp. 19–38. IEEE
- 45.Riazi MS, Weinert C, Tkachenko O, Songhori EM, Schneider T, Koushanfar F. 2018. Chameleon: A hybrid secure computation framework for machine learning applications. In Proceedings of the 2018 on Asia conference on computer and communications security, pp. 707–721
- 46.Mohassel P, Rindal P. 2018. ABY3: A mixed protocol framework for machine learning. In Proceedings of the 2018 ACM SIGSAC conference on computer and communications security, pp. 35–52
- 47.Makri E, Rotaru D, Vercauteren F, Wagh S. 2021. Rabbit: Efficient comparison for secure multi-party computation. In International Conference on Financial Cryptography and Data Security, pp. 249–270. Springer
- 48.Cho H, Wu DJ, Berger B. 2018. Secure genome-wide association analysis using multiparty computation. Nature biotechnology 36(6):547–551
- 49.Kamm L, Bogdanov D, Laur S, Vilo J. 2013. A new way to protect privacy in large-scale genome-wide association studies. Bioinformatics 29(7):886–893
- 50.Jagadeesh KA, Wu DJ, Birgmeier JA, Boneh D, Bejerano G. 2017. Deriving genomic diagnoses without revealing patient genomes. Science 357(6352):692–695
- 51.Jha S, Kruger L, Shmatikov V. 2008. Towards practical privacy for genomic computation. In 2008 IEEE Symposium on Security and Privacy (S&P), pp. 216–230. IEEE
- 52.Bogdanov D, Kamm L, Laur S, Sokk V. 2018. Implementation and evaluation of an algorithm for cryptographically private principal component analysis on genomic data. IEEE/ACM transactions on computational biology and bioinformatics 15(5):1427–1432
- 53.Ma R, Li Y, Li C, Wan F, Hu H, et al. 2020. Secure multiparty computation for privacy-preserving drug discovery. Bioinformatics 36(9):2872–2880
- 54.Hie B, Cho H, Berger B. 2018. Realizing private and practical pharmacological collaboration. Science 362(6412):347–350
- 55.von Maltitz M, Ballhausen H, Kaul D, Fleischmann DF, Niyazi M, et al. 2021. A privacy-preserving log-rank test for the Kaplan-Meier estimator with secure multiparty computation: algorithm development and validation. JMIR medical informatics 9(1):e22158
- 56.Smajlović H, Shajii A, Berger B, Cho H, Numanagić I. 2023. Sequre: a high-performance framework for secure multiparty computation enables biomedical data sharing. Genome Biology 24(1):5
- 57.Rivest RL, Shamir A, Adleman L. 1978. A method for obtaining digital signatures and public-key cryptosystems. Communications of the ACM 21(2):120–126
- 58.ElGamal T. 1985. A public key cryptosystem and a signature scheme based on discrete logarithms. IEEE transactions on information theory 31(4):469–472
- 59.Paillier P. 1999. Public-key cryptosystems based on composite degree residuosity classes. In International conference on the theory and applications of cryptographic techniques, pp. 223–238. Springer
- 60.Goldwasser S, Micali S. 2019. Probabilistic encryption & how to play mental poker keeping secret all partial information. In Providing sound foundations for cryptography: on the work of Shafi Goldwasser and Silvio Micali
- 61.Gentry C. 2009. A fully homomorphic encryption scheme. PhD thesis, Stanford University
- 62.Gentry C, Halevi S. 2011. Implementing Gentry’s fully-homomorphic encryption scheme. In Annual international conference on the theory and applications of cryptographic techniques, pp. 129–148. Springer
- 63.Fan J, Vercauteren F. 2012. Somewhat practical fully homomorphic encryption. Cryptology ePrint Archive
- 64.Brakerski Z, Gentry C, Vaikuntanathan V. 2014. (Leveled) fully homomorphic encryption without bootstrapping. ACM Transactions on Computation Theory (TOCT) 6(3):1–36
- 65.Brakerski Z. 2012. Fully homomorphic encryption without modulus switching from classical GapSVP. In Annual Cryptology Conference, pp. 868–886. Springer
- 66.Cheon JH, Kim A, Kim M, Song Y. 2017. Homomorphic encryption for arithmetic of approximate numbers. In Advances in Cryptology-ASIACRYPT 2017: 23rd International Conference on the Theory and Applications of Cryptology and Information Security, Hong Kong, China, December 3–7, 2017, Proceedings, Part I, pp. 409–437. Springer
- 67.Regev O. 2009. On lattices, learning with errors, random linear codes, and cryptography. Journal of the ACM (JACM) 56(6):1–40
- 68.Lyubashevsky V, Peikert C, Regev O. 2010. On ideal lattices and learning with errors over rings. In Advances in Cryptology-EUROCRYPT 2010: 29th Annual International Conference on the Theory and Applications of Cryptographic Techniques, French Riviera, May 30-June 3, 2010. Proceedings, pp. 1–23. Springer
- 69.Bossuat JP, Mouchet C, Troncoso-Pastoriza J, Hubaux JP. 2021. Efficient bootstrapping for approximate homomorphic encryption with non-sparse keys. In Annual International Conference on the Theory and Applications of Cryptographic Techniques, pp. 587–617. Springer
- 70.Han K, Ki D. 2020. Better bootstrapping for approximate homomorphic encryption. In CT-RSA
- 71.Chillotti I, Gama N, Georgieva M, Izabachène M. 2020. TFHE: fast fully homomorphic encryption over the torus. Journal of Cryptology 33(1):34–91
- 72.Viand A, Jattke P, Hithnawi A. 2021. SoK: Fully Homomorphic Encryption Compilers. Proceedings - IEEE Symposium on Security and Privacy 2021-May:1092–1108
- 73.Gilad-Bachrach R, Dowlin N, Laine K, Lauter K, Naehrig M, Wernsing J. 2016. CryptoNets: Applying neural networks to encrypted data with high throughput and accuracy. In ICML
- 74.Graepel T, Lauter K, Naehrig M. 2012. ML Confidential: Machine learning on encrypted data. In International conference on information security and cryptology, pp. 1–21. Springer
- 75.Kocabas O, Soyata T. 2020. Towards privacy-preserving medical cloud computing using homomorphic encryption. In Virtual and Mobile Healthcare: Breakthroughs in Research and Practice. IGI Global
- 76.Bos JW, Lauter K, Naehrig M. 2014. Private predictive analysis on encrypted medical data. Journal of biomedical informatics 50:234–243
- 77.Blatt M, Gusev A, Polyakov Y, Goldwasser S. 2020. Secure Large-Scale Genome-Wide Association Studies using Homomorphic Encryption. Proceedings of the National Academy of Sciences 117(21):11608–11613
- 78.Kim M, Lauter K. 2015. Private genome analysis through homomorphic encryption. In BMC medical informatics and decision making, vol. 15, pp. 1–12. BioMed Central
- 79.Bonte C, Makri E, Ardeshirdavani A, Simm J, Moreau Y, Vercauteren F. 2018. Towards practical privacy-preserving genome-wide association study. BMC bioinformatics 19(1):1–12
- 80.Lu WJ, Yamada Y, Sakuma J. 2015. Privacy-preserving genome-wide association studies on cloud environment using fully homomorphic encryption. In BMC medical informatics and decision making, vol. 15, pp. 1–8. Springer
- 81.Zhang Y, Dai W, Jiang X, Xiong H, Wang S. 2015. Foresee: Fully outsourced secure genome study based on homomorphic encryption. In BMC medical informatics and decision making, vol. 15, pp. 1–11. BioMed Central
- 82.Leighton AT, Yu YW. 2023. Secure federated boolean count queries using fully-homomorphic cryptography. bioRxiv
- 83.Kantarcioglu M, Jiang W, Liu Y, Malin B. 2008. A cryptographic approach to securely share and query genomic sequences. IEEE Transactions on information technology in biomedicine 12(5):606–617
- 84.Bruekers F, Katzenbeisser S, Kursawe K, Tuyls P. 2008. Privacy-preserving matching of DNA profiles. Cryptology ePrint Archive
- 85.Ayday E, Raisaro JL, McLaren PJ, Fellay J, Hubaux JP. 2013. Privacy-Preserving Computation of Disease Risk by Using Genomic, Clinical, and Environmental Data. In 2013 USENIX workshop on health information technologies (HealthTech 13)
- 86.Kim M, Harmanci AO, Bossuat JP, Carpov S, Cheon JH, et al. 2021. Ultrafast homomorphic encryption models enable secure outsourcing of genotype imputation. Cell systems 12(11):1108–1120
- 87.Gürsoy G, Chielle E, Brannon CM, Maniatakos M, Gerstein M. 2022. Privacy-preserving genotype imputation with fully homomorphic encryption. Cell systems 13(2):173–182
- 88.Desmedt YG. 1994. Threshold cryptography. European Transactions on Telecommunications 5(4):449–458
- 89.Asharov G, Jain A, López-Alt A, Tromer E, Vaikuntanathan V, Wichs D. 2012. Multiparty computation with low communication, computation and interaction via threshold FHE. In Advances in Cryptology-EUROCRYPT 2012: 31st Annual International Conference on the Theory and Applications of Cryptographic Techniques, Cambridge, UK, April 15–19, 2012. Proceedings, pp. 483–501. Springer
- 90.Mouchet C, Troncoso-Pastoriza JR, Bossuat JP, Hubaux JP. 2021. Multiparty Homomorphic Encryption from Ring-Learning-with-Errors. In Proceedings on Privacy Enhancing Technologies Symposium
- 91.Damgård I, Pastro V, Smart N, Zakarias S. 2012. Multiparty computation from somewhat homomorphic encryption. In Annual Cryptology Conference, pp. 643–662. Springer
- 92.Kim T, Kwak H, Lee D, Seo J, Song Y. 2022. Asymptotically Faster Multi-Key Homomorphic Encryption from Homomorphic Gadget Decomposition. Cryptology ePrint Archive
- 93.Kwak H, Lee D, Song Y, Wagh S. 2021. A Unified Framework of Homomorphic Encryption for Multiple Parties with Non-Interactive Setup. Cryptology ePrint Archive
- 94.Froelicher D, Troncoso-Pastoriza JR, Raisaro JL, Cuendet MA, Sousa JS, et al. 2021. Truly privacy-preserving federated analytics for precision medicine with multiparty homomorphic encryption. Nature communications 12(1):5910
- 95.Cho H, Froelicher D, Chen J, Edupalli M, Pyrgelis A, et al. 2022. Secure and Federated Genome-Wide Association Studies for Biobank-Scale Datasets. bioRxiv
- 96.Froelicher D, Cho H, Edupalli M, Sousa JS, Bossuat J, et al. 2023. Scalable and Privacy Preserving Federated Principal Component Analysis. In 2023 IEEE Symposium on Security and Privacy (S&P), pp. 888–905
- 97.Zheng W, Popa RA, Gonzalez JE, Stoica I. 2019. Helen: Maliciously Secure Coopetitive Learning for Linear Models. In IEEE S&P
- 98.Froelicher D, Troncoso-Pastoriza JR, Pyrgelis A, Sav S, Sousa JS, et al. 2021. Scalable Privacy Preserving Distributed Learning. Proceedings on Privacy Enhancing Technologies Symposium
- 99.Sav S, Pyrgelis A, Troncoso-Pastoriza JR, Froelicher D, Bossuat JP, et al. 2021. POSEIDON: Privacy-Preserving Federated Neural Network Learning. In 28th Annual Network and Distributed System Security Symposium (NDSS 2021). Reston: Internet Society
- 100.Yang M, Zhang C, Wang X, Liu X, Li S, et al. 2022. TrustGWAS: A full-process workflow for encrypted GWAS using multi-key homomorphic encryption and pseudorandom number perturbation. Cell Systems 13(9):752–767
- 101.Sav S, Bossuat JP, Troncoso-Pastoriza JR, Claassen M, Hubaux JP. 2022. Privacy-preserving federated neural network learning for disease-associated cell classification. Patterns 3(5)
- 102.Mendelsohn S, Froelicher D, Loginov D, Bernick D, Berger B, Cho H. 2023. sfkit: A Web-Based Toolkit for Secure and Federated Genomic Analysis. Nucleic Acids Research
- 103.Li W, Kim M, Zhang K, Chen H, Jiang X, Harmanci A. 2023. COLLAGENE enables privacy-aware federated and collaborative genomic data analysis. Genome Biology 24(1):204
- 104.Sabt M, Achemlal M, Bouabdallah A. 2015. Trusted Execution Environment: What It is, and What It is Not. In 2015 IEEE Trustcom/BigDataSE/ISPA, vol. 1, pp. 57–64
- 105.Pinto S, Santos N. 2019. Demystifying Arm TrustZone: A Comprehensive Survey. ACM Comput. Surv. 51(6)
- 106.Banks AS, Kisiel M, Korsholm P. 2021. Remote Attestation: A Literature Review
- 107.Costan V, Devadas S. 2016. Intel SGX Explained. Cryptology ePrint Archive, Paper 2016/086. https://eprint.iacr.org/2016/086
- 108.Intel Corporation. 2022. Intel® trust domain extensions. White paper, Intel Corporation. Accessed: 2023–11–22
- 109.Kaplan D, Powell J, Woller T. 2021. AMD memory encryption. White paper, AMD. Accessed: 2023–11–22
- 110.Nertney R. 2023. Confidential Compute on NVIDIA Hopper H100. White paper WP-11459-001, NVIDIA. Initial Release, Early Access
- 111.Borrello P, Kogler A, Schwarzl M, Lipp M, Gruss D, Schwarz M. 2022. ÆPIC Leak: Architecturally Leaking Uninitialized Data from the Microarchitecture. In 31st USENIX Security Symposium (USENIX Security 22)
- 112.van Schaik S, Seto A, Yurek T, Batori A, AlBassam B, et al. 2022. SoK: SGX.Fail: How Stuff Get eXposed
- 113.Fei S, Yan Z, Ding W, Xie H. 2021. Security Vulnerabilities of SGX and Countermeasures: A Survey. ACM Comput. Surv. 54(6)
- 114.Dokmai N, Kockan C, Zhu K, Wang X, Sahinalp SC, Cho H. 2021. Privacy-preserving genotype imputation in a trusted execution environment. Cell Systems 12(10):983–993.e7
- 115.BeeKeeperAI. 2022. BeeKeeperAI Applies Sightless Computing Technology to Pediatric Rare Disease Project. https://www.beekeeperai.com/beekeeperai-novartis-pediatric-rare-disease-press-release. Accessed: 2023–11–22
- 116.Intel Corporation. 2021. Maximum Security at the Processor Level: Intel® SGX Protects Electronic Patient Record. Solution brief, Intel Corporation
- 117.Pascoal T, Decouchant J, Boutet A, Esteves-Verissimo P. 2021. DyPS: Dynamic, private and secure GWAS. Proceedings on Privacy Enhancing Technologies
- 118.Kockan C, Zhu K, Dokmai N, Karpov N, Kulekci MO, et al. 2020. Sketching algorithms for genomic data analysis and querying in a secure enclave. Nature Methods 17(3):295–301
- 119.Widanage C, Liu W, Li J, Chen H, Wang X, et al. 2021. HySec-Flow: Privacy-Preserving Genomic Computing with SGX-based Big-Data Analytics Framework. In 2021 IEEE 14th International Conference on Cloud Computing (CLOUD), pp. 733–743. Los Alamitos, CA, USA: IEEE Computer Society
- 120.Dwork C, McSherry F, Nissim K, Smith A. 2006. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography Conference, pp. 265–284. Springer
- 121.Dwork C, Kenthapadi K, McSherry F, Mironov I, Naor M. 2006. Our Data, Ourselves: Privacy Via Distributed Noise Generation. In EUROCRYPT
- 122.Dwork C, Rothblum GN. 2016. Concentrated differential privacy. arXiv preprint arXiv:1603.01887
- 123.Bun M, Steinke T. 2016. Concentrated differential privacy: Simplifications, extensions, and lower bounds. In Theory of Cryptography Conference, pp. 635–658. Springer
- 124.Mironov I. 2017. Rényi differential privacy. In 2017 IEEE 30th computer security foundations symposium (CSF), pp. 263–275. IEEE
- 125.Chaudhuri K, Monteleoni C, Sarwate AD. 2011. Differentially private empirical risk minimization. Journal of Machine Learning Research 12(3)
- 126.Iyengar R, Near JP, Song D, Thakkar O, Thakurta A, Wang L. 2019. Towards practical differentially private convex optimization. In 2019 IEEE Symposium on Security and Privacy (S&P), pp. 299–316. IEEE
- 127.Bassily R, Smith A, Thakurta A. 2014. Private empirical risk minimization: Efficient algorithms and tight error bounds. In 2014 IEEE 55th annual symposium on foundations of computer science, pp. 464–473. IEEE
- 128.Abadi M, Chu A, Goodfellow I, McMahan HB, Mironov I, et al. 2016. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC conference on computer and communications security, pp. 308–318
- 129.Nissim K, Raskhodnikova S, Smith A. 2007. Smooth sensitivity and sampling in private data analysis. In Proceedings of the thirty-ninth annual ACM symposium on Theory of computing, pp. 75–84
- 130.Warner SL. 1965. Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association 60(309):63–69
- 131.Dwork C, Naor M, Reingold O, Rothblum GN, Vadhan S. 2009. On the Complexity of Differentially Private Data Release: Efficient Algorithms and Hardness Results. STOC ’09. New York, NY, USA: Association for Computing Machinery
- 132.Kairouz P, Bonawitz K, Ramage D. 2016. Discrete distribution estimation under local privacy. In International Conference on Machine Learning, pp. 2436–2444. PMLR
- 133.Erlingsson Ú, Pihur V, Korolova A. 2014. RAPPOR: Randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 ACM SIGSAC conference on computer and communications security, pp. 1054–1067
- 134.Apple Differential Privacy Team. 2017. Learning with privacy at scale. https://machinelearning.apple.com/docs/learning-with-privacy-at-scale/appledifferentialprivacysystem.pdf
- 135.Ding B, Kulkarni J, Yekhanin S. 2017. Collecting telemetry data privately. Advances in Neural Information Processing Systems 30
- 136.Abowd J, Kifer D, Garfinkel SL, Machanavajjhala A. 2019. Census TopDown: Differentially Private Data, Incremental Schemas, and Consistency with Public Knowledge
- 137.Uhler C, Slavković A, Fienberg SE. 2013. Privacy-preserving data sharing for genome-wide association studies. The Journal of privacy and confidentiality 5(1):137
- 138.Yu F, Fienberg SE, Slavković AB, Uhler C. 2014. Scalable privacy-preserving data sharing methodology for genome-wide association studies. Journal of biomedical informatics 50:133–141
- 139.Yu F, Rybar M, Uhler C, Fienberg SE. 2014. Differentially-private logistic regression for detecting multiple-SNP association in GWAS databases. In Privacy in Statistical Databases: UNESCO Chair in Data Privacy, International Conference, PSD 2014, Ibiza, Spain, September 17-19, 2014. Proceedings, pp. 170–184. Springer
- 140.Johnson A, Shmatikov V. 2013. Privacy-preserving data exploration in genome-wide association studies. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 1079–1087
- 141.Simmons S, Berger B. 2016. Realizing privacy preserving genome-wide association studies. Bioinformatics 32(9):1293–1300
- 142.Simmons S, Sahinalp C, Berger B. 2016. Enabling privacy-preserving GWASs in heterogeneous human populations. Cell systems 3(1):54–61
- 143.Wang S, Mohammed N, Chen R. 2014. Differentially private genome data dissemination through top-down specialization. BMC medical informatics and decision making 14(1):1–7
- 144.Beaulieu-Jones BK, Wu ZS, Williams C, Lee R, Bhavnani SP, et al. 2019. Privacy-preserving generative deep neural networks support clinical data sharing. Circulation: Cardiovascular Quality and Outcomes 12(7):e005122
- 145.Mohammed N, Jiang X, Chen R, Fung BC, Ohno-Machado L. 2013. Privacy-preserving heterogeneous health data sharing. Journal of the American Medical Informatics Association 20(3):462–469
- 146.Cho H, Simmons S, Kim R, Berger B. 2020. Privacy-preserving biomedical database queries with optimal privacy-utility trade-offs. Cell systems 10(5):408–416
- 147.Vinterbo SA, Sarwate AD, Boxwala AA. 2012. Protecting count queries in study design. Journal of the American Medical Informatics Association 19(5):750–757
- 148.Wei J, Lin Y, Yao X, Zhang J, Liu X. 2020. Differential privacy-based genetic matching in personalized medicine. IEEE Transactions on Emerging Topics in Computing 9(3):1109–1125
- 149.Field E, Dyda A, Lau C. 2021. COVID-19 Real-time Information System for Preparedness and Epidemic Response (CRISPER). Medical Journal of Australia (8)
- 150.Liu X, Zhou P, Qiu T, Wu DO. 2020. Blockchain-enabled contextual online learning under local differential privacy for coronary heart disease diagnosis in mobile edge computing. IEEE Journal of Biomedical and Health Informatics 24(8):2177–2188
- 151.Kairouz P, McMahan HB, Avent B, Bellet A, Bennis M, et al. 2021. Advances and open problems in federated learning. Foundations and Trends® in Machine Learning 14(1–2):1–210
- 152.McMahan B, Moore E, Ramage D, Hampson S, y Arcas BA. 2017. Communication-efficient learning of deep networks from decentralized data. In Artificial intelligence and statistics, pp. 1273–1282. PMLR
- 153.Li T, Sanjabi M, Beirami A, Smith V. 2020. Fair Resource Allocation in Federated Learning. In International Conference on Learning Representations
- 154.Reddi SJ, Charles Z, Zaheer M, Garrett Z, Rush K, et al. 2020. Adaptive Federated Optimization. In International Conference on Learning Representations
- 155.Wang H, Yurochkin M, Sun Y, Papailiopoulos D, Khazaeni Y. 2020. Federated Learning with Matched Averaging. In International Conference on Learning Representations
- 156.Hegedűs I, Danner G, Jelasity M. 2019. Gossip learning as a decentralized alternative to federated learning. In Distributed Applications and Interoperable Systems: 19th IFIP WG 6.1 International Conference, DAIS 2019, Held as Part of the 14th International Federated Conference on Distributed Computing Techniques, DisCoTec 2019, Kongens Lyngby, Denmark, June 17–21, 2019, Proceedings, pp. 74–90. Springer
- 157.Tan AZ, Yu H, Cui L, Yang Q. 2022. Towards personalized federated learning. IEEE Transactions on Neural Networks and Learning Systems
- 158.Achituve I, Shamsian A, Navon A, Chechik G, Fetaya E. 2021. Personalized federated learning with Gaussian processes. Advances in Neural Information Processing Systems 34:8392–8406
- 159.Wang S, Tuor T, Salonidis T, Leung KK, Makaya C, et al. 2019. Adaptive federated learning in resource constrained edge computing systems. IEEE journal on selected areas in communications 37(6):1205–1221
- 160.Zhao Y, Li M, Lai L, Suda N, Civin D, Chandra V. 2018. Federated learning with non-IID data
- 161.Li T, Hu S, Beirami A, Smith V. 2021. Ditto: Fair and robust federated learning through personalization. In International Conference on Machine Learning, pp. 6357–6368. PMLR
- 162.Michieli U, Ozay M. 2021. Are all users treated fairly in federated learning systems? In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2318–2322
- 163.Zhang DY, Kou Z, Wang D. 2020. FairFL: A fair federated learning approach to reducing demographic bias in privacy-sensitive classification models. In 2020 IEEE International Conference on Big Data (Big Data), pp. 1051–1060. IEEE
- 164.So J, Ali RE, Güler B, Jiao J, Avestimehr AS. 2023. Securing secure aggregation: Mitigating multi-round privacy leakage in federated learning. In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, pp. 9864–9873
- 165.Geiping J, Bauermeister H, Dröge H, Moeller M. 2020. Inverting gradients – how easy is it to break privacy in federated learning? Advances in Neural Information Processing Systems 33:16937–16947
- 166.Huang Y, Gupta S, Song Z, Li K, Arora S. 2021. Evaluating gradient inversion attacks and defenses in federated learning. Advances in Neural Information Processing Systems 34:7232–7241
- 167.Al Mallah R, Lopez D, Badu-Marfo G, Farooq B. 2023. Untargeted poisoning attack detection in federated learning via behavior attestation. IEEE Access
- 168.Tolpegin V, Truex S, Gursoy ME, Liu L. 2020. Data poisoning attacks against federated learning systems. In Computer Security-ESORICS 2020: 25th European Symposium on Research in Computer Security, ESORICS 2020, Guildford, UK, September 14–18, 2020, Proceedings, Part I, pp. 480–501. Springer
- 169.Hu R, Guo Y, Li H, Pei Q, Gong Y. 2020. Personalized federated learning with differential privacy. IEEE Internet of Things Journal 7(10):9530–9539
- 170.Noble M, Bellet A, Dieuleveut A. 2022. Differentially private federated learning on heterogeneous data. In International Conference on Artificial Intelligence and Statistics, pp. 10110–10145. PMLR
- 171.Truex S, Liu L, Chow KH, Gursoy ME, Wei W. 2020. LDP-Fed: Federated learning with local differential privacy. In Proceedings of the Third ACM International Workshop on Edge Systems, Analytics and Networking, pp. 61–66
- 172.Grammenos A, Mendoza Smith R, Crowcroft J, Mascolo C. 2020. Federated principal component analysis. Advances in Neural Information Processing Systems 33:6453–6464
- 173.Mansour Y, Mohri M, Ro J, Suresh AT. 2020. Three approaches for personalization with applications to federated learning. arXiv preprint arXiv:2002.10619
- 174.Chen Y, Qin X, Wang J, Yu C, Gao W. 2020. FedHealth: A federated transfer learning framework for wearable healthcare. IEEE Intelligent Systems 35(4):83–93
- 175.Pati S, Baid U, Edwards B, Sheller M, Wang SH, et al. 2022. Federated learning enables big data for rare cancer boundary detection. Nature communications 13(1):7346
- 176.Darzidehkalani E, Ghasemi-Rad M, van Ooijen P. 2022. Federated learning in medical imaging: part I: toward multicentral health care ecosystems. Journal of the American College of Radiology 19(8):969–974
- 177.Ng D, Lan X, Yao MMS, Chan WP, Feng M. 2021. Federated learning: a collaborative effort to achieve better medical imaging models for individual sites that have small labelled datasets. Quantitative Imaging in Medicine and Surgery 11(2):852
- 178.Sarma KV, Harmon S, Sanford T, Roth HR, Xu Z, et al. 2021. Federated learning improves site performance in multicenter deep learning without data sharing. Journal of the American Medical Informatics Association 28(6):1259–1264
- 179.Kaissis G, Ziller A, Passerat-Palmbach J, Ryffel T, Usynin D, et al. 2021. End-to-end privacy preserving deep learning on multi-institutional medical imaging. Nature Machine Intelligence 3(6):473–484
- 180.Vaid A, Jaladanki SK, Xu J, Teng S, Kumar A, et al. 2021. Federated learning of electronic health records to improve mortality prediction in hospitalized patients with COVID-19: machine learning approach. JMIR medical informatics 9(1):e24207
- 181.Brisimi TS, Chen R, Mela T, Olshevsky A, Paschalidis IC, Shi W. 2018. Federated learning of predictive models from federated electronic health records. International journal of medical informatics 112:59–67
- 182.Liu D, Dligach D, Miller T. 2019. Two-stage federated phenotyping and patient representation learning. In Proceedings of the conference. Association for Computational Linguistics. Meeting, vol. 2019, pp. 283. NIH Public Access
- 183.Paulik M, Seigel M, Mason H, Telaar D, Kluivers J, et al. 2021. Federated evaluation and tuning for on-device personalization: System design & applications. arXiv preprint arXiv:2102.08503
- 184.Wu Q, Chen X, Zhou Z, Zhang J. 2020. FedHome: Cloud-edge based personalized federated learning for in-home health monitoring. IEEE Transactions on Mobile Computing 21(8):2818–2832
- 185.Ghosh S, Ghosh SK. 2023. FEEL: Federated learning framework for elderly healthcare using Edge-IoMT. IEEE Transactions on Computational Social Systems
- 186.Chor B, Kushilevitz E, Goldreich O, Sudan M. 1998. Private information retrieval. Journal of the ACM (JACM) 45(6):965–981
- 187.Kushilevitz E, Ostrovsky R. 1997. Replication is not needed: Single database, computationally-private information retrieval. In Proceedings 38th annual symposium on foundations of computer science, pp. 364–373. IEEE
- 188.Beimel A, Ishai Y, Malkin T. 2000. Reducing the servers’ computation in private information retrieval: PIR with preprocessing. In Advances in Cryptology-CRYPTO 2000: 20th Annual International Cryptology Conference Santa Barbara, California, USA, August 20–24, 2000 Proceedings, pp. 55–73. Springer
- 189.Corrigan-Gibbs H, Kogan D. 2020. Private information retrieval with sublinear online time. In Advances in Cryptology-EUROCRYPT 2020: 39th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Zagreb, Croatia, May 10–14, 2020, Proceedings, Part I, pp. 44–75. Springer
- 190.Melchor CA, Barrier J, Fousse L, Killijian MO. 2016. XPIR: Private information retrieval for everyone. Proceedings on Privacy Enhancing Technologies:155–174
- 191.Davidson A, Pestana G, Celi S. 2022. FrodoPIR: Simple, scalable, single-server private information retrieval. Cryptology ePrint Archive
- 192.Menon SJ, Wu DJ. 2022. Spiral: Fast, high-rate single-server PIR via FHE composition. In 2022 IEEE Symposium on Security and Privacy (S&P), pp. 930–947. IEEE [Google Scholar]
- 193.Henzinger A, Hong MM, Corrigan-Gibbs H, Meiklejohn S, Vaikuntanathan V. 2023. One server for the price of two: Simple and fast single-server private information retrieval. In Usenix Security, vol. 23 [Google Scholar]
- 194.Chor B, Gilboa N, Naor M. 1997. Private information retrieval by keywords
- 195.Patel S, Seo JY, Yeo K. 2023. Don’t be Dense: Efficient Keyword PIR for Sparse Databases. Cryptology ePrint Archive
- 196.Ishai Y, Kushilevitz E, Ostrovsky R, Sahai A. 2004. Batch codes and their applications. In Proceedings of the thirty-sixth annual ACM symposium on Theory of computing, pp. 262–271
- 197.Angel S, Chen H, Laine K, Setty S. 2018. PIR with compressed queries and amortized query processing. In 2018 IEEE symposium on security and privacy (S&P), pp. 962–979. IEEE
- 198.Sousa JS, Lefebvre C, Huang Z, Raisaro JL, Aguilar-Melchor C, et al. 2017. Efficient and secure outsourcing of genomic data storage. BMC Medical Genomics 10(2):46
- 199.Çetin GS, Chen H, Laine K, Lauter K, Rindal P, Xia Y. 2017. Private queries on encrypted genomic data. BMC Medical Genomics 10(2):45
- 200.Freedman MJ, Nissim K, Pinkas B. 2004. Efficient private matching and set intersection. In International conference on the theory and applications of cryptographic techniques, pp. 1–19. Springer
- 201.Pinkas B, Rosulek M, Trieu N, Yanai A. 2019. SpOT-light: lightweight private set intersection from sparse OT extension. In Advances in Cryptology–CRYPTO 2019: 39th Annual International Cryptology Conference, Santa Barbara, CA, USA, August 18–22, 2019, Proceedings, Part III 39, pp. 401–431. Springer
- 202.Chase M, Miao P. 2020. Private set intersection in the internet setting from lightweight oblivious PRF. In Advances in Cryptology–CRYPTO 2020: 40th Annual International Cryptology Conference, CRYPTO 2020, Santa Barbara, CA, USA, August 17–21, 2020, Proceedings, Part III 40, pp. 34–63. Springer
- 203.Baldi P, Baronio R, De Cristofaro E, Gasti P, Tsudik G. 2011. Countering GATTACA: efficient and secure testing of fully-sequenced human genomes. In Proceedings of the 18th ACM conference on Computer and communications security, pp. 691–702
- 204.Agrawal R, Evfimievski A, Srikant R. 2003. Information Sharing across Private Databases. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, SIGMOD ’03, pp. 86–97. New York, NY, USA: Association for Computing Machinery
- 205.De Cristofaro E, Gasti P, Tsudik G. 2012. Fast and private computation of cardinality of set intersection and union. In International Conference on Cryptology and Network Security, pp. 218–231. Springer
- 206.Wang XS, Huang Y, Zhao Y, Tang H, Wang X, Bu D. 2015. Efficient Genome-Wide, Privacy-Preserving Similar Patient Query Based on Private Edit Distance. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, CCS ’15, pp. 492–503. New York, NY, USA: Association for Computing Machinery
- 207.Goldwasser S, Micali S, Rackoff C. 1985. The Knowledge Complexity of Interactive Proof-Systems. In Proceedings of the Seventeenth Annual ACM Symposium on Theory of Computing, STOC ’85, pp. 291–304. New York, NY, USA: Association for Computing Machinery
- 208.Diffie W, Hellman M. 1976. New directions in cryptography. IEEE Transactions on Information Theory 22(6):644–654
- 209.Goldreich O, Micali S, Wigderson A. 1987. How to Play ANY Mental Game. In Proceedings of the Nineteenth Annual ACM Symposium on Theory of Computing, STOC ’87, pp. 218–229. New York, NY, USA: Association for Computing Machinery
- 210.Goldreich O, Micali S, Wigderson A. 1991. Proofs that yield nothing but their validity or all languages in NP have zero-knowledge proof systems. Journal of the ACM (JACM) 38(3):690–728
- 211.Parno B, Howell J, Gentry C, Raykova M. 2016. Pinocchio: Nearly practical verifiable computation. Communications of the ACM 59(2):103–112
- 212.Ben-Sasson E, Bentov I, Horesh Y, Riabzev M. 2018. Scalable, transparent, and post-quantum secure computational integrity. Cryptology ePrint Archive
- 213.Bünz B, Bootle J, Boneh D, Poelstra A, Wuille P, Maxwell G. 2018. Bulletproofs: Short proofs for confidential transactions and more. In 2018 IEEE symposium on security and privacy (S&P), pp. 315–334. IEEE
- 214.Xie T, Zhang Y, Song D. 2022. Orion: Zero knowledge proof with linear prover time. In Annual International Cryptology Conference, pp. 299–328. Springer
- 215.Froelicher D, Egger P, Sousa JS, Raisaro JL, Huang Z, et al. 2017. UnLynx: a decentralized system for privacy-conscious data sharing. Proceedings on Privacy Enhancing Technologies (PoPETS) 2017(4):232–250
- 216.Camenisch J, Stadler M. 1997. Proof systems for general statements about discrete logarithms. Technical Report/ETH Zurich, Department of Computer Science 260
- 217.Chatel S, Pyrgelis A, Troncoso-Pastoriza JR, Hubaux JP. 2021. Privacy and integrity preserving computations with CRISP. In 30th USENIX Security Symposium (USENIX Security 21), pp. 2111–2128
- 218.Chase M, Derler D, Goldfeder S, Orlandi C, Ramacher S, et al. 2017. Post-quantum zero-knowledge and signatures from symmetric-key primitives. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 1825–1842
- 219.Ishai Y, Kushilevitz E, Ostrovsky R, Sahai A. 2009. Zero-knowledge proofs from secure multiparty computation. SIAM Journal on Computing 39(3):1121–1152
- 220.Kuo TT, Kim HE, Ohno-Machado L. 2017. Blockchain distributed ledger technologies for biomedical and health care applications. Journal of the American Medical Informatics Association 24(6):1211–1220
- 221.Esmaeilzadeh P, Mirzaei T. 2019. The potential of blockchain technology for health information exchange: experimental study from patients’ perspectives. Journal of Medical Internet Research 21(6):e14184
- 222.Grishin D, Raisaro JL, Troncoso-Pastoriza JR, Obbad K, Quinn K, et al. 2021. Citizen-centered, auditable and privacy-preserving population genomics. Nature Computational Science 1(3):192–198
- 223.Yan C, Yan Y, Wan Z, Zhang Z, Omberg L, et al. 2022. A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1):7609
- 224.Kazerouni A, Aghdam EK, Heidari M, Azad R, Fayyaz M, et al. 2023. Diffusion models in medical imaging: A comprehensive survey. Medical Image Analysis:102846
- 225.Jeon M, Park H, Kim HJ, Morley M, Cho H. 2022. k-SALSA: k-anonymous synthetic averaging of retinal images via local style alignment. In European Conference on Computer Vision, pp. 661–678. Springer
- 226.Zhang Z, Yan C, Lasko TA, Sun J, Malin BA. 2021. SynTEG: a framework for temporal structured electronic health data simulation. Journal of the American Medical Informatics Association 28(3):596–604
- 227.Zhang Z, Yan C, Malin BA. 2022. Membership inference attacks against synthetic health data. Journal of Biomedical Informatics 125:103977
- 228.Carlini N, Hayes J, Nasr M, Jagielski M, Sehwag V, et al. 2023. Extracting training data from diffusion models. In 32nd USENIX Security Symposium (USENIX Security 23), pp. 5253–5270
- 229.Torkzadehmahani R, Kairouz P, Paten B. 2019. DP-CGAN: Differentially private synthetic data and label generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops
- 230.Sadhuka S, Fridman D, Berger B, Cho H. 2023. Assessing transcriptomic reidentification risks using discriminative sequence models. Genome Research 33(7):1101–1112
- 231.Gürsoy G, Li T, Liu S, Ni E, Brannon CM, Gerstein MB. 2022. Functional genomics data: privacy risk assessment and technological mitigation. Nature Reviews Genetics 23(4):245–258
- 232.Schadt EE, Woo S, Hao K. 2012. Bayesian method to predict individual SNP genotypes from gene expression data. Nature Genetics 44(5):603–608
- 233.Hill AC, Guo C, Litkowski EM, Manichaikul AW, Yu B, et al. 2023. Large scale proteomic studies create novel privacy considerations. Scientific Reports 13(1):9254
- 234.Li H, Wu J, Gao Y, Shi Y. 2016. Examining individuals’ adoption of healthcare wearable devices: An empirical study from privacy calculus perspective. International Journal of Medical Informatics 88:8–17
- 235.Nasr M, Carlini N, Hayase J, Jagielski M, Cooper AF, et al. 2023. Scalable Extraction of Training Data from (Production) Language Models
- 236.Guo J, Clayton EW, Kantarcioglu M, Vorobeychik Y, Wooders M, et al. 2023. A game theoretic approach to balance privacy risks and familial benefits. Scientific Reports 13(1):6932
- 237.Xia W, Liu Y, Wan Z, Vorobeychik Y, Kantarcioglu M, et al. 2021. Enabling realistic health data re-identification risk assessment through adversarial modeling. Journal of the American Medical Informatics Association 28(4):744–752
- 238.Berrang P, Humbert M, Zhang Y, Lehmann I, Eils R, Backes M. 2018. Dissecting privacy risks in biomedical data. In 2018 IEEE European Symposium on Security and Privacy (EuroS&P), pp. 62–76. IEEE
- 239.Mironov I. 2012. On significance of the least significant bits for differential privacy. In Proceedings of the 2012 ACM conference on Computer and communications security, pp. 650–661
- 240.Liu C, Chakraborty S, Mittal P. 2016. Dependence makes you vulnerable: Differential privacy under dependent tuples. In NDSS, vol. 16, pp. 21–24
- 241.Chen IY, Pierson E, Rose S, Joshi S, Ferryman K, Ghassemi M. 2021. Ethical machine learning in healthcare. Annual Review of Biomedical Data Science 4:123–144
- 242.Ding Y, Hou K, Xu Z, Pimplaskar A, Petter E, et al. 2023. Polygenic scoring accuracy varies across the genetic ancestry continuum. Nature:1–8
- 243.Movva R, Shanmugam D, Hou K, Pathak P, Guttag J, et al. 2023. Coarse race data conceals disparities in clinical risk score performance. arXiv preprint arXiv:2304.09270
- 244.Bak M, Madai VI, Fritzsche MC, Mayrhofer MT, McLennan S. 2022. You can’t have AI both ways: balancing health data privacy and access fairly. Frontiers in Genetics 13:1490
- 245.Seastedt KP, Schwab P, O’Brien Z, Wakida E, Herrera K, et al. 2022. Global healthcare fairness: We should be sharing more, not less, data. PLOS Digital Health 1(10):e0000102
- 246.Xiao Y, Lim S, Pollard TJ, Ghassemi M. 2023. In the Name of Fairness: Assessing the Bias in Clinical Record De-identification. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, pp. 123–137
- 247.Suriyakumar VM, Papernot N, Goldenberg A, Ghassemi M. 2021. Chasing your long tails: Differentially private prediction in health care settings. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pp. 723–734
- 248.Mayer RC, Davis JH, Schoorman FD. 1995. An integrative model of organizational trust. The Academy of Management Review 20(3):709–734
