Author manuscript; available in PMC: 2023 Sep 19.
Published in final edited form as: Lancet Digit Health. 2023 May;5(5):e251–e253. doi: 10.1016/S2589-7500(23)00044-4

Augmenting Digital Twins with Federated Learning in Medicine

Divya Nagaraj 1, Priya Khandelwal 1, Sandra Steyaert 2, Olivier Gevaert 2
PMCID: PMC10507798  NIHMSID: NIHMS1931260  PMID: 37100540

Abstract

Providing increasingly personalized treatments to patients is a major goal of precision medicine, and digital twins are an emerging paradigm to support this goal. A clinical digital twin is a digital representation of a patient and can be used to deliver personalized treatment recommendations. However, the centralized data collection to support and train digital twin models is already brushing up against patient privacy restrictions. We posit that the use of federated learning, an approach to decentralized machine learning model training, can support digital twins’ performance for clinical applications. We emphasize that the combination of the two could alleviate privacy concerns while bolstering machine learning model performance and resulting predictions.


Clinical digital twins (DTs) are virtual representations of patients that evolve alongside treatment, making them valuable for a variety of predictive applications.1,2 Although hailed as a paradigm shift in medical treatment, DTs face significant challenges before they can be adopted, particularly with regard to privacy. Here, we identify federated learning (FL) as a unique solution to this challenge that also enables the proliferation and active sharing of DT technology without revealing protected information.

FL is a technique that uses a decentralized approach to training machine learning models.3,4 Rather than collecting local data samples on large central servers for continual fine-tuning of machine learning models, FL allows devices to train or update models without exchanging data: the data remain locally on each device, and only the model updates are shared.
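To make the mechanism concrete, the following is a minimal Python sketch of federated averaging (FedAvg), one common FL algorithm; the text above does not prescribe a specific algorithm. It assumes three hypothetical sites, each holding noiseless toy data drawn from y = 2 + 3x, and a simple linear model. Each site runs a few steps of gradient descent locally, and only the resulting weights and the site's sample count are sent back for a weighted average; the (x, y) pairs never leave local_update. The function names, the data, and the hyperparameters are illustrative assumptions, not part of any FL platform's API.

```python
def local_update(weights, data, lr=0.5, epochs=10):
    """Gradient descent on mean squared error for y = w0 + w1 * x at one site.

    The raw (x, y) pairs stay inside this function; only new weights are returned.
    """
    w0, w1 = weights
    n = len(data)
    for _ in range(epochs):
        # Gradients of 0.5 * mean((w0 + w1*x - y)^2) with respect to w0 and w1.
        g0 = sum(w0 + w1 * x - y for x, y in data) / n
        g1 = sum((w0 + w1 * x - y) * x for x, y in data) / n
        w0 -= lr * g0
        w1 -= lr * g1
    return (w0, w1)


def federated_average(updates):
    """Average (weights, n_samples) pairs, weighting each site by its sample count."""
    total = sum(n for _, n in updates)
    w0 = sum(w[0] * n for w, n in updates) / total
    w1 = sum(w[1] * n for w, n in updates) / total
    return (w0, w1)


def run_fedavg(sites, rounds=100):
    """Each round: broadcast the global model, collect local updates, average them."""
    global_w = (0.0, 0.0)
    for _ in range(rounds):
        updates = [(local_update(global_w, data), len(data)) for data in sites]
        global_w = federated_average(updates)
    return global_w


# Hypothetical sites covering different x ranges (a mild form of data heterogeneity).
SITES = [
    [(x / 10, 2 + 3 * x / 10) for x in range(0, 5)],   # x in 0.0..0.4
    [(x / 10, 2 + 3 * x / 10) for x in range(5, 10)],  # x in 0.5..0.9
    [(x / 10, 2 + 3 * x / 10) for x in (2, 6, 10)],    # x in {0.2, 0.6, 1.0}
]

print(run_fedavg(SITES))
```

Because every hypothetical site shares the same underlying relationship, the averaged model converges close to w0 = 2, w1 = 3 even though each site sees a different range of x. With real, heterogeneous clinical data, convergence is not guaranteed to be this clean.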

DTs represent a new computing paradigm, in which real-world systems are paired with virtual representations that dynamically reflect changes in their real-world counterparts. The key feature of a DT is a dynamic bidirectional link, and DTs can be customized for different purposes.5 The bidirectional link connects the digital representation and the physical system, keeping the DT an up-to-date representation of its physical analogue. A DT can be used to generate insights via simulations; these insights can be fed back to the physical entity, and the resulting changes in that entity are in turn fed back into the DT. DTs can be created at a variety of scales, such as the community, patient, organ system, biological, or molecular level, and different insights can be gleaned at each of these scales. One can also imagine several DTs operating simultaneously and communicating with each other. From a precision medicine perspective, each of these levels adds valuable information for personalizing treatment; however, the patient level would arguably be the foundation for precision medicine.
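The bidirectional loop of a DT can be illustrated with a deliberately simple Python sketch. Everything in it is hypothetical: the ToyTwin class, the single scalar marker, the linear decay rule standing in for a mechanistic model, and the candidate intervention levels are assumptions for illustration, not a clinical model. sync carries information from the physical system into the twin, simulate explores scenarios without touching the patient, and recommend carries an insight back out; the next real measurement is then synced again, closing the loop.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ToyTwin:
    """Hypothetical patient-level twin tracking one scalar marker.

    The dynamics below are a made-up linear rule, not a validated clinical model.
    """
    state: float          # twin's current estimate of the marker
    decay: float = 0.9    # assumed fraction of the marker that persists per step
    alpha: float = 0.5    # weight given to each new real-world observation

    def sync(self, observation: float) -> None:
        """Physical -> digital: blend a new measurement into the twin's state."""
        self.state = (1 - self.alpha) * self.state + self.alpha * observation

    def simulate(self, intervention: float, steps: int = 5) -> float:
        """Run the toy dynamics forward in silico; the real patient is untouched."""
        x = self.state
        for _ in range(steps):
            x = self.decay * x - intervention
        return x

    def recommend(self, options: Sequence[float], target: float) -> float:
        """Digital -> physical: pick the option whose simulated outcome is closest to target."""
        return min(options, key=lambda a: abs(self.simulate(a) - target))


twin = ToyTwin(state=10.0)
twin.sync(12.0)                                        # a new measurement arrives
choice = twin.recommend([0.0, 0.5, 1.0], target=3.0)   # insight fed back to the physical side
print(twin.state, choice)
# After the option is acted on, the next measurement would be passed to twin.sync(...).
```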

Precision medicine is the promise of adjusting treatment to a particular subgroup of patients or an individual patient based on their biomarkers, especially genetic or genomic biomarkers. Precision medicine enabled by DTs can offer a new degree of insights into personalized risk factors, drug interactions, procedure safety, and treatment options.2 However, precision medicine relies on a plethora of data, including molecular, clinical, lifestyle (e.g. from wearables), behavioral, and environmental data.6

Collecting and storing a large amount of data in a centralized place raises privacy concerns and increases the risk of data leaks. Simply removing patient identifiers is not sufficient, as identifying information can be reconstructed from the remaining data.7 FL can provide a solution, in which the aggregate power of these individual data modalities can be unlocked without the risks associated with centralizing the data.

Thus, although research into DTs and FL has until now been disjoint, we argue that the intersection of these two emerging technologies will yield promising benefits for clinical applications. A major source of hesitation in creating DTs is the large amount of data needed to build in silico patient representations, as sharing large amounts of data raises questions about data governance and privacy.8 FL can help alleviate concerns about centralized storage of patient-sensitive information and the privacy risks associated with transferring these data.

Common challenges of federated learning, including data heterogeneity, traceability, and explainability, may also be present when FL is applied to DTs. However, we do not envision that DTs will make these FL implementation challenges harder. For instance, DT models may actually be more explainable owing to their mechanistic modeling components. Despite these challenges, applying FL to DTs may be necessary, as multimodal data will not always be available in a centralized fashion for each organ, patient, or medical center.

When assessing the potential of FL for DT applications, there are three important considerations: the lack of existing large datasets, patient privacy, and FL platforms.

First, a significant motivator for the need for DTs is the lack of existing large datasets. DTs are supported by large quantities of diverse data types, which are difficult to acquire and centralize in large repositories because of data volume, data heterogeneity, and the diversity of acquisition methods (Figure 1).9 FL relieves the pressure to centralize all of these data types in order to assemble a DT.

Figure 1:

Digital twins (DTs) can contain data from a variety of modalities, such as notes, imaging, and wearables (Panel A). These data can be processed with federated learning by sharing only model weights (Panel B), and the resulting models could potentially be used to inform treatment options (for instance, lines of therapy in oncology).

The second consideration is patient privacy, which FL protects in two ways. Instead of collecting data in a centralized repository, FL for clinical DTs could run on servers approved to hold protected health information (PHI). These servers could be shared between research centers or located on-device, opening up computing capabilities without the need to move the data. FL can also open up collaborations between research and clinical institutions, as models trained on decentralized resources can achieve the same performance as models trained on centralized data without requiring direct sharing of patient information.4 This protects patient privacy by reducing the amount of information that must be shared to create patient-level DTs.

Furthermore, FL and associated techniques can protect patient information in cases where centralization makes anonymization more challenging. As with most medical data, anonymizing records is a prerequisite to building models. Traditional methods of anonymization involve removing patient identifiers across data modalities. However, with traditional DTs, aggregating a variety of data in one place can make anonymization less effective, as identifying information could be reconstructed using the same or other modalities.7 Because the data stay local, federated learning limits the de-anonymization risks associated with centralizing them.
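Why pooling modalities weakens anonymization can be seen with a small uniqueness count in the spirit of k-anonymity: a record whose combination of attribute values is shared by no other record is easier to link back to a person. The Python sketch below uses entirely made-up records with no names, where each field stands in for a different modality (an age band from clinical data, a three-digit postcode prefix, a wearable device model, an imaging scanner). The fields, values, and the unique_fraction helper are hypothetical illustrations; this is not the re-identification method of the cited work.

```python
from collections import Counter

# Hypothetical de-identified records; no direct identifiers remain.
RECORDS = [
    {"age_band": "40-49", "zip3": "941", "wearable": "A", "scanner": "X"},
    {"age_band": "40-49", "zip3": "941", "wearable": "B", "scanner": "X"},
    {"age_band": "40-49", "zip3": "940", "wearable": "A", "scanner": "Y"},
    {"age_band": "50-59", "zip3": "941", "wearable": "A", "scanner": "X"},
    {"age_band": "50-59", "zip3": "941", "wearable": "A", "scanner": "Y"},
    {"age_band": "40-49", "zip3": "941", "wearable": "B", "scanner": "Y"},
]


def unique_fraction(records, columns):
    """Fraction of records whose values in `columns` match no other record (k = 1)."""
    keys = [tuple(r[c] for c in columns) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(records)


# Each added modality splits the records into smaller groups.
for cols in (["age_band"],
             ["age_band", "zip3"],
             ["age_band", "zip3", "wearable"],
             ["age_band", "zip3", "wearable", "scanner"]):
    print(cols, round(unique_fraction(RECORDS, cols), 2))
```

In this toy table, no record is unique on the age band alone, but every record becomes unique once all four fields are combined in one place. Keeping each field at its source, as FL does, avoids building that joined table in the first place.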

Third, initial platforms for DTs have had some success, including commercial software (e.g. IBM, GE Digital, Ansys) and open-source platforms (e.g. Fed-BioMed).10 However, no platform has emerged as the dominant solution for FL. Teams of researchers have worked on building platforms to support FL for DTs, but their success has been limited by the difficulty of acquiring and reconciling data from different sources and of keeping platforms compatible with each other. Notably, many groups that are building FL algorithms and platforms could have their work applied to DTs. Thus, this is an exciting time for opportunities at the intersection of FL and DTs.

In summary, we emphasize the significant potential of integrating FL and DTs, and the advantages FL offers in tackling the issues of implementing DTs, including managing multimodal data, protecting patient privacy, and building on existing FL platforms. DTs are rapidly gaining popularity and have vast potential to augment clinical practice. FL is also gaining traction as a way to build models from larger datasets while providing enhanced data privacy. By combining the two in a clinical setting, FL can help address the privacy concerns of DTs, and DTs can provide a valuable use case to expedite the adoption of FL. Moreover, the diversity and precision of the data required to support digital twins make the traditional approach of data centralization increasingly challenging. FL offers a solution by providing a way to leverage the multi-source nature of the data without significant data transfer. Here, we have argued that FL may be necessary to develop DTs at scale while protecting patient privacy. Together, clinical digital twins and federated learning can enhance precision medicine by enabling improved data-driven clinical decision-making.

Declaration of interests:

OG reports grants to his institution from the National Cancer Institute (NCI), AstraZeneca Inc, National AI Center of Saudi Arabia, Owkin Inc., Onc.AI and Roche Molecular Systems. OG is a named inventor on a submission by his institute of a provisional patent “RNA to image synthetic data generator” and a patent “Methods and Systems for Learning Gene Regulatory Networks Using Sparse Gaussian Mixture Models”.

References: