Author manuscript; available in PMC: 2024 May 29.
Published in final edited form as: Nat Metab. 2022 Aug 11;4(8):970–977. doi: 10.1038/s42255-022-00607-8

Fig 4. Schematic representation of data lifecycle management needs and challenges.

(left) Islet preparations are assigned a unique sample and donor identifier prior to data acquisition to ensure that each sample is unique and can be traced back to its origin (right). Data should be acquired using standardized protocols and processed using common workflows for each technology to minimize batch effects and other confounding factors, so that data of the same type can be compared. Clinical information, sample metadata, and molecular, functional, and imaging results need to be described according to international semantic standards to ensure interoperability between datasets. This enables them either to be integrated into a single database or to be components of a federated network of interoperable datasets (as shown). Finally, standardized acquisition, processing, and description of terminologies and pipelines enable clinical and laboratory data to be aligned and analyzed across donor cohorts, increasing the statistical power to link findings to disease outcomes or molecular pathways. This can be done either by accessing a single database containing the combined data or (as shown) via federated networks that enable interactive analysis of distributed databases (federated analysis) or decentralized learning strategies such as swarm learning, in which AI models are trained by exchanging model parameters between distributed databases. Current challenges and best practices are indicated. This figure was created with BioRender.com.
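
The idea of training a model by exchanging parameters rather than records can be illustrated with a minimal sketch. It is a toy example of parameter averaging under stated assumptions, not the Swarm Learning framework or any method used by the authors: the linear model, the three sites, their data, and every function name (`local_update`, `average_params`, `train`) are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Params = Tuple[float, float]      # (slope, intercept) of a toy linear model
Site = List[Tuple[float, float]]  # local (x, y) records that never leave the site


def local_update(params: Params, data: Site, lr: float = 0.05, steps: int = 20) -> Params:
    """Run a few gradient-descent steps on one site's private data."""
    w, b = params
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def average_params(updates: List[Params], sizes: List[int]) -> Params:
    """Combine site parameters, weighting each site by its sample count."""
    total = sum(sizes)
    w = sum(p[0] * n for p, n in zip(updates, sizes)) / total
    b = sum(p[1] * n for p, n in zip(updates, sizes)) / total
    return w, b


def train(sites: List[Site], rounds: int = 200) -> Params:
    """Alternate local training and parameter exchange; only parameters are shared."""
    params: Params = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(params, site) for site in sites]
        params = average_params(updates, [len(site) for site in sites])
    return params


# Hypothetical data: three sites sampling the same relationship y = 2x + 1.
sites = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
    [(0.5, 2.0), (1.5, 4.0)],
]
slope, intercept = train(sites)
```

In this sketch a single loop performs the averaging for simplicity; in a decentralized setting the same merge step would be carried out among the participating sites themselves. The sketch also assumes that every site describes its variables in the same way, which is the role of the standardized acquisition and semantic description discussed in the caption.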