2021 Jul 16;20:1243–1260. doi: 10.17179/excli2021-4002

Figure 1. Brief overview of the contents of this review. a: The three types of genomic data covered in this work. A: Adenine; C: Cytosine; G: Guanine; T: Thymine; me: methyl group. b: A selection of applications that encourage data sharing. From left to right: genomic data sharing is often required when building machine learning models, to increase the sample size available for training. Collecting and enriching data on minorities can reduce subpopulation bias in a trained model. In multiparty studies, data collected at different sites often need to be joined. Other motivators are sharing genomic data to allow the reproducibility of results or to reuse the data for new scientific questions. c: Subject re-identification is the core concern in genomic data privacy. The ability to derive uniquely identifying single-nucleotide polymorphism (SNP) barcodes from the data allows an adversary to cross-reference them with public databases, which often contain metadata that reveal sensitive medical information. d: A timeline of selected laws introduced in several countries to protect citizens from discrimination based on genome-related data. e: A selection of commonly used data sharing methods, colour-coded by the maximum level of security they can provide. f: A selection of emerging sharing techniques that are the subject of ongoing research. Also shown, as a necessary future step, is the introduction of globally valid laws to protect subjects from discrimination in the event of a security breach. GAN: Generative Adversarial Network; RBM: Restricted Boltzmann Machine; VAE: Variational Autoencoder.
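The re-identification scenario in panel c can be illustrated with a minimal linkage-attack sketch in Python. It assumes genotypes are coded as minor-allele counts (0, 1 or 2) at a fixed, ordered SNP panel, and that an adversary declares a match when every jointly called site agrees. The panel identifiers, the toy `public_db` and the helpers `snp_barcode`, `match_score` and `reidentify` are hypothetical constructs for illustration only; real attacks work with far larger panels and must tolerate genotyping errors.

```python
from typing import Dict, List, Optional, Tuple

# A barcode is the ordered tuple of genotypes at a fixed SNP panel;
# None marks a missing call.
Barcode = Tuple[Optional[int], ...]

# Hypothetical panel: placeholder identifiers, not real rsIDs.
SNP_PANEL: List[str] = ["snp1", "snp2", "snp3", "snp4", "snp5", "snp6"]


def snp_barcode(genotypes: Dict[str, int], panel: List[str] = SNP_PANEL) -> Barcode:
    """Map {SNP id: minor-allele count (0, 1 or 2)} onto the panel order."""
    return tuple(genotypes.get(snp) for snp in panel)


def match_score(a: Barcode, b: Barcode) -> Tuple[int, int]:
    """Return (concordant sites, compared sites), skipping missing calls."""
    compared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    concordant = sum(1 for x, y in compared if x == y)
    return concordant, len(compared)


def reidentify(target: Barcode, database: Dict[str, Barcode], min_sites: int = 4) -> List[str]:
    """Return database entries that agree with the target at every compared
    site, provided at least `min_sites` sites could be compared."""
    hits = []
    for name, barcode in database.items():
        concordant, compared = match_score(target, barcode)
        if compared >= min_sites and concordant == compared:
            hits.append(name)
    return hits


# Toy "public database" of identified profiles (hypothetical data).
public_db: Dict[str, Barcode] = {
    "profile_A": (0, 1, 2, 0, 1, 1),
    "profile_B": (1, 1, 0, 2, 0, 1),
    "profile_C": (0, 1, 2, 1, None, 1),
}

# "Anonymised" genotypes from a shared dataset, with snp5 not reported.
leaked = snp_barcode({"snp1": 0, "snp2": 1, "snp3": 2, "snp4": 0, "snp6": 1})
print(reidentify(leaked, public_db))  # ['profile_A']
```

Exact concordance is the simplest decision rule; allowing a small number of mismatches would absorb genotyping errors, at the cost of more false matches. The `min_sites` threshold reflects the point that a barcode only becomes identifying once enough sites can be compared.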
