2019 May 24;20(6):e48316. doi: 10.15252/embr.201948316

Table 1.

Examples of studies showing re‐identifiability and distinguishability of genomic data

| Reference | Finding |
| --- | --- |
| Lin Z, Owen AB, Altman RB (2004) Genomic research and human subject privacy. Science 305: 183 | Suggested that human beings can be uniquely identified from just 30 to 80 statistically independent SNPs |
| Homer N, Szelinger S, Redman M, et al (2008) Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high‐density SNP genotyping microarrays. PLoS Genet 4: e1000167 | Demonstrated that specific individuals could be distinguished in genome‐wide association study (GWAS) data through summary statistics (allele frequencies) |
| Cassa CA, Schmidt B, Kohane IS, et al (2008) My sister's keeper?: genomic research and the identifiability of siblings. BMC Med Genomics 1: 32 | Demonstrated the risk of revealing the identity of one's siblings through one's own SNPs |
| Schadt EE (2012) The changing privacy landscape in the era of big data. Mol Syst Biol 8: 612 | Demonstrated that it is possible to derive genotypic information from publicly available RNA data and use it to identify an individual in large‐scale collections of genomic profiles |
| Im HK, Gamazon ER, Nicolae DL, et al (2012) On sharing quantitative trait GWAS results in an era of multiple‐omics data and the limits of genomic privacy. Am J Hum Genet 90: 591–598 | Demonstrated that quantitative trait GWAS results can be linked directly to human research participants if a matched sample is available |
| Gymrek M, McGuire AL, Golan D, et al (2013) Identifying personal genomes by surname inference. Science 339: 321–324 | Demonstrated that participants could be re‐identified by linking short tandem repeats (STRs) on the Y chromosome with data found in publicly available datasets |
| Schloissnig S, Arumugam M, Sunagawa S, et al (2013) Genomic variation landscape of the human gut microbiome. Nature 493: 45 | Indicated that individuals might have a unique metagenomic genotype |
| Shringarpure SS, Bustamante CD (2015) Privacy risks from genomic data‐sharing beacons. Am J Hum Genet 97: 631–646 | Showed that in a beacon with 1,000 individuals, re‐identification is possible with just 5,000 queries |
| Lippert C, Sabatini R, Maher MC, et al (2017) Identification of individuals by trait prediction using whole‐genome sequencing data. Proc Natl Acad Sci USA 114: 10166–10171 | Developed a model to predict phenotypic traits (e.g., facial structure, voice, eye and skin color, height, weight, and BMI) from common genetic variation in WGS data |
| Erlich Y, Shor T, Pe'er I, et al (2018) Identity inference of genomic data using long‐range familial searches. Science 362: 690–694 | Predicted that with a database of ~3 million US individuals of European descent (2% of the adults of this population), over 99% of the people of this ethnicity would have at least one third‐cousin match |
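
To give a feel for why a few dozen independent SNPs can single out one person, as the first entry in Table 1 suggests, the following is a back-of-the-envelope sketch and not the method used in any of the cited studies. It assumes biallelic SNPs in Hardy-Weinberg equilibrium, full statistical independence between SNPs, a single shared minor allele frequency for all of them, and a world population of 8 billion. The function names are hypothetical.

```python
import math


def genotype_match_probability(maf: float) -> float:
    """Probability that two unrelated people share the same genotype at one
    biallelic SNP, assuming Hardy-Weinberg equilibrium (an assumption)."""
    p, q = maf, 1.0 - maf
    genotype_freqs = (p * p, 2 * p * q, q * q)
    # Two independent draws match when both land on the same genotype.
    return sum(f * f for f in genotype_freqs)


def snps_needed(maf: float, population: float = 8e9) -> int:
    """Smallest number of independent SNPs for which the expected number of
    other people in `population` sharing one's full profile drops below 1."""
    per_snp = genotype_match_probability(maf)
    # Solve population * per_snp ** n < 1 for n.
    return math.ceil(math.log(population) / -math.log(per_snp))


if __name__ == "__main__":
    for maf in (0.5, 0.3, 0.1):
        print(f"MAF {maf:.1f}: about {snps_needed(maf)} independent SNPs")
```

Under these simplifying assumptions the count lands in the tens of SNPs and grows as the minor allele frequency falls, which is the same order of magnitude as the figure reported in the table. Real data violate the independence and equal-frequency assumptions, so the numbers are illustrative only.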