Abstract
A major task for genetics is searching for genetic variants associated with disease. But we may well be missing a large number of “unknown unknown” alleles in the “fog of genetics”.

Subject Categories: Chromatin, Epigenetics, Genomics & Functional Genomics; Genetics, Gene Therapy & Genetic Disease
Nearly all human traits are influenced by genetic variability 1, 2. Genetic variants associated with diseases can also inform us about disease aetiology, may have clinical applications and often become therapeutic targets 3. The heritability of a phenotype, that is, the proportion of its variation attributable to genetic variation rather than to environmental factors, ranges from very small to large depending on the trait. A major focus of modern genetics is understanding the heritability of traits, and large‐scale genetic studies, including genome‐wide association studies (GWASs), are a major tool to this end.
However, genetic association studies, including GWAS, can only detect associations if the underlying genes have functional genetic variants. Human genetic variability is limited: it is a snapshot of our species as it currently exists. Here, we propose that the finite genetic variability of the human species leaves only a small search space for association studies, which constrains our understanding of the genetic and molecular basis of human phenotypes. Inspired by the term “fog of war”, which describes uncertainty in military operations, we call the uncertainty in genetics resulting from the limited natural variation of our species the “fog of genetics”.
As we try to understand the genetic basis of human traits and diseases, it is clear that we are missing a lot of important information. For most phenotypes, we are missing significant elements of heritability 2, that is, contributions from unknown genetic variants that explain the variation of a phenotype such as height, longevity or susceptibility to disease. Such missing heritability represents the known unknowns of genetics. For a given phenotype, we can estimate its heritability and, at least in theory and often in practice, estimate the impact of the identified genetic variants. Therefore, even if we are unaware of which and how many genes contribute to a phenotype, we can know how much of the genetic contribution we are missing.
However, there is also an ocean of unknown unknowns in genetics. These include extremely rare alleles that do not contribute to estimates of phenotypic variability. Some of these have been identified, for example rare variants in PCSK9 that affect LDL cholesterol levels 1, but it is certainly conceivable that a large number of phenotypically relevant genetic variants exist in humans yet are so rare as to go unnoticed in GWAS and even in estimates of heritability. Another source of unknown unknowns is alleles that, owing to chance or selection, have become fixed and so are no longer genetic variants, no matter how large their phenotypic impact is compared to that of their now extinct alternative(s). Even if the gene product plays a major role in a given phenotype, it will go undetected because, in the absence of variants, there can be no genetic association. Because such genes are unknown and unknowable at present, there is a huge space of potential genetic variation in the human species that we are unaware of because it does not exist at present.
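The contrast between known and unknown unknowns can be made concrete with the standard quantitative‐genetic definitions. What follows is a textbook sketch rather than a derivation from this article: the symbols V_A, V_P, p_i, a_i and S, and the additive model with independent loci in Hardy‐Weinberg proportions, are assumptions introduced here for illustration.

```latex
% Narrow-sense heritability: additive genetic variance over phenotypic variance
h^2 = \frac{V_A}{V_P}

% Contribution of one biallelic variant i under a purely additive model,
% with allele frequency p_i and per-allele effect a_i
h_i^2 = \frac{2\,p_i\,(1 - p_i)\,a_i^2}{V_P}

% Missing heritability: what the set S of identified variants leaves unexplained
% (assumes the variants in S are independent of one another)
h^2_{\mathrm{missing}} = h^2 - \sum_{i \in S} h_i^2
```

Under these assumptions, the per‐variant term vanishes as p_i approaches 0 or 1, whatever the size of a_i. A very rare allele therefore adds almost nothing to population‐level variance, and a fixed allele adds exactly nothing, which is why neither shows up in heritability estimates.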
Genetic variants with high penetrance and strong, early‐onset phenotypes, such as developmental abnormalities or other catastrophic phenotypes, may be detected by scientists and clinicians in individual patients. Yet, most human traits and diseases have a complex, non‐Mendelian genetic architecture. As such, we argue that the fog of genetics is widespread across most human phenotypes, in particular complex age‐related diseases, which are now the major causes of death in modern societies. Even genetic association studies with large numbers of individuals have a limited ability to detect associations with rare alleles. In addition, rare functional variants in drug target genes have been shown to be geographically localized, making them difficult to catalogue 4. Similarly, recent rare variants in family lineages may play an important role in human disease 5. On top of these are the unknowable genetic variants that have been lost from the human species (Fig 1).
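The limited ability of even large studies to detect rare alleles can be illustrated with a rough power calculation. The sketch below is not taken from the article: it assumes a quantitative trait standardized to unit variance, a single additive variant in Hardy‐Weinberg proportions, a 1‐degree‐of‐freedom test and the conventional genome‐wide threshold of 5e-8. The function name `gwas_power` and its parameters are hypothetical.

```python
from statistics import NormalDist


def gwas_power(n, maf, beta, alpha=5e-8):
    """Approximate power of a 1-df additive association test.

    Assumptions (illustrative, not from the article):
      n     -- number of unrelated individuals
      maf   -- minor allele frequency of the variant
      beta  -- per-allele effect in phenotypic standard deviations
      alpha -- two-sided significance threshold
    """
    std_normal = NormalDist()
    # Fraction of trait variance explained by the variant: 2 p (1 - p) beta^2
    var_explained = 2.0 * maf * (1.0 - maf) * beta ** 2
    # Non-centrality parameter of the chi-square test statistic
    ncp = n * var_explained / (1.0 - var_explained)
    # A 1-df chi-square statistic is a squared normal Z with mean sqrt(ncp)
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    shift = ncp ** 0.5
    # Probability that |Z| exceeds the critical value
    return (1.0 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)


if __name__ == "__main__":
    # Same sample size and effect size, decreasing allele frequency
    for maf in (0.3, 0.01, 0.0001):
        print(maf, gwas_power(n=100_000, maf=maf, beta=0.1))
```

With the sample size and effect size held fixed, the computed power falls from close to one for a common allele to close to zero for a very rare one, because the variance explained scales with 2p(1 − p). Under this simple model, a rare allele is detectable only if its effect is very large.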
Figure 1. Detectability of phenotype–genotype non‐Mendelian associations in relation to the frequency of genetic variants.

Common variants, mostly ancestral alleles, can be associated with traits in GWAS. Alleles that have low frequencies in populations, however, will typically only be detected if they have strong phenotypic effects, such as early‐onset diseases. At the extreme, genetic variants that are extinct in current populations (owing to genetic drift or purifying selection) will not be detected. Likewise, alleles now fixed in populations will not be associated with phenotypes. In addition to variant frequency and strength of the phenotype, other factors not shown, such as penetrance and the complexity of the genetic architecture, also affect detectability in genetic association studies.
Our framework predicts that genes with more genetic variants are more likely to be associated with human phenotypes. We tested this hypothesis using data from the 1000 Genomes Project (see Appendix Supplementary Methods, Dataset EV1) and observed that, indeed, genes with GWAS hits have a greater genetic diversity (see Appendix Fig S1), even after accounting for the potentially confounding effects of gene length (see Appendix Supplementary Results). Our results highlight that GWAS hits are biased towards particular types of genes. Further experimental evidence for the fog of genetics comes from artificial selection experiments. Experimental evolution studies in yeast show that, even though the same pathways seem to be selected for adaptation, there is variation at the sequence level between experiments 6.
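One way such a comparison could be set up is sketched below. This is not the analysis described in the Appendix, which is not reproduced here. It is a generic illustration that normalizes variant counts by gene length and compares the two groups of genes with a one‐sided permutation test. The function names and the choice of test are assumptions, and the caller must supply real per‐gene values.

```python
import random
from statistics import mean


def variant_density(n_variants, gene_length_bp):
    """Variants per kilobase, a simple way to account for gene length."""
    return 1000.0 * n_variants / gene_length_bp


def permutation_p_value(hit_values, other_values, n_perm=10000, seed=0):
    """One-sided permutation test for mean(hit_values) > mean(other_values).

    Returns the observed difference in means and an estimated p-value.
    """
    rng = random.Random(seed)
    observed = mean(hit_values) - mean(other_values)
    pooled = list(hit_values) + list(other_values)
    k = len(hit_values)
    at_least_as_large = 0
    for _ in range(n_perm):
        # Reassign group labels at random and recompute the difference
        rng.shuffle(pooled)
        diff = mean(pooled[:k]) - mean(pooled[k:])
        if diff >= observed:
            at_least_as_large += 1
    # Add-one correction keeps the estimate strictly above zero
    return observed, (at_least_as_large + 1) / (n_perm + 1)
```

In use, `variant_density` would be applied to each gene's variant count and length, and the resulting densities for genes with and without GWAS hits passed to `permutation_p_value`. A permutation test is chosen here only because it makes no distributional assumptions. A regression with gene length as a covariate would be an equally reasonable design.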
Rare genetic variants can be associated with strong phenotypes, and indeed, ongoing efforts aim to characterize rare disease‐causing alleles 1. In addition, most genetic association studies are based on European or Asian populations, and expanding these to other populations, in particular in Africa with its greater genetic diversity, will reveal novel functional genetic variants. Comparative genomic studies with other species, including primates 7, can also uncover putative phenotypic effects of genetic variants that are fixed in humans.
In conclusion, we are not only missing heritability of complex traits and diseases (the known unknowns of genetics), but we are also unaware of a huge number of possible variants that either no longer exist or are so rare in the human species that they cannot be associated with phenotypes (the unknown unknowns of genetics). The implications of this fog of genetics are multiple and far‐reaching: what we know of gene function in health and disease is based on the limited genetic and phenotypic natural variation of the human species, and genetic associations are also biased towards longer genes with more variants. Our molecular understanding of most traits derives from a very limited human genetic landscape. Indeed, the molecular components of phenotypes often do not carry genetic associations 8, and others have proposed that we must move beyond genetic association to understand disease aetiology 9. Even though genetic variants that do not exist cannot be responsible for phenotypic variation or clinical cases, the genes concerned are still important because they can be crucial molecular players in biological processes and diseases. Lastly, current drug discovery efforts based on genetic variants 3 employ only limited information, and expanding these by studying more diverse populations 10, evolutionary genomic analyses, artificial forms of generating genetic diversity and improved phenotyping offers great promise for biological and biomedical research.
Supporting information
Appendix
Dataset EV1
EMBO Reports (2019) 20: e48054
References
1. Timpson NJ, Greenwood CMT, Soranzo N et al (2018) Nat Rev Genet 19: 110–124
2. McClellan J, King MC (2010) Cell 141: 210–217
3. Nelson MR, Tipney H, Painter JL et al (2015) Nat Genet 47: 856–860
4. Nelson MR, Wegmann D, Ehm MG et al (2012) Science 337: 100–104
5. Lupski JR, Belmont JW, Boerwinkle E et al (2011) Cell 147: 32–43
6. Kryazhimskiy S, Rice DP, Jerison ER et al (2014) Science 344: 1519–1522
7. Muntane G, Farre X, Rodriguez JA et al (2018) Mol Biol Evol 35: 1990–2004
8. Menche J, Sharma A, Kitsak M et al (2015) Science 347: 1257601
9. Boyle EA, Li YI, Pritchard JK (2017) Cell 169: 1177–1186
10. Sirugo G, Williams SM, Tishkoff SA (2019) Cell 177: 26–31
