Decades of investment in genomic science have led to a revolution in our understanding of the architecture of the human genome, genomic variation, and the biological consequences of the genome on human health and disease. Before this scientific revolution, the genomic contribution to disease was widely thought to be a function of coding region variation in genes with functions linked to biologically plausible roles in the etiology of diseases of interest. Most disease susceptibility genes were identified through studies of families with Mendelian patterns of inheritance, where co-segregation of polymorphic DNA markers with disease would reveal the chromosomal location of the underlying susceptibility gene. Family-based studies had identified dozens to hundreds of genetic loci before the advent of the Human Genome Project and its descendants.
Thanks to the technical and methodological advancements that enabled GWAS, our understanding of the underlying genetics of most common diseases and traits has been fundamentally changed. As of October 2016, GWAS have identified at least 33,044 genetic associations, including at least 2,888 associations with cancer and cancer-related traits according to the NHGRI-EBI Catalog of published genome-wide association studies (https://www.ebi.ac.uk/gwas/home). Consistent with the “common disease common variant” hypothesis[1] that in part motivated the GWAS era, the effects of most loci identified are small, and the risk-associated alleles are relatively common. A large fraction of disease susceptibility loci are found in non-coding regions. Associations reported in putative regulatory regions and “gene deserts” are common. Most loci that have been identified do not lie in or near the candidate genes that were hypothesized to be involved in disease etiology. Many variants associated with common diseases and traits affect only one cancer, and there are relatively few genes that have pleiotropic effects across multiple diseases. Thus, with hindsight, it is now painfully clear why the candidate gene approach to identify common disease susceptibility loci was largely unsuccessful.
Despite the many successes of GWAS, there remain unfulfilled promises. Because so many loci have been identified that lie nowhere near known coding or regulatory regions, progress in identifying the underlying mechanism of most associations has been slow. While novel genetic mechanisms of regulation of diseases and traits have been found in post-GWAS studies, the function of most GWAS-identified loci remains unelucidated.
Similarly, the low penetrance of most GWAS-identified loci has limited the ability to translate genetic discoveries into clinical practice. In contrast with high penetrance gene mutations identified in family studies, most GWAS-identified loci do not provide clinically relevant information on their own. Polygenic risk scores and other multi-locus approaches for assessing risk have been developed, but most of these have not found their way into clinical practice. GWAS and related tools have now begun to discover moderate penetrance mutations that are playing a role in clinical practice.
But is what we know today a function of the true underlying etiology of disease, or rather a function of the kinds of genetic effects that are identifiable by GWAS approaches? Despite the success of GWAS, the amount of genetic variance in a particular cancer or risk factor that is explained by GWAS-identified loci is relatively small. In response to a literature with few replicable associations, the philosophy that guided GWAS was to use extremely stringent criteria for association to identify true positive associations. However, the consequence of this approach is that many relevant associations may have been missed. Similarly, the GWAS approach sets extremely stringent, and possibly unattainable, standards for the identification of gene-gene or gene-environment interactions. Few examples of these interactions have been identified, despite evidence from a variety of domains that suggests these interactions may exist. This may be a particular concern if regions identified through GWAS are involved in regulation of gene expression, perhaps under the influence of exposures.
The GWAS era has been a mixed bag. To geneticists, GWAS has been an unqualified success in disease locus discovery. To biologists, the discovery of many loci has opened doors to mechanisms and pathways that were not previously known. Those successes have required different assays, reagents and approaches, since sequencing of DNA to identify truncating mutations in an open reading frame did not apply to regulatory regions of the genome. To clinicians, realizing the promise of GWAS for clinical practice, whether for identification of high-risk populations, early detection of disease, prognosis or prediction, or insights on new therapeutics, has been frustratingly complex, difficult, and slow.
In this special CEBP Focus issue, some of the leading figures in GWAS research in the past decades summarize what we have learned from this line of research and where the field can contribute biological and clinical insights into cancer.
Reference
1. Lander ES. The new genomics: global views of biology. Science. 1996;274(5287):536–539. doi: 10.1126/science.274.5287.536.