Author manuscript; available in PMC: 2022 Jun 4.
Published in final edited form as: Cell Metab. 2022 Apr 13;34(5):661–666. doi: 10.1016/j.cmet.2022.03.011

Evaluating human genetic support for hypothesized metabolic disease genes

Peter Dornbos 1,2,3, Preeti Singh 1, Dong-Keun Jang 1, Anubha Mahajan 4, Sudha B Biddinger 5, Jerome I Rotter 6, Mark I McCarthy 4, Jason Flannick 1,2,3
PMCID: PMC9166611  NIHMSID: NIHMS1807426  PMID: 35421386

Abstract

We investigate the extent to which human genetic data are incorporated into studies that hypothesize novel links between genes and metabolic disease. To lower the barriers to using genetic data, we present an approach to enable researchers to evaluate human genetic support for experimentally determined hypotheses.

Keywords: human genetics, translation, genetic support, gene prioritization, drug discovery

Introduction

Human genetic “experiments of nature” are a powerful resource to identify or evaluate genes involved in human disease (Claussnitzer et al., 2020). Disease-associated genetic variants represent causal links between molecular perturbations and disease risk, complementing data from animal- or cell-based experimental models that often have uncertain fidelity to human disease. Consequently, there has been an increasing appreciation that human genetics can help prioritize genes identified through experimental models – in particular, new candidate drug targets (Plenge et al., 2013).

Although large-scale human genetic studies have been established for over a decade, their analysis requires expertise – a potential barrier to their use by researchers not trained in their interpretation. Here, we investigate the frequency with which published experimental studies regarding type 2 diabetes (T2D) or glucose homeostasis incorporate human genetic data. To increase the use of human genetic data in such studies, we propose a series of simple guidelines to interpret human genetic support for hypotheses about the involvement of genes or proteins in human disease. We demonstrate this approach by evaluating recently published hypotheses about genes relevant to T2D and glucose homeostasis.

The current use of human genetics to evaluate genes hypothesized as relevant to glucose homeostasis

We reviewed articles that mentioned “diabetes”, “glucose”, or “insulin” in their abstracts, were published between January 2017 and October 2020 in five highly cited journals (Cell, Cell Metabolism, Nature, Nature Metabolism, and Science), and did not describe genome-wide genetic analyses. We curated genes hypothesized in these articles as involved in human T2D or glucose/insulin metabolism, identifying 35 publications and 52 genes (Table 1). Five (14%) of these articles cited human genetic evidence for five (10%) genes: three cited previously published genetic associations, and two conducted novel genetic association tests.

Table 1. Genes recently hypothesized as relevant to diabetes, glucose metabolism, or insulin metabolism.

The table lists articles that hypothesize a novel relationship between a gene or protein and type 2 diabetes or a diabetes-related phenotype. For complexes that are encoded by multiple genes (e.g. PIK3), all genes were analyzed. In all cases, human orthologs are listed. Dashes (−) indicate that the article did not include any human genetic data.

Gene | PMID/Citation | Journal | Type of Evidence | Incorporation of Human Genetics
PPARGC1A | 28340340 | Cell | mouse, cell culture | -
STUB1 | 28431247 | Cell | C. elegans, Drosophila melanogaster, cell culture | -
TBK1 | 29425491 | Cell | mouse, cell culture | -
ZMPSTE24 | 29526462 | Cell | yeast, cell culture | Novel finding
LEPR | 29670283 | Nature | mouse, cell culture | -
VDR, BRD9, BRD7 | 29754817 | Cell | mouse, cell culture | Cite previous study
PIK3CA, PIK3CB, PIK3CG, PIK3CD, PTEN | 30051890 | Nature | mouse, cell culture | -
PIK3CA, PIK3CB, HRAS, NRAS, KRAS | 30982732 | Cell Metabolism | mouse, cell culture | -
ALOX12 | 31353262 | Cell Metabolism | human study, mouse, cell culture | -
PAX6 | 31607563 | Cell Metabolism | human study, mouse, cell culture | -
EIF2AK3 | 31543404 | Cell Metabolism | mouse, cell culture | -
CPT1A, SLC25A20 | 31378464 | Cell Metabolism | human study, cell culture | -
GSK3A, GSK3B | 30879985 | Cell Metabolism | human study, mouse, cell culture | -
VDAC1 | 30293774 | Cell Metabolism | human study, cell culture | -
C3, ATG16L1 | 30293775 | Cell Metabolism | human study, mouse, rat, cell culture | -
PRKCE | 30318338 | Cell Metabolism | mouse | -
OR4M1 | 31230984 | Cell Metabolism | mouse, cell culture | -
TREM2 | 31257031 | Cell | mouse | -
CERS6, MFF | 31150623 | Cell | mouse, cell culture | -
FOXK1, FOXK2 | 30700909 | Nature | mouse, cell culture | -
SLC25A5 | 31528845 | Nature Metabolism | human study, mouse, cell culture | -
HAS2, HAS3 | 31602424 | Nature Metabolism | mouse, cell culture | -
TGFB2 | 31032475 | Nature Metabolism | human study, mouse, cell culture | -
ITPR1 | 32132708 | Nature | mouse, rats, cell culture | -
PGRMC2 | 31748741 | Nature | mouse, cell culture | -
CD81 | 32615086 | Cell | human study, mouse, cell culture | -
GDF3 | 32941798 | Cell Metabolism | mouse, cell culture | Cite previous study
PRKCE | 32882164 | Cell Metabolism | human study, rats | -
TXNIP | 32726606 | Cell Metabolism | mouse, cell culture | -
PGR4 | 32413335 | Cell Metabolism | mouse, cell culture | Cite previous study
SCOT | 32275862 | Cell Metabolism | mouse, cell culture | -
TAZ | 31708444 | Cell Metabolism | mouse, cell culture | -
RIPK1 | 32989316 | Nature Metabolism | human study, mouse, cell culture | Novel finding
GPNMB | 32694855 | Nature Metabolism | mouse, cell culture | -
HSL, ChREBP, ELOVL6 | 32694809 | Nature Metabolism | human study, mouse, cell culture | -

Why do so few experimental studies incorporate human genetic data? One reason is that, historically, genetic association results have been hard to access. Fortunately, over the past decade, human genetic research communities have shifted toward prioritizing data sharing (Flannick and Florez, 2016): web-based catalogs of associations now exist for genome-wide association studies (GWAS) and whole-exome sequencing (WES) studies. The common metabolic diseases knowledge portal (CMDKP), maintained by our group, provides a genetic association resource focused on common metabolic disorders.

A second challenge is that genetic associations can be difficult to interpret. For example, a 2017 study of Sin3a knockout mice proposed a novel interaction between SIN3A and FOXO1 (Langlet et al., 2017) that might suggest an effective treatment for hyperglycemia in humans. If the study authors were to query SIN3A and FOXO1 in the CMDKP, they would observe (a) an association near SIN3A (p=9.96×10⁻⁹) in one of the largest T2D GWAS to date (Mahajan et al., 2018a) and (b) a nominally significant (p=0.04) association for FOXO1 in one of the largest T2D WES studies to date (Flannick et al., 2019). While these data seem to support the involvement of these two genes in T2D, the degree of support is unclear without explicit guidelines for interpreting the data.

Considerations when evaluating genetic support for hypothesized links between genes and disease

How can scientists incorporate human genetic data into their research? While no automated methods or resources can (today) directly evaluate genetic support for a hypothesis, and while researchers should employ a genetic analyst when this question is central to a study, there are some fundamental principles for using genetic data that any researcher can follow. These principles apply to any complex disease with GWAS or WES data available.

Principle 1. Use public GWAS resources.

The largest collections of genetic associations come from GWAS, which have produced common variant associations for thousands of traits. Web-based GWAS association resources include the GWAS Catalog (https://www.ebi.ac.uk/gwas), the GWAS Atlas (https://atlas.ctglab.nl), and PheWeb (https://pheweb.org), which contain associations across many complex traits, while the CMDKP (https://cmdkp.org) contains a larger collection of GWAS associations for common metabolic disorders. Each resource allows queries of the GWAS associations “nearby” (typically within 50–250 kb of) a gene. Researchers should determine which human phenotype(s) should show a GWAS association under their hypothesis and query whether any associations (with p<5×10⁻⁸) have been observed near their gene of interest. If so, their hypothesis has some human genetic support.
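
As a rough illustration of the query described in Principle 1, the sketch below filters a list of association records for genome-wide significant hits within a window around a gene. The `Association` record layout, the function name, the example coordinates, and the 100 kb default window are all assumptions made for illustration; none of the resources named above is queried here, and their actual export formats may differ.

```python
from dataclasses import dataclass

GENOME_WIDE_P = 5e-8  # genome-wide significance threshold quoted in the text


@dataclass(frozen=True)
class Association:
    """Hypothetical record for one GWAS association (not a real portal schema)."""
    variant_id: str
    chrom: str
    pos: int        # base-pair position
    p_value: float


def significant_nearby(assocs, chrom, gene_start, gene_end,
                       window=100_000, p_threshold=GENOME_WIDE_P):
    """Return associations within `window` bp of the gene with p below the threshold."""
    lo, hi = gene_start - window, gene_end + window
    hits = [a for a in assocs
            if a.chrom == chrom and lo <= a.pos <= hi and a.p_value < p_threshold]
    return sorted(hits, key=lambda a: a.p_value)  # strongest association first


# Made-up records and coordinates, for illustration only.
example = [
    Association("var1", "15", 1_050_000, 1e-9),    # near the gene, significant
    Association("var2", "15", 1_020_000, 3e-4),    # near the gene, not significant
    Association("var3", "15", 5_000_000, 1e-12),   # significant but too far away
    Association("var4", "2", 1_050_000, 1e-10),    # different chromosome
]
hits = significant_nearby(example, "15", 1_000_000, 1_080_000)
print([h.variant_id for h in hits])
```

Sorting by p-value puts the strongest association in the region first, which is the variant the later tiering steps refer to.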

Principle 2. Don’t over-interpret an association.

Proximity of a gene to a GWAS association does not necessarily imply that the gene is responsible for the association. On average, seven genes lie “nearby” each GWAS association, and thus (absent further information) each such gene has about a 15% (~1/7) chance of mediating the association. Additional information about the genomic region and regulatory elements surrounding a gene can link it to an association with more confidence. Although rigorous methods for doing so are in their infancy, a few simple analyses are possible today: (a) if the gene harbors a significant coding variant association, the likelihood it mediates the association increases to ~50% (Mahajan et al., 2018b), (b) if the gene is the nearest gene to the strongest associated SNP in the region, the likelihood increases to ~70% (Stacey et al., 2019), and (c) if the gene harbors a coding variant association stronger than any other association in the region, the likelihood increases to >95% (Mahajan et al., 2018b).
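
The baseline figure in Principle 2 is simple arithmetic, sketched below under the simplifying assumption that exactly one nearby gene mediates the association and that all nearby genes are equally likely candidates. The function name is hypothetical.

```python
def baseline_mediation_probability(n_genes_nearby):
    """Chance that one particular gene mediates an association, absent other information.

    Assumes exactly one of the nearby genes is responsible and that all are
    equally likely a priori (a simplification used for illustration).
    """
    if n_genes_nearby < 1:
        raise ValueError("need at least one gene near the association")
    return 1.0 / n_genes_nearby


# Loci with fewer or more nearby genes shift the baseline accordingly.
for n in (3, 7, 15):
    print(n, f"{baseline_mediation_probability(n):.0%}")
```

With the genome-wide average of seven genes this gives roughly 14%, which the text rounds to about 15%.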

Principle 3. Use WES associations to complement GWAS associations.

Rare coding variant associations from WES studies directly implicate human disease genes, even if they are usually less significant than GWAS associations. Public resources of WES associations (calculated by aggregating rare variants at the “gene-level”) include the CMDKP (for 25 metabolic traits) and GeneBass (for ~4,000 traits within the UK Biobank; https://genebass.org). Exome-wide significant gene-level associations (p<2.5×10⁻⁶) provide very high (>95%) genetic support for a hypothesis. Nominally significant associations (p<0.05 after correcting for the number of rare variant tests performed) provide lesser support – roughly equivalent (for datasets in the CMDKP) to the support provided by a nearby GWAS association (this estimate follows from the application of a previously developed equation for association statistics (Wakefield, 2008)).
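
For readers curious how association statistics can be turned into a measure of support, the sketch below implements one widely used approximate Bayes factor associated with Wakefield's work, computed from an effect estimate and its standard error. Whether this is exactly the equation applied in the text is an assumption, as are the function name, the prior standard deviation, and the example inputs.

```python
import math


def approximate_bayes_factor(beta_hat, se, prior_sd):
    """Approximate Bayes factor in favor of association (H1 over H0).

    beta_hat: estimated effect (e.g. a log odds ratio); se: its standard error;
    prior_sd: assumed standard deviation of true effects under H1 (hypothetical).
    """
    if se <= 0 or prior_sd <= 0:
        raise ValueError("se and prior_sd must be positive")
    v = se ** 2              # sampling variance of the estimate
    w = prior_sd ** 2        # prior variance of the effect under H1
    z = beta_hat / se
    shrink = w / (v + w)
    # Bayes factor for H0 versus H1, inverted below to favor H1.
    bf_null = math.sqrt((v + w) / v) * math.exp(-0.5 * z * z * shrink)
    return 1.0 / bf_null


# Illustrative inputs only: same standard error, increasing effect estimates.
for beta in (0.0, 0.2, 0.4):
    print(beta, round(approximate_bayes_factor(beta, se=0.1, prior_sd=0.2), 2))
```

A value below 1 favors no association and a value above 1 favors association, so an estimate near zero with a small standard error can count as evidence against a gene rather than a mere lack of evidence.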

Principle 4. Consider related traits.

Although researchers should specify human phenotypes of interest prior to conducting any queries, associations with related traits (e.g. fasting glucose levels as opposed to T2D) add some genetic support to a hypothesis. If such associations exist, they should be reported with transparency about the number of traits interrogated.

Principle 5. Absence of evidence does not imply evidence of absence.

A lack of association for a gene does not necessarily provide evidence against its involvement in disease – negative evidence requires confidence that the genetic variants observed for the gene (a) are “impactful” (i.e. they significantly affect its function), and (b) exhibit evidence of no association (e.g. an estimated effect size near zero with high confidence) rather than simply a lack of association. It is challenging to confidently identify impactful common noncoding variants, and it is unusual (with current WES datasets) to identify rare coding variant associations with narrow confidence intervals. Some genes do harbor enough predicted loss-of-function variants to produce evidence against the gene, although it remains possible that different gene perturbations could cause different phenotypic effects.

Proposed human genetic evidence (HuGE) guidelines

To summarize genetic support for a hypothesis about the involvement of a gene in human disease, we propose a HuGE score (Figure 1) that combines evidence from GWAS and WES associations. The score can be calculated for any complex disease with publicly available genetic associations and is representable as either a qualitative category of evidence (ranging from “anecdotal” to “compelling”) or an “order of magnitude” quantitative probability of true association. It is derived by (a) assuming 5% of genes are involved in T2D (Satterstrom et al., 2020); (b) using equations from Bayesian statistics to represent each probability estimate in Principles 2 and 3 as “Bayes Factors” that convert the 5% “prior” to an updated probability (“posterior”); and (c) multiplying the common and rare variant Bayes Factors under the assumption that GWAS and WES associations are independent. The quantitative probabilities can be estimated under either conservative (5% of genes involved in disease (Satterstrom et al., 2020)) or optimistic (20% of genes with supporting mouse data involved in disease (Flannick et al., 2019)) scenarios. Document S1 provides a more thorough description of HuGE scores and step-by-step instructions to calculate them; an automated tool that calculates HuGE scores for 341 common metabolic disorders is also available online (https://hugeamp.org/hugecalculator.html).
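
The update in steps (b) and (c) can be sketched as multiplying prior odds by Bayes factors and converting back to a probability. This is a minimal illustration of the standard odds form of Bayes' rule, not the authors' implementation: the function names and the two Bayes factor values are hypothetical, and the actual factors used for each tier are described in Document S1.

```python
def update_probability(prior, *bayes_factors):
    """Convert a prior probability to a posterior by multiplying odds by Bayes factors."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    odds = prior / (1.0 - prior)
    for bf in bayes_factors:
        odds *= bf  # independence assumption: the factors simply multiply
    return odds / (1.0 + odds)


def implied_bayes_factor(prior, posterior):
    """Bayes factor that would move `prior` to `posterior`."""
    return (posterior / (1.0 - posterior)) / (prior / (1.0 - prior))


# Hypothetical Bayes factors, chosen only to show the arithmetic.
common_bf, rare_bf = 3.0, 2.0
for prior in (0.05, 0.20):  # conservative and optimistic priors from the text
    print(prior, round(update_probability(prior, common_bf, rare_bf), 3))
```

Working in odds rather than probabilities is what lets the common and rare variant evidence be combined by simple multiplication, and `implied_bayes_factor` shows the reverse step of recovering a factor from a stated prior and posterior.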

Figure 1. Human Genetic Evidence (HuGE) guidelines.

To use our proposed HuGE guidelines to evaluate genetic support for a gene, we independently evaluate evidence from common variant associations (leftmost column) and rare variant gene-level associations (bottom row). Evidence from common variant associations, which can be obtained from any one of several public resources described in the main text, falls into one of five tiers. The lowest tier (“No evidence”) applies to genes not within 100 kb of a genome-wide significant (p<5×10⁻⁸) association. If a gene is within 100 kb of an association, we then identify the strongest association (i.e. with the lowest p-value) in the region and use it to determine the tier: “Causal coding variant” applies to genes that harbor a coding variant with the strongest association in the region, “Nearest gene” applies to genes that are the closest among genes in the region to the strongest association, “Coding variant” applies to genes that harbor a coding variant that does not have the strongest association, and “GWAS locus” applies to all other genes within 100 kb of an association. The “GWAS locus” tier assumes that seven genes lie within 100 kb of the association (the average value across the genome); for loci with more or fewer genes near the association, the support could be more accurately calculated according to the actual number of genes near the association. Evidence from rare variant gene-level associations, also available from multiple public resources, falls into one of five tiers determined by the association p-value: “Exome-wide” (p<2.5×10⁻⁶), “Strong” (p<1×10⁻³), “Nominal” (p<0.05), “Weak” (p<0.1), and “No evidence” (p>0.1). We combine the two sources of evidence to yield the values in the cell corresponding to the relevant row and column.
The cells show qualitative descriptions of evidence strength and the estimated probability (rounded to nearest 5%) that the gene is involved in disease under conservative (no supporting experimental evidence, left of bar) and optimistic (supporting experimental evidence, right of bar) scenarios. Both the qualitative and the quantitative values follow by applying rules from Bayesian statistics together with literature estimates of evidence strength as described in the main text. Further information regarding these derivations is available on the common metabolic diseases knowledge portal (CMDKP). Document S1 includes step-by-step instructions for using the CMDKP or other public resources to evaluate HuGE scores, and an automated tool implementing them for 341 common metabolic diseases can be found on the CMDKP (https://hugeamp.org/hugecalculator.html).
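
The tier definitions in the caption can be written as two small lookup functions, sketched below. The function and parameter names are hypothetical, and two choices are assumptions not stated in the caption: a gene meeting several common variant criteria is given the strongest tier, and a p-value of exactly 0.1 is treated as “No evidence”.

```python
def common_variant_tier(near_gwas_hit, has_coding_variant=False,
                        coding_variant_is_strongest=False, is_nearest_gene=False):
    """Tier from common variant (GWAS) evidence, following the Figure 1 caption.

    Assumption: when a gene meets several criteria, the strongest tier wins.
    """
    if not near_gwas_hit:  # no genome-wide significant hit within 100 kb
        return "No evidence"
    if has_coding_variant and coding_variant_is_strongest:
        return "Causal coding variant"
    if is_nearest_gene:
        return "Nearest gene"
    if has_coding_variant:
        return "Coding variant"
    return "GWAS locus"


def rare_variant_tier(p_value):
    """Tier from a gene-level rare variant (WES) association p-value."""
    if p_value is None:  # no gene-level result available
        return "No evidence"
    if p_value < 2.5e-6:
        return "Exome-wide"
    if p_value < 1e-3:
        return "Strong"
    if p_value < 0.05:
        return "Nominal"
    if p_value < 0.1:
        return "Weak"
    return "No evidence"  # p >= 0.1 (the boundary at exactly 0.1 is our choice)


# A gene near a GWAS hit with no other evidence, and a gene with only p=0.04.
print(common_variant_tier(True), "/", rare_variant_tier(None))
print(common_variant_tier(False), "/", rare_variant_tier(0.04))
```

The two tiers are evaluated independently and then used together to pick the row and column of the Figure 1 grid.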

To illustrate the use of HuGE scores, we analyzed eight genes targeted by current T2D drugs (Flannick et al., 2019). Six (75%) have some human genetic support: four (GLP1R, PPARG, KCNJ11, and ABCC8) have “compelling” and two (INSR and IGF1R) have “very strong” support. The other two genes (DPP4 and SLC5A2) emphasize that, for reasons of statistical power or evolutionary happenstance, even viable drug targets can lack genetic support – the WES associations for these two genes fall just below our threshold for anecdotal evidence.

Next, using HuGE scores to quantitatively interpret T2D association evidence for SIN3A and FOXO1 (our motivating example genes), we find “moderate” support for both: SIN3A is nearby a GWAS association (but has no coding variant association and is not the gene closest to the strongest association), while FOXO1 has only a nominal (p=0.04) rare coding variant association. Under the “optimistic” scenario where a researcher trusts the mouse data for SIN3A and FOXO1, moderate support corresponds to a ~45% probability that these genes are relevant to T2D, a substantial increase over the baseline of 20% from mouse data alone (Flannick et al., 2019).

Evaluating all 52 genes curated from our literature search (Table 2), we find that (including SIN3A and FOXO1) 12 (23%) have some degree of human genetic support. Eleven (21%) of the 52 genes rise above “anecdotal” evidence, two with “strong” and two (ATG16L1 and PTEN) with “extreme” evidence. The majority, however, lack any level of genetic support for a role in T2D. Most genes simply have “absence of evidence”, although some appear to have “evidence of absence”. For example, contrary to the hypothesis that GPNMB is relevant to T2D (based on observations that Gpnmb regulates lipolysis in mice), 21 rare human predicted loss-of-function variants in the CMDKP have (in aggregate) a small estimated effect on T2D risk (95% confidence interval 0.74 – 1.85 for T2D odds-ratio). As WES datasets increase in size, it may be possible to systematize these sorts of analyses and use human genetic evidence of absence to limit costly investment in genes unlikely to be involved in disease.

Table 2. Genetic support for genes recently hypothesized as relevant to diabetes and glucose homeostasis.

We analyzed each gene in Table 1 using the HuGE guidelines outlined in this essay. Dashes (−) indicate absence of data and/or evidence. Common Variation: evidence tier in the HuGE framework based on common variant (GWAS) associations. Rare Variation: evidence tier in the HuGE framework based on rare variant (WES) associations. Category: qualitative measure of genetic support. Updated Probability: quantitative measures of genetic support under conservative (5% prior) and optimistic (20% prior) scenarios.

Gene | Common Variation | Rare Variation | Category | Updated Probability (5% prior) | Updated Probability (20% prior)
ALOX12 | GWAS LOCUS | - | MODERATE | 15% | 40%
ATG16L1 | NEAREST | NOMINAL | EXTREME | 90% | 95%
BRD7 | - | - | - | - | -
BRD9 | - | - | - | - | -
C3 | - | - | - | - | -
CD81 | GWAS LOCUS | - | MODERATE | 15% | 40%
CERS6 | - | - | - | - | -
CHREBP | - | - | - | - | -
CPT1A | - | - | - | - | -
EIF2AK3 | - | - | - | - | -
ELOVL6 | - | - | - | - | -
FOXK1 | NEAREST | - | VERY STRONG | 70% | 90%
FOXK2 | - | - | - | - | -
FOXO1 | - | NOMINAL | MODERATE | 15% | 40%
GDF3 | - | - | - | - | -
GPNMB | - | - | - | - | -
GSK3A | - | - | - | - | -
GSK3B | - | WEAK | ANECDOTAL | 5% | 25%
HAS2 | - | - | - | - | -
HAS3 | - | - | - | - | -
HRAS | - | - | - | - | -
HSL | - | - | - | - | -
ITPR1 | - | - | - | - | -
KRAS | - | - | - | - | -
LEPR | NEAREST | - | VERY STRONG | 70% | 90%
MFF | - | - | - | - | -
NRAS | - | - | - | - | -
OR4M1 | - | - | - | - | -
PAX6 | - | - | - | - | -
PGR4 | GWAS LOCUS | - | MODERATE | 15% | 40%
PGRMC2 | - | - | - | - | -
PIK3CA | - | - | - | - | -
PIK3CB | - | - | - | - | -
PIK3CD | - | - | - | - | -
PIK3CG | - | - | - | - | -
PPARGC1A | - | - | - | - | -
PRKCE | - | NOMINAL | MODERATE | 15% | 40%
PTEN | NEAREST | NOMINAL | EXTREME | 90% | 95%
RIPK1 | - | - | - | - | -
SCOT | - | - | - | - | -
SIN3A | GWAS LOCUS | - | MODERATE | 15% | 40%
SLC25A20 | - | - | - | - | -
SLC25A5 | - | - | - | - | -
STUB1 | - | - | - | - | -
TAZ | - | NOMINAL | MODERATE | 15% | 40%
TBK1 | - | - | - | - | -
TGFB2 | - | - | - | - | -
TREM2 | - | - | - | - | -
TXNIP | - | - | - | - | -
VDAC1 | - | - | - | - | -
VDR | - | - | - | - | -
ZMPSTE24 | - | - | - | - | -

Discussion

Despite the vast number of human genetic associations now publicly available, and despite the widely recognized value of human genetics to identify human disease-susceptibility genes (Plenge et al., 2013), few (~14%) recent studies reporting novel links between genes and T2D reference human genetic data. We suspect that this trend is true for other diseases as well.

The guidelines we propose for evaluating genetic support (Figure 1) are intended to be simple to follow and implementable using only public resources. With simplicity comes caveats: the guidelines omit features that could more accurately link common variant associations to genes (e.g. epigenomics or transcriptomics), they measure the presence of an association but not its directionality or mechanism (e.g. whether it suggests protein inhibition or activation should reduce disease risk), and they are limited by the amount of data currently available.

Nonetheless, caveats are not unique to genetic data, and it is notable that the majority of genes hypothesized in recent years as involved in T2D have no genetic support under our guidelines. We believe that readers of the original journal articles describing these genes would have benefitted from this information, as prioritizing the study of genes that do harbor human genetic associations is expected to increase the success of future research efforts (Plenge et al., 2013). We suggest that – in the future – journals might pilot some sort of “genetic reporting guidelines” akin to those used today for statistical analyses and data sharing. This will require additional work by investigators to learn how to query public genetic resources, but this is a small and – in our opinion – worthwhile investment: increasing the use of genetic data in biological research should have a positive effect on the translatability of experimental findings to human disease.

Supplementary Material

Document S1

Acknowledgements

P.D. and J.F. were supported by NIDDK grant 1R01DK125490. J.F. was also supported by NIDDK grant 5UM1DK105554. J.I.R. is supported in part by (a) NIH NHLBI contract R01HL151855 and NIH NIDDK contract UM1DK07861, (b) National Center for Advancing Translational Sciences, CTSI grant UL1TR001881, and (c) the National Institute of Diabetes and Digestive and Kidney Disease Diabetes Research Center (DRC) grant DK063491 to the Southern California Diabetes Endocrinology Research Center. S.B.B. is supported by the NIH (HL109650, DK125898, R01DK125898) and the American Heart Association (Established Investigator Award).

Footnotes

Declared Conflicts of Interest

The views expressed in this article are those of the author(s) and not necessarily those of the NHS, the NIHR, or the Department of Health. M.I.M. has served on advisory panels for Pfizer, Novo Nordisk and Zoe Global, has received honoraria from Merck, Pfizer, Novo Nordisk and Eli Lilly, and research funding from AbbVie, AstraZeneca, Boehringer Ingelheim, Eli Lilly, Janssen, Merck, Novo Nordisk, Pfizer, Roche, Sanofi Aventis, Servier, and Takeda. As of June 2019, M.I.M. is an employee of Genentech, and a holder of Roche stock. A.M. is an employee of Genentech and a holder of Roche stock.

References

  1. Claussnitzer M, Cho JH, Collins R, Cox NJ, Dermitzakis ET, Hurles ME, Kathiresan S, Kenny EE, Lindgren CM, MacArthur DG, et al. (2020). A brief history of human disease genetics. Nature 577, 179–189.
  2. Flannick J, and Florez JC (2016). Type 2 diabetes: genetic data sharing to advance complex disease research. Nature reviews. Genetics 17, 535–549.
  3. Flannick J, Mercader JM, Fuchsberger C, Udler MS, Mahajan A, Wessel J, Teslovich TM, Caulkins L, Koesterer R, Barajas-Olmos F, et al. (2019). Exome sequencing of 20,791 cases of type 2 diabetes and 24,440 controls. Nature 570, 71–76.
  4. Langlet F, Haeusler RA, Linden D, Ericson E, Norris T, Johansson A, Cook JR, Aizawa K, Wang L, Buettner C, et al. (2017). Selective Inhibition of FOXO1 Activator/Repressor Balance Modulates Hepatic Glucose Handling. Cell 171, 824–835.e818.
  5. Mahajan A, Taliun D, Thurner M, Robertson NR, Torres JM, Rayner NW, Payne AJ, Steinthorsdottir V, Scott RA, Grarup N, et al. (2018a). Fine-mapping type 2 diabetes loci to single-variant resolution using high-density imputation and islet-specific epigenome maps. Nature genetics 50, 1505–1513.
  6. Mahajan A, Wessel J, Willems SM, Zhao W, Robertson NR, Chu AY, Gan W, Kitajima H, Taliun D, Rayner NW, et al. (2018b). Refining the accuracy of validated target identification through coding variant fine-mapping in type 2 diabetes. Nature genetics 50, 559–571.
  7. Plenge RM, Scolnick EM, and Altshuler D (2013). Validating therapeutic targets through human genetics. Nature reviews. Drug discovery 12, 581–594.
  8. Satterstrom FK, Kosmicki JA, Wang J, Breen MS, De Rubeis S, An JY, Peng M, Collins R, Grove J, Klei L, et al. (2020). Large-Scale Exome Sequencing Study Implicates Both Developmental and Functional Changes in the Neurobiology of Autism. Cell 180, 568–584.e523.
  9. Stacey D, Fauman EB, Ziemek D, Sun BB, Harshfield EL, Wood AM, Butterworth AS, Suhre K, and Paul DS (2019). ProGeM: a framework for the prioritization of candidate causal genes at molecular quantitative trait loci. Nucleic acids research 47, e3.
  10. Wakefield J (2008). Reporting and interpretation in genome-wide association studies. International journal of epidemiology 37, 641–653.
