Skip to main content
eLife logoLink to eLife
. 2022 Sep 8;11:e79348. doi: 10.7554/eLife.79348

Integrative analysis of metabolite GWAS illuminates the molecular basis of pleiotropy and genetic correlation

Courtney J Smith 1,†,, Nasa Sinnott-Armstrong 1,2,†,, Anna Cichońska 3, Heli Julkunen 3, Eric B Fauman 4, Peter Würtz 3, Jonathan K Pritchard 1,5,
Editors: Alexander Young6, George H Perry7
PMCID: PMC9536840  PMID: 36073519

Abstract

Pleiotropy and genetic correlation are widespread features in genome-wide association studies (GWAS), but they are often difficult to interpret at the molecular level. Here, we perform GWAS of 16 metabolites clustered at the intersection of amino acid catabolism, glycolysis, and ketone body metabolism in a subset of UK Biobank. We utilize the well-documented biochemistry jointly impacting these metabolites to analyze pleiotropic effects in the context of their pathways. Among the 213 lead GWAS hits, we find a strong enrichment for genes encoding pathway-relevant enzymes and transporters. We demonstrate that the effect directions of variants acting on biology between metabolite pairs often contrast with those of upstream or downstream variants as well as the polygenic background. Thus, we find that these outlier variants often reflect biology local to the traits. Finally, we explore the implications for interpreting disease GWAS, underscoring the potential of unifying biochemistry with dense metabolomics data to understand the molecular basis of pleiotropy in complex traits and diseases.

Research organism: Human

Introduction

A central challenge in the field of human genetics is understanding the mechanism of how genetic variants influence complex traits and diseases. Genome-wide association studies (GWAS) have begun characterizing the genetic architecture of complex traits, but the molecular mechanisms connecting genetic variants to these traits are rarely understood. This is particularly true for understanding pleiotropy, when a variant affects multiple traits (Solovieff et al., 2013). It is possible to estimate the genetic correlation between traits (Bulik-Sullivan et al., 2015b; Shi et al., 2017), but it is often unclear what contributes to this at a molecular or physiological level. A handful of in vitro disease-focused ‘post-GWAS’ studies have convincingly shown the mechanisms driving pleiotropy of individual key associations (Warren et al., 2017; Sinnott-Armstrong et al., 2021b); however, these studies are highly specific and time-consuming. Developing statistical and computational approaches to identify putative molecular mechanisms is invaluable to advancing our understanding of where and how pleiotropic GWAS variants act.

In this study, we use metabolites as model traits to understand pleiotropic features of genetic architecture. Metabolites are small molecules interconverted by a series of biochemical pathways and are an appealing model system for studying pleiotropy because their pathways are typically well-documented and biologically simpler than those underlying other complex traits (Gieger et al., 2008; Sinnott-Armstrong et al., 2021a). Previous work in Mendelian genetics has identified inborn errors of metabolism (IEM) in many enzymes (Woidy et al., 2018). Metabolite GWAS, which have long observed pervasive pleiotropy at these IEM genes and other loci (Shin et al., 2014; Yeung, 2021), offer a potential opportunity to further explore the relationships between intermediate molecules and disease outcomes at scale. Here, we jointly analyzed GWAS results of 16 plasma metabolites from the Nightingale Health Nuclear Magnetic Resonance (NMR) Spectroscopy platform in nearly 100,000 individuals in the UK Biobank (Julkunen et al., 2021; Figure 1; see ‘Methods’). These 16 metabolites included glucose, pyruvate, lactate, citrate, isoleucine, leucine, valine, alanine, phenylalanine, tyrosine, glutamine, histidine, glycine, acetoacetate, acetone, and 3-hydroxybutyrate. They were chosen based on their biochemical proximity to each other, their relevance to health and disease, and because the genes and enzymes involved in their metabolism are well-characterized. They play especially important roles in energy generation and energy storage pathways such as glycolysis, the citric acid cycle, amino acid metabolism, and ketone body formation. They are relevant to many metabolic diseases including type 2 diabetes (Laffel, 1999; Guasch-Ferré et al., 2020; Newsholme et al., 2007), cardiovascular disease (Lusis and Weiss, 2010), and non-alcoholic fatty liver disease (Watt et al., 2019).

Figure 1. Biochemistry of relevant metabolites.

Figure 1.

Pathway diagram and molecular structure of relevant metabolites, colored by their biochemical groups. The pathway diagram was curated from multiple resources (see ‘Methods’). All solid lines represent a single chemical reaction step. Dotted lines represent a simplification of multiple steps. For simplicity, only a subset of all the reactions each metabolite participates in is shown. Genes encoding the enzymes that catalyze the above chemical reactions are known and presented in Figure 3—figure supplement 1.

Numerous GWAS have begun characterizing the genetic architecture of metabolites and found them to be heritable and polygenic (Lemaitre et al., 2011; Suhre et al., 2011). Recent metabolite studies have shown that leveraging information about the biochemical pathways relevant to a given metabolite (Teslovich et al., 2018; Graham et al., 2021; Rueedi et al., 2017; Sinnott-Armstrong et al., 2021a) can allow for more interpretable gene annotation of GWAS hits. This has led to the dissection of individual associations of biomarkers, such as lipids (Kettunen et al., 2016), glycine (Wittemans et al., 2019), and intermediate clinical measures (Lotta et al., 2021), with cardiometabolic and other diseases. The pervasive pleiotropy at these GWAS loci with other metabolites as well as disease (Lotta et al., 2021; Pott et al., 2019) suggests the potential of utilizing these data for investigating the mechanism of pleiotropic effects as a core component of genetic architecture. While recent GWAS have begun jointly investigating multiple metabolites (Cichonska et al., 2016; Ruotsalainen et al., 2021; Qi and Chatterjee, 2018), they have yet to do so in the context of their biochemical pathways.

In this article, we demonstrate that investigating the effects of pleiotropic variants on biologically related metabolites allows for a better understanding of why these variants have their observed joint effects. Our results reveal striking heterogeneity in genetic correlation across the genome and provide a biologically intuitive basis for understanding this heterogeneity. Together, this allows us to dissect the molecular basis of metabolic disease GWAS variants and enables us to directly define the mechanism relating an example variant to its associated disease.

Results

Insights into the shared genetic architecture of biologically related metabolites

We chose 16 metabolites from the 249 available through the Nightingale NMR platform in a subset of the UK Biobank (Figure 1; see ‘Methods’). These 16 metabolites were selected based on their biochemical proximity, relevance to health and disease, and because the genes and enzymes involved in their metabolism are well-characterized. We classified the 16 metabolites into four groups based on shared biochemistry: Glycolysis (glucose, pyruvate, lactate, citrate), Branched Chain Amino Acid (BCAA; isoleucine, leucine, valine), Other Amino Acid (alanine, phenylalanine, tyrosine, glutamine, histidine, glycine), and Ketone Body (acetoacetate, acetone, 3-hydroxybutyrate). Trait measurements were log-transformed and adjusted for relevant technical covariates. After outlier removal, we obtained a primary dataset of 94,464 genotyped European-ancestry individuals with data for all 16 metabolites.

We first sought to characterize the genetic architecture underlying these metabolites by performing GWAS for each (Figure 2—figure supplement 1). Hits from individual GWAS were clumped with an r2 of 0.01 per megabase, combined across metabolites, then pruned to the single nucleotide polymorphism (SNP) with the most significant p-value within 0.1 cM. This resulted in 213 lead variants with a genome-wide significant association in at least one metabolite, referred to as the metabolite GWAS hits. Glycine had the largest number of significant associations with 77 hits (Figure 2a). There were 47 variants with significant associations in more than one metabolite, including rs2939302 (near the gene GLS2), which was significant in 9 of the 16 metabolites, and rs1260326 (GCKR), which was significant in 8. Glycine also had the highest Heritability Estimate from Summary Statistics (HESS) total SNP heritability of 0.284 (Supplementary file 1 and Figure 2—figure supplement 2).

Figure 2. Overview genetic architecture of metabolites.

(a) Number of genome-wide association studies (GWAS) hits per metabolite. (b) Pairwise LDSC genetic correlations between the metabolites, clustered by genetic correlation. (c) Mendelian randomization weighted results between the metabolites. (d) Biclustered standardized effect size in each metabolite for the 213 metabolite GWAS hits. For visualization, effect sizes were divided by the standard error then inverse normal transformed and standardized. Each variant was aligned to have a positive median score across metabolites.

Figure 2.

Figure 2—figure supplement 1. Manhattan plots.

Figure 2—figure supplement 1.

Manhattan plots for each metabolite overlayed by biochemical group: (a) branched-chain amino acids, (b) glycolysis related metabolites, (c) other amino acids, and (d) ketones. Coloring reflects gene type assignment for where relevant.
Figure 2—figure supplement 2. HESS heritability plots.

Figure 2—figure supplement 2.

Cumulative HESS heritability plots for the 16 metabolites, colored by biochemical group: (a) acetoacetate, (b) 3-hydroxybutyrate, (c) acetone, (d) citrate, (e) glucose, (f) pyruvate, (g) lactate, (h) isoleucine, (i) valine, (j) leucine, (k) alanine, (l) glutamine, (m) glycine, (n) histidine, (o) phenylalanine, (p) tyrosine.
Figure 2—figure supplement 3. Phenotypic correlation.

Figure 2—figure supplement 3.

Correlation matrix of the residualized phenotype levels for the 16 metabolites. Metabolites are colored by biochemical group.
Figure 2—figure supplement 4. Quantile–quantile plots.

Figure 2—figure supplement 4.

Quantile–quantile plots showing the observed -log10(p-value) from the 16 genome-wide association studies (GWAS) in each metabolite for the 213 metabolite GWAS Hits. The most significant trait for each SNP is excluded, and the plots are faceted by gene type. Note that for plotting purposes, p-values were set to a minimum of 1e-300. Variants with quantile <0.005 are labeled with their gene annotation.

To understand the shared genetics of these metabolites, we then investigated the extent of pleiotropy between and within biochemical groups. In order to examine this, we first calculated pairwise LD Score regression (LDSC) genetic correlation across the 16 metabolites. We found substantial genome-wide sharing for many pairs of metabolites, especially for metabolites within the same biochemical group (Figure 2b; phenotypic correlation in Figure 2—figure supplement 3). We then explored pleiotropic effects beyond the polygenic background by examining the structure within the metabolite GWAS hits. Pairwise Mendelian randomization (MR) between the metabolites emphasized the intertwined nature of these traits (Figure 2c). Despite only taking into account genetic effects, MR largely clustered metabolites in a way that reflects their biochemical groups. The extensive pleiotropy across the 16 traits, with similar sharing inside biochemical groups, is also illustrated by the structure visible in the normalized effect sizes for each metabolite GWAS hit (Figure 2d). Together, these analyses support substantial, but not always consistent, genetic overlap between the traits, particularly in the polygenic components. In the remainder of this article, we will seek a deeper understanding of the biochemical relationships between genotypes and metabolite levels.

Characterizing the biological functions of candidate genes

An important step in understanding the pathway-level mechanisms of variants is knowing which gene a variant is affecting and how that gene relates to the biology of the pathway. Different types of genes influence trait biology through distinct mechanisms. Metabolite biology is documented in genetic and biochemical databases based on the extensive history of biochemical research (Supplementary file 2). Thus, we developed a pipeline for annotating the 213 metabolite GWAS hits with a single most likely gene using gene proximity and manual curation of these databases (Supplementary file 3 and Figure 2—figure supplement 4; see ‘Methods’). We annotated 68 variants with genes encoding pathway-relevant enzymes (25-fold enrichment, 95% CI [20-fold, 33-fold], Poisson rate test p<2e-16), 46 with genes encoding transporters (5.2-fold enrichment, 95% CI [3.7-fold, 7.2-fold], p=9e-16), and 30 with genes encoding transcription factors (TFs; 7-fold enrichment among liver marker TFs, 95% CI [3.0-fold, 14-fold], p=3e-5; Figure 3). Overall, 69% of variants were assigned to the closest gene and 49% of variants assigned to a pathway-relevant enzyme gene were assigned known IEM genes (Woidy et al., 2018). The substantial enrichment for biologically interpretable variants suggests that examining the genetic basis of these traits will allow for the development of hypotheses around relevant molecular mechanisms underlying pleiotropy.

Figure 3. Gene annotation of metabolite genome-wide association studies (GWAS) hits.

Each gene is colored based on the biochemical group with the most associated metabolites (p<1e-4). If multiple biochemical groups are tied for the most associations for a given gene, they are all shown. (a) Expanded pathway diagram with all genes (italicized) that encode pathway-relevant enzymes and were a metabolite GWAS hits. (b) List of all genes of the metabolite GWAS hits that encode transporters and transcription factors (TFs). There were 69 metabolite GWAS hits that are not shown. Of these, 60 were annotated with genes assigned to the gene type general cell function (14 of these 60 were related to lipid function), and 9 were assigned to a gene of unknown function or that did not have any genes nearby (see ‘Methods’).

Figure 3.

Figure 3—figure supplement 1. Genes in the pathway.

Figure 3—figure supplement 1.

Pathway diagram showing the genes in the pathway, regardless of whether there was a variant annotated with them or not.
Figure 3—figure supplement 2. Quantile–quantile plots.

Figure 3—figure supplement 2.

Quantile–quantile plots showing the observed -log10(p-value) from the 16 genome-wide association studies (GWAS) in each metabolite for the full summary statistics of the ancestry-inclusive GWAS: (a) acetoacetate, (b) acetone, (c) alanine, (d) 3-hydroxybutyrate, (e) citrate, (f) glucose, (g) glutamine, (h) glycine, (i) histidine, (j) isoleucine, (k) lactate, (l) leucine, (m) phenylalanine, (n) pyruvate, (o) tyrosine, (p) valine. Note that for plotting purposes, p-values were set to a minimum of 1e-324.

Next we sought to understand which genes and subpathways were most relevant to each biochemical group. We assigned each gene to the biochemical group with the most associated metabolites (Supplementary file 4; see ‘Methods’). Genes were largely assigned to the group whose relevant biology was nearest the protein encoded by the gene. For example, BCAT2 encodes an enzyme responsible for the first step in the breakdown of all three BCAAs and was assigned to the BCAA group. OXCT1 encodes an enzyme responsible for the conversion of acetoacetyl-CoA to the ketone body acetoacetate and was assigned to the Ketone Body group. Similarly, SLC7A9 encodes a protein that transports amino acids and was assigned to the Other Amino Acid group, while TCF7L2 is a TF assigned to the Glycolysis group and involved in blood glucose homeostasis. These results confirm that these variants are affecting known trait-relevant biology and reflecting the local structure of these pathways.

Interestingly, a large fraction of the genes involved in trait-relevant biology were genome-wide significant hits for at least one of the 16 metabolites. Specifically, of the 139 total genes encoding enzymes in the pathway diagram for these metabolites (Figure 3—figure supplement 1), 51 genes had at least one GWAS hit. Additionally, we performed an ancestry-inclusive GWAS of all 98,189 individuals with complete metabolite data for follow-up analysis. In this ancestry-inclusive analysis, we identified 41 additional hits not found in the European-only GWAS, including associations at 7 additional pathway-relevant genes (Supplementary file 5 and Figure 3—figure supplement 2). This highlights the potential for large-scale, ancestry-inclusive GWAS to discover more biochemically relevant associations among these traits. Together, these findings suggest that GWAS reflect, and have the potential to illuminate, the complex biochemical pathways interconverting these metabolites.

Investigating the mechanisms of pleiotropy in trait pairs

Given the overlap between the biology of these metabolites and their hits, we next sought to understand the molecular causes of pleiotropy in trait pairs. We found 26 genetically correlated metabolite pairs at a local false sign rate < 0.005. For example, alanine and its strongest genetic correlation partner, isoleucine, share a genetic correlation of rg = 0.52 (SE = 0.05, p=9e-23). Similarly, plotting the effects of the 213 GWAS variants on these two traits indicates a strong positive correlation (Figure 4a). Nonetheless, we noted several outlier loci, including rs370014171 (PDPR) and rs77010315 (SLC36A2), which have strong discordant effects. We were intrigued to understand why these two variants had discordant effects on alanine and isoleucine relative to their overall positive genetic correlation, while the majority of other variants had concordant effects.

Figure 4. Discordant variant analysis.

(a) Comparison between the effects on alanine levels versus the effects on isoleucine levels for the 213 metabolite genome-wide association studies (GWAS) hits. This highlights two discordant variants: rs370014171 (PDPR) and rs77010315 (SLC36A2). Variants without an association p<1e-4 in both metabolites are labeled ‘Neither.’ (b) Graphical representation of where these two discordant variants act in the pathway, represented by green stars, relative to other upstream variants driving the positive genetic correlation. Below are the hypothesized mechanisms explaining the GWAS results in relevant metabolites for each of these discordant variants. Data are shown in circles with the coloring corresponding to the effect (beta) of that variant on that metabolite. A black outline represents an association with p<1e-4. Orange text and arrows represent a hypothesized increase (direction, not magnitude) in flux and blue corresponds to a decrease. (c) Results for rs370014171 near the gene PDPR, which encodes a protein that activates the conversion of pyruvate to acetyl-CoA. All solid lines represent a single chemical reaction step. Dotted lines represent a simplification of multiple steps. (d) Results for rs77010315 in the gene SLC36A2, which encodes a small amino acid transporter.

Figure 4.

Figure 4—figure supplement 1. Full pathway results for PDPR.

Figure 4—figure supplement 1.

Full pathway results and possible mechanism for discordant variant rs370014171 with gene annotation PDPR. rs370014171 (PDPR): The discordant variant rs370014171 had gene annotation PDPR, which encodes the regulatory unit of the protein pyruvate dehydrogenase phosphatase that is responsible for the activation of the pyruvate dehydrogenase (PDH) complex that catalyzes the conversion of pyruvate to acetyl-CoA (supplementary expanded pathway Figure 4-figure supplement 1). One possible mechanism for this variant is that it is increasing PDPR activity, leading to increased conversion of pyruvate to acetyl-CoA, resulting in decreased pyruvate levels and, by compensation, decreased lactate levels. In addition, to compensate for the decreased pyruvate levels, there could be an increased conversion of alanine to pyruvate and glutamate. This would cause a decrease in levels of alanine and an increase in levels of glutamate. In response to the increased acetyl-CoA, there could be decreased breakdown of metabolites that are normally broken down for its production, including isoleucine and leucine, resulting in an increase in their levels. With the increase in acetyl-CoA levels, less HMG-CoA is broken down to acetyl-CoA. The reaction of HMG-CoA to acetyl-CoA also produces acetoacetate, so this decrease results in a decrease in acetoacetate, as well as the other ketone bodies (acetone and β-hydroxybutyrate), which are directly downstream of acetoacetate. Some of the increase in HMG-CoA (from its decreased breakdown) leads to an increase in cholesterol. Meanwhile, alanine to glutamate is an important regulator of BCAA levels since the first step in BCAA breakdown is a reversible conversion to glutamate. An increase in glutamate’s levels means again less BCAA will need to be broken down, which is another potential reason for increased levels of the BCAAs (isoleucine, leucine, and valine). This variant was also found to be significantly associated with an increase in isoleucine, leucine, and valine and a decrease in alanine in another recent metabolomics study (Lotta et al., 2021). In this dataset, after these four amino acids, the metabolite with the next most significant association with this variant lysine (Z = 3.8, p=1e-4) (Lotta et al., 2021). Lysine is another amino acid that can be catabolized to produce acetyl-CoA and thus like isoleucine, leucine, and valine, based on the hypothesized mechanism for this variant it makes sense lysine would have a positive association.
Figure 4—figure supplement 2. PDPR locus colocalization.

Figure 4—figure supplement 2.

Colocalization results for PDPR locus demonstrating pleiotropic effects of variant rs370014171 on alanine (one of multiple independent associations at this locus), pyruvate, isoleucine, valine, and leucine.
Figure 4—figure supplement 3. Colocalization at PDPR.

Figure 4—figure supplement 3.

Colocalization probabilities at PDPR for the joint association signals of all metabolites. Note that alanine has multiple associations at the locus (Figure 4—figure supplement 2) and thus has reduced colocalization since coloc assumes a single causal variant; see Figure 4—figure supplement 4 for conditional colocalization.
Figure 4—figure supplement 4. Colocalization at PDPR conditioned on secondary signals.

Figure 4—figure supplement 4.

Colocalization probabilities at PDPR for the joint association signals of all metabolites for summary statistics conditioned on rs8061221 and rs12924171. By conditioning on these two secondary SNPs, there is broad colocalization across traits.
Figure 4—figure supplement 5. Full pathway results for SLC36A2.

Figure 4—figure supplement 5.

Full pathway results and possible mechanism for discordant variant rs77010315 with gene annotation SLC36A2. rs77010315 (SLC36A2): the discordant variant rs77010315 is a missense variant in SLC36A2, which encodes a transporter for small amino acids such as alanine (supplementary expanded pathway Figure 4—figure supplement 5 ). We hypothesize this variant is discordant because it increases the activity of SLC36A2, leading to increased transport of alanine into cells. This results in a decrease in levels of alanine in the blood, but results in increased intracellular conversion of alanine to pyruvate and glutamate. Pyruvate can then be reversibly converted to lactate, resulting in an increase in lactate levels, as well as converted to acetyl-CoA. The increased pyruvate results in an increase in glycine levels because less needs to be broken down to produce it. The increase in acetyl-CoA leads to an increase in its downstream citric acid cycle intermediates including citrate, fumarate, succinyl-CoA, and alpha-ketoglutarate. The increased glutamate leads to an increase in glutamine and the increase in fumarate leads to an increase in tyrosine and phenylalanine. The increase in acetyl-CoA also leads to an increase in isoleucine, HMG-CoA, and leucine since less needs to be broken down to produce it, and to increased acetoacetyl-CoA since more acetyl-CoA is available to be converted to it. Acetoacetyl-CoA can then be converted to acetoacetate, leading to an increase in all three ketone bodies (acetoacetate, acetone, and β-hydroxybutyrate), as well as succinyl-CoA. Finally, the increased succinyl-CoA leads to an increase in valine because less valine needs to be broken down to produce it. This variant also had consistent associations consistent in direction of effect with alanine (Z = –4.3, p=2e-5), glutamine (Z = 3.9, p=1e-4), valine (Z = 3.7, p=2e-4), and glycine (Z = 3.4, p=8e-4) in another dataset (Lotta et al., 2021). In this dataset, the most significant association with this variant was carnitine (Z = 13.2, p=2e-39) (Lotta et al., 2021). Carnitine can be made from methionine, which is a metabolite that can be produced from glycine breakdown. Because increased levels of glycine from this variant may result in increased levels of carnitine, based on the hypothesized mechanism for this variant it may make sense that carnitine would have a positive association.
Figure 4—figure supplement 6. SLC36A2 locus colocalization.

Figure 4—figure supplement 6.

Colocalization results for SLC36A2 locus demonstrating pleiotropic effects of variant rs77010315 on alanine, isoleucine, valine, and leucine. There is limited evidence for an association with pyruvate levels at the locus, suggesting the potential for larger sample sizes to resolve the contributions of changes to pyruvate levels at SLC36A2.
Figure 4—figure supplement 7. Colocalization at SLC36A2.

Figure 4—figure supplement 7.

Colocalization probabilities at SLC36A2 for the joint association signals of all metabolites. Note that pyruvate has a weak or nonexistent marginal association at the locus, which likely contributes to the lack of colocalization; see Figure 4—figure supplement 6.

Outlier variants are appealing case studies for understanding the molecular basis of pleiotropy because they affect traits in an exceptional way. Thus, we reasoned that understanding large-effect variants inconsistent with the global genetic correlation would reflect interesting biology relevant to the traits. For example, the proteins encoded by PDPR and SLC36A2 are both located between alanine and isoleucine in the biochemical pathway (Figure 4b). This suggests that where variants act in the pathway may influence the direction of effect they have on metabolites. To better understand how these two variants affect alanine and isoleucine and explain their outlier behavior, we examined their effect size and direction in the context of their location in the pathway. We then used the variants’ metabolite associations to develop candidate mechanisms for how each variant could be jointly influencing the levels of these metabolites.

As an illustration, we first consider variant rs370014171. This variant was assigned to gene PDPR because it was the second closest gene, the closest pathway-relevant enzyme, and within 100 kb (12.3 kb to its gene boundaries). PDPR activates the enzyme that catalyzes the conversion of pyruvate to acetyl-CoA (Figure 4c, Figure 4—figure supplement 1). A candidate mechanism for this variant, supported by the effect size and direction for the 16 metabolites where relevant, is that it increases PDPR activity. There was colocalization of association signals across the five significant metabolites using both conditional SNP-level analyses (Figure 4—figure supplement 2) and running coloc once adjusting for secondary signals at alanine ( Figure 4—figure supplement 3, Figure 4—figure supplement 4). This would lead to increased conversion of pyruvate to acetyl-CoA and thus decreased pyruvate (β = -0.023 SDs, SE = 0.003, p=3e-20). To compensate for the subsequent decreased pyruvate levels, there would be increased conversion of alanine to pyruvate causing a decrease in alanine. In response to the increased acetyl-CoA, there would be decreased breakdown of metabolites normally catabolized for its production, including isoleucine, resulting in an increase in isoleucine levels. Thus, this variant has an opposite effect on alanine and isoleucine, despite their overall positive genetic correlation, likely because it affects the activity of an enzyme that acts in the pathway between the pair of metabolites. As expected due to the high correlation between the levels of the three BCAAs, this variant is also a discordant variant for alanine with valine (rg = 0.51, SE = 0.05, p=2e-21), and alanine with leucine (rg = 0.49, SE = 0.06, p=1e-16).

As a second example, variant rs77010315 is a missense variant in SLC36A2. SLC36A2 encodes a transporter for small amino acids such as alanine (Figure 4d, Figure 4—figure supplement 5). There was colocalization of association signals across the four significant metabolites using both conditional SNP-level analyses (Figure 4—figure supplement 6) and running coloc (Figure 4—figure supplement 7). A candidate mechanism explaining the observed metabolite associations in our data and outlier behavior for this variant is that it increases transport of alanine into cells by SLC36A2. This would result in a decrease in levels of alanine in the blood, but an increase of alanine in cells. This additional intracellular alanine would then allow for increased conversion of alanine to pyruvate, thereby increasing levels of downstream metabolites in the blood, including isoleucine. Thus, this variant has an opposite effect on alanine and isoleucine, despite their overall positive genetic correlation, but in this case because it affects biology between the metabolites at the transporter level.

Quantifying global properties of molecular pleiotropy

Based on these results, we hypothesized that the two variants described above, and others like them, exhibit outlier behavior because they affect biology between the two metabolites (Figure 5a). We consider biology ‘between’ a given pair of metabolites as the shortest biochemical path connecting them, which can include a path converting one metabolite to the other, as well as other scenarios such as those involving colliders. This is because all biochemical reactions, including one-way reactions, can have bidirectional causal relationships due to Le Châtelier’s principle (see ‘Methods’ for details; Figure 5—figure supplement 1). Genetic correlation reflects the direction of effect that most associated variants have on two traits. However, when two metabolites are biologically near each other, the region containing ‘between’ biology is relatively small, such that only a minority of variants directly affect the ‘between’ region.

Figure 5. Characterization of discordant and concordant variants.

(a) Proposed model for the mechanism of a discordant variant. This example is for a discordant variant that has opposite effect directions on a pair of metabolites with a positive overall genetic correlation because it affects biology between them. (b) Proposed model for the mechanism of a concordant variant. This example is for a concordant variant that has the same effect direction on a pair of metabolites with a positive overall genetic correlation because it affects biology upstream both metabolites. (c) Fraction of the discordant and concordant variants that have a pathway-relevant enzyme or transporter gene-type annotation versus those with a different gene-type annotation (N total = 62). Discordant variants are enriched for the gene types of pathway-relevant enzyme or transporter, as would be expected in the model of discordant variants generally affecting biology between metabolites. (d) Fraction of the discordant and concordant variants annotated with a pathway-relevant enzyme that affect biology between versus not between their significant metabolite pairs (N total = 17). Significance tests were performed using Fisher’s exact method and the plotted SEs are from 95% CI calculated by Wilson score interval.

Figure 5.

Figure 5—figure supplement 1. Additional variant models.

Figure 5—figure supplement 1.

Additional Variant Model 1 defining between vs. outside biology (a), example discordant variant (b), and example concordant variant (c) for a more complicated example with a pathway like that in Figure 4c and d. Additional Variant Model 2 defining between vs. outside biology (d), example discordant variant (e), and example concordant variant (f) for a variant affecting a transmembrane transporter, labeled next to the metabolite the transporter acts on.
Figure 5—figure supplement 2. Negative genetic correlation variant models.

Figure 5—figure supplement 2.

Model for an example discordant (a) and an example concordant (b) variant when there is an overall negative genetic correlation.

Thus, we hypothesized that the genetic correlation of two biologically related metabolites mostly reflects the effects of variants upstream or downstream of the metabolites, masking the effects of those between. We developed an analogous hypothesis that variants affecting biology upstream or downstream of the two metabolites have concordant effects (Figure 5b). While less common, the overall genetic correlation for two biologically related metabolites can also be negative due to factors such as feedback loops. In this case, variants acting between the two metabolites would have the same direction of effect on both metabolites, making them discordant with the negative overall genetic correlation (Figure 5—figure supplement 2).

To evaluate these models, we defined outliers based on the consistency of their effects with the overall LDSC genetic correlation. If a variant had an effect direction opposite the overall LDSC genetic correlation in at least one significant metabolite pair (p<5e-8 in one, p<1e-4 in the other), it was classified as ‘discordant.’ For example, a discordant variant for a metabolite pair with a positive genetic correlation would have a negative association in one of the metabolites and a positive association in the other. If a variant had an effect direction consistent with the overall genetic correlation for its significant metabolite pairs, it was classified as ‘concordant.’ Variants without multiple associations, or where associated traits were not significantly genetically correlated, were classified as ‘neither.’ In total, of the 62 metabolite GWAS hits that had at least one significant metabolite pair, we found 26 total discordant variant–metabolite pairs across 14 variants (Supplementary file 6).

We then investigated overall properties of discordant variants relative to concordant ones. We discovered that discordant variants are more likely to affect genes encoding enzymes and transporters than all other genes types, including TFs, general cell function genes, and those of unknown function (odds ratio = 4.09, 95% CI [1.08, 23.1], p=0.034; Figure 5c). This is in contrast to concordant variants, which do not show an enrichment for enzymes and transporters relative to other gene types (odds ratio = 0.75, 95% CI [0.38, 1.48], p=0.42). These observations are consistent with our model that discordant variants tend to affect biology between relevant pairs of metabolites since TFs and general cell function genes generally act outside these metabolic pathways. Thus, they are more likely to affect biology upstream or downstream of both metabolites. In addition, for variants affecting pathway-relevant enzymes, where the location in the pathway that the variant is acting relative to the metabolites is clear, we were able to directly test our hypothesis. We found that discordant variants affecting pathway-relevant enzymes are much more likely to act between, rather than upstream or downstream, the metabolites for which they are discordant (odds ratio = 23.0, 95% CI [1.58, 1510.45], p=0.0072; Figure 5d).

We then sought to extend this finding by developing a model contrasting the effects of all variants affecting between versus outside biology at a pathway and genome-wide level (Figure 6a). In aggregate, this model predicts that pathways overlapping biology between two metabolites will have a local genetic correlation opposite that of nearby adjacent pathways and that the magnitude of both will exceed that of the global polygenic background. As a case study, we focused on alanine and glutamine, which have a weak positive overall genetic correlation (rg = 0.16, SE = 0.09, p=0.08; Figure 6b; Figure 6—figure supplement 1). We then ran BOLT-REML (Loh et al., 2015) on variants within 100 kb of genes in each pathway and estimated the corresponding local genetic correlations (see ‘Methods’).

Figure 6. Local genetic correlation.

(a) Model of expected local genetic correlation direction, with contrasting effects of variants affecting ‘between’ versus outside biology at a pathway and genome-wide level. (b) LD Score shows the polygenic correlation of alanine and glutamine. For the x-axis, LD Scores were binned into 25 bins. The y-axis shows the mean and SE within each bin genome wide, and overall genetic correlation significance was calculated using the jackknife standard error (participant N = 94464). (c) Results for the local genetic correlation of alanine and glutamine for variants within 100 kb of genes in each pathway. Standard errors are shown (participant N = 94464). Genesets listed below the dotted line include only enzymes and are considered pathway-relevant enzymes for these metabolites. Summary statistics for BOLT-REML and other methods can be found in Supplementary file 7. (d) Pathway diagram showing the pathways included in the local genetic correlation analysis and the positioning of their genes relative to alanine and glutamine. *Ketone body genes were omitted from panel (c) because the limited number of genes meant they failed to robustly converge. All arrows and nodes in the gray section are hypothetical and shown for illustration purposes.

Figure 6.

Figure 6—figure supplement 1. Genetic correlation results.

Figure 6—figure supplement 1.

p-Values for all metabolite pairs for all pathway regions for Haseman–Elston regression genetic correlation analysis.

We found that the local genetic correlations around genes in the Glycolysis, Gluconeogenesis, and Citric Acid Cycle Pathway and around genes in the Other Amino Acid Pathway were negative (Figure 6c). Both of these pathways encompass genes affecting biology between alanine and glutamine (Figure 6d). In striking contrast, nearby pathways, such as the Urea Cycle, had a positive local genetic correlation for these metabolites (rg,l = 0.45; SE = 0.15, p=0.003). Similarly, we found that regions overlapping genes encoding metabolite-associated transporters and TFs had strong positive genetic correlations consistent with their shared role in the upstream regulation of these two traits (rg,l = 0.44, SE = 0.05, p=1e-20; rg,l = 0.55, SE = 0.06, p=2e-20). All genes outside the core pathways had a weak positive genetic correlation, perhaps reflecting that they are embedded in the global gene regulatory network (rg,l = 0.068, SE = 0.02, p=0.003). Our findings were broadly consistent using individual-level data with Haseman–Elston regression (Yang et al., 2010), and summary statistics with ρ-HESS (Shi et al., 2017), stratified LD score regression (Bulik-Sullivan et al., 2015a), and a nonparametric Fligner–Killeen variance test (see ‘Methods’; Supplementary file 7). These results support the model that variants affecting biology between the metabolites frequently contrast with the contributions of upstream and downstream pathways. This emphasizes that the heterogeneity in genetic effects reflecting local biology shared by the traits can be masked in the global genetic correlation. In addition, these results offer biological intuition for interpreting genetic correlation of molecular traits at a pathway and genome-wide level.

Using metabolites to understand the mechanism of a disease-associated variant

Motivated by the interpretability of these results, we applied this logic to develop an example model for a variant associated with increased risk for a disease (Figure 7a). In this model, we hypothesized one mechanism for how a variant could be associated with increased risk for a disease is that it could impact metabolites in a way that is consistent with disease etiology. For example, the variant could increase metabolites associated with increased risk for the disease and/or decrease metabolites associated with decreased risk.

Figure 7. Pathway impact and pathology of example disease genome-wide association studies (GWAS) hit.

(a) Proposed model for the impact of a disease hit on a relevant pathway, contributing to an increased risk in the disease. (b) This variant is associated with an increase in levels of metabolites that have been implicated with increased risk of coronary artery disease (CAD), and a decrease in the levels of metabolites that have been implicated with decreased risk. Parentheses indicate nonsignificant associations, ‘NA’ indicates no evidence was found, and ‘-’ indicates a placeholder because CAD is being compared with itself. (c) Results for rs61791721 with gene assignment PCCB, which encodes a protein that catalyzes the conversion of propionyl-CoA to succinyl-CoA. The hypothesized mechanism is that the variant is decreasing the activity of PCCB, resulting in the above metabolite associations. Ammonium is represented by its chemical formula (NH4+). Data are shown in circles with the coloring corresponding to the effect (beta) of that variant on that metabolite. All solid lines represent a single chemical reaction step. Dotted lines represent a simplification of multiple steps.

Figure 7.

Figure 7—figure supplement 1. Full pathway results for PCCB.

Figure 7—figure supplement 1.

Full pathway results and possible mechanism for disease-associated variant rs61791721 with gene annotation PCCB. rs61791721 (PCCB): The variant rs61791721 has gene assignment PCCB, which encodes a protein that catalyzes the conversion of propionyl-CoA to succinyl-CoA (supplementary expanded pathway Figure 7—figure supplement 1). The hypothesized mechanism is that the variant decreases PCCB activity, resulting in lower levels of succinyl-CoA and increased propionyl-CoA. The increased propionyl-CoA results in excess ammonium being produced because propionyl-CoA inhibits N-acetylglutamate synthase, which is an important cofactor for the enzyme (carbamoyl phosphate synthetase) that catalyzes the first step in the urea cycle for ammonium capture. Alanine is a reservoir for nitrogen waste so there is an increase in pyruvate and glutamate to alanine and alpha-ketoglutarate to capture the toxic ammonium (Wongkittichote et al., 2017; Smith and Garg, 2017). More glycine is then broken down in response to the decrease in pyruvate levels and more glutamine is converted to glutamate, leading to a decrease in glycine and glutamine levels. Conversely, the increased levels of propionyl-CoA mean less valine and isoleucine need to be broken down, resulting in an increase in their levels. In addition, the increased alpha-ketoglutarate increases downstream citric acid cycle intermediates, meaning less tyrosine needs to be broken down to produce fumarate. Also in response to increased propionyl-CoA, fewer fatty acids are broken down, resulting in decreased acetyl-CoA, which is typically the downstream product, decreased citrate, which is one step downstream of acetyl-CoA, increased total fatty acid levels, and increased total triglyceride levels. This increase may stimulate the activity of HMG-CoA reductase and synthase, resulting in increased conversion of ketone bodies to acetoacetyl-CoA to HMG-CoA and cholesterol. This results in decreased levels of ketone bodies and increased HMG-CoA and cholesterol. Increased HMG-CoA leads to increased leucine, while increased cholesterol leads to an increase in low-density lipoprotein cholesterol (LDL-C) and a decrease in high-density lipoprotein cholesterol (HDL-C). While it is not necessarily intuitive that an increase in cholesterol levels would decrease HDL-C, it is possible that this increase activates the Rho A signal transduction pathway and suppresses peroxisome proliferator-activated receptor alpha, which then decreases the amount of ApoA1, as the reverse may explain why patients on statins may experience an increase in HDL-C despite a decrease in total cholesterol (Martin et al., 2001). ApoA1 is an essential protein for HDL and thus a decrease in its levels would result in a decrease in HDL-C. We also found that PCCB was the strongest eGene in GTEx for rs61791721, with an effect shared across most tissues (Figure 7—figure supplement 5). Note that we additionally ran GWAS for total triglycerides, total fatty acids, HDL-C, and LDL-C as part of the interpretation of the PCCB variant because they are important metabolites in the discussion of cardiometabolic disease (Figure 7—figure supplement 6).
Figure 7—figure supplement 2. Incident analysis.

Figure 7—figure supplement 2.

Forest plot for coronary artery disease (CAD) incident analysis for the 16 metabolites as well as high-density lipoprotein cholesterol (HDL-C), low-density lipoprotein cholesterol (LDL-C), total triglycerides, cholesterol, and total fatty acids in the UK Biobank.
Figure 7—figure supplement 3. PCCB locus colocalization.

Figure 7—figure supplement 3.

Colocalization results for PCCB locus demonstrating pleiotropic effects of variant rs61791721 on (a) alanine, glycine, pyruvate, isoleucine, valine, leucine, total fatty acids, total triglycerides, total free cholesterol, and (b) the UK Biobank clinical measures of high-density lipoprotein cholesterol (HDL-C), low-density lipoprotein cholesterol (LDL-C) and triglycerides, and coronary artery disease (CAD).
Figure 7—figure supplement 4. Colocalization at PCCB for metabolites.

Figure 7—figure supplement 4.

Colocalization probabilities at PCCB for the joint association signals of all metabolites and clinical traits. hard, hard coronary artery disease (CAD); soft, soft CAD (with angina); SR, self-reported cases included. All traits except total free cholesterol (Total_FC), urea, low-density lipoprotein cholesterol (LDL), and total cholesterol colocalize together; LDL was initially well powered enough to detect multiple putative causal variants (Figure 7—figure supplement 3), supporting the alternative coloc pattern.
Figure 7—figure supplement 5. rs61791721 eQTL effects.

Figure 7—figure supplement 5.

Tissue expression quantitative trait locus (eQTL) effects for the PCCB variant (rs61791721) in GTEx. Reproduced from https://gtexportal.org/home/snp/rs61791721(downloaded on February 12, 2022). N shown in figure and 95% confidence intervals are from linear regression of normalized expression as previously described.
Figure 7—figure supplement 6. Additional Manhattan plots.

Figure 7—figure supplement 6.

Manhattan plots for (a) high-density lipoprotein cholesterol (HDL-C), (b) low-density lipoprotein cholesterol (LDL-C), (c) total triglycerides, and (d) total fatty acids. Blue coloring represents metabolites belonging to the lipid group and green belongs to fatty acid group.

To apply this model to our data, we considered metabolite GWAS hits that were annotated with pathway-relevant enzymes and associated with increased risk for coronary artery disease (CAD) (Kichaev et al., 2019; Koyama et al., 2020). The variant that best fit these criteria was rs61791721. This variant was assigned the nearest pathway-relevant enzyme gene, PCCB, which encodes a protein that catalyzes the conversion of propionyl-CoA to succinyl-CoA at the intersection of BCAA and fatty acid oxidation (Figure 7—figure supplement 1).

We combined results from the literature and incident analysis to understand the association of relevant metabolites with CAD (Figure 7—figure supplement 2). We then compared these with the effects of this variant on these metabolites (Figure 7b). In this analysis, we included high-density lipoprotein cholesterol (HDL-C), low-density lipoprotein cholesterol (LDL-C), total fatty acids, and total triglycerides due to the extensive evidence implicating their association with CAD and because they are directly adjacent to the biology of the other 16 metabolites. Consistent with the metabolites’ corresponding direction of risk for CAD, this PCCB variant was negatively associated with glycine and HDL-C, and positively associated with isoleucine, leucine, valine, tyrosine, total fatty acids, total triglycerides, and LDL-C (p<1e-5; Supplementary files 8 and 9).

This PCCB variant has been associated with CAD in multiple prior GWAS (Kichaev et al., 2019; Koyama et al., 2020), yet neither the gene this variant affects nor the biological mechanism explaining its association with CAD are known. However, this variant affects many metabolites associated with CAD in a direction consistent with increased risk. Thus, we hypothesized that we could begin to understand why this variant is associated with CAD by understanding the pleiotropic effects of this variant on the metabolites.

A potential mechanism that we hypothesized could result in this pathogenic constellation of metabolite effects is that the variant could decrease PCCB activity, resulting in lower levels of succinyl-CoA and increased propionyl-CoA (Figure 7c). Consistent with this model, there was colocalization of association signals across all associated traits other than lipids using both conditional SNP-level analyses (Figure 7—figure supplement 3) and running coloc (Figure 7—figure supplement 4). The increased propionyl-CoA could result in excess ammonium being produced, and because alanine is a reservoir for nitrogen waste, this would increase conversion of pyruvate to alanine to capture the toxic ammonium (Wongkittichote et al., 2017; Smith and Garg, 2017). More glycine may then be broken down in response to the decrease in pyruvate levels, decreasing glycine levels. Conversely, the increased levels of propionyl-CoA would likely mean less valine, isoleucine, fatty acids, and thus triglycerides, would need to be broken down, resulting in an increase in their levels. This increase in fatty acids may stimulate the activity of 3-hydroxy-3-methylglutaryl-CoA (HMG-CoA) reductase and synthase, resulting in an increase in HMG-CoA and cholesterol ;(Salam et al., 1989; Wu et al., 2013). Increased HMG-CoA could lead to increased leucine because less leucine would need to be broken down to produce HMG-CoA, while increased cholesterol would lead to an increase in LDL-C and a decrease in HDL-C. Therefore, this variant is potentially associated with CAD because it is decreasing PCCB activity, resulting in myriad deleterious downstream metabolic consequences.

While in vivo functional validation would be needed to draw causal conclusions about the effect of this variant on these metabolites and of these metabolites on CAD, this example demonstrates that we can begin to generate informed hypotheses and dissect the molecular basis underlying disease GWAS hits by understanding the mechanism of relevant pleiotropic effects on metabolites. In addition, the pathways implicated by this analysis can also be independently prioritized as potentially playing an important role in cardiometabolic disease by leveraging the molecular basis of genetic correlation discussed in Figure 6. For example, alanine and glutamine have opposite associations with CAD and type 2 diabetes despite having an overall positive phenotypic correlation (Tillin et al., 2015; Jauhiainen et al., 2021). This suggests that the pathways described above with a negative local genetic correlation for alanine and glutamine are likely relevant to the molecular basis of these diseases. Thus, understanding the molecular basis of pleiotropy and genetic correlation of metabolites can improve our understanding of the variants and pathways contributing to complex disease biology.

Discussion

In this work, we investigate the joint effects of pleiotropic variants on 16 biologically-related metabolites in the context of their biochemical pathways. We build on prior studies examining the genetic architecture of metabolites by characterizing the genes and mechanisms through which variants affect these metabolites, and find a strong enrichment for genes encoding pathway-relevant enzymes and transporters. Our results offer biological intuition for the interpreting genetic correlation of molecular traits at a pathway and genome-wide level.

We demonstrate the effects of variants acting on biology between metabolites often contrast substantially with the contributions of upstream and downstream pathways, as well as the polygenic background. Perhaps paradoxically, while the overall genetic correlation between two traits provides a global view of shared effects, the genes that are directly involved in the traits’ core biology are most likely to have divergent effects. We show that one explanation of this is the substantial outlier contributions from variants acting directly between metabolites of interest. We anticipate that further mechanisms, such as context-specific variant effects and differential regulation by peripheral genes, will be discovered in future studies.

In addition, we show specific examples of candidate molecular mechanisms explaining the association of variants with multiple biologically related metabolites. These include associations at PDPR, SLC36A2, and PCCB, where we show that the direction and magnitude of their effects are consistent with metabolite biochemistry and disease etiology. These proposed molecular mechanisms enable biological prioritization of interesting candidates for future post-GWAS in vitro studies. Overall, these results suggest specific genetic and molecular underpinnings of complex disease variants and provide a road map for further discovery through the interpretation of pleiotropic variant effects on disease-relevant metabolites.

In this work, we focus on metabolites clustered at the intersection of amino acid catabolism, glycolysis, and ketone body metabolism. However, the approaches and results from this article have the potential to reveal novel insights into genetic effects on many biochemical pathways and molecular traits. In addition, integrating this work with proteomic and intermediate metabolomic data will offer additional evidence to develop and support these hypothesized mechanisms. These data may also clarify the relevance of additional mechanisms – such as buffering, feedback, and kinetics – in controlling the plasma levels of these metabolites. Finally, expanding the sample size and diversity of ancestries included in future GWAS, measurements of which are currently underway on the Nightingale platform and others, will increase power to detect novel findings such as important associations for variants with low allele frequency.

While we largely focused on the molecular trait space here, many of these concepts may be useful in the endophenotype and disease space as well. For instance, this approach may help identify variants and pathways most relevant to the core shared biology of a given pair of diseases, potentially revealing more about the molecular bases of the diseases and prioritizing additional drug target candidates. For example, discordant variants have already been identified for some disease pairs, such as for body mass index and type 2 diabetes (Mahajan et al., 2018), and non-alcoholic fatty liver disease and type 2 diabetes (Sliz et al., 2018).

One limitation of this study is that the metabolites were measured in the blood, while most of the relevant biology and pathways occurs within cells in various tissues throughout the body. Thus, we anticipate extensions of this work to include biomarker measurements from additional cell types and tissues, such as urine, saliva, biopsy samples, and in vitro-differentiated cells. Further, longitudinal analysis of relevant disease cohorts will allow insights into disease progression and subtyping.

In conclusion, this work underscores the potential of unifying biochemistry with genetic data to understand the molecular basis of complex traits and diseases and the mechanism through which variants impact these traits.

Methods

Population definition

We defined our GWAS population as a subset of the UK Biobank (Bycroft et al., 2018). For our cohort, we use the individuals for which Nightingale plasma metabolite data was available after filtering based on trait QC characteristics (see ‘Trait QC and covariate adjustment’). We then filtered individuals by the following QC metrics:

  1. Not marked as outliers for heterozygosity and missing rates (het_missing_outliers column).

  2. Do not show putative sex chromosome aneuploidy (putative_sex_chromosome_aneuploidy column).

  3. Have at most 10 putative third-degree relatives (excess_relatives column).

  4. No closer than second-degree relatives.

From these, we defined three cohorts: White British, non-British White, and everyone. We identified White British individuals using the in_white_British_ancestry_subset column in the sample QC file. We identified non-British White individuals through self-identification as White, excluding individuals marked as in_white_British_ancestry_subset (n = 30,116 who passed QC metrics 1–4 above). As was done for the White British in the initial UK Biobank study design (Bycroft et al., 2018), we identified global principal components (PCs) of the genotype data, and then defined ancestry clusters using aberrant with the strictness parameter λ = 20. Non-British White individuals who were outliers for any of projected PC pairs PC1/PC2, PC3/PC4, and PC5/PC6 were excluded (n = 25,137 remaining). We performed our first GWAS on the set of individuals in these White British and non-British White cohorts.

The combination of the two sources of European and White British ancestry individuals resulted in a total of 433,390 European ancestry individuals in UK Biobank, of whom 94,464 had available quality-controlled Nightingale data. Our main goal for this study was to understand general principles of genetic architecture, which are not expected to vary among human populations, and thus in the main analysis we excluded non-European individuals on the basis of power and concerns about structure confounding. However, this analysis is significantly limited by the allele frequency differences between populations, and we sought to develop an alternative, inclusive strategy that did not rely on self-identity.

Metabolomics data generation

The metabolomics data was generated by Nightingale Health using a high-throughput NMR-based platform developed by Nightingale Health Ltd. Randomly selected EDTA non-fasting (average 4 hr since last meal) plasma samples (aliquot 3) from approximately 120,000 UK Biobank participants were measured in molar concentration units. No power calculation was performed, but we anticipated numerous discoveries on the basis of prior GWAS with similar or smaller sample sizes (Kettunen et al., 2016; Willer et al., 2013; Lotta et al., 2021). The measurements took place between June 2019 and April 2020 using six spectrometers at Nightingale Health, based in Finland. The Nightingale NMR biomarker profile contains 249 metabolic measures from each plasma sample in a single experimental assay, including 168 measures in absolute levels and 81 ratio measures. The biomarker coverage is based on feasibility for accurate quantification in a high-throughput manner and therefore mostly reflects molecules with high concentration in circulation, rather than selected based on prior biological knowledge. Additional details about the data generation can be found at here.

Trait selection and grouping

Sixteen metabolites were chosen from the available Nightingale metabolites based on their biochemical proximity, relevance to health and disease, and because the genes and enzymes involved in their metabolism are well-characterized. Specifically, we first filtered to the 168 metabolites that were not metabolite ratio measurements (n = 81) because we wanted to focus on absolute metabolites levels. We then filtered out the lipids and lipoprotein measures, including cholesterol and fatty acids, because the complexity of their biochemistry make it difficult to map out the chemical reactions directly interconverting one to another, and because many of these metabolites have already been extensively studied in large GWAS (Willer et al., 2013; Graham et al., 2021). However, many of these are important metabolites in the discussion of cardiometabolic disease so we additionally ran GWAS for total triglycerides, total fatty acids, HDL-C, and LDL-C as part of the interpretation of the PCCB variant using the same pipeline as for the 16 metabolites below.

Finally we removed remaining derived measures (such as total combined concentration of BCAA) and those primarily reflecting physiological conditions such as fluid balance (creatinine and albumin) and GlycA (inflammation). One exception to this filtering was the three ketone bodies (3-hydroxybutyrate, acetone, and acetoacetate), which were included due to their proximity and clear direct interconversions connecting them to the metabolic pathways of the remaining amino acid and glycolysis-related metabolites. Metabolites were classified into four biochemical groups based on biochemical similarity. The three branched chain amino acids – valine, leucine, and isoleucine – were classified as “BCAA”; the remaining amino acids in the dataset – glycine, alanine, glutamine, tyrosine, phenylalanine, and histidine – were classified as ‘Other Amino Acid’; the three ketone bodies – 3-hydroxybutyrate, acetone, and acetoacetate – were classified as ‘Ketone Body’; and the four metabolites in or immediately adjacent to glycolysis – glucose, pyruvate, lactate, and citrate – were classified as ‘Glycolysis'.

Trait QC and covariate adjustment

Trait measurements were filtered to only include baseline samples then log-transformed. Outlier removal was performed by dropping any sample that had a metabolite level greater than 20-fold the interquartile range or greater than 10-fold below the median across all samples for that metabolite. Principal component analysis (PCA) was run for the remaining samples and outliers were dropped using aberrant (lambda = 20) on the top two PCs (Bellenguez et al., 2012). Remaining log-transformed measurements were adjusted for spectrometer, week, and weekday. A total of 106,175 individuals had quality-controlled metabolomics data, ranging from 46 to 80 years old (mean 65.5; median 67), and of whom 54% were female, 90% were genotyped on the UK Biobank array (10% on BiLEVE), 94% identified as White (field 21000 code 1 or 1001–1003), 0.6% identified as multiracial (field 21000 code 2 or 2001–2004), 1.9% identified as Asian or Asian British (field 21000 code 3 or 3001–3004), 1.5% of whom identified as Black or Black British (field 21000 code 4 or 4001–4004), and 1.5% identified with a different label or declined to provide a label (field 21000 code −1, –3, 5, or 6). Samples were subset to the GWAS population defined above, resulting in 94,464 individuals for the European ancestry GWAS.

Genome-wide association studies

We performed GWAS in BOLT-LMM v2.3.2 (Loh et al., 2015) adjusting for sex, array, age, and genotype PCs 1–10 using the following command (data loading arguments removed for brevity):

bolt --phenoCol= [Metabolite] \ 
         --covarCol=sex \ 
     --covarCol=Array \ 
     --qCovarCol=age \ 
     --qCovarCol=PC{1:10} \ 
     --lmmForceNonInf \ 
     --numThreads=24 \ 
     --bgenMinMAF=1e-3 \ 
     --bgenMinINFO=0.3

The resulting GWAS summary statistics were then filtered to minor allele frequency (MAF) > 0.01 and INFO score >0.7 for further analyses (referred to as the Filtered Metabolite Sumstats). The LDSC munge_sumstats.py script was then used to munge the data (referred to as the Munged Metabolite Sumstats) (Bulik-Sullivan et al., 2015b).

GWAS hit processing

To evaluate GWAS hits, we took the Filtered Metabolite Sumstats and ran the following command using plink version 1.9 (Chang et al., 2015):

plink --bfile [] --clump [GWAS input file] --clump-p1 1e-4 
      --clump-p2 1e-4 --clump-r2 0.01 --clump-kb 1000 --clump-field P_BOLT_LMM --clump-snp-field SNP

We greedily merged GWAS hits across the 16 metabolites located within 0.1 cM of each other and took the SNP with the minimum p-value across all merged lead SNPs. In this way, we avoided potential overlapping variants that were driven by the same, extremely large, gene effects. This resulted in 213 lead GWAS variants, referred to as the metabolite GWAS hits.

Gene and gene-type annotation

We defined all genes in any Gene Ontology (GO) (Ashburner et al., 2000; Consortium et al., 2021), Kyoto Encyclopedia of Genes and Genomes (KEGG) (Kanehisa and Goto, 2000), or REACTOME MSigDB (Liberzon et al., 2011; Subramanian et al., 2005) pathway as our full list of putative genes (in order to avoid pseudogenes and genes of unknown function). We initially extended genes by 100 kb (truncating at the chromosome ends) and used the corresponding regions, overlapped with SNP positions, to define SNPs within range of a given gene. Gene positions were defined based on Ensembl 87 gene annotations on the GRCh37 genome build. We then performed manual curation using GeneCards (Stelzer et al., 2016) to validate gene assignments and prioritize a single gene per SNP. Gene boundaries for genes encoding pathway-relevant enzymes in KEGG were extended up to 500 kb and assigned to a variant if the gene was biologically relevant to the metabolites the variant was significant in. If there were multiple genes within 100 kb of the variant, then gene assignments were made based on the following priority order: any genes encoding a pathway-relevant enzyme, genes encoding transporters, genes involved in translation/transcription regulation (referred to as TF for brevity), and any genes whose function is known. If there were multiple genes of the same gene type, then the assignment was made based on the relevance of the gene to the metabolites the variant was significant in, proximity of the gene to the variant, and, if applicable, any additional evidence in the literature (Oxford BIG [Elliott et al., 2018] and Open Target Genetics [Ghoussaini et al., 2021; Mountjoy et al., 2021]). However, even for these cases where there was not high confidence in the exact gene assignment, for instance, because there were multiple genes from the same gene family nearby, the top gene candidates all had the same gene type. Thus, because the major downstream analyses were designed in a way that only the gene type assigned to each variant mattered, the accuracy of the exact gene assignment should not affect the findings. If no genes with known function were within 100 kb of the variant, then the window was extended up to 200 kb. The distance of a variant to a given gene was defined as the number of base pairs from the variant to the closer of the start or end of the gene boundaries or was set to 0 if the variant was within the gene boundaries.

We classified each gene using GeneCards (Stelzer et al., 2016) into one of five gene types: pathway-relevant enzyme, transporter, TF, general cell function, and unknown. Genes encoding enzymes that catalyze a reaction in or adjacent to the direct synthesis or degradation of one of the 16 metabolites were defined as pathway-relevant enzymes using manual curation from GO, KEGG, REACTOME, and Stanford’s Human Metabolism Map (Pathways of Human Metabolism Map, 2021), in addition to GeneCards. Genes encoding known transporters were classified as transporters. Genes involved in translation/transcription regulation were classified as TF. Genes whose function is known but not already classified as a pathway-relevant enzyme, transporter, or TF were classified as ‘general cell function.’ Genes with unknown function or if there were no genes within 200 kb of the SNP were classified as ‘unknown.’ See Supplementary file 3 for each metabolite GWAS hit’s gene and gene-type annotations.

Gene-type enrichments were calculated with a Poisson rate test. The baseline was the total of the GWAS hits among the 1.95 Gb of the genome within 100 kb of a gene in any pathway, and the test was performed with the number of GWAS hits within 100 kb of each pathway of interebst. There were 2.8 GWAS hits per megabase within 100 kb of a pathway-relevant enzyme versus 0.1 GWAS hits per megabase among all genes (25-fold enrichment, Poisson rate test p<2e-16). There were 0.58 GWAS hits per megabase within 100 kb of a transporter versus 0.1 GWAS hits per megabase for all genes (5.2-fold enrichment, Poisson rate test p=9e-16). We also repeated this analysis using closest genes rather than assigned genes, which allowed us to use a Fisher’s exact test (as each variant has a single closest gene). This resulted in a 27-fold (p<2e-16) enrichment for pathway-relevant enzymes and an 11-fold (p<2e-15) enrichment for transporters, respectively.

For TF enrichments, we used TF-Marker (Xu et al., 2022) to annotate tissue-specific marker gene TFs. We considered ‘TF’ (n = 1316) and ‘TFMarker’ (n = 18) genes as relevant genes, and TF Pmarker (n = 1424) genes as putatively relevant. We considered enrichment among the 628 genes not associated with cancer or stem cell biology (of which 267 are putative) as our set of tissue-specific TFs for downstream analysis. We consider our background in all cases to be our total GWAS hits number (n = 213) compared to the effective genome size (2.86 Gb). In specifically this TF set, we observed a 5.96-fold enrichment over the genome-wide background (0.44 GWAS hits per megabase, p=4e-6) among relevant gene bodies and 2.50-fold enrichment (0.19 hits/Mb, p=0.0007) within 100 kb of a relevant gene, which was comparable for putatively relevant genes (4.85-fold and 2.86-fold, respectively). This was substantially higher than that of all genes in the genome (1.69-fold within gene bodies and 1.36-fold within 100 kb of genes in any pathway) and comparable to that of all TFs regardless of their function in cancer or stem cells (4.82-fold within gene bodies and 2.18-fold within 100 kb).

We next filtered tissue-specific TFs to those acting in liver (28 relevant and 30 putative), kidney (25 relevant and 20 putative), or pancreas (14 relevant and 3 putative). Kidney and pancreas TFs had no more than one GWAS hit each and were excluded for these analyses. For liver TFs, we observed an 18-fold enrichment (1.3 GWAS hits per megabase, p=0.0057) within gene bodies and a 7.6-fold enrichment (0.56 hits/Mb, p=0.002) within 100 kb of genes. Results were similar when removing the cancer and stem cell filter (1.23 hits/Mb and 0.51 hits/Mb, respectively) and dropped slightly when further including putatively relevant TFs (0.71 hits/Mb and 0.48 hits/Mb). Together, this suggests that liver marker TFs are specifically enriched for variants affecting our metabolite levels.

Ancestry-inclusive GWAS

For the ancestry-inclusive analysis, we performed the same method as the European-ancestry-only analysis except we omitted the step filtering individuals on the basis of self-identified race/ethnicity and ancestry PC outlier status. For the ancestry-inclusive analysis, we again used the European ancestry LD matrix as European-ancestry individuals were the overwhelming majority in the study. This resulted in a total of 98,189 individuals for the GWAS, which identified 238 lead GWAS variants across the 16 metabolites. This was inspired by recent ‘mega-analysis’ studies (Wojcik et al., 2019). We examined these associations and identified novel genes by comparing the list of pathway genes in the European-only analysis to those discovered in the ancestry-inclusive analysis.

Coloc-based colocalization

Full summary statistics for all quality-controlled SNPs within 1 Mb of the target gene were considered at each locus, and these were loaded for all the evaluated traits. Standard deviations of the technical covariate-adjusted trait measurements were used for input standard deviation calculations, and dichotomous traits were set to ‘cc’-type datasets while continuous trait were set to ‘quant.’ coloc.abf was run using the default arguments (Giambartolomei et al., 2014; Wallace, 2013). Output was aggregated and PP.H4.abf (the posterior probability of both traits having an association, and that this association is shared) was plotted for all pairs of traits run individually.

HESS trait heritability and pathway enrichments

We ran HESS (Shi et al., 2016) using the following commands:

hess.py --local-hsqg {\filtsumstats} --chrom {chrom} \ 
        --bfile 1kg_eur_1pct_chr{chrom} 
        --partition EUR/fourier_ls-chr{chrom}.bed \ 
        --out {Metabolite}_step1 
hess.py --prefix {Metabolite}_step1 --out {Metabolite}_step2

where 1kg_eur_1pct_chr{chrom} were downloaded from here and EUR/fourier_ls-chr{chrom}.bed were downloaded from here.

We intersected the resulting heritability estimates per LD block with gene lists from each pathway (see local ρ-HESS; within 100 kb of the gene boundary was used as the tested window) and calculated the total heritability within the pathway as the sum of the heritabilities across LD blocks and the variance of the heritability within the pathway as the sum of the variances within each LD block. Overall, this gave a per-pathway estimate. We generated genome-wide estimates of heritability as well as heritability estimates for the subset of the genome nearby any coding gene in MSigDB as background controls from which to estimate the heritability enrichments, and used the coding gene numbers for reporting as they are more conservative.

LDSC genetic correlation

LD score regression (Bulik-Sullivan et al., 2015b) was used to generate genetic correlation estimates. The following command was used:

ldsc.py --rg {\mungsumstats} --ref-ld-chr eur_ref_ld_chr 
    --w-ld-chr eur_w_ld_chr

eur_*_ld_chr were downloaded from https://data.broadinstitute.org/alkesgroup/LDSCORE/.

Mendelian randomization

The Rücker model selection framework was applied. Briefly, MR was run with inverse-variance-weighted (IVW) and MR-Egger with fixed and random effects, and selection between different methods for results to present was based on the goodness-of-fit and heterogeneity parameters for the individual MR regressions as previously described (Bowden et al., 2018; Sinnott-Armstrong et al., 2021c).

Discordant variant analysis

All pairwise combinations of LDSC genetic correlation (as described above) were performed for the 16 metabolites. Pairs were filtered to those that had a genetic correlation significantly different than 0 using ashR (Stephens, 2017) with a local false sign rate of 0.005. We then annotated all metabolite GWAS hits with pairs of metabolites for which the variant had a p<1e-4 association with both metabolites and a p<5e-8 association with at least one, defined as significant metabolite pairs. A variant was classified as ‘discordant’ if it had the same effect direction in both metabolites of at least one significant metabolite pair that had a negative global genetic correlation, or if it had opposite effect directions in the two metabolites of at least one significant metabolite pair that had a positive global genetic correlation. Fourteen variants of the 62 that had at least one significant metabolite pair were classified as discordant. Variants that had the same set of effect directions as the sign of the global LDSC genetic correlation for all of its significant metabolite pairs were classified as ‘concordant.’ Variants that had no significant metabolite pairs were classified as ‘neither’.

The ‘between’ region for a given pair of metabolites was defined as the shortest realistic biochemical path connecting them, and any alternative paths of reasonably similar distance and likelihood. This region can include a path converting one metabolite to the other, as well as other scenarios such as those involving colliders. This is because all biochemical reactions, including one-way reactions, can have bidirectional causal relationships due to Le Châtelier’s principle. In other words, even in a one-way (irreversible) reaction, changes in product levels can induce changes in reactant levels in order to re-establish equilibrium. Genes were defined as acting between a given metabolite pair either if they encoded an enzyme catalyzing a reaction in the ‘between’ region defined above or if they encoded a transporter that primarily transports either of the two metabolites themselves or an intermediate metabolite in the ‘between’ region. Variants were defined as acting between a given metabolite pair if the gene they affect was defined as between. Pathways were defined as between a given metabolite pair if many of the genes defined as between the metabolites were part of the pathway or if many of the genes in the pathway were defined as between. Note that even if a pathway is defined as ‘between,’ not all genes in the pathway will always be between and vice versa; however, this is likely to only make the resulting differences in genetic correlation for ‘between’ vs. not ‘between’ pathways more conservative.

Local ρ-HESS

We ran HESS (Shi et al., 2017) using the following commands:

hess.py --local-rhog {Met1_sumstats} 
         {Met2_sumstats} --chrom {chrom} --bfile 1kg_eur_1pct_chr{chrom} \ 
         --partition EUR/fourier_ls-chr{chrom}.bed --out {Met1_Met2}_step1 \ 
hess.py --prefix {Met1_Met2}_step1_trait1 \ 
        --out {Met1_Met2}_step2_trait1 
hess.py --prefix {Met1_Met2}_step1_trait2 \ 
        --out {Met1_Met2}_step2_trait2 
hess.py --prefix {Met1_Met2}_step1 \ 
        --local-hsqg-est {Met1_Met2}_step2_trait1 {Met1_Met2}_step2_trait2 \ 
        --num-shared 94464 \ 
        --pheno-cor {gcov_int from LDSC genetic correlation for Met1_Met2} \ 
        --out {Met1_Met2}_step3

where 1kg_eur_1pct_chr{chrom} were downloaded here and EUR/fourier_ls-chr{chrom}.bed were downloaded here.

We then used the local rho HESS results and estimated the local genetic covariance and correlation across all LD blocks overlapping pathway regions.

We defined the pathway regions based on gene boundaries of relevant genes in Figure 3—figure supplement 1 as follows: ‘Other Amino Acid Genes’ includes all genes colored orange, ‘Ketone Body Genes’ includes all genes colored purple, ‘Glycolysis, Gluconeogenesis, and TCA Genes’ includes all genes colored red, and ‘Urea Cycle Genes’ includes all genes colored green. ‘BCAA Genes’ included all genes in KEGG_VALINE_LEUCINE_AND_ISOLEUCINE_DEGRADATION except OXCT2, HMGCL, HMGCS1, HMGCS2, ACAT1, ACAT2, OXCT1, DLD, AGXT2, ABAT, and AACS and also included ECHDC1. ‘All Regions Outside Pathway Genes’ was defined as all LD blocks not overlapping any of the regions defined above. ‘Metabolite-Associated TF Genes’ and ‘Metabolite-Associated Transporters Genes’ were defined as all LD blocks overlapping any of TFs or transporters, respectively annotating the metabolite GWAS hits.

Fligner–Killeen variance test

Rather than aggregating variant effects and estimating total genetic covariance and heritability per pathway, which is not robust to outlier effects, we additionally tried a nonparametric approach. Individual rg and h2 estimates for LD blocks were compared between the baseline (all coding genes) and the pathway of interest by listing all per-block genetic covariance scores and computing a Fligner–Killeen variance test within each pathway in R. This enables direct evaluation of genetic covariances between the pathways at the cost of simultaneously capturing the enrichment of heritability and genetic covariance therein.

BOLT-REML

Genotyped variants within 100 kb of genes in each pathway were aggregated, and the resulting matrices were tested using the following command in BOLT-LMM:

bolt 
     --remove {non-European ancestry individuals} 
     --phenoFile={Technical-adjusted metabolites} \ 
     --phenoCol=Ala \ 
     --phenoCol=Gln \ 
     --covarCol=sex --covarCol=Array 
     --qCovarCol=age
     --qCovarCol=PC{1:10} \ 
     --geneticMapFile=genetic_map_hg19_withX.txt.gz ‘# downloaded with bolt‘ \ 
     --numThreads=24 
     --verboseStats \ 
     --modelSnps {pathway SNPs} \ 
     --reml \ --noMapCheck

Standard errors were as reported by BOLT-REML.

Haseman–Elston regression

Genotyped variants were pruned to MAF > 1% and approximate linkage equilibrium among individuals included in the GWAS using,

{GCTA} --HEreg-bivar {trait1} {trait2} --thread-num 16 --grm {GRM}

Results using multiple GRMs (--mgrm) to jointly test all pathways were qualitatively similar outside of the genome-wide GRM, which no longer captured the within-pathway component.

Stratified LD score regression

Analyses were performed as described in LDSC genetic correlation, except that rather than eur_ref_ld_chr as the reference LD scores, instead LD scores computed on variants within 100 kb of genes in each pathway were utilized.

Disease variant analysis

The metabolite GWAS hits annotated with pathway-relevant enzymes were overlapped with significant hits for CAD, identifying the variant rs61791721 as the most significant variant (Kichaev et al., 2019; Koyama et al., 2020). Incident CAD cases were defined among UK Biobank participants as those individuals who received a first diagnosis of myocardial infarction (MI) using the analytical MI model (field 42000) after the date of baseline assessment. Prevalent cases (individuals with a first diagnosis before date of assessment) were excluded. A Cox proportional hazard model was run with the technical-covariate-adjusted, log-transformed metabolite levels predicting incident MI status, adjusted for age, age2, age * sex, age2 * sex, and statin usage (defined based on a list of individual drug codes as previously described; Sinnott-Armstrong et al., 2021a). Effect sizes presented are based on the estimates from these models run independently for each metabolite.

Colocalization analysis

We wanted to evaluate the extent to which our associations might represent single causal variants across multiple traits and used conditional association at the locus to evaluate this. For each variant within 500 kb of our lead SNPs in at least one metabolite, we ran a conditional analysis for the variants within 1 Mb of the gene body of our putative target gene. Then we ran the following association test in plink2:

plink2 --glm cols = chrom,pos,ref,alt,a1freq,firth,test, 
nobs,orbeta,se,ci,tz,p hide-covar omit-ref 
     --pfile<imputed genotypes> 
--covar<age/sex/PCs> 
--keep<94,464 European-ancestry individuals in the BOLT-LMM GWAS> 
--out conditional/$gene/$snp 
--pheno<technical-residualized traits> 
  --extract <(variants within 1Mb of gene body) 
--condition<conditional SNP>

For single SNP conditioning tests and --condition-list for conditioning on multiple variants. Associations were visually inspected to detect highly linked variants and conditioning tests were repeated with top associations in any of the key traits until there were no significant variants remaining.

For the PCCB vignette, additional traits were included in the analysis, including fatty acids and lipids in the Nightingale-assayed individuals and clinical biomarkers in the full cohort of European-ancestry UK Biobank participants, where traits were residualized as previously described (Sinnott-Armstrong et al., 2021a). We further included a GWAS for ‘hard’ CAD as previously defined (Inouye et al., 2018), for which results were qualitatively similar when evaluating ‘soft’ CAD (including angina cases) and employing only EHR-based diagnoses (rather than additionally including self-reported case status). Results for ‘hard’ CAD are shown in the supplement.

Pathway diagrams

Diagrams were drawn using Affinity Design, and molecular structures were made using ChemDraw. Pathway information was curated from GO (Ashburner et al., 2000; Consortium et al., 2021), KEGG (Kanehisa and Goto, 2000), or REACTOME MSigDB (Liberzon et al., 2011; Subramanian et al., 2005), and Stanford’s Human Metabolism Map (Pathways of Human Metabolism Map, 2021), along with manual curation from public domain biochemistry knowledge (Supplementary file 2).

Acknowledgements

We thank Alyssa Lyn Fortier, Hanna M Ollila, Shoa Clarke, and other members of the Pritchard lab and Assimes lab for helpful discussions. The authors are grateful to UK Biobank and its participants for access to data to undertake this study (Project #30418 and #24983). Nightingale Health Plc is acknowledged for early access to the UK Biobank NMR biomarker data. CJS was supported by a National Science Foundation Graduate Research Fellowship and Stanford’s Knight-Hennessy Scholars Program. This work was supported by NIH grants 5R01HG011432 and 5R01AG066490 (to JKP). We would like to thank the reviewers for their careful reading of our manuscript and their thoughtful comments and suggestions that improved our manuscript.

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Courtney J Smith, Email: courtrun@stanford.edu.

Nasa Sinnott-Armstrong, Email: nasa@fredhutch.org.

Jonathan K Pritchard, Email: pritch@stanford.edu.

Alexander Young, University of California, Los Angeles, United States.

George H Perry, Pennsylvania State University, United States.

Funding Information

This paper was supported by the following grants:

  • Stanford Knight-Hennessy Scholars Program Graduate Student Fellowship to Courtney J Smith.

  • National Science Foundation Graduate Student Fellowship to Courtney J Smith.

  • National Institute of Health 5R01HG011432 and 5R01AG066490 to Jonathan K Pritchard.

Additional information

Competing interests

No competing interests declared.

is a former employee and holds stock options with Nightingale Health Plc.

is an employee and holds stock options with Nightingale Health Plc.

is affiliated with Pfizer Worldwide Research, has no financial interests to declare, contributed as an individual and the work was not part of a Pfizer collaboration nor was it funded by Pfizer.

is an employee and shareholder of Nightingale Health Plc.

No competing interests declared.

Author contributions

Conceptualization, Data curation, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing – original draft, Project administration, Writing – review and editing.

Conceptualization, Data curation, Formal analysis, Supervision, Validation, Investigation, Visualization, Methodology, Writing – original draft, Project administration, Writing – review and editing.

Data curation, Methodology, Writing – review and editing.

Data curation, Methodology, Writing – review and editing.

Data curation, Investigation, Writing – review and editing.

Data curation, Supervision, Funding acquisition, Methodology, Writing – review and editing.

Conceptualization, Supervision, Funding acquisition, Methodology, Writing – original draft, Project administration, Writing – review and editing.

Ethics

Human subjects: All participants provided written informed consent and ethical approval was obtained from the North West Multi-Center Research Ethics Committee (11/NW/0382). The current analysis was approved under UK Biobank Project 24983 and 30418.

Additional files

Supplementary file 1. Metabolite HESS heritabilities.

Heritability results from HESS for each metabolite.

elife-79348-supp1.tsv (1.8KB, tsv)
Supplementary file 2. Gene function sources.

Sources for biochemical characterization of genes mentioned in the variant vignettes.

elife-79348-supp2.xlsx (10.9KB, xlsx)
Supplementary file 3. Metabolite genome-wide association studies (GWAS) hits annotation.

Annotation for the 213 metabolite GWAS hits, including the assigned gene, assigned gene type, variant classification, and nearest gene.

elife-79348-supp3.tsv (16.6KB, tsv)
Supplementary file 4. Gene biochemical groups.

The number of significant (p<1e-4) metabolite associations each metabolite genome-wide association studies (GWAS) gene had for each biochemical group. For a given gene, only biochemical groups that had at least one significant metabolite association were listed.

elife-79348-supp4.tsv (6.5KB, tsv)
Supplementary file 5. Ancestry-inclusive genome-wide association studies (GWAS) hits.

List of additional metabolite GWAS hits from the ancestry-inclusive analysis that were not present in the European-only GWAS results.

elife-79348-supp5.tsv (1.5KB, tsv)
Supplementary file 6. Discordant variant annotation.

List of each discordant variant–metabolite association, including the variant annotations and relevant metabolite pair genetic correlation and genome-wide association studies (GWAS) summary statistics.

elife-79348-supp6.tsv (4.5KB, tsv)
Supplementary file 7. Local genetic correlation results.

Combined results for different methods of calculating the local genetic correlation for different pathways for alanine and glutamine, demonstrating the consistency across the different approaches.

elife-79348-supp7.tsv (1.9KB, tsv)
Supplementary file 8. Metabolite associations with coronary artery disease (CAD).

Literature evidence and citations for metabolite associations with CAD.

elife-79348-supp8.xlsx (10.9KB, xlsx)
Supplementary file 9. PCCB genome-wide association studies (GWAS) results.

GWAS summary statistics for rs61791721 (PCCB) in the 16 metabolites.

elife-79348-supp9.tsv (1.7KB, tsv)
MDAR checklist

Data availability

The source data and analyzed data have been deposited in Dryad at https://doi.org/10.5061/dryad.79cnp5hxs. Code are available at the github link (https://github.com/courtrun/Pleiotropy-of-UKB-Metabolites copy archived at swh:1:rev:bff4f4d8fb0562f222b1f73560b23ca9b8f57047). The raw individual level data are available through application to UK Biobank.

The following dataset was generated:

Smith C, Sinnott-Armstrong N, Cichonska A, Julkunen H, Fauman E, Wurtz P, Pritchard J. 2022. Pleiotropy of UK Biobank Metabolites [preliminary] Dryad Digital Repository.

References

  1. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G. Gene ontology: tool for the unification of biology. Nature Genetics. 2000;25:25–29. doi: 10.1038/75556. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Bellenguez C, Strange A, Freeman C, Donnelly P, Spencer CCA, Wellcome Trust Case Control Consortium A robust clustering algorithm for identifying problematic samples in genome-wide association studies. Bioinformatics. 2012;28:134–135. doi: 10.1093/bioinformatics/btr599. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Bowden J, Spiller W, Del Greco M F, Sheehan N, Thompson J, Minelli C, Davey Smith G. Improving the visualization, interpretation and analysis of two-sample summary data mendelian randomization via the radial plot and radial regression. International Journal of Epidemiology. 2018;47:1264–1278. doi: 10.1093/ije/dyy101. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Bulik-Sullivan B, Finucane HK, Anttila V, Gusev A, Day FR, Loh PR, Duncan L, Perry JRB, Patterson N, Robinson EB, Daly MJ, Price AL, Neale BM, ReproGen Consortium. Psychiatric Genomics Consortium. Genetic Consortium for Anorexia Nervosa of the Wellcome Trust Case Control Consortium 3 An atlas of genetic correlations across human diseases and traits. Nature Genetics. 2015a;47:1236–1241. doi: 10.1038/ng.3406. [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Bulik-Sullivan BK, Loh PR, Finucane HK, Ripke S, Yang J, Patterson N, Daly MJ, Price AL, Neale BM, Schizophrenia Working Group of the Psychiatric Genomics Consortium LD score regression distinguishes confounding from polygenicity in genome-wide association studies. Nature Genetics. 2015b;47:291–295. doi: 10.1038/ng.3211. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, Motyer A, Vukcevic D, Delaneau O, O’Connell J, Cortes A, Welsh S, Young A, Effingham M, McVean G, Leslie S, Allen N, Donnelly P, Marchini J. The UK biobank resource with deep phenotyping and genomic data. Nature. 2018;562:203–209. doi: 10.1038/s41586-018-0579-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience. 2015;4:7. doi: 10.1186/s13742-015-0047-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Cichonska A, Rousu J, Marttinen P, Kangas AJ, Soininen P, Lehtimäki T, Raitakari OT, Järvelin MR, Salomaa V, Ala-Korpela M, Ripatti S, Pirinen M. MetaCCA: summary statistics-based multivariate meta-analysis of genome-wide association studies using canonical correlation analysis. Bioinformatics. 2016;32:1981–1989. doi: 10.1093/bioinformatics/btw052. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Consortium TGO, Carbon S, Douglass E. The gene ontology resource: enriching a gold mine. Nucleic Acids Research. 2021;49:D325–D334. doi: 10.1093/nar/gkaa1113. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Elliott LT, Sharp K, Alfaro-Almagro F, Shi S, Miller KL, Douaud G, Marchini J, Smith SM. Genome-wide association studies of brain imaging phenotypes in UK biobank. Nature. 2018;562:210–216. doi: 10.1038/s41586-018-0571-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  11. Ghoussaini M, Mountjoy E, Carmona M, Peat G, Schmidt EM, Hercules A, Fumis L, Miranda A, Carvalho-Silva D, Buniello A, Burdett T, Hayhurst J, Baker J, Ferrer J, Gonzalez-Uriarte A, Jupp S, Karim MA, Koscielny G, Machlitt-Northen S, Malangone C, Pendlington ZM, Roncaglia P, Suveges D, Wright D, Vrousgou O, Papa E, Parkinson H, MacArthur JAL, Todd JA, Barrett JC, Schwartzentruber J, Hulcoop DG, Ochoa D, McDonagh EM, Dunham I. Open targets genetics: systematic identification of trait-associated genes using large-scale genetics and functional genomics. Nucleic Acids Research. 2021;49:D1311–D1320. doi: 10.1093/nar/gkaa840. [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Giambartolomei C, Vukcevic D, Schadt EE, Franke L, Hingorani AD, Wallace C, Plagnol V. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLOS Genetics. 2014;10:e1004383. doi: 10.1371/journal.pgen.1004383. [DOI] [PMC free article] [PubMed] [Google Scholar]
  13. Gieger C, Geistlinger L, Altmaier E, Hrabé de Angelis M, Kronenberg F, Meitinger T, Mewes H-W, Wichmann H-E, Weinberger KM, Adamski J, Illig T, Suhre K. Genetics meets metabolomics: A genome-wide association study of metabolite profiles in human serum. PLOS Genetics. 2008;4:e1000282. doi: 10.1371/journal.pgen.1000282. [DOI] [PMC free article] [PubMed] [Google Scholar]
  14. Graham SE, Clarke SL, Wu K-HH, Kanoni S, Zajac GJM, Ramdas S, Surakka I, Ntalla I, Vedantam S, Winkler TW, Locke AE, Marouli E, Hwang MY, Han S, Narita A, Choudhury A, Bentley AR, Ekoru K, Verma A, Trivedi B, Martin HC, Hunt KA, Hui Q, Klarin D, Zhu X, Thorleifsson G, Helgadottir A, Gudbjartsson DF, Holm H, Olafsson I, Akiyama M, Sakaue S, Terao C, Kanai M, Zhou W, Brumpton BM, Rasheed H, Ruotsalainen SE, Havulinna AS, Veturi Y, Feng Q, Rosenthal EA, Lingren T, Pacheco JA, Pendergrass SA, Haessler J, Giulianini F, Bradford Y, Miller JE, Campbell A, Lin K, Millwood IY, Hindy G, Rasheed A, Faul JD, Zhao W, Weir DR, Turman C, Huang H, Graff M, Mahajan A, Brown MR, Zhang W, Yu K, Schmidt EM, Pandit A, Gustafsson S, Yin X, Luan J, Zhao J-H, Matsuda F, Jang H-M, Yoon K, Medina-Gomez C, Pitsillides A, Hottenga JJ, Willemsen G, Wood AR, Ji Y, Gao Z, Haworth S, Mitchell RE, Chai JF, Aadahl M, Yao J, Manichaikul A, Warren HR, Ramirez J, Bork-Jensen J, Kårhus LL, Goel A, Sabater-Lleal M, Noordam R, Sidore C, Fiorillo E, McDaid AF, Marques-Vidal P, Wielscher M, Trompet S, Sattar N, Møllehave LT, Thuesen BH, Munz M, Zeng L, Huang J, Yang B, Poveda A, Kurbasic A, Lamina C, Forer L, Scholz M, Galesloot TE, Bradfield JP, Daw EW, Zmuda JM, Mitchell JS, Fuchsberger C, Christensen H, Brody JA, Feitosa MF, Wojczynski MK, Preuss M, Mangino M, Christofidou P, Verweij N, Benjamins JW, Engmann J, Kember RL, Slieker RC, Lo KS, Zilhao NR, Le P, Kleber ME, Delgado GE, Huo S, Ikeda DD, Iha H, Yang J, Liu J, Leonard HL, Marten J, Schmidt B, Arendt M, Smyth LJ, Cañadas-Garre M, Wang C, Nakatochi M, Wong A, Hutri-Kähönen N, Sim X, Xia R, Huerta-Chagoya A, Fernandez-Lopez JC, Lyssenko V, Ahmed M, Jackson AU, Irvin MR, Oldmeadow C, Kim H-N, Ryu S, Timmers PRHJ, Arbeeva L, Dorajoo R, Lange LA, Chai X, Prasad G, Lorés-Motta L, Pauper M, Long J, Li X, Theusch E, Takeuchi F, Spracklen CN, Loukola A, Bollepalli S, Warner SC, Wang YX, Wei WB, Nutile T, Ruggiero D, Sung YJ, Hung Y-J, Chen S, Liu F, Yang J, Kentistou KA, Gorski M, Brumat M, Meidtner K, Bielak LF, Smith JA, Hebbar P, Farmaki A-E, Hofer E, Lin M, Xue C, Zhang J, Concas MP, Vaccargiu S, van der Most PJ, Pitkänen N, Cade BE, Lee J, van der Laan SW, Chitrala KN, Weiss S, Zimmermann ME, Lee JY, Choi HS, Nethander M, Freitag-Wolf S, Southam L, Rayner NW, Wang CA, Lin S-Y, Wang J-S, Couture C, Lyytikäinen L-P, Nikus K, Cuellar-Partida G, Vestergaard H, Hildalgo B, Giannakopoulou O, Cai Q, Obura MO, van Setten J, Li X, Schwander K, Terzikhan N, Shin JH, Jackson RD, Reiner AP, Martin LW, Chen Z, Li L, Highland HM, Young KL, Kawaguchi T, Thiery J, Bis JC, Nadkarni GN, Launer LJ, Li H, Nalls MA, Raitakari OT, Ichihara S, Wild SH, Nelson CP, Campbell H, Jäger S, Nabika T, Al-Mulla F, Niinikoski H, Braund PS, Kolcic I, Kovacs P, Giardoglou T, Katsuya T, Bhatti KF, de Kleijn D, de Borst GJ, Kim EK, Adams HHH, Ikram MA, Zhu X, Asselbergs FW, Kraaijeveld AO, Beulens JWJ, Shu X-O, Rallidis LS, Pedersen O, Hansen T, Mitchell P, Hewitt AW, Kähönen M, Pérusse L, Bouchard C, Tönjes A, Chen Y-DI, Pennell CE, Mori TA, Lieb W, Franke A, Ohlsson C, Mellström D, Cho YS, Lee H, Yuan J-M, Koh W-P, Rhee SY, Woo J-T, Heid IM, Stark KJ, Völzke H, Homuth G, Evans MK, Zonderman AB, Polasek O, Pasterkamp G, Hoefer IE, Redline S, Pahkala K, Oldehinkel AJ, Snieder H, Biino G, Schmidt R, Schmidt H, Chen YE, Bandinelli S, Dedoussis G, Thanaraj TA, Kardia SLR, Kato N, Schulze MB, Girotto G, Jung B, Böger CA, Joshi PK, Bennett DA, De Jager PL, Lu X, Mamakou V, Brown M, Caulfield MJ, Munroe PB, Guo X, Ciullo M, Jonas JB, Samani NJ, Kaprio J, Pajukanta P, Adair LS, Bechayda SA, de Silva HJ, Wickremasinghe AR, Krauss RM, Wu J-Y, Zheng W, den Hollander AI, Bharadwaj D, Correa A, Wilson JG, Lind L, Heng C-K, Nelson AE, Golightly YM, Wilson JF, Penninx B, Kim H-L, Attia J, Scott RJ, Rao DC, Arnett DK, Walker M, Koistinen HA, Chandak GR, Yajnik CS, Mercader JM, Tusié-Luna T, Aguilar-Salinas CA, Villalpando CG, Orozco L, Fornage M, Tai ES, van Dam RM, Lehtimäki T, Chaturvedi N, Yokota M, Liu J, Reilly DF, McKnight AJ, Kee F, Jöckel K-H, McCarthy MI, Palmer CNA, Vitart V, Hayward C, Simonsick E, van Duijn CM, Lu F, Qu J, Hishigaki H, Lin X, März W, Parra EJ, Cruz M, Gudnason V, Tardif J-C, Lettre G, ’t Hart LM, Elders PJM, Damrauer SM, Kumari M, Kivimaki M, van der Harst P, Spector TD, Loos RJF, Province MA, Psaty BM, Brandslund I, Pramstaller PP, Christensen K, Ripatti S, Widén E, Hakonarson H, Grant SFA, Kiemeney LALM, de Graaf J, Loeffler M, Kronenberg F, Gu D, Erdmann J, Schunkert H, Franks PW, Linneberg A, Jukema JW, Khera AV, Männikkö M, Jarvelin M-R, Kutalik Z, Cucca F, Mook-Kanamori DO, van Dijk KW, Watkins H, Strachan DP, Grarup N, Sever P, Poulter N, Rotter JI, Dantoft TM, Karpe F, Neville MJ, Timpson NJ, Cheng C-Y, Wong T-Y, Khor CC, Sabanayagam C, Peters A, Gieger C, Hattersley AT, Pedersen NL, Magnusson PKE, Boomsma DI, de Geus EJC, Cupples LA, van Meurs JBJ, Ghanbari M, Gordon-Larsen P, Huang W, Kim YJ, Tabara Y, Wareham NJ, Langenberg C, Zeggini E, Kuusisto J, Laakso M, Ingelsson E, Abecasis G, Chambers JC, Kooner JS, de Vries PS, Morrison AC, North KE, Daviglus M, Kraft P, Martin NG, Whitfield JB, Abbas S, Saleheen D, Walters RG, Holmes MV, Black C, Smith BH, Justice AE, Baras A, Buring JE, Ridker PM, Chasman DI, Kooperberg C, Wei W-Q, Jarvik GP, Namjou B, Hayes MG, Ritchie MD, Jousilahti P, Salomaa V, Hveem K, Åsvold BO, Kubo M, Kamatani Y, Okada Y, Murakami Y, Thorsteinsdottir U, Stefansson K, Ho Y-L, Lynch JA, Rader DJ, Tsao PS, Chang K-M, Cho K, O’Donnell CJ, Gaziano JM, Wilson P, Rotimi CN, Hazelhurst S, Ramsay M, Trembath RC, van Heel DA, Tamiya G, Yamamoto M, Kim B-J, Mohlke KL, Frayling TM, Hirschhorn JN, Kathiresan S, VA Million Veteran Program. Global Lipids Genetics Consortium*. Boehnke M, Natarajan P, Peloso GM, Brown CD, Morris AP, Assimes TL, Deloukas P, Sun YV, Willer CJ. The power of genetic diversity in genome-wide association studies of lipids. Nature. 2021;600:675–679. doi: 10.1038/s41586-021-04064-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  15. Guasch-Ferré M, Santos JL, Martínez-González MA, Clish CB, Razquin C, Wang D, Liang L, Li J, Dennis C, Corella D, Muñoz-Bravo C, Romaguera D, Estruch R, Santos-Lozano JM, Castañer O, Alonso-Gómez A, Serra-Majem L, Ros E, Canudas S, Asensio EM, Fitó M, Pierce K, Martínez JA, Salas-Salvadó J, Toledo E, Hu FB, Ruiz-Canela M. Glycolysis/gluconeogenesis- and tricarboxylic acid cycle-related metabolites, mediterranean diet, and type 2 diabetes. The American Journal of Clinical Nutrition. 2020;111:835–844. doi: 10.1093/ajcn/nqaa016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Inouye M, Abraham G, Nelson CP, Wood AM, Sweeting MJ, Dudbridge F, Lai FY, Kaptoge S, Brozynska M, Wang T, Ye S, Webb TR, Rutter MK, Tzoulaki I, Patel RS, Loos RJF, Keavney B, Hemingway H, Thompson J, Watkins H, Deloukas P, Di Angelantonio E, Butterworth AS, Danesh J, Samani NJ. Genomic risk prediction of coronary artery disease in 480,000 adults. Journal of the American College of Cardiology. 2018;72:1883–1893. doi: 10.1016/j.jacc.2018.07.079. [DOI] [PMC free article] [PubMed] [Google Scholar]
  17. Jauhiainen R, Vangipurapu J, Laakso A, Kuulasmaa T, Kuusisto J, Laakso M. The association of 9 amino acids with cardiovascular events in finnish men in a 12-year follow-up study. The Journal of Clinical Endocrinology and Metabolism. 2021;106:3448–3454. doi: 10.1210/clinem/dgab562. [DOI] [PMC free article] [PubMed] [Google Scholar]
  18. Julkunen H, Cichońska A, Slagboom PE, Würtz P, Nightingale Health UK Biobank Initiative Metabolic biomarker profiling for identification of susceptibility to severe pneumonia and COVID-19 in the general population. eLife. 2021;10:e63033. doi: 10.7554/eLife.63033. [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Kanehisa M, Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Research. 2000;28:27–30. doi: 10.1093/nar/28.1.27. [DOI] [PMC free article] [PubMed] [Google Scholar]
  20. Kettunen J, Demirkan A, Würtz P, Draisma HHM, Haller T, Rawal R, Vaarhorst A, Kangas AJ, Lyytikäinen L-P, Pirinen M, Pool R, Sarin A-P, Soininen P, Tukiainen T, Wang Q, Tiainen M, Tynkkynen T, Amin N, Zeller T, Beekman M, Deelen J, van Dijk KW, Esko T, Hottenga J-J, van Leeuwen EM, Lehtimäki T, Mihailov E, Rose RJ, de Craen AJM, Gieger C, Kähönen M, Perola M, Blankenberg S, Savolainen MJ, Verhoeven A, Viikari J, Willemsen G, Boomsma DI, van Duijn CM, Eriksson J, Jula A, Järvelin M-R, Kaprio J, Metspalu A, Raitakari O, Salomaa V, Slagboom PE, Waldenberger M, Ripatti S, Ala-Korpela M. Genome-wide study for circulating metabolites identifies 62 loci and reveals novel systemic effects of LPA. Nature Communications. 2016;7:11122. doi: 10.1038/ncomms11122. [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Kichaev G, Bhatia G, Loh P-R, Gazal S, Burch K, Freund MK, Schoech A, Pasaniuc B, Price AL. Leveraging polygenic functional enrichment to improve GWAS power. American Journal of Human Genetics. 2019;104:65–75. doi: 10.1016/j.ajhg.2018.11.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. Koyama S, Ito K, Terao C, Akiyama M, Horikoshi M, Momozawa Y, Matsunaga H, Ieki H, Ozaki K, Onouchi Y, Takahashi A, Nomura S, Morita H, Akazawa H, Kim C, Seo J-S, Higasa K, Iwasaki M, Yamaji T, Sawada N, Tsugane S, Koyama T, Ikezaki H, Takashima N, Tanaka K, Arisawa K, Kuriki K, Naito M, Wakai K, Suna S, Sakata Y, Sato H, Hori M, Sakata Y, Matsuda K, Murakami Y, Aburatani H, Kubo M, Matsuda F, Kamatani Y, Komuro I. Population-specific and trans-ancestry genome-wide analyses identify distinct and shared genetic risk loci for coronary artery disease. Nature Genetics. 2020;52:1169–1177. doi: 10.1038/s41588-020-0705-3. [DOI] [PubMed] [Google Scholar]
  23. Laffel L. Ketone bodies: a review of physiology, pathophysiology and application of monitoring to diabetes. Diabetes/Metabolism Research and Reviews. 1999;15:412–426. doi: 10.1002/(sici)1520-7560(199911/12)15:6&#x0003c;412::aid-dmrr72&#x0003e;3.0.co;2-8. [DOI] [PubMed] [Google Scholar]
  24. Lemaitre RN, Tanaka T, Tang W, Manichaikul A, Foy M, Kabagambe EK, Nettleton JA, King IB, Weng LC, Bhattacharya S, Bandinelli S, Bis JC, Rich SS, Jacobs DR, Cherubini A, McKnight B, Liang S, Gu X, Rice K, Laurie CC, Lumley T, Browning BL, Psaty BM, Chen YDI, Friedlander Y, Djousse L, Wu JHY, Siscovick DS, Uitterlinden AG, Arnett DK, Ferrucci L, Fornage M, Tsai MY, Mozaffarian D, Steffen LM. Genetic loci associated with plasma phospholipid n-3 fatty acids: A meta-analysis of genome-wide association studies from the CHARGE consortium. PLOS Genetics. 2011;7:e1002193. doi: 10.1371/journal.pgen.1002193. [DOI] [PMC free article] [PubMed] [Google Scholar]
  25. Liberzon A, Subramanian A, Pinchback R, Thorvaldsdóttir H, Tamayo P, Mesirov JP. Molecular signatures database (msigdb) 3.0. Bioinformatics. 2011;27:1739–1740. doi: 10.1093/bioinformatics/btr260. [DOI] [PMC free article] [PubMed] [Google Scholar]
  26. Loh P-R, Tucker G, Bulik-Sullivan BK, Vilhjálmsson BJ, Finucane HK, Salem RM, Chasman DI, Ridker PM, Neale BM, Berger B, Patterson N, Price AL. Efficient bayesian mixed-model analysis increases association power in large cohorts. Nature Genetics. 2015;47:284–290. doi: 10.1038/ng.3190. [DOI] [PMC free article] [PubMed] [Google Scholar]
  27. Lotta LA, Pietzner M, Stewart ID, Wittemans LBL, Li C, Bonelli R, Raffler J, Biggs EK, Oliver-Williams C, Auyeung VPW, Luan J, Wheeler E, Paige E, Surendran P, Michelotti GA, Scott RA, Burgess S, Zuber V, Sanderson E, Koulman A, Imamura F, Forouhi NG, Khaw K-T, MacTel Consortium. Griffin JL, Wood AM, Kastenmüller G, Danesh J, Butterworth AS, Gribble FM, Reimann F, Bahlo M, Fauman E, Wareham NJ, Langenberg C. A cross-platform approach identifies genetic regulators of human metabolism and health. Nature Genetics. 2021;53:54–64. doi: 10.1038/s41588-020-00751-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. Lusis AJ, Weiss JN. Cardiovascular networks. Circulation. 2010;121:157–170. doi: 10.1161/CIRCULATIONAHA.108.847699. [DOI] [PMC free article] [PubMed] [Google Scholar]
  29. Mahajan A, Taliun D, Thurner M, Robertson NR, Torres JM, Rayner NW, Payne AJ, Steinthorsdottir V, Scott RA, Grarup N, Cook JP, Schmidt EM, Wuttke M, Sarnowski C, Mägi R, Nano J, Gieger C, Trompet S, Lecoeur C, Preuss MH, Prins BP, Guo X, Bielak LF, Below JE, Bowden DW, Chambers JC, Kim YJ, Ng MCY, Petty LE, Sim X, Zhang W, Bennett AJ, Bork-Jensen J, Brummett CM, Canouil M, Ec Kardt K-U, Fischer K, Kardia SLR, Kronenberg F, Läll K, Liu C-T, Locke AE, Luan J, Ntalla I, Nylander V, Schönherr S, Schurmann C, Yengo L, Bottinger EP, Brandslund I, Christensen C, Dedoussis G, Florez JC, Ford I, Franco OH, Frayling TM, Giedraitis V, Hackinger S, Hattersley AT, Herder C, Ikram MA, Ingelsson M, Jørgensen ME, Jørgensen T, Kriebel J, Kuusisto J, Ligthart S, Lindgren CM, Linneberg A, Lyssenko V, Mamakou V, Meitinger T, Mohlke KL, Morris AD, Nadkarni G, Pankow JS, Peters A, Sattar N, Stančáková A, Strauch K, Taylor KD, Thorand B, Thorleifsson G, Thorsteinsdottir U, Tuomilehto J, Witte DR, Dupuis J, Peyser PA, Zeggini E, Loos RJF, Froguel P, Ingelsson E, Lind L, Groop L, Laakso M, Collins FS, Jukema JW, Palmer CNA, Grallert H, Metspalu A, Dehghan A, Köttgen A, Abecasis GR, Meigs JB, Rotter JI, Marchini J, Pedersen O, Hansen T, Langenberg C, Wareham NJ, Stefansson K, Gloyn AL, Morris AP, Boehnke M, McCarthy MI. Fine-mapping type 2 diabetes loci to single-variant resolution using high-density imputation and islet-specific epigenome maps. Nature Genetics. 2018;50:1505–1513. doi: 10.1038/s41588-018-0241-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  30. Martin G, Duez H, Blanquart C, Berezowski V, Poulain P, Fruchart J-C, Najib-Fruchart J, Glineur C, Staels B. Statin-induced inhibition of the rho-signaling pathway activates pparalpha and induces HDL apoa-I. The Journal of Clinical Investigation. 2001;107:1423–1432. doi: 10.1172/JCI10852. [DOI] [PMC free article] [PubMed] [Google Scholar]
  31. Mountjoy E, Schmidt EM, Carmona M, Schwartzentruber J, Peat G, Miranda A, Fumis L, Hayhurst J, Buniello A, Karim MA, Wright D, Hercules A, Papa E, Fauman EB, Barrett JC, Todd JA, Ochoa D, Dunham I, Ghoussaini M. An open approach to systematically prioritize causal variants and genes at all published human GWAS trait-associated loci. Nature Genetics. 2021;53:1527–1533. doi: 10.1038/s41588-021-00945-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  32. Newsholme P, Bender K, Kiely A, Brennan L. Amino acid metabolism, insulin secretion and diabetes. Biochemical Society Transactions. 2007;35:1180–1186. doi: 10.1042/BST0351180. [DOI] [PubMed] [Google Scholar]
  33. Pathways of Human Metabolism Map Stanford Med Education. 2021. [October 4, 2021]. https://metabolicpathways.stanford.edu
  34. Pott J, Bae YJ, Horn K, Teren A, Kühnapfel A, Kirsten H, Ceglarek U, Loeffler M, Thiery J, Kratzsch J, Scholz M. Genetic association study of eight steroid hormones and implications for sexual dimorphism of coronary artery disease. The Journal of Clinical Endocrinology and Metabolism. 2019;104:5008–5023. doi: 10.1210/jc.2019-00757. [DOI] [PubMed] [Google Scholar]
  35. Qi G, Chatterjee N. Heritability informed power optimization (HIPO) leads to enhanced detection of genetic associations across multiple traits. PLOS Genetics. 2018;14:e1007549. doi: 10.1371/journal.pgen.1007549. [DOI] [PMC free article] [PubMed] [Google Scholar]
  36. Rueedi R, Mallol R, Raffler J, Lamparter D, Friedrich N, Vollenweider P, Waeber G, Kastenmüller G, Kutalik Z, Bergmann S. Metabomatching: using genetic association to identify metabolites in proton NMR spectroscopy. PLOS Computational Biology. 2017;13:e1005839. doi: 10.1371/journal.pcbi.1005839. [DOI] [PMC free article] [PubMed] [Google Scholar]
  37. Ruotsalainen SE, Partanen JJ, Cichonska A, Lin J, Benner C, Surakka I, FinnGen. Reeve MP, Palta P, Salmi M, Jalkanen S, Ahola-Olli A, Palotie A, Salomaa V, Daly MJ, Pirinen M, Ripatti S, Koskela J. An expanded analysis framework for multivariate GWAS connects inflammatory biomarkers to functional variants and disease. European Journal of Human Genetics. 2021;29:309–324. doi: 10.1038/s41431-020-00730-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  38. Salam WH, Wilcox HG, Cagen LM, Heimberg M. Stimulation of hepatic cholesterol biosynthesis by fatty acids: effects of oleate on cytoplasmic acetoacetyl-CoA thiolase, acetoacetyl-CoA synthetase and hydroxymethylglutaryl-CoA synthase. The Biochemical Journal. 1989;258:563–568. doi: 10.1042/bj2580563. [DOI] [PMC free article] [PubMed] [Google Scholar]
  39. Shi H, Kichaev G, Pasaniuc B. Contrasting the genetic architecture of 30 complex traits from summary association data. American Journal of Human Genetics. 2016;99:139–153. doi: 10.1016/j.ajhg.2016.05.013. [DOI] [PMC free article] [PubMed] [Google Scholar]
  40. Shi H, Mancuso N, Spendlove S, Pasaniuc B. Local genetic correlation gives insights into the shared genetic architecture of complex traits. American Journal of Human Genetics. 2017;101:737–751. doi: 10.1016/j.ajhg.2017.09.022. [DOI] [PMC free article] [PubMed] [Google Scholar]
  41. Shin SY, Fauman EB, Petersen AK, Krumsiek J, Santos R, Huang J, Arnold M, Erte I, Forgetta V, Yang TP, Walter K, Menni C, Chen L, Vasquez L, Valdes AM, Hyde CL, Wang V, Ziemek D, Roberts P, Xi L, Grundberg E, Waldenberger M, Richards JB, Mohney RP, Milburn MV, John SL, Trimmer J, Theis FJ, Overington JP, Suhre K, Brosnan MJ, Gieger C, Kastenmüller G, Spector TD, Soranzo N, Multiple Tissue Human Expression Resource MuTHER Consortium An atlas of genetic influences on human blood metabolites. Nature Genetics. 2014;46:543–550. doi: 10.1038/ng.2982. [DOI] [PMC free article] [PubMed] [Google Scholar]
  42. Sinnott-Armstrong N, Naqvi S, Rivas M, Pritchard JK. GWAS of three molecular traits highlights core genes and pathways alongside a highly polygenic background. eLife. 2021a;10:e58615. doi: 10.7554/eLife.58615. [DOI] [PMC free article] [PubMed] [Google Scholar]
  43. Sinnott-Armstrong N, Sousa IS, Laber S, Rendina-Ruedy E, Nitter Dankel SE, Ferreira T, Mellgren G, Karasik D, Rivas M, Pritchard J, Guntur AR, Cox RD, Lindgren CM, Hauner H, Sallari R, Rosen CJ, Hsu YH, Lander ES, Kiel DP, Claussnitzer M. A regulatory variant at 3q21.1 confers an increased pleiotropic risk for hyperglycemia and altered bone mineral density. Cell Metabolism. 2021b;33:615–628. doi: 10.1016/j.cmet.2021.01.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  44. Sinnott-Armstrong N, Tanigawa Y, Amar D, Mars N, Benner C, Aguirre M, Venkataraman GR, Wainberg M, Ollila HM, Kiiskinen T, Havulinna AS, Pirruccello JP, Qian J, Shcherbina A, Rodriguez F, Assimes TL, Agarwala V, Tibshirani R, Hastie T, Ripatti S, Pritchard JK, Daly MJ, Rivas MA, FinnGen Genetics of 35 blood and urine biomarkers in the UK biobank. Nature Genetics. 2021c;53:185–194. doi: 10.1038/s41588-020-00757-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  45. Sliz E, Sebert S, Würtz P, Kangas AJ, Soininen P, Lehtimäki T, Kähönen M, Viikari J, Männikkö M, Ala-Korpela M, Raitakari OT, Kettunen J. NAFLD risk alleles in PNPLA3, TM6SF2, GCKR and LYPLAL1 show divergent metabolic effects. Human Molecular Genetics. 2018;27:2214–2223. doi: 10.1093/hmg/ddy124. [DOI] [PMC free article] [PubMed] [Google Scholar]
  46. Smith LD, Garg U. In: Biomarkers in Inborn Errors of Metabolism. Garg U, Smith LD, editors. San Diego: Elsevier; 2017. Chapter 5 - urea cycle and other disorders of hyperammonemia; pp. 103–123. [Google Scholar]
  47. Solovieff N, Cotsapas C, Lee PH, Purcell SM, Smoller JW. Pleiotropy in complex traits: challenges and strategies. Nature Reviews. Genetics. 2013;14:483–495. doi: 10.1038/nrg3461. [DOI] [PMC free article] [PubMed] [Google Scholar]
  48. Stelzer G, Rosen N, Plaschkes I, Zimmerman S, Twik M, Fishilevich S, Stein TI, Nudel R, Lieder I, Mazor Y, Kaplan S, Dahary D, Warshawsky D, Guan-Golan Y, Kohn A, Rappaport N, Safran M, Lancet D. The genecards suite: from gene data mining to disease genome sequence analyses. Current Protocols in Bioinformatics. 2016;54:1. doi: 10.1002/cpbi.5. [DOI] [PubMed] [Google Scholar]
  49. Stephens M. False discovery rates: a new deal. Biostatistics. 2017;18:275–294. doi: 10.1093/biostatistics/kxw041. [DOI] [PMC free article] [PubMed] [Google Scholar]
  50. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. PNAS. 2005;102:15545–15550. doi: 10.1073/pnas.0506580102. [DOI] [PMC free article] [PubMed] [Google Scholar]
  51. Suhre K, Shin S-Y, Petersen A-K, Mohney RP, Meredith D, Wägele B, Altmaier E, CARDIoGRAM. Deloukas P, Erdmann J, Grundberg E, Hammond CJ, de Angelis MH, Kastenmüller G, Köttgen A, Kronenberg F, Mangino M, Meisinger C, Meitinger T, Mewes H-W, Milburn MV, Prehn C, Raffler J, Ried JS, Römisch-Margl W, Samani NJ, Small KS, Wichmann H-E, Zhai G, Illig T, Spector TD, Adamski J, Soranzo N, Gieger C. Human metabolic individuality in biomedical and pharmaceutical research. Nature. 2011;477:54–60. doi: 10.1038/nature10354. [DOI] [PMC free article] [PubMed] [Google Scholar]
  52. Teslovich TM, Kim DS, Yin X, Stancáková A, Jackson AU, Wielscher M, Naj A, Perry JRB, Huyghe JR, Stringham HM, Davis JP, Raulerson CK, Welch RP, Fuchsberger C, Locke AE, Sim X, Chines PS, Narisu N, Kangas AJ, Soininen P, Ala-Korpela M, Gudnason V, Musani SK, Jarvelin MR, Schellenberg GD, Speliotes EK, Kuusisto J, Collins FS, Boehnke M, Laakso M, Mohlke KL, Genetics of Obesity-Related Liver Disease Consortium GOLD, The Alzheimer’s Disease Genetics Consortium ADGC, The DIAbetes Genetics Replication And Meta-analysis DIAGRAM Identification of seven novel loci associated with amino acid levels using single-variant and gene-based tests in 8545 finnish men from the METSIM study. Human Molecular Genetics. 2018;27:1664–1674. doi: 10.1093/hmg/ddy067. [DOI] [PMC free article] [PubMed] [Google Scholar]
  53. Tillin T, Hughes AD, Wang Q, Würtz P, Ala-Korpela M, Sattar N, Forouhi NG, Godsland IF, Eastwood SV, McKeigue PM, Chaturvedi N. Diabetes risk and amino acid profiles: cross-sectional and prospective analyses of ethnicity, amino acids and diabetes in a south asian and european cohort from the SABRE (southall and brent revisited) study. Diabetologia. 2015;58:968–979. doi: 10.1007/s00125-015-3517-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  54. Wallace C. Statistical testing of shared genetic control for potentially related traits. Genetic Epidemiology. 2013;37:802–813. doi: 10.1002/gepi.21765. [DOI] [PMC free article] [PubMed] [Google Scholar]
  55. Warren CR, O’Sullivan JF, Friesen M, Becker CE, Zhang X, Liu P, Wakabayashi Y, Morningstar JE, Shi X, Choi J, Xia F, Peters DT, Florido MHC, Tsankov AM, Duberow E, Comisar L, Shay J, Jiang X, Meissner A, Musunuru K, Kathiresan S, Daheron L, Zhu J, Gerszten RE, Deo RC, Vasan RS, O’Donnell CJ, Cowan CA. Induced pluripotent stem cell differentiation enables functional validation of GWAS variants in metabolic disease. Cell Stem Cell. 2017;20:547–557. doi: 10.1016/j.stem.2017.01.010. [DOI] [PubMed] [Google Scholar]
  56. Watt MJ, Miotto PM, De Nardo W, Montgomery MK. The liver as an endocrine organ-linking NAFLD and insulin resistance. Endocrine Reviews. 2019;40:1367–1393. doi: 10.1210/er.2019-00034. [DOI] [PubMed] [Google Scholar]
  57. Willer CJ, Schmidt EM, Sengupta S, Peloso GM, Gustafsson S, Kanoni S, Ganna A, Chen J, Buchkovich ML, Mora S, Beckmann JS, Bragg-Gresham JL, Chang H-Y, Demirkan A, Den Hertog HM, Do R, Donnelly LA, Ehret GB, Esko T, Feitosa MF, Ferreira T, Fischer K, Fontanillas P, Fraser RM, Freitag DF, Gurdasani D, Heikkilä K, Hyppönen E, Isaacs A, Jackson AU, Johansson Å, Johnson T, Kaakinen M, Kettunen J, Kleber ME, Li X, Luan J, Lyytikäinen L-P, Magnusson PKE, Mangino M, Mihailov E, Montasser ME, Müller-Nurasyid M, Nolte IM, O’Connell JR, Palmer CD, Perola M, Petersen A-K, Sanna S, Saxena R, Service SK, Shah S, Shungin D, Sidore C, Song C, Strawbridge RJ, Surakka I, Tanaka T, Teslovich TM, Thorleifsson G, Van den Herik EG, Voight BF, Volcik KA, Waite LL, Wong A, Wu Y, Zhang W, Absher D, Asiki G, Barroso I, Been LF, Bolton JL, Bonnycastle LL, Brambilla P, Burnett MS, Cesana G, Dimitriou M, Doney ASF, Döring A, Elliott P, Epstein SE, Ingi Eyjolfsson G, Gigante B, Goodarzi MO, Grallert H, Gravito ML, Groves CJ, Hallmans G, Hartikainen A-L, Hayward C, Hernandez D, Hicks AA, Holm H, Hung Y-J, Illig T, Jones MR, Kaleebu P, Kastelein JJP, Khaw K-T, Kim E, Klopp N, Komulainen P, Kumari M, Langenberg C, Lehtimäki T, Lin S-Y, Lindström J, Loos RJF, Mach F, McArdle WL, Meisinger C, Mitchell BD, Müller G, Nagaraja R, Narisu N, Nieminen TVM, Nsubuga RN, Olafsson I, Ong KK, Palotie A, Papamarkou T, Pomilla C, Pouta A, Rader DJ, Reilly MP, Ridker PM, Rivadeneira F, Rudan I, Ruokonen A, Samani N, Scharnagl H, Seeley J, Silander K, Stančáková A, Stirrups K, Swift AJ, Tiret L, Uitterlinden AG, van Pelt LJ, Vedantam S, Wainwright N, Wijmenga C, Wild SH, Willemsen G, Wilsgaard T, Wilson JF, Young EH, Zhao JH, Adair LS, Arveiler D, Assimes TL, Bandinelli S, Bennett F, Bochud M, Boehm BO, Boomsma DI, Borecki IB, Bornstein SR, Bovet P, Burnier M, Campbell H, Chakravarti A, Chambers JC, Chen Y-DI, Collins FS, Cooper RS, Danesh J, Dedoussis G, de Faire U, Feranil AB, Ferrières J, Ferrucci L, Freimer NB, Gieger C, Groop LC, Gudnason V, Gyllensten U, Hamsten A, Harris TB, Hingorani A, Hirschhorn JN, Hofman A, Hovingh GK, Hsiung CA, Humphries SE, Hunt SC, Hveem K, Iribarren C, Järvelin M-R, Jula A, Kähönen M, Kaprio J, Kesäniemi A, Kivimaki M, Kooner JS, Koudstaal PJ, Krauss RM, Kuh D, Kuusisto J, Kyvik KO, Laakso M, Lakka TA, Lind L, Lindgren CM, Martin NG, März W, McCarthy MI, McKenzie CA, Meneton P, Metspalu A, Moilanen L, Morris AD, Munroe PB, Njølstad I, Pedersen NL, Power C, Pramstaller PP, Price JF, Psaty BM, Quertermous T, Rauramaa R, Saleheen D, Salomaa V, Sanghera DK, Saramies J, Schwarz PEH, Sheu WH-H, Shuldiner AR, Siegbahn A, Spector TD, Stefansson K, Strachan DP, Tayo BO, Tremoli E, Tuomilehto J, Uusitupa M, van Duijn CM, Vollenweider P, Wallentin L, Wareham NJ, Whitfield JB, Wolffenbuttel BHR, Ordovas JM, Boerwinkle E, Palmer CNA, Thorsteinsdottir U, Chasman DI, Rotter JI, Franks PW, Ripatti S, Cupples LA, Sandhu MS, Rich SS, Boehnke M, Deloukas P, Kathiresan S, Mohlke KL, Ingelsson E, Abecasis GR, Global Lipids Genetics Consortium Discovery and refinement of loci associated with lipid levels. Nature Genetics. 2013;45:1274–1283. doi: 10.1038/ng.2797. [DOI] [PMC free article] [PubMed] [Google Scholar]
  58. Wittemans LBL, Lotta LA, Oliver-Williams C, Stewart ID, Surendran P, Karthikeyan S, Day FR, Koulman A, Imamura F, Zeng L, Erdmann J, Schunkert H, Khaw K-T, Griffin JL, Forouhi NG, Scott RA, Wood AM, Burgess S, Howson JMM, Danesh J, Wareham NJ, Butterworth AS, Langenberg C. Assessing the causal association of glycine with risk of cardio-metabolic diseases. Nature Communications. 2019;10:1060. doi: 10.1038/s41467-019-08936-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  59. Woidy M, Muntau AC, Gersting SW. Inborn errors of metabolism and the human interactome: a systems medicine approach. Journal of Inherited Metabolic Disease. 2018;41:285–296. doi: 10.1007/s10545-018-0140-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  60. Wojcik GL, Graff M, Nishimura KK, Tao R, Haessler J, Gignoux CR, Highland HM, Patel YM, Sorokin EP, Avery CL, Belbin GM, Bien SA, Cheng I, Cullina S, Hodonsky CJ, Hu Y, Huckins LM, Jeff J, Justice AE, Kocarnik JM, Lim U, Lin BM, Lu Y, Nelson SC, Park S-SL, Poisner H, Preuss MH, Richard MA, Schurmann C, Setiawan VW, Sockell A, Vahi K, Verbanck M, Vishnu A, Walker RW, Young KL, Zubair N, Acuña-Alonso V, Ambite JL, Barnes KC, Boerwinkle E, Bottinger EP, Bustamante CD, Caberto C, Canizales-Quinteros S, Conomos MP, Deelman E, Do R, Doheny K, Fernández-Rhodes L, Fornage M, Hailu B, Heiss G, Henn BM, Hindorff LA, Jackson RD, Laurie CA, Laurie CC, Li Y, Lin D-Y, Moreno-Estrada A, Nadkarni G, Norman PJ, Pooler LC, Reiner AP, Romm J, Sabatti C, Sandoval K, Sheng X, Stahl EA, Stram DO, Thornton TA, Wassel CL, Wilkens LR, Winkler CA, Yoneyama S, Buyske S, Haiman CA, Kooperberg C, Le Marchand L, Loos RJF, Matise TC, North KE, Peters U, Kenny EE, Carlson CS. Genetic analyses of diverse populations improves discovery for complex traits. Nature. 2019;570:514–518. doi: 10.1038/s41586-019-1310-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  61. Wongkittichote P, Ah Mew N, Chapman KA. Propionyl-coa carboxylase - a review. Molecular Genetics and Metabolism. 2017;122:145–152. doi: 10.1016/j.ymgme.2017.10.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
  62. Wu N, Sarna LK, Hwang SY, Zhu Q, Wang P, Siow YL. Activation of 3-hydroxy-3-methylglutaryl coenzyme A (HMG-CoA) reductase during high fat diet feeding. Biochimica et Biophysica Acta. 2013;1832:1560–1568. doi: 10.1016/j.bbadis.2013.04.024. [DOI] [PubMed] [Google Scholar]
  63. Xu M, Bai X, Ai B, Zhang G, Song C, Zhao J, Wang Y, Wei L, Qian F, Li Y, Zhou X, Zhou L, Yang Y, Chen J, Liu J, Shang D, Wang X, Zhao Y, Huang X, Zheng Y, Zhang J, Wang Q, Li C. TF-marker: a comprehensive manually curated database for transcription factors and related markers in specific cell and tissue types in human. Nucleic Acids Research. 2022;50:D402–D412. doi: 10.1093/nar/gkab1114. [DOI] [PMC free article] [PubMed] [Google Scholar]
  64. Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, Nyholt DR, Madden PA, Heath AC, Martin NG, Montgomery GW, Goddard ME, Visscher PM. Common snps explain a large proportion of the heritability for human height. Nature Genetics. 2010;42:565–569. doi: 10.1038/ng.608. [DOI] [PMC free article] [PubMed] [Google Scholar]
  65. Yeung VA. University of Cambridge; 2021. Common ’Inborn Errors’ of Metabolism in the General Population thesis. [Google Scholar]

Editor's evaluation

Alexander Young 1

Smith and colleagues provide a framework for understanding a seemingly paradoxical observation in human genetics: two phenotypes may be closely correlated to each other, and the patterns of genetic variation that influence both phenotypes may be widely shared at the genome-wide level, but there are often specific genetic variants that show discordant patterns. Though the observations in this article are derived from analysis of metabolic phenotypes, this may have broader relevance to interpreting the results from disease-related genetic association studies, and shed light on the processes that connect different disease phenotypes.

Decision letter

Editor: Alexander Young1
Reviewed by: Alexander Young2, Mark I McCarthy3

Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

Thank you for submitting your article "Integrative analysis of metabolite GWAS illuminates the molecular basis of pleiotropy and genetic correlation" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, including Alexander Young as the Reviewing Editor and Reviewer #1, and the evaluation has been overseen by George Perry as the Senior Editor. The following individual involved in the review of your submission has agreed to reveal their identity: Mark I McCarthy (Reviewer #2).

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

The reviewers all agreed that your article presented interesting and potentially important results about how to link GWAS results to biochemical mechanisms. However, the reviewers have raised issues with the presentation that need to be addressed before the manuscript can be accepted. Furthermore, please address reviewer #3's comment about colocalization, and reviewer #1's comment about the ancestry-inclusive GWAS.

Reviewer #1 (Recommendations for the authors):

I am quite confused as to where the ancestry inclusive analysis was used? It is mentioned in the introduction as identifying additional hits, but are these used anywhere in the main text results? I also found the details on the ancestry inclusive analysis in the methods lacking. Was the same GWAS procedure used as for the European ancestry GWAS? Did the results look reasonable in terms of QQ-plots, etc.?

For the model proposed in Figure 5a, this seems quite different from the example given in Figure 4c for the variant affecting PDPR. For the variant affecting PDPR, it appears that this discordant effect is generated due to the variant affecting the conversion of pyruvate into Acetyl-CoA, and the corresponding increase in conversion of alanine into pyruvate, and a decrease in the conversion of BCAAs into Acetyl-CoA. So the key here seems to be the level of something (Acetyl-CoA) that is downstream of both of the pair of metabolites. The model in Figure 5a, in contrast, appears to be implying that discordant effects are generated by variants affecting the pathway leading to the conversion of one metabolite into another. So is the PDPR example different from the model proposed in Figure 5a? It does not appear to be consistent with the definition of 'between' biology as being a short pathway converting one metabolite to another, but perhaps I am misunderstanding.

I also found the description of the mechanism in Figure 4d a little confusing. isoleucine is described as 'downstream' of alanine (line 185), but the diagram in Figure 4d does not appear to show that.

For the discussion of the putative mechanism linking the PCCB variant with CAD: I think you need to be more careful about whether you are talking about correlation or causation when you talk about a 'pathogenic constellation of metabolite effects'. Many of the correlations between metabolites and CAD risk may not reflect the causal effects of those metabolites on CAD.

Figure 4a: what is the regression line? Did it adjust for sampling error for effects on both phenotypes?

Figure 4b: the position of the 'star' SLC36A2 I found a little ambiguous.

Figure 5 panels c and d: some of the confidence intervals go into negative territory. I don't think it's critical for the paper, but you could try alternative ways of constructing the confidence interval for a binomial proportion, such as a Wilson Score Interval, that should not have this issue.

Often point estimates are given along with P-values (e.g. for genetic correlations), but no confidence interval is given. Please also give confidence intervals.

Line 382: "The included participants are therefore meant to be representative of the 502,543 participants in the full cohort." How is this known?

Line 411: I think alanine is missing from the list

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Integrative analysis of metabolite GWAS illuminates the molecular basis of pleiotropy and genetic correlation" for further consideration by eLife. Your revised article has been evaluated by George Perry (Senior Editor) and a Reviewing Editor.

Please address the remaining comment from reviewer 1.

Reviewer #1 (Recommendations for the authors):

With respect to my comment about the definition of 'between' biology, I thank the authors for the inclusion of Figure 5 – Supplement 1, which is helpful in tying this to the example to Figure 4c. However, I'm still not convinced that the example given in Figure 5 – Supplement 1a is consistent with the definition in the main text: 'We consider biology "between" a given pair of metabolites as the shortest realistic biochemical path converting one to the other'. The example in Figure 5 – Supplement 1a appears to show two metabolites both being converted into something else, rather than one being converted into the other. In the language of directed acyclic graphs, Figure 5 Supplement 1a shows a 'collider' and Figure 5 Supplement 1b shows a 'chain', but the main text definition appears to only consider the 'chain' model. Some further clarification on this in the main text is warranted.

eLife. 2022 Sep 8;11:e79348. doi: 10.7554/eLife.79348.sa2

Author response


Essential revisions:

The reviewers all agreed that your article presented interesting and potentially important results about how to link GWAS results to biochemical mechanisms. However, the reviewers have raised issues with the presentation that need to be addressed before the manuscript can be accepted. Furthermore, please address reviewer #3's comment about colocalization, and reviewer #1's comment about the ancestry-inclusive GWAS.

Thank you for the feedback. We have now made these changes as well as the additional ones below. See responses to individual reviewer comments below and the document with tracked changes for more specifics.

Reviewer #1 (Recommendations for the authors):

I am quite confused as to where the ancestry inclusive analysis was used? It is mentioned in the introduction as identifying additional hits, but are these used anywhere in the main text results? I also found the details on the ancestry inclusive analysis in the methods lacking. Was the same GWAS procedure used as for the European ancestry GWAS? Did the results look reasonable in terms of QQ-plots, etc.?

Thank you for mentioning this. We have now made the following changes to improve the clarity around this analysis and answer your questions:

1. We have removed mention of the ancestry-inclusive analysis from the introduction and instead only discussed it during a dedicated paragraph in the main text results (lines 139-144). We made this change to make it clear that the majority of the analysis and results are from the European-ancestry only GWAS as these are less susceptible to confounding by population structure. However, we still ran and included the ancestry-inclusive GWAS to evaluate the utility of GWAS on diverse ancestries in discovering additional biologically important hits. In the main text we directly mention the number of additional hits identified by the ancestry-inclusive analysis and in a supplementary table (Supplementary File 5) we list the SNPs and their respective summary statistics from each of the GWAS for the associations at additional pathway-relevant genes.

2. We have now created a dedicated section in the methods called “Ancestry-Inclusive GWAS” to describe the details of the method (lines 544-552) and clarify that the method was identical to that of the European-ancestry only GWAS, other than the omission of a single filtering step (the step filtering individuals on the basis of self-identified race/ethnicity and ancestry PC outlier status).

3. We have now added an additional supplementary figure (Figure 3 —figure supplement 2) with QQ-plots for each of the 16 metabolite ancestry-inclusive GWAS. These results and the overlap in these summary statistics and hits with those from the European-ancestry only GWAS support that the ancestry-inclusive GWAS results are reasonable. Please note that for the ancestry-inclusive GWAS we opted to only focus on variant discovery and use the largest European ancestry for methods examining sub-threshold variants to minimize the risk of population structure confounding in those analyses (Meier et al. and Berg et al. height papers, eLife 2019).

For the model proposed in Figure 5a, this seems quite different from the example given in Figure 4c for the variant affecting PDPR. For the variant affecting PDPR, it appears that this discordant effect is generated due to the variant affecting the conversion of pyruvate into Acetyl-CoA, and the corresponding increase in conversion of alanine into pyruvate, and a decrease in the conversion of BCAAs into Acetyl-CoA. So the key here seems to be the level of something (Acetyl-CoA) that is downstream of both of the pair of metabolites. The model in Figure 5a, in contrast, appears to be implying that discordant effects are generated by variants affecting the pathway leading to the conversion of one metabolite into another. So is the PDPR example different from the model proposed in Figure 5a? It does not appear to be consistent with the definition of 'between' biology as being a short pathway converting one metabolite to another, but perhaps I am misunderstanding.

I also found the description of the mechanism in Figure 4d a little confusing. isoleucine is described as 'downstream' of alanine (line 185), but the diagram in Figure 4d does not appear to show that.

Thank you for your feedback on this. We agree that the “between” concept for more complicated examples like that in Figure 4c and 4d may be confusing to readers, especially those used to looking at gene regulatory networks where the concept of downstream/upstream means something different than that in pathways of chemical reactions. To address this, we have now added an additional supplementary figure (Figure 5 —figure supplement 1) with two additional example models. The first model is based on the metabolite pathway in Figure 4c and 4d and more explicitly defines the regions of between biology vs outside biology (Figure 5 —figure supplement 1a), as well as gives an example of a discordant (Figure 5 —figure supplement 1b) and concordant model (Figure 5 —figure supplement 1c). The second model is an example of a variant affecting a gene encoding a transporter and more explicitly defines the regions of between biology vs outside biology (Figure 5 —figure supplement 1d), as well as gives an example of a discordant (Figure 5 —figure supplement 1e) and concordant model (Figure 5 —figure supplement 1f) for that situation.

For the discussion of the putative mechanism linking the PCCB variant with CAD: I think you need to be more careful about whether you are talking about correlation or causation when you talk about a 'pathogenic constellation of metabolite effects'. Many of the correlations between metabolites and CAD risk may not reflect the causal effects of those metabolites on CAD.

This is an important point and definitely something we agree with. We have now modified the language in the “Using metabolites to understand the mechanism of a disease-associated variant” Results section to more clearly describe the analysis as a proof-of-concept analysis in terms of hypothesis generation and a new way of thinking about these kinds of data, as opposed to implying causality between the PCCB variant or relevant metabolites and CAD. We have also softened the language in the section, replacing many “would”s with “could”s. For example, the phrase you mentioned now reads “A potential mechanism that we hypothesized could result in this pathogenic constellation of metabolite effects is that the variant could decrease PCCB activity…”. We also added a sentence explicitly warning against causal conclusions being drawn: “… in vivo functional validation would be needed to draw causal conclusions about the effect of this variant on these metabolites and of these metabolites on CAD…”.

Figure 4a: what is the regression line? Did it adjust for sampling error for effects on both phenotypes?

Thank you for pointing this out. We have decided to remove the regression line from Figure 4a altogether because it may confuse readers and isn’t explicitly used in our results.

Figure 4b: the position of the 'star' SLC36A2 I found a little ambiguous.

Thank you for mentioning this, we have now modified the location of the star in Figure 4b to more clearly represent that it is a transporter of alanine.

Figure 5 panels c and d: some of the confidence intervals go into negative territory. I don't think it's critical for the paper, but you could try alternative ways of constructing the confidence interval for a binomial proportion, such as a Wilson Score Interval, that should not have this issue.

Thank you for the suggestion to use an alternative way of constructing the confidence interval for the data shown in Figure 5c and 5d. We have now updated the figure to show the 95% confidence interval calculated by Wilson Score Interval, which you were correct did not have the issue of going into negative values.

Often point estimates are given along with P-values (e.g. for genetic correlations), but no confidence interval is given. Please also give confidence intervals.

Thank you, we have now added SE for all the genetic correlation values and GWAS summary statistics, and 95% confidence intervals for the Fisher's exact tests and enrichment analyses mentioned in the text.

Line 382: "The included participants are therefore meant to be representative of the 502,543 participants in the full cohort." How is this known?

This is a good point that just because the participants were randomly selected to undergo metabolomic analysis, that does not mean the random sample will be representative for all relevant characteristics. We have now removed this sentence from the text. In case you are interested, a table comparing some basic characteristics of the ~120k randomly selected participants vs the full ~500k cohort is available in Nightingale’s recent medRxiv paper describing the measurements (Table 1 in https://www.medrxiv.org/content/10.1101/2022.06.13.22276332v2).

Line 411: I think alanine is missing from the list

Thank you for catching this. We have now added alanine to the list.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Please address the remaining comment from reviewer 1.

Reviewer #1 (Recommendations for the authors):

With respect to my comment about the definition of 'between' biology, I thank the authors for the inclusion of Figure 5 – Supplement 1, which is helpful in tying this to the example to Figure 4c. However, I'm still not convinced that the example given in Figure 5 – Supplement 1a is consistent with the definition in the main text: 'We consider biology "between" a given pair of metabolites as the shortest realistic biochemical path converting one to the other'. The example in Figure 5 – Supplement 1a appears to show two metabolites both being converted into something else, rather than one being converted into the other. In the language of directed acyclic graphs, Figure 5 Supplement 1a shows a 'collider' and Figure 5 Supplement 1b shows a 'chain', but the main text definition appears to only consider the 'chain' model. Some further clarification on this in the main text is warranted.

Thank you for this important feedback! We have now updated the definition in the main text (lines 201-205) to read “We consider biology "between" a given pair of metabolites as the shortest biochemical path connecting them, which can include a path converting one metabolite to the other, as well as other scenarios such as those involving colliders. This is because all biochemical reactions, including one-way reactions, can have bi-directional causal relationships due to Le Châtelier's Principle (see Methods for details; Supplementary Figure 5 —figure supplement 1).”

We also updated the expanded definition given in the methods section (lines 605-611) to “The ``between'' region for a given pair of metabolites was defined as the shortest realistic biochemical path connecting them, and any alternative paths of reasonably similar distance and likelihood. This region can include a path converting one metabolite to the other, as well as other scenarios such as those involving colliders. This is because all biochemical reactions, including one-way reactions, can have bi-directional causal relationships due to Le Châtelier's Principle. In other words, even in a one-way (irreversible) reaction, changes in product levels can induce changes in reactant levels in order to re-establish equilibrium.”

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Data Citations

    1. Smith C, Sinnott-Armstrong N, Cichonska A, Julkunen H, Fauman E, Wurtz P, Pritchard J. 2022. Pleiotropy of UK Biobank Metabolites [preliminary] Dryad Digital Repository. [DOI]

    Supplementary Materials

    Supplementary file 1. Metabolite HESS heritabilities.

    Heritability results from HESS for each metabolite.

    elife-79348-supp1.tsv (1.8KB, tsv)
    Supplementary file 2. Gene function sources.

    Sources for biochemical characterization of genes mentioned in the variant vignettes.

    elife-79348-supp2.xlsx (10.9KB, xlsx)
    Supplementary file 3. Metabolite genome-wide association studies (GWAS) hits annotation.

    Annotation for the 213 metabolite GWAS hits, including the assigned gene, assigned gene type, variant classification, and nearest gene.

    elife-79348-supp3.tsv (16.6KB, tsv)
    Supplementary file 4. Gene biochemical groups.

    The number of significant (p<1e-4) metabolite associations each metabolite genome-wide association studies (GWAS) gene had for each biochemical group. For a given gene, only biochemical groups that had at least one significant metabolite association were listed.

    elife-79348-supp4.tsv (6.5KB, tsv)
    Supplementary file 5. Ancestry-inclusive genome-wide association studies (GWAS) hits.

    List of additional metabolite GWAS hits from the ancestry-inclusive analysis that were not present in the European-only GWAS results.

    elife-79348-supp5.tsv (1.5KB, tsv)
    Supplementary file 6. Discordant variant annotation.

    List of each discordant variant–metabolite association, including the variant annotations and relevant metabolite pair genetic correlation and genome-wide association studies (GWAS) summary statistics.

    elife-79348-supp6.tsv (4.5KB, tsv)
    Supplementary file 7. Local genetic correlation results.

    Combined results for different methods of calculating the local genetic correlation for different pathways for alanine and glutamine, demonstrating the consistency across the different approaches.

    elife-79348-supp7.tsv (1.9KB, tsv)
    Supplementary file 8. Metabolite associations with coronary artery disease (CAD).

    Literature evidence and citations for metabolite associations with CAD.

    elife-79348-supp8.xlsx (10.9KB, xlsx)
    Supplementary file 9. PCCB genome-wide association studies (GWAS) results.

    GWAS summary statistics for rs61791721 (PCCB) in the 16 metabolites.

    elife-79348-supp9.tsv (1.7KB, tsv)
    MDAR checklist

    Data Availability Statement

    The source data and analyzed data have been deposited in Dryad at https://doi.org/10.5061/dryad.79cnp5hxs. Code are available at the github link (https://github.com/courtrun/Pleiotropy-of-UKB-Metabolites copy archived at swh:1:rev:bff4f4d8fb0562f222b1f73560b23ca9b8f57047). The raw individual level data are available through application to UK Biobank.

    The following dataset was generated:

    Smith C, Sinnott-Armstrong N, Cichonska A, Julkunen H, Fauman E, Wurtz P, Pritchard J. 2022. Pleiotropy of UK Biobank Metabolites [preliminary] Dryad Digital Repository.


    Articles from eLife are provided here courtesy of eLife Sciences Publications, Ltd

    RESOURCES