Abstract
Integrating molecular traits into genetic studies enhances our understanding of how DNA variation influences complex clinical and physiological phenotypes. In a recent article, Benson and colleagues apply this systems genetics approach with proteomics and metabolomics data in plasma from humans to identify and validate several previously unrecognized causal protein-to-metabolite associations.
Keywords: proteomics, metabolomics, systems genetics, Mendelian randomization
In the era of genome-wide association studies (GWAS), the integration of molecular traits obtained from transcriptomic, proteomic, and metabolomic analyses has emerged as a promising avenue for elucidating the mechanisms underlying genetic associations with complex clinical and physiological phenotypes [1]. Furthermore, the large amount of publicly available data from GWAS analyses has allowed causal inferences to be made between ‘omics traits’ and clinical phenotypes using Mendelian randomization (MR) approaches. In this regard, MR leverages Mendel’s laws of inheritance to treat DNA variants as instrumental variables that mimic the randomization of individuals in clinical trials to two “treatment” groups [2]. Thus, genetic variants that have been associated with biomarkers or intermediate traits of interest (exposures) are then tested for association with disease outcomes (ideally in independent datasets) to infer causal relationships. The best applications of MR have been with respect to classic cardiometabolic risk factors. For example, MR analyses have provided evidence that elevated blood pressure, LDL, and triglycerides are causal drivers of atherosclerosis, which is entirely consistent with the results of clinical trials that have shown targeting of these risk factors modulates disease risk [3].
In a recently published study in Cell Metabolism, Benson et al. used a four-step systems genetics strategy to identify and validate causal protein-to-metabolite associations [4] (Figure). The authors first applied targeted aptamer-based proteomics and mass spectrometry-based metabolomics analyses to profile the plasmas of 3,626 subjects from the Jackson Heart Study (JHS), the Multi-ethnic Study of Atherosclerosis (MESA), and the HERITAGE Family Study cohorts (Figure). Pairwise correlations between 1,302 proteins and 365 metabolites available in the three cohorts revealed nearly 172,000 significant associations (based on a false discovery rate-adjusted threshold) (Figure). The majority of these associations remained significant after adjustment for BMI, kidney function, and use of lipid-lowering, anti-hypertensive, and anti-diabetic medications, suggesting minimal effects of potential confounding factors. Furthermore, 535 proteins exhibited correlations that were enriched for several classes of metabolites, including lipids, amino acids, and carbohydrates. Several of these significant correlations also reflected well-known biological relationships, such as between apolipoprotein E (APOE) and lipid species. However, previously unrecognized associations were also identified, including those between lipids and cathepsin proteases and serpin peptidase inhibitors.
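The pairwise correlation screen with false discovery rate control described above can be sketched as follows. This is a minimal illustration on simulated data, not the authors' pipeline: the choice of Pearson correlation, the Fisher z approximation for p-values, the Benjamini-Hochberg procedure, and all names (`protein_0`, `metabolite_0`, and so on) are assumptions made for the example.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_p_value(r, n):
    """Two-sided p-value for r using the Fisher z normal approximation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Simulated (hypothetical) cohort: 3 proteins and 4 metabolites, all noise.
rng = random.Random(0)
n_subjects = 500
proteins = {
    f"protein_{j}": [rng.gauss(0, 1) for _ in range(n_subjects)]
    for j in range(3)
}
metabolites = {
    f"metabolite_{k}": [rng.gauss(0, 1) for _ in range(n_subjects)]
    for k in range(4)
}
# Plant one true protein-metabolite relationship so there is a signal to find.
metabolites["metabolite_0"] = [
    0.8 * level + rng.gauss(0, 1) for level in proteins["protein_0"]
]

# Test every protein-metabolite pair, then adjust across all tests.
pairs = [(p, m) for p in proteins for m in metabolites]
r_values = [pearson_r(proteins[p], metabolites[m]) for p, m in pairs]
p_values = [correlation_p_value(r, n_subjects) for r in r_values]
q_values = benjamini_hochberg(p_values)
results = dict(zip(pairs, q_values))

for (p, m), q in sorted(results.items(), key=lambda kv: kv[1])[:3]:
    print(p, m, f"q={q:.3g}")
```

With 1,302 proteins and 365 metabolites the same loop would cover roughly 475,000 pairs, which is why adjusting for multiple testing matters before any pair is called significant.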
Figure. Four-step systems genetics strategy used to identify and validate causal protein-to-metabolite associations.
Targeted proteomics and metabolomics were first carried out in plasma of 3,626 multi-ancestry subjects (Step 1). Pairwise correlations were calculated between the 1,302 proteins and 365 metabolites available and revealed nearly 172,000 significant associations (Step 2). Genetic determinants of protein levels identified by GWAS were used in Mendelian randomization (MR) analyses to provide evidence for 224 putative protein-to-metabolite causal associations (Step 3). The most significant predicted causal protein-metabolite associations were experimentally validated in vivo using knockout mouse models (Step 4).
Since correlations by themselves do not provide information regarding directionality, the authors next applied one-sample MR, in which genetic associations with the exposures (proteins) and outcomes (metabolites) are based on effect sizes derived from the same individuals. In addition, a fundamental assumption in MR analyses is that the genetic instrument is associated only with the exposure being tested (i.e., a protein) and does not exhibit pleiotropic associations with other traits (e.g., other proteins). To minimize this possibility, the authors focused their MR analyses on 547 of the 1,302 proteins for which cis variants could be used as instrumental variables. These analyses yielded 224 putative protein-to-metabolite causal associations between 52 proteins and 146 metabolites (Figure). Notable examples of such associations were between APOE and lipids and fat-soluble vitamins, PCSK9 and carnitine species, and CD36, a known receptor for long-chain fatty acids, and omega-3 and omega-6 polyunsaturated fatty acids. However, evidence was also obtained for causal associations between CD36 and other lipid species not previously associated with this receptor, including glycerophospholipids, acyl carnitines, sphingomyelins, ceramides, and steroids. Finally, the authors sought to experimentally validate associations predicted to be causal by MR using mouse models (Figure) and focused on APOE, CD36, and ACY1, three proteins that exhibited the strongest associations in the MR analyses. A comparison of the metabolome of mice deficient for Apoe, Cd36, or Acy1 with wildtype controls validated ~50% of the dozens of causal protein-to-metabolite associations with APOE, CD36, and ACY1 that were predicted by MR analyses in humans.
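To illustrate why a cis variant can recover a causal effect that a plain correlation cannot, the sketch below simulates a one-sample MR setting. It is an illustration under stated assumptions and not the estimator used by Benson et al.: the data are simulated, the effect sizes and allele frequency are arbitrary, and the ratio estimator shown (which coincides with two-stage least squares when there is a single instrument) is chosen for simplicity. All function and variable names are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def covariance(a, b):
    """Sample covariance between two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def simulate_cohort(n, true_effect, seed=1):
    """Simulate genotype, protein, and metabolite levels in one cohort.

    An unmeasured confounder raises both protein and metabolite levels,
    so the observational association overstates the causal effect.
    """
    rng = random.Random(seed)
    genotype, protein, metabolite = [], [], []
    for _ in range(n):
        # Allele dosage (0, 1, or 2) at a cis variant, allele frequency 0.3.
        g = int(rng.random() < 0.3) + int(rng.random() < 0.3)
        u = rng.gauss(0, 1)  # unmeasured confounder
        x = 0.5 * g + u + rng.gauss(0, 1)  # protein (exposure)
        y = true_effect * x + u + rng.gauss(0, 1)  # metabolite (outcome)
        genotype.append(g)
        protein.append(x)
        metabolite.append(y)
    return genotype, protein, metabolite


def ols_slope(x, y):
    """Observational regression slope of y on x."""
    return covariance(x, y) / covariance(x, x)


def mr_ratio_estimate(g, x, y):
    """Instrumental-variable estimate: cov(G, Y) / cov(G, X)."""
    return covariance(g, y) / covariance(g, x)


genotype, protein, metabolite = simulate_cohort(n=20000, true_effect=0.3)
observational = ols_slope(protein, metabolite)
causal = mr_ratio_estimate(genotype, protein, metabolite)
print(f"observational slope: {observational:.2f}")
print(f"MR ratio estimate:   {causal:.2f} (simulated true effect: 0.30)")
```

The estimate is only valid if the variant affects the metabolite solely through the protein, which is the no-pleiotropy assumption that motivates restricting instruments to cis variants.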
Many prior studies have evaluated causal associations between proteins or metabolites and clinical outcomes using systems genetics strategies (see refs. [5-8] for examples). However, Benson et al. integrated systems genetics with tests of causality and in vivo validation to systematically explore causal relationships between molecular traits themselves. In doing so, the authors highlight the power of systems genetics approaches for refining known protein-lipid, protein-amino acid, and protein-nucleic acid associations as well as revealing novel biological connections. For example, evidence was provided that CD36 and PCSK9 may have broader roles in the metabolism of lipids and carnitines, respectively, than previously appreciated. Other strengths of the study included using the same proteomics and metabolomics platforms for profiling of all subjects, who were of diverse ancestries and from different geographical locations. The large catalog of associations generated by this study is also publicly available through a Shiny app, thus providing many opportunities for others to explore the results and initiate new lines of investigation.
Alongside these strengths, there are several areas in which the study can be expanded upon. First, the use of targeted proteomics and metabolomics approaches would not identify other putative causal molecular relationships that likely exist and could be detected using broader, untargeted analyses. Second, a recent comparison revealed that cis protein quantitative trait loci (pQTL) were identified for only ~40% of proteins measured by aptamer-based technology, as was used by Benson et al., compared to ~70% of the proteins quantitated on the proximity extension assay proteomics platform [9]. This discrepancy was also part of the rationale for only utilizing cis variants in the MR analyses since proteins for which cis pQTLs were identified would be those whose levels were most accurately measured by aptamers. Third, MR analyses were not carried out in the opposite direction to identify causal metabolite-to-protein associations. However, such bidirectional MR analyses would not be as straightforward since the specificity of a genetic instrument for an exposure (a metabolite in this case) may not be as biologically evident as with a cis variant for a protein, where the link is more obvious. Finally, two-sample MR analysis is another widely used method for testing causality that reduces bias since genetic associations with the exposures and outcomes are estimated in different subjects (but similar populations). Although two-sample MR was not carried out by Benson et al., this approach could be applied as part of independent replication analyses using publicly available datasets [5-8].
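As a rough illustration of the two-sample approach mentioned above, a single-variant Wald ratio can be computed directly from summary statistics: the variant's effect on the protein from one dataset and its effect on the metabolite from another. The sketch below assumes the two estimates are independent and uses a second-order delta-method standard error with a normal approximation for the p-value; the input numbers are invented for the example and do not come from any of the cited studies.

```python
import math


def wald_ratio(beta_gx, se_gx, beta_gy, se_gy):
    """Single-instrument two-sample MR estimate from summary statistics.

    beta_gx, se_gx: variant-protein (exposure) effect and standard error.
    beta_gy, se_gy: variant-metabolite (outcome) effect and standard error.
    Assumes the two estimates come from non-overlapping samples.
    """
    estimate = beta_gy / beta_gx
    # Second-order delta-method variance of a ratio of independent estimates.
    variance = (se_gy ** 2) / beta_gx ** 2 \
        + (beta_gy ** 2) * (se_gx ** 2) / beta_gx ** 4
    se = math.sqrt(variance)
    z = estimate / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided, normal approx.
    return estimate, se, p_value


# Hypothetical summary statistics for one cis pQTL.
estimate, se, p_value = wald_ratio(
    beta_gx=0.40, se_gx=0.02,  # effect on protein level (dataset A)
    beta_gy=0.10, se_gy=0.03,  # effect on metabolite level (dataset B)
)
print(f"estimate={estimate:.3f}, se={se:.3f}, p={p_value:.2g}")
```

When several independent variants are available for a protein, their ratio estimates are typically combined (for example by inverse-variance weighting), which this single-variant sketch does not cover.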
Acknowledgements
Work in the authors’ laboratories is supported, in part, through NIH grants R01HL148110, R01HL168493, R01AG059690, R01HL144651, R01HL148577, R01DK117850, R44DK136405, and U54HL170326.
Declaration of Interests
The authors declare no competing interests.
References
1. Allayee H. et al. (2023) Systems genetics approaches for understanding complex traits with relevance for human disease. eLife 12, e91004.
2. Sanderson E. et al. (2022) Mendelian randomization. Nat Rev Methods Primers 2, 6.
3. Jansen H. et al. (2014) Mendelian randomization studies in coronary artery disease. Eur Heart J 35, 1917–1924.
4. Benson MD et al. (2023) Protein-metabolite association studies identify novel proteomic determinants of metabolite levels in human plasma. Cell Metab 35, 1646–1660.e3.
5. Yin X. et al. (2022) Genome-wide association studies of metabolites in Finnish men identify disease-relevant loci. Nat Commun 13, 1644.
6. Surendran P. et al. (2022) Rare and common genetic determinants of metabolic individuality and their effects on human health. Nat Med 28, 2321–2332.
7. Ferkingstad E. et al. (2021) Large-scale integration of the plasma proteome with genetics and disease. Nat Genet 53, 1712–1721.
8. Sun BB et al. (2023) Plasma proteomic associations with genetics and health in the UK Biobank. Nature 622, 329–338.
9. Eldjarn GH et al. (2023) Large-scale plasma proteomics comparisons through genetics and disease associations. Nature 622, 348–358.

