Abstract
Organisms maintain metabolic homeostasis through the combined functions of small molecule transporters and enzymes. While many of these metabolic components have been well characterized, a substantial number still lack identified physiological substrates. To bridge this gap, we leveraged large-scale plasma metabolome genome-wide association studies (GWAS) to develop a multiomic Gene-Metabolite Association Prediction (GeneMAP) discovery platform. GeneMAP generates accurate predictions and can even pinpoint genes that are distant from the variants implicated by GWAS. In particular, our analysis identified SLC25A48 as a genetic determinant of plasma choline levels. Mechanistically, SLC25A48 loss strongly impairs mitochondrial choline import and synthesis of its downstream metabolite, betaine. Integrative rare variant and polygenic score analyses in UK Biobank provide strong evidence that the causal effects of SLC25A48 on human disease may in part be mediated by the effects of choline. Altogether, our study provides a discovery platform for metabolic gene function and proposes SLC25A48 as a mitochondrial choline transporter.
Introduction
Metabolic reactions are central to life, playing critical roles in energy production, nutrient absorption, waste removal and biomass synthesis. Reflecting the importance of these processes, approximately 20% of protein-coding genes, including small molecule transporters and enzymes, are dedicated to maintaining the intracellular chemical landscape. While decades of research have revealed the functions of a substantial number of these genes, the exact molecular substrates of many metabolic components remain elusive1–8. These gaps arise partly from the diverse tissue-specific expression patterns, functional redundancy and substrate promiscuity of these proteins, which complicate efforts to define their precise physiological roles9. Dysfunction of metabolic genes is associated with a range of disorders, from congenital anomalies to neurodegeneration and cancer10–12. Recent large-scale Genome-Wide Association Studies (GWAS) of the human metabolome have revealed pervasive genetic influences underlying chemical individuality13–16. Although these studies have mostly focused on the mediating effects of metabolites on health outcomes16–18, such datasets also provide an opportunity to understand gene function. Yet, the main methodological challenge in using GWAS has been linking phenotype-associated genetic variants to the relevant genes19–21. To address this, we developed the Gene-Metabolite Association Prediction (GeneMAP) platform for discovery of metabolic gene function, which leverages genetic models of gene expression to quantify the gene-mediated genetic control of metabolites.
Results
Development of GeneMAP
To identify gene-metabolite relationships, we conducted Transcriptome-Wide Association Studies (TWAS)22,23 in two independent genomic studies of the human metabolome, CLSA15 and METSIM14 (Fig. 1a, Methods). The analysis yielded 526,934,749 gene-metabolite entries (26,956,587 unique gene-metabolite pairs) across all expression models shared between the two datasets (Fig. 1b). We then computed the replicability statistic for each tissue expression model separately24,25. Remarkably, the identified gene-metabolite associations were highly replicable, with a minimum replicability statistic > 0.8 across models (Fig. 1c). In agreement with the larger sample size of CLSA (n = 8,299)15, replicability was greater when CLSA was used as the discovery dataset and METSIM as the validation dataset (Extended Data Fig. 1a). Using CLSA as the discovery dataset, we identified 102,058 significant gene-metabolite associations (q-value < 0.05) across all expression models, corresponding to 12,041 unique pairs (Fig. 1d). Only 1,768 of these 12,041 pairs were identified by a multi-SNP gene-based assignment that leverages GCTA-COJO (conditional and joint) analysis in the CLSA study (Extended Data Fig. 1b-c)15. We next conducted Mendelian randomization to assess the causal effect of gene expression on metabolite levels, which showed an enrichment for significant causal effects (Fig. 1e). Similarly, systematic colocalization (HyPrColoc) showed an enrichment for high posterior probabilities of shared causal variants among the gene-metabolite pairs (Extended Data Fig. 1d-e). Of note, replicability correlated only weakly with the GTEx tissue sample size23 used for model training (Extended Data Fig. 1f). In contrast, the number of genes in each model and the number of significant associations were well correlated with tissue sample size23, underscoring the importance of large-scale datasets for detection power (Extended Data Fig. 1g-h).
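For readers less familiar with summary-statistic TWAS, the gene-level statistic computed by S-PrediXcan can be written as z_g ≈ Σ_l w_lg (σ_l / σ_g)(β_l / se(β_l)), where w_lg are the expression-model weights, σ_l is the genotype standard deviation of SNP l, and σ_g² = wᵀΓw is the variance of predicted expression under an LD reference covariance Γ. The plain-Python sketch below implements only this formula; the function name and inputs are hypothetical, and it omits the allele harmonization and missing-SNP handling performed by the actual MetaXcan software.

```python
import math
from typing import Sequence


def spredixcan_z(
    weights: Sequence[float],
    gwas_z: Sequence[float],
    snp_cov: Sequence[Sequence[float]],
) -> float:
    """Approximate gene-level TWAS z-score from GWAS summary statistics (sketch).

    weights: expression-model weights w_l for the SNPs in one gene model
    gwas_z:  GWAS z-scores (beta_l / se_l) of the same SNPs for one metabolite
    snp_cov: genotype covariance (LD reference) matrix among those SNPs
    """
    n = len(weights)
    # Variance of genetically predicted expression: sigma_g^2 = w^T * Gamma * w
    var_g = sum(weights[i] * snp_cov[i][j] * weights[j]
                for i in range(n) for j in range(n))
    sigma_g = math.sqrt(var_g)
    # z_g = sum_k w_k * (sigma_k / sigma_g) * z_k, with sigma_k^2 = Gamma_kk
    return sum(weights[k] * math.sqrt(snp_cov[k][k]) / sigma_g * gwas_z[k]
               for k in range(n))
```

For two independent SNPs with unit genotype variance, unit weights and GWAS z-scores of 2, `spredixcan_z([1.0, 1.0], [2.0, 2.0], [[1.0, 0.0], [0.0, 1.0]])` gives 2√2 ≈ 2.83: the two consistent SNP signals reinforce each other at the gene level.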
A proportion of gene-metabolite pairs were significant in only a small number of tissue models, highlighting the utility of multi-tissue model analysis (Extended Data Fig. 1i). The distribution of metabolite classes among the significant associations was consistent with their representation among the profiled compounds (Fig. 1f)14. Similar to the CLSA single-SNP analysis, in which genomic loci displayed a low degree of pleiotropy, most genes were associated with few metabolites (Extended Data Fig. 1j)15.
Figure 1. Summary and characterization of the gene-metabolite associations.
a. Schematic for generating GeneMAP gene-metabolite associations.
b. Summary of the generated gene-metabolite entries across 49 JTI expression models from the blood metabolite GWAS summary statistics in CLSA, METSIM, and the common gene-metabolite entries between the two databases (overlap).
c. Bar graph displaying the replicability measure across JTI expression models when CLSA was used as the discovery dataset and METSIM as the validation dataset.
d. Pipeline for identification of significant gene-metabolite associations.
e. Histogram displaying the distribution of -log10(p-values) from the MR-Egger analysis of significant gene-metabolite pairs. P values (uncorrected) are from the MR-Egger analysis; P values < 2.2e-16 were set to 2.2e-16.
f. Pie chart showing categorization of the gene-metabolite associations based on the metabolite class.
g. Pie chart showing classification of the gene-metabolite associations based on gene subset (metabolic or other).
h. Boxplot showing proportion of significant gene-metabolite pairs among all and metabolic genes across JTI expression models. Lines connect the corresponding JTI expression model. Center line is the median, bounds of the box represent the interquartile range (IQR), all the individual datapoints representing each JTI expression model are plotted, and whiskers correspond to the values less extreme than 1.5xIQR from the box boundaries.
i. Representative QQ plot (whole blood JTI model) displaying the observed and expected distributions of -log10(METSIM p-value) for gene-metabolite entries involving metabolic genes that pass the threshold p < 0.05 in CLSA. The expected distribution was calculated from all considered entries. The gray line is y = x; the orange dashed line is the Bonferroni threshold based on gene-metabolite associations with all genes. P values (uncorrected) are from the TWAS METSIM analysis.
j. Violin plot showing the distribution of the average AlphaMissense pathogenicity score for the genes in the non-metabolic protein coding (gray) and metabolic protein-coding genes (yellow) groups. Statistical significance was determined by two-tailed unpaired t test (h, j).
We next asked whether metabolic genes, a subset largely composed of small molecule transporters and enzymes, are strong determinants of blood chemical composition. Indeed, metabolic genes were two-fold enriched among the significant associations relative to all entries in every tissue model (Fig. 1g-h). The p-value distribution of the associations involving metabolic genes supports this enrichment (Fig. 1i). Interestingly, AlphaMissense analysis, which uses structural information from AlphaFold26 to achieve state-of-the-art performance in classifying pathogenic variants, showed that the average predicted pathogenicity of missense variants was significantly higher in metabolic genes than in non-metabolic genes (Fig. 1j)27. In line with this observation, CADD analysis showed that the most deleterious variants in metabolic genes were more pathogenic than those of the genome background and of non-metabolic genes (Extended Data Fig. 1k)28, consistent with the critical role of metabolic genes in cellular and organismal processes. Altogether, the GeneMAP platform generated highly replicable and functionally informative results with the potential to uncover new biology.
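The two-fold enrichment can be read as a ratio of proportions within one tissue model: the fraction of metabolic-gene entries that are significant divided by the fraction of all entries that are significant (by Bayes' rule this equals the share of metabolic genes among significant entries divided by their share among all entries). A minimal sketch, with a hypothetical input layout of one flag pair per entry:

```python
from typing import Iterable, Tuple


def metabolic_fold_enrichment(entries: Iterable[Tuple[bool, bool]]) -> float:
    """Fold enrichment of significant associations among metabolic-gene entries.

    entries: one (is_metabolic_gene, is_significant) pair per gene-metabolite
             entry in a single tissue model (hypothetical layout).
    Returns P(significant | metabolic gene) / P(significant | any gene).
    """
    entries = list(entries)
    metabolic = [sig for is_met, sig in entries if is_met]
    frac_sig_metabolic = sum(metabolic) / len(metabolic)
    frac_sig_all = sum(sig for _, sig in entries) / len(entries)
    return frac_sig_metabolic / frac_sig_all
```

With 100 entries, 10 of them metabolic (2 significant) and 8 further significant non-metabolic entries, 20% of metabolic entries versus 10% of all entries are significant, a two-fold enrichment.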
Analysis of genetically-determined metabolic networks
Given the high replicability of the identified gene-metabolite associations, we next investigated biologically relevant higher-order structures, namely Genetically Determined Metabolic Networks (GDMNs). As network nodes, we used the 415 metabolites present among the significant associations with metabolic genes in whole blood in the discovery dataset. Our approach differs from conventional single-SNP GWAS methods in its focus on genetically determined molecular traits. To build the networks, we computed metabolite-metabolite Pearson correlations and the corresponding empirical p-values from the effect sizes (β-values) of the selected gene-metabolite pairs (Fig. 2a). The correlations calculated from the CLSA and METSIM GeneMAP results were highly replicable (Fig. 2b). Furthermore, pairs with empirical Bonferroni-adjusted p-value < 0.05 in CLSA were enriched in the significant part of the p-value distribution in METSIM (Fig. 2c).
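A minimal sketch of the edge statistic, assuming each metabolite is represented by the vector of its GeneMAP effect sizes (β) over a common, ordered set of genes, and that the empirical p-value comes from permuting the gene labels of one vector. The function names and this permutation scheme are illustrative assumptions rather than the published pipeline.

```python
import math
import random
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def empirical_pvalue(beta_a: Sequence[float], beta_b: Sequence[float],
                     n_perm: int = 10_000, seed: int = 0) -> float:
    """Two-sided permutation p-value for the correlation between two
    metabolites' gene-level effect sizes (same gene order in both vectors)."""
    rng = random.Random(seed)
    observed = abs(pearson(beta_a, beta_b))
    shuffled = list(beta_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks the gene-wise pairing of effect sizes
        if abs(pearson(beta_a, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction: p is never exactly 0
```

One practical consequence of this per-pair scheme: if the Bonferroni correction is applied over all 85,905 pairs of the 415 metabolites, an edge needs p < about 5.8e-7, which an add-one permutation p-value can only reach with more than about 1.7 million permutations per pair.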
Figure 2. Genetically Determined Metabolic Networks (GDMNs) are replicable and interpretable.
a. Schematic for the GDMNs construction pipeline for the whole blood JTI gene-metabolite associations.
b. Comparison of Pearson correlations for the metabolite-metabolite pairs computed from the METSIM and CLSA datasets. The density of the points is represented, the yellow line is the linear regression line.
c. Distribution of the empirical p-values in the validation (METSIM) dataset for all metabolite-metabolite pairs (gray) and for pairs significant in the discovery dataset (Bonferroni-adjusted p-value < 0.05, yellow). P values (uncorrected) are from the permutation test for each metabolite pair (Methods).
d. Comparison of the nodes and edges in GDMNs built from the CLSA and METSIM datasets. P values (Bonferroni-corrected) are from the permutation test.
e. Correlation of the node degrees in GDMNs built from the CLSA and METSIM datasets. The yellow line is the linear regression line with the gray band corresponding to the confidence interval.
f. Overlap between the Louvain communities in GDMNs from the CLSA and METSIM datasets.
g. Schematic for building the consensus GDMN.
h. Consensus GDMN with the Louvain community analysis.
i. Louvain community annotated as bilirubin from the consensus GDMN.
j. Metabolite-metabolite pairs involving the uncharacterized compound X-21796. Data are plotted as ranks vs. Pearson correlations for the given metabolite-metabolite pair computed from the discovery dataset (CLSA).
k. Louvain community annotated as glucuronides from the consensus GDMN.
l. Metabolite-metabolite pairs involving the uncharacterized compound X-11444. Data are plotted as ranks vs. Pearson correlations for the given metabolite-metabolite pair computed from discovery (CLSA).
m. Louvain community annotated as sulfate steroids from the consensus GDMN.
n. Metabolite-metabolite pairs involving the uncharacterized compound X-24947. Data are plotted as ranks vs. Pearson correlations for the given metabolite-metabolite pair computed from discovery (CLSA).
To construct the edges of the GDMNs, we filtered the metabolite-metabolite entries using an empirical Bonferroni-corrected p-value < 0.05. The GDMNs built from CLSA and METSIM resembled each other, sharing almost all nodes and nearly half of the links (Extended Data Fig. 2a-b, Fig. 2d). Both local (e.g., node degrees) and global (e.g., Louvain communities) network properties were also conserved (Fig. 2e-f). To generate a consensus network from the two independent GDMNs, we retained edges with Bonferroni-adjusted empirical p-value < 0.05 in both the discovery and validation datasets. This consensus GDMN contained 300 nodes and 1,825 edges (Fig. 2g) with biochemically interpretable Louvain communities (Fig. 2h). Like other biological networks, the GDMN was scale-free, based on the power-law distribution of node degrees (Extended Data Fig. 2c)29. Moreover, it was robust to random perturbations: mean graph distance increased more slowly when an arbitrary node was removed than when a hub was removed (Extended Data Fig. 2d). This feature is in accordance with the properties of a previously reported metabolic network, which were speculated to underlie error tolerance to uniformly distributed mutations throughout evolution29. Remarkably, the GDMN even allowed us to infer the biochemical identity of some uncharacterized metabolites. For example, one of the five nodes of the bilirubin Louvain community was the uncharacterized metabolite X-21796, suggesting a potential connection to bilirubins (Fig. 2i). Consistent with this hypothesis, the top metabolite associations of X-21796 include bilirubin and biliverdin species, implying that it is likely a bilirubin derivative (Fig. 2j). Similarly, we predict that X-11444 and X-24947 are related to glucuronide and sulfate steroid species, respectively (Fig. 2k-n).
Overall, the analysis of the complex structures of the GDMNs further demonstrates the high quality and replicability of the GeneMAP resource.
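The consensus filter and the node-removal robustness check described above can be sketched with a breadth-first search over an adjacency dictionary. The function names are hypothetical; the sketch removes a single node, whereas a robustness curve such as Extended Data Fig. 2d would track mean distance over successive removals, and unreachable node pairs are simply ignored when averaging.

```python
from collections import deque
from typing import Dict, FrozenSet, Set

Edge = FrozenSet[str]


def consensus_edges(p_disc: Dict[Edge, float], p_val: Dict[Edge, float],
                    n_tests: int, alpha: float = 0.05) -> Set[Edge]:
    """Keep metabolite pairs with Bonferroni-adjusted p < alpha in both datasets."""
    return {e for e, p in p_disc.items()
            if p * n_tests < alpha and e in p_val and p_val[e] * n_tests < alpha}


def adjacency(edges: Set[Edge]) -> Dict[str, Set[str]]:
    adj: Dict[str, Set[str]] = {}
    for e in edges:
        a, b = tuple(e)
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def mean_distance(adj: Dict[str, Set[str]]) -> float:
    """Mean shortest-path length over reachable ordered node pairs (BFS)."""
    total = count = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        count += len(dist) - 1  # pairs reachable from src, excluding src itself
    return total / count if count else float("inf")


def without(adj: Dict[str, Set[str]], node: str) -> Dict[str, Set[str]]:
    """Copy of the graph with one node and its edges removed."""
    return {u: nbrs - {node} for u, nbrs in adj.items() if u != node}
```

On a wheel graph (one hub joined to every node of a 5-cycle), removing the hub raises the mean distance from 4/3 to 1.5, whereas removing a rim node lowers it to 1.3, the qualitative pattern expected of a hub-dominated, error-tolerant network.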
We next compared the GDMN with a network generated from individual-level metabolite data. Overall, the GDMN and this metabolite-derived network (MDN) were highly concordant. Edges in the GDMN were enriched for stronger (more negative or more positive) Pearson correlations than edges in the MDN, suggesting higher detection power (Extended Data Fig. 2e). The concordance between the GDMN and the MDN was greater than expected by chance based on permutation analysis (Extended Data Fig. 2f). Permutation-derived MDNs were generated with the same number of nodes and edges as the GDMN (selecting the same number of top associations), followed by Louvain community detection (Extended Data Fig. 2g). Node degrees correlated significantly more strongly, and Louvain communities overlapped more, between the GDMN and the actual MDN than between the GDMN and permutation-derived MDNs (Extended Data Fig. 2h-i).
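One of these concordance checks, the node-degree correlation between the GDMN and a size-matched MDN, can be sketched as below. Two details are assumptions of this sketch: the MDN keeps the k metabolite pairs with the largest absolute individual-level correlation (k being the number of GDMN edges), and the null networks are random wirings of k edges among the same nodes, which may differ from the permutation scheme actually used. Louvain community overlap is not covered, and all names are hypothetical.

```python
import math
import random
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

Edge = FrozenSet[str]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined for constant vectors; treated as no correlation
    return sxy / math.sqrt(sxx * syy)


def top_k_edges(corr: Dict[Edge, float], k: int) -> Set[Edge]:
    """MDN edges: the k metabolite pairs with the largest |r|."""
    return set(sorted(corr, key=lambda e: abs(corr[e]), reverse=True)[:k])


def degree_vector(edges: Set[Edge], nodes: List[str]) -> List[int]:
    deg = {n: 0 for n in nodes}
    for e in edges:
        for n in e:
            deg[n] += 1
    return [deg[n] for n in nodes]


def degree_concordance(gdmn: Set[Edge], corr: Dict[Edge, float],
                       n_perm: int = 200, seed: int = 0) -> Tuple[float, float]:
    """Degree correlation of the GDMN with a size-matched MDN, plus a
    permutation p-value against randomly wired networks of the same size."""
    nodes = sorted({n for e in gdmn for n in e})
    node_set = set(nodes)
    corr = {e: r for e, r in corr.items() if e <= node_set}  # same node set
    k = len(gdmn)
    gd = degree_vector(gdmn, nodes)
    observed = pearson(gd, degree_vector(top_k_edges(corr, k), nodes))
    all_pairs = [frozenset(p) for p in combinations(nodes, 2)]
    rng = random.Random(seed)
    null = [pearson(gd, degree_vector(set(rng.sample(all_pairs, k)), nodes))
            for _ in range(n_perm)]
    p = (sum(r >= observed for r in null) + 1) / (n_perm + 1)
    return observed, p
```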
SLC25A48 is a genetic determinant of plasma choline
Given GeneMAP’s high replicability, we next asked whether our approach can both validate known gene-metabolite associations and discover new ones. To enrich for metabolite determinants, we prioritized 2,245 of the 12,041 gene-metabolite pairs that involved metabolic genes and were significant in at least two expression models (Fig. 3a). For most metabolites, these associations included only one gene hit per 10 Mb locus (Extended Data Fig. 3a). We then tested whether these associations recovered known metabolite-associated loci (mQTLs) from conventional single-SNP analysis and previously proposed effector genes. This relatively small subset of prioritized GeneMAP findings covered 70% of the CLSA mQTLs (< 0.5 Mb) and 117 of 145 effector genes (Fig. 3b-c)15. Interestingly, 564 of the 2,245 prioritized gene-metabolite pairs were distal (> 0.5 Mb; see Extended Data Fig. 3b for the distribution of distances) from the CLSA-reported mQTLs (Fig. 3d)15. Thus, our approach not only recovers significant genetic signals mapped to effector genes but also identifies associations that single-SNP methods are relatively underpowered to detect.
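The proximal/distal split can be sketched as a nearest-neighbour distance on sorted mQTL positions. Using a single coordinate per gene (for example its transcription start site or midpoint) is an assumption of this sketch, as is treating a distance of exactly 0.5 Mb as distal; the helper names and input layout are hypothetical.

```python
from bisect import bisect_left
from typing import List, Tuple


def nearest_mqtl_distance(gene_chrom: str, gene_pos: int,
                          mqtls: List[Tuple[str, int]]) -> float:
    """Distance (bp) from a gene coordinate to the closest mQTL on the same
    chromosome; mqtls are (chrom, pos) pairs reported for one metabolite."""
    positions = sorted(pos for chrom, pos in mqtls if chrom == gene_chrom)
    if not positions:
        return float("inf")  # no mQTL for this metabolite on this chromosome
    i = bisect_left(positions, gene_pos)
    # Only the neighbours on either side of the insertion point can be closest.
    neighbours = positions[max(0, i - 1):i + 1]
    return min(abs(p - gene_pos) for p in neighbours)


def classify_pair(gene_chrom: str, gene_pos: int,
                  mqtls: List[Tuple[str, int]], window: int = 500_000) -> str:
    """Label a gene-metabolite pair as proximal (< window) or distal."""
    d = nearest_mqtl_distance(gene_chrom, gene_pos, mqtls)
    return "proximal" if d < window else "distal"
```

Sorting once per metabolite and using binary search keeps the lookup logarithmic in the number of mQTLs, which matters when thousands of gene-metabolite pairs are classified.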
Figure 3. GeneMAP identifies SLC25A48 as a genetic determinant of blood choline levels.
a. Schematic showing filtering and prioritization of the gene-metabolite associations.
b. Pie chart displaying the proportion of SNPs identified as causal in the CLSA single-SNP analysis (Chen et al.) that lie proximal (< 0.5 Mb) to a prioritized GeneMAP gene-metabolite association for the corresponding metabolite. Only metabolites measured in both CLSA and METSIM were considered. Yellow: identified by GeneMAP; gray: not identified.
c. Pie chart displaying the proportion of effector gene-metabolite pairs from the CLSA single-SNP analysis (Chen et al.) that were identified by GeneMAP. Yellow: identified by GeneMAP; gray: not identified. Only metabolites measured in both CLSA and METSIM were considered.
d. Pie chart displaying the proportion of the 2,245 prioritized GeneMAP gene-metabolite pairs that are proximal (yellow) or distal (gray) to the SNPs identified for the corresponding metabolite in CLSA.
e. Annotation of the GeneMAP gene-metabolite associations proximal (< 0.5 Mb) and distal (> 0.5 Mb) to the significant SNPs identified in CLSA single-SNP analysis.
f. GeneMAP undescribed associations distal from the SNPs identified in CLSA single-SNP analysis. Data are plotted as ranks vs. -log10(p-value) for the given gene-metabolite association in Validation (METSIM). P values (uncorrected) are from the TWAS analysis on METSIM.
g. LocusZoom plot for METSIM choline association signals in SLC25A48 region. The sentinel SNP rs56382048 is shown as a purple diamond. P values are from the METSIM study GWAS summary statistics.
To further assess the quality of the GeneMAP gene-metabolite pairs, we used literature curation and several genomics databases (Supplementary Table 1)14–16,30,31. Among the 1,681 proximal associations, we found supporting evidence for 720 (termed “Annotated”) in addition to the 117 effector-metabolite entries identified in the single-SNP analysis of CLSA15 (Fig. 3e). For 128 pairs, denoted “Unknown” (Methods) and potentially pointing to new biology, no causal gene could be nominated in the nearby genomic region (Fig. 3e). The remaining pairs (“Miscellaneous”) included entries with uncharacterized or partially characterized metabolites, or with nominated causal genes nearby (Fig. 3e). Strikingly, we found supporting evidence for 252 of the 564 pairs distal from the mQTL associations15 (Fig. 3e). GeneMAP therefore captures current knowledge of gene-metabolite pairs, including pairs missed by the single-SNP (CLSA) analysis. To further demonstrate the validity of our approach, we integrated the proposed causal genes from the GeneMAP associations into the GDMN (Extended Data Fig. 3c). For 1,065 of the 1,825 metabolite-metabolite links, we could identify a causal gene associated with both metabolites in the pair. The resulting network contained 2,510 gene-mediated metabolite connections within biochemically interpretable Louvain communities (Extended Data Fig. 3d). While most links represent known associations, nearly 20% point to new gene-mediated mechanisms underlying the metabolite-metabolite connections (Extended Data Fig. 3e).
To select a gene-metabolite pair for experimental validation, we focused on the “Unknown” entries and sought functionally informative pairs among the distal associations. We therefore ranked the GeneMAP associations distal from the previously reported (CLSA) mQTLs15 by their minimum validation (METSIM) p-value across all models. This ranking yielded the SLC25A48-choline pair as the top GeneMAP hit among the “Unknown” distal associations (Fig. 3f). No variant within the SLC25A48 locus, or anywhere on the same chromosome, reached the genome-wide significance threshold for choline in the CLSA dataset (Extended Data Fig. 3f). Consistent with this, the recent GCKD study found genome-wide, though not study-wide, significant SNP associations with plasma choline at this locus (Extended Data Fig. 3g)13. While the METSIM dataset contained two significant mQTLs proposed through Bayesian fine mapping, the top-scoring variant is extremely rare in non-Finnish European populations (Fig. 3g and Supplementary Table 2)14, and the second variant is intronic to SLC25A48 and is an eQTL for multiple target genes32. Using Mendelian randomization (MR-Egger regression and weighted median), we find evidence for a causal effect of SLC25A48 on choline (MR-Egger p < 10^-100; weighted median p = 0.001; Extended Data Fig. 3h-i). Altogether, these findings are consistent with greater detection power of GeneMAP for identifying previously uncharacterized gene-metabolite associations, including distal entries (Extended Data Fig. 4a-b).
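The two Mendelian randomization estimators named above can be sketched from per-SNP summary statistics. MR-Egger fits an inverse-variance-weighted regression of SNP-outcome effects on SNP-exposure effects with a free intercept, after orienting SNPs so that exposure effects are positive; the weighted median estimator takes the median of per-SNP ratio estimates weighted by their first-order inverse variances, interpolating between neighbouring estimates. Here the exposure would be SLC25A48 expression and the outcome choline, but the inputs and function names are hypothetical, and only point estimates are returned (no standard errors or p-values).

```python
from typing import Sequence, Tuple


def mr_egger(bx: Sequence[float], by: Sequence[float],
             se_y: Sequence[float]) -> Tuple[float, float]:
    """Return (causal slope, pleiotropy intercept) from weighted least squares."""
    # Orient each SNP so that its effect on the exposure is positive.
    x = [abs(b) for b in bx]
    y = [b_y if b_x >= 0 else -b_y for b_x, b_y in zip(bx, by)]
    w = [1.0 / s ** 2 for s in se_y]  # inverse-variance weights
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def weighted_median(bx: Sequence[float], by: Sequence[float],
                    se_y: Sequence[float]) -> float:
    """Weighted median of per-SNP ratio estimates by_j / bx_j."""
    ratios = [b_y / b_x for b_x, b_y in zip(bx, by)]
    # First-order inverse variance of each ratio estimate: bx^2 / se_y^2
    weights = [b_x ** 2 / s ** 2 for b_x, s in zip(bx, se_y)]
    order = sorted(range(len(ratios)), key=lambda j: ratios[j])
    b = [ratios[j] for j in order]
    total = sum(weights)
    w = [weights[j] / total for j in order]
    s, cum = [], 0.0
    for wj in w:
        cum += wj
        s.append(cum - wj / 2)  # standardized cumulative weight at each estimate
    for j in range(len(b) - 1):
        if s[j] < 0.5 <= s[j + 1]:
            return b[j] + (b[j + 1] - b[j]) * (0.5 - s[j]) / (s[j + 1] - s[j])
    return b[0] if s[0] >= 0.5 else b[-1]
```

The Egger intercept absorbs directional pleiotropy that would otherwise bias the slope, while the weighted median stays consistent as long as SNPs carrying at least half of the weight are valid instruments, which is why the two are often reported together.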
SLC25A48 is necessary for mitochondrial choline import
To test our prediction and uncover the causal links behind the gene-metabolite relationships, we focused on the top-scoring association between SLC25A48 and choline. Choline is involved in a wide range of physiological processes, including neurotransmission, methyl-group metabolism and lipid metabolism33. In most cells, it has two major metabolic fates. In the cytosol, choline kinase alpha (CHKA) phosphorylates free choline to form phosphocholine, a precursor of phosphatidylcholine, which is a major component of cell membranes34,35. Alternatively, free choline can enter mitochondria, where a two-step oxidation produces betaine, a key metabolite in one-carbon metabolism and an osmolyte34,36–38. Given that SLC25A48 is a member of the SLC25A family of mitochondrial small molecule transporters, we hypothesized that SLC25A48 may regulate the availability of choline or its downstream metabolites in mitochondria.
To begin studying the function of SLC25A48 in cellular metabolism, we first asked whether loss of SLC25A48 changes the abundance of mitochondrial metabolites. We therefore generated SLC25A48-knockout HEK293T cells as well as knockout cells complemented with SLC25A48 cDNA (Extended Data Fig. 5a-c). These cells also express the 3xHA-OMP25 Mito-Tag, which enables immunopurification of mitochondria for metabolic profiling39. Consistent with its predicted subcellular localization40, immunofluorescence confirmed that SLC25A48 localizes to mitochondria (Fig. 4a). Using these engineered cells, we performed liquid chromatography-mass spectrometry metabolite profiling of immunopurified mitochondria (Fig. 4b, Extended Data Fig. 5d). While the abundance of most metabolites was similar, we observed a 4-fold drop in betaine and a ~50-fold depletion of choline in mitochondria from SLC25A48-knockout cells compared with cDNA-complemented controls (Fig. 4c). Notably, SLC25A48 loss did not alter cellular phosphocholine levels, suggesting that it does not affect cytosolic choline metabolism or choline uptake into cells (Extended Data Fig. 5e). To test formally whether SLC25A48 is specifically involved in the production of betaine but not phosphocholine, we conducted whole-cell [1,2-13C2]Choline isotope tracing (Fig. 4d). Indeed, HEK293T cells lacking SLC25A48 showed significantly reduced incorporation of isotope-labeled choline into betaine, but not into phosphocholine, in two independent single-cell clones and relative to the parental cell line (Fig. 4e-f, Extended Data Fig. 5f-i). Altogether, these results suggest that SLC25A48 is a determinant of de novo betaine synthesis from choline.
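The Fig. 4c volcano plot applies two-tailed unpaired t tests followed by the Benjamini, Krieger and Yekutieli correction. To illustrate what that correction does, the sketch below implements their two-stage linear step-up rule: a first Benjamini-Hochberg pass at level q/(1+q) estimates the number of true null hypotheses, and a second pass uses a correspondingly relaxed level. It returns rejection decisions only, not the adjusted q-values plotted in the figure, and the function names are hypothetical.

```python
from typing import List, Set


def bh_reject(pvals: List[float], q: float) -> Set[int]:
    """Benjamini-Hochberg linear step-up: indices rejected at FDR level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k = rank  # largest rank that passes its step-up threshold
    return set(order[:k])


def bky_two_stage(pvals: List[float], q: float = 0.05) -> Set[int]:
    """Two-stage adaptive step-up procedure (Benjamini, Krieger, Yekutieli)."""
    m = len(pvals)
    q1 = q / (1 + q)
    r1 = len(bh_reject(pvals, q1))  # stage 1 estimates how many nulls are false
    if r1 == 0:
        return set()
    if r1 == m:
        return set(range(m))
    m0_hat = m - r1  # estimated number of true null hypotheses
    return bh_reject(pvals, q1 * m / m0_hat)
```

Because the second stage divides by the estimated number of true nulls rather than by all tests, the adaptive rule can reject more hypotheses than plain Benjamini-Hochberg when many metabolites truly change: for p-values 0.01, 0.02, 0.03 and 0.15, BH at 0.05 rejects the first three, whereas the two-stage rule rejects all four.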
Figure 4. SLC25A48 loss impacts mitochondrial choline homeostasis.
a. Immunofluorescence analysis of SLC25A48 (Flag, green), the 3xHA-OMP25 Mito-Tag (HA, red) and nuclei (DAPI, blue) in HEK293T cells. Scale bar, 50 μm.
b. Schematic for polar metabolomic profiling of immunopurified mitochondria from HEK293T SLC25A48-knockout cells expressing an empty vector control or 3xFLAG-SLC25A48 cDNA.
c. Volcano plot of -log10(q-value) vs. log2 fold change in metabolite abundance, normalized to isotope-labeled amino acid standards (ISTDs) and NAD levels, in immunopurified mitochondria from HEK293T SLC25A48-knockout cells expressing a vector control or SLC25A48 cDNA. The dotted line marks the significance threshold (q-value < 0.05). Two-tailed unpaired t tests followed by the Benjamini, Krieger and Yekutieli multiple-testing correction were performed; n = 3 biological replicates.
d. Schematic for the whole cell [1,2-13C2]Choline tracing.
e. Betaine [M+2] abundance in HEK293T SLC25A48-knockout cells expressing an empty vector control or 3xFLAG-SLC25A48 cDNA after incubation with [1,2-13C2]Choline for the indicated time points. Data are individual points normalized to ISTDs and the number of cells seeded; n = 3 biological replicates. Lines correspond to the mean at each time point. Statistical significance was determined by two-way RM ANOVA followed by post hoc Bonferroni multiple comparison correction.
f. Phosphocholine [M+2] abundance in HEK293T SLC25A48-knockout cells expressing an empty vector control or 3xFLAG-SLC25A48 cDNA after incubation with [1,2-13C2]Choline for the indicated time points. Data are individual points normalized to ISTDs and the number of cells seeded; n = 3 biological replicates. Lines correspond to the mean at each time point. Statistical significance was determined by two-way RM ANOVA followed by post hoc Bonferroni multiple comparison correction. All experiments were repeated at least twice.
Betaine production involves the two-step oxidation of choline by two mitochondrially localized enzymes, choline dehydrogenase and aldehyde dehydrogenase 7 family member A134,36–38. We therefore considered the possibility that SLC25A48 mediates mitochondrial choline import and thereby affects betaine production. To test this, we performed a radioactive [Methyl-3H]Choline uptake assay on mitochondria immunopurified from SLC25A48-knockout and cDNA-complemented cells (Fig. 5a). During a 5-minute incubation, mitochondria lacking SLC25A48 took up 7-fold less [Methyl-3H]Choline than mitochondria from cDNA-complemented controls (Fig. 5b). To confirm this, we conducted [1,2-13C2]Choline isotope tracing in immunopurified mitochondria (Fig. 5c) and observed a substantial drop in the incorporation of labeled choline into betaine in mitochondria from SLC25A48-knockout cells compared with cDNA-expressing controls (Fig. 5d). Altogether, our results suggest that SLC25A48 is necessary for mitochondrial choline import and is a key determinant of de novo betaine synthesis in mammalian cells.
Figure 5. SLC25A48 mediates mitochondrial choline import.
a. Schematic for the 5-min [3H]Choline radioactive uptake in immunopurified mitochondria from HEK293T SLC25A48-knockout cells expressing an empty vector control or 3xFLAG-SLC25A48 cDNA.
b. Uptake of [Methyl-3H]-Choline by immunopurified mitochondria from HEK293T SLC25A48-knockout cells expressing a vector control or 3xFLAG-SLC25A48 cDNA for 5 min. Data are mean ± standard deviation, normalized to μg of protein; n = 3 biological replicates. Statistical significance was determined by two-tailed unpaired t test.
c. Schematic for the [1,2-13C2]Choline tracing in immunopurified mitochondria from HEK293T SLC25A48-knockout cells expressing an empty vector control or 3xFLAG-SLC25A48 cDNA.
d. Betaine [M+2] abundance in immunopurified mitochondria from HEK293T SLC25A48-knockout cells expressing an empty vector control or 3xFLAG-SLC25A48 cDNA incubated with [1,2-13C2]Choline for the indicated time points. Data are individual points normalized to NAD abundance; n = 3 biological replicates. Lines correspond to the mean at each time point. Statistical significance was determined by two-way RM ANOVA followed by post hoc Bonferroni multiple comparison correction. All experiments were repeated at least twice.
Phenomic consequences of SLC25A48 dysfunction
Given the cellular role of SLC25A48 in choline metabolism, we sought to identify its role in human health and disease. To evaluate the impact of SLC25A48 on the medical phenome, we conducted a comprehensive SKAT-O rare variant analysis of 1,356 phenotypes in 469,787 UK Biobank individuals using SLC25A48 predicted loss-of-function (pLoF) variants41. Restricting the analysis to variants predicted to disable the gene may provide new insights into pathophysiological processes. Comparing the observed p-value distribution with the expected (theoretical null) distribution, we identified 8 highly significant disease associations (Bonferroni-adjusted p < 0.05; Extended Data Fig. 6a). We then examined whether the SLC25A48 associations from the rare variant analysis correlate with the corresponding associations with genetically determined blood choline levels. We found a modest but significant correlation (Pearson correlation coefficient = 0.06, p-value = 0.03) between the rare variant association results for SLC25A48 and those of a polygenic score (PS) trained on the CLSA choline GWAS. We then integrated the SKAT-O and PS analyses to identify choline-relevant phenomic consequences of SLC25A48 dysfunction (Fig. 6a). Among the 8 significant rare variant associations, “Hereditary disturbances in tooth structure” was also significantly associated with genetically determined choline (Fig. 6b). We found additional support for this association in BioVU, Vanderbilt’s independent DNA biorepository linked to extensive clinical data (Extended Data Fig. 6b, Methods). Overall, the combination of SKAT-O rare variant testing and choline PS analysis suggests that the potentially causal effects of SLC25A48 on human disease are not independent of the effects of choline.
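The two-stage integration summarized in Fig. 6 (Bonferroni correction over all SKAT-O phenotypes, then Bonferroni correction of the choline PS associations over only the SKAT-O-significant traits, as described in the Fig. 6b legend) can be sketched as a simple filter. The dictionary inputs and function name are hypothetical, and the SKAT-O and PS association tests themselves are not reimplemented here.

```python
from typing import Dict, List, Tuple


def choline_relevant_phenotypes(
    skato_p: Dict[str, float],
    ps_p: Dict[str, float],
    alpha: float = 0.05,
) -> Tuple[List[str], List[str]]:
    """skato_p: phenotype -> SKAT-O p-value for SLC25A48 pLoF variants
    ps_p:    phenotype -> p-value for the choline polygenic score
    Returns (SKAT-O-significant phenotypes, those also significant for the PS)."""
    n_pheno = len(skato_p)
    # Stage 1: Bonferroni over every phenotype tested with SKAT-O.
    stage1 = [ph for ph, p in skato_p.items() if p * n_pheno < alpha]
    if not stage1:
        return [], []
    # Stage 2: Bonferroni over the SKAT-O-significant phenotypes only.
    stage2 = [ph for ph in stage1 if ph in ps_p and ps_p[ph] * len(stage1) < alpha]
    return stage1, stage2
```

Restricting the second correction to the SKAT-O hits keeps the PS test well powered, at the cost of only asking about phenotypes already implicated by the rare variant analysis.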
Figure 6. Choline-related phenomic consequences of SLC25A48 dysfunction.
a. Pipeline for identification of choline-relevant phenomic consequences of SLC25A48 dysfunction by combining rare variant testing analysis on SLC25A48 pLoF variants and PS trained on choline data.
b. PheWAS plot displaying phenotypes grouped by category (x-axis) and p-values from the rare variant testing analysis on SLC25A48 pLoF variants. The blue line marks p-value = 0.05 and the red line marks Bonferroni-adjusted p-value = 0.05. Large dots indicate phenotypes with rare variant test (SKAT-O) Bonferroni-adjusted p-value < 0.05, and the largest dot corresponds to choline PS Bonferroni-adjusted p-value < 0.05 (threshold defined based on the number of SKAT-O-identified trait associations).
Discussion
In this study, we developed GeneMAP, a platform for predicting metabolic gene function, and demonstrated that it produces accurate and replicable results. The platform uncovers functionally informative gene-metabolite pairs distal from GWAS-implicated variants, which single-SNP approaches are relatively underpowered to detect. Importantly, consistent with the robustness of our approach, we identified the orphan mitochondrial protein SLC25A48 as a mediator of mitochondrial choline import. Given the structural similarity of choline and ethanolamine, SLC25A48 may also transport ethanolamine, as recently shown for the plasma membrane choline transporter FLVCR142,43. SLC25A48 belongs to the SLC25A family of solute carriers, which typically localize to the inner mitochondrial membrane and enable the transport of essential nutrients from the cytosol. While a significant fraction of these transporters are associated with human disease, many still lack known substrates. Our data suggest that SLC25A48 is a major determinant of de novo synthesis of betaine, a choline-derived metabolite that is critical for one-carbon metabolism and acts as an osmolyte34,36–38. We identified significant disease associations for SLC25A48 pLoF variants in the UK Biobank and provided additional confirmation in the independent BioVU resource. Furthermore, the combined rare variant testing and polygenic score analysis, along with Mendelian randomization, suggests that the causal effects of SLC25A48 on human disease may be mediated by the effects of choline. While our work was in review, two other preprints appeared that support our findings44,45. Using a whole-body knockout mouse model, one of these preprints suggests that SLC25A48 is important for cold tolerance, thermogenesis and mitochondrial respiration.
Future studies with tissue-specific knockout mouse models are required to establish the physiological relevance of SLC25A48 for the organismal metabolism of choline and related metabolites. Additionally, further work is needed to determine the cryo-EM structure of SLC25A48 and the precise mechanism by which it mediates mitochondrial choline import.
Since many metabolic enzymes and transporters still lack identified physiological substrates, GeneMAP provides a unique platform for deorphanizing these genes, which may open new avenues for understanding the basis of disease and for developing therapeutics. Our approach of combining PS and rare variant analysis to identify the phenomic consequences of the SLC25A48-choline pair can be broadly applied to other gene-metabolite relationships. Finally, using the GeneMAP results, we constructed genetically determined metabolic networks (GDMNs) that can reveal the genetic regulation of the metabolome and facilitate the identification of uncharacterized metabolites. Methodologically, GDMNs can be generated from GWAS summary statistics alone, without individual-level metabolite data. Altogether, our work highlights the utility of a genetics-anchored multiomic approach for studying the role of a large class of genes in cellular and organismal metabolism. All results are publicly available through an interactive online portal for use by the scientific community.
Methods
GWAS of plasma metabolites.
We conducted the largest TWAS of the human metabolome to date. We utilized two large-scale blood metabolomic studies, leveraging GWAS summary statistics from the METSIM study14 (1,391 metabolites in 6,136 Finnish men) and the CLSA cohort15 (1,091 metabolites and 309 metabolite ratios in 8,299 individuals of European ancestry). Metabolites were harmonized according to Chen et al15.
Gene-mediated regulation of metabolites.
We applied S-PrediXcan22 independently to the blood metabolite GWAS summary statistics from the METSIM and CLSA studies using 49 JTI TWAS tissue models23. JTI leverages transcriptome and functional genomics data across a broad collection of tissues (e.g., GTEx) to model gene expression as a function of genetic variants. By exploiting the regulatory architecture of gene expression shared across tissues, JTI improves expression prediction, and it reduces to conventional TWAS (PrediXcan19) when regulation is highly tissue-specific. The resulting in silico models of gene expression facilitate investigation of the genetic control of this molecular phenotype and can be applied to GWAS data to identify gene-level associations with disease and with quantitative phenotypes such as metabolite levels. To calculate the replicability statistic24,25, we selected the gene-metabolite entries with q-value25 < 0.05 in a discovery dataset and filtered the corresponding entries in the validation dataset in a tissue-specific manner. To identify significantly replicating gene-metabolite pairs, for each tissue model we selected the gene-metabolite associations with q-value < 0.05 in the discovery dataset and computed q-values for the filtered subset of corresponding entries in the validation dataset; gene-metabolite pairs attaining q-value < 0.05 with the same direction of effect (beta) were deemed significant.
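The two-stage replication procedure for a single tissue model can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: q-values come from a simplified Storey estimator that uses one fixed λ = 0.5 to estimate π0 (reference implementations smooth over a grid of λ values), and the dictionary input format and the names `qvalues` and `replicated_pairs` are hypothetical.

```python
import numpy as np


def qvalues(pvals, lam=0.5):
    """Storey-style q-values using a single, fixed lambda for the pi0 estimate."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    # Proportion of true null hypotheses, estimated from p-values above lambda.
    pi0 = min(1.0, np.sum(p > lam) / (m * (1.0 - lam)))
    if pi0 == 0.0:
        pi0 = 1.0  # conservative fallback (equivalent to Benjamini-Hochberg)
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working from the largest p-value downwards.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(1.0, pi0 * ranked)
    return q


def replicated_pairs(discovery, validation, alpha=0.05):
    """Two-stage replication for ONE tissue model.

    discovery, validation: dict mapping (gene, metabolite) -> (beta, p-value).
    """
    keys = list(discovery)
    q_disc = qvalues([discovery[k][1] for k in keys])
    # Stage 1: discovery-significant pairs that were also tested in validation.
    stage1 = [k for k, q in zip(keys, q_disc) if q < alpha and k in validation]
    if not stage1:
        return []
    # Stage 2: q-values recomputed only within the filtered validation subset,
    # keeping pairs whose effect direction agrees between datasets.
    q_val = qvalues([validation[k][1] for k in stage1])
    return [
        k for k, q in zip(stage1, q_val)
        if q < alpha and np.sign(discovery[k][0]) == np.sign(validation[k][0])
    ]
```

Running `replicated_pairs` once per tissue model, with CLSA and METSIM swapped between the discovery and validation roles, mirrors the bidirectional comparison summarized in Extended Data Fig. 1a.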
Replicability statistic and tissue sample size.
The replicability statistic for the gene-metabolite relationships was calculated for each tissue. The number of significant gene-metabolite associations identified in each tissue was tested for correlation (Pearson) with the tissue sample size used in model training23.
Metabolic genes.
The metabolic gene list was compiled from the Metabolic Atlas31, a set of transporters, and a study published by Kenny et al6.
Pathogenicity scores.
We utilized AlphaMissense, which builds on AlphaFold to predict the pathogenicity of missense variants27. For each gene region (boundaries defined by Ensembl GRCh38.p13), we calculated the average pathogenicity score across its missense variants. CADD scores for all possible SNVs of GRCh38/hg38 were obtained from the University of Washington CADD server28, and for each gene region (boundaries defined by Ensembl GRCh38.p13) we selected the maximum CADD score. Genes were annotated as protein coding based on the Ensembl GRCh38.p13 annotation and as metabolic based on the metabolic gene list.
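A minimal sketch of the per-gene aggregation (mean AlphaMissense score over missense variants, maximum CADD score over all SNVs), assuming variants have already been assigned to Ensembl gene regions upstream. The tuple schema, the use of `None` for missing scores, and the function name `gene_pathogenicity` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def gene_pathogenicity(variants):
    """Aggregate variant-level scores per gene.

    variants: iterable of (gene, consequence, alphamissense, cadd_phred) tuples;
    this schema is hypothetical, and None marks a missing score.
    Returns {gene: (mean AlphaMissense over missense variants, max CADD)}.
    """
    missense = defaultdict(list)
    cadd_max = {}
    for gene, consequence, am_score, cadd in variants:
        if consequence == "missense" and am_score is not None:
            missense[gene].append(am_score)
        if cadd is not None:
            cadd_max[gene] = max(cadd_max.get(gene, float("-inf")), cadd)
    genes = set(missense) | set(cadd_max)
    return {
        g: (mean(missense[g]) if missense.get(g) else None, cadd_max.get(g))
        for g in genes
    }
```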
Genetically Determined Metabolic Networks (GDMNs).
Selection of associations.
To build the genetically determined metabolic networks, we utilized gene-metabolite associations identified using the JTI whole blood expression model. As network nodes, we selected the metabolites present in at least one significant association involving a metabolic gene in the discovery dataset. For gene selection, we identified the genes scoring for the selected metabolites in the discovery and validation datasets, analyzed independently.
Building the network.
For each dataset separately, we built the gene-metabolite beta-effect matrix from the selected genes and metabolites and computed Pearson correlations (link strength) for all metabolite pairs. To create a sparse, scale-free, and robust GDMN with significant links, we computed an empirical p-value for each link by permutation testing and selected edges with Bonferroni-corrected p-value < 0.05.
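The link-selection step might be sketched as below. The permutation null is an assumption, since the section does not specify which entries are permuted: here each metabolite column of the beta-effect matrix is shuffled independently across genes, which breaks gene-level sharing between metabolites. With `n_perm` permutations the smallest attainable empirical p-value is 1/(n_perm + 1), so `n_perm` must be large enough for any link to survive Bonferroni correction over all metabolite pairs. The function name `gdmn_edges` is hypothetical.

```python
import numpy as np


def gdmn_edges(beta, n_perm=1000, alpha=0.05, seed=0):
    """Metabolite-metabolite links from a genes x metabolites beta-effect matrix.

    Returns (i, j, r) for metabolite pairs whose |Pearson r| passes a
    Bonferroni-corrected empirical permutation p-value < alpha.
    """
    rng = np.random.default_rng(seed)
    beta = np.asarray(beta, dtype=float)
    n_met = beta.shape[1]
    r = np.corrcoef(beta, rowvar=False)  # metabolite x metabolite correlations
    iu = np.triu_indices(n_met, k=1)
    observed = np.abs(r[iu])
    exceed = np.zeros(observed.shape)
    for _ in range(n_perm):
        # Shuffle gene order independently per metabolite (assumed null model).
        shuffled = np.column_stack(
            [rng.permutation(beta[:, j]) for j in range(n_met)]
        )
        exceed += np.abs(np.corrcoef(shuffled, rowvar=False)[iu]) >= observed
    p_emp = (exceed + 1) / (n_perm + 1)
    p_bonf = np.minimum(1.0, p_emp * observed.size)  # Bonferroni over all pairs
    return [
        (int(i), int(j), float(r[i, j]))
        for i, j, p in zip(iu[0], iu[1], p_bonf)
        if p < alpha
    ]
```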
Replicability assessment.
To investigate the replicability of the network properties, we compared the Pearson correlations of the metabolite pairs computed from the CLSA and METSIM datasets. Additionally, we computed the node degree (the number of links connected to a node) in each dataset and correlated it between the datasets. To study global network properties, we performed Louvain community detection and tested the replicability of communities between the CLSA and METSIM GDMNs. Communities from the two datasets were considered overlapping if the overlap score between A and B, the sets of nodes for the communities under comparison, met its threshold and the Bonferroni-corrected Fisher's exact test46 p-value was < 0.001.
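The Fisher's exact test component of the community comparison can be sketched with the hypergeometric distribution. The sketch assumes a one-sided (enrichment) test and a universe consisting of the nodes eligible for both communities; the overlap-score criterion is not reproduced here, and the function names are hypothetical.

```python
from math import comb


def community_overlap_p(a, b, universe):
    """One-sided Fisher's exact test (hypergeometric upper tail) for |A ∩ B|.

    a, b: node sets of two communities; universe: nodes eligible for both.
    """
    a, b = set(a), set(b)
    n, k, m = len(set(universe)), len(a), len(b)
    x = len(a & b)
    # P(overlap >= x) when m nodes are drawn at random from n, k of them in A.
    tail = sum(comb(k, i) * comb(n - k, m - i) for i in range(x, min(k, m) + 1))
    return tail / comb(n, m)


def overlapping_pairs(comms_x, comms_y, universe, alpha=0.001):
    """Community pairs whose Bonferroni-corrected Fisher p-value is below alpha."""
    n_tests = len(comms_x) * len(comms_y)
    hits = []
    for i, cx in enumerate(comms_x):
        for j, cy in enumerate(comms_y):
            if min(1.0, community_overlap_p(cx, cy, universe) * n_tests) < alpha:
                hits.append((i, j))
    return hits
```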
Building the consensus GDMN.
The GDMN built on the CLSA dataset was used for discovery. To identify significant links, we selected associations with Bonferroni-corrected empirical p-value < 0.05 in the discovery dataset. Then, for the filtered subset of corresponding metabolite pairs in the validation dataset, we applied Bonferroni correction to the empirical p-values, and links with adjusted p-value < 0.05 were deemed significant.
Scale-free property assessment.
By definition, the degree distribution of a scale-free network follows a power law29. To test whether the observed GDMN is scale-free, we computed its node degree distribution and showed that it follows a power law. For comparison, we generated 20,000 random networks with the same numbers of nodes and links as the unified GDMN; their degree distributions did not follow a power law.
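As a rough illustration of the scale-free check, the sketch below fits a straight line to the log-log degree distribution, in the spirit of the regression line and R2 shown in Extended Data Fig. 2c, and draws Erdős-Rényi G(n, m) random networks with matching numbers of nodes and links. A least-squares fit on log-log frequencies is a heuristic rather than a formal power-law test, and whether the authors used this particular random-graph model is an assumption; the function names are hypothetical.

```python
import numpy as np


def degree_sequence(n_nodes, edges):
    """Node degrees for an undirected edge list over nodes 0..n_nodes-1."""
    deg = np.zeros(n_nodes, dtype=int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def loglog_fit(degrees):
    """Least-squares fit of log10 P(k) on log10 k for k >= 1.

    Returns (slope, R^2); needs at least two distinct non-zero degrees.
    """
    degrees = np.asarray(degrees)
    k, counts = np.unique(degrees[degrees > 0], return_counts=True)
    x, y = np.log10(k), np.log10(counts / counts.sum())
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (intercept + slope * x)
    return slope, 1.0 - resid.var() / y.var()


def random_gnm_edges(n_nodes, n_edges, rng):
    """Erdős-Rényi G(n, m): n_edges distinct undirected edges, uniformly at random."""
    edges = set()
    while len(edges) < n_edges:
        u, v = (int(a) for a in rng.integers(0, n_nodes, size=2))
        if u != v:
            edges.add((min(u, v), max(u, v)))
    return sorted(edges)
```

Repeating `random_gnm_edges` 20,000 times with the GDMN's node and link counts, then applying `degree_sequence` and `loglog_fit`, gives the kind of null comparison described above.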
Error tolerance.
One property of scale-free networks is error tolerance: because random failures are unlikely to hit the few hubs, disrupting a random node does not substantially alter the global network characteristics. To assess error tolerance, we sequentially removed 10 random nodes or 10 hubs (from the most to the least connected) and computed the mean distance between nodes in each resulting network. The removal of 10 random nodes was repeated 500 times, and the average is shown.
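The error-tolerance analysis can be sketched with breadth-first search on an adjacency dictionary. Two details are assumptions: hubs are ranked once by degree in the intact network (not re-ranked after each removal), and node pairs that become disconnected are excluded from the mean distance. The function names are hypothetical.

```python
import random
from collections import deque


def mean_distance(adj, removed=frozenset()):
    """Mean shortest-path length over connected ordered node pairs (BFS).

    adj: dict node -> set of neighbours (unweighted, undirected).
    Unreachable pairs are skipped; returns NaN if no connected pair remains.
    """
    total, pairs = 0, 0
    for src in adj:
        if src in removed:
            continue
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in removed and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else float("nan")


def removal_curves(adj, n_remove=10, n_random=500, seed=0):
    """Mean distance after removing 0..n_remove hubs vs. random nodes."""
    # Hubs ranked by degree in the intact network (assumption: no re-ranking).
    hubs = sorted(adj, key=lambda n: len(adj[n]), reverse=True)[:n_remove]
    targeted = [mean_distance(adj, frozenset(hubs[:i])) for i in range(n_remove + 1)]
    rng = random.Random(seed)
    nodes = list(adj)
    averaged = [0.0] * (n_remove + 1)
    for _ in range(n_random):
        order = rng.sample(nodes, n_remove)
        for i in range(n_remove + 1):
            averaged[i] += mean_distance(adj, frozenset(order[:i])) / n_random
    return targeted, averaged
```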
Integration of GeneMAP gene-metabolite pairs into the GDMN.
We leveraged the summary statistics of the identified GeneMAP associations to generate the network. Specifically, a gene was included in the network if it scored for both metabolites of a metabolite-metabolite link.
Metabolite-derived networks.
To build the MDN, we utilized the Pearson correlations computed from the individual metabolite levels in the CLSA study15. For the nodes of the networks, we selected the same metabolites as in the GDMN. For the edges, we chose the same number of highly negatively and positively correlated edges as in the GDMN. To assess the concordance of the MDN with the GDMN, we compared the Pearson correlations of the metabolite pairs. Additionally, we calculated the correlation of the node degree between MDN and GDMN. For the assessment of the global network properties, we performed Louvain community detection and tested the community replicability as described above. For each metric, we conducted permutation analysis to generate random networks and estimate the empirical p-value.
Selection of candidate causal genes.
We focused on associations involving metabolic genes because of their higher replicability. To narrow the list to robust hits, we retained gene-metabolite pairs identified as significant in more than two expression models. Subsequently, to assess the power of GeneMAP to detect gene-metabolite relationships missed by a single-SNP analysis approach, we determined how many of these significant pairs were distal (> 0.5 Mb) from significant SNPs in the CLSA GWAS.
Annotation of selected gene-metabolite pairs.
We also determined how many of the gene-metabolite pairs defined in CLSA as “Effector” (identified through proximity to a significant SNP for the metabolite, biological relevance, i.e., participation in the biological processes of the metabolite, and colocalization with a significant cis-acting eQTL) were recovered by GeneMAP15,47. We classified the remaining significant gene-metabolite pairs as (1) “Annotated” (not identified in CLSA but very likely to be causal based on other literature), (2) “Unknown” (previously uncharacterized causal gene-metabolite relationship), or (3) “Miscellaneous” (associations involving uncharacterized or partially characterized compounds, or with the potentially causal gene in proximity). For this classification, we used annotations from CLSA15, Metabolic Atlas31, EGEA14,16,30, other published studies, and manual curation (Supplementary Table 1).
Gene-metabolite pair prioritization.
For each gene-metabolite pair, we selected the tissue model with the lowest p-value in the discovery dataset. We then ranked all gene-metabolite pairs by the p-value in the validation dataset. For the functional study, we focused on the gene-metabolite entries distal (> 0.5 Mb) from the significant SNPs identified in CLSA.
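The selection and prioritization rules (significance in more than two expression models, best discovery tissue, ranking by validation p-value, and the > 0.5 Mb distance filter) can be sketched as follows. The record schema is hypothetical, and measuring the distance from each significant SNP to the nearest gene boundary (rather than, for example, to the transcription start site) is an assumption.

```python
from collections import defaultdict


def prioritize(records, sig_snps, window=500_000, min_models=3):
    """Rank distal gene-metabolite pairs for follow-up.

    records: dicts with keys gene, metabolite, tissue, p_disc, p_val, chrom,
      gene_start, gene_end (hypothetical schema; one dict per significant
      association in one tissue model).
    sig_snps: dict metabolite -> list of (chrom, pos) for significant GWAS SNPs.
    """
    by_pair = defaultdict(list)
    for rec in records:
        by_pair[(rec["gene"], rec["metabolite"])].append(rec)

    def is_distal(rec):
        for chrom, pos in sig_snps.get(rec["metabolite"], []):
            # Distance from the SNP to the gene body; 0 if the SNP is inside it.
            dist = max(rec["gene_start"] - pos, pos - rec["gene_end"], 0)
            if chrom == rec["chrom"] and dist <= window:
                return False
        return True

    chosen = []
    for recs in by_pair.values():
        if len({r["tissue"] for r in recs}) < min_models:  # "more than two" models
            continue
        best = min(recs, key=lambda r: r["p_disc"])  # best tissue in discovery
        if is_distal(best):
            chosen.append(best)
    return sorted(chosen, key=lambda r: r["p_val"])  # rank by validation p-value
```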
Mendelian randomization.
Mendelian randomization is an epidemiological approach that uses genetic variants as instruments to infer causal effects (as opposed to correlations) from observational data (e.g., GWAS). To assess the causal effect of gene expression on a metabolite, we applied MR-Egger regression to the significant gene-metabolite pairs. We illustrate the approach by testing the causal effect of SLC25A48 on choline, using the CLSA choline GWAS summary statistics15. We selected variants within ±1 Mb upstream and downstream of the SLC25A48 gene boundaries (as defined by Ensembl GRCh38.p13) and used the eQTLs from the GTEx Analysis Release V8 (dbGaP Accession phs000424.v8.p2)48 in this cis-region as genetic instruments. In addition to MR-Egger, we applied the weighted median estimator49 to the SLC25A48-choline association. To minimize weak instrument bias, we chose the strongest eQTLs for the gene50.
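The two estimators can be sketched from per-SNP summary statistics as follows. The sketch follows the standard MR-Egger formulation (inverse-variance-weighted regression of SNP-outcome on SNP-exposure effects with an intercept, after orienting instruments to positive exposure effects) and the weighted median estimator with first-order weights. It returns point estimates only, and the function names are hypothetical rather than those of the software the authors used.

```python
import numpy as np


def _arrays(*xs):
    return [np.asarray(x, dtype=float) for x in xs]


def mr_egger(bx, by, se_y):
    """MR-Egger: weighted regression of by on bx with an intercept.

    bx: SNP effects on the exposure (expression); by, se_y: SNP effects on the
    outcome (choline) and their standard errors. Returns (intercept, slope).
    """
    bx, by, se_y = _arrays(bx, by, se_y)
    # Orient every instrument so that its effect on the exposure is positive.
    sign = np.where(bx < 0, -1.0, 1.0)
    bx, by = bx * sign, by * sign
    sw = 1.0 / se_y  # square root of the inverse-variance weights
    X = np.column_stack([np.ones_like(bx), bx])
    coef, *_ = np.linalg.lstsq(X * sw[:, None], by * sw, rcond=None)
    return float(coef[0]), float(coef[1])


def weighted_median(bx, by, se_y):
    """Weighted median of per-SNP ratio estimates by/bx (first-order weights)."""
    bx, by, se_y = _arrays(bx, by, se_y)
    ratio = by / bx
    w = (bx / se_y) ** 2  # inverse of the first-order variance se_y^2 / bx^2
    order = np.argsort(ratio)
    ratio, w = ratio[order], w[order] / w.sum()
    cum = np.cumsum(w) - 0.5 * w  # standardized cumulative weight percentiles
    return float(np.interp(0.5, cum, ratio))
```

The MR-Egger intercept estimates average directional pleiotropy, while the weighted median remains consistent if at least half of the weight comes from valid instruments, which is why the two are commonly reported together.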
Colocalization analysis.
To test for the presence of shared causal variants for a given gene-metabolite pair, we conducted colocalization analysis on the metabolite trait and gene expression. Similar to Mendelian randomization analysis, we utilized the CLSA GWAS summary statistics15 and the strongest eQTLs for the gene in the cis-region (the GTEx Analysis Release V8 (dbGaP Accession phs000424.v8.p2))48 as input to the HyPrColoc pipeline51.
Loss-of-function variant analysis.
We leveraged the UK Biobank whole-exome sequencing dataset of 469,787 individuals to investigate the consequences of SLC25A48 pLoF variants across the medical phenome. We first mapped the ICD10 clinical codes to the phecode system52, which has well-defined exclusion criteria that prevent cases from contaminating the control samples. We then applied the optimal test in the extended family of Sequence Kernel Association Tests (SKAT-O)53 to test the association between pLoF variants41 (minor allele frequency < 1%) in the gene and the disease phenotypes. In total, we tested 1,356 phenotypes, each with a minimum of 5 cases. We used age, sex, and the first 10 principal components (PCs) derived from the genotype data (quantifying genomic ancestry) as covariates. Associations meeting Bonferroni-adjusted p-value < 0.05 were deemed statistically significant. Finally, we compared these results with those of a burden test (Supplementary Information). Of note, phenotypes in the categories “Other”, “Symptoms”, and “Injuries & Poisonings” were excluded, as they are likely not genetically determined.
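SKAT-O itself is best run with dedicated software, but the simpler burden test used for comparison can be sketched as below: pLoF alleles with MAF < 1% are collapsed into a per-sample carrier indicator, which is then tested in a covariate-adjusted logistic regression fitted by Newton-Raphson with a Wald p-value. Collapsing to a carrier indicator (rather than an allele count) and the use of a Wald test are assumptions, and the function names are hypothetical.

```python
import math

import numpy as np


def burden_indicator(genotypes, maf, max_maf=0.01):
    """1.0 if a sample carries any qualifying pLoF allele (MAF < max_maf), else 0.0."""
    keep = np.asarray(maf) < max_maf
    return (np.asarray(genotypes)[:, keep] > 0).any(axis=1).astype(float)


def burden_logistic(y, burden, covariates, max_iter=50, tol=1e-10):
    """Logistic regression y ~ burden + covariates fitted by Newton-Raphson.

    Returns (log odds ratio for the burden term, two-sided Wald p-value).
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), burden, covariates])
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))
        info = X.T @ (X * (mu * (1.0 - mu))[:, None])  # Fisher information
        step = np.linalg.solve(info, X.T @ (y - mu))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    # Covariance of the estimates at the final fit.
    mu = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (mu * (1.0 - mu))[:, None]))
    z = beta[1] / math.sqrt(cov[1, 1])
    return float(beta[1]), math.erfc(abs(z) / math.sqrt(2.0))
```

With 1,356 phenotypes tested, a Bonferroni-adjusted p-value of 0.05 corresponds to a nominal p-value of 0.05 / 1,356 ≈ 3.7 × 10−5.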
Clinical impact of genetically determined choline level.
Using PRSice-2, we trained a polygenic score (PS) for choline level and found a p-value threshold of 0.05 to be optimally predictive54. We then tested the resulting genetic model of choline level against the phecode-derived SLC25A48-associated disease phenotypes from the LoF analysis, using UK Biobank whole-genome sequencing data from 200,006 individuals. As before, we adjusted for age, sex, and the first 10 genotype-based PCs. For independent confirmation, we also trained PS models in the INTERVAL cohort dataset16,55.
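For intuition, a p-value-thresholded polygenic score reduces to a weighted sum of effect-allele dosages over the SNPs passing the threshold. The sketch below assumes that LD clumping has already been done (PRSice-2 performs clumping and threshold selection itself) and that dosages and GWAS betas refer to the same effect allele; the function names are hypothetical.

```python
import numpy as np


def polygenic_score(dosages, betas, pvals, p_threshold=0.05):
    """Sum of dosage * GWAS beta over SNPs with p < p_threshold.

    dosages: samples x SNPs matrix (0-2 effect-allele dosages) for SNPs assumed
    to be LD-clumped already; betas, pvals: choline GWAS summary statistics.
    """
    keep = np.asarray(pvals) < p_threshold
    return np.asarray(dosages, dtype=float)[:, keep] @ np.asarray(betas, dtype=float)[keep]


def standardize(score):
    """Scale the score to mean 0 and SD 1 before association testing."""
    score = np.asarray(score, dtype=float)
    return (score - score.mean()) / score.std()
```

The standardized score would then enter a covariate-adjusted regression (age, sex, and 10 PCs) for each SLC25A48-associated phenotype.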
Replication analysis.
We sought additional support for the SLC25A48 disease associations identified in the UK Biobank using BioVU, Vanderbilt's DNA biorepository linked to electronic health records and longitudinal clinical data. Of the 300,000 DNA samples in the repository (which accrues 500 samples per week), nearly 120,000 (75% of European ancestry, 15% of African ancestry, and 3% Hispanic) currently have whole-genome genetic information. Based on genomic ancestry quantification (principal component analysis56), we used the European-ancestry sample set to analyze the effect of genetically determined expression on disease.
Cell lines and reagents.
Human cell line HEK293T was purchased from the ATCC. The cell line was confirmed to have no mycoplasma contamination and the identity was verified by STR profiling. HEK293T cells were cultured in RPMI 1640 medium (Gibco) containing 2mM glutamine, 10% FBS, 1X penicillin and streptomycin (Gibco, #15140–122).
Powdered RPMI 1640 Medium w/o L-Glutamine, L-Methionine, Choline Chloride, Folic Acid and Vitamin B12 (US Biological Life Sciences #R8999–21) was used to make choline-depleted medium. After reconstitution, the following metabolites were added to match standard RPMI 1640 medium concentrations: 2mM L-glutamine (Alfa Aesar #J60573), 100.7μM L-methionine (Alfa Aesar #J61904), 2.3μM folic acid (Alfa Aesar #J62937), and 3.7nM vitamin B12 (Alfa Aesar #A14894). The medium was supplemented with 10% dialyzed fetal bovine serum (dFBS) (Gibco #26400–044) and 1X penicillin-streptomycin (Gibco #15140–122).
Antibodies against HA-Tag (3724S, 1:200 for immunofluorescence) and citrate synthase (14309S, 1:1,000 for Western blot) were from Cell Signaling Technology; SLC25A12 (ab200201, 1:1,000 for Western blot) was from Abcam; GAPDH (GTX627408, 1:1,000) and beta-tubulin (GTX101279, 1:2,000 for Western blot) were from GeneTex; CISD1 (16006–1-AP, 1:1,000 for Western blot) was from Proteintech; and Flag-Tag (F1804, 1:2,000 for Western blot, 1:200 for immunofluorescence) was from Sigma-Aldrich. Anti-mouse IgG–HRP (7076S, 1:3,000 for Western blot) and anti-rabbit IgG–HRP (7074S, 1:3,000 for Western blot) were obtained from Cell Signaling Technology. Secondary antibodies for immunofluorescence staining were goat anti-mouse Alexa Fluor 488 (A11029, 1:300) and goat anti-rabbit Alexa Fluor 555 (A21428, 1:300) from Invitrogen.
Other reagents: anti-HA magnetic beads (88837, Thermo Scientific Pierce); DAPI (D1306, ThermoFisher Scientific); polybrene (H9268), puromycin (P8833) (Sigma); blasticidin (ant-bl-1, Invivogen).
Generation of knockout, knockdown and cDNA-overexpression cell lines.
sgRNA for SLC25A48 was synthesized by IDT and cloned into BsmBI-linearized lentiCRISPR-v2 plasmid (Addgene, #75159) with T4 ligase (NEB #M0202). cDNA for 3xFLAG-SLC25A48 was cloned by Gibson assembly into pLV-EF1a-IRES-Puro (Addgene, #85132) linearized with BamHI-HF (NEB #R3136L) and EcoRI-HF (NEB #R3101L). Mito-tag plasmids were based on pMXs-3XHA-EGFP-OMP25 (Addgene, #83356) and pMXs-3XMyc-EGFP-OMP25 (Addgene, #83355), with EGFP replaced by mCherry for selection. The cDNAs for 3XHA-mCherry-OMP25 and 3XMyc-mCherry-OMP25 were cloned into the pLV-EF1a-IRES-Blast backbone (Addgene, #85133) by Gibson assembly. sgRNA- or cDNA-expressing vectors and the lentiviral packaging vectors Delta-VPR and CMV VSV-G were transfected into HEK293T cells using XTremeGene 9 transfection reagent (Roche #636478700). The virus-containing supernatant was collected 48 h after transfection and passed through a 0.45-μm filter. Target cells were transduced in 6-well tissue culture plates by adding the collected virus and 4 μg mL−1 polybrene to the medium, followed by centrifugation at 1,126g for 80 min. The next day, the medium was changed to remove the virus. SLC25A48 KO cells were selected by fluorescence-activated cell sorting (FACS) of GFP+ cells (lentiCRISPR-v1(GFP)), followed by single-cell cloning. For Mito-tag cells, mCherry+ cells were selected by FACS (pLV-3XHA-mCherry-OMP25, pLV-3XMYC-mCherry-OMP25). As no validated antibody is available, sgSLC25A48-transduced cells were validated by ICE analysis (Synthego) of Sanger sequencing of genomic DNA. Cells expressing 3xFLAG-SLC25A48 were selected with puromycin. For all constructs, the matching empty vectors (without insert) were used as controls. All constructs were validated by Sanger sequencing. The nucleotide sequences are provided in Supplementary Table 3.
Immunofluorescence staining and confocal microscopy for co-localization.
Sterile coverslips were coated with 50 μg/mL Poly-D-Lysine (ChemCruz #sc-136156) for 30 minutes at 37°C, and 500,000 cells were seeded per well of a 6-well plate and grown overnight. The next day, cells were fixed in 4% PFA for 15 min, permeabilized with 1% Triton X-100 in PBS, blocked in 1% bovine serum albumin (Sigma) for 30 min, and incubated with primary antibody overnight at 4°C. The following day, cells were washed twice in PBS and incubated with secondary antibodies for 30 minutes. After two washes in PBS, coverslips were incubated in DAPI for 10 minutes, washed twice in PBS, and mounted in ProLong Gold antifade mountant (Molecular Probes). Imaging was performed on a Nikon A1R MP multiphoton microscope in confocal mode with a Nikon Plan Apo λ 60X/1.40 oil immersion objective. No nuclear FLAG signal was observed, as the cells had lost Cas9 expression.
Immunoblotting.
Transmembrane buffer (10mM Tris-HCl pH 7.4, 150mM NaCl, 1mM EDTA, 1% Triton X-100, 2% SDS, 0.1% CHAPS) supplemented with protease inhibitors (EMD Millipore #535140 or Sigma-Aldrich #11836170001) was used to lyse the cells, followed by sonication. For mitochondrial immunopurification experiments, 1% Triton X-100 buffer (50mM Tris-HCl pH 7.4, 150mM NaCl, 1mM EDTA, 1% Triton X-100) supplemented with protease inhibitors (Sigma-Aldrich #11836170001) was used for lysis. Lysates were centrifuged at maximum speed (> 14,000g) at 4°C, and the supernatant was collected. Protein was quantified using the Pierce BCA Protein Assay Kit (Thermo Fisher #23227) with bovine serum albumin as a standard. Samples were resolved on 10–20% Tris-Glycine gels (Invitrogen #XP10205BOX) and transferred to PVDF membranes (EMD Millipore #1PVH00010) in CAPS buffer (10mM CAPS, 10% ethanol). Membranes were blocked in skim milk (5% w/v) and incubated with primary antibodies at 4°C overnight. For secondary antibody incubation, anti-mouse IgG-HRP (CST #7076) and anti-rabbit IgG-HRP (CST #7074) were diluted 1:3,000 in skim milk. Washes before secondary antibody incubation and before blot development were performed with 0.1% Tween-20 in Tris-buffered saline. Blots were developed by ECL chemiluminescence (Perkin Elmer #NEL105001EA or Cytiva #RPN2232) with autoradiography films (Thomas Scientific #1141J52) and an SRX-101A Film Processor (Konica Minolta).
Whole cell [1,2-13C2]Choline isotope tracing.
For experiments with HEK293T SLC25A48-knockout cells expressing a control empty vector or 3xFLAG-SLC25A48 cDNA, 500,000 cells were seeded per well of 6-well plates in triplicate in choline-depleted medium. For experiments with HEK293T parental and SLC25A48-knockout cells, 250,000 cells were seeded in choline-depleted medium because tracing was conducted for 24 hours. The following day, the medium was changed to choline-depleted RPMI with 21.5μM [1,2-13C2]choline chloride (Cambridge Isotope Laboratories #CLM-548.01). At the indicated timepoints, cells were washed twice with cold 0.9% (w/v) NaCl, followed by metabolite extraction in ice-cold 80:20 LC/MS-grade methanol:water containing 15N- and 13C-fully-labeled amino acid standards (Cambridge Isotope Laboratories #MSK-A2–1.2). Samples were vortexed for 10 minutes at 4°C and then centrifuged at maximum speed (> 14,000g) at 4°C for 10 minutes. Supernatants were dried under nitrogen and stored at −80°C. Natural isotope abundance correction was performed based on the [M+2]/[M+0] ratios at the 0-minute timepoint, if available, or otherwise according to the theoretical distribution57.
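One way to implement the [M+2] correction is sketched below: the M+2 signal expected from natural isotopes is estimated as a ratio times the M+0 abundance and subtracted. The ratio comes either from the unlabeled 0-minute sample or, when that is unavailable, from a binomial model. Considering only carbon and using a natural 13C abundance of 0.0107 are simplifying assumptions, and the function names are hypothetical.

```python
from math import comb

C13_NATURAL = 0.0107  # approximate natural 13C abundance (assumed value)


def theoretical_m2_over_m0(n_carbons, p13=C13_NATURAL):
    """Binomial [M+2]/[M+0] ratio from natural 13C only (other isotopes ignored)."""
    p_m0 = (1 - p13) ** n_carbons
    p_m2 = comb(n_carbons, 2) * p13**2 * (1 - p13) ** (n_carbons - 2)
    return p_m2 / p_m0


def corrected_m2(m0, m2, ratio):
    """Subtract the natural-abundance M+2 signal expected from the M+0 pool."""
    return max(m2 - ratio * m0, 0.0)
```

Choline, phosphocholine, and betaine each contain five carbons, so `theoretical_m2_over_m0(5)` applies to all three (about 1.2 × 10−3 under these assumptions).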
Polar metabolite profiling.
The LC-MS analysis for the full MitoIP metabolome profiling and 24-hour whole cell tracing samples was conducted on a QExactive benchtop orbitrap mass spectrometer coupled to a Vanquish UPLC System (Thermo Fisher Scientific). Prior to injection, whole cell samples were resuspended in 60 μL of 50:50 acetonitrile:water, vortexed, and centrifuged at maximum speed (> 14,000g) for 20 minutes. External mass calibration was performed every 3 days using a standard calibration mixture. Polar extracts (5 μl) were injected onto a ZIC-pHILIC 150 × 2.1mm column (EMD Millipore) using a previously described LC-MS method6. The LC-MS metabolomics analysis for the other whole cell tracing experiments, as well as for immunopurified mitochondria tracing, was performed on an Orbitrap IQ-X Tribrid coupled to a Vanquish Horizon UHPLC system (Thermo Fisher Scientific). Prior to injection, whole cell samples were resuspended in 1000 μL of 50:50 acetonitrile:water, vortexed, and centrifuged at maximum speed (> 14,000g) for 20 minutes. For chromatographic separation, samples were loaded on a SeQuant ZIC-pHILIC column (150 × 2.1 mm, 5 μm polymeric) with a SeQuant ZIC-pHILIC Guard Kit (20 × 2.1 mm) at 40°C and eluted with a solvent system comprising mobile phase A (20mM ammonium carbonate + 0.1% ammonium hydroxide in water, pH 9.3) and mobile phase B (100% acetonitrile). The injection volume was set to 5 μl, and samples were maintained at 4°C. The gradient (v/v) was as follows: 0–22 min, linear gradient from 90% to 40% B; 22–24 min, held at 40% B; 24–24.1 min, returned to 90% B; 24.1–30 min, equilibrated at 90% B; flow rate, 150 μl/min. For MS analysis, a HESI probe was operated in polarity switching mode with the following source conditions: spray voltage, 3.0 kV; sheath gas, 30 au; auxiliary gas, 7 au; ion transfer tube and vaporizer temperatures, 275°C.
The following acquisition parameters were used for MS1 analysis: resolution, 120,000; AGC target, 4e5; max injection time, 200 ms; m/z range, 55–815 Th. A pooled sample containing all biological samples was prepared and analyzed using a data-dependent acquisition method. Data-dependent MS/MS scans were acquired at a resolution of 15,000 with a 5e4 AGC target, auto max injection time mode, 1.6 Da isolation width, HCD activation, stepped normalized collision energies of 20, 30, and 40 units, and a loop count of 2. The instrument was externally calibrated using Pierce FlexMix Calibration Solution (Thermo Fisher Scientific), and the internal calibration feature (EASY-IC) was turned on. Data were analyzed using Skyline Daily (v22)58.
Immunopurified mitochondria metabolic profiling.
Mitochondrial immunopurification from HEK293T cells expressing either 3×HA–OMP25–mCherry (immunopurified mitochondria) or 3×Myc–OMP25–mCherry (background control) was performed according to the protocol by Chen et al.39 with slight modifications. In brief, cells in a confluent 15 cm dish were washed twice with cold saline (0.9% NaCl), collected in cold KPBS, and centrifuged at 1,000g for 1.5 minutes at 4 °C. The pellet was resuspended in 1 ml of KPBS and homogenized with 30 strokes in a 2-mL homogenizer. Part of the homogenate was taken for whole cell protein lysis and metabolite extraction. The remaining sample was incubated with 200 μl of anti-HA magnetic beads on a rotator shaker for 5 minutes at 4 °C, followed by three washes with ice-cold KPBS. Subsequently, 10% of the bead volume was lysed in 1% Triton buffer (mitochondrial protein sample), and the remaining 90% was extracted in 80% methanol containing heavy isotope-labelled amino acid standards. The protein and metabolite samples were placed on a rotor for 10 minutes at 4°C, followed by 10 min centrifugation at maximum speed (> 14,000g) at 4°C. Metabolite samples were stored at −80°C.
Radioactive choline uptake assay in immunopurified mitochondria.
Mitochondria were immunopurified according to the protocol described above with minor modifications. Immunopurified mitochondria bound to beads were incubated in mitochondrial uptake buffer (KPBS, 10mM HEPES, 0.5mM EGTA, 10.25μM total choline chloride, pH~7.35) for 5 minutes at room temperature. Of the total 10.75μM choline chloride, only 100nM was radioactive ([Methyl-3H]-Choline Chloride; Perkin Elmer #NET109001MC), and the rest was unlabeled (Choline Chloride; MP Biomedicals #67–48-1). Uptake was stopped by adding ice-cold KPBS, followed by three washes in cold KPBS and extraction in 80% methanol. Protein and metabolite samples were placed on a rotor for 10 minutes at 4°C, followed by 10 min centrifugation at maximum speed (> 14,000g) at 4°C. The metabolite extract was transferred into scintillation vials with 5 mL Insta-Gel Plus scintillation cocktail (Perkin Elmer #601339), and radioactivity was measured with a TopCount scintillation counter (Perkin Elmer).
[1,2-13C2]Choline tracing in immunopurified mitochondria.
Mitochondrial immunopurification was performed as outlined above. Immunopurified mitochondria were incubated at room temperature for the indicated timepoints in mitochondrial uptake buffer (KPBS, 10mM HEPES, 0.5mM EGTA, 21.5μM [1,2-13C2]choline chloride (Cambridge Isotope Laboratories #CLM-548.01), pH~7.35). Uptake was stopped by adding ice-cold KPBS, followed by three washes in cold KPBS and extraction in 80% methanol. Protein and metabolite samples were placed on a rotor for 10 minutes at 4°C, followed by 10 min centrifugation at maximum speed (> 14,000g) at 4°C. Metabolite samples were stored at −80°C.
Statistical analysis and data visualization.
Statistical analysis and data visualization were performed in Prism 9 (GraphPad Software) or R (4.3.1). Details of the statistical analysis for each experiment are described in the figure legends. Genetic association signal plots were generated using LocusZoom (web version)59.
Extended Data
Extended Data Figure 1. Development of the GeneMAP pipeline.
a. Bar graph displaying the replicability measure across JTI expression models when CLSA was used as discovery and METSIM as validation (yellow) and vice versa (black).
b. Pie chart displaying the proportion of the GCTA-COJO-proposed gene-metabolite pairs (Chen et al.) identified by the JTI-TWAS approach. Only metabolites measured in both CLSA and METSIM were considered. Orange: identified by JTI-TWAS; black: not identified.
c. Pie chart displaying the proportion of the JTI-TWAS-proposed gene-metabolite pairs identified by the GCTA-COJO approach (Chen et al.). Only metabolites measured in both CLSA and METSIM were considered. Yellow: identified by JTI-TWAS; gray: not identified.
d. Schematic for HyPrColoc analysis.
e. Histogram displaying the distribution of p-values from the HyPrColoc analysis on significant gene-metabolite pairs.
f. Replicability measure as a function of the tissue sample size in GTEx used for the JTI expression model training.
g. Number of genes in the model as a function of the tissue sample size in GTEx used for the JTI expression model training.
h. Number of identified significant gene-metabolite associations as a function of the tissue sample size in GTEx used for the JTI expression model training.
i. Histogram with the distribution of the number of JTI expression models in which the significant gene-metabolite associations are present.
j. Pleiotropy of the genes. Histogram showing the distribution of the number of metabolites associated with each gene among the significant gene-metabolite associations.
k. Violin plot showing the distribution of the top CADD scores for individual genes in the whole-genome (gray), the non-metabolic protein coding (yellow) and metabolic genes (orange) groups. Statistical significance was determined by two-tailed unpaired t test.
Extended Data Figure 2. Genetically Determined Metabolic Network Analysis.
a. GDMN built from CLSA dataset with the Louvain community analysis.
b. GDMN built from METSIM dataset with the Louvain community analysis.
c. Distribution of the node degrees for the observed consensus network (yellow points) and simulated random networks (n = 20,000, black points). The yellow line is the regression line for the consensus network degree distribution (with R2), and the gray band is its confidence interval.
d. Error tolerance of the consensus GDMN, represented as the average distance between metabolic nodes in the network when sequentially removing 10 random nodes (black points, simulated 500 times) or 10 hubs (yellow points, ordered from the most to the least connected nodes).
e. QQ-plot showing the distribution of Pearson correlation for metabolite-metabolite pairs computed from the genetically-determined component and metabolite level data.
f. Histogram displaying the distribution of Pearson correlation of metabolite-metabolite pairs between the CLSA-built GDMN and permuted MDNs (n=100). The red vertical line displays the actual value.
g. Metabolite-derived network built from CLSA dataset with the Louvain community analysis.
h. Histogram displaying the distribution of Pearson correlation for the node degrees between the CLSA-built GDMN and permuted MDNs (n=100). The red vertical line displays the actual value.
i. Histogram displaying the distribution of the number of overlapping Louvain communities between the CLSA-built GDMN and permuted MDNs (n=100). The red vertical line displays the actual value.
Extended Data Figure 3. SLC25A48 as a genetic determinant of blood choline levels.
a. Distribution of the number of genes scoring for the same metabolite per extended locus (10 Mb) among the GeneMAP pairs.
b. Number of discoveries as a function of distance from mQTLs.
c. Schematic for integrating GDMN with prioritized GeneMAP pairs.
d. Network integrating GDMN with prioritized GeneMAP pairs.
e. Pie charts displaying the number of annotated metabolite-metabolite associations (counted as annotated if any gene-mediated connection is annotated) and gene-mediated metabolite-metabolite associations.
f. LocusZoom plot for CLSA blood choline association signals in SLC25A48 region. P values are from the CLSA study GWAS summary statistics.
g. LocusZoom plot for GCKD blood choline association signals in SLC25A48 region. P values are from the GCKD study GWAS summary statistics.
h. Schematic for Mendelian Randomization analysis of SLC25A48 effect on plasma choline level.
i. Scatterplot of the variants’ genetic associations with SLC25A48, viewed as the exposure (x-axis), and with choline, viewed as the outcome (y-axis). Spearman correlation ρ = 0.696, p-value < 2.2e-16; the P value was computed using the asymptotic t approximation.
Extended Data Figure 4. Distal unknown gene-metabolite associations identified by GeneMAP.
a. Bubble plot of unknown gene-metabolite associations identified by GeneMAP. Bubble color corresponds to the -log10(p-value) of the association between the indicated gene and metabolite in the discovery (CLSA) dataset. Bubble size represents the -log10(p-value) of the association between the indicated gene and metabolite in the validation (METSIM) dataset. P values (uncorrected) are from the METSIM and CLSA TWAS analyses.
b. Bipartite plot displaying the unknown gene-metabolite associations in the distal region.
Extended Data Figure 5. Biochemical characterization of SLC25A48 as a mediator of mitochondrial choline import.
a. ICE sequencing results for the generated single clones of HEK293T SLC25A48 knockout cells.
b. Schematic of 3xFLAG-SLC25A48 construct.
c. Immunoblot of SLC25A48 (Flag) in HEK293T knockout cells expressing an empty control vector or 3xFLAG-SLC25A48 cDNA. Tubulin was used as a loading control.
d. Immunoblot of indicated proteins in input (whole cell) and immunopurified mitochondria from HEK293T SLC25A48 knockout cells expressing a vector control or 3xFLAG-SLC25A48 cDNA.
e. Volcano plot with -log10(q-value) vs. log2 fold change in metabolite abundance normalized to ISTDs and protein concentration in input (whole cell) from HEK293T SLC25A48-knockout cells expressing a vector control or SLC25A48 cDNA. The dotted line is the significance threshold of q-value < 0.05.
f. Betaine [M+2]/phosphocholine [M+2] abundance ratio in HEK293T SLC25A48-knockout cells expressing an empty vector control or 3xFLAG-SLC25A48 cDNA after incubation with [1,2-¹³C₂]choline for the indicated time points. Data are shown as individual points, normalized to ISTDs and number of cells seeded; n = 3 biological replicates. The line indicates the mean at each time point.
g. Barplot of betaine [M+2] abundance in HEK293T SLC25A48-knockout cells (Clone 2) expressing an empty vector control or 3xFLAG-SLC25A48 cDNA after incubation with [1,2-¹³C₂]choline for 2 hours. Data are mean ± standard deviation, normalized to ISTDs and number of cells seeded; n = 3 biological replicates.
h. Barplot of phosphocholine [M+2] abundance in HEK293T SLC25A48-knockout cells (Clone 2) expressing an empty vector control or 3xFLAG-SLC25A48 cDNA after incubation with [1,2-¹³C₂]choline for 2 hours. Data are mean ± standard deviation, normalized to ISTDs and number of cells seeded; n = 3 biological replicates.
i. Barplot showing the ratio of betaine [M+2] to phosphocholine [M+2] abundance in HEK293 parental and SLC25A48-knockout cells after incubation with [1,2-¹³C₂]choline for 24 hours. Data are mean ratios ± standard deviation; metabolite abundances were normalized to ISTDs and number of cells seeded. All experiments were repeated at least twice. Statistical tests: two-tailed unpaired t tests followed by Benjamini, Krieger and Yekutieli multiple test correction (e); two-way repeated-measures (RM) ANOVA followed by post hoc Bonferroni correction (f); two-tailed unpaired t test (g, h, i).
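For panel e, the two-stage step-up procedure of Benjamini, Krieger and Yekutieli can be sketched in a few lines: Benjamini-Hochberg (BH) is first run at level q/(1 + q) to estimate the number of true null hypotheses as m̂₀ = m − r₁, and BH is then rerun at level q·m/((1 + q)·m̂₀). This minimal sketch returns reject/retain decisions at a fixed q rather than the per-metabolite q-values plotted in the volcano plot, and the function names are hypothetical.

```python
def bh_reject_count(pvals, level):
    """Number of rejections from the Benjamini-Hochberg step-up procedure at `level`."""
    m = len(pvals)
    r = 0
    for i, p in enumerate(sorted(pvals), start=1):
        if p <= i * level / m:
            r = i  # step-up: keep the largest i that passes
    return r


def bky_two_stage(pvals, q=0.05):
    """Reject/retain decisions (in input order) from the BKY (2006) two-stage step-up procedure."""
    m = len(pvals)
    q1 = q / (1 + q)
    r1 = bh_reject_count(pvals, q1)  # stage 1
    if r1 == 0:
        return [False] * m
    if r1 == m:
        return [True] * m
    m0_hat = m - r1  # estimated number of true nulls
    r2 = bh_reject_count(pvals, q1 * m / m0_hat)  # stage 2 at the adapted level
    threshold = sorted(pvals)[r2 - 1] if r2 > 0 else -1.0
    return [p <= threshold for p in pvals]
```

For example, with P values [0.001, 0.002, 0.003, 0.004, 0.06, 0.9], plain BH at 0.05 rejects four hypotheses while the two-stage procedure rejects five, because the estimate m̂₀ = 2 relaxes the second-stage level.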
Extended Data Figure 6. Phenomic consequences of SLC25A48 dysfunction.
a. QQ plot showing the observed and expected distributions of -log10(p-value) for the rare variant testing results on SLC25A48 pLoF variants. The gray line is y = x, the orange dashed line corresponds to FDR = 0.05, and the black dashed line is the Bonferroni threshold based on the number of phenotypes considered. P values are from the SKAT-O rare variant testing (Methods).
b. Schematic for replication of the UK Biobank rare variant results using analysis of genetically determined expression in BioVU, Vanderbilt University’s independent DNA biobank. The association of SLC25A48 with “Hereditary disturbances in tooth structure” replicated (p-value = 0.008) after Bonferroni correction in the analysis of genetically determined expression in BioVU.
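The reference lines in panel a of Extended Data Figure 6 can be computed along these lines. The sketch assumes expected quantiles of −log10(i/(n + 1)) for i = 1, …, n, a Bonferroni line at −log10(0.05/n) with n taken to be the number of phenotypes tested, and an FDR line drawn at the largest P value whose Benjamini-Hochberg adjusted value is at most 0.05. The legend does not state which plotting positions or FDR procedure were used, so these choices and the function names are assumptions.

```python
import math


def qq_points(pvals):
    """(expected, observed) -log10 P pairs, ordered from most to least significant."""
    n = len(pvals)
    observed = sorted((-math.log10(p) for p in pvals), reverse=True)
    expected = [-math.log10(i / (n + 1)) for i in range(1, n + 1)]
    return list(zip(expected, observed))


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted P values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def threshold_lines(pvals, alpha=0.05):
    """Heights (-log10 P) of the Bonferroni and FDR lines; FDR is None if nothing passes."""
    bonferroni = -math.log10(alpha / len(pvals))
    passing = [p for p, q in zip(pvals, bh_adjust(pvals)) if q <= alpha]
    fdr = -math.log10(max(passing)) if passing else None
    return bonferroni, fdr
```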
Supplementary Material
Supplementary Table 1. Annotation of Gene-Metabolite Pairs.
Supplementary Table 2. Minor Allele Frequency of rs200164783.
Supplementary Table 3. Nucleotide Sequences.
Acknowledgements
We thank all members of the Birsoy and Gamazon Labs for suggestions and also members of the Rockefeller University Proteomics Resource Center and the Flow Cytometry Resource Center. Some figures use modified illustrations from Servier Medical Art licensed under a Creative Commons Attribution 3.0 Unported License. A.K. is supported by a Boehringer Ingelheim Fonds PhD Fellowship. G.U. is a Damon Runyon Fellow supported by the Damon Runyon Cancer Research Foundation (DRG-2431-21). Y.L. is supported by NIH/NCI 1F99CA284249-01. T.C.K. is supported by NIH/NIDDK (F32 DK127836), the Shapiro-Silverberg Fund for the Advancement of Translational Research, and a Merck Postdoctoral Fellowship at The Rockefeller University. K.B. is supported by the NIH/NIDDK (R01 DK123323-01) and a Mark Foundation Emerging Leader Award and is a Searle and Pew-Stewart Scholar. E.R.G. is supported by NIH/NHGRI (R01HG011138), NIH/NIGMS (R01GM140287), NIH/NIA (R56AG068026), NIH Office of the Director (U24OD035523), and a Genomic Innovator Award (R35HG010718). UK Biobank data were accessed under application number 94960. The Genotype-Tissue Expression (GTEx) Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health, and by NCI, NHGRI, NHLBI, NIDA, NIMH and NINDS. We thank Dr. Edward Karoly (Metabolon) for help with testing the veracity of our predictions for uncharacterized metabolites.
Footnotes
Ethics declarations
Ethics
Our study complies with all relevant ethical regulations. The BioVU analysis is covered under Vanderbilt University Medical Center Institutional Review Board (IRB) # 151187. As indicated in the IRB approval, the study does not qualify as “human subject” research per §46.102(f)(2). UK Biobank has approval from the North West Multi-centre Research Ethics Committee (REC reference 13/NW/0157) as a Research Tissue Bank (RTB). This approval means that investigators do not require separate ethical clearance and can operate under the RTB approval. We used summary statistics for CLSA, METSIM and GCKD from the GWAS Catalog (see Data availability). CLSA was approved by the research ethics boards of the Jewish General Hospital, protocol number 2021–2762. METSIM was approved by the Ethics Committee at the University of Eastern Finland and the Institutional Review Board at the University of Michigan. GCKD was approved by the local ethics committees of the participating institutions (universities or medical faculties of Aachen, Berlin, Erlangen, Freiburg, Hannover, Heidelberg, Jena, München and Würzburg). Summary statistics for INTERVAL are also publicly available (see Data availability); the INTERVAL study was approved by the National Research Ethics Service (11/EE/0538).
Competing Interests
K.B. is a scientific advisor to Nanocare Pharmaceuticals and Atavistik Bio. The other authors declare no competing interests.
Code Availability
Code used for analysis in this study is available on GitHub https://github.com/gamazonlab/GeneMAP/ as well as on Zenodo (https://zenodo.org/doi/10.5281/zenodo.11156916)60.
Data Availability
We provide open access to the generated results for academic use through an interactive webserver (https://birsoylab.rockefeller.edu/page/genemap/). In the study, we used the publicly available summary statistics for CLSA from the GWAS catalog (https://www.ebi.ac.uk/gwas/) with accession numbers GCST90199621–90201020, METSIM with accession numbers GCST90139389-GCST90139409, GCST90139411-GCST90139491, GCST90139493-GCST90139502, GCST90139504-GCST90139575, GCST90139577-GCST90139640, GCST90139642-GCST90139714, GCST90139716-GCST90139847, GCST90139849-GCST90139891, GCST90139893-GCST90140217, GCST90140220-GCST90140276, GCST90140280-GCST90140282, GCST90140285-GCST90140312, GCST90140314, GCST90140316-GCST90140323, GCST90140325, GCST90140327, GCST90140330-GCST90140339, GCST90140341, GCST90140345-GCST90140406, GCST90140408-GCST90140420, GCST90140422-GCST90140476, GCST90140480-GCST90140482, GCST90140484-GCST90140487, GCST90140489-GCST90140496, GCST90140498-GCST90140515, GCST90140517-GCST90140530, GCST90140532-GCST90140609, GCST90140611-GCST90140648, GCST90140650-GCST90140662, GCST90140664-GCST90140665, GCST90140667-GCST90140677, GCST90140679, GCST90140681, GCST90140683-GCST90140696, GCST90140698-GCST90140701, GCST90140704, GCST90140708-GCST90140711, GCST90140713, GCST90140715-GCST90140718, GCST90140725, GCST90140728-GCST90140730, GCST90140733, GCST90140735, GCST90140741, GCST90140743-GCST90140744, GCST90140751-GCST90140752, GCST90140757-GCST90140762, GCST90140769, GCST90140771-GCST90140777, GCST90140779, GCST90140781-GCST90140786, GCST90140790-GCST90140794, GCST90140796-GCST90140799, GCST90140801-GCST90140802, GCST90140806-GCST90140813, GCST90140819, GCST90140825, GCST90140830, GCST90140832, GCST90140834-GCST90140837, GCST90140844, GCST90140849-GCST90140853, GCST90140855-GCST90140856, GCST90140858-GCST90140874, GCST90140884-GCST90140890, GCST90140899-GCST90140901, GCST90140903-GCST90140906, GCST90140910, GCST90140912-GCST90140913, GCST90140915-GCST90140917, GCST90140924, GCST90140927-GCST90140932, INTERVAL (https://app.box.com/s/rf6p81j3o507e8c5saywtlc1p91f8po9/folder/193817919002), and GCKD from the GWAS catalog with accession numbers GCST90264176–GCST90266872. The BioVU results are made available in this study. All requests for raw (genotype and phenotype) data and materials in BioVU are reviewed by Vanderbilt University Medical Center to determine whether the request is subject to any intellectual property or confidentiality obligations. For example, patient-related data not included in the paper may be subject to patient confidentiality. Any such data and materials that can be shared will be released via a material transfer agreement. Additional information on data access can be found on the Vanderbilt Institute for Clinical and Translational Research (VICTR) website (https://victr.vumc.org/how-to-use-biovu/). Source data are provided with this paper. Additional details (code and source files) can be found on Zenodo (https://zenodo.org/doi/10.5281/zenodo.11156916)60.
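The GWAS Catalog accessions above are given in compressed range notation (for example, GCST90139389-GCST90139409, or GCST90199621–90201020 with the prefix dropped after the dash). A minimal, hypothetical Python helper for expanding such a list into individual accession IDs, e.g. to assemble a download list, might look like this; it only parses the text and does not query the GWAS Catalog.

```python
import re

# "GCST" + digits, optionally followed by "-" or an en dash and a range end that may omit the prefix.
_ACCESSION = re.compile(r"GCST(\d+)(?:\s*[-\u2013]\s*(?:GCST)?(\d+))?")


def expand_accessions(text):
    """Expand comma-separated GWAS Catalog accessions and ranges into individual IDs."""
    ids = []
    for match in _ACCESSION.finditer(text):
        start_digits, end_digits = match.group(1), match.group(2)
        start = int(start_digits)
        end = int(end_digits) if end_digits else start
        width = len(start_digits)  # preserve any zero padding of the start accession
        ids.extend(f"GCST{n:0{width}d}" for n in range(start, end + 1))
    return ids
```

Applied to the CLSA range GCST90199621–90201020, it yields 1,400 accession IDs.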
References
- 1. Prosser GA, Larrouy-Maumus G & de Carvalho LPS. Metabolomic strategies for the identification of new enzyme functions and metabolic pathways. EMBO Rep 15, 657 (2014).
- 2. Pizzagalli MD, Bensimon A & Superti-Furga G. A guide to plasma membrane solute carrier proteins. FEBS J 288, 2784 (2021).
- 3. Wiedmer T, Ingles-Prieto A, Goldmann U, Steppan CM & Superti-Furga G. Accelerating SLC Transporter Research: Streamlining Knowledge and Validated Tools. Clin Pharmacol Ther 112, 439–442 (2022).
- 4. César-Razquin A et al. A Call for Systematic Research on Solute Carriers. Cell 162, 478–487 (2015).
- 5. Shi X et al. Combinatorial GxGxE CRISPR screen identifies SLC25A39 in mitochondrial glutathione transport linking iron homeostasis to OXPHOS. Nature Communications 13, 1–15 (2022).
- 6. Kenny TC et al. Integrative genetic analysis identifies FLVCR1 as a plasma-membrane choline transporter in mammals. Cell Metab 35, 1057–1071.e12 (2023).
- 7. Wang Y et al. SLC25A39 is necessary for mitochondrial glutathione import in mammalian cells. Nature 599, 136–140 (2021).
- 8. Unlu G et al. Metabolic-scale gene activation screens identify SLCO2B1 as a heme transporter that enhances cellular iron availability. Mol Cell 82, 2832–2843.e7 (2022).
- 9. Dvorak V et al. An Overview of Cell-Based Assay Platforms for the Solute Carrier Family of Transporters. Front Pharmacol 12 (2021).
- 10. Barroso I & McCarthy MI. The genetic basis of metabolic disease. Cell 177, 146 (2019).
- 11. Rios S et al. Plasma metabolite profiles associated with the World Cancer Research Fund/American Institute for Cancer Research lifestyle score and future risk of cardiovascular disease and type 2 diabetes. Cardiovasc Diabetol 22, 252 (2023).
- 12. Wang F et al. Plasma metabolomic profiles associated with mortality and longevity in a prospective analysis of 13,512 individuals. Nature Communications 14, 1–11 (2023).
- 13. Schlosser P et al. Genetic studies of paired metabolomes reveal enzymatic and transport processes at the interface of plasma and urine. Nature Genetics 55, 995–1008 (2023).
- 14. Yin X et al. Genome-wide association studies of metabolites in Finnish men identify disease-relevant loci. Nature Communications 13, 1–14 (2022).
- 15. Chen Y et al. Genomic atlas of the plasma metabolome prioritizes metabolites implicated in human diseases. Nature Genetics 55, 44–53 (2023).
- 16. Surendran P et al. Rare and common genetic determinants of metabolic individuality and their effects on human health. Nature Medicine 28, 2321–2332 (2022).
- 17. Yin X et al. Integrating transcriptomics, metabolomics, and GWAS helps reveal molecular mechanisms for metabolite levels and disease risk. The American Journal of Human Genetics 109 (2022).
- 18. Lotta LA et al. A cross-platform approach identifies genetic regulators of human metabolism and health. Nature Genetics 53, 54–64 (2021).
- 19. Gamazon ER et al. A gene-based association method for mapping traits using reference transcriptome data. Nat Genet 47, 1091 (2015).
- 20. Gusev A et al. Integrative approaches for large-scale transcriptome-wide association studies. Nature Genetics 48, 245–252 (2016).
- 21. Porcu E et al. Mendelian randomization integrating GWAS and eQTL data reveals genetic determinants of complex and clinical traits. Nature Communications 10, 1–12 (2019).
- 22. Barbeira AN et al. Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics. Nature Communications 9, 1–20 (2018).
- 23. Zhou D et al. A unified framework for joint-tissue transcriptome-wide association and Mendelian randomization analysis. Nature Genetics 52, 1239–1246 (2020).
- 24. Mogil LS et al. Genetic architecture of gene expression traits across diverse populations. PLoS Genet 14 (2018).
- 25. Storey JD & Tibshirani R. Statistical significance for genomewide studies. Proc Natl Acad Sci U S A 100, 9440 (2003).
- 26. Jumper J et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
- 27. Cheng J et al. Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science 381, eadg7492 (2023).
- 28. Rentzsch P, Schubach M, Shendure J & Kircher M. CADD-Splice-improving genome-wide variant effect prediction using deep learning-derived splice scores. Genome Med 13 (2021).
- 29. Jeong H, Tombor B, Albert R, Oltvai ZN & Barabási AL. The large-scale organization of metabolic networks. Nature 407, 651–654 (2000).
- 30. Stacey D et al. ProGeM: a framework for the prioritization of candidate causal genes at molecular quantitative trait loci. Nucleic Acids Res 47, e3 (2019).
- 31. Li F, Chen Y, Anton M & Nielsen J. GotEnzymes: an extensive database of enzyme parameter predictions. Nucleic Acids Res 51, D583–D586 (2023).
- 32. Aguet F et al. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).
- 33. Zeisel SH & Da Costa KA. Choline: An Essential Nutrient for Public Health. Nutr Rev 67, 615 (2009).
- 34. Combs GF Jr & McClung JP. Quasi-vitamins. The Vitamins 523–589 (2022) doi: 10.1016/B978-0-323-90473-5.00007-0.
- 35. Kennedy EP & Weiss SB. The function of cytidine coenzymes in the biosynthesis of phospholipides. Journal of Biological Chemistry 222, 193–214 (1956).
- 36. Ducker GS & Rabinowitz JD. One-Carbon Metabolism in Health and Disease. Cell Metab 25, 27 (2017).
- 37. Dragolovich J. Dealing with salt stress in animal cells: The role and regulation of glycine betaine concentrations. Journal of Experimental Zoology 268, 139–144 (1994).
- 38. Ueland PM. Choline and betaine in health and disease. J Inherit Metab Dis 34, 3–15 (2011).
- 39. Chen WW, Freinkman E, Wang T, Birsoy K & Sabatini DM. Absolute Quantification of Matrix Metabolites Reveals the Dynamics of Mitochondrial Metabolism. Cell 166, 1324–1337.e11 (2016).
- 40. Palmieri F. The mitochondrial transporter family SLC25: Identification, properties and physiopathology. Mol Aspects Med 34, 465–484 (2013).
- 41. Chen S et al. A genome-wide mutational constraint map quantified from variation in 76,156 human genomes. bioRxiv 2022.03.20.485034 (2022) doi: 10.1101/2022.03.20.485034.
- 42. Ri K et al. Structural and mechanistic insights into human choline and ethanolamine transport. bioRxiv 2023.09.15.557925 (2023) doi: 10.1101/2023.09.15.557925.
- 43. Son Y, Kenny TC, Khan A, Birsoy K & Hite RK. Structural basis of lipid head group entry to the Kennedy pathway by FLVCR1. Nature 1–7 (2024) doi: 10.1038/s41586-024-07374-4.
- 44. Verkerke ARP, Shi X, Abe I, Gerszten RE & Kajimura S. Mitochondrial choline import regulates purine nucleotide pools via SLC25A48. bioRxiv 2023.12.31.573776 (2024) doi: 10.1101/2023.12.31.573776.
- 45. Patil S et al. SLC25A48 is a human mitochondrial choline transporter. medRxiv 2023.12.04.23299390 (2023) doi: 10.1101/2023.12.04.23299390.
Methods-only References
- 46. Ardlie KG et al. The Genotype-Tissue Expression (GTEx) pilot analysis: Multitissue gene regulation in humans. Science 348, 648–660 (2015).
- 47. Giambartolomei C et al. Bayesian Test for Colocalisation between Pairs of Genetic Association Studies Using Summary Statistics. PLoS Genet 10, e1004383 (2014).
- 48. Lonsdale J et al. The Genotype-Tissue Expression (GTEx) project. Nature Genetics 45, 580–585 (2013).
- 49. Yavorska OO & Burgess S. MendelianRandomization: an R package for performing Mendelian randomization analyses using summarized data. Int J Epidemiol 46, 1734–1739 (2017).
- 50. Burgess S, Thompson SG & CRP CHD Genetics Collaboration. Avoiding bias from weak instruments in Mendelian randomization studies. Int J Epidemiol 40, 755–764 (2011).
- 51. Foley CN et al. A fast and efficient colocalization algorithm for identifying shared genetic risk factors across multiple traits. Nature Communications 12, 1–18 (2021).
- 52. Wu P et al. Mapping ICD-10 and ICD-10-CM Codes to Phecodes: Workflow Development and Initial Evaluation. JMIR Med Inform 7 (2019).
- 53. Wu MC et al. Rare-Variant Association Testing for Sequencing Data with the Sequence Kernel Association Test. Am J Hum Genet 89, 82 (2011).
- 54. Choi SW & O’Reilly PF. PRSice-2: Polygenic Risk Score software for biobank-scale data. Gigascience 8, 1–6 (2019).
- 55. Xu Y et al. An atlas of genetic scores to predict multi-omic traits. Nature 616, 123–131 (2023).
- 56. Price AL et al. Principal components analysis corrects for stratification in genome-wide association studies. Nature Genetics 38, 904–909 (2006).
- 57. Loos M, Gerber C, Corona F, Hollender J & Singer H. Accelerated isotope fine structure calculation using pruned transition trees. Anal Chem 87, 5738–5744 (2015).
- 58. Pino LK et al. The Skyline ecosystem: Informatics for quantitative mass spectrometry proteomics. Mass Spectrom Rev 39, 229–244 (2020).
- 59. Boughton AP et al. LocusZoom.js: interactive and embeddable visualization of genetic association study results. Bioinformatics 37, 3017–3018 (2021).
- 60. Khan A et al. GeneMAP. Zenodo https://zenodo.org/records/11156917 (2024).