Abstract
Variation in the blood metabolome is intimately related to human health. However, few details are known about the interplay between genetics and the microbiome in explaining this variation on a metabolite-by-metabolite level. Here, we perform analyses of variance for each of the 930 blood metabolites robustly detected across a cohort of 1,569 individuals with paired genomic and microbiome data, while controlling for a number of relevant covariates. We find that 595 (64%) of these blood metabolites are significantly associated with either host genetics or the gut microbiome, with 69% of these associations driven solely by the microbiome, 15% driven solely by genetics, and 16% under hybrid genome-microbiome control. Additionally, interaction effects, where a metabolite-microbe association is specific to a particular genetic background, are quite common, albeit with modest effect sizes. This knowledge will help to guide targeted interventions designed to alter the composition of the human blood metabolome.
Editor summary:
Diener and Dai et al analyse blood metabolites from 1569 individuals and identify metabolites associated with the microbiome, host genetics, or under hybrid genetic-microbiome control
Introduction
The human blood metabolome is shaped by a combination of intrinsic and extrinsic forces and constitutes the primary resource pool for human metabolism. While the composition of the plasma metabolite pool is strongly driven by diet, lifestyle, and the ecology of the gut microbiota, the fate of individual metabolites in the blood is often tightly regulated by host genetics 1.
Genetic variants are known to alter the human blood metabolome in several disease-relevant contexts. For example, certain deleterious alleles impact cholesterol levels and contribute to hypercholesterolemia, while other well-known alleles drive phenylalanine accumulation and lead to phenylketonuria 2,3. The majority of genetic variants associated with blood metabolite levels affect either enzymes or solute carriers, thus directly influencing an individual’s ability to produce, consume, secrete, or absorb small molecules 4,5. Many of the currently identified metabolome-associated genetic variants are pleiotropic 6, which indicates a vast interconnectedness of the metabolome across different systems of the body.
Recent studies have identified the human gut microbiome as a major determinant of blood metabolite variability 1,7. In this prior work, host genetics and gut microbiome composition were found to associate with plasma metabolite levels in a largely orthogonal manner, which is supported by the observation that, for the most part, host genetics and the gut microbiota do not tend to associate strongly with one another 8, even though a number of clear examples of genome-microbiome associations have been shown 9.
Despite these broad, multivariate regression results showing a global correspondence between genomics, the microbiome, and the metabolome, little is known about how this maps onto individual blood metabolite levels. While one might expect that the variation in metabolites produced by bacteria in the human gut is mostly governed by the microbiota and that metabolites specific to human metabolic processes are associated more with host genetics, there are examples, as in the case of microbe-host co-metabolites like conjugated secondary bile acids 10, where the story becomes more complicated.
Intestinal signaling, such as the activation of FXR and TGR5 receptors in the human gut, can regulate glucose, insulin, cholesterol, and bile acid homeostasis 11,12. Furthermore, many microbiome-derived metabolites are modified by hepatic enzymes and converted into a variety of conjugated compounds, such as hippurate or polyamines 13,14. These multi-layer filters on microbe-host co-metabolism make it challenging to map blood metabolites to potential microbial precursors. For example, blood cholesterol levels are affected by host genetic variants, but they can also be regulated through intestinal signaling, involving the production of secondary bile acids by gut commensals, or through gut microbial cholesterol dehydrogenases that funnel host and dietary cholesterol into fecal coprostanol 15,16. Thus, even though host genetics and the microbiome appear to have largely orthogonal effects on the blood metabolome as a whole, they nonetheless act on an overlapping set of metabolites, potentially explaining independent components of the variance of these compounds.
Additionally, even in the absence of strong heritability of human gut commensals, there are key examples where genetics and the microbiome may interact in particular disease conditions, such as cystic fibrosis 17. This raises the question of whether there are instances where microbiome-metabolome associations are modulated by the genetic background of the host in a similar manner as other gene-environment interactions 18, where a genetic variant may augment or attenuate an individual’s risk of developing a particular disease when exposed to a specific environmental risk factor.
Here, we studied variability in plasma metabolite abundances in a cohort of 1,569 healthy individuals from the U.S. We find extensive interplay between the genetic and gut microbial determinants of individual metabolite levels in the blood, which provides deep insights into the microbe-host co-metabolisms that govern the composition of the human blood metabolome.
Results
Metabolites associated with genetics and the microbiome
We performed genome-metabolite and microbiome-metabolite association tests for a healthy cohort of 1,569 individuals with paired blood and fecal samples taken within 30 days of each other. The cohort consisted of individuals in the U.S., mostly from the Pacific West, that had enrolled in a scientific wellness program of the former company, Arivale, Inc., between 2016 and 2019. These individuals were generally healthy, between 18 and 87 years old, predominantly female (63.5%) and with a mean±SD body mass index (BMI) of 28.2±6.6. Metabolite abundances across the data set were first adjusted for common confounders and the residuals were used for further downstream analyses (927 metabolites associated with at least one confounding variable, FDR-corrected p<0.05, mean R2=0.12±0.06). The cohort was split randomly into a training cohort of 1,000 individuals and a validation cohort of 569 individuals. Microbiome-metabolite and genome-metabolite association analyses were performed within the training cohort and the resulting training cohort coefficients were fixed in the validation cohort models to generate out-of-sample R2 estimates (Fig. 1A).
Figure 1. Study design and cross-sectional variances in metabolites explained by host genetics, gut bacterial genera, or both.

(A) The study comprised a metabolome-genome-wide association analysis and a metabolome-microbiome-wide association analysis (regressions on centered log-ratio transformed bacterial genus-level abundances) performed in the Arivale cohort. Explained variance for a specific blood metabolite can be partitioned into host genetic and microbiome associated components. The cohort was split into a training cohort of 1,000 individuals and a validation cohort of 569 individuals. (B) Fraction of variance (R2) explained in the training cohort by host genetic or microbiome features across the 595 metabolites with significant associations with either the genome or the microbiome (FDR-corrected p<0.05). Specific sub-groups are annotated in the plot, including metabolites only associated with host genetics, metabolites only associated with gut genera, or metabolites significantly associated with both (i.e., hybrid). (C) Percentage of the 595 significant metabolites associated only with host genetics, only with the microbiome, or with both (hybrid) in the training cohort. (D) Fraction of variance (R2) explained in the validation cohort. Coefficients were fixed to those from the training cohort to estimate out-of-sample R2 values. (E) Percentage of the 595 significant metabolites associated only with host genetics, only with the microbiome, or with both (hybrid) in the validation cohort. (F) Comparison of the R2 values obtained in this study with the R2 values obtained in Bar, Korem et al. 1 for 159 metabolites identified as significant in both studies. The solid line denotes an ordinary least squares regression line, and the error bands denotes the 95% confidence interval of the regression. P-values were obtained by a Pearson correlation test and “r” denotes the Pearson product-moment correlation coefficient.
To identify associations between host genetics and individual circulating blood metabolite levels, we performed genome-wide association analyses on 7.68 million common variants (Minor Allele Frequency ≥ 1%) in the 1,000 individuals from the training cohort for each of the 930 blood metabolites that passed our prevalence threshold. A total of 182 metabolites of the 930 tested (19.6%) were associated with one or more of the 297 independent lead variants that passed the genome-wide significance threshold (p < 5.37 × 10−11; see Fig. S1). Of these 297 variants, 79 were in intergenic regions and 218 variants mapped to 123 genes, including those associated with inherited metabolic disorders such as ACADS, ALMS1, and GCKR. Potential loss-of-function consequences were predicted for two variants: stop gained for rs183603441 in HYKK, and a splice donor variant for rs61825638 in KMO. Missense mutations were predicted for 25 variants, affecting 39 metabolites. However, the majority of the metabolite-associated variants found within genes were synonymous.
In particular, lactosylceramide lactosyl-N-nervonoyl-sphingosine (d18:1/24:1) was associated with lead variants rs3752246 (P = 9.8 × 10−14) and rs3752248 (P = 5.6 × 10−13) located in ABCA7. rs3752246 is a missense variant and has been previously identified as a risk variant for late-onset Alzheimer’s disease 19,20. Prior studies have identified altered levels of ceramides in the blood and brain of individuals with Alzheimer’s Disease, with the d18:1/24:1 ceramide showing a strong and robust association with Alzheimer’s Disease in a recent meta-analysis 21. Our results further underscore a possible shared genetic architecture underlying lactosyl-N-nervonoyl-sphingosine (d18:1/24:1) levels and late-onset Alzheimer’s disease.
Associations with more than one metabolite were identified for 70 of the 297 significant lead variants (23.6%), revealing the extent of pleiotropy in our cohort. The effect sizes of pleiotropic variants are generally higher than non-pleiotropic variants (median effect size difference = 0.08, P = 0.04; two-sided Mann-Whitney U-test), while the distributions of minor allele frequencies were similar between the two types of variants. Overall, the 70 pleiotropic lead variants were associated with 92 metabolites (52.2% of all significant genetically-associated metabolites). Four variants (rs4149056, rs1047891, rs148982377, and rs45446698), of which rs4149056 and rs1047891 are missense variants, were each associated with more than 8 metabolites. rs4149056 in SLCO1B1 (solute transporter in liver) was associated with 8 metabolites, including primary and secondary bile acids, conjugates of polyunsaturated fatty acids, and free fatty acids. rs1047891 in CPS1 (mitochondrial enzyme involved in the urea cycle) was found to be associated with 8 metabolites, many of which were conjugated to glycine and glutamine moieties and were highly correlated. rs148982377 in ZNF789 and rs45446698 in ZSCAN25 (transcription factors) were associated with the same set of 9 steroid hormone metabolites, primarily conjugates of DHEA and androsterone.
Analysis at the gene level reveals a further degree of pleiotropy as well as polygenicity. Using annotations from dbSNP, we identified associations between genes and metabolites. 182 metabolites were associated with at least one gene, with 123 significant genes identified in total. Of these, 35 genes (27.8%) were associated with more than one metabolite. Individual pleiotropic genes tend to be associated with metabolites with similar biochemical properties, providing insights into unidentified metabolites. For example, the gene cluster of UGT1A1, UGT1A3, UGT1A4, UGT1A5, UGT1A6, UGT1A7, UGT1A8, UGT1A9, and UGT1A10 was associated with 10 metabolites, including bilirubin, biliverdin, and five unidentified compounds. These genes encode UDP-glucuronosyltransferase (UGT) enzymes, which metabolize bilirubin 22. Consistent with our results, Metabolon later determined that these five unidentified compounds were degradation products of bilirubin. In addition to pleiotropy, we also observe instances of polygenic associations for several metabolites. For example, ethylmalonate, a breakdown product of butyrate, was significantly associated with six different genes: ACADS, SPPL3, CABP1, OASL, DYNLL1, and MLEC.
In order to identify associations between the gut microbiome and circulating blood metabolite levels, we performed regressions using centered-log-ratio (CLR) transformed bacterial genus-level abundances as independent variables, while correcting for sex, age, sex-age interactions, BMI, and genetic kinship (i.e., the first 5 principal components of the genetic distance matrix). A majority of the tested metabolites had significant associations with at least one bacterial genus in the gut microbiome (508/930 = 54.6%, FDR-corrected p<0.05). Here, the average fraction of explained variance (R2) was lower than genetic associations (mean R2 of 0.04 and 0.11 for microbiome and host genetic features in the training set, respectively; 0.02 vs. 0.09 in the validation set, respectively). However, these R2 values are not directly comparable across genomic and microbiome models due to the different significance thresholds used (Bonferroni for the GWAS and FDR-correction for the microbiome).
The blood metabolites with the largest fraction of variance explained by microbial features were dominated by compounds involved in bacterial metabolism of aromatic and phenolic compounds, such as cinnamoylglycine, hydrocinnamate, and hippurate (R2>0.25 in the training cohort and R2>0.15 in the validation cohort). Catabolism of phenylalanine and phenylacetate to cinnamate and benzoate is exclusive to the microbiome and is conspicuously absent in germ-free mice 23,24. Hippurate is formed in the liver by conjugating benzoate to a glycine and blood levels of hippurate have been positively associated with gut microbiome alpha-diversity 7,25 and with overall metabolic health 26.
Additionally, we found several products of bacterial protein fermentation, such as p-cresol derivatives and the tryptophan breakdown product indolepropionate, associated with the gut microbiome (Tables S1–S2). Furthermore, we found that all secondary bile acids identified in plasma were significantly associated with the gut microbiome (FDR-corrected p<0.05), which is expected because primarily bile acids are deconjugated into secondary bile acids by bile salt hydrolases expressed by gut commensal bacteria 27. Here, isoursodeoxycholate had the largest fraction of variance explained by gut microbiome composition (R2 of 0.36 and 0.3 in the training and validation cohorts, respectively).
R2 estimates from the current study showed good agreement with prior estimates from an Israeli cohort1 for microbiome-metabolite associations (Pearson r=0.5, Fig. 1F) and excellent agreement for genome-metabolite associations (Pearson r=0.9, Fig. 1F). The largest differences in microbiome-metabolite variances across cohorts were observed mostly for xanthine derivatives and may be due to different microbiome sequencing strategies, confounder-adjustments, different statistical modeling approaches, or differences between serum (Bar, Korem, et al.) and plasma (this study).
Additive contributions of genetic and microbial factors
We identified a total of 595 out of 930 plasma metabolites (64%) that were associated with either genetic factors, microbial factors, or both (Fig. 1A–B). Most of these metabolites (508) were significantly associated with the microbiome. 16.3% and 11.4% of these metabolites showed “hybrid” associations in the training and validation cohorts, respectively, meaning they were significantly associated with both genetic and microbial factors (i.e., >1% of total explained variance explained by both sets of features, Fig. 1B–E). In particular, we found that about ⅓ − ½ of all metabolites with a significant genetic association also showed a microbial association (i.e., 82/161 and 57/160 metabolites associated with genetic variants in the training and validation cohorts, respectively). Conversely, no more than ⅕ of all metabolites with a microbiome association also showed a hybrid association with host genetics across training and validation cohorts. Consistent with the average explained variances of metabolites reported above for genetics and the microbiome, hybrid metabolites with particularly large total explained variances (i.e., >20%) showed a tendency to have higher genetic R2 values than microbial R2 values (see Fig. 1B,D and Fig. 2A). However, as mentioned above, these patterns should be interpreted with caution because of the different significance thresholds used for microbiome and genetic association tests.
Figure 2. Metabolites associated with both genetics and the microbiome.

(A) The 20 metabolites with the highest total R2 in the validation cohort that were significantly associated with both host genetics and the microbiome in the training cohort. Blue and green colored bars denote individual R2 values in genetics-only (blue) and microbiome-only (green) regression models, respectively. Gray bars denote R2 values from regression using joined genetics and microbiome data. (B) R2 values obtained by either adding individual contributions of genetics and the microbiome (additive) or by performing a joint regression (joint) for all metabolites in the hybrid group with an identified standard. The difference between the two groups indicates a very small, but nonetheless significant, overlap in variance explained by genetics and the microbiome (n=82 metabolites in each group, two-sided Mann-Whitney test). However, the variances explained by host genetics and the microbiome were largely additive. Boxes denote the interquartile range (IQR) and the center line denotes the median. Whiskers extend up to the largest and smallest point within 1.5 IQRs. Stars denote significance (*** - p<0.001). Metabolite names ending in “*” or “**” indicate compounds for which a standard is not available, but Metabolon is confident in its identity.
In order to test whether genetic and microbial factors contained redundant information, we compared a joint genetics-microbiome model to individual genetic and microbial models. If the overlap in variance explained between genetic and microbial feature sets was large, the joint model R2 value would be substantially smaller than the sum of the individual model R2 values (see Fig. 2A). Although we found that the difference in R2 between the joint model and the sum of the individual model R2 values was statistically significant in the validation cohort across a very large number of models, the magnitude of this difference was extremely small (median difference in R2<0.003), indicating that genetics and the microbiome explain nearly exclusive components of the variance for a given metabolite. This result is consistent with prior work showing that the variances in the blood metabolome explained by the microbiome and host genetics were largely orthogonal 1,8. However, here we demonstrate that this is not only true globally, but at the individual metabolite level as well.
Genome-microbiome interaction effects influence metabolites
Finally, we investigated whether the genetic background of the host could modulate microbiome-metabolite associations (see Fig. 3A). To this end, we tested all genetic variant / bacterial genus / metabolite triplets across our full cohort (N=1,569), filtering for only those variant-metabolite pairs that were previously detected in the genome-wide associations studies (i.e., 23,126 triplets in total). We employed a strategy similar to gene-environment interaction studies, using the CLR transformed abundances of microbial genera as the environmental variable, while adjusting for the covariates mentioned above 28. 466 interaction effects were deemed significant under an FDR-corrected p-value cutoff of 0.05, involving 69 distinct metabolites. Though gene-microbiome interactions were quite common, they only explained very small fractions of plasma metabolite variability (mean R2=0.005) across all genotypes. However, gene-microbiome-metabolite interactions often showed appreciable R2 in the homozygous minor allele genotypes, where they explained up to 16% of the metabolite variances (Fig. 4).
Figure 3. Gene-microbiome interactions.

(A) Gene-microbiome interactions occur when the correlation between a gut bacterial genus and a blood metabolite is itself conditional on a specific allele pairing. (B) Pathway enrichment analysis for metabolites involved in significant gene-microbe interactions. (C) Bacterial family-level enrichment analysis for taxa involved in significant gene-microbe interactions. (D) Host gene enrichment analysis for genes involved in significant gene-microbe interactions. In (B-D) red circles denote prevalence across all tested features (i.e., background prevalence) and blue triangles denote prevalence in only those features with significant interaction terms. Stars denote significantly enriched features under a one-sided hypergeometric test with a Benjamini-Hochberg correction for multiple testing (* - p<0.05, ** - p<0.01, *** - p<0.001). (B-D) include data from the entire cohort (n=1,569).
Figure 4. Gene-microbiome interactions explain variation in blood metabolite levels.

Shown are six selected examples where the genomic background modulates the associations between bacterial genus-level abundances and metabolite levels. In (A-F) metabolite abundances were log-transformed and adjusted for common confounders described above. In (A-F) “ρ” denotes the Spearman’s Rank-Order correlation coefficient of the regression and “p” denotes the p-value under a Spearman’s correlation test. The solid line indicates an ordinary least squares regression line relating the transformed metabolite abundance with the transformed genus abundance within each group. The shaded area is the 95% confidence interval of the regression. Associations were corrected for sex, age, age2, sex:age, and sex:age2 interactions, BMI, microbiome vendor, metabolomics batch, and the first 5 principal components of genetic ancestry (N=1,569). Metabolite names ending in “*” or “**” indicate compounds for which a standard is not available, but Metabolon is confident in its identity.
Gene-microbiome interactions were significantly enriched in metabolites involved in fatty acid metabolism, phenylalanine and tyrosine metabolism, pyrimidine metabolism, and glycine metabolism (Fig. 3C, one-sided hypergeometric test, FDR-corrected p<0.05) and showed enrichment in several host genes, such as CPS1, AGXT2, and SLC2A9 (Fig. 3E, one-sided hypergeometric test FDR-corrected p<0.05). Additionally, we observed enrichment for interactions that involved the Ruminococcaceae family (Fig. 3D). We observed several instances of clinically relevant metabolites that showed stronger microbe-metabolite associations in the homozygous minor allele genotype. For example, we saw interactions with metabolites involved in a variety of disease conditions, such as hepatic failure, mitohormesis, cancer, and inflammatory diseases (Tables S3) 29–31. In particular, N-acetyltyrosine levels were associated with Ruminococcaceae UCG-002 and Bacteroides, but only in the homozygous minor allele genotypes for variants in the ALMS1 gene and in an unchracterized locus on chromosome 2 (Fig. 4A, E). Glycine levels switched from a slight negative correlation with Ruminococcaceae UBA1819 in the homozygous major allele genotype for a locus in the CPS1 gene to a stronger positive correlation within the homozygous minor allele genotype (Fig. 4B). As a final example, sphingomyelin levels were associated with Holdemania, but only in the homozygous minor allele genotype of a variant in the SYNE2 gene (Fig. 4C). All of the selected interactions could also be observed in the unadjusted data (Fig. S2).
Genetics associate with conjugated secondary bile acids
Having mapped plasma metabolite variability to genetic and microbial factors, we now asked whether the observed partitioning of variances explained may change along pathways that involve host-microbe co-metabolisms. To this end, we investigated the metabolism of secondary bile acids. Secondary bile acids are formed in the large intestine via microbial deconjugation of primary bile acids, reabsorbed into the bloodstream via the portal vein, and further metabolized in the liver 32. Thus, secondary bile acid levels in the bloodstream are influenced by both the microbiome and the host. We investigated whether individual bile acid species variances were predominantly explained by genetic factors, microbial factors, or both.
While unconjugated secondary bile acids were exclusively associated with the gut microbiome, 4 out of 6 detected secondary bile acid conjugates showed strong genetic components of their variances in both the training and validation cohorts (Fig. 5A). In particular, plasma deoxycholate and glycochenodeoxycholate abundances showed no significant genetic contribution, whereas more than 40% of the variance in the host-conjugated deoxycholate glucuronide and more than 20% of the variance in host-conjugated glycochenodexycholate glucuronide was explained by host genetics (Fig. 5B). Other glucuronidated non-bile-acid compounds, such as p-cresol glucuronide, did not show this kind of pattern (see Tables S1–S2). Modifications like glucuronidation or sulfation usually occur in the liver and are used to mark metabolites for excretion in feces or urine 33,34. These results indicate that the blood levels of conjugated secondary bile acids are under strong genetic control, while blood levels of unconjugated secondary bile acids are not.
Figure 5. Host genetic and gut microbial associations with secondary bile acids.

(A-C) Explained variances for unconjugated secondary bile acids and hepatic secondary bile acid conjugates in the training cohort (A), the validation cohort (B) and in Bar, Korem et. al.1 (C). (A-C) Gray lines connect deoxycholate and glycochenodeoxycholate with their glucuronidated forms and colored boxes denote unconjugated and conjugated secondary bile acids. (D) Associations between bacterial genera and secondary bile acids (both conjugated and unconjugated) in the training cohort. Only significant associations are shown (FDR-corrected and confounder-adjusted p<0.05, two-sided Pearson correlation test). Fill colors denote the Pearson product-moment correlation coefficients (see legend). (E) Colored cells denote associations that passed a genome-wide significance threshold in the GWAS and the color denotes the estimated coefficient in the GWAS regression. In (A-B, D-E) associations and variances were corrected for sex, age, age2, sex:age, and sex:age2 interactions, BMI, microbiome vendor, metabolomics batch, and the first 5 principal components of genetic ancestry. Metabolite names ending in “*” or “**” indicate compounds for which a standard is not available, but Metabolon is confident in its identity.
Most hepatically modified secondary bile acids showed associations with both genetic and microbial features. One exception was glycochenodeoxycholate glucuronide, which was only associated with genetics and not with the microbiome, even though the unconjugated form, glycochenodeoxycholate, was only associated with the microbiome (Fig. 5A). This result was consistent across the training and validation cohorts and in the Bar, Korem, et al. data set, where the same transition to genetic association after glucuronidation could be observed (Fig. 5A–C). The genomic variant rs4149056 was associated with 3 of the 6 modified secondary bile acids and has been shown previously to affect the abundance of bile acids in plasma 35 (see Fig. 5E). This variant is located in the solute carrier protein SLCO1B1, which is expressed in the liver 36 and transports secondary bile acids, with a preference for sulfated bile acids and bile salts 37. Several other variants in SLCO1B1 were associated with secondary bile acid derivatives. The two detected glucuronidated secondary bile acids were associated with distinctly different sets of SNPs. Glycochenodeoxycholate glucuronide plasma abundances were mostly affected by variants in the SLCO1B3-SLC1B7 gene cluster, whereas we found several variants located on chromosome 4, many of which were in the TMPRSS11E serine protease-encoding gene (and a number of intergenic variants), that specifically affected deoxycholate glucuronide levels (Fig. 5E).
Furthermore, we also identified at least two significant genome-microbiome-metabolite interactions affecting the levels of deoxycholate glucuronide and glycochenodeoxycholate glucuronide. An uncharacterized variant on chromosome 4 was associated with a decreased level of deoxycholate glucuronide in the homozygous minor allele genotype, but this effect was mitigated in individuals with higher levels of Eggerthella in their gut microbiomes (Fig. 4D). Furthermore, a variant in the SLCO1B1 gene showed a positive correlation with Ruminoclostrium only in the homozygous major allele genotype, although this ‘major’ allele was actually less common in our cohort (Fig. 4E).
Thus, whereas cross-sectional variations in unmodified secondary bile acid levels were not explained by host genetic variation, plasma abundances of several secondary bile acid conjugates were significantly associated with a diverse set of compound-specific genetic variants on chromosomes 4 and 12. Additionally, we observed that unmodified deoxycholate and host-conjugated deoxycholate glucuronide where associated with mutually exclusive sets of bacterial genera, despite sharing the same secondary bile acid backbone (Fig. 5D).
Sphingosines and ceramides show heterogeneous associations
We next asked whether hybrid associations would also be common in another disease-relevant class of blood metabolites: ceramides. High ceramide levels have been shown to be associated with insulin resistance, hypercholesterolemia, liver steatosis, and the formation of lipid rafts 38,39. Furthermore, some classes of ceramides in the blood increase the risk for late-onset Alzheimer’s disease, as they are neurotoxic and induce apoptosis 40,41. Ceramides are the simplest of the sphingolipids and are formed either by de novo synthesis from sphingosine or by hydrolysis of sphingomyelin molecules 42. While ceramides are rarely found in bacteria, many bacteria in the Bacteroidetes phylum can synthesize sphingolipids, which have been shown to be taken up and processed by human epithelial cells in vitro 43. Thus, we asked whether variation in ceramide and sphingosine derivatives levels in blood was significantly explained by microbial genera, host genetic factors, or a combination of both.
We observed a large degree of heterogeneity in variance partitioning in ceramide and sphingosine molecules (Fig. 6). Variance in sphingosine itself was only weakly explained by the composition of the gut microbiome (R2=0.01), with Streptococcus as the only significant microbiome-related association. However, other intermediates in ceramide synthesis, such as sphinganine derivatives, showed stronger microbiome and genome associations (i.e. up to 5% and 2% of variance explained in training and validation cohorts, respectively; Fig. 6A–B). Ceramides showed a broad range of explained variances. Whereas most ceramides were associated with the gut microbiome, a small subset had additional genetic components to their variances, such as lactosyl-N-nervonoyl-sphingosine and ceramide (d16:1/24:1, d18:1/22:1), both of which had more than 5% of their variation explained by genetic factors in the training and validation cohorts. Both of these lipid species are examples of very long-chain fatty-acyl sphingolipids. While shorter fatty acids (≤C18) are prevalent in the human diet 44, often serving as preferential substrates for ceramide synthesis by certain gut microbes, longer chain fatty acids, such as nervonic acid (24:1), are often the product of elongation by host enzymes before being incorporated into sphingolipids that are primarily found in brain tissue 45. This result was corroborated in Bar, Korem, et al., where only short-chain sphingolipids were observed to be associated with the microbiome 1. In summary, we observed broad heterogeneity in the fraction of variance explained by the microbiome and host genetics, where a difference in the fatty acid chain length can distinguish circulating ceramides associated with gut microbial genera (i.e., shorter chains) from those associated with host genetics (i.e., longer chains).
Figure 6. Host genetic and gut microbial associations with sphingosine and ceramides.

(A-C) Explained variances for sphingosine, sphingosine intermediates, and ceramides in the training cohort (A), the validation cohort (B) and the study from Bar, Korem et. al. (C). (D) Associations between bacterial genera and sphingosines/ceramides in the training cohort. Only significant associations are shown (FDR-corrected and confounder-adjusted p<0.05, two-sided Pearson correlation test). Fill colors denote the Pearson product-moment correlation coefficients (see legend). (E) Colored cells denote associations that passed a genome-wide significance threshold in the GWAS and the color denotes the estimated coefficient in the GWAS regression. In (A-B, D-E) associations and variances were corrected for sex, age, age2, sex:age, and sex:age2 interactions, BMI, microbiome vendor, metabolomics batch, and the first 5 principal components of genetic ancestry. Metabolite names ending in “*” or “**” indicate compounds for which a standard is not available, but Metabolon is confident in its identity.
Discussion
In this study, we partition the cross-sectional variance explained in individual blood metabolite levels into their host genetic and gut microbiome components, across a population of 1,569 generally healthy U.S. residents with paired genomic, microbiome, and blood metabolomic data. Prior studies have established that the plasma metabolome is intimately connected to host genetics as well as to the gut microbiome, and that overall genetic and microbial influences on the metabolome appear to be orthogonal. Similarly, we quantified the overlap between microbial and genetic variance components for a given metabolite and found that it was significant but very small (<0.03% of explained variance). This result has important implications for Mendelian randomization studies, which rely on this overlap (i.e., non-additivity), and explains why previous studies required very large cohorts to detect very few putatively causal effects 46.
Until now, it has been unclear as to whether or not genetics and the microbiome affect the levels of mutually exclusive sets of metabolites in human blood or act simultaneously on individual metabolites. Most of the detected blood metabolites (508/930) showed significant associations with the gut microbiome. We found that hybrid genome-microbiome models were common and affected about 10–16% of all metabolites associated with either host genetics or the microbiome. Putting a more exact estimate to this proportion will require larger studies than the one presented here because hybrid associations were enriched in metabolites with small microbiome variance components (R2<0.02), many of which we were likely underpowered to detect (e.g., as illustrated by the lower proportion of hybrid cases in the smaller validation cohort). We observed that between one third and one half of blood metabolites associated with host genetics included significant hybrid associations with the gut microbiome, while only one in five metabolites associated with the microbiome showed an additional hybrid genetic component to their variances. Thus, while both genetic and microbiome variation were important to explaining variation in the blood metabolome, the microbiome (and the myriad factors associated with variation in the microbiome, like diet and lifestyle) appears to be the dominant driving force behind variation in circulating metabolites.
Additionally, we found some evidence for gene-microbiome interactions, where a particular metabolite-microbiome association was modulated by the genetic background of the host. In general, these interactions only explained small fractions of metabolite variance (i.e., <2%). However, these interactions often appear to result from relatively strong associations within the homozygous minor allele genotype, but we had limited numbers of these minor allele carriers to assess these associations robustly. That being said, if we look within the homozygous minor allele genotypes, we observed up to 16% of a metabolite’s variance explained by a given bacterial genus. In particular, N-acetyl-tyrosine, glycine, and sphingomyelin abundances all showed strong associations with a distinct set of gut bacterial genera, but only in the homozygous minor allele genotypes. Thus, it appears that genetically-determined deviations from a particular quantitative trait may be modulated, or even induced, by a particular microbial community composition in the gut. These kinds of genome-microbiome interaction effects could help guide the design of microbiome-targeted therapeutics that mitigate host genetic disease risk. For instance, we found that glycine abundance was modulated by 24 gene-microbiome-metabolome interactions (Fig. 4A and Table S3), mostly involving variants in the CPS1 gene. Glycine-addiction is a hallmark of cancer proliferation, and restriction of glycine and serine can reduce tumor growth in murine xenograft and allograft models 29,47. Thus, there may be benefit to microbiome-targeted interventions in individuals with the observed variants in the CPS1 gene to lower circulating glycine levels. However, given the generally low prevalence of the minor allele and the high level of heterogeneity in gut bacterial community composition, targeted interventional studies will be required to validate these interactions in vivo.
Prior studies of microbe-host co-metabolites derived from microbial precursors in blood, like hippurate or p-cresol sulfate (i.e., hepatically modified forms of microbially-derived metabolites), hinted that the cross-sectional variance in these molecules might only be associated with the microbiome and not with host genetics 13,25,48. Here we show that those observations do not extrapolate to all microbe-host co-metabolites. Indeed, we found that much of the cross-sectional variation in many metabolites derived from gut microbial precursors was explained by host genetics. For example, while unconjugated secondary bile acids in the bloodstream are only associated with the gut microbiome, their glucuronidated or sulfated derivatives, formed in the liver, were often strongly influenced by host genetic variation. Most of the variance in deoxycholate glucuronide levels across the current study cohort could be explained by a combination of genetic and microbial factors, while unconjugated deoxycholate was solely associated with the microbiome. Deoxycholate and deoxycholate glucuronide both showed associations with the microbiome, but these associations were with different sets of bacterial genera. Glucuronidation facilitates excretion into urine and feces and prior work in rats has shown that glucuronidated bile acids are reabsorbed less efficiently than unmodified secondary bile acids in the intestine 49. The gut metagenome encodes a variety of β-glucuronidases 50,51 which likely enable gut commensals to use these bile-acid-conjugated-glucuronides as a carbon source. Conversely, free secondary bile acids can be toxic to many bacterial taxa 52. Thus, bile acids can directly drive changes in the gut microbiome, either by acting as carbon sources that promote growth or as toxic compounds that inhibit growth, which may explain the very different subsets of gut genera associated with conjugated and unconjugated forms of the same secondary bile acid (Fig. 5D).
Another intriguing finding from our analyses was the variable association of certain plasma sphingolipids with either host genetics or the gut microbiome, depending on the fatty-acyl groups comprising each lipid species. Ceramides with a ≤C18 fatty-acyl group showed stronger correspondence to the gut microbiome, consistent with the high prevalence of these fatty acids in the diet 44 and their preferential incorporation into ceramides by certain gut taxa 53. On the other hand, ceramides with very long chain fatty-acyl groups (22:1,24:1), that are most abundant in brain tissue and often synthesized through elongation by host enzymes 45, showed a stronger correspondence with host genetics. Importantly, ceramides with different fatty-acyl chain lengths have been implicated in a number of human diseases, including Alzheimer’s disease, depression and mood disorders 54. Distinguishing between which ceramides are under the control of diet and the gut microbiome versus genetic predisposition may aid in the design of new and improved precision therapeutics.
It should be noted that the current study only included individuals from the U.S. who were predominantly of European descent. While our results are consistent with results obtained from cohorts in Israel and Sweden 1,55, future studies in more diverse populations will be required to see whether or not the reported observations replicate more broadly. Finally, while we included many highly relevant covariates in our regression analyses (i.e. age, sex, BMI, and genetic ancestry), many of the observed microbe-metabolite associations are likely confounded with lifestyle and dietary habits, which were not comprehensively tracked in the current study population and can strongly influence the composition of the gut microbiome.
Overall, our analyses show that the plasma metabolome is influenced by the interplay between genetic and microbial factors, where the abundances of individual microbially-derived metabolites absorbed in the gut is often affected additively by both host genetic variation and by variation in the ecology of the gut. Furthermore, many microbe-metabolite associations are dependent upon host genetic background. These hybrid genome-microbiome-metabolite models provide unique insights into the forces underlying variation in the human blood metabolome and can suggest possible therapeutic strategies. Specifically, we suggest that disease-relevant blood metabolites strongly associated with the microbiome may be modifiable through dietary, probiotic, prebiotic or lifestyle interventions, whereas metabolites under strong genetic control may require pharmacological interventions that target host metabolic pathways. Understanding which of these circulating small molecules fall predominantly under host versus microbiome control will help to guide interventions designed to prevent and/or treat a range of diseases.
Methods
Institutional Review Board Approval for the Study
Procedures for this study were reviewed and approved by the Western Institutional Review Board with the Institutional Review Board (IRB) study number 20170658 for the Institute for Systems Biology and 1178906 for Arivale.
Cohort Description
All study participants were subscribers of the Arivale Scientific Wellness program who provided consent and research authorization allowing the use of their anonymized, de-identified data in research. Arivale, Inc. (USA), was a consumer scientific wellness company that shut down in 2019. The Arivale program is described in detail in Zubair et. al. 56. In brief, participants signed up for a comprehensive deep phenotyping program coupled with personalized data-driven wellness coaching in order to improve overall health and wellness. Baseline blood draws were taken by trained phlebotomists at LabCorp or Quest service centers and paired with at-home fecal sampling (see methodological details below). The cohort contained 997 individuals with female sex assigned at birth and 572 individuals with male sex assigned at birth (self-reported). The average age of the cohort was 49±11 years, and the average BMI was 28.2±6.6.
Metabolomics
Plasma samples were transferred to Metabolon, Inc. (USA), for sample processing and data acquisition. Untargeted metabolomics was performed using Metabolon’s Global Metabolomics high-performance liquid chromatography (HPLC)–mass spectrometry (MS) platform across a total of 11 batches. In brief, EDTA-plasma samples were thawed on ice and protein was removed using aqueous methanol extraction. The resulting samples were divided into five fractions (1 backup fraction) and placed briefly on TurboVap (Zymark) to remove organic solvents. Measurements were taken using Waters ACQUITY ultra-performance liquid chromatography (UPLC) and Thermo Scientific Q-Exactive high resolution/accuracy mass spectrometer interfaced with a heated electrospay ionization (HESI-II) source and an Orbitrap mass analyzer operated at 35,000 mass resolution. Each of the four sample fractions was reconstituted for measurement in one of four methods. One aliquot was measured under positive ion condition by elution from a C18 column (Waters UPLC BEH C18–2.1×100 mm, 1.7 μm) using water and methanol with 0.05% perfluoropentanoic acid and 0.1% formic acid. The second sample was eluted from the same column but using acetonitrile along with methanol and water to optimize for more hydrophobic compounds. The third aliquot was measured under negative ion optimized conditions and gradient eluted from a C18 column using methanol, water, and 6.5mM ammonium bicarbonate at pH 8. The last aliquot was measured using negative ionization after elution from a HILIC column (Waters UPLC BEH Amide 2.1×150 mm, 1.7 μm) using water, acetonitrile, and 10mM ammonium formate at pH 10.8. MS analyses were run using MS and data-dependent MSn scans with dynamic exclusion with an approximate range of 70–1000 m/z.
Assay batch correction was performed by running a selected set of quality control plasma samples in every batch and normalizing the measured abundances to the quality control samples. Metabolon judges the confidence level of annotation on a three-tiered scale. The majority (>80%) of metabolites were annotated by a match to an internal standard of the metabolite run on the same HPLC/MS setup (no specific mark added to the name). Some metabolites were annotated only by a match to a previously published MS spectrum (marked with “*”), and some metabolites were annotated only by a match to a known chemical formula (marked with “**”). Metabolites not annotated on any of those tiers, but detected consistently in different biofluids, are denoted by a “X -” prefix followed by a unique ID to indicate unknown compounds. Metabolite not observed in at least 50% of the samples were removed, resulting in the final list of 930 metabolites.
Genome-wide association analysis
DNA was extracted from whole blood samples by Covance (Redmond, WA) using a standardized protocol. Whole genome sequencing was performed by Wuxi, Inc. (Shanghai, China) in a CLIA-certified laboratory for 1,569 individuals. Extracted libraries were sequenced using an HiSeq X sequencer (Illumina, San Diego, USA) with 150bp libraries and aiming for >30x coverage. Basecalling and conversion to raw FASTQ files was performed using the Illumina Basespace software. Raw reads were aligned to the hg19 human reference genome with BWA 0.7.12. The GATK HaplotypeCaller with GATK 3.3.0 was used to call individual variants. This included indel local realignment followed by base quality recalibration. This yielded a set of around 7.68M measured SNPs.
To account for genetic structure and potential cryptic relatedness in the cohort, we used a linear mixed model to test for associations between genetic variants and metabolite levels. For each metabolite, we performed a genome-wide association study using FastGWA using the hg19 genetic map and the metabolite confounder-adjusted residuals 57. Linkage disequilibrium scores were extracted for all variants in the study and used as an additional covariate in the genome-wide association study. Metabolome genome-wide significance was called using the Bonferroni corrected p-value threshold of 5.37e-11 (5e-8 / #metabolites) where 5e-8 is the commonly used threshold for genome-wide significance used in the human genomics field 58. Lead variants were mapped by two-step clumping strategy based on, linkage equilibrium (LD scores) where we first clumped all variants within 250kbp fragments that exceeded LD score cutoff of 0.8, followed by second clumping in 250kbp fragments using an LD score cutoff of 0.3. This resulted in the final set of 297 independent lead variants.
Gut Microbiome Sequencing
Fecal samples were collected and chemcially preserved using a proprietary at-home swab kit (OMNIgene Gut, DNAGenotek, USA). 250μL of homogenized stool from each sample were subsequently used for DNA extraction, using a MoBio PowerMag Soil DNA isolation kit (QIAGEN, Germany) and the KingFisher Flex instrument. DNA concentrations in each sample were quantified with a QuBit (ThermoFisher, USA) and purity was assessed by measuring the A260/A280 absorbance ratio. Amplification and library preparation were performed by external providers using custom and optimized protocols by either amplifying the V4 region (SecondGenome, USA) or the V3-V4 region (DNAGenotek, USA) of the 16S gene.
Sequencing was performed using a MiSeq (Illumina, USA) using either a paired-end 250bp protocol (SecondGenome) or a paired-end 300bp protocol (DNAGenotek). Basecalling was performed using the Illumina Basespace platform which removed the added phiX reads and provided the final FASTQ files used for downstream analysis. Quality of the sequencing reads was assessed by manual inspection of the error rate across sequencing cycles and appropriate length cutoffs of 250bp for the forward reads and 230bp for the reverse reads were chosen based on the profiles. Reads with more than 2 expected errors under the Illumina error model were removed from the analysis along with reads containing ambiguous base calls (“N” nucleotides). More than 97% of reads passed those filters yielding a mean of around 200,000 reads per sample.
Filtered and truncated reads were then used to infer amplicon sequence variants using DADA2 59. Here error profiles were learned for each sequencing run separately. The resulting ASVs and respective counts were merged and chimeras were removed using the “consensus” strategy implemented in DADA2, which removed around 16% of all reads. Taxonomy assignment of ASVs was performed by using the naive Bayes classifier in DADA2 with the SILVA database (version 128). Species assignments were performed by exact match of the inferred ASVs with the reference 16S gene in SILVA, where possible. About 90% of reads could be classified down to the genus levels and 32% of reads could be classified down to the species level. The total set of samples was then filtered for those individuals that also had paired microbiome, metabolomics, and whole genome shotgun sequencing data available. Where several microbiome time points were available we used the one closest to the blood draw used for generating the metabolomics data. Low abundance taxa, with less than 10 reads on average or appearing in less than 50% of all samples, were removed from the data set prior to analysis.
Metabolite-microbiome associations and confounder adjustment
Genus-level read counts were scaled using the centered log-ratio (CLR) transform and centered across the 1,569 samples. Metabolite abundances were log-transformed and centered which yielded good similarity to a normal distribution as judged by a QQ-plot.
Transformed and scaled metabolite and bacterial genera abundances were then regressed against the set of confounders and the residuals were saved as new dependent variables. For the confounder adjustment we chose the most common intrinsic factors that would affect blood metabolite and fecal microbe abundances. Firstly, we corrected for sex, age, and BMI by including coefficients for biological sex, age, age2, sex:age, and sex:age2 interactions, and BMI. Additionally, we included covariates for common batch effects, like the season of the year the samples were obtained, the metabolomics batch (each sample in a batch was processed together), and the vendor for the microbiome sample (DNA Genotek or Second Genome). Finally, we included continuous measures of genetic ancestry as covariates in our analyses, given by the first 5 principal components from an analysis of 100,000 linkage disequilibrium corrected SNPs (minor allele frequency > 5%) calculated with the PC-AiR and PC-Relate methods 60. The residuals were used for all further analyses as well as visualizations and analyses of variances.
After this we performed individual linear regressions between all metabolite-genus pairs to find significant associations between individual metabolites and bacterial genera. P-values were obtained using F-tests and corrected for false discovery rate using the Benjamini and Hochberg method. We then retained the significant associations with an FDR-corrected p<0.05. We also performed the same analysis with alpha-diversity as the independent variable or with alpha-diversity as a covariate to the CLR-transformed bacterial genus abundance to verify that individual genus-level associations were not due to an association with alpha-diversity alone, which we did not observe to be the case.
Analysis of explained variances (R2)
Explained variances (R2) for each metabolite in the training cohort were obtained by ordinary least squares models only containing features that were significantly associated with each metabolite individually in the training cohort. Ergo, we only include genus abundances for those genera with a FDR-corrected p<0.05 in the previous metabolite-microbe associations and only those single nucleotide variants that were identified as significant in the previous genome-wide association study for the particular metabolite. This was done to avoid inflation of the explained variances, as adding random variables to a regression model tends to increase the explained variance. We performed multilinear regressions with only microbiome features, only genetic features, and both genetic and microbiome features to quantify the variance of the transformed and confounder-adjusted metabolite abundances that was explained by each feature type individually and jointly. Consequently, it should be noted that the explained variances in this manuscript refer to the metabolite abundance variance after removing components of the variance explained by the covariates listed above. As such, these R2 caclulations pose an upper bound of the explained variance of the raw metabolite abundance but are also independent of common confounders, such as age, sex, and BMI and thus should be more generalizable to other populations structures.
Overlap between microbiome and genetic features was quantified as the difference between the sum of R2 of the individual feature type models and the joint model R2 (R2[microbiome] + R2[genetics] − R2[joint]). In the case of complete independence of microbiome factors and genetics, this difference would be zero and would become positive if there is partial or complete overlap in the variances explained by the microbiome and host genetics.
In the validation cohort we used the same regression models but as in the training cohort but fixed all coefficients to the ones from the training cohort except for the intercept term to account for different baseline levels of metabolites between the cohorts and avoid negative R2 values. This bounds the validation R2 values to [0, 1] where a metabolite not associated/validated with the identified features from the training cohort will intrinsically receive an R2 of 0.
Genome-microbiome interactions
For gene microbiome interactions, we performed ordinary least squares regression of the form
where mi denotes the scaled abundance of metabolite i, bj the scaled abundance of the bacterial genus j, sk the ordinal versions of the allele on variant k (0 - major allele, 1 - heterozygous minor allele, 2 - homozygous major allele), c the vector of confounder covariates, and εi a random normally distributed variable with an expectation of zero. Significance of the interaction was then evaluated by an F-test comparing the full model to a model with a fixed ɣ=0. P-values from all tested metabolite / variant / genus triplets were adjusted for false discovery rate using the Benjamini and Hochberg method and judged as significant with a FDR cutoff p<0.05. Even though individual metabolite and bacterial genus abundances had been adjusted for confounders previously, we still included them in this regression because the product bj*sk had not been adjusted for the same confounders. Also, to make our analysis computationally feasible, we performed the regressions only for those metabolites, bacterial genera, and genetic variants that had shown at least one significant interaction in the prior analyses. The full cohort was includes in the interaction analysis due to the low prevalence of the minor allele groups.
Finally, post-hoc tests for microbe-metabolite interactions within sub-cohorts carrying a specific allele pairing were performed by Pearson tests on the product-moment correlation within each sub-cohort and for those triplets that had shown significant interaction effects.
Extended Data
Extended Data Figure 1. Genome-wide association study results.

Genome-wide association p-values for variants associated with at least one metabolite (only p < 1e-5 shown here). Red dashed line denotes genome-wide significance (p < 5.37e-11). Gray bars denote separation between chromosomes. N=1,000 (training cohort). The full list of p-values from the GWAS analysis can be found in Table S4.
Extended Data Figure 2.

Gene-microbiome interactions explain variation in blood metabolite levels (unadjusted data). Shown are six selected examples where the genomic background modulates the associations between bacterial genus-level abundances and metabolite levels. In (A-F) metabolite abundances were log-transformed, but no adjustment for common confounders, was performed here. In (A-F) “ρ” denotes the Spearman’s Rank-Order correlation coefficient of the regression and “p” denotes the p-value under a two-sided Spearman’s correlation test. The solid line indicates an ordinary least squares regression line relating the transformed metabolite abundance with the transformed genus abundance within each group. The shaded area is the 95% confidence interval of the regression. Associations were corrected for sex, age, age2, sex:age, and sex:age2 interactions, BMI, microbiome vendor, metabolomics batch, and the first 5 principal components of genetic ancestry (N=1,569). Metabolite names ending in “*” or “**” indicate compounds for which a standard is not available, but Metabolon is confident in its identity.
Supplementary Material
Table S1. Training cohort R2 values for genetics and microbiome for the 595 metabolites with a significant association in the training cohort (N=1,000). Significant associations with bacterial genera were obtained by Pearson correlation tests between CLR-transformed genus abundances and log-transformed metabolite residuals corrected for common confounders (all FDR-corrected p<0.05). Significant associations with genomic variants were obtained by the FastGWA mixed-linear model with a Bonferroni correction across metabolites (p < 5e-8 / #metabolites).
Table S2. Validation out-of-sample R2 values for genetics and microbiome for the 595 metabolites with a significant association in the training cohort (N=569). Explained variances were calculated by using the fixed association coefficients from the training set models.
Table S3. Genome-microbiome interactions for blood metabolite regression models with a significant interaction term (466 interactions, N=1,569). P-values were obtained from a two-sided Wald test on the interaction term coefficient γ of an ordinary least squares regression model.
Table S4. Significant results from the genome-wide association analysis in the training cohort (N=1,000) along with annotations for variants and metabolites. Significant associations with genomic variants were obtained by the FastGWA mixed-linear model using a sparse genetic-relationship matrix with a Bonferroni correction across metabolites (p < 5e-8 / #metabolites).
Table S5. Enrichment tests for genome-microbiome associations. P-values were obtained from one-sided hypergeometric tests and corrected for multiple testing using the Benjamini-Hochberg correction.
Acknowledgments
SMG and CD were supported by the Washington Research Foundation Distinguished Investigator Award and startup funds from the Institute for Systems Biology. Research reported in this publication was supported by the National Institute of Diabetes And Digestive And Kidney Diseases (NIDDK) of the National Institutes of Health (NIH) under Award Number R01DK133468 (to SMG). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. The funders had no role in designing, carrying out, or interpreting the work presented in the manuscript. TW was supported by a generous gift from C. Ellison. Further support came from the NIH grant (no. U19AG023122) awarded by the National Institute on Aging (NIA) (to NR).
Footnotes
Competing Interests Statement
C.L.D., B.S., and A.T.M. were all former employees of and held stock options in Arivale, Inc. L.H. was a former shareholder of Arivale. Arivale is no longer a commercially operating company as of April 2019. The remaining authors report no competing interests.
Code Availability
Pipelines for the 16S amplicon data processing and analysis can be found at https://github.com/Gibbons-Lab/mbtools. GWAS workflows, Jupyter notebooks, intermediate data files, and Python code for reproducing the regression models and figures have been deposited at https://github.com/Gibbons-Lab/2021_gene_environment_interactions.
Reviewer recognition statement:
Nature Metabolism thanks Aleksandar D Kostic and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Data Availability
Qualified researchers can access the full Arivale deidentified dataset supporting the findings in this study for research purposes through signing a Data Use Agreement (DUA). Inquiries to access the data can be made at data-access@isbscience.org and will be responded to within 7 business days. The raw 16S amplicon sequencing data can be found in the Sequencing Read Archive (SRA) under Bioproject IDs PRJNA826530 and PRJNA826648. External databases used: the hg19 human reference genome (https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.13), the SILVA database version 128 (https://www.arb-silva.de/), and the NCBI dbSNP database build 155 (https://www.ncbi.nlm.nih.gov/snp/).
References
- 1.Bar N et al. A reference map of potential determinants for the human serum metabolome. Nature 588, 135–140 (2020). [DOI] [PubMed] [Google Scholar]
- 2.Wang X & Paigen B Genetics of variation in HDL cholesterol in humans and mice. Circ. Res. 96, 27–42 (2005). [DOI] [PubMed] [Google Scholar]
- 3.Blau N, van Spronsen FJ & Levy HL Phenylketonuria. Lancet 376, 1417–1427 (2010). [DOI] [PubMed] [Google Scholar]
- 4.Gieger C et al. Genetics meets metabolomics: a genome-wide association study of metabolite profiles in human serum. PLoS Genet. 4, e1000282 (2008). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Yu B et al. Genetic determinants influencing human serum metabolome among African Americans. PLoS Genet. 10, e1004212 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Gallois A et al. A comprehensive study of metabolite genetics reveals strong pleiotropy and heterogeneity across time and context. Nat. Commun. 10, 4788 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Wilmanski T et al. Blood metabolome predicts gut microbiome α-diversity in humans. Nat. Biotechnol. 37, 1217–1228 (2019). [DOI] [PubMed] [Google Scholar]
- 8.Rothschild D et al. Environment dominates over host genetics in shaping human gut microbiota. Nature 555, 210–215 (2018). [DOI] [PubMed] [Google Scholar]
- 9.Goodrich JK et al. Human genetics shape the gut microbiome. Cell 159, 789–799 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Nicholson JK et al. Host-Gut Microbiota Metabolic Interactions. Science 336, 1262–1267 (2012). [DOI] [PubMed] [Google Scholar]
- 11.Jiao Y, Lu Y & Li X-Y Farnesoid X receptor: a master regulator of hepatic triglyceride and glucose homeostasis. Acta Pharmacol. Sin. 36, 44–50 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Zheng X et al. Hyocholic acid species improve glucose homeostasis through a distinct TGR5 and FXR signaling mechanism. Cell Metab. 33, 791–803.e7 (2021). [DOI] [PubMed] [Google Scholar]
- 13.Lees HJ, Swann JR, Wilson ID, Nicholson JK & Holmes E Hippurate: the natural history of a mammalian-microbial cometabolite. J. Proteome Res. 12, 1527–1546 (2013). [DOI] [PubMed] [Google Scholar]
- 14.Nakamura A et al. Symbiotic polyamine metabolism regulates epithelial proliferation and macrophage differentiation in the colon. Nat. Commun. 12, 2105 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Vitali C, Khetarpal SA & Rader DJ HDL Cholesterol Metabolism and the Risk of CHD: New Insights from Human Genetics. Curr. Cardiol. Rep. 19, 132 (2017). [DOI] [PubMed] [Google Scholar]
- 16.Kenny DJ et al. Cholesterol Metabolism by Uncultured Human Gut Bacteria Influences Host Cholesterol Level. Cell Host Microbe 28, 245–257.e6 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Dayama G, Priya S, Niccum DE, Khoruts A & Blekhman R Interactions between the gut microbiome and host gene regulation in cystic fibrosis. Genome Med. 12, 12 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Hunter DJ Gene-environment interactions in human diseases. Nat. Rev. Genet. 6, 287–298 (2005). [DOI] [PubMed] [Google Scholar]
- 19.Liu G et al. Analyzing large-scale samples confirms the association between the ABCA7 rs3764650 polymorphism and Alzheimer’s disease susceptibility. Mol. Neurobiol. 50, 757–764 (2014). [DOI] [PubMed] [Google Scholar]
- 20.Cuyvers E et al. Mutations in ABCA7 in a Belgian cohort of Alzheimer’s disease patients: a targeted resequencing study. Lancet Neurol. 14, 814–822 (2015). [DOI] [PubMed] [Google Scholar]
- 21.Huynh K et al. Concordant peripheral lipidome signatures in two large clinical studies of Alzheimer’s disease. Nat. Commun. 11, 5698 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Bosma PJ et al. Bilirubin UDP-glucuronosyltransferase 1 is the only relevant bilirubin glucuronidating isoform in man. J. Biol. Chem. 269, 17960–17964 (1994). [PubMed] [Google Scholar]
- 23.Wikoff WR et al. Metabolomics analysis reveals large effects of gut microflora on mammalian blood metabolites. Proc. Natl. Acad. Sci. U. S. A. 106, 3698–3703 (2009). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Teufel R et al. Bacterial phenylalanine and phenylacetate catabolic pathway revealed. Proc. Natl. Acad. Sci. U. S. A. 107, 14390–14395 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Pallister T et al. Hippurate as a metabolomic marker of gut microbiome diversity: Modulation by diet and relationship to metabolic syndrome. Sci. Rep. 7, 13670 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Brial F et al. Human and preclinical studies of the host-gut microbiome co-metabolite hippurate as a marker and mediator of metabolic health. Gut 70, 2105–2114 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Ridlon JM, Kang DJ, Hylemon PB & Bajaj JS Bile acids and the gut microbiome. Curr. Opin. Gastroenterol. 30, 332–338 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Winham SJ & Biernacka JM Gene-environment interactions in genome-wide association studies: current approaches and new directions. J. Child Psychol. Psychiatry 54, 1120–1134 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Jain M et al. Metabolite profiling identifies a key role for glycine in rapid cancer cell proliferation. Science 336, 1040–1044 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Maceyka M & Spiegel S Sphingolipid metabolites in inflammatory disease. Nature 510, 58–67 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Matsumura T et al. N-acetyl-l-tyrosine is an intrinsic triggering factor of mitohormesis in stressed animals. EMBO Rep. 21, e49211 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.de Aguiar Vallim TQ, Tarling EJ & Edwards PA Pleiotropic roles of bile acids in metabolism. Cell Metab. 17, 657–669 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.King CD, Rios GR, Green MD & Tephly TR UDP-glucuronosyltransferases. Curr. Drug Metab. 1, 143–161 (2000). [DOI] [PubMed] [Google Scholar]
- 34.Alnouti Y Bile Acid sulfation: a pathway of bile acid elimination and detoxification. Toxicol. Sci. 108, 225–246 (2009). [DOI] [PubMed] [Google Scholar]
- 35.Xiang X et al. Effect of SLCO1B1 polymorphism on the plasma concentrations of bile acids and bile acid synthesis marker in humans. Pharmacogenet. Genomics 19, 447–457 (2009). [DOI] [PubMed] [Google Scholar]
- 36.Uhlén M et al. Tissue-based map of the human proteome. Science 347, 1260419 (2015). [DOI] [PubMed] [Google Scholar]
- 37.Tóth B et al. Human OATP1B1 (SLCO1B1) transports sulfated bile acids and bile salts with particular efficiency. Toxicol. In Vitro 52, 189–194 (2018). [DOI] [PubMed] [Google Scholar]
- 38.Meikle PJ & Summers SA Sphingolipids and phospholipids in insulin resistance and related metabolic disorders. Nat. Rev. Endocrinol. 13, 79–91 (2017). [DOI] [PubMed] [Google Scholar]
- 39.Chaurasia B et al. Targeting a ceramide double bond improves insulin resistance and hepatic steatosis. Science 365, 386–392 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.Mielke MM et al. Serum ceramides increase the risk of Alzheimer disease: the Women’s Health and Aging Study II. Neurology 79, 633–641 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41.Green DR Apoptosis and sphingomyelin hydrolysis. The flip side. The Journal of cell biology vol. 150 F5–7 (2000). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42.Summers SA, Chaurasia B & Holland WL Metabolic messengers: Ceramides. Nat Metab 1, 1051–1058 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Johnson EL et al. Sphingolipids produced by gut bacteria enter host metabolic pathways impacting ceramide levels. Nat. Commun. 11, 2471 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.Ervin RB, Wright JD, Wang C-Y & Kennedy-Stephenson J Dietary intake of fats and fatty acids for the United States population: 1999–2000. Adv. Data 1–6 (2004). [PubMed] [Google Scholar]
- 45.Fan Y, Meng H-M, Hu G-R & Li F-L Biosynthesis of nervonic acid and perspectives for its production by microalgae and other microorganisms. Appl. Microbiol. Biotechnol. 102, 3027–3035 (2018). [DOI] [PubMed] [Google Scholar]
- 46.Liu X et al. Mendelian randomization analyses support causal relationships between blood metabolites and the gut microbiome. Nat. Genet. 54, 52–61 (2022). [DOI] [PubMed] [Google Scholar]
- 47.Maddocks ODK et al. Modulating the therapeutic response of tumours to dietary serine and glycine starvation. Nature 544, 372–376 (2017). [DOI] [PubMed] [Google Scholar]
- 48.Tevzadze G et al. Effects of a gut microbiome toxin, p-cresol, on the indices of social behavior in rats. Neurophysiology 50, 372–377 (2018). [Google Scholar]
- 49.Oelberg DG, Little JM, Adcock EW & Lester R Intestinal absorption of bile acid glucuronides in rats. Dig. Dis. Sci. 33, 1110–1115 (1988). [DOI] [PubMed] [Google Scholar]
- 50.Creekmore BC et al. Mouse Gut Microbiome-Encoded β-Glucuronidases Identified Using Metagenome Analysis Guided by Protein Structure. mSystems 4, (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51.Gloux K et al. A metagenomic β-glucuronidase uncovers a core adaptive function of the human intestinal microbiome. Proc. Natl. Acad. Sci. U. S. A. 108 Suppl 1, 4539–4546 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 52.Begley M, Gahan CGM & Hill C The interaction between bacteria and bile. FEMS Microbiol. Rev. 29, 625–651 (2005). [DOI] [PubMed] [Google Scholar]
- 53.Stankeviciute G et al. Convergent evolution of bacterial ceramide synthesis. Nat. Chem. Biol. 1–8 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 54.Dinoff A, Herrmann N & Lanctôt KL Ceramides and depression: A systematic review. J. Affect. Disord. 213, 35–43 (2017). [DOI] [PubMed] [Google Scholar]
- 55.Dekkers KF et al. An online atlas of human plasma metabolite signatures of gut microbiome composition. medRxiv 2021.12.23.21268179 (2021) doi: 10.1101/2021.12.23.21268179. [DOI] [Google Scholar]
- 56.Zubair N et al. Genetic Predisposition Impacts Clinical Changes in a Lifestyle Coaching Program. Sci. Rep. 9, 6805 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 57.Jiang L et al. A resource-efficient tool for mixed model association analysis of large-scale data. Nat. Genet. 51, 1749–1755 (2019). [DOI] [PubMed] [Google Scholar]
- 58.Xu C et al. Estimating genome-wide significance for whole-genome sequencing studies. Genet. Epidemiol. 38, 281–290 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 59.Callahan BJ et al. DADA2: High-resolution sample inference from Illumina amplicon data. Nat. Methods 13, 581–583 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 60.Conomos MP, Reiner AP, Weir BS & Thornton TA Model-free Estimation of Recent Genetic Relatedness. Am. J. Hum. Genet. 98, 127–148 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
Table S1. Training cohort R2 values for genetics and microbiome for the 595 metabolites with a significant association in the training cohort (N=1,000). Significant associations with bacterial genera were obtained by Pearson correlation tests between CLR-transformed genus abundances and log-transformed metabolite residuals corrected for common confounders (all FDR-corrected p<0.05). Significant associations with genomic variants were obtained by the FastGWA mixed-linear model with a Bonferroni correction across metabolites (p < 5e-8 / #metabolites).
Table S2. Validation out-of-sample R2 values for genetics and microbiome for the 595 metabolites with a significant association in the training cohort (N=569). Explained variances were calculated by using the fixed association coefficients from the training set models.
Table S3. Genome-microbiome interactions for blood metabolite regression models with a significant interaction term (466 interactions, N=1,569). P-values were obtained from a two-sided Wald test on the interaction term coefficient γ of an ordinary least squares regression model.
Table S4. Significant results from the genome-wide association analysis in the training cohort (N=1,000) along with annotations for variants and metabolites. Significant associations with genomic variants were obtained by the FastGWA mixed-linear model using a sparse genetic-relationship matrix with a Bonferroni correction across metabolites (p < 5e-8 / #metabolites).
Table S5. Enrichment tests for genome-microbiome associations. P-values were obtained from one-sided hypergeometric tests and corrected for multiple testing using the Benjamini-Hochberg correction.
Data Availability Statement
Qualified researchers can access the full Arivale deidentified dataset supporting the findings in this study for research purposes through signing a Data Use Agreement (DUA). Inquiries to access the data can be made at data-access@isbscience.org and will be responded to within 7 business days. The raw 16S amplicon sequencing data can be found in the Sequencing Read Archive (SRA) under Bioproject IDs PRJNA826530 and PRJNA826648. External databases used: the hg19 human reference genome (https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.13), the SILVA database version 128 (https://www.arb-silva.de/), and the NCBI dbSNP database build 155 (https://www.ncbi.nlm.nih.gov/snp/).
