Abstract
The kidneys integrate information from continuous systemic processes related to the absorption, distribution, metabolism and excretion (ADME) of metabolites. To identify underlying molecular mechanisms, we performed genome-wide association studies of the urinary concentrations of 1,172 metabolites among 1,627 patients with reduced kidney function. The 240 unique metabolite–locus associations (metabolite quantitative trait loci, mQTLs) that were identified and replicated highlight novel candidate substrates for transport proteins. The identified genes are enriched in ADME-relevant tissues and cell types, and they reveal novel candidates for biotransformation and detoxification reactions. Fine mapping of mQTLs and integration with single-cell gene expression permitted the prioritization of causal genes, functional variants and target cell types. The combination of mQTLs with genetic and health information from 450,000 UK Biobank participants illuminated metabolic mediators, and hence, novel urinary biomarkers of disease risk. This comprehensive resource of genetic targets and their substrates is informative for ADME processes in humans and is relevant to basic science, clinical medicine and pharmaceutical research.
The term ADME describes the processing of a compound within an organism1. Although often used in the context of drug metabolism, ADME functions also influence the concentrations of naturally occurring intermediates and end products of metabolism, metabolites. Organs and tissues that strongly influence each of the respective ADME components are the intestinal tract, blood, liver and kidneys. As major excretory organs, the kidneys integrate continuous systemic ADME processes by controlling the amount of urinary metabolite excretion2. Although the blood concentrations of many metabolites are tightly regulated, urine metabolite concentrations can vary widely and serve as a readout of metabolic capacities that are not detected in blood3. We therefore reasoned that the study of urinary metabolite concentrations can be particularly informative for ADME processes in humans.
In addition to filtration, the kidneys have an important role in the generation, breakdown, and active reabsorption and secretion of metabolites, which determines their concentrations in urine4. Many metabolites are excreted by active detoxification and transport processes in the epithelial cells of proximal tubules, where specialized enzymes and transport proteins coordinate their breakdown and clearance5. Not all of these transporters and enzymes, or their substrates in vivo, have been identified. We reasoned that reduced kidney function may represent a ‘challenge’ state that could lead to the upregulation of ADME-relevant processes to compensate for reduced glomerular filtration, and that it may carry information about the metabolism of uremic toxins and of drugs that are commonly prescribed to patients with chronic kidney disease (CKD).
Genome-wide association studies of metabolite concentrations (mGWAS) can provide novel insights into human physiology, inborn errors of metabolism, and complex traits and diseases6–9. Most previous mGWAS have focused on blood metabolite concentrations in population-based studies2. By performing genome-wide searches, mGWAS can implicate the enzymes and transporters that influence the generation, uptake, transport, breakdown or excretion of a metabolite6,9,10.
Here, we perform mGWAS of the urine concentrations of 1,172 endogenous and xenobiotic metabolites, replicate significant associations, and examine whether associations that are detected in patients with CKD are also observed in a healthy population sample. We show that the genes that were identified are highly informative for ADME processes of both endogenous and xenobiotic metabolites, such as for CYP2D6 in metoprolol metabolism. We fine map associations to single causal variants, often missense substitutions in enzymes and transporters, and use gene expression data to illuminate the tissues and cell types in which they are likely to operate. Systematic integration of the data with the health correlates of detected variants in 450,000 UK Biobank participants highlights metabolic readouts of underlying pathogenic mechanisms, such as genetically encoded lower urinary phosphoethanolamine concentrations that reflect higher alkaline phosphatase activity and higher urinary phosphate concentrations, resulting in an increased risk of kidney stone disease. The comprehensive list of genetic targets and their corresponding substrates reveals novel insights into human ADME processes and biotransformation reactions, and is relevant to basic science, clinical medicine and pharmaceutical research.
Results
We performed mGWAS of the urine concentrations of 1,172 metabolites from 1,221 (discovery) and 406 (replication) patients with CKD, followed by meta-analysis11 and downstream characterization of replicated findings (Methods). Patients were selected from the German Chronic Kidney Disease (GCKD) study12,13 if they had an estimated glomerular filtration rate (eGFR)14 of ≤ 50 ml min−1 per 1.73 m2 and normoalbuminuria (a urinary albumin-to-creatinine ratio (UACR) of <30 mg g−1; Extended Data Fig. 1). The mean eGFR was 42 ml min−1 per 1.73 m2 and the median UACR was 12.3 mg g−1 (Supplementary Table 1). Metabolites were quantified by using non-targeted mass spectrometry analysis15; detailed information for each metabolite, including the biochemical name and pathway, is provided in Supplementary Table 2.
Identification of 240 metabolite-associated loci
Across the 1,172 mGWAS, 240 genomic intervals that contained at least one SNP significantly associated with urinary metabolite concentrations (mQTL, P < 4.27 × 10−11; Fig. 1, Supplementary Table 3 and Supplementary Fig. 1) were identified and replicated. The strongest mQTLs were detected for metabolites that were not yet named (‘X-*’) at PYROXD2 (X-24809, P = 3.6 × 10−574), NAT8 (X-12125, P = 2.4 × 10−570) and AKRD7A (X-24462, P = 2.3 × 10−412). Association results were similar with and without adjustment for eGFR and UACR (Extended Data Fig. 2). Of the 240 mQTLs, 26 had been described previously in association with the same urinary metabolite16–19, and a further 54 mapped to previously described regions but involved newly associated metabolites. The remaining 160 mQTLs represented novel loci for urinary metabolite concentrations at the time of annotation (Supplementary Table 3), 18 of which had been detected previously in association with the same metabolite in blood6,7,20,21. Of the 211 unique metabolites underlying the 240 mQTLs, 136 were named (‘knowns’, most of which were amino acids) and 75 were not yet named (‘unknowns’). The variance in metabolite levels explained by the index SNP at each locus ranged from 2.0% to 63.1% (Supplementary Table 3), which highlights the close relationship between the genome and the metabolome.
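The significance threshold quoted above corresponds to a Bonferroni correction of the conventional genome-wide level (5 × 10−8) for the 1,172 metabolites tested, and the same arithmetic yields the thresholds used later for metabolite ratios and modules. The short Python sketch below reproduces this calculation. As an additional illustration that is an assumption rather than the study's stated procedure, it also shows one common way to express the variance explained by a single index SNP: the partial R2 implied by the SNP's regression t statistic. The function names are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def partial_r2_from_t(t_stat: float, n: int, n_covariates: int = 0) -> float:
    """Share of residual trait variance explained by one SNP in an OLS model.

    For a single regression coefficient, partial R^2 = t^2 / (t^2 + df), where
    df = n - n_covariates - 2 (intercept plus SNP plus covariates).
    """
    df = n - n_covariates - 2
    if df <= 0:
        raise ValueError("too few observations for the number of covariates")
    return t_stat ** 2 / (t_stat ** 2 + df)


# Genome-wide level divided by the number of tested traits.
mqtl_threshold = bonferroni_threshold(5e-8, 1172)    # single metabolites
module_threshold = bonferroni_threshold(5e-8, 212)   # eigenmetabolites
print(f"{mqtl_threshold:.3g} {module_threshold:.3g}")  # 4.27e-11 2.36e-10
```

The partial R2 helper is exact for ordinary least squares with one tested predictor; studies that report explained variance on a different scale (for example, relative to the total rather than the residual variance) would obtain somewhat different values.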
We assessed whether the identified mQTLs could capture intracellular enzymatic reactions and transmembrane transport processes, and found multiple examples supporting both. For instance, mQTLs of N-acetyl-tyrosine and N-acetyl-phenylalanine revealed intracellular reactions that balance the N-acetylation of aromatic amino acids. Both metabolites were associated with SNPs at ACY3, which encodes an enzyme that de-acetylates N-acetylated aromatic amino acids in kidney proximal tubules22, and with SNPs at NAT8, a gene that is important for the N-acetylation of metabolites in the renal proximal tubule and in liver cells (Supplementary Table 3) (ref. 23). With respect to transport processes, associations included positive controls such as the amino acid exchanger SLC7A9 and its substrate lysine, as well as novel associations that highlight the hypothesis-generating potential of our screen, such as associations at genes encoding ‘orphan’ transporters. For example, UNC93A encodes a newly recognized atypical solute transporter with an as-yet-unknown substrate spectrum24. In mice, its expression is regulated by amino acid availability25. The observed association with N-acetylglucosaminylasparagine suggests that this metabolite may be one of its substrates.
Associations in a population-based sample
We next evaluated whether associations were also observed among 977 participants of the population-based Study of Health in Pomerania-Trend (SHIP-Trend) study26 (mean eGFR, 92 ml min−1 per 1.73 m2; Supplementary Table 4) by comparing genetic effect sizes at 90 mQTLs for which the same metabolite and SNP were available (Methods). Genetic effects could be predicted well in the population sample (model R2 = 0.952, Extended Data Fig. 3). Of 90 matched mQTLs, 70 mQTLs showed genome-wide significance in the SHIP-Trend sample, and 82 were significant after correction for 90 tests. Genetic effects on metabolite levels from patients with CKD were on average 1.35-fold greater than from individuals in the general population (P < 2 × 10−16; Extended Data Fig. 3a, Supplementary Table 5 and Methods). These larger genetic effect estimates were not related to mQTLs near the significance threshold, to the detection mode of the mass spectrometer, the metabolite super-pathway, the precision of metabolite measurement, or to the proportion of imputed data (Extended Data Fig. 3b–f). Evidence for significant heterogeneity (P-het < 0.05/90) was detected at 17 mQTLs (Supplementary Table 5), none of which showed a statistically significant (P < 0.05/17) interaction with eGFR in the SHIP-Trend sample. These findings suggest that most of the genetic effects on urine metabolite concentrations among patients with CKD are also relevant in individuals with normal kidney function, and that the potential modulation of underlying mechanisms is also likely to be effective among the general population.
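The comparison of genetic effects between the CKD and SHIP-Trend samples rests on two standard calculations, sketched below in Python under the assumption that the two samples are independent. The first is a two-sided z test for the difference between two effect estimates, which for two studies is equivalent to Cochran's Q test with one degree of freedom; it is evaluated here against the 0.05/90 threshold used in the text. The second is a least-squares slope through the origin, one simple way to summarize an average fold difference in effect sizes. The study's exact procedures are described in Methods and may differ, and the effect sizes in the example are invented.

```python
import math
from typing import Sequence


def heterogeneity_p(b1: float, se1: float, b2: float, se2: float) -> float:
    """Two-sided P value for the difference between two independent estimates.

    With two studies, this z test is equivalent to Cochran's Q test (1 df).
    """
    z = (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)
    return math.erfc(abs(z) / math.sqrt(2))


def slope_through_origin(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope of y on x with no intercept: sum(x*y) / sum(x*x)."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / sxx


# Hypothetical per-allele effects at three matched mQTLs.
ship_effects = [0.40, 0.60, 1.00]
ckd_effects = [0.54, 0.81, 1.35]
fold = slope_through_origin(ship_effects, ckd_effects)  # CKD relative to SHIP

# Flag heterogeneity at the Bonferroni level used in the text (0.05 / 90).
p_het = heterogeneity_p(1.0, 0.1, 0.0, 0.1)
print(round(fold, 2), p_het < 0.05 / 90)
```

Regressing the CKD effects on the population effects without an intercept assumes that a variant with no effect in one sample has no effect in the other, which is a natural constraint when comparing allelic effects at matched SNPs.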
Insights into ADME processes and handling of uremic toxins
Genes within associated loci were scored based on proximity, functional consequences and properties of the index SNP, and on colocalization with gene expression, and were subsequently ranked to assign the most likely causal gene (Methods). We then aimed to identify processes and pathways connecting the resulting 90 unique genes, and found that they were strongly enriched among 298 genes known to participate in ADME processes (P < 1 × 10−8; Fig. 2a, Supplementary Tables 6,7 and Methods). Twenty-one of the 90 genes could be annotated to phase I, II and III biotransformation reactions that are known to be important in drug metabolism1. Additional consideration of a manually curated, extended list of 544 ADME genes, and of a targeted literature review, allowed 14 additional genes to be placed into the context of these biotransformation reactions, for a total of 35 out of 90 genes (Fig. 2b and Methods). This strongly supports the use of unbiased studies of urinary metabolite concentrations as an integrative readout of ongoing human ADME processes, and nominates the remaining 55 genes as interesting novel candidates.
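Enrichment of a gene list within a reference set such as the 298 ADME genes is commonly assessed with a one-sided hypergeometric (Fisher's exact) test. The Python sketch below implements that test with exact integer arithmetic. The inputs are hypothetical: it treats 21 of the 90 genes as overlapping the 298-gene list and assumes a universe of 20,000 genes, which are illustrative choices and not the parameters used in the study (see Methods).

```python
from math import comb


def hypergeom_sf(k: int, universe: int, n_annotated: int, n_drawn: int) -> float:
    """One-sided enrichment P value: P(X >= k) for a hypergeometric overlap.

    universe:    number of genes that could have been selected
    n_annotated: genes in the reference set (e.g. an ADME gene list)
    n_drawn:     genes in the study list
    k:           observed overlap between the two
    """
    upper = min(n_annotated, n_drawn)
    tail = sum(
        comb(n_annotated, i) * comb(universe - n_annotated, n_drawn - i)
        for i in range(k, upper + 1)
    )
    # Exact integer arithmetic until this final division.
    return tail / comb(universe, n_drawn)


# Hypothetical inputs: 21 of 90 genes overlap a 298-gene list, 20,000-gene universe.
p = hypergeom_sf(21, 20_000, 298, 90)
print(f"{p:.2e}")
```

Under these assumptions, only about 1.3 overlapping genes would be expected by chance (90 × 298 / 20,000), so an overlap of 21 yields a very small P value; the exact value depends strongly on the chosen gene universe.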
Next, we systematically tested the 90 genes for enrichment in pathways, molecular functions, cellular components and biological processes in the Gene Ontology (GO)27 and Kyoto Encyclopedia of Genes and Genomes (KEGG)28 databases. Overall, 89 GO biological processes, 4 GO cellular components, 33 GO molecular functions and 20 KEGG pathways were significantly enriched for these genes (adjusted P < 0.05; Supplementary Table 7). Both the GO biological processes and the KEGG pathways implicated several processes related to detoxification and drug metabolism as strongly enriched (for example, ‘drug catabolic process’, P < 1 × 10−8). Enrichment was also observed for processes related to the metabolism of small molecules, including carboxylic, amino and fatty acids (Supplementary Table 7 and Fig. 2c). This is consistent with the prominent role of the proximal tubule in the metabolism and handling of amino, organic and carboxylic acids29, and with the importance of fatty acid metabolism in meeting its high metabolic needs. Highly enriched molecular functions further supported the importance of enzymes that mediate phase I and II biotransformation reactions (for example, ‘cofactor binding’, ‘monooxygenase activity’, ‘oxidoreductase activity’, all P < 1 × 10−8; Supplementary Table 7), and of transmembrane transport. The most highly enriched cellular components (‘mitochondrial matrix’, P = 8.0 × 10−7 and ‘peroxisome’, P = 9.3 × 10−5) were consistent with the compartments in which many detoxification reactions, as well as fatty acid and amino acid metabolism, are known to occur.
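The enrichment results above are reported at an adjusted P < 0.05, and the correction method is specified in Methods rather than here. Assuming a Benjamini-Hochberg false discovery rate adjustment, a frequent choice for GO and KEGG enrichment analyses, the adjusted P values could be computed as in the following minimal Python sketch.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(adjusted)
```

Terms whose adjusted value falls below 0.05 would then be called significant; a Bonferroni correction would be stricter and would retain fewer terms.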
This study of patients with CKD was also informative in terms of the metabolism of uremic toxins; significantly enriched biological processes included ‘kynurenine metabolic process’ (P = 3.0 × 10−7; Fig. 2c) and ‘spermidine metabolic process’ (P = 2.0 × 10−4; Supplementary Table 7). The genes AFMID, GOT2, KMO and KYAT3 encode enzymes that operate on kynurenine and its metabolites, which are elevated in uremia30,31. In a similar manner, the polyamines spermidine, spermine and putrescine accumulate with declining kidney function30, which probably aided the identification of genes with roles in amine and polyamine metabolism (KMO, AOC1, PAOX, AFMID, COMT and HDAC10). Together, the enriched pathways support the idea that studies of specific biospecimens reflect both organismal and organ-specific processes.
Enriched tissues and cell types reflect ADME components
We systematically examined whether any tissues and kidney cell types were enriched for high expression of the 90 genes linked to urinary metabolite concentrations (Supplementary Table 8). Across 38 human tissues from the Genotype-Tissue Expression (GTEx) Project32, significant enrichment (P < 6.9 × 10−4) was found for expression of these genes in kidney (P < 1 × 10−8), liver (P < 1 × 10−8), small intestine (P = 1.0 × 10−6), pancreas (P = 2.8 × 10−6), and left heart ventricle (P = 4.2 × 10−5) (Fig. 3a and Methods). These results, based on unbiased testing of a large human RNA-sequencing (RNA-seq) resource, reflect organs that are known to be important for absorption (small intestine), metabolism (liver, pancreas) and excretion (kidney), all of which are ADME components. They also suggest a possible, previously unrecognized role in ADME for the left heart ventricle, a tissue rich in mitochondria. Within the kidney, the final organ to determine urine metabolite concentrations, gene expression data from 43,745 single murine kidney cells33 and from 4,524 nuclei of adult human kidney cells34 revealed cell-type-specific enrichment that was confined to different segments of the proximal tubule (Fig. 3b (human, all P < 1 × 10−8) and Extended Data Fig. 4 (mouse, P < 1 × 10−8)). This finding is consistent with the prominent role of proximal tubule epithelial cells in many of the enriched processes, pathways and functions5. For example, SLC5A11 showed high, specific expression in the S3 segment of the proximal tubule, which agrees with the localization of the encoded myo-inositol transporter in rabbit kidney35 and with the observed association with urinary myo-inositol concentrations. Figure 3c highlights genes that are highly expressed in at least one human kidney cell type, which may facilitate the selection of a suitable cell line for experimental studies.
Fine mapping identifies genes and candidate variants
After characterization at the gene level, statistical fine mapping was performed to resolve associated loci into potentially causal variants. Conditional analyses36 across 240 mQTLs identified 1 independent signal at 223 mQTLs, 2 signals at 15 mQTLs, and 3 signals at ACADL and TTC38 (Supplementary Table 9). For each of the 259 independent signals, a credible set was constructed that collectively accounted for 99% posterior probability of containing the variant or variants that cause the association signal (posterior probability of association (PPA); Methods)37. There were 25 single-SNP credible sets and 82 small credible sets that each consisted of ≤5 SNPs (Supplementary Table 10).
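The construction of 99% credible sets can be illustrated with Wakefield's approximate Bayes factors, a common approach that assumes a single causal variant per independent signal and an equal prior probability for every SNP. Under these assumptions, each SNP's PPA is its Bayes factor divided by the sum over all SNPs in the region, and the credible set is formed by adding SNPs in order of decreasing PPA until the cumulative PPA reaches 0.99. In the Python sketch below, the prior standard deviation of the true effect (0.15, on a standardized trait scale) and the toy effect estimates are assumptions; the study's exact procedure is described in Methods.

```python
import math
from typing import Dict, List, Sequence, Tuple


def log_abf(beta: float, se: float, prior_sd: float = 0.15) -> float:
    """Wakefield log approximate Bayes factor (association vs. no association)."""
    v = se ** 2
    w = prior_sd ** 2
    r = w / (v + w)
    z = beta / se
    return 0.5 * (math.log(1 - r) + r * z ** 2)


def credible_set(snps: Sequence[str], betas: Sequence[float], ses: Sequence[float],
                 coverage: float = 0.99,
                 prior_sd: float = 0.15) -> Tuple[List[str], Dict[str, float]]:
    """Smallest set of SNPs whose posterior probabilities sum to >= coverage.

    Assumes one causal variant per signal and equal priors across SNPs.
    """
    labf = [log_abf(b, s, prior_sd) for b, s in zip(betas, ses)]
    top = max(labf)
    # Subtracting the maximum keeps exp() from overflowing for strong signals.
    weights = [math.exp(l - top) for l in labf]
    total = sum(weights)
    ppa = {snp: w / total for snp, w in zip(snps, weights)}
    chosen, cumulative = [], 0.0
    for snp in sorted(ppa, key=ppa.get, reverse=True):
        chosen.append(snp)
        cumulative += ppa[snp]
        if cumulative >= coverage:
            break
    return chosen, ppa


# Hypothetical signal: one SNP with a much stronger association than its neighbors.
cs, ppa = credible_set(["rsA", "rsB", "rsC"], [0.50, 0.10, 0.05], [0.02, 0.02, 0.02])
print(cs)
```

SNPs in near-perfect linkage disequilibrium have almost identical Bayes factors and therefore split the posterior probability between them, which is why some credible sets remain large even for strong signals.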
SNPs within small credible sets were annotated with respect to function, predicted deleteriousness and regulatory potential (Methods). Across 96 credible sets that were either small or contained a SNP with a PPA of >0.5, there were 43 missense variants (18 unique) and 1 stop-loss variant that directly implicate the affected gene (Supplementary Table 10). The implicated genes were often further supported by existing biochemical knowledge, by corresponding monogenic diseases and/or by prior functional evidence (Fig. 4). For example, the association of isobutyrylcarnitine with SLC22A1 was fine mapped to two independent signals; rs662138 was mapped to single-SNP resolution and rs12208357 (p.Arg61Cys) was mapped to a 99% credible set of three SNPs (Supplementary Table 10). A recent publication used a combination of in vivo and in vitro approaches to show that rs12208357 and rs202220802 (p.Met420del; tagged by rs662138) independently result in reduced SLC22A1-mediated carnitine efflux38.
Noncoding variants in small credible sets or with high PPA are particularly informative when combined with genomic annotation and gene expression. For example, rs140254647 (PPA > 99%) was associated with urinary mannose (Supplementary Table 10). This SNP is intronic to SPATA6 and maps into open chromatin in kidney tissue, which indicates that it may have regulatory potential. The association signal with mannose colocalized with differential expression of the neighboring SLC5A9 gene in the tubulo-interstitial portion of kidney tissue39. SLC5A9 encodes a renal glucose transporter that is known to transport mannose40 and that is highly expressed in the human proximal tubule (Fig. 3c). These observations suggest the presence of a regulatory element for SLC5A9 transcription in human kidney that maps into an intron of the neighboring gene. The comprehensive list of fine-mapped and annotated SNPs (Supplementary Table 10) represents a valuable resource to guide future experimental studies.
Metabolite ratios and modules capture physiology
Next, we evaluated whether modeling of metabolites to reflect ongoing enzymatic reactions and transport processes in vivo could refine biological and clinical insights. Pair-wise metabolite ratios can capture the activity, affinity or abundance of enzymes or metabolite exchangers10,41,42. Genetic association testing for each of the 159 unique index SNPs with all pair-wise ratios that contained the associated metabolite revealed 880 significant, informative ratios (P < 4.3 × 10−11 (5 × 10−8/1,172), P gain > 2.2 × 106; Methods and Supplementary Table 11).
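The P-gain criterion used above compares a ratio association with the stronger of the two single-metabolite associations it is built from: it is the smaller of the two component P values divided by the P value of the ratio. The minimal Python sketch below applies both criteria from the text, the Bonferroni threshold of 5 × 10−8/1,172 and a critical P-gain taken as 2.2e6 (2.2 million), to the metoprolol example reported for CYP2D6 in the next paragraph. The function names are hypothetical.

```python
def p_gain(p_ratio: float, p_metabolite_1: float, p_metabolite_2: float) -> float:
    """How many-fold stronger the ratio association is than its best component."""
    return min(p_metabolite_1, p_metabolite_2) / p_ratio


def is_informative_ratio(p_ratio: float, p1: float, p2: float,
                         alpha: float = 5e-8 / 1172,
                         critical_gain: float = 2.2e6) -> bool:
    """The ratio must pass the Bonferroni threshold and exceed the critical P-gain."""
    return p_ratio < alpha and p_gain(p_ratio, p1, p2) > critical_gain


# Metoprolol / alpha-hydroxy-metoprolol at CYP2D6 (P values as reported in the text).
gain = p_gain(2.3e-55, 5.6e-16, 4.4e-5)
print(f"{gain:.2e}", is_informative_ratio(2.3e-55, 5.6e-16, 4.4e-5))
```

A large P-gain indicates that the ratio captures something, such as the activity of an enzyme that converts one metabolite into the other, that neither metabolite reflects on its own.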
New insights gained from metabolite ratios are exemplified by the association of rs3892097 in a splice acceptor site of CYP2D6 with the ratio of metoprolol to α-hydroxy-metoprolol (P = 2.3 × 10−55, Supplementary Table 11). This ratio showed an approximately 100-fold concentration difference across genotypes (Fig. 5a). It was more significant than its individual components α-hydroxy-metoprolol (P = 5.6 × 10−16) and metoprolol (P = 4.4 × 10−5), which illustrates the gain in information provided by modeling the activity of the encoded enzyme, cytochrome P450 family 2 subfamily D member 6. Metoprolol was used by 26% of our CKD study population, which is consistent with the high proportion of patients with CKD who use beta-blockers43. Metoprolol has negative chronotropic effects, which may be more apparent among slow metabolizers (TT genotype). Indeed, among 1,150 patients taking metoprolol in the entire GCKD study, there was a small but significant (P = 0.027) difference in mean heart rate between genotype groups. Moreover, no slow metabolizer who used metoprolol had a heart rate above 80 beats per minute (bpm) (Fig. 5b,c). This may be of clinical relevance given the high rate of heart disease and frequent use of metoprolol among patients with CKD. Another example, related to the renal tubular amino acid exchanger SLC7A9, is described in the Supplementary Note and Extended Data Fig. 5.
We then examined whether genetic associations with groups of correlated metabolites (‘modules’) may reflect shared biochemical pathways or coregulation, using a correlation network analysis-based approach to construct 212 metabolite modules (Methods and Extended Data Fig. 6) (ref. 44). GWAS of the first principal component of the modules, the eigenmetabolite, identified 46 significant (P < 2.4 × 10−10 (5 × 10−8/212)) associations (Extended Data Fig. 7 and Supplementary Table 12). This hypothesis-generating screen of higher order genetic associations can prove useful for the de-orphanization of ‘unknown’ metabolites. The detailed description of the findings in the Supplementary Note illustrates examples in which knowledge of the genetic association of an unknown metabolite and the module to which it belonged helped to restrict the search for its possible identities, as demonstrated by the subsequent experimental confirmation of X-13689 as the glucuronide of α-CMBHC (Extended Data Fig. 8). Our software is publicly available (http://bioconductor.org/packages/release/bioc/html/netboost.html) and may be of interest for large-scale integrative omics efforts.
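The eigenmetabolite of a module is the first principal component of its member metabolites. The following pure-Python sketch is not the netboost implementation (an R/Bioconductor package) but illustrates the underlying computation: each metabolite is z-scored, the leading eigenvector of their correlation matrix is found by power iteration, and each sample is projected onto it. Starting the iteration from an all-ones vector assumes that the module's metabolites are mostly positively correlated; the simulated module is hypothetical.

```python
import math
import random


def zscore(values):
    """Center a column and scale it to unit sample standard deviation."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
    return [(v - mean) / sd for v in values]


def eigenmetabolite(rows, n_iter=200):
    """First principal component scores of one module (rows = samples)."""
    n, p = len(rows), len(rows[0])
    cols = [zscore([row[j] for row in rows]) for j in range(p)]
    corr = [[sum(a * b for a, b in zip(cols[i], cols[j])) / (n - 1)
             for j in range(p)] for i in range(p)]
    vec = [1.0] * p
    for _ in range(n_iter):  # power iteration for the leading eigenvector
        nxt = [sum(corr[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    scores = [sum(cols[j][k] * vec[j] for j in range(p)) for k in range(n)]
    # The sign of a principal component is arbitrary; orient it so that higher
    # scores mean higher average (z-scored) levels of the module's metabolites.
    mean_z = [sum(cols[j][k] for j in range(p)) / p for k in range(n)]
    if sum(s * m for s, m in zip(scores, mean_z)) < 0:
        scores = [-s for s in scores]
    return scores


def pearson(x, y):
    zx, zy = zscore(x), zscore(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


# Hypothetical module: four metabolites driven by one latent factor plus noise.
random.seed(1)
latent = [random.gauss(0.0, 1.0) for _ in range(200)]
module = [[z + random.gauss(0.0, 0.3) for _ in range(4)] for z in latent]
eigen = eigenmetabolite(module)
print(round(pearson(eigen, latent), 3))
```

The eigenmetabolite scores can then be tested genome-wide exactly like a single metabolite, with the threshold of 5 × 10−8/212 stated above.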
mQTLs illuminate disease-associated molecular mechanisms
Disturbances in metabolite handling can cause disease, as evidenced by inborn errors of metabolism, or can reflect ongoing disease processes. SNPs identified in our study were assessed for associations with traits or clinical diagnoses in the UK Biobank45, and colocalization of association signals was then tested (Methods). Numerous significant and colocalizing associations (P < 5 × 10−8, H4 ≥ 0.8; Extended Data Fig. 9 and Supplementary Table 13) (ref. 46) were identified between mQTLs and metabolism-related anthropometric measures (for example, basal metabolic rate, height and fat mass), which is consistent with the idea that urine metabolites represent a readout of systemic metabolic processes. The mQTLs identified in this study may explain links between SNPs and traits or diseases (Fig. 6) at the biochemical level. For example, the association signal at ALPL with urinary phosphoethanolamine in this study colocalized with an association with urolithiasis and kidney stones in the UK Biobank. ALPL encodes an ectophosphatase that is expressed in kidney and other tissues and that hydrolyzes phosphoethanolamine, releasing phosphate47. Higher abundance or activity of this enzyme with each C allele carried at the probable causal 3' UTR SNP rs1772719 (PPA > 0.99) would result in lower phosphoethanolamine and higher than expected phosphate levels in urine, which may explain the higher propensity to stone formation and the observed association between SNPs in ALPL and kidney stones48. This example illustrates the potential to identify novel biomarkers of disease risk, which do not have to be causally linked to the disease.
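Colocalization evidence of the kind summarized by H4 can be computed from per-SNP approximate Bayes factors for the two traits at a locus. The Python sketch below follows the widely used approximate Bayes factor formulation implemented in the coloc R package, assuming that package's default priors (p1 = p2 = 1 × 10−4 for a SNP being associated with either trait, p12 = 1 × 10−5 for a SNP being associated with both). H3 denotes two distinct causal variants and H4 one shared causal variant. The log Bayes factors in the example are invented, and the study's exact settings are given in Methods.

```python
import math
from typing import Dict, Sequence


def logsumexp(values: Sequence[float]) -> float:
    top = max(values)
    if top == float("-inf"):
        return top
    return top + math.log(sum(math.exp(v - top) for v in values))


def coloc_posteriors(labf1: Sequence[float], labf2: Sequence[float],
                     p1: float = 1e-4, p2: float = 1e-4,
                     p12: float = 1e-5) -> Dict[str, float]:
    """Posterior probabilities of H0-H4 for two traits measured on the same SNPs."""
    s1, s2 = logsumexp(labf1), logsumexp(labf2)
    s12 = logsumexp([a + b for a, b in zip(labf1, labf2)])
    diff = s12 - (s1 + s2)
    if diff < 0:
        # log(exp(s1 + s2) - exp(s12)), computed stably via expm1.
        lh3 = math.log(p1) + math.log(p2) + s1 + s2 + math.log(-math.expm1(diff))
    else:
        lh3 = float("-inf")  # no room for two distinct causal SNPs
    log_h = [0.0, math.log(p1) + s1, math.log(p2) + s2, lh3, math.log(p12) + s12]
    total = logsumexp(log_h)
    return {f"H{i}": math.exp(l - total) for i, l in enumerate(log_h)}


# Hypothetical loci: the same top SNP for both traits vs. different top SNPs.
shared = coloc_posteriors([50.0, 0.0, 0.0, 0.0], [40.0, 0.0, 0.0, 0.0])
distinct = coloc_posteriors([50.0, 0.0, 0.0, 0.0], [0.0, 40.0, 0.0, 0.0])
print(round(shared["H4"], 3), round(distinct["H3"], 3))
```

A signal with H4 ≥ 0.8, as required in the text, therefore indicates that the urinary metabolite and the UK Biobank trait are most likely driven by the same underlying variant rather than by two nearby ones.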
A second example illustrates the potential to generate new, testable hypotheses about disease associations. The colocalization of association signals for urinary trans-urocanate and skin tanning led us to suggest that HAL could represent a novel skin cancer risk locus. The encoded histidase catalyzes the deamination of L-histidine to trans-urocanic acid, which is involved in the response to ultraviolet (UV) light49. Indeed, at the HAL index SNP, the allele associated with lower skin tanning ability was also associated with a higher risk of skin cancer (melanoma and other malignant neoplasms of skin (C43-C44), P = 6.5 × 10−6). Because this association does not reach genome-wide significance, it would have been missed in GWAS of skin cancers in the UK Biobank. It raises the possibility that UV-mediated skin damage could be reduced by modulating histidine metabolism in skin. The summary statistics provided by this study allow new hypotheses to be tested with every update of the UK Biobank data, or on GWAS summary statistics from other studies.
In our sample of patients with CKD, we undertook a complementary approach to evaluate associations with clinical endpoints. We studied the association between the 30 NAT8-associated metabolites and the risk of CKD progression and complications, assuming that these metabolites capture the detoxification capacity of the kidney, where NAT8 is almost exclusively expressed and where it mediates the generation of water-soluble mercapturic acids23. Detailed results are provided in Supplementary Table 14 and the Supplementary Note, which highlights several significant metabolites that improved risk prediction for terminal kidney failure.
Finally, Supplementary Table 15 summarizes the findings of this large-scale study, highlights the complementary nature of the analyses, and can serve as a guide to navigate the results.
Discussion
In this study of genetic associations with the urinary concentrations of 1,172 metabolites, 240 mQTLs were identified that implicated 90 unique genes. This represents a fourfold increase over previous mGWAS from urine16–19,50,51, the largest of which identified 22 genes16. Through a multistep workflow that characterized replicated loci with complementary methods, our study generated several principal findings.
First, urinary metabolite concentrations provide an integrative readout of ongoing systemic processes related to detoxification and ADME, which were initially studied in the context of pharmaceutical research on drug metabolism1. However, drug transporters and metabolizing enzymes also have important roles in the handling of many other xenobiotic and endogenous compounds, and their gene families contain more relevant members than was previously recognized52,53. Although 35 of the 90 identified genes could already be placed into the context of biotransformation reactions, the remaining genes may represent additional candidates. The 90 genes were enriched for high expression in tissues that govern ADME components. Within the kidney, enrichment concentrated on proximal tubular cells, where metabolite reabsorption and secretion take place. The high expression of many genes involved in phase I, II and III biotransformation reactions specifically in proximal tubular cells implies that these cells may have an under-appreciated role not only in the handling of conjugated molecules generated in the liver, but also in their generation5.
Urine metabolite concentrations also reflect mechanisms of metabolite handling that are specific to kidney cells. For example, rs11133665 in SLC6A19 explained 6% of the variance of urinary tryptophan levels in our study, but was not detected in association with blood tryptophan levels in a similarly sized study8. This is consistent with the experimentally confirmed role of the transporter encoded by this gene in the reabsorption of urinary tryptophan at the apical membrane of proximal tubular cells54, thereby affecting urine concentrations but not affecting blood concentrations. Thus, mGWAS can provide biofluid-specific insights.
Second, unlike GWAS of complex traits and diseases, which mostly identify variants with regulatory potential, many mQTLs contained causal missense variants of large effect. Integration of the mQTLs with gene expression data in our study often suggested loss of function as the underlying mechanism, as illustrated by CYP2D6 or SLC22A1. Although purifying selection keeps alleles that cause inborn errors of metabolism at low population frequencies, the causal missense SNPs identified in our mGWAS were often common and were also observed in the homozygous state. This is consistent with either an evolutionarily neutral effect or a selective advantage, which could be related to improved absorption or generation of advantageous metabolites, or to better detoxification or excretion of harmful ones. Positive selection has been reported for mQTLs detected in our study, such as at LCT, which encodes lactase, or at DPEP1, which is related to skin pigmentation and which introgressed from Neanderthals55. The large effect of these variants on metabolite levels permitted their detection in a relatively small study. When coupled with large, well-powered studies of genetic associations with diseases such as those made possible by the UK Biobank, our study can uncover potential pathophysiological foundations, as illustrated by ALPL and the risk of kidney stone disease. The results from this study therefore represent a useful annotation resource for GWAS of complex traits and diseases.
Third, several clinically relevant insights have been gained from this study. In general, associations between pharmacogenomic variants and concentrations of endogenous metabolites may highlight urinary biomarkers that can be used to predict treatment responses and side effects. Here, the study of patients with CKD as a metabolic ‘challenge’ also enabled the identification of important enzymes in the metabolism of uremic toxins. It should be noted, however, that greater metabolite coverage and advances in metabolite quantification and genotype imputation may have contributed to the higher number of mQTLs detected in this study, in comparison to previous, population-based studies. CKD affects approximately 10% of the adult population56, and patients with CKD are frequently treated with many medications57. Genetic differences in the expression or function of enzymes or transporters that are related to the metabolism of such medications may therefore become especially apparent in this setting. Lastly, the incorporation of metabolite information that reflects the functions of the kidney beyond filtration, such as N-acetylation capacity, may help to predict the variable course of CKD progression58.
Strengths of this study relate to the unique study population and to the large number of known, and yet-unknown, metabolites quantified from urine that was obtained using standardized procedures for collection, preparation, and storage. Findings were independently replicated, evaluated in a general population sample, and characterized by complementary approaches, including the acquisition of data based on cutting-edge techniques such as single-nucleus RNA-seq. Our new clustering algorithm is freely available. However, some limitations warrant mention. Metabolite quantification was based on a single ‘spot’ urine sample, which prevented us from studying dynamic components of the urinary metabolome. Not all metabolites were available in the general-population-based sample, and metabolites were quantified in the two samples using different mass spectrometers. These differences complicate the interpretation of heterogeneity in genetic effects that were observed at some mQTLs. The absence of an interaction with eGFR at such mQTLs could suggest that the presence of CKD may not be the main driver of effect size differences, but our current study cannot distinguish between reasons for heterogeneity, such as technical factors that are related to metabolite quantification, other differences between the study samples, or collider bias by design. Although it can be concluded that the mQTLs that were discovered in this study replicate among patients with CKD and are also mainly observed among individuals with normal eGFR, a future study of individuals across the entire range of kidney function will be required to assess the reasons for heterogeneity in more detail. Finally, the fact that the identities of many metabolites are unknown as yet also highlights the potential of this study to identify novel substrates of transporters and enzymes that are related to ADME processes and detoxification reactions.
In summary, this study generates a catalog of genes, causal variants, target tissues and clinical associations, and constitutes a comprehensive resource to guide follow-up studies in physiology, basic science, clinical medicine and in the pharmaceutical industry.
Methods
Study design and participants
The GCKD study is an ongoing prospective observational cohort study. Between 2010 and 2012, 5,217 adult patients with CKD under nephrological care provided written informed consent, were enrolled, and are currently being followed for 10 years13. Inclusion criteria were an eGFR of 30–60 ml min−1 per 1.73 m2, or an eGFR of >60 ml min−1 per 1.73 m2 with UACR > 300 mg g−1 (or a urinary protein–creatinine ratio of >500 mg g−1). Biomaterials including DNA and urine were collected and shipped frozen to a central biobank59. A more detailed description of the study design, standard operating procedures and the recruited study population has been published12,13. The GCKD study was registered in the national registry for clinical studies (DRKS 00003971) and was approved by local ethics committees.
For this project, urine specimens collected at enrollment were selected for metabolite measurements. The discovery sample consisted of 1,221 patients with an eGFR of <45 ml min−1 per 1.73 m2 and the replication sample consisted of 406 patients with an eGFR ≥45 and ≤50 ml min−1 per 1.73 m2; all of those sampled had UACR < 30 mg g−1 to minimize the influence of urinary albumin on metabolite concentrations. A discovery–replication design, which may result in a loss of power to detect genetic associations, was chosen to assess the potential for winner’s curse.
Participants of the population-based sample were drawn from SHIP-Trend, a study recruited in Western Pomerania, Germany. A detailed description of the study population and design can be found in ref. 26. In total, 4,420 invited subjects participated in the study (response rate 50.1%). All participants provided written informed consent. The study was approved by the relevant ethics committee and conformed to the principles of the Declaration of Helsinki. SHIP-Trend data are publicly available upon application (https://www.fvcm.med.uni-greifswald.de/index_engl.html). For a subsample of 1,000 participants, mass spectrometry (MS)-based metabolomics data from urine samples and genotyping data were obtained.
Genotyping and imputation
Genotyping and data cleaning in the GCKD study have been described in detail previously41. In brief, genomic DNA from GCKD participants was genotyped at 2,612,357 variants using Illumina Omni2.5Exome BeadChip arrays. Quality control of the Omni2.5 array component included evaluation of the per-individual call rate, a sex check, heterozygosity, cryptic relatedness and genetic ancestry. At the variant level, SNPs with a call rate of <96% or with deviation from Hardy–Weinberg equilibrium (HWE) (P < 1 × 10−5) were removed. The cleaned genotype data set contained 5,034 individuals. Genotype imputation was performed using minimac3 v2.0.1 at the Michigan Imputation Server60, using the Haplotype Reference Consortium haplotypes version r1.1 as the reference panel and Eagle 2.3 for phasing. Bi-allelic imputed genotypes with an imputation quality of R2 ≥ 0.3 and a minor allele frequency of ≥1% were retained, which resulted in 7,750,367 high-quality autosomal variants for GWAS.
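The variant-level filters above can be illustrated with a minimal Python sketch. The function names are hypothetical, and the HWE P value is computed with a 1-df chi-square goodness-of-fit test as a swapped-in approximation; genotype QC software often uses an exact test instead, so P values can differ for rare variants. Only the thresholds (call rate 96%, HWE P 1 × 10−5, R2 0.3, minor allele frequency 1%) come from the text.

```python
import math

def hwe_chisq_p(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """HWE P value from a 1-df chi-square test (approximation, not an exact test)."""
    n = n_hom_ref + n_het + n_hom_alt
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_ref, n_het, n_hom_alt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2))
    return math.erfc(math.sqrt(chi2 / 2.0))

def keep_array_variant(call_rate: float, hwe_p: float) -> bool:
    # Removed if call rate < 96% or HWE P < 1e-5
    return call_rate >= 0.96 and hwe_p >= 1e-5

def keep_imputed_variant(r2: float, maf: float) -> bool:
    # Retained if imputation quality R2 >= 0.3 and MAF >= 1%
    return r2 >= 0.3 and maf >= 0.01
```

For example, genotype counts of 25/50/25 are in perfect equilibrium (P = 1), whereas 50/0/50 (no heterozygotes) fails the HWE filter.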
Genotyping of 986 SHIP-Trend probands was performed using the Illumina HumanOmni2.5-Quad. DNA from whole blood was prepared using the Gentra Puregene Blood Kit (Qiagen) as described by the manufacturer. DNA purity and concentration were determined using a NanoDrop ND-1000 UV-Vis Spectrophotometer (Thermo Scientific), and DNA integrity was validated by electrophoresis using 0.8% agarose-1× TBE buffer gels stained with ethidium bromide. Subsequent sample processing and array hybridization were performed at the Helmholtz Zentrum München as described by the manufacturer (Illumina). Genotypes were called with the GenCall algorithm of GenomeStudio Genotyping Module v1.0. Arrays with a call rate of <94%, duplicate samples identified by estimated identity-by-descent, and individuals with sex mismatch were excluded. The final mean sample call rate was 99.51%. Imputation of genotypes was performed using IMPUTE v2.2.2 (ref. 61) against the 1000 Genomes Phase I reference panel. Before imputation, 667,024 SNPs were excluded (HWE P ≤ 0.0001, call rate ≤0.9, monomorphic), and after imputation, 1,634 SNPs were excluded if they had a duplicate reference SNP ID number but different positions. The total number of SNPs after imputation and quality control was 17,585,496. Variant positions in the GCKD and SHIP-Trend data sets were based on human genome build GRCh37.
Non-targeted mass spectrometry analysis
Non-targeted MS analysis was performed at Metabolon. Sample preparation was carried out as described previously15 and as detailed in the Supplementary Note, which also contains details about the four ultra-high-performance liquid chromatography–tandem mass spectrometry (UPLC–MS/MS) methods as well as about controls and other quality control procedures related to the mass spectrometry analysis. All methods used a Waters ACQUITY UPLC and a Thermo Scientific Q-Exactive high-resolution/accurate-mass spectrometer interfaced with a heated electrospray ionization (HESI-II) source and an Orbitrap mass analyzer operated at 35,000 mass resolution. The scan range covered 70–1,000 m/z.
Metabolite identification and quantification
Metabolites were identified by automated comparison of the ion features in the experimental samples to a reference library of chemical standard entries that included retention time, molecular weight (m/z), preferred adducts and in-source fragments, as well as associated MS spectra, and were curated by visual inspection for quality control using software developed at Metabolon (see Supplementary Note for additional details). Peaks were quantified using the area under the curve. To correct for variation resulting from inter-day differences in instrument tuning, the raw area counts for each metabolite in each sample were normalized by the median value of the respective run day.
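The run-day normalization described above amounts to dividing each metabolite's raw area counts by that metabolite's median within the same run day. A minimal NumPy sketch, assuming a samples × metabolites array with NaN for undetected values; the function name is hypothetical and this is not Metabolon's software:

```python
import numpy as np

def run_day_median_normalize(areas, run_days):
    """Divide each metabolite's raw area counts by its median within the run day.

    areas: samples x metabolites array (NaN = not detected).
    run_days: one run-day label per sample.
    """
    areas = np.asarray(areas, dtype=float)
    run_days = np.asarray(run_days)
    normalized = np.full_like(areas, np.nan)
    for day in np.unique(run_days):
        rows = run_days == day
        day_median = np.nanmedian(areas[rows], axis=0)  # one median per metabolite
        normalized[rows] = areas[rows] / day_median
    return normalized
```

After this step, values are centered near 1 within every run day, so a metabolite measured on an instrument that was tuned slightly "hotter" on one day is not systematically higher in that day's samples.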
Data cleaning of quantified metabolites
An in-house pipeline, described below, was set up for quality control, filtering and normalization of metabolite concentrations. No sample was removed for >50% missing data. Seventy-four (discovery) and 42 (replication) non-xenobiotic metabolites were excluded because of >80% missing values. To account for urine dilution, concentrations of each metabolite were normalized using the probabilistic quotient, based on endogenous metabolites with <1% missing values62. Subsequently, missing values were imputed with the minimum value of the respective metabolite, and metabolites were median scaled63. Xenobiotic metabolites were not imputed. None of the remaining log2-transformed metabolites was excluded for low variance (<0.01) or for having many outliers (>5% of samples outlying by >5 standard deviations (s.d.)). Similarly, no sample was an outlier of >5 s.d. along the first 10 principal components based on metabolites with complete information. Outlying values (>5 s.d.) for each metabolite were set to missing. Finally, 62 (discovery) and 51 (replication) xenobiotic metabolites with <50 measurements were excluded. Metabolite annotation was aligned between the two batches for a common data set of 1,172 metabolites. After removal of 27 samples with missing genotypes, the final data set consisted of 1,221 discovery and 406 replication samples.
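The core normalization steps above (probabilistic quotient normalization, minimum imputation, median scaling, log2 transformation and setting >5 s.d. outliers to missing) can be sketched in NumPy as follows. This is a minimal illustration, not the study's in-house pipeline: the function names are hypothetical, and the PQN reference profile is assumed to be the per-metabolite median across samples, a common choice.

```python
import numpy as np

def pqn_normalize(x, ref_mask):
    """Probabilistic quotient normalization (sketch).

    x: samples x metabolites (NaN = missing); ref_mask: boolean mask of the
    reference metabolites (here: endogenous metabolites with <1% missing values).
    """
    ref = x[:, ref_mask]
    reference_profile = np.nanmedian(ref, axis=0)  # assumed: median profile across samples
    quotients = ref / reference_profile            # per-sample, per-metabolite quotients
    dilution = np.nanmedian(quotients, axis=1)     # most probable dilution factor per sample
    return x / dilution[:, None], dilution

def min_impute_and_median_scale(x, impute_mask):
    """Impute missing values with the metabolite minimum (where allowed), then median scale."""
    x = x.copy()
    for j in range(x.shape[1]):
        col = x[:, j]  # view into x
        if impute_mask[j]:  # e.g. False for xenobiotics
            col[np.isnan(col)] = np.nanmin(col)
        x[:, j] = col / np.nanmedian(col)
    return x

def set_outliers_missing(log_x, n_sd=5.0):
    """Set values more than n_sd s.d. from the metabolite mean to NaN."""
    out = log_x.copy()
    mu = np.nanmean(out, axis=0)
    sd = np.nanstd(out, axis=0, ddof=1)
    with np.errstate(invalid="ignore", divide="ignore"):
        z = (out - mu) / sd
    out[np.abs(z) > n_sd] = np.nan
    return out

def clean_metabolites(raw, ref_mask, impute_mask):
    x, _ = pqn_normalize(raw, ref_mask)
    x = min_impute_and_median_scale(x, impute_mask)
    return set_outliers_missing(np.log2(x))
```

If three urine samples share the same composition but differ only in water content (dilution factors 1, 2 and 0.5), PQN recovers exactly those factors and returns identical profiles.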
Metabolite quantification and genetic associations in SHIP-Trend
Non-targeted metabolomics analysis was conducted at the Genome Analysis Center, Helmholtz Zentrum München. A detailed description of metabolite measurements, annotations and data processing has been published64,65. In brief, two separate LC–MS/MS analytical methods were used to obtain a broad metabolite spectrum in a non-targeted manner according to standard operating procedures at Metabolon. Processing and annotation of the raw spectra were conducted at Metabolon. The raw ion counts for each metabolite measurement in each sample were divided by the median value of the respective run day to reduce technical variation. Metabolites were retained if a valid estimation (>3 observations) of the median within a run day was possible. Subsequently, probabilistic quotient normalization was applied to account for urine dilution. The metabolite values were log2 transformed, followed by a robust multivariate outlier exclusion based on principal component analysis. Finally, minimum value imputation of missing values was applied to all non-xenobiotic metabolites. After preprocessing and omission of participants with either no genetic or no metabolite data, urine levels of 558 metabolites were available for 977 participants.
In this sample, the association between each of the 159 unique SNPs identified in the GCKD mGWAS and the respective metabolite or metabolites was investigated using PLINK v1.90 (ref. 66), assuming an additive genetic model and adjusting for age, sex, eGFR and the first three genetic principal components. Metabolite matches across studies were based on a combination of compound ID, chemical ID or biochemical name (n = 83). Seven further mQTLs, related to five metabolites, were matched by expert knowledge based on mass and retention time.
Definition of additional variables
Serum creatinine was measured using an isotope dilution mass spectrometry-traceable enzymatic assay (Creatinine plus, Roche). Glomerular filtration rate was estimated using the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) formula14. For the UACR, urinary creatinine was measured with the same assay as serum creatinine, and urinary albumin with the ALBU-XS assay (Roche/Hitachi Diagnostics).
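For reference, the creatinine-based CKD-EPI equation published in 2009 (ref. 14) can be written as a short Python function. This is a sketch of the published formula (serum creatinine in mg dl−1, result in ml min−1 per 1.73 m2); the function name is hypothetical, and the published equation's race coefficient is kept only as an optional argument so that the formula is reproduced completely.

```python
def ckd_epi_2009(scr_mg_dl: float, age_years: float, female: bool, black: bool = False) -> float:
    """eGFR (ml/min per 1.73 m2) from the 2009 CKD-EPI creatinine equation."""
    kappa = 0.7 if female else 0.9      # sex-specific creatinine knot (mg/dl)
    alpha = -0.329 if female else -0.411
    ratio = scr_mg_dl / kappa
    egfr = (141.0
            * min(ratio, 1.0) ** alpha   # applies below the knot
            * max(ratio, 1.0) ** -1.209  # applies above the knot
            * 0.993 ** age_years)
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159
    return egfr
```

For a 50-year-old man with a serum creatinine of 0.9 mg dl−1 (exactly at the knot), both ratio terms equal 1 and the estimate reduces to 141 × 0.993^50 ≈ 99 ml min−1 per 1.73 m2.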
GWAS of urinary metabolite concentrations
GWAS were based on log2-transformed metabolite concentrations. Similar to previous mGWAS6,7,41, residuals adjusted for age, sex, log(eGFR), log(UACR) and the first three genetic principal components were generated and were used in separate discovery and replication GWAS as described previously (ref. 41), using imputed genotype dosages and an additive genetic model. Discovery and replication summary statistics were quality controlled using GWAtoolbox67, and a meta-analysis was subsequently carried out using a fixed-effects inverse-variance model as implemented in METAL (25 March 2011 release)11, in which only metabolites with a minimum combined sample size of 300 were retained. Statistical significance was defined as genome-wide significance (P < 5 × 10−8) in the discovery cohort, a one-sided P < 0.05 in the replication cohort, and significance (P < 4.3 × 10−11) in the meta-analysis after correcting for multiple testing by a Bonferroni procedure (5 × 10−8/1,172 metabolites).
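The three-part significance criterion can be made concrete with a small Python sketch of a fixed-effects inverse-variance meta-analysis of two studies. Function names are hypothetical; this mirrors the standard-error-weighted scheme rather than calling METAL itself. The one-sided replication P value is taken in the direction of the discovery effect.

```python
import math

MULTIPLE_TEST_THRESHOLD = 5e-8 / 1172  # Bonferroni over 1,172 metabolites, about 4.3e-11

def two_sided_p(z: float) -> float:
    return math.erfc(abs(z) / math.sqrt(2.0))

def ivw_fixed_effects(betas, ses):
    """Fixed-effects inverse-variance weighted estimate, its s.e. and two-sided P."""
    weights = [1.0 / se ** 2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return beta, se, two_sided_p(beta / se)

def is_replicated_mqtl(beta_disc, se_disc, beta_rep, se_rep) -> bool:
    p_disc = two_sided_p(beta_disc / se_disc)
    # One-sided replication test in the direction of the discovery effect
    z_rep = math.copysign(1.0, beta_disc) * beta_rep / se_rep
    p_rep_one_sided = 0.5 * math.erfc(z_rep / math.sqrt(2.0))
    _, _, p_meta = ivw_fixed_effects([beta_disc, beta_rep], [se_disc, se_rep])
    return (p_disc < 5e-8
            and p_rep_one_sided < 0.05
            and p_meta < MULTIPLE_TEST_THRESHOLD)
```

An association with a strong discovery effect but an opposite-direction replication estimate fails the one-sided criterion even if the replication estimate is itself large.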
Significantly associated SNPs were assigned to mQTLs as follows: for each metabolite, the SNP with the lowest genome-wide P value was selected as the index SNP, the corresponding locus was defined as a 1 Mb interval centered on the index SNP, and the procedure was repeated among SNPs not yet assigned to a locus until no further genome-wide significant SNP remained. For each metabolite, overlapping intervals were combined into mQTLs. The extended major histocompatibility complex region (chromosome 6, 25.5–34 Mb) was considered as one region. A sensitivity analysis was conducted without adjustment for log(eGFR) and log(UACR), and this produced very similar results (231 out of 232 detected mQTLs overlapped with results from the main analysis; data not shown). For each mQTL, a regional association plot centered on the index SNP was generated using LocusZoom v1.3 (ref. 68). mQTLs were further merged across metabolites into genetic regions if their index SNPs were in linkage disequilibrium (r2 > 0.8). Circular plots of association results were created using Circos v0.69-6 (ref. 69).
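The locus definition for a single metabolite can be sketched as a greedy, distance-based selection followed by interval merging. This is a minimal illustration with a hypothetical function name; it does not implement the special handling of the extended MHC region.

```python
def define_mqtls(snps, p_threshold=5e-8, window=1_000_000):
    """Greedy distance-based locus definition for one metabolite (sketch).

    snps: list of (chrom, pos, p). Returns merged loci as
    (chrom, start, end, [index SNP positions]).
    """
    half = window // 2
    remaining = [s for s in snps if s[2] < p_threshold]
    intervals = []
    while remaining:
        chrom, pos, _ = min(remaining, key=lambda s: s[2])  # next index SNP
        start, end = max(0, pos - half), pos + half
        intervals.append((chrom, start, end, pos))
        # Drop all SNPs that fall into the new 1 Mb interval
        remaining = [s for s in remaining
                     if not (s[0] == chrom and start <= s[1] <= end)]
    # Combine overlapping intervals on the same chromosome into one mQTL
    intervals.sort()
    merged = []
    for chrom, start, end, pos in intervals:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            last = merged[-1]
            merged[-1] = (chrom, last[1], max(last[2], end), last[3] + [pos])
        else:
            merged.append((chrom, start, end, [pos]))
    return merged
```

In this scheme, a second index SNP 900 kb away from the first still yields its own interval, but because the two 1 Mb intervals overlap they are reported as a single mQTL with two index SNPs.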
Effect estimates of GCKD index SNPs were compared with those for the corresponding metabolite in the SHIP-Trend sample by fitting a linear regression model. In genetic regions with multiple mQTLs, the mQTL with the lowest P value was used. Effect size heterogeneity between the GCKD and SHIP-Trend samples was evaluated using Cochran’s Q-test for heterogeneity, with a Bonferroni correction for multiple testing. mQTLs with evidence of significant heterogeneity were evaluated for the presence of an interaction between the SNP and eGFR on metabolite levels in the SHIP-Trend study by inclusion of an interaction term in the regression model, with a Bonferroni correction for the number of tested mQTLs.
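For two studies, Cochran's Q is the weighted sum of squared deviations of each estimate from the inverse-variance pooled estimate, compared against a chi-square distribution with 1 degree of freedom. A minimal Python sketch with hypothetical function names:

```python
import math

def cochran_q_two_studies(b1, se1, b2, se2):
    """Cochran's Q and its P value (chi-square, 1 df) for two effect estimates."""
    w1, w2 = 1.0 / se1 ** 2, 1.0 / se2 ** 2
    pooled = (w1 * b1 + w2 * b2) / (w1 + w2)
    q = w1 * (b1 - pooled) ** 2 + w2 * (b2 - pooled) ** 2
    # P(chi2_1 > q) = erfc(sqrt(q / 2))
    return q, math.erfc(math.sqrt(q / 2.0))

def is_heterogeneous(p_q, n_tests, alpha=0.05):
    # Bonferroni correction over the number of compared mQTLs
    return p_q < alpha / n_tests
```

For example, estimates of 0.5 and 0.1 with equal standard errors of 0.1 give Q = 8 and P ≈ 0.0047, which would count as heterogeneous after Bonferroni correction for 5 tests but not for 20.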
Annotation
SNP annotation was performed by querying the SNiPA database v3.3 (released 25 June 2018) (ref. 20), based on the 1000 Genomes phase 3 v5 and Ensembl v87 data sets. The retrieved combined annotation dependent depletion (CADD) score was based on CADD version 1.3. The Ensembl variant effect predictor (VEP) tool was used for the effect prediction of SNPs. SNiPA was used to collect the following annotations for each index SNP and its proxies (r2 ≥ 0.8): gene hit or close-by, regulated genes, CADD score, SnpEff effect impact (exonic and noncoding), mQTL, pQTL, GWAS Catalog, cis-eQTL, disease genes (based on ClinVar, the Online Mendelian Inheritance in Man database (OMIM), the Human Gene Mutation Database (HGMD) and DrugBank). Novelty of the mQTLs and regions that were identified in our screen was assigned based on the presence of SNiPA entries in the ‘mQTL’ and ‘GWAS Catalog’ categories for the respective index SNP and its proxies, as ‘confirmed for urine’, ‘novel for urine’ (but potentially identified in another body fluid) and ‘novel’. The asterisk assignment for genetic regions containing novel substrates was based on string matching of biochemical names within these mQTL and GWAS catalog entries. For index SNPs that were missing in SNiPA, LDlink v3.2.0 (ref. 70) was used to identify the best proxy to be used instead.
To select the most likely causal gene for each index SNP, first the ‘genes’ and ‘evidence’ information based on SNiPA was compiled and colocalization analyses were performed. Index SNPs were then queried for association with differential expression of a nearby gene in tubulo-interstitial kidney portions (cis-eQTL) from 187 patients with CKD using the NephQTL browser39. When one or more eQTL associations with P < 0.05/159 were identified within ±100 kb of each index SNP, colocalization analyses of the mQTL of the respective metabolite(s) and each of the eQTL association(s) were performed, with the region for each colocalization test defined as the eQTL cis window in the underlying study (±500 kb). An adapted version of the Giambartolomei colocalization method46 as implemented in the ‘coloc.fast’ function from the R package ‘gtx’ (https://github.com/tobyjohnson/gtx) with default parameters and prior definitions was used. Positive colocalization was defined as the posterior probability of one common causal variant (H4) of > 0.8 (ref.46). The evidence codes h, r, e, p, m and c, based on SNiPA, correspond to gene hit or close-by, regulated genes, cis-eQTL, pQTL, missense variants, and disease genes based on pathogenic variants known to cause monogenic diseases, respectively. The evidence code o designates genes with evidence for colocalization. Evidence codes were collected and summed for each gene. The gene with the highest sum of scores within each locus was assigned as the most likely causal gene. In the case of ties, genes with evidence for colocalization were prioritized, followed by genes for which an inborn error of metabolism with the corresponding metabolite is known. In all other cases, ties were resolved by prioritizing the closest gene. The final gene list was used as input for downstream gene-based analyses.
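The gene-prioritization rule can be expressed as a lexicographic ranking: highest evidence score first, then colocalization, then a known inborn error of metabolism for the metabolite, then proximity to the index SNP. A minimal Python sketch; the class and function names are hypothetical, and it assumes that each evidence code contributes one point to a gene's score.

```python
from dataclasses import dataclass, field

@dataclass
class CandidateGene:
    name: str
    evidence: set = field(default_factory=set)  # subset of {"h", "r", "e", "p", "m", "c", "o"}
    iem_with_metabolite: bool = False            # inborn error of metabolism known for this metabolite
    distance_bp: int = 0                         # distance to the index SNP

def prioritize(genes):
    """Return the name of the most likely causal gene at one locus (sketch)."""
    def rank(g):
        return (len(g.evidence),          # assumed: one point per evidence code
                "o" in g.evidence,        # tie-break 1: colocalization
                g.iem_with_metabolite,    # tie-break 2: known inborn error of metabolism
                -g.distance_bp)           # tie-break 3: closest gene
    return max(genes, key=rank).name
```

A gene with three evidence codes wins over a colocalizing gene with two; colocalization only decides between genes with equal scores.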
Analysis of metabolite ratios
For each metabolite and significant locus, associations were evaluated between the index SNP and all pairwise metabolite ratios that contained the respective metabolite and had at least 300 measurements (222,542 ratios in total). Metabolite ratios were log2-transformed and regressed on genotypes as in the GWAS. Statistical significance was defined by a P-gain threshold of 2,225,420 (222,542 × 10) (ref. 42), where the P gain is the ratio of the lower of the two single-metabolite P values to the P value of the metabolite ratio.
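The P-gain criterion is a simple ratio of P values. A minimal Python sketch with hypothetical function names; note that the log2 of a metabolite ratio is just the difference of the two log2 concentrations.

```python
import math

N_RATIOS = 222_542
P_GAIN_THRESHOLD = 10 * N_RATIOS  # 2,225,420

def log2_ratio(a: float, b: float) -> float:
    # log2(a / b) = log2(a) - log2(b)
    return math.log2(a) - math.log2(b)

def p_gain(p_metabolite_1: float, p_metabolite_2: float, p_ratio: float) -> float:
    """Fold improvement of the ratio's P value over the better single-metabolite P value."""
    return min(p_metabolite_1, p_metabolite_2) / p_ratio

def ratio_is_informative(p1: float, p2: float, p_ratio: float) -> bool:
    return p_gain(p1, p2, p_ratio) > P_GAIN_THRESHOLD
```

A ratio with P = 1 × 10−20 whose components reach only P = 1 × 10−10 and 1 × 10−3 has a P gain of 10^10 and passes, whereas a ratio P of 1 × 10−12 (P gain 100) does not.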
Clustering of metabolites
Log2-transformed metabolites were imputed using a k-nearest neighbor (knn) algorithm with k = 10 (ref. 63). Pairwise correlations between all metabolites were computed using Spearman correlation coefficients (Supplementary Table 16). Based on the imputed data matrix, the Bioconductor R package Netboost v1.0.0 (ref. 44) was used to partition metabolites into strongly correlated subgroups and to obtain aggregated summary measures. Netboost is an extension of the weighted gene co-expression network analysis (WGCNA) methodology (ref. 71). It is a three-step dimension-reduction procedure: in the first step, a boosting-based filter and a sparse Pearson correlation-based distance matrix between metabolites are used to reduce the network to its essential edges and to remove spurious connections. Next, a sparse version of hierarchical clustering72 and the Dynamic Tree Cut procedure73 are performed to identify subgroups, called modules, from the dendrogram. Finally, module information is summarized by its first principal component, termed the eigenmetabolite of the module71. Details on GWAS of eigenmetabolites are provided in the Supplementary Note.
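The final summarization step, the eigenmetabolite, is the first principal component of the standardized metabolites in a module. The NumPy sketch below illustrates only that step; it is not Netboost (an R package) and omits the boosting filter and clustering. The function name is hypothetical, and orienting the component to correlate positively with the average standardized level is an assumed convention, since the sign of a principal component is arbitrary.

```python
import numpy as np

def eigenmetabolite(module):
    """First principal component of a samples x metabolites module (sketch)."""
    module = np.asarray(module, dtype=float)
    z = (module - module.mean(axis=0)) / module.std(axis=0, ddof=1)
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    scores = u[:, 0] * s[0]
    # Orient the component to correlate positively with the mean standardized level
    if np.corrcoef(scores, z.mean(axis=1))[0, 1] < 0:
        scores = -scores
    variance_explained = s[0] ** 2 / np.sum(s ** 2)
    return scores, variance_explained
```

For a module of perfectly correlated metabolites, the eigenmetabolite explains essentially all of the variance and tracks each member exactly.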
Curation of genes involved in ADME processes
The list of 298 ADME core genes was obtained from http://pharmaadme.org, accessed 12 December 2018. An additional comprehensive list was manually curated by the identification of additional members of gene families that are known to be involved in phase I, II and III biotransformation reactions, starting with the list of families that was included at https://en.wikipedia.org/wiki/ADME on 23 November 2018. Lastly, all 90 unique genes were evaluated for the presence of publications on their involvement in phase I, II and III biotransformation reactions in a PubMed search in December 2018.
Tissue and kidney RNA-seq data processing
Tissue RNA-seq data were downloaded from the GTEx Portal (https://gtexportal.org; 11,688 tissue samples in V7). We used gene transcripts per million (TPM), median TPM and sample annotation files, and excluded ‘Cells - EBV-transformed lymphocytes’, ‘Cells - Transformed fibroblasts’ and brain tissue samples. Genes for which <4 samples had at least one read count per million in the TPM matrix were removed before t statistics were computed for specifically expressed genes.
Kidney single-cell analyses were based on the RNA-seq count matrix of 4,524 single nuclei from a healthy human adult34 and of 43,745 kidney cells from 7 mice33 downloaded from the Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo; GSE118184 and GSE107585), following the procedures described in the respective publications. The human data set contained 17 assigned cell types and the mouse data set contained 16 assigned cell types.
Genes expressed specifically within each tissue or cell type were identified by comparing the expression of each gene in the tissue or cell type of interest to all other tissues or cell types in the data set using a linear model as described previously74, adjusting for age and sex in the GTEx data. Genes were then ranked by the resulting t statistics, and the top 10% of genes in each GTEx tissue and kidney cell type were selected for enrichment analyses. Human homologs of mouse genes were obtained by querying Ensembl Biomart using R package biomaRt v2.34.2 (ref. 75).
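A minimal sketch of the specificity statistic for one gene: an ordinary least-squares model of expression on an indicator for the tissue (or cell type) of interest plus covariates, returning the t statistic of the indicator. Function names are hypothetical, and the exact model of ref. 74 may differ in details such as expression scaling.

```python
import numpy as np

def specificity_t(expr, in_group, covariates=None):
    """t statistic of the tissue-of-interest indicator in an OLS model (sketch).

    expr: expression of one gene across samples; in_group: 0/1 indicator;
    covariates: optional samples x k array (e.g. age and sex for GTEx).
    """
    n = len(expr)
    cols = [np.ones(n), np.asarray(in_group, dtype=float)]
    if covariates is not None:
        cols.extend(np.asarray(covariates, dtype=float).T)
    X = np.column_stack(cols)
    beta, _, _, _ = np.linalg.lstsq(X, expr, rcond=None)
    resid = expr - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])

def top_fraction(t_stats, fraction=0.10):
    """Indices of the genes with the highest t statistics (top 10% by default)."""
    k = max(1, int(round(len(t_stats) * fraction)))
    return np.argsort(t_stats)[::-1][:k]
```

Running `specificity_t` for every gene and tissue and then applying `top_fraction` per tissue yields the gene sets used for enrichment testing.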
For heatmap visualization, normalized expression values were averaged for each gene in each kidney cell type, followed by z-score transformation of the averaged expression in each kidney cell type. A heatmap was plotted for genes with a z-score of >2 in at least one cell type.
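The heatmap gene selection can be sketched in NumPy: average expression per cell type, z-score each gene across cell types, and keep genes exceeding z = 2 in at least one cell type. The function name is hypothetical; genes with identical averages in all cell types are assigned z = 0 (an assumed convention for the undefined case).

```python
import numpy as np

def heatmap_genes(expr, cell_types, z_cutoff=2.0):
    """expr: genes x cells normalized expression; cell_types: one label per cell."""
    cell_types = np.asarray(cell_types)
    labels = np.unique(cell_types)
    # Average expression of each gene within each cell type
    means = np.column_stack([expr[:, cell_types == ct].mean(axis=1) for ct in labels])
    mu = means.mean(axis=1, keepdims=True)
    sd = means.std(axis=1, ddof=1, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        z = np.where(sd > 0, (means - mu) / sd, 0.0)
    keep = (z > z_cutoff).any(axis=1)
    return z[keep], np.flatnonzero(keep), labels
```

With 17 cell types, a gene expressed in only one of them reaches a z-score of about 3.9 there and is plotted, whereas a uniformly expressed gene is not.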
ADME, GO, KEGG, tissue and kidney cell-type enrichment analyses
The number of independent SNPs per gene was computed based on GCKD genotypes using PLINK v1.90 (ref. 66). A database of Entrez gene identifiers was generated based on the Bioconductor R database org.Hs.eg.db v3.8.2. For each gene, the database contained the number of independent SNPs and gene length, as well as its membership in the original ADME list, the lists of GO terms and KEGG pathways, and the lists of the top 10% highly expressed genes in each GTEx V7 tissue and human and murine kidney cell type. For enrichment testing, the observed number of the 90 identified genes in the ADME list, GO terms and KEGG pathways, and in the top 10% of highly expressed genes in each GTEx tissue and kidney cell type was noted. This was then compared with the number obtained from lists of 90 randomly drawn genes that were matched to the 90 identified genes by deciles of gene length and number of independent SNPs (GO and KEGG: 1 × 10⁷ draws; others: 1 × 10⁸ draws). Multiple testing correction was performed using the Bonferroni method for GTEx tissues and kidney cell types and the Benjamini–Hochberg procedure for GO terms and KEGG pathways.
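The matched-sampling enrichment test can be sketched as follows: stratify all genes by joint deciles of gene length and number of independent SNPs, repeatedly draw random gene sets with the same stratum composition as the identified genes, and compute an empirical P value. Names are hypothetical, far fewer draws are used than in the study, and the add-one correction to the empirical P value is an assumed convention.

```python
import numpy as np

def decile_bins(values):
    """Assign each value to a decile bin 0..9."""
    edges = np.quantile(values, np.linspace(0.1, 0.9, 9))
    return np.digitize(values, edges)

def matched_enrichment(gene_length, n_indep_snps, in_set, hit_idx, n_draws=10_000, seed=0):
    """Observed overlap and empirical one-sided P value (sketch)."""
    rng = np.random.default_rng(seed)
    # Joint stratum = (gene-length decile, independent-SNP decile)
    strata = decile_bins(gene_length) * 10 + decile_bins(n_indep_snps)
    observed = int(in_set[hit_idx].sum())
    hit_strata, counts = np.unique(strata[hit_idx], return_counts=True)
    pools = [np.flatnonzero(strata == s) for s in hit_strata]
    n_extreme = 0
    for _ in range(n_draws):
        # Draw the same number of genes from each stratum as among the hits
        draw = np.concatenate([rng.choice(pool, size=k, replace=False)
                               for pool, k in zip(pools, counts)])
        if in_set[draw].sum() >= observed:
            n_extreme += 1
    return observed, (n_extreme + 1) / (n_draws + 1)
```

Matching on gene length and SNP count guards against a trivial enrichment: long genes with many independent SNPs are more likely to harbor an association signal by chance.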
Independent SNP selection and statistical fine mapping
To find additional independent signals within the identified mQTLs, approximate conditional analyses were carried out that incorporated linkage disequilibrium information estimated in the analyzed GCKD sample. Within each replicated mQTL, the GCTA COJO-Slct algorithm36, a stepwise forward selection approach, was used to identify independent genome-wide significant SNPs (P conditional < 4.3 × 10−11), using a collinearity cut-off of 0.1. The GCTA beta version v1.91.6 selects SNPs based on their χ² test statistics rather than their P values and can therefore accommodate very low P values.
Statistical fine mapping was carried out for all of the independently associated SNPs. For loci that contain multiple independent SNPs, approximate conditional analyses were carried out conditioning on the other independent SNPs in the region using the GCTA COJO-Cond algorithm to estimate conditional effect sizes. Approximate Bayes factors (ABFs) were then derived from conditional estimates based on the formula given in ref. 37. In loci with a single independent SNP, ABFs were based on the original estimates. The s.d. prior was chosen as 0.6122 because 95% of the effect size estimates fell within the −1.2 to 1.2 interval37. The ABF of the SNPs was used to calculate the posterior probability for each variant driving the association signal (PPA, ‘causal variant’). Credible sets were calculated by summing up PPA-ranked variants until the cumulative PPA was >99%.
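The fine-mapping steps can be sketched in Python using Wakefield's approximate Bayes factor in its log form. The prior s.d. follows from the text: if 95% of effects lie within ±1.2, then σ = 1.2/1.96 ≈ 0.6122. Function names are hypothetical, and the inputs may be conditional (COJO-Cond) or original estimates, as described above.

```python
import math

PRIOR_SD = 0.6122  # 1.2 / 1.96: prior places 95% of effects within [-1.2, 1.2]
PRIOR_VAR = PRIOR_SD ** 2

def log_abf(beta, se, w=PRIOR_VAR):
    """Natural-log Wakefield ABF for association vs no association."""
    v = se ** 2
    r = w / (v + w)
    z2 = (beta / se) ** 2
    return 0.5 * math.log(1.0 - r) + 0.5 * z2 * r

def posterior_probabilities(log_abfs):
    m = max(log_abfs)  # subtract the maximum for numerical stability
    weights = [math.exp(l - m) for l in log_abfs]
    total = sum(weights)
    return [w / total for w in weights]

def credible_set(variant_ids, betas, ses, coverage=0.99):
    """Add PPA-ranked variants until the cumulative PPA exceeds the coverage."""
    ppa = posterior_probabilities([log_abf(b, s) for b, s in zip(betas, ses)])
    ranked = sorted(zip(variant_ids, ppa), key=lambda x: x[1], reverse=True)
    selected, cumulative = [], 0.0
    for vid, p in ranked:
        selected.append((vid, p))
        cumulative += p
        if cumulative > coverage:
            break
    return selected
```

A single variant with an overwhelming z-score forms a one-variant credible set, whereas two variants with identical statistics (for example, in perfect LD) share the posterior probability equally and both enter the set.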
Credible set variant annotation
Overlap with ENCODE and Roadmap DNase-seq tracks (downloaded 24 April 2018, https://www.encodeproject.org) was added for upper gastrointestinal tract, intestine, kidney, epithelial tissue and liver.
Association with phenotypes and diseases in the UK Biobank
Summary statistics of pre-computed GWAS of 778 traits assessed in the UK Biobank project45 were downloaded from GeneAtlas (http://geneatlas.roslin.ed.ac.uk, accessed 17 December 2018) (ref. 76) for 156 of the 159 unique index SNPs. Three of the SNPs (rs112992077, rs36209093, rs6497490) were not present in the UK Biobank data and did not have good proxies (r2 ≥ 0.8) based on the 1000 Genomes EUR reference samples. Genome-wide significant SNP associations (P < 5 × 10−8) with any of the 778 outcomes represented the basis for integration with mGWAS data. Colocalization analyses of mQTL and UK Biobank associations were performed as was outlined for the cis-eQTL analyses above.
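The colocalization analyses used here and for the cis-eQTLs follow the Giambartolomei approach, in which per-SNP log approximate Bayes factors for two traits over the same SNPs are combined into posterior probabilities for hypotheses H0 to H4 (H4: one shared causal variant). The Python sketch below shows that computation; it is not the gtx `coloc.fast` implementation, and the priors p1 = p2 = 1 × 10−4 and p12 = 1 × 10−5 are the defaults of the R package coloc, assumed here because the gtx defaults may differ.

```python
import math

def logsumexp(xs):
    m = max(xs)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(x - m) for x in xs))

def logdiffexp(a, b):
    """log(exp(a) - exp(b)) for a > b."""
    if b >= a:
        return float("-inf")
    return a + math.log1p(-math.exp(b - a))

def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of H0..H4 from per-SNP log ABFs of two traits."""
    s1, s2 = logsumexp(labf1), logsumexp(labf2)
    s12 = logsumexp([a + b for a, b in zip(labf1, labf2)])
    lh = [0.0,                                                # H0: no association
          math.log(p1) + s1,                                  # H1: trait 1 only
          math.log(p2) + s2,                                  # H2: trait 2 only
          math.log(p1) + math.log(p2) + logdiffexp(s1 + s2, s12),  # H3: distinct variants
          math.log(p12) + s12]                                # H4: shared variant
    norm = logsumexp(lh)
    return [math.exp(x - norm) for x in lh]
```

With the study's criterion, a pair of signals colocalizes if the fifth element (PP.H4) exceeds 0.8.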
Associations with CKD progression and CKD-related endpoints
Associations between the 30 NAT8-associated metabolites and the prospective endpoints of overall mortality, end-stage kidney disease and major adverse cardiac events over the first 4 years of the GCKD study were evaluated using the Cox proportional-hazards model. Competing events were accounted for when necessary. Models were assessed for predictive performance as outlined in the Supplementary Note.
Reporting Summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Data availability
Meta-analysis summary statistics for SNPs that are associated with any metabolite at P < 5 × 10−8 will be integrated into the SNiPA database with its v3.4 release. Genome-wide summary statistics will be made available publicly through the GWAS catalog (https://www.ebi.ac.uk/gwas). Raw data for Extended Data Fig. 5 are available as source data with the supplementary materials.
Code availability
Each use of software programs has been clearly indicated and information on the options that were used is provided in the Methods section. Source code to call programs is available upon request.
Extended Data
Supplementary Material
Acknowledgements
The work of A.K. was supported by grants KO 3598/3–1 and KO 3598/5–1 (A.K.), Y.L. was supported by grant KO 3598/4–1 (A.K.), the work of M.W., U.T.S., F. Kotsis and A.K. was supported by the Collaborative Research Center (CRC) 1140 project no. 246781735 (A.K.), and the work of P. Schlosser and A.K. was supported by the CRC 992 (A.K.), all German Research Foundation (DFG). U.T.S. and M.W. were also supported by the Else Kroener Fresenius Foundation (NAKSYS project, A.K.). M.K. was funded by the DFG Transregional Collaborative Research Centers (TRR) 152 project no. 239283807 (M.K.) and CRC 1140 project number 246781735 (M.K.). M.K. and G.W. were supported by the Excellence Strategy of the German Federal and State Governments (Center for Integrative Biological Signalling Studies EXC 2189). G.K. was supported by the grants from the National Institute on Aging (NIA): RF1-AG057452–01, RF1-AG059093–01, RF1-AG058942–01 and U01-AG061359–01. Genotyping was supported by Bayer Pharma AG. The GCKD study is supported by the German Ministry of Education and Research (Bundesministerium für Bildung und Forschung, FKZ 01ER 0804, 01ER 0818, 01ER 0819, 01ER 0820 and 01ER 0821) and the KfH Foundation for Preventive Medicine (Kuratorium für Heimdialyse und Nierentransplantation e.V. – Stiftung Präventivmedizin) and corporate sponsors (www.gckd.org). We are grateful for the willingness of the patients to participate in the GCKD study. The enormous effort of the study personnel of the various regional centers is highly appreciated. We thank the large number of nephrologists who provide routine care for the patients and who collaborate with the GCKD study. The GCKD Investigators are listed in the Supplementary Note.
SHIP is part of the Community Medicine Research net of the University of Greifswald, Germany, which is funded by the BMBF (grants 01ZZ9603, 01ZZ0103 and 01ZZ0403), the Ministry of Cultural Affairs as well as the Social Ministry of the Federal State of Mecklenburg-Western Pomerania, and the network ‘Greifswald Approach to Individualized Medicine (GANI_MED)’ funded by the Federal Ministry of Education and Research (grant 03IS2061A). The work of K.S. was supported by the Biomedical Research Program at Weill Cornell Medicine in Qatar, a program funded by the Qatar Foundation. We would like to thank Z. Zheng from GCTA for an update that enables selection of index SNPs based on the χ² statistic, and Elizaveta Freinkman for the extracted ion chromatogram and fragmentation spectra used to confirm the identity of X-13689 as α-CMBHC glucuronide.
Footnotes
Online content
Any methods, additional references, Nature Research reporting summaries, source data, statements of code and data availability and associated accession codes are available at https://doi.org/10.1038/s41588-019-0567-8.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Competing interests
R.P.M. is an employee of Metabolon and, as such, has affiliations with or financial involvement with Metabolon. All other authors have no competing interests.
Additional information
Supplementary information is available for this paper at https://doi.org/10.1038/s41588-019-0567-8.
Correspondence and requests for materials should be addressed to A.K.
Reprints and permissions information is available at www.nature.com/reprints.
References
- 1.Caldwell J, Gardner I & Swales N An introduction to drug disposition: the basic principles of absorption, distribution, metabolism, and excretion. Toxicol. Pathol 23, 102–114 (1995). [DOI] [PubMed] [Google Scholar]
- 2.Köttgen A, Raffler J, Sekula P & Kastenmuller G Genome-wide association studies of metabolite concentrations (mGWAS): Relevance for nephrology. Semin. Nephrol 38, 151–174 (2018). [DOI] [PubMed] [Google Scholar]
- 3.Homuth G, Teumer A, Volker U & Nauck M A description of large-scale metabolomics studies: increasing value by combining metabolomics with genome-wide SNP genotyping and transcriptional profiling. J. Endocrinol. 215, 17–28 (2012). [DOI] [PubMed] [Google Scholar]
- 4.Kalim S & Rhee EP An overview of renal metabolomics. Kidney Int. 91, 61–69 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Nigam SK et al. Handling of drugs, metabolites, and uremic toxins by kidney proximal tubule drug transporters. Clin. J. Am. Soc. Nephrol. 10, 2039–2049 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Suhre K et al. Human metabolic individuality in biomedical and pharmaceutical research. Nature 477, 54–60 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Shin SY et al. An atlas of genetic influences on human blood metabolites. Nat. Genet. 46, 543–550 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Long T et al. Whole-genome sequencing identifies common-to-rare variants associated with human blood metabolites. Nat. Genet. 49, 568–578 (2017). [DOI] [PubMed] [Google Scholar]
- 9.Suhre K, Raffler J & Kastenmuller G Biochemical insights from population studies with genetics and metabolomics. Arch. Biochem. Biophys. 589, 168–176 (2016). [DOI] [PubMed] [Google Scholar]
- 10.Gieger C et al. Genetics meets metabolomics: a genome-wide association study of metabolite profiles in human serum. PLoS Genet. 4, e1000282 (2008). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Willer CJ, Li Y & Abecasis GR METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Titze S et al. Disease burden and risk profile in referred patients with moderate chronic kidney disease: composition of the German Chronic Kidney Disease (GCKD) cohort. Nephrol. Dial. Transplant. 30, 441–451 (2015). [DOI] [PubMed] [Google Scholar]
- 13.Eckardt KU et al. The german chronic kidney disease (GCKD) study: design and methods. Nephrol. Dial. Transplant. 27, 1454–1460 (2012). [DOI] [PubMed] [Google Scholar]
- 14.Levey AS et al. A new equation to estimate glomerular filtration rate. Ann. Intern. Med. 150, 604–612 (2009). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Evans AM et al. High resolution mass spectrometry improves data quantity and quality as compared to unit mass resolution mass spectrometry in high-throughput profiling metabolomics. Metabolomics 4, 132 (2014). [Google Scholar]
- 16.Raffler J et al. Genome-wide association study with targeted and non-targeted NMR metabolomics identifies 15 novel loci of urinary human metabolic individuality. PLoS Genet. 11, e1005487 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Suhre K et al. A genome-wide association study of metabolic traits in human urine. Nat. Genet. 43, 565–569 (2011). [DOI] [PubMed] [Google Scholar]
- 18.Rueedi R et al. Genome-wide association study of metabolic traits reveals novel gene-metabolite-disease links. PLoS Genet. 10, e1004132 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Nicholson G et al. A genome-wide metabolic QTL analysis in Europeans implicates two loci shaped by recent positive selection. PLoS Genet. 7, e1002270 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Arnold M, Raffler J, Pfeufer A, Suhre K & Kastenmuller G SNiPA: an interactive, genetic variant-centered annotation browser. Bioinformatics 31, 1334–1336 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21. Draisma HHM et al. Genome-wide association study identifies novel genetic variants contributing to variation in blood metabolite levels. Nat. Commun. 6, 7208 (2015).
- 22. Pushkin A et al. Structural characterization, tissue distribution, and functional expression of murine aminoacylase III. Am. J. Physiol. Cell Physiol. 286, C848–C856 (2004).
- 23. Veiga-da-Cunha M et al. Molecular identification of NAT8 as the enzyme that acetylates cysteine S-conjugates to mercapturic acids. J. Biol. Chem. 285, 18888–18898 (2010).
- 24. Perland E, Bagchi S, Klaesson A & Fredriksson R Characteristics of 29 novel atypical solute carriers of major facilitator superfamily type: evolutionary conservation, predicted structure and neuronal co-expression. Open Biol. 7, 170142 (2017).
- 25. Ceder MM, Lekholm E, Hellsten SV, Perland E & Fredriksson R The neuronal and peripheral expressed membrane-bound UNC93a respond to nutrient availability in mice. Front. Mol. Neurosci. 10, 351 (2017).
- 26. Völzke H et al. Cohort profile: the study of health in Pomerania. Int. J. Epidemiol. 40, 294–307 (2011).
- 27. Ashburner M et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 25–29 (2000).
- 28. Kanehisa M & Goto S KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30 (2000).
- 29. Moe OW, Giebisch GH & Seldin DW in Genetic Diseases of the Kidney 1st edn (eds Lifton RP, Somlo S, Giebisch GH & Seldin DW) Ch. 3 (Academic Press, 2009).
- 30. Vanholder R et al. Review on uremic toxins: classification, concentration, and interindividual variability. Kidney Int. 63, 1934–1943 (2003).
- 31. Rhee EP & Thadhani R New insights into uremia-induced alterations in metabolic pathways. Curr. Opin. Nephrol. Hypertens. 20, 593–598 (2011).
- 32. GTEx Consortium et al. Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017).
- 33. Park J et al. Single-cell transcriptomics of the mouse kidney reveals potential cellular targets of kidney disease. Science 360, 758–763 (2018).
- 34. Wu H et al. Comparative analysis and refinement of human PSC-derived kidney organoid differentiation with single-cell transcriptomics. Cell Stem Cell 23, 869–881.e8 (2018).
- 35. Lahjouji K et al. Expression and functionality of the Na+/myo-inositol cotransporter SMIT2 in rabbit kidney. Biochim. Biophys. Acta. 1768, 1154–1159 (2007).
- 36. Yang J, Lee SH, Goddard ME & Visscher PM GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76–82 (2011).
- 37. Wakefield J Bayes factors for genome-wide association studies: comparison with P-values. Genet. Epidemiol. 33, 79–86 (2009).
- 38. Kim HI et al. Fine mapping and functional analysis reveal a role of SLC22A1 in acylcarnitine transport. Am. J. Hum. Genet. 101, 489–502 (2017).
- 39. Gillies CE et al. An eQTL landscape of kidney tissue in human nephrotic syndrome. Am. J. Hum. Genet. 103, 232–244 (2018).
- 40. Tazawa S et al. SLC5A9/SGLT4, a new Na+-dependent glucose transporter, is an essential transporter for mannose, 1,5-anhydro-D-glucitol, and fructose. Life Sci. 76, 1039–1050 (2005).
- 41. Li Y et al. Genome-wide association studies of metabolites in patients with CKD identify multiple loci and illuminate tubular transport mechanisms. J. Am. Soc. Nephrol. 29, 1513–1524 (2018).
- 42. Petersen AK et al. On the hypothesis-free testing of metabolite ratios in genome-wide and metabolome-wide association studies. BMC Bioinformatics 13, 120 (2012).
- 43. Schneider MP et al. Blood pressure control in chronic kidney disease: a cross-sectional analysis from the German Chronic Kidney Disease (GCKD) study. PLoS One 13, e0202604 (2018).
- 44. Schlosser P et al. Netboost: Boosting-supported network analysis improves high-dimensional omics prediction in acute myeloid leukemia and Huntington’s disease. Preprint at https://arxiv.org/abs/1909.12551 (2019).
- 45. Bycroft C et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
- 46. Giambartolomei C et al. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 10, e1004383 (2014).
- 47. Fedde KN & Whyte MP Alkaline phosphatase (tissue-nonspecific isoenzyme) is a phosphoethanolamine and pyridoxal-5’-phosphate ectophosphatase: normal and hypophosphatasia fibroblast study. Am. J. Hum. Genet. 47, 767–775 (1990).
- 48. Oddsson A et al. Common and rare variants associated with kidney stones and biochemical traits. Nat. Commun. 6, 7975 (2015).
- 49. Taylor RG, Levy HL & McInnes RR Histidase and histidinemia. Clinical and molecular considerations. Mol. Biol. Med. 8, 101–116 (1991).
- 50. Montoliu I et al. Current status on genome-metabolome-wide associations: an opportunity in nutrition research. Genes Nutr. 8, 19–27 (2013).
- 51. McMahon GM et al. Urinary metabolites along with common and rare genetic variations are associated with incident chronic kidney disease. Kidney Int. 91, 1426–1435 (2017).
- 52. Nigam SK What do drug transporters really do? Nat. Rev. Drug Discov. 14, 29–44 (2015).
- 53. Momper JD & Nigam SK Developmental regulation of kidney and liver solute carrier and ATP-binding cassette drug transporters and drug metabolizing enzymes: the role of remote organ communication. Expert Opin. Drug Metab. Toxicol. 14, 561–570 (2018).
- 54. Bröer A et al. Molecular cloning of mouse amino acid transport system B⁰, a neutral amino acid transporter related to Hartnup disorder. J. Biol. Chem. 279, 24467–24476 (2004).
- 55. Hu Y, Ding Q, He Y, Xu S & Jin L Reintroduction of a homocysteine level-associated allele into East Asians by Neanderthal introgression. Mol. Biol. Evol. 32, 3108–3113 (2015).
- 56. Eckardt KU et al. Evolving importance of kidney disease: from subspecialty to global health burden. Lancet 382, 158–169 (2013).
- 57. Secora A, Alexander GC, Ballew SH, Coresh J & Grams ME Kidney function, polypharmacy, and potentially inappropriate medication use in a community-based cohort of older adults. Drugs Aging 35, 735–750 (2018).
- 58. Levin A, Djurdjev O, Beaulieu M & Er L Variability and risk factors for kidney disease progression and death following attainment of stage 4 CKD in a referred cohort. Am. J. Kidney Dis. 52, 661–671 (2008).
- 59. Prokosch HU et al. Designing and implementing a biobanking IT framework for multiple research scenarios. Stud. Health Technol. Inform. 180, 559–563 (2012).
- 60. Das S et al. Next-generation genotype imputation service and methods. Nat. Genet. 48, 1284–1287 (2016).
- 61. Howie BN, Donnelly P & Marchini J A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet. 5, e1000529 (2009).
- 62. Dieterle F, Ross A, Schlotterbeck G & Senn H Probabilistic quotient normalization as robust method to account for dilution of complex biological mixtures. Application in ¹H NMR metabonomics. Anal. Chem. 78, 4281–4290 (2006).
- 63. Do KT et al. Characterization of missing values in untargeted MS-based metabolomics data and evaluation of missing data handling strategies. Metabolomics 14, 128 (2018).
- 64. Piontek U et al. Sex-specific metabolic profiles of androgens and its main binding protein SHBG in a middle aged population without diabetes. Sci. Rep. 7, 2235 (2017).
- 65. Knacke H et al. Metabolic fingerprints of circulating IGF-1 and the IGF-1/IGFBP-3 ratio: a multifluid metabolomics study. J. Clin. Endocrinol. Metab. 101, 4730–4742 (2016).
- 66. Chang CC et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015).
- 67. Fuchsberger C, Taliun D, Pramstaller PP, Pattaro C & CKDGen consortium. GWAtoolbox: an R package for fast quality control and handling of genome-wide association studies meta-analysis data. Bioinformatics 28, 444–445 (2012).
- 68. Pruim RJ et al. LocusZoom: regional visualization of genome-wide association scan results. Bioinformatics 26, 2336–2337 (2010).
- 69. Krzywinski M et al. Circos: an information aesthetic for comparative genomics. Genome Res. 19, 1639–1645 (2009).
- 70. Machiela MJ & Chanock SJ LDlink: a web-based application for exploring population-specific haplotype structure and linking correlated alleles of possible functional variants. Bioinformatics 31, 3555–3557 (2015).
- 71. Langfelder P & Horvath S WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9, 559 (2008).
- 72. Loewenstein Y, Portugaly E, Fromer M & Linial M Efficient algorithms for accurate hierarchical clustering of huge datasets: tackling the entire protein space. Bioinformatics 24, i41–i49 (2008).
- 73. Langfelder P, Zhang B & Horvath S Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut package for R. Bioinformatics 24, 719–720 (2008).
- 74. Finucane HK et al. Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. Nat. Genet. 50, 621–629 (2018).
- 75. Durinck S, Spellman PT, Birney E & Huber W Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nat. Protoc. 4, 1184–1191 (2009).
- 76. Canela-Xandri O, Rawlik K & Tenesa A An atlas of genetic associations in UK Biobank. Nat. Genet. 50, 1593–1599 (2018).
Data Availability Statement
Meta-analysis summary statistics for SNPs that are associated with any metabolite at P < 5 × 10⁻⁸ will be integrated into the SNiPA database with its v3.4 release. Genome-wide summary statistics will be made publicly available through the GWAS Catalog (https://www.ebi.ac.uk/gwas). Raw data for Extended Data Fig. 5 are available as source data with the supplementary materials.
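The selection rule for the SNiPA subset ("associated with any metabolite at P < 5 × 10⁻⁸") keeps a SNP if at least one of its per-metabolite meta-analysis P-values falls below the genome-wide significance threshold. The following is a minimal Python sketch of that rule, assuming a hypothetical input of (SNP, metabolite, P-value) tuples; the function name, the tuple layout and the example identifiers are illustrative only and do not describe the authors' actual pipeline or any released file format.

```python
# Minimal sketch (hypothetical data layout): one tuple per (SNP, metabolite)
# meta-analysis result, i.e. (snp_id, metabolite_name, p_value).
GENOME_WIDE_THRESHOLD = 5e-8


def snps_for_release(rows, threshold=GENOME_WIDE_THRESHOLD):
    """Return sorted SNP IDs with P < threshold for at least one metabolite."""
    selected = set()
    for snp, metabolite, p in rows:
        # A single significant metabolite association is enough to keep the SNP.
        if p < threshold:
            selected.add(snp)
    return sorted(selected)


# Illustrative, made-up values; not results from this study.
example = [
    ("rs1", "metabolite_A", 1e-10),
    ("rs1", "metabolite_B", 0.2),
    ("rs2", "metabolite_A", 6e-8),
    ("rs3", "metabolite_C", 4.9e-8),
]
print(snps_for_release(example))  # ['rs1', 'rs3']
```

Because the rule is a union over metabolites, rs1 is kept despite a non-significant second association, while rs2 is excluded because its only P-value lies just above the threshold.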