Author manuscript; available in PMC: 2025 Mar 1.
Published in final edited form as: Kidney Int. 2023 Nov 23;105(3):582–592. doi: 10.1016/j.kint.2023.11.007

Evaluation of novel candidate filtration markers from a global metabolomic discovery for glomerular filtration rate estimation

Nora Fino 1, Ogechi M Adingwupu 2, Josef Coresh 3, Tom Greene 1, Ben Haaland 1, Michael G Shlipak 4, Veronica T Costa e Silva 5,6, Roberto Kalil 7, Ayse L Mindikoglu 8, Susan L Furth 9, Jesse C Seegmiller 10, Andrew S Levey 2, Lesley A Inker 2
PMCID: PMC10932836  NIHMSID: NIHMS1961943  PMID: 38006943

Abstract

Creatinine and cystatin-C are recommended for estimating glomerular filtration rate (eGFR) but accuracy is suboptimal. Here, using untargeted metabolomics data, we sought to identify candidate filtration markers for a new targeted assay using a novel approach based on their maximal joint association with measured GFR (mGFR) and with flexibility to consider their biological properties. We analyzed metabolites measured in seven diverse studies encompassing 2,851 participants on the Metabolon H4 platform that had Pearson correlations with log mGFR < −0.5 and used a stepwise approach to develop models to estimate mGFR with and without inclusion of creatinine that enabled selection of candidate markers. In total, 456 identified metabolites were present in all studies, and 36 had correlations with mGFR < −0.5. A total of 2,225 models were developed that included these metabolites, all with lower root mean square errors and smaller coefficients for demographic variables compared to estimates using untargeted creatinine. Seventeen metabolites were chosen, including 12 new candidate filtration markers. The selected metabolites had strong associations with mGFR and little dependence on demographic factors. Candidate metabolites were identified with maximal joint association with mGFR and minimal dependence on demographic variables across many varied clinical settings. These metabolites are excreted in urine and represent diverse metabolic pathways and tubular handling. Thus, our data can be used to select metabolites for a multi-analyte eGFR determination assay using mass spectrometry that potentially offers better accuracy and is less prone to non-GFR determinants than the current eGFR biomarkers.

Keywords: glomerular filtration rate, metabolomics, filtration markers


LAY SUMMARY

Creatinine and cystatin C are recommended for estimating glomerular filtration rate (eGFR), but accuracy is limited by factors other than GFR that affect both markers, especially in patient populations with acute or chronic illnesses. We used untargeted metabolomics from seven different studies with a total of 2,851 participants to develop a total of 2,225 models. Each model had 2 to 15 metabolites. All models provided more accurate predictions of measured GFR than a creatinine-only model and were largely robust across demographic factors. From these models, we identified 17 candidate metabolites whose analytical properties will be evaluated by our laboratory. Incorporation of the final selected set of metabolites in a targeted multiplex assay, and ultimately development and validation of a panel eGFR, may provide more accurate GFR estimates across health and disease.

INTRODUCTION

Glomerular filtration rate (GFR), the key indicator of kidney function in health and disease, is most commonly estimated from creatinine combined with age and sex (eGFRcr) [1, 2]. Clinical practice guidelines recommend more accurate tests such as GFR measured using clearance of exogenous filtration markers (mGFR) or GFR estimated from the combination of creatinine with cystatin-C (eGFRcr-cys) when eGFRcr is thought to be insufficiently accurate, or when the clinical decision requires a more accurate assessment of GFR[2]. Measurement procedures for mGFR can be difficult to perform and are not accessible in many health systems, and even with the more accurate eGFRcr-cys, there is substantial variation between eGFR and mGFR for individual patients[3], with errors relative to mGFR greater than 30% in 10 to 20% of patients[4]. Notably, higher error rates are observed in patients with acute and chronic illness, where clinical decisions based on imprecise GFR can have important effects [4-8]. In addition, the inclusion of age and sex in current GFR equations is undesirable as these coefficients reflect the average effect of age and sex in the populations used to develop the equations, contributing to the individual-level errors.

Errors in eGFR are predominantly due to the presence of non-GFR determinants of the serum concentration of endogenous filtration markers, for example muscle mass for creatinine[1, 9]. In principle, a collection of multiple filtration markers with non-GFR determinants that are largely independent of one another could be used to develop a panel eGFR equation that lowers the importance of any individual marker, consequently reducing the influence of non-GFR determinants on the GFR estimate, improving accuracy for individual patients, and eliminating the need for demographic factors.

Our prior explorations into multi-marker panels have not resulted in eGFR estimates that were substantially more accurate than current eGFR [10-12]. We hypothesize that a greater number of metabolites identified from diverse datasets is required for greater accuracy in GFR estimation, especially to be robust to states of health and disease. Untargeted metabolomics assays, in which a broad population of metabolites is assessed and compared across a cohort of subjects using relative quantification, can be used to identify a set of candidate filtration markers[13]. Selected metabolites can then be used to develop targeted assays that provide absolute quantitative results for widespread clinical applications (Supplementary Figure S1).

In this paper, our goal was to select candidate markers using untargeted metabolomics data from a joint analysis of seven diverse studies that measured metabolites on the same platform. We hypothesized that many combinations of metabolites could form the basis of alternative prediction models with similarly high accuracy. We present a systematic process that allows flexibility in choosing among a possible set of metabolites based on relevant biological and chemical characteristics.

METHODS

Data Sources

We included the following seven research studies with existing metabolomics data and available mGFR: African American Study of Kidney Disease and Hypertension (AASK[14]), Assessing Long Term Outcomes in Living Kidney Donors (ALTOLD[15]), Onco-GFR Study (Onco-GFR[16]), Chronic Kidney Disease in Children (CKiD[17]), University of Maryland Baltimore (UMB) Cirrhosis Cohort [18, 19], Multi-Ethnic Study of Atherosclerosis (MESA[20]), and Modification of Diet in Renal Disease (MDRD [21]) (Supplementary Table S1). GFR was measured using urinary clearance of iothalamate, or plasma clearance of iothalamate, iohexol, or Cr-EDTA. We increased GFR measured using plasma clearance of iohexol by 5% to calibrate to the other methods, as has been done previously [4, 22, 23]. All studies were approved by participating institutions’ institutional review boards.

Metabolomics

Global, untargeted metabolomics assays were performed at Metabolon on the H4 platform (Durham, NC, USA) (Supplementary Table S1). We excluded all exogenous metabolites as well as metabolites and samples with high levels of missingness (>80% and >50% missing, respectively). Remaining missing values were imputed with the lower limit of detection. We assessed the appropriateness of this imputation in metabolites with >1% missing by predicting the missing values with a process outlined in the Supplementary Methods. We rescaled all data by the batch-specific median and evaluated within- and between-study batch effects using principal components analysis [24].
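The preprocessing steps described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names and the data layout (a dict of metabolite name to list of values, with `None` marking a missing value) are hypothetical, the smallest observed value is used as a stand-in for the lower limit of detection, and the order in which imputation and rescaling are applied is an assumption.

```python
from statistics import median

def drop_sparse_metabolites(data, max_missing=0.80):
    """Keep metabolites whose fraction of missing values (None) is at most max_missing."""
    return {name: vals for name, vals in data.items()
            if sum(v is None for v in vals) / len(vals) <= max_missing}

def impute_with_minimum(values):
    """Replace None with the smallest observed value (assumed proxy for the detection limit)."""
    floor = min(v for v in values if v is not None)
    return [floor if v is None else v for v in values]

def rescale_by_batch_median(values, batches):
    """Divide each value by the median of the batch it belongs to."""
    med = {b: median(v for v, vb in zip(values, batches) if vb == b)
           for b in set(batches)}
    return [v / med[b] for v, b in zip(values, batches)]
```

The same missingness filter could be applied to samples with the 50% threshold by transposing the data; that step is omitted here for brevity.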

Statistical Analysis

Our goal for this analysis was to rigorously select a set of candidate metabolites for exploration by our central laboratory for potential incorporation into a targeted MS multiplex assay (Supplementary Figure S1). To allow for the possibility that markers in combination outperformed individual markers with stronger associations with mGFR, we did not restrict this analysis to examination of each metabolite in isolation. In addition, because no single model developed here is likely to be the ultimate panel eGFR, we did not focus solely on the best-performing model (the statistically optimal model) for a given number of metabolites, but rather on a collection of models that perform almost as well as the statistically optimal model (near-optimal models). We expected many collections of metabolites would be highly associated with mGFR, and this flexibility allowed the ultimate selection of metabolites to be guided by assay characteristics, biological characteristics, and predictive accuracy.

We limited our analysis to named metabolites measured in all seven studies. mGFR was indexed by body surface area and is expressed as ml/min/1.73m2. We log-transformed mGFR and each metabolite. Metabolites with an average Pearson correlation with log mGFR across studies <−0.5 were included as candidates, as strong inverse associations are expected for a filtration marker. In sensitivity analysis, we also evaluated metabolites with an average Pearson correlation <−0.3. Separate linear regression models for mGFR were fit for each individual metabolite, along with age, sex, race (Black vs. other), and body mass index (BMI). To account for the lack of cross-study calibration of untargeted assays, which can have arbitrary shifts in location and scale of metabolite distributions, we included terms for study and metabolite-by-study interactions in all modeling.
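The screening rule above can be sketched as follows. This is a minimal illustration that assumes "average Pearson correlation across studies" means the within-study correlations averaged with equal weight per study; the function names and the list-based data layout are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mean_study_correlation(metabolite, mgfr, study):
    """Average over studies of corr(log metabolite, log mGFR) within each study."""
    rs = []
    for s in sorted(set(study)):
        idx = [i for i, t in enumerate(study) if t == s]
        rs.append(pearson([math.log(metabolite[i]) for i in idx],
                          [math.log(mgfr[i]) for i in idx]))
    return sum(rs) / len(rs)

def is_candidate(metabolite, mgfr, study, threshold=-0.5):
    """True if the averaged correlation is below the (negative) threshold."""
    return mean_study_correlation(metabolite, mgfr, study) < threshold
```

Passing `threshold=-0.3` would correspond to the sensitivity analysis.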

We developed a novel stepwise approach (Supplementary Figure S2) using linear regression to generate subsets of metabolites for estimating mGFR, detailed in the Supplementary Methods. Prior simulations suggested that a more accurate eGFR would require 8-10 metabolites; we performed selection up to 15 metabolites to confirm that additional metabolites did not improve predictive error. Briefly, we began by fitting all models for mGFR using a single metabolite and identifying all models with root mean square error (RMSE) within 0.005 of the best-fitting single metabolite model. Then, as we incrementally increased the number of metabolites included in the model from 2 to 15, we found the best new metabolite to add to the best-fitting model from the previous model size. Next, we replaced each metabolite in the present model with an alternative metabolite and determined whether the resulting model RMSE was within a small margin (0.005) of the current best-fitting model for a given model size. Following this approach, we found a collection of near-optimal models across a spectrum of subset sizes. We developed models with and without the inclusion of untargeted creatinine as a candidate predictor. Sets of metabolites that do not include creatinine would be more appropriate as confirmatory tests for eGFRcr and may be optimal in people at the extremes of muscle mass or dietary meat intake, where eGFRcr accuracy is poor.
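A simplified sketch of this forward-plus-swap search is given below. It is an illustration under stated assumptions rather than the procedure in the Supplementary Methods: study terms and metabolite-by-study interactions are omitted, RMSE is computed with a denominator of n, and the names `ols_rmse` and `near_optimal_models` are hypothetical.

```python
import math

def ols_rmse(columns, y):
    """RMSE of a least-squares fit of y on an intercept plus the given columns."""
    n = len(y)
    X = [[1.0] + [c[i] for c in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations (X'X | X'y), solved by Gaussian elimination.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    for k in range(p):
        piv = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[piv] = A[piv], A[k]
        for r in range(k + 1, p):
            f = A[r][k] / A[k][k]
            for c in range(k, p + 1):
                A[r][c] -= f * A[k][c]
    b = [0.0] * p
    for k in reversed(range(p)):
        b[k] = (A[k][p] - sum(A[k][c] * b[c] for c in range(k + 1, p))) / A[k][k]
    resid = [y[i] - sum(X[i][j] * b[j] for j in range(p)) for i in range(n)]
    return math.sqrt(sum(e * e for e in resid) / n)

def near_optimal_models(features, y, max_size, margin=0.005):
    """Map each model size to the set of models within `margin` of the best RMSE found.

    features: dict of metabolite name -> list of (log) values; max_size must not
    exceed the number of metabolites.
    """
    def rmse(names):
        return ols_rmse([features[m] for m in sorted(names)], y)

    results, best = {}, frozenset()
    for size in range(1, max_size + 1):
        # Forward step: add the best single new metabolite to the previous best model.
        best = min([best | {m} for m in features if m not in best], key=rmse)
        # Swap step: replace each member in turn with each metabolite outside the model.
        pool = {(best - {old}) | {new}
                for old in best for new in features if new not in best} | {best}
        scores = {model: rmse(model) for model in pool}
        best = min(scores, key=scores.get)
        results[size] = {m for m, s in scores.items() if s <= scores[best] + margin}
    return results
```

At size 1 the swap step visits every single-metabolite model, which matches the description of fitting all single-metabolite models first. Excluding creatinine amounts to leaving it out of `features`.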

We assessed the impact of including demographic and clinical factors (age, sex, race, and body mass index), alone and in combination. We compared all developed models to models fit with demographics and untargeted creatinine as the sole metabolite (hereafter referred to as the reference model). We used untargeted creatinine as a reference because we believe it represents a meaningful comparison to untargeted analyses of other metabolites. To examine the degree of over-optimism in our models, we compared the fitted model RMSEs to cross-validated RMSEs. We also evaluated the performance of the models separately by study.
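The comparison of fitted and cross-validated RMSEs can be illustrated with a short sketch. The scheme of 10 random repetitions of 5-fold cross-validation follows the Figure 2 legend; everything else is an assumption made for brevity, including the single-predictor model, the pooling of squared errors across folds within a repetition, and the function names.

```python
import math
import random

def fit_line(x, y):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

def repeated_kfold_rmse(x, y, k=5, repeats=10, seed=0):
    """Average out-of-fold RMSE over `repeats` random k-fold splits."""
    rng = random.Random(seed)
    n = len(x)
    rmses = []
    for _ in range(repeats):
        order = list(range(n))
        rng.shuffle(order)
        sq_err = 0.0
        for fold in range(k):
            test = set(order[fold::k])            # every k-th shuffled index
            train = [i for i in order if i not in test]
            a, b = fit_line([x[i] for i in train], [y[i] for i in train])
            sq_err += sum((y[i] - (a + b * x[i])) ** 2 for i in test)
        rmses.append(math.sqrt(sq_err / n))
    return sum(rmses) / repeats
```

A cross-validated RMSE that is clearly larger than the fitted RMSE for the same model suggests over-optimism, which is the comparison the text describes.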

Our final step was to identify the set of metabolites for evaluation by our laboratory for consideration in development of the multiplex assay. Models that did not show evidence of substantial overfitting in cross-validation were retained for further consideration. We removed from consideration models that had at least one demographic coefficient (age, sex, race, or BMI) with an absolute value >0.1. After applying the above criteria to the sets of models with and without creatinine, we selected metabolites present in at least 20% of all models for consideration of assay development. To better understand the metabolites, we also evaluated the partial correlation of each metabolite with untargeted creatinine after adjusting for mGFR and the association of demographic factors with mGFR after adjusting for each metabolite.
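The filtering and frequency-based selection can be sketched as below. The record layout (a list of dicts with `metabolites` and `demographic_coefs` keys) and the function name are hypothetical, and the threshold is applied as "at least 20%" following the sentence above.

```python
from collections import Counter

def select_metabolites(models, max_abs_coef=0.1, min_share=0.20):
    """Return {metabolite: share of retained models} for metabolites meeting min_share.

    A model is retained only if every demographic coefficient has an absolute
    value of at most max_abs_coef.
    """
    kept = [m for m in models
            if all(abs(c) <= max_abs_coef for c in m["demographic_coefs"].values())]
    counts = Counter(name for m in kept for name in m["metabolites"])
    share = {name: n / len(kept) for name, n in counts.items()}
    return {name: s for name, s in share.items() if s >= min_share}
```

In the analysis this step would be run separately on the model sets with and without creatinine, after dropping models that showed substantial overfitting in cross-validation.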

RESULTS

Study Populations

Across studies, there were 2,851 participants with a wide range of mGFR (6.1 to 201.8 ml/min/1.73m2), age (2.0 to 91.0 years), BMI (13.1 to 56.3 kg/m2) and proportion of Black individuals (2.3% to 100%) (Table 1). Importantly, we included three populations not well represented in studies developing GFR equations: children with CKD, patients with cirrhosis, and patients with cancer.

Table 1:

Summaries of Patient Characteristics in Included Studies

| Study | Study description | N | Age (years) | mGFR (ml/min/1.73m2) | Black | Female | BMI (kg/m2) |
|---|---|---|---|---|---|---|---|
| AASK | RCT of CKD progression | 962 | 54.5 (10.6) | 45.8 (13.0) | 962 (100.0) | 373 (38.8) | 30.6 (6.6) |
| ALTOLD | Cohort study of kidney donor candidates | 131 | 43.7 (11.3) | 101.1 (15.3) | 3 (2.3) | 83 (63.4) | 27.0 (4.3) |
| Onco-GFR | Cohort study of patients with solid tumors in Brazil | 100 | 57.2 (13.3) | 80.1 (21.1) | 14 (14.0) | 51 (51.0) | 28.3 (6.1) |
| CKiD | Cohort study of children with CKD | 613 | 12.1 (4.3) | 54.7 (25.5) | 88 (14.4) | 236 (38.5) | 20.6 (5.7) |
| Cirrhosis | Cohort study of patients with cirrhosis | 103 | 54.5 (8.9) | 80.3 (35.3) | 26 (25.2) | 45 (43.7) | 29.5 (6.8) |
| MDRD | RCT of CKD progression | 677 | 52.9 (12.0) | 28.9 (13.1) | 51 (7.5) | 256 (37.8) | 26.8 (4.3) |
| MESA | General population of older adults | 265 | 70.9 (8.7) | 76.3 (20.2) | 122 (46.0) | 124 (46.8) | 29.8 (5.5) |

Data are presented as mean (standard deviation) or as N (percent).

Data for ALTOLD represent assessments before donation.

Abbreviations: RCT, randomized controlled trial; BMI, body mass index; CKD, chronic kidney disease; mGFR, measured glomerular filtration rate; AASK, African American Study of Kidney Disease and Hypertension; ALTOLD, Assessing Long Term Outcomes in Living Kidney Donors; Onco-GFR, oncology GFR; CKiD, Chronic Kidney Disease in Children; MESA, Multi-Ethnic Study of Atherosclerosis; MDRD, Modification of Diet in Renal Disease.

A summary of overlapping metabolites by study is given in Supplementary Table S2. There were 456 named metabolites common to all seven studies, of which 36 had a pairwise correlation <−0.5 with mGFR (Figure 1). Missing data for the 36 metabolites are summarized in Supplementary Table S3. No evidence of substantial batch effects by study was found using principal components analysis (Supplementary Figure S3). For metabolites with >1% missingness, we found that predicted missing values were generally small in magnitude and similar to predicted values below the fifth percentile of values in a given metabolite. An example of these results using guanidinosuccinate, which was 12.5% missing, can be found in Supplementary Figure S4; results for other metabolites with missing data were similar (data not shown). Following these results, we concluded that it was reasonable to impute remaining values with the lower limit of detection. The correlations for each of the 36 metabolites with mGFR, as well as demographic coefficients for all models fit with each individual metabolite, are shown in Supplementary Table S4.

Figure 1: Flow diagram of metabolite and model selection process.


Near Optimal Models

Figure 2 shows the RMSE for all near-optimal models by model size. All developed models had lower RMSE compared to the reference model (RMSE 0.258). When creatinine was included as a candidate marker, we found 1,204 models using 36 metabolites. As the number of metabolites included in the model increased, the RMSEs generally decreased: the mean RMSE for two-metabolite models was 0.210 (range 0.207 – 0.211), compared to 0.187 for 15-metabolite models (range 0.183 – 0.188). When creatinine was not included, we found 1,021 near-optimal models using 35 metabolites. In general, models without creatinine had higher RMSE than those containing creatinine (mean RMSE 0.193 [range 0.185 – 0.212] vs. mean RMSE 0.191 [range 0.183 – 0.211], respectively), though they became more similar as the number of metabolites increased. Cross-validated RMSEs were comparable to original RMSEs for models with ≤ 10 metabolites, whereas cross-validated RMSEs were generally higher for models with >10 metabolites (shaded band in Figure 2).
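Because mGFR was log-transformed, an RMSE on that scale can be read as an approximate relative error. The sketch below is an interpretive aid only, not part of the reported analysis. It assumes natural logarithms (the base is not stated above), and the helper name `log_rmse_to_relative_error` is hypothetical.

```python
import math

def log_rmse_to_relative_error(rmse_log):
    """Typical multiplicative error implied by an RMSE on the natural-log scale."""
    return math.exp(rmse_log) - 1.0

reference = log_rmse_to_relative_error(0.258)  # reference model
panel_15 = log_rmse_to_relative_error(0.187)   # mean of 15-metabolite models with creatinine
print(f"reference: {reference:.1%}, 15-metabolite models: {panel_15:.1%}")
```

Under that assumption, the reference RMSE corresponds to a typical relative error of roughly 29%, and the 15-metabolite mean RMSE to roughly 21%.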

Figure 2. Comparison of near-optimal models when untargeted creatinine is included as a candidate (Creatinine Included) versus when untargeted creatinine is excluded (Creatinine Excluded).


The colored bands show the range of cross-validated (CV) root mean square errors. Points are non-CV model errors. Cross-validated errors were averaged across 10 random iterations of 5-fold cross-validation. Models are fit with study terms but not with demographic variables.

RMSE: Root Mean Square Error

Impact of demographic factors

Across all models, the magnitudes of the coefficients for age, sex, race and BMI were smaller than in the reference equation (Figure 3). Coefficients from models that did not include creatinine were generally smaller in magnitude than those from models in which creatinine was a candidate predictor. The average coefficients for age, sex, race, and BMI were −0.07, −0.08, −0.05 and 0.02 in models with creatinine, and −0.05, −0.05, −0.06 and 0.02 in models without creatinine. Including more metabolites did not appreciably reduce the coefficients' size in either set of models (Supplementary Table S5).

Figure 3. Multivariable coefficients of the demographic terms for near-optimal models of size 4, 6, 8, and 10, both with and without consideration of creatinine.


Age and BMI coefficients are expressed per 2 standard deviations (SD). Black star points represent the demographic coefficients for the reference model with creatinine only. When selecting models for further consideration in a targeted assay, we excluded models with any demographic coefficient > 0.1 in absolute value. Coefficients from other model sizes not shown were similar.

When demographic terms and BMI were included in models individually, the inclusion of sex resulted in the largest decrease in RMSE in the creatinine model set (Figure 4, top). Models without creatinine were somewhat more robust to the addition of demographic variables (Figure 4, bottom). Across all model sizes in the creatinine model set, the average RMSEs when age, BMI, and race were included individually were 0.190, 0.191, and 0.190, respectively, while the average RMSE when sex was included was 0.188. In the model set without creatinine, the average RMSEs when age, BMI, race, and sex were included were 0.193, 0.193, 0.193, and 0.192, respectively.

Figure 4: Comparison of near-optimal models after inclusion of age, BMI, race, and sex individually.


Top: models with creatinine. Bottom: models without creatinine.

Comparison of near-optimal models fit with and without creatinine after individual inclusion of age, BMI, race, and sex terms.

‘Metabolites Only’ denotes the near-optimal models with no demographic terms included. We added each demographic term individually to assess changes in RMSE after each single demographic term is included. Each near-optimal model is connected by semi-transparent lines.

In the creatinine model set, incorporating age and sex together decreased RMSEs compared to the corresponding models without any demographics (average RMSE across all model sizes 0.191 compared to 0.187 when age and sex are included). In contrast, the incorporation of BMI and race did not substantially decrease RMSEs beyond the age- and sex-adjusted models (average RMSE 0.186, Supplementary Figure S5, top). In the model set without creatinine, the average RMSE across all model sizes was 0.193. The inclusion of demographics decreased the average RMSE only marginally: to three decimal places, it remained 0.193 both when age and sex were included together and when all demographics were included (Supplementary Figure S5, bottom).

Variation by study

Model performance varied by study: models performed the best in ALTOLD and MESA and performed the worst in CKiD and the Cirrhosis cohort (Figure 5). Notably, near-optimal models chosen from the entire dataset performed similarly to the models optimized to a given study and outperformed models fit with creatinine as the sole metabolite, especially in CKiD, MESA, and ALTOLD. Results were similar in models without creatinine.

Figure 5: RMSEs of near-optimal models including all studies and models fit separately by study for model sizes of 6, 8, and 10 metabolites.


Top: models could include creatinine. Bottom: models exclude creatinine. Green triangles show the RMSE from the reference model containing creatinine, age, and sex. Purple squares represent the best study specific model for a given subset size. The axis for RMSE is shown on the log scale. RMSE: Root Mean Square Error

Selection of candidate metabolites

We selected individual metabolites for further investigation for assay development based on the performance of the cross-validation and multivariable models (Figure 1). We selected the 17 metabolites present in >20% of models in either set to form our list of top-performing metabolites (Table 2 and Supplementary Figure S6). When creatinine was a candidate predictor, it was chosen in 81.5% of models. Two of the most common metabolites in models with and without creatinine were C-glycosyltryptophan (selected in 90.6% and 80.6% of models with and without creatinine, respectively) and gulonic acid (selected in 64.4% and 45.2% of models). Of the 17, five had a partial correlation with creatinine greater than 0.2, and four had at least one coefficient for a demographic factor greater than 0.1 in absolute value. Selection percentages for all 36 metabolites are given in Supplementary Table S4. Scatterplots with study-wise LOESS curves for the 17 selected metabolites and mGFR are given in Supplementary Figure S7.

Table 2:

Summary of the selected metabolites for consideration for assay development

| Metabolite | Pathway | Models selected (percent), Cr included | Models selected (percent), Cr excluded | Correlation with mGFR | Correlation with Cr, after adjusting for mGFR | Coefficient: Age | Coefficient: Female | Coefficient: Black | Coefficient: BMI |
|---|---|---|---|---|---|---|---|---|---|
| C-glycosyltryptophan* | Amino Acid | 90.60% | 80.60% | −0.76 | 0.08 | −0.03 | −0.03 | −0.08 | 0.04 |
| Creatinine | Amino Acid | 81.50% | - | −0.64 | - | −0.13 | −0.19 | 0.09 | 0.08 |
| Gulonic acid* | Cofactors and Vitamins | 64.40% | 45.20% | −0.6 | 0.07 | −0.07 | −0.05 | 0.02 | 0.03 |
| 3-hydroxy-3-methylglutarate | Lipid | 63.50% | 2.90% | −0.58 | 0.05 | −0.04 | −0.03 | −0.01 | 0.02 |
| Myo-inositol* | Lipid | 61.20% | 39.60% | −0.57 | 0.1 | 0.01 | −0.06 | 0.03 | 0.02 |
| 4-acetamidobutanoate* | Amino Acid | 50.00% | 63.10% | −0.68 | 0.22 | −0.05 | −0.05 | −0.02 | 0.01 |
| Erythronate* | Carbohydrate | 31.30% | 32.90% | −0.7 | 0.19 | −0.07 | −0.07 | −0.04 | 0.00 |
| 3-methylglutarylcarnitine (2)* | Amino Acid | 29.40% | 39.90% | −0.53 | 0.12 | −0.07 | −0.01 | −0.12 | 0.06 |
| Pseudouridine* | Nucleotide | 25.50% | 42.10% | −0.77 | 0.28 | −0.06 | −0.07 | −0.05 | 0.03 |
| 3-methylglutaconate | Amino Acid | 25.10% | 6.10% | −0.53 | 0.14 | −0.11 | −0.01 | −0.06 | 0.00 |
| Adipoylcarnitine | Lipid | 23.60% | 12.80% | −0.52 | 0.17 | −0.07 | −0.08 | −0.07 | 0.08 |
| N-acetylneuraminate* | Carbohydrate | 23.20% | 30.30% | −0.7 | 0.14 | −0.1 | −0.03 | 0.02 | −0.02 |
| N6-carbamoylthreonyladenosine* | Nucleotide | 22.20% | 33.80% | −0.74 | 0.36 | −0.1 | −0.09 | −0.02 | 0.06 |
| Arabitol/xylitol | Carbohydrate | 11.80% | 34.00% | −0.62 | 0.12 | −0.06 | −0.08 | −0.06 | 0.00 |
| Vanillylmandelate (VMA) | Amino Acid | 11.70% | 30.10% | −0.61 | 0.07 | 0.00 | −0.03 | −0.06 | −0.01 |
| 1-methylimidazoleacetate | Amino Acid | 9.40% | 21.90% | −0.65 | 0.22 | −0.1 | −0.07 | −0.07 | −0.01 |
| N-acetyl-1-methylhistidine | Amino Acid | 6.60% | 42.70% | −0.54 | 0.24 | −0.04 | −0.06 | 0.12 | 0.07 |

Abbreviations: Cr, untargeted creatinine; mGFR, measured glomerular filtration rate; BMI, Body Mass Index.

Model selection percentages are the fraction of high-performing models in which each metabolite was chosen, shown separately for the model set in which creatinine was included as a candidate predictor and the model set in which it was not.

* Metabolite was selected in both scenarios, with and without consideration of creatinine.

Metabolites without an asterisk were selected in only one scenario: either only when creatinine was considered or only when creatinine was not considered.

Correlations shown are the Pearson correlations of the metabolites with mGFR and the partial correlations of the metabolites with untargeted creatinine, after adjusting for mGFR. Demographic coefficients are from models fit with a single given metabolite. Models also include study effects. Age and BMI coefficients represent a two standard deviation increase. Note that these single-metabolite coefficients were not used to select metabolites in the final criteria, which used the coefficients from multivariable models. Coefficients in gray are greater than 0.1 in absolute value.
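For readers unfamiliar with the partial correlations reported in Table 2, a minimal sketch follows. It uses the standard first-order partial correlation formula computed from pairwise Pearson correlations, which may differ from the authors' implementation (for example, it ignores study effects); the function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_correlation(x, y, z):
    """Correlation between x and y after linearly adjusting both for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
```

In the setting of Table 2, `x` would be a log metabolite level, `y` log untargeted creatinine, and `z` log mGFR. A value near zero suggests that the metabolite and creatinine share little variation beyond what mGFR explains.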

In sensitivity analysis using a different threshold for initial metabolite inclusion (correlation < −0.3), model RMSEs were similar (Supplementary Figure S8), and we selected 12 of the same metabolites (Supplementary Table S6). Two of the additional four metabolites that might have been included had correlation > −0.5 and were therefore eliminated due to other criteria (Figure 1) and two had correlation < −0.3.

DISCUSSION

This paper presented analyses of a global untargeted metabolomics platform for selection of candidate novel filtration markers. Our results will be used to develop targeted assays on a subset of markers that will ultimately be used to develop a panel estimated GFR that ideally will provide greater accuracy for individual patients. We found many subsets of metabolites with excellent predictions of mGFR, and models outperformed untargeted creatinine-only-based estimates and were largely robust across demographics. Improved accuracy was observed with more metabolites, but cross-validation results suggested diminishing benefits to adding more metabolites beyond ten. Inclusion of sex did meaningfully improve the accuracy of our models, but age, race, and body size did not. By selecting metabolites common to these high-performing models, we created a list of metabolites with maximal joint association with mGFR and with flexibility in choosing the final metabolites for the panel eGFR based on biological and assay characteristics.

We hypothesized that to improve upon existing eGFR, we would first need to identify a large set of candidate markers. We identified 17 metabolites that are candidates for possible inclusion in a panel eGFR. To our knowledge, as many as 12 metabolites may not have been specifically identified as filtration markers in prior work (Supplementary Table S7). We had previously identified three of the 17 metabolites (myo-inositol, pseudouridine, and arabitol/xylitol), although only pseudouridine had been included in our prior multi-marker panel [10, 25]. In addition, C-glycosyltryptophan (also known as C-mannosyltryptophan) and erythronate had been identified by other investigators [26-28]. All 17 have promise to be excellent filtration markers. Seven of the 17 metabolites had inverse correlations with mGFR of greater magnitude than that of creatinine. In addition to their low molecular weights, all are eliminated in the urine, consistent with metabolite filtration markers. The 17 represent diverse metabolic pathways and tubular handling, which may allow selection of metabolites whose non-GFR determinants do not overlap (Supplementary Table S7).

C-glycosyltryptophan was the most frequently selected metabolite in both model sets. It has a stronger correlation with mGFR than does creatinine (−0.76 vs −0.64) and results from C-glycosylation, a post-translational modification, of tryptophan [29]. It is not bound to plasma proteins and correlates strongly with measured GFR. However, one study showed it did undergo tubular reabsorption.[30] After creatinine, gulonic acid was the metabolite selected in the next highest proportion of models. Gulonic acid appears to be bound to plasma proteins. It is a product of ascorbate and aldarate metabolism, involved in the conversion of d-gluconic acid to d-xylose as part of the pentose phosphate pathway, and in non-humans to ascorbic acid.[31, 32] Further testing in diverse experimental and clinical settings is necessary to fully characterize the included metabolites and assess their potential non-GFR determinants.

We hypothesized that the use of multiple markers would obviate the need for demographics and would be more robust to body size and composition. Consistent with this hypothesis, the addition of age, race, and BMI did not meaningfully improve the RMSE of the models that included these variables compared to models that did not include these variables. While race has been excluded from creatinine-based GFR equations, its removal introduced a small but persistent bias observed in several populations [4, 33, 34]. Thus, our demonstration that inclusion of race did not improve performance is an important finding. In contrast, we found that including sex improved RMSE across all models. Recently there have been increased questions about using eGFR equations that include sex, given the increase in gender diversity and the use of gender vs. biological sex in clinical care [35-37], and new equations are available that estimate GFR from cystatin-C that do not include sex[34]. Our results suggest two important points. First, sets of metabolites that did not include creatinine had smaller female vs. male coefficients than sets with creatinine, suggesting that should the ultimately developed panel eGFR not include creatinine, it might be able to exclude sex, whereas a panel eGFR that includes creatinine may need to retain sex. Second, the persistence of the sex coefficient might suggest that biological sex does impact the levels of these markers, and the impact is not related to body size, unlike what has been hypothesized for creatinine. The cause of the sex differences here is not well understood and requires future investigation. Future studies can explore the impact of hormonal and non-hormonal causes on the observed differences, which could have wider implications such as in the care of individuals in the transgender community or the impact of menopause and hormone replacement therapy on health.

Importantly, despite substantial variation in relative metabolite quantities by study, the models developed in the pooled dataset performed similarly to models optimized for individual studies. This suggests that models developed in the pooled dataset are similar to models developed within each study, highlighting the generalizability of our models. In all studies, our models showed better performance compared to GFR estimates based on creatinine alone. Despite this consistency, we also noted that the accuracy of the models varied by study. Improvements were particularly notable in CKiD, ALTOLD, and MESA, suggesting that the greatest benefits of a panel eGFR could be seen in children or individuals with relatively healthy kidneys. Future work should investigate causes of these differences.

Our approach had several strengths. First, we identified novel candidate filtration markers using untargeted metabolomics and their relation to mGFR, many of which were not previously noted to be filtration markers [28]. Second, we combined data from 2,851 participants in seven data sources that were diverse in age, disease, and geography. This diversity allowed us to analyze a wide range of mGFR, from 6 to 202 ml/min/1.73m2, and across health and disease, increasing the likelihood that a novel panel eGFR will be accurate across clinical settings. Third, our approach provided flexibility in selecting metabolites, considering not only predictive accuracy but also biological and assay characteristics. A central challenge was how to estimate the degree of overfitting and optimism when fitting many prediction models. We developed an envelope of cross-validated RMSEs that represented the range of the generalizable predictive accuracy for each model size.

Limitations of this analysis include that the observed results for individual metabolites assayed in the global untargeted platform might not be consistent with targeted assays. However, improved precision is expected with targeted assays [10]. The cross-sectional study design also limits our conclusions. Repeated assessments of the metabolites would allow us to account for possible measurement error in our analysis, as well as investigate changes in the metabolites over time. Additionally, we limited our analysis to metabolites measured in all seven studies; it is possible that important metabolites are omitted from our results. Across all studies, there were also 68 unknown metabolites (7.4% of the unknown metabolites) whose correlation with mGFR exceeded our threshold of 0.5 in absolute value. Future work could include investigating these metabolites in greater detail. Nevertheless, our models were able to predict GFR consistently across numerous collections of metabolites, so it is unlikely that a single omitted metabolite would meaningfully improve accuracy. We were limited to using dichotomous race (Black vs. other) and did not include a more refined assessment of race, ethnicity and geographical groups. However, previous work showed that using a four-level race term did not improve prediction beyond that of a dichotomous race term, though this finding may be limited to data from the United States and Europe [38]. Additionally, the inclusion of patients with cancer from South America and the consistency of results within this cohort might suggest our results are more robust to geographic region and disease. Lastly, we did not include cystatin-C in our analysis, which limits our ability to compare to eGFRcr-cys. Our goal was to identify a set of metabolites that could be included in a targeted MS multiplex assay; cystatin-C, as a low molecular weight protein, would not be included in such an assay, as it would only include metabolites that could be assayed together. Future work will incorporate cystatin-C using standard clinical chemistry assays in conjunction with the multiplex metabolite assay. Regardless of the inclusion of cystatin-C in the panel eGFR algorithm, the most relevant future comparison of panel eGFR will be to eGFRcr-cys, currently the most accurate eGFR.
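The correlation threshold referred to in the limitations above can be illustrated with a short sketch of a screening step that keeps metabolites whose Pearson correlation with log mGFR is below −0.5. The function name `screen_by_correlation`, the synthetic data, and the assumption that metabolite levels are analyzed on the log scale are illustrative choices, not the study code.

```python
import numpy as np

def screen_by_correlation(log_metabolites, log_mgfr, threshold=-0.5):
    """Return the column indices whose Pearson correlation with log mGFR
    is below `threshold`, together with all correlations."""
    xc = log_metabolites - log_metabolites.mean(axis=0)
    yc = log_mgfr - log_mgfr.mean()
    r = (xc * yc[:, None]).sum(axis=0) / np.sqrt(
        (xc ** 2).sum(axis=0) * (yc ** 2).sum()
    )
    return np.flatnonzero(r < threshold), r

# Synthetic example (assumed values, not study data)
rng = np.random.default_rng(2)
n = 400
log_mgfr = rng.normal(4.0, 0.5, n)
log_metabolites = np.column_stack([
    -0.9 * log_mgfr + rng.normal(0, 0.2, n),  # strongly inverse: kept
    rng.normal(0, 1.0, n),                    # unrelated: dropped
    0.8 * log_mgfr + rng.normal(0, 0.2, n),   # positive: dropped
])

selected, correlations = screen_by_correlation(log_metabolites, log_mgfr)
print("selected columns:", selected.tolist())
print("correlations:", np.round(correlations, 2).tolist())
```

A one-sided threshold keeps only markers that rise as GFR falls, which is the behavior expected of a filtration marker; a two-sided screen on the absolute value, as mentioned for the unknown metabolites, would use `np.abs(r) > 0.5` instead.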

In conclusion, these data represent the first step towards our ultimate goal of developing a more accurate panel eGFR across a range of health and disease, with less reliance on demographic terms than the current eGFR. This new panel is intended as an independent test complementary to eGFRcr-cys. Next steps include exploring the physical and biological properties of candidate metabolites for final selection, and developing and validating the panel eGFR for analytical precision and performance compared to mGFR in various clinical settings.

Supplementary Material

1
2

ACKNOWLEDGEMENTS

We would like to acknowledge collaborators of participating studies. Assessing Long Term Outcomes in Living Kidney Donors (ALTOLD): Bertram Kasiske, Matthew Weir, Todd Pesavento; Multi Ethnic Study of Atherosclerosis (MESA): Tariq Shafi, Wendy Post, Peter Rossing; Onco-GFR Study: Emmanuel de Almeida Burdmann, Renato Antunes Caires; Chronic Kidney Disease in Children (CKiD): Bradley Warady, Alvaro Munoz, Derek Ng, George Schwartz; University of Maryland Baltimore Cirrhosis Cohort: Laurence S. Magder, Thomas C. Dowling, Matthew R. Weir, Robert H. Christenson.

PRIMARY FUNDING

The measurements and analyses were supported by National Institute of Diabetes and Digestive and Kidney Diseases grant 1R01DK116790 to Tufts Medical Center.

DISCLOSURES

Dr. Inker reports receiving grants and contracts to institution from NIH/NIDDK (1R01DK116790), NKF, Chinnocks, Omeros and Reata Pharmaceuticals; consultancy fees from Tricida Inc. and Diamtrix; participation on medical advisory council for Alport foundation and scientific advisory board for NKF. Dr. Furth reports receiving a grant from the NIH and participation on the advisory board at Genentech. Dr. Levey reports receiving grants to institution from the NKF and NIH, honoraria for academic lectures and participation on advisory board for clinical trials of dapagliflozin at AstraZeneca. Dr. Shlipak reports receiving grants and contracts to institution from Bayer Pharmaceuticals, NIH-NIA, NIDDK, NHLBI; consultancy fees from Cricket Health and Intercept Pharmaceuticals; honoraria from Boehringer Ingelheim, AstraZeneca and Bayer Pharmaceuticals; stock options at TAI Diagnostics and Chairman of the Board of Directors for NCIRE. Dr. Greene reports receiving contracts from Boehringer Ingelheim, AstraZeneca, CSL and Vertex, and consultancy fees from Invokana, Pfizer, Novartis, AstraZeneca and Janssen Pharmaceuticals. Dr. Seegmiller reports receiving subcontracts from the NIH and CDC and participation on data safety and monitoring board for HIV in Nigeria (H3). Dr. Costa e Silva reports funding from FAPESP (Fundação de Amparo à Pesquisa do Estado de São Paulo, Project Number 2014/19286-4), a local public research agency from São Paulo, Brazil. Dr. Mindikoglu reports funding from NIH-NIDDK (5 K23 DK089008-05) and the University of Maryland School of Medicine, Department of Medicine funds, University of Maryland Clinical Translational Science Institute, University of Maryland General Clinical Research Center, and NIH Public Health Service grant P30DK056338. Other authors had no financial disclosures.

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

DATA SHARING STATEMENT

Data from the African American Study of Kidney Disease and Hypertension (AASK), the Assessing Long Term Outcomes in Living Kidney Donors Study (ALTOLD), and the Modification of Diet in Renal Disease Study (MDRD) are available at the NIH Common Fund's National Metabolomics Data Repository (NMDR) website, the Metabolomics Workbench, https://www.metabolomicsworkbench.org, where they have been assigned Project ID PR001762 and DOI: http://dx.doi.org/10.21228/M8ZB1D. Data from the Multi Ethnic Study of Atherosclerosis (MESA) can be requested through the study website https://www.mesa-nhlbi.org/MesaInternal/Publications.aspx. Data from Chronic Kidney Disease in Children (CKiD) can be requested through the study website https://statepi.jhsph.edu/ckid/investigator-resources/. Data from the Onco-GFR Study and the University of Maryland Baltimore Cirrhosis Cohort can be requested directly from study investigators (Ayse Mindikoglu (UMB) and Veronica Costa e Silva (Onco-GFR)). R code for the stepwise procedure is included in the supplementary materials.

REFERENCES

  • 1. Levey AS, et al., Measured and estimated glomerular filtration rate: current status and future directions. Nat Rev Nephrol, 2020. 16(1): p. 51–64.
  • 2. Kidney Disease: Improving Global Outcomes (KDIGO) CKD Work Group, KDIGO 2012 clinical practice guideline for the evaluation and management of chronic kidney disease. Kidney Int, 2013. 3(1): p. 1–150.
  • 3. Shafi T, et al., Quantifying individual-level inaccuracy in glomerular filtration rate estimation: A cross-sectional study. Ann Intern Med, 2022. 175(8): p. 1073–1082.
  • 4. Inker LA, et al., New Creatinine- and Cystatin C-Based Equations to Estimate GFR without Race. N Engl J Med, 2021. 385(19): p. 1737–1749.
  • 5. Levey AS, et al., A new equation to estimate glomerular filtration rate. Ann Intern Med, 2009. 150(9): p. 604–12.
  • 6. Levey AS, Becker C, and Inker LA, Glomerular filtration rate and albuminuria for detection and staging of acute and chronic kidney disease in adults: a systematic review. JAMA, 2015. 313(8): p. 837–46.
  • 7. Kervella D, et al., Cystatin C versus creatinine for GFR estimation in CKD due to heart failure. Am J Kidney Dis, 2017. 69(2): p. 321–323.
  • 8. Torre A, et al., Creatinine versus cystatin C for estimating GFR in patients with liver cirrhosis. Am J Kidney Dis, 2016. 67(2): p. 342–344.
  • 9. Levey AS, et al., A more accurate method to estimate glomerular filtration rate from serum creatinine: a new prediction equation. Ann Intern Med, 1999. 130(6): p. 461–470.
  • 10. Coresh J, et al., Metabolomic profiling to improve glomerular filtration rate estimation: a proof-of-concept study. Nephrol Dial Transplant, 2019. 34(5): p. 825–833.
  • 11. Freed TA, et al., Validation of a metabolite panel for a more accurate estimation of glomerular filtration rate using quantitative LC-MS/MS. Clin Chem, 2019. 65(3): p. 406–418.
  • 12. Inker LA, et al., A New Panel-Estimated GFR, Including β2-Microglobulin and β-Trace Protein and Not Including Race, Developed in a Diverse Population. Am J Kidney Dis, 2021. 77(5): p. 673–683.e1.
  • 13. Roberts LD, et al., Targeted metabolomics. Curr Protoc Mol Biol, 2012. 98(1): p. 30.2.1–30.2.24.
  • 14. Wright JT Jr, et al., Effect of blood pressure lowering and antihypertensive drug class on progression of hypertensive kidney disease: results from the AASK trial. JAMA, 2002. 288(19): p. 2421–31.
  • 15. Kasiske BL, et al., A prospective controlled study of living kidney donors: three-year follow-up. Am J Kidney Dis, 2015. 66(1): p. 114–24.
  • 16. Costa e Silva VT, et al., A prospective cross-sectional study estimated glomerular filtration rate from creatinine and cystatin C in adults with solid tumors. Kidney Int, 2022. 101(3): p. 607–614.
  • 17. Schwartz GJ and Furth SL, Glomerular filtration rate measurement and estimation in chronic kidney disease. Pediatr Nephrol, 2007. 22(11): p. 1839–48.
  • 18. Mindikoglu AL, et al., Unique metabolomic signature associated with hepatorenal dysfunction and mortality in cirrhosis. Transl Res, 2018. 195: p. 25–47.
  • 19. Mindikoglu AL, et al., Estimation of Glomerular Filtration Rate in Patients With Cirrhosis by Using New and Conventional Filtration Markers and Dimethylarginines. Clin Gastroenterol Hepatol, 2016. 14(4): p. 624–632.e2.
  • 20. Inker LA, et al., Effects of Race and Sex on Measured GFR: The Multi-Ethnic Study of Atherosclerosis. Am J Kidney Dis, 2016. 68(5): p. 743–751.
  • 21. Klahr S, et al., The effects of dietary protein restriction and blood-pressure control on the progression of chronic renal disease. Modification of Diet in Renal Disease Study Group. N Engl J Med, 1994. 330(13): p. 877–84.
  • 22. Inker LA, et al., A new panel-estimated GFR, including β2-microglobulin and β-trace protein and not including race, developed in a diverse population. Am J Kidney Dis, 2021. 77(5): p. 673–683.e1.
  • 23. Inker LA, et al., Performance of glomerular filtration rate estimating equations in a community-based sample of Blacks and Whites: the multiethnic study of atherosclerosis. Nephrol Dial Transplant, 2018. 33(3): p. 417–425.
  • 24. Han W and Li L, Evaluating and minimizing batch effects in metabolomics. Mass Spectrom Rev, 2022. 41(3): p. 421–442.
  • 25. Freed TA, et al., Validation of a Metabolite Panel for a More Accurate Estimation of Glomerular Filtration Rate Using Quantitative LC-MS/MS. Clin Chem, 2019. 65(3): p. 406–418.
  • 26. Sekula P, et al., A Metabolome-Wide Association Study of Kidney Function and Disease in the General Population. J Am Soc Nephrol, 2016. 27(4): p. 1175–88.
  • 27. Takahira R, et al., Tryptophan glycoconjugate as a novel marker of renal function. Am J Med, 2001. 110(3): p. 192–7.
  • 28. Cheng Y, et al., The relationship between blood metabolites of the tryptophan pathway and kidney function: a bidirectional Mendelian randomization analysis. Sci Rep, 2020. 10(1): p. 12675.
  • 29. Gutsche B, et al., Tryptophan glycoconjugates in food and human urine. Biochem J, 1999. 343 Pt 1: p. 11–9.
  • 30. Sekula P, et al., From Discovery to Translation: Characterization of C-Mannosyltryptophan and Pseudouridine as Markers of Kidney Function. Sci Rep, 2017. 7(1): p. 17400.
  • 31. Linster CL and Van Schaftingen E, Vitamin C. Biosynthesis, recycling and degradation in mammals. FEBS J, 2007. 274(1): p. 1–22.
  • 32. KEGG Compound Database. COMPOUND: C00800. 2022. [cited 2023 Aug 3]; Available from: https://www.kegg.jp/entry/C00800.
  • 33. Hsu CY, et al., Race, Genetic Ancestry, and Estimating Kidney Function in CKD. N Engl J Med, 2021. 385(19): p. 1750–1760.
  • 34. Pottel H, et al., Cystatin C–Based Equation to Estimate GFR without the Inclusion of Race and Sex. N Engl J Med, 2023. 388(4): p. 333–343.
  • 35. Collister D, et al., Providing Care for Transgender Persons With Kidney Disease: A Narrative Review. Can J Kidney Health Dis, 2021. 8: p. 2054358120985379.
  • 36. Gandhi P, Medeiros E, and Shah AD, Physiology or Pathology? Elevated Serum Creatinine in a Female-to-Male Transgender Patient. Am J Kidney Dis, 2020. 75(4): p. A13–A14.
  • 37. Whitley CT and Greene DN, Transgender Man Being Evaluated for a Kidney Transplant. Clin Chem, 2017. 63(11): p. 1680–1683.
  • 38. Stevens LA, et al., Evaluation of the CKD-EPI equation in multiple races and ethnicities. Kidney Int, 2011. 79(5): p. 555.
