ABSTRACT
Background
Protein biomarkers may provide insight into kidney disease pathology but their use for the identification of phenotypically distinct kidney diseases has not been evaluated.
Methods
We used unsupervised hierarchical clustering on 225 plasma biomarkers in 541 individuals enrolled into the Boston Kidney Biopsy Cohort, a prospective cohort study of individuals undergoing kidney biopsy with adjudicated histopathology. Using principal component analysis, we studied biomarker levels by cluster and examined differences in clinicopathologic diagnoses and histopathologic lesions across clusters. Cox proportional hazards models tested associations of clusters with kidney failure and death.
Results
We identified three biomarker-derived clusters. The mean estimated glomerular filtration rate was 72.9 ± 28.7, 72.9 ± 33.4 and 39.9 ± 30.4 mL/min/1.73 m2 in Clusters 1, 2 and 3, respectively. The top-contributing biomarker in Cluster 1 was AXIN-1, a negative regulator of the Wnt signaling pathway. The top-contributing biomarker in Clusters 2 and 3 was Placental Growth Factor, a member of the vascular endothelial growth factor family. Compared with Cluster 1, individuals in Cluster 3 were more likely to have tubulointerstitial disease (P < .001) and diabetic kidney disease (P < .001) and had more severe mesangial expansion [odds ratio (OR) 2.44, 95% confidence interval (CI) 1.29, 4.64] and inflammation in the fibrosed interstitium (OR 2.49, 95% CI 1.02, 6.10). After multivariable adjustment, Cluster 3 was associated with a higher risk of kidney failure (hazard ratio 3.29, 95% CI 1.37, 7.90) compared with Cluster 1.
Conclusion
Plasma biomarkers may identify clusters of individuals with kidney disease that associate with different clinicopathologic diagnoses, histopathologic lesions and adverse outcomes, and may uncover biomarker candidates and relevant pathways for further study.
Keywords: biomarkers, cluster, histopathology, kidney biopsy, kidney disease, proteomics
Graphical Abstract
INTRODUCTION
Chronic kidney disease (CKD) affects over 10% of the global population and is a major contributor to premature death and disability worldwide [1]. CKD is not a single entity but rather a highly heterogenous condition, encompassing a variety of distinct clinical phenotypes that arise from different underlying pathologic mechanisms. Current staging of CKD relies on the estimated glomerular filtration rate (eGFR) and albuminuria, both of which provide only limited information about underlying disease etiology or histopathologic manifestations of the disease.
Novel biomarkers may provide non-invasive insights into disease mechanisms and pathobiologic processes responsible for CKD. In a previous study, we identified individual biomarkers associated with histopathologic changes and the risks of CKD progression and death in patients with biopsy-confirmed kidney disease [2]. The heterogeneity of CKD, however, suggests that the combined information from multiple biomarkers, which reflect different pathobiologic processes and pathways, could aid with non-invasive phenotyping and enhance our understanding of underlying mechanisms of the disease.
Studies in other disease settings such as cancer [3], heart failure [4] and diabetes [5] have applied unsupervised machine learning algorithms to identify disease subtypes and predict clinical outcomes. In the setting of CKD, unsupervised consensus clustering using traditional risk factors, including baseline eGFR and proteinuria, identified distinct CKD subpopulations that had significantly different risks of kidney disease progression, cardiovascular events and death [6]. Few studies have used unsupervised machine learning algorithms to classify patients using multiple biomarker–generated phenotypes [7–9].
In this study, we used an unsupervised clustering approach on 225 plasma biomarkers measured in a prospective cohort study of individuals with biopsy-confirmed kidney diseases and adjudicated semiquantitative assessment of histopathology. We examined contributions of biomarkers to each cluster and tested associations of cluster membership with histopathologic lesions, clinicopathologic diagnoses and the risks of future kidney failure and death. We used pathway analyses to gain insights into differences in cluster-specific pathway activities.
MATERIALS AND METHODS
Study population
The Boston Kidney Biopsy Cohort (BKBC) is a prospective, observational cohort study of patients undergoing native kidney biopsy at three tertiary care hospitals in Boston. Details of the study design have been previously described [10]. The study includes adults ≥18 years of age who underwent a clinically indicated native kidney biopsy between September 2006 and October 2018. Exclusion criteria were the inability to provide written consent, severe anemia, pregnancy and enrollment in competing studies. Patients provided blood and urine samples on the day of kidney biopsy. For this study, we evaluated 541 participants with available plasma samples. The Partners Human Research Committee (the Brigham and Women's Hospital Institutional Review Board) approved the study protocol which is in accordance with the principles of the Declaration of Helsinki.
Sample collection, proteomics assays and exposures
Blood samples were collected from study participants on the day of biopsy, aliquoted and immediately stored at −80°C. Aliquots were analyzed at Olink using high-throughput, multiplex immunoassays [2, 11] on three commercially available panels named Inflammation, Organ Damage and Cardiovascular II. Each panel consists of 92 biomarker proteins that were chosen based on their potential relevance in various pathological processes; protein levels are expressed as normalized protein expression (NPX) values on a log2 scale. We included 5% blind split replicates in addition to BKBC samples. Of the 276 proteins included in the three panels, 225 biomarkers were non-overlapping across the panels and passed the following quality control metrics: coefficients of variation (CV) <10% from blind split replicates; standard deviation of internal Olink controls <0.2; and incubation or detection controls that deviated less than ±0.3 from the median value of all samples on the plate. We used these 225 biomarkers as the primary exposures for statistical analyses. As an additional quality control, we included multiple plasma aliquots from two patients, one with high and one with low eGFR, which were distributed randomly in dummy-labeled tubes across the shipment boxes. The mean CVs were 5.5 ± 4.3% and 4.9 ± 4.6%, respectively, for the 225 biomarkers.
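Because NPX values are on a log2 scale, a difference of one NPX unit corresponds to a doubling on the linear scale. The sketch below illustrates this and a replicate CV check against the 10% threshold. It assumes the CV is computed after back-transforming NPX to the linear scale, which the text does not specify; the replicate values are hypothetical and are not study data.

```python
import statistics


def fold_change(npx_a: float, npx_b: float) -> float:
    """Linear-scale ratio implied by a difference between two log2 NPX values."""
    return 2.0 ** (npx_a - npx_b)


def percent_cv(npx_replicates: list[float]) -> float:
    """Coefficient of variation (%) of replicates.

    Assumption: NPX is back-transformed (2 ** NPX) before computing SD / mean.
    """
    linear = [2.0 ** value for value in npx_replicates]
    return 100.0 * statistics.stdev(linear) / statistics.mean(linear)


# Hypothetical blind split replicates for one biomarker (not study data)
replicates = [5.00, 5.05, 4.95]
cv = percent_cv(replicates)
passes_qc = cv < 10.0  # threshold stated in the text for blind split replicates
```

Whether the original quality control used linear or log-scale values would change the numeric CV, so the function is best read as one possible convention.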
Histopathologic outcomes
Kidney biopsy specimens were adjudicated under light microscopy by two experienced kidney pathologists who provided semiquantitative scores of kidney inflammation, fibrosis, vascular sclerosis, and acute tubular injury (Supplementary data, Table S1). Methods to evaluate and score histopathologic lesions were previously described in detail [2, 10]. The adjudication of histopathologic lesions by the two kidney pathologists was performed between 2014 and 2017. Of the 13 histopathologic lesions adjudicated, all were scored during study sessions except for grades of global or segmental glomerulosclerosis, which were taken from the biopsy report, because they were each calculated as a percentage of the total number of glomeruli. We limited statistical analyses on histopathologic lesions to participants with adjudicated histopathology by both kidney pathologists (n = 474, 87.6%) except for analyses of global or segmental glomerulosclerosis since they were taken from the biopsy report. For regression analyses, histopathologic lesions were dichotomized as described in Supplementary data, Table S1. We combined endocapillary glomerular inflammation, extracapillary cellular crescents, focal glomerular necrosis and fibrocellular crescents into a single dichotomous variable named “glomerular inflammation” due to the relatively low prevalence and limited range of severity for each of those lesions in this cohort. All participants’ charts were reviewed alongside histopathologic evaluations to provide the final primary clinicopathologic diagnosis.
Clinical outcomes
The primary outcome was kidney failure, defined as the initiation of dialysis or kidney transplantation. The secondary outcome was death. To ascertain information on vital status, change in creatinine or need for dialysis, we reviewed the electronic medical record (EMR) of the respective hospital as well as other linked EMR systems. Data on eGFR during follow-up were obtained from the EMR and kidney replacement therapy status was confirmed by reviewing the EMR and linkage with the US Renal Data System database. Mortality status was confirmed with the Social Security Death Index. Participants were followed until the occurrence of death, voluntary study withdrawal, loss to follow-up or 1 February 2020.
Covariates
Detailed patient information was collected at the biopsy visit, including demographics, medical history, medication lists and pertinent laboratory data, and stored using REDCap electronic data capture tools hosted at Partners Health Care. We obtained serum creatinine (SCr) from the EMR on the day of biopsy. In participants for whom this was unavailable, we measured SCr in available blood samples collected on the day of biopsy. We obtained the spot urine protein-to-creatinine ratio (UPCR) or urine albumin-to-creatinine ratio (UACR) from the EMR, using values recorded between 3 months before biopsy and the date of kidney biopsy. If both were available, the UACR was used. If a participant had neither value, we measured the UACR in urine collected on the day of the kidney biopsy. SCr and urine creatinine were measured using a Jaffe-based method and urine albumin was measured by an immunoturbidimetric method. The 2021 creatinine-based Chronic Kidney Disease Epidemiology Collaboration equation without race was used to calculate the eGFR [12].
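For reference, the 2021 race-free creatinine equation can be written as a short function. This is a minimal sketch using the published coefficients; the function name and the example inputs are ours and do not come from the study.

```python
def egfr_ckd_epi_2021(scr_mg_dl: float, age_years: float, female: bool) -> float:
    """eGFR (mL/min/1.73 m2) from the 2021 CKD-EPI creatinine equation without race."""
    kappa = 0.7 if female else 0.9          # sex-specific creatinine threshold (mg/dL)
    alpha = -0.241 if female else -0.302    # exponent applied below the threshold
    ratio = scr_mg_dl / kappa
    egfr = (
        142.0
        * min(ratio, 1.0) ** alpha
        * max(ratio, 1.0) ** -1.200
        * 0.9938 ** age_years
    )
    return egfr * 1.012 if female else egfr


# Hypothetical participant (not study data)
example_egfr = egfr_ckd_epi_2021(scr_mg_dl=1.2, age_years=55, female=True)
```

Serum creatinine is expected in mg/dL; values reported in µmol/L would need to be divided by 88.4 first.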
Construction of clusters and principal component analysis
We performed unsupervised hierarchical clustering on 225 biomarkers to partition subjects based on the totality of biomarker information. This approach did not use clinical characteristics or subsequent outcomes and was based only on the biomarker data. Clustering was performed using the hclust function in R. We used the Euclidean distance measure and Ward's Minimum Variance method for combining clusters by minimizing the total within-cluster variance, and determined the optimal number of clusters using the R package NbClust [13]. We aggregated information across 19 of NbClust's 30 different indices, which each recommended an optimal number of clusters, and used majority voting to determine a single value for the optimal number of clusters (Supplementary data, Table S2). Once clusters were defined, we used principal components analysis (PCA) to determine the individual contribution of biomarkers to the formation of each cluster. The contribution of a biomarker to a principal component is:
$$C_{i,j} = \frac{f_{i,j}^{2}}{\lambda_j},$$
where $\lambda_j$ is the eigenvalue for the jth principal component and $f_{i,j}$ is the factor score of the ith observation on the jth principal component. A factor score is an observation's coordinates on the principal components. Contribution represents the proportion of the PC that is determined by that biomarker. We performed a PCA on each cluster using the 225 biomarker measurements and assessed each biomarker's contribution to the first five PCs. The total contribution of a biomarker to the first five PCs, explaining the variation in the first five principal components, is given by the following equation:
$$C_{\text{total}} = \frac{\sum_{i=1}^{5} C_i \lambda_i}{\sum_{i=1}^{5} \lambda_i},$$
where PCi is the ith PC, $C_i$ is the biomarker contribution to PCi, and $\lambda_i$ is the eigenvalue for PCi.
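The two definitions above can be illustrated with a small calculation. This sketch assumes that the per-PC contribution is the squared factor score divided by the PC's eigenvalue and that the total contribution is the eigenvalue-weighted average of the per-PC contributions. The function names and toy numbers are hypothetical, and two PCs are used instead of five for brevity.

```python
def contribution(factor_score: float, eigenvalue: float) -> float:
    """Contribution of one biomarker to one PC: squared factor score over eigenvalue."""
    return factor_score ** 2 / eigenvalue


def total_contribution(contributions: list[float], eigenvalues: list[float]) -> float:
    """Eigenvalue-weighted average of per-PC contributions."""
    weighted = sum(c * lam for c, lam in zip(contributions, eigenvalues))
    return weighted / sum(eigenvalues)


# Toy values for a single biomarker on two PCs (not study data)
scores = [1.2, 0.3]
eigenvalues = [4.0, 1.0]
per_pc = [contribution(f, lam) for f, lam in zip(scores, eigenvalues)]
total = total_contribution(per_pc, eigenvalues)
```

Weighting by eigenvalue means that a biomarker's contribution to a PC explaining more variance counts for more in the total.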
Pathway analysis
To explore differences in enriched biological pathways between clusters, we used the Gene Set Variation Analysis (GSVA) tool. GSVA is a non-parametric, unsupervised method for estimating pathway enrichment using hallmark gene sets, which represent specific biological processes. Results are obtained as GSVA scores, which are calculated using a Kolmogorov–Smirnov-like random walk statistic [14]. For these analyses, we ranked biomarkers in each cluster according to the magnitude of the coefficient obtained from PCA and submitted these data to the GSVA tool to examine differences in pathway activities between clusters. Hallmark gene sets from the Molecular Signatures Database (MSigDB) [15] were adopted to summarize differences in pathway activities between clusters. Analyses were performed using the GSVA v.1.32.0 R package.
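To give a sense of how a random walk enrichment statistic behaves, the sketch below walks down a ranked list, stepping up for members of a gene set and down otherwise, and summarizes the walk by its largest positive and negative deviations. It is a simplified, unweighted illustration and not the GSVA package's implementation; the ranking and the gene set are hypothetical.

```python
def random_walk_score(ranked_genes: list[str], gene_set: set[str]) -> float:
    """Unweighted Kolmogorov-Smirnov-like random walk score for one gene set.

    ranked_genes must be ordered from highest to lowest rank.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")

    running = 0.0
    max_positive = 0.0
    max_negative = 0.0
    for is_hit in hits:
        # Step up for set members, down for non-members; the walk ends at zero.
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        max_positive = max(max_positive, running)
        max_negative = min(max_negative, running)
    # Difference between the magnitudes of the two extreme deviations
    return max_positive + max_negative


# Hypothetical ranking and gene set, for illustration only
ranking = ["PGF", "CD40", "BAMBI", "AXIN1", "STX8", "TOP2B"]
top_set_score = random_walk_score(ranking, {"PGF", "CD40"})
bottom_set_score = random_walk_score(ranking, {"STX8", "TOP2B"})
```

A set concentrated at the top of the ranking yields a positive score and a set concentrated at the bottom yields a negative one.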
Statistical analysis
We summarized descriptive statistics by cluster as count with percentages for categorical variables and mean ± standard deviation or median with interquartile range (IQR) for continuous variables. For skewed data distributions, we performed natural logarithmic transformation as appropriate. To assess differences in clinicopathologic diagnoses by cluster membership, we used chi-squared tests. Unadjusted and adjusted multivariable logistic regression models were used to assess associations of each cluster with histopathologic lesions.
We performed time-to-event analyses to examine the associations of clusters with kidney failure and death. Cox proportional hazards models were adjusted for age, race, sex, log(proteinuria), primary clinicopathologic diagnostic category of kidney disease and eGFR (modeled as a time-varying variable). Statistical analyses were performed using R Version 3.6.1 (R Foundation for Statistical Computing, Vienna, Austria) and Stata 15.0 (StataCorp, College Station, TX, USA). A P-value <.05 was considered statistically significant.
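As a sketch of the model structure, and in our own notation rather than one taken from the source, the fully adjusted Cox model with Cluster 1 as the reference and a separate baseline hazard for each clinical site $s$ (site stratification is described in the footnotes to Table 3) could be written as follows, where $\mathbf{z}(t)$ collects the adjustment covariates, including time-varying eGFR:

```latex
h_s(t \mid \mathbf{x}) = h_{0s}(t)\,
  \exp\bigl(\beta_2\,\mathbb{1}[\text{Cluster 2}]
          + \beta_3\,\mathbb{1}[\text{Cluster 3}]
          + \boldsymbol{\gamma}^{\top}\mathbf{z}(t)\bigr),
\qquad \mathrm{HR}_k = e^{\beta_k}, \quad k \in \{2, 3\}
```

Under this form, each reported hazard ratio is the exponentiated coefficient for that cluster's indicator.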
RESULTS
Cluster analysis and baseline characteristics by cluster membership
The optimal number of clusters in this unsupervised hierarchical cluster analysis was three (Fig. 1). Table 1 summarizes the baseline characteristics of the study cohort by cluster. The mean age was 47.9 ± 15.0 years in Cluster 1, 50.0 ± 16.6 years in Cluster 2, and 54.7 ± 16.7 years in Cluster 3. In Clusters 1, 2 and 3, respectively 49.3%, 56.8% and 48.5% were female. The mean eGFR was 72.9 ± 28.7, 72.9 ± 33.4 and 39.9 ± 30.4 mL/min/1.73 m2 and the median proteinuria (IQR) was 1.1 (0.3, 2.9), 1.0 (0.3, 3.4) and 2.4 (0.8, 5.1) g/g creatinine in Clusters 1, 2 and 3, respectively.
Figure 1:
Hierarchical cluster analysis. (A) Dendrogram showing the results from hierarchical clustering on 225 plasma biomarkers using the Euclidean distance measure and Ward's Minimum Variance. Rectangles show the split into three clusters. (B) Shown is the cluster formation in the first two PCs. PC1 and PC2 explain 21.5% and 10.2% of the total variance, respectively. PC, principal component.
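The merging rule behind a Ward dendrogram can be sketched in a few lines. At each step, the pair of clusters whose merge increases the total within-cluster sum of squares the least is joined. This is a naive, pure-Python illustration on toy two-dimensional points, not the hclust implementation, and the source does not state which Ward variant of hclust was used.

```python
def ward_clusters(points: list[list[float]], n_clusters: int) -> list[list[int]]:
    """Agglomerative clustering with Ward's minimum variance criterion (naive version)."""
    clusters = [[i] for i in range(len(points))]

    def centroid(members: list[int]) -> list[float]:
        dims = len(points[0])
        return [sum(points[i][d] for i in members) / len(members) for d in range(dims)]

    def merge_cost(a: list[int], b: list[int]) -> float:
        # Increase in within-cluster sum of squares if clusters a and b are merged
        ca, cb = centroid(a), centroid(b)
        sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * sq_dist

    while len(clusters) > n_clusters:
        _, i, j = min(
            (merge_cost(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Toy data with three visually separated groups (not study data)
toy_points = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9], [10.0, 0.0], [9.8, 0.3]]
toy_clusters = ward_clusters(toy_points, n_clusters=3)
```

With 225 biomarkers per participant, each point would be a 225-dimensional vector; the criterion itself is unchanged.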
Table 1:
Baseline characteristics of BKBC participants by cluster.
| Characteristic | Cluster 1 (n = 71) | Cluster 2 (n = 241) | Cluster 3 (n = 229) | P-value |
|---|---|---|---|---|
| Clinical characteristics | ||||
| Age, mean (±SD) | 47.9 (±15.0) | 50.0 (±16.6) | 54.7 (±16.7) | .001 |
| Female | 35 (49.3) | 137 (56.8) | 111 (48.5) | .17 |
| Race | | | | .002 |
| White | 44 (62.0) | 139 (57.7) | 159 (69.4) | |
| Black | 8 (11.3) | 49 (20.3) | 45 (19.7) | |
| Other | 19 (26.8) | 53 (22.0) | 25 (10.9) | |
| eGFR (mL/min/1.73 m2) | 72.9 (±28.7) | 72.9 (±33.4) | 39.9 (±30.4) | <.001 |
| Proteinuria (g/g creatinine), median (IQR) | 1.1 (0.3, 2.9) | 1.0 (0.3, 3.4) | 2.4 (0.8, 5.1) | <.001 |
| Clinicopathologic diagnosis^a | | | | <.001 |
| Proliferative glomerulonephritis | 26 (36.6) | 73 (30.3) | 62 (27.1) | |
| Non-proliferative glomerulopathy | 12 (16.9) | 44 (18.3) | 39 (17.0) | |
| Paraprotein-related disease | 4 (5.6) | 22 (9.1) | 9 (3.9) | |
| Diabetic kidney disease | 5 (7.0) | 15 (6.2) | 44 (19.2) | |
| Vascular disease | 6 (8.5) | 26 (10.8) | 18 (7.9) | |
| Tubulointerstitial disease | 4 (5.6) | 9 (3.7) | 32 (14.0) | |
| Advanced glomerulosclerosis | 7 (9.9) | 33 (13.7) | 21 (9.2) | |
| Other | 7 (9.9) | 19 (7.9) | 4 (1.7) | |
| Comorbid conditions | ||||
| Diabetes mellitus | 7 (9.9) | 32 (13.3) | 81 (35.4) | <.001 |
| Hypertension | 30 (42.3) | 119 (49.4) | 133 (58.1) | .034 |
| Systemic lupus erythematosus | 10 (14.1) | 47 (19.5) | 32 (14.0) | .23 |
| HIV | 2 (2.8) | 1 (0.4) | 3 (1.3) | .22 |
| Hepatitis B | 0 (0.0) | 3 (1.2) | 1 (0.4) | .44 |
| Hepatitis C | 1 (1.4) | 0 (0.0) | 9 (3.9) | .006 |
| Malignancy | 8 (11.3) | 31 (12.9) | 40 (17.5) | .26 |
| Medications | ||||
| ACEi | 27 (38.0) | 73 (30.3) | 74 (32.3) | .47 |
| ARB | 11 (15.5) | 40 (16.6) | 30 (13.1) | .56 |
| MRA | 1 (1.4) | 4 (1.7) | 9 (3.9) | .24 |
| Calcium channel blockers | 9 (12.7) | 69 (28.6) | 58 (25.3) | .024 |
| Beta-blockers | 7 (9.9) | 57 (23.7) | 98 (42.8) | <.001 |
| Steroids | 17 (23.9) | 53 (22.0) | 53 (23.1) | .92 |
| Immunosuppression | 13 (18.3) | 41 (17.0) | 47 (20.5) | .62 |
| Clinical site | | | | .010 |
| 1 | 46 (64.8) | 187 (77.6) | 145 (63.3) | |
| 2 | 18 (25.4) | 34 (14.1) | 52 (22.7) | |
| 3 | 7 (9.9) | 20 (8.3) | 32 (14.0) |
ACEi, angiotensin-converting enzyme inhibitor; ARB, angiotensin II receptor blocker; MRA, mineralocorticoid receptor antagonist.
^a The ‘other diagnosis’ category was composed of participants with minor abnormalities or relatively preserved parenchyma.
Data are presented as mean ± standard deviation, median (IQR), or count with frequencies (%) for binary and categorical variables.
Top biomarker contributors to Clusters 1, 2 and 3
Biomarker levels by cluster membership are shown in Supplementary data, Table S3. The top three biomarkers contributing to cluster membership were as follows (Supplementary data, Fig. S1): Cluster 1, AXIN-1, Syntaxin-8 (STX8) and Placental Growth Factor (PGF); Cluster 2, PGF, TNF-related apoptosis-inducing ligand-R2 (TRAIL-R2) and DNA topoisomerase 2-beta (TOP2B); and Cluster 3, PGF, CD40, and BMP and activin membrane bound inhibitor (BAMBI).
Associations of clusters with clinicopathologic diagnoses and histopathologic lesions
Differences in clinicopathologic diagnostic categories by cluster are shown in Fig. 2 and Supplementary data, Table S4. Compared with Clusters 1 and 2, individuals in Cluster 3 were significantly more likely to have tubulointerstitial disease (P < .001) and diabetic kidney disease (P < .001), and less likely to have only minor abnormalities or relatively preserved parenchyma (P = .003). Associations between cluster membership and histopathologic lesions are shown in Table 2 and Supplementary data, Table S5. Compared with individuals in Cluster 1, individuals in Cluster 3 had more severe mesangial expansion, glomerular sclerosis, acute tubular injury and interstitial fibrosis/tubular atrophy, as well as more severe arteriolar sclerosis and inflammation in the preserved and fibrosed interstitium. After multivariable adjustment for age, race, sex and eGFR, membership in Cluster 3 remained significantly associated with more severe mesangial expansion [odds ratio (OR) 2.44, 95% confidence interval (CI) 1.29, 4.64] and inflammation in the fibrosed interstitium (OR 2.49, 95% CI 1.02, 6.10) compared with Cluster 1. There were no significant differences in histopathologic lesion severity between individuals in Cluster 1 and Cluster 2.
Figure 2:
Clinicopathologic diagnoses by cluster. Shown are differences in clinicopathologic diagnostic categories by cluster membership. P-values obtained from Chi-square tests: proliferative GN, P = .30; non-proliferative glomerulopathy, P = .93; paraprotein disease, P = .07; *diabetic kidney disease, P < .001; vascular disease, P = .53; *tubulointerstitial disease, P < .001; advanced chronic changes, P = .28; *other (comprised of individuals with minor abnormalities or relatively preserved parenchyma), P = .003.
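The per-diagnosis comparisons can be approximated from the counts in Table 1. The sketch below assumes that each diagnosis was tested as present versus absent across the three clusters (a 3 × 2 table with 2 degrees of freedom), which the source does not state explicitly. The helper name is ours, and the closed-form P-value applies only to the 2-degree-of-freedom case.

```python
import math


def chi_square_three_groups(present: list[int], totals: list[int]) -> tuple[float, float]:
    """Pearson chi-square statistic and P-value for a 3 x 2 table (present vs absent)."""
    if len(present) != 3 or len(totals) != 3:
        raise ValueError("closed-form P-value below assumes exactly three groups")
    n = sum(totals)
    n_present = sum(present)
    statistic = 0.0
    for observed_yes, group_size in zip(present, totals):
        cells = ((observed_yes, n_present), (group_size - observed_yes, n - n_present))
        for observed, column_total in cells:
            expected = group_size * column_total / n
            statistic += (observed - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 2 degrees of freedom
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value


cluster_sizes = [71, 241, 229]  # Clusters 1, 2 and 3 (Table 1)
_, p_diabetic = chi_square_three_groups([5, 15, 44], cluster_sizes)
_, p_tubulointerstitial = chi_square_three_groups([4, 9, 32], cluster_sizes)
_, p_other = chi_square_three_groups([7, 19, 4], cluster_sizes)
```

Under this assumption the three P-values fall in the same ranges as those reported in the caption for diabetic kidney disease, tubulointerstitial disease and the "other" category.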
Table 2:
Associations of clusters with histopathologic lesions.
| Histopathologic lesions | Cluster 1, OR (95% CI) | Cluster 2, OR (95% CI) | Cluster 2, P-value | Cluster 3, OR (95% CI) | Cluster 3, P-value |
|---|---|---|---|---|---|
| Glomerular inflammation | | | | | |
| Model 1 | Reference | 0.79 (0.40, 1.55) | .488 | 1.01 (0.52, 1.96) | .970 |
| Model 2 | Reference | 0.77 (0.38, 1.54) | .457 | 1.38 (0.67, 2.84) | .378 |
| Mesangial expansion | | | | | |
| Model 1 | Reference | 0.96 (0.52, 1.78) | .909 | 2.63 (1.45, 4.80) | .002 |
| Model 2 | Reference | 0.93 (0.50, 1.74) | .828 | 2.44 (1.29, 4.64) | .010 |
| Segmental sclerosis | | | | | |
| Model 1 | Reference | 0.91 (0.48, 1.72) | .777 | 0.86 (0.45, 1.63) | .645 |
| Model 2 | Reference | 0.93 (0.49, 1.77) | .827 | 0.85 (0.43, 1.67) | .632 |
| Glomerular sclerosis | | | | | |
| Model 1 | Reference | 1.31 (0.72, 2.39) | .375 | 2.72 (1.50, 4.93) | .001 |
| Model 2 | Reference | 1.23 (0.64, 2.35) | .530 | 1.08 (0.55, 2.11) | .825 |
| Acute tubular injury | | | | | |
| Model 1 | Reference | 1.38 (0.38, 5.06) | .624 | 5.42 (1.62, 18.15) | .006 |
| Model 2 | Reference | 1.34 (0.35, 5.12) | .664 | 2.14 (0.58, 7.88) | .254 |
| Inflammation, non-fibrosed interstitium | | | | | |
| Model 1 | Reference | 0.58 (0.23, 1.44) | .239 | 2.27 (1.01, 5.09) | .048 |
| Model 2 | Reference | 0.59 (0.23, 1.53) | .277 | 1.38 (0.56, 3.42) | .485 |
| Inflammation, fibrosed interstitium | | | | | |
| Model 1 | Reference | 0.91 (0.46, 1.79) | .785 | 4.45 (1.91, 10.38) | .001 |
| Model 2 | Reference | 0.93 (0.46, 1.90) | .844 | 2.49 (1.02, 6.10) | .045 |
| Interstitial fibrosis/tubular atrophy | | | | | |
| Model 1 | Reference | 1.53 (0.80, 2.92) | .199 | 5.08 (2.68, 9.63) | <.001 |
| Model 2 | Reference | 1.55 (0.74, 3.22) | .242 | 1.86 (0.88, 3.92) | .105 |
| Arterial sclerosis | | | | | |
| Model 1 | Reference | 0.99 (0.56, 1.75) | .976 | 1.49 (0.83, 2.66) | .180 |
| Model 2 | Reference | 0.92 (0.50, 1.71) | .791 | 0.77 (0.40, 1.51) | .452 |
| Arteriolar sclerosis | | | | | |
| Model 1 | Reference | 1.39 (0.80, 2.42) | .249 | 2.77 (1.57, 4.88) | <.001 |
| Model 2 | Reference | 1.42 (0.77, 2.60) | .265 | 1.39 (0.73, 2.65) | .317 |
Logistic regression models were fit using dichotomized histopathologic lesions as the dependent variable and cluster membership as the independent variable. Model 1 is unadjusted, Model 2 is adjusted for age, sex, race and eGFR.
Associations of clusters with adverse clinical outcomes
One hundred seventeen individuals progressed to kidney failure and 79 participants died during a median follow-up time of 43.1 months and 59.5 months, respectively. Figure 3 and Table 3 show associations of cluster membership with subsequent kidney failure and death. In the fully adjusted model, membership in Cluster 3 was associated with a 3.29-fold higher risk of progression to kidney failure compared with Cluster 1 (Table 3). Compared with Cluster 1, membership in Cluster 3 was associated with a higher risk of death, but this association was not statistically significant after multivariable adjustment including eGFR. Membership in Cluster 2 was not associated with a higher risk of kidney failure or death compared with Cluster 1.
Figure 3:
Associations of future kidney failure and death with cluster membership. Kaplan–Meier survival curves show associations between time-to-kidney failure (A) and death (B) by cluster membership. P-values obtained from log rank test: P < .001 (A), P < .001 (B).
Table 3:
Associations of clusters with risks of kidney disease progression and death.
| Outcome | Events | Events per 100 person-years | Model 1 | P-value | Model 2 | P-value | Model 3 | P-value |
|---|---|---|---|---|---|---|---|---|
| Kidney failure | | | | | | | | |
| Cluster 1 | 6 | 1.4 | Reference | | Reference | | Reference | |
| Cluster 2 | 29 | 2.5 | 1.73 (0.72, 4.16) | .224 | 1.54 (0.63, 3.78) | .344 | 1.64 (0.66, 4.05) | .283 |
| Cluster 3 | 82 | 8.9 | 5.99 (2.61, 13.74) | <.001 | 4.80 (2.04, 11.30) | <.001 | 3.29 (1.37, 7.90) | .008 |
| Mortality | | | | | | | | |
| Cluster 1 | 6 | 1.2 | Reference | | Reference | | Reference | |
| Cluster 2 | 21 | 1.5 | 1.20 (0.48, 2.98) | .694 | 0.87 (0.34, 2.24) | .780 | 0.83 (0.32, 2.15) | .707 |
| Cluster 3 | 52 | 4.1 | 3.40 (1.46, 7.95) | .005 | 2.28 (0.94, 5.53) | .068 | 1.64 (0.65, 4.16) | .297 |
Results are expressed as hazard ratios (95% CI).
Kidney failure is defined as initiation of kidney replacement therapy.
Model 1 is unadjusted.
Model 2 is stratified by site and adjusts for age, sex, race, natural log transformed proteinuria, and primary clinicopathologic diagnosis.
Model 3 is model 2 and further adjusts for eGFR.
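The event rates in Table 3 allow a rough arithmetic check. The sketch below back-calculates the approximate person-years implied by each rate and forms a crude rate ratio for kidney failure. Person-years are not reported in the source, so the implied values are approximate because the published rates are rounded; the function names are ours.

```python
def implied_person_years(events: int, rate_per_100_py: float) -> float:
    """Approximate follow-up (person-years) implied by an event count and a rounded rate."""
    return 100.0 * events / rate_per_100_py


def crude_rate_ratio(rate_exposed: float, rate_reference: float) -> float:
    """Ratio of two event rates, without any covariate adjustment."""
    return rate_exposed / rate_reference


# Kidney failure rows of Table 3: (events, events per 100 person-years)
kidney_failure = {"Cluster 1": (6, 1.4), "Cluster 2": (29, 2.5), "Cluster 3": (82, 8.9)}

person_years_c3 = implied_person_years(*kidney_failure["Cluster 3"])
ratio_c3_vs_c1 = crude_rate_ratio(kidney_failure["Cluster 3"][1], kidney_failure["Cluster 1"][1])
```

The crude rate ratio for Cluster 3 versus Cluster 1 is of similar magnitude to the unadjusted hazard ratio in Model 1, as would be expected when the two groups have broadly comparable follow-up patterns.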
Differences in pathway activities by cluster
Differences in pathway activities according to GSVA scores between clusters are shown in Supplementary data, Fig. S2. The top ranked pathway in Cluster 1 was epithelial–mesenchymal transition. The top ranked pathway in Cluster 2 was Interleukin-6 (IL-6)/Janus Kinase (JAK)/Signal Transducer and Activator of Transcription-3 (STAT3) signaling, and the top ranked pathway in Cluster 3 was tumor-necrosis factor alpha (TNF-alpha) signaling.
DISCUSSION
In this study, we investigated the use of plasma biomarker–derived clusters for clinicopathologic phenotyping and assessment of prognosis in individuals with biopsy-confirmed kidney disease. Our unsupervised clustering approach, blinded to any additional clinical information, partitioned subjects into three separate clusters. Participants classified in Cluster 3 were slightly older, had worse kidney function, more severe mesangial expansion and tubulointerstitial inflammation, and were more likely to have tubulointerstitial and diabetic kidney disease. Compared with those classified into Cluster 1, individuals in Cluster 3 had a significantly higher risk of future kidney failure; this difference was evident even after adjustment for age, eGFR and proteinuria. The key contributing biomarkers in each cluster and differences in pathway activities between clusters may point toward important mechanisms involved in CKD pathology and disease progression. Our findings demonstrate that biomarker-derived clusters could aid with non-invasive phenotyping and may improve understanding of the underlying heterogeneity of kidney diseases.
To our knowledge, this is the first study to investigate associations of biomarker-derived clusters with disease progression and histopathologic findings across a spectrum of biopsy-confirmed kidney diseases. In other disease settings, data-driven phenotyping tools have been used to better characterize heterogenous conditions such as diabetes [5], heart failure [4, 16] and different types of malignancies [17–19]. In the setting of kidney disease, cluster analysis has been used to differentiate pathogenetic patterns in individuals with membranoproliferative glomerulonephritis (GN) [20]. Another study applied an unsupervised clustering approach on eight urine biomarkers to predict incident CKD in HIV-infected women [7]. A study from the Chronic Renal Insufficiency Cohort showed that cluster analysis on 72 variables, including baseline kidney function, demographics and comorbidities, provided a useful and simple metric to summarize patient heterogeneity and comorbidity profiles in discrete categories of risk [6]. This study reported strong associations of clusters with adverse clinical outcomes and showed that the observed associations of future cardiovascular events and death with cluster membership were greater than those with established risk factors such as diabetes or male sex [6]. By contrast, our study used biomarkers alone to define cluster membership, which allowed us to evaluate the ability of biomarkers in isolation to stratify individuals into distinct categories of risk as well as to understand the pathophysiologic pathways that may mediate such risk. We also observed a strong association between cluster membership and future kidney failure independent of baseline kidney function.
Membership in Cluster 3 was associated with higher risks of future kidney failure and with the presence of diabetic kidney disease and tubulointerstitial disease. In line with this, the top hallmark pathway in Cluster 3 was TNF-α signaling, which plays an essential role in the regulation of inflammatory processes and the pathogenesis and progression of diabetic kidney disease [21–23]. Several studies in individuals with diabetes and diabetic kidney disease have shown that higher levels of TNF-α and different TNF receptors were associated with adverse clinical outcomes [22, 24–26]. The main biomarker that contributed to the formation of Clusters 2 and 3 was PGF. We previously identified a strong association of PGF, a member of the vascular endothelial growth factor (VEGF) family, with a higher risk of future kidney failure and more severe interstitial fibrosis/tubular atrophy (IFTA) [2]. Higher PGF levels have been found to be associated with a higher risk of incident CKD [27], and PGF has also been studied in the setting of preeclampsia, where lower circulating maternal PGF levels were found to be associated with the disease [28, 29]. In addition to PGF, we identified CD40 as an important contributor to the formation of Cluster 3. Soluble CD40 ligand has been shown to mediate inflammatory responses and remodeling processes associated with tissue injury and glomerular sclerosis in patients with diabetic kidney disease [30]. Several main biomarker contributors to Cluster 3 are known to be involved in the regulation of inflammatory pathways in the kidney [22, 24–29]. Our findings suggest that targeting these pathways may be an important step toward developing new therapeutics for kidney diseases or tracking response to therapy. Future basic and translational research is needed to explore individual biomarkers identified in this study and evaluate their potential as therapeutic targets.
The main contributor to Cluster 1, which was associated with preserved kidney function and had a lower risk of disease progression compared with Cluster 3, was AXIN-1. AXIN-1, a cytoplasmic protein that functions as a negative regulator of the Wnt-signaling pathway by downregulating β-catenin, has primarily been studied in the setting of cancer [31–33]. Overexpression of Wnt1 and β-catenin has been shown to be associated with podocyte dysfunction and albuminuria in patients with diabetic kidney disease and focal segmental glomerulosclerosis, suggesting that this pathway plays an important role in the pathogenesis of CKD [34–36]. A recent study demonstrated decreased AXIN-1 expression in a rat model of kidney fibrosis and found that inhibition of AXIN-1 plays a key role in mediating Wnt/β-catenin signaling in hypoxia-induced epithelial–mesenchymal transition in human proximal tubular cells [37]. In our study, we observed the highest levels of AXIN-1 in Cluster 1, which may point toward an important role of AXIN-1 in regulating anti-fibrotic responses in patients with CKD. Our results provide support for further study of AXIN-1 and the Wnt/β-catenin signaling pathway in CKD.
A significant strength of our clustering approach is that clusters can define subgroups of individuals while mitigating the problem of multicollinearity and implicitly handling potential measurement errors associated with individual variables [6, 7]. Additional strengths of our study are the number of protein biomarkers included in the analyses, the availability of adjudicated histopathologic scores on lesion severity, and the prospective study design with long-term follow-up data. Our study also has several limitations that warrant consideration. First, our approach was exploratory. We measured biomarker levels only in baseline samples and were not able to account for therapy at the time of kidney biopsy, which could alter an individual's risk of disease progression or death. An important limitation of our pathway analysis was the narrow range and nature of the protein markers studied. We selected three commercially available Olink proteomics panels, which were chosen based on their potential relevance for kidney disease. Thus, the protein biomarkers included in the study necessarily represent processes related to cardiovascular disease, inflammation and organ injury. Furthermore, we were not able to perform mechanistic studies of signaling pathways, which need to be addressed in future studies. Lastly, our findings will need replication in a validation cohort.
In this study, we used a data-driven clustering approach of protein biomarkers to identify subgroups of individuals with biopsy-confirmed kidney disease. The three clusters summarized the multidimensional information from 225 plasma protein measurements at baseline and differed by clinicopathologic diagnoses, histopathologic lesion severity and future risks of kidney failure. Our study demonstrates that clusters of plasma protein biomarkers can be used for phenotyping of patients with a diverse spectrum of kidney diseases and can aid with uncovering proteins and pathways for further investigation as therapeutic targets in CKD.
Supplementary Material
ACKNOWLEDGEMENTS
We thank the members of the laboratory of S.S.W. for their invaluable assistance in the Boston Kidney Biopsy Cohort. I.M.S. is supported by the American Philosophical Society Daland Fellowship in Clinical Investigation. S.S.W. is also supported by NIH grants UH3DK114915, U01DK085660, U01DK104308, R01DK103784, and R21DK119751. A.S. is supported by NIH grants K23DK120811, U01AID163081, NIDDK Kidney Precision Medicine Project Opportunity Pool grant under U2CDK114886, and core resources from the George M. O'Brien Kidney Research Center at Northwestern University (NU-GoKIDNEY) P30DK114857. P.P. is supported by NIH grants R21AI100484-27, R01GM122876-03, and 1DP2DA051864-01. This work was conducted with support from Harvard Catalyst, The Harvard Clinical and Translational Science Center (National Center for Advancing Translational Sciences, National Institutes of Health Award UL1TR001102), and financial contributions from Harvard University and its affiliated academic healthcare centers. The content is solely the responsibility of the authors and does not necessarily represent the official views of Harvard Catalyst, Harvard University and its affiliated academic healthcare centers, or the NIH. Results presented in this paper have not been published previously in whole or part, except in abstract format. Part of this work was presented as a poster at the 2021 American Society of Nephrology Kidney Week.
Contributor Information
Insa M Schmidt, Boston University School of Medicine and Boston Medical Center, Department of Medicine, Section of Nephrology, Boston, MA, USA.
Steele Myrick, Boston University School of Public Health, Department of Biostatistics, Boston, MA, USA.
Jing Liu, Division of Nephrology and National Clinical Research Center for Geriatrics, Kidney Research Institute, West China Hospital of Sichuan University, Chengdu, China.
Ashish Verma, Boston University School of Medicine and Boston Medical Center, Department of Medicine, Section of Nephrology, Boston, MA, USA.
Anand Srivastava, Center for Translational Metabolism and Health, Institute for Public Health and Medicine, Division of Nephrology and Hypertension, Northwestern University Feinberg School of Medicine, Chicago, IL, USA.
Ragnar Palsson, Division of Nephrology, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA.
Ingrid F Onul, Boston University School of Medicine and Boston Medical Center, Department of Medicine, Section of Nephrology, Boston, MA, USA.
Isaac E Stillman, Beth Israel Deaconess Medical Center, Harvard Medical School, Department of Pathology, Boston, MA, USA.
Claire Avillach, Boston Medical Center, Department of Pathology, Boston, MA, USA.
Prasad Patil, Boston University School of Public Health, Department of Biostatistics, Boston, MA, USA.
Sushrut S Waikar, Boston University School of Medicine and Boston Medical Center, Department of Medicine, Section of Nephrology, Boston, MA, USA.
FUNDING
This study was supported by National Institutes of Health (NIH) grant R01DK093574 (S.S.W.).
AUTHORS’ CONTRIBUTIONS
I.M.S., S.M., P.P., R.P., A.S., I.O., C.A., A.V., and S.S.W. were responsible for the concept and design of the study. A.S., R.P. and S.S.W. adjudicated clinical outcomes. I.E.S. and H.G.R. were responsible for the adjudication of histopathology. S.M., P.P., I.M.S., and S.S.W. designed the computational framework. S.M., P.P., I.M.S., and S.S.W. were responsible for statistical analyses. All authors interpreted the data. I.M.S., S.M., and S.S.W. drafted the manuscript. All authors contributed to critical revisions of the manuscript for important intellectual content.
DATA AVAILABILITY STATEMENT
The data underlying this article cannot be shared publicly, to protect the privacy of the individuals who participated in the study. The data will be shared on reasonable request to the corresponding author.
CONFLICT OF INTEREST STATEMENT
S.S.W. reports consultancy fees from Wolters Kluwer, Bain, BioMarin, Goldfinch, GSK, Ikena, Regeneron, and Strataca; research funding from Vertex, Pfizer, and J&J; and serving as an expert witness for litigation related to dialysis lab testing (DaVita), PPIs (Pfizer), PFAO exposure (Dechert), and voclosporin (Aurinia). A.S. reports personal fees from Horizon Therapeutics, PLC, AstraZeneca, CVS Caremark, Bayer, and 500 medicolegal consulting (Tate & Latham). All other authors have nothing to disclose.
REFERENCES
- 1. Kidney Disease: Improving Global Outcomes (KDIGO) CKD Work Group. KDIGO clinical practice guideline for the evaluation and management of chronic kidney disease. Kidney Int Suppl 2013;3:1–150.
- 2. Schmidt IM, Sarvode Mothi S, Wilson PC et al. Circulating plasma biomarkers in biopsy-confirmed kidney disease. Clin J Am Soc Nephrol 2022;17:27–37. doi: 10.2215/CJN.09380721
- 3. Pal NR, Aguan K, Sharma A et al. Discovering biomarkers from gene expression data for predicting cancer subgroups using neural networks and relational fuzzy clustering. BMC Bioinf 2007;8:5. doi: 10.1186/1471-2105-8-5
- 4. Scherzer R, Shah SJ, Secemsky E et al. Association of biomarker clusters with cardiac phenotypes and mortality in patients with HIV infection. Circ Heart Fail 2018;11:e004312. doi: 10.1161/CIRCHEARTFAILURE.117.004312
- 5. Ahlqvist E, Storm P, Karajamaki A et al. Novel subgroups of adult-onset diabetes and their association with outcomes: a data-driven cluster analysis of six variables. Lancet Diabetes Endocrinol 2018;6:361–9. doi: 10.1016/S2213-8587(18)30051-2
- 6. Zheng Z, Waikar SS, Schmidt IM et al. Subtyping CKD patients by consensus clustering: the chronic renal insufficiency cohort (CRIC) study. J Am Soc Nephrol 2021;32:639–53. doi: 10.1681/ASN.2020030239
- 7. Scherzer R, Lin H, Abraham A et al. Use of urine biomarker-derived clusters to predict the risk of chronic kidney disease and all-cause mortality in HIV-infected women. Nephrol Dial Transplant 2016;31:1478–85. doi: 10.1093/ndt/gfv426
- 8. Chen DQ, Cao G, Chen H et al. Identification of serum metabolites associating with chronic kidney disease progression and anti-fibrotic effect of 5-methoxytryptophan. Nat Commun 2019;10:1476. doi: 10.1038/s41467-019-09329-0
- 9. Agarwal R, Duffin KL, Laska DA et al. A prospective study of multiple protein biomarkers to predict progression in diabetic chronic kidney disease. Nephrol Dial Transplant 2014;29:2293–302. doi: 10.1093/ndt/gfu255
- 10. Srivastava A, Palsson R, Kaze AD et al. The prognostic value of histopathologic lesions in native kidney biopsy specimens: results from the Boston Kidney Biopsy Cohort study. J Am Soc Nephrol 2018;29:2213–24. doi: 10.1681/ASN.2017121260
- 11. Assarsson E, Lundberg M, Holmquist G et al. Homogenous 96-plex PEA immunoassay exhibiting high sensitivity, specificity, and excellent scalability. PLoS One 2014;9:e95192. doi: 10.1371/journal.pone.0095192
- 12. Inker LA, Eneanya ND, Coresh J et al. New creatinine- and cystatin C-based equations to estimate GFR without race. N Engl J Med 2021;385:1737–49. doi: 10.1056/NEJMoa2102953
- 13. Charrad M, Ghazzali N, Boiteau V et al. NbClust: an R package for determining the relevant number of clusters in a data set. J Stat Softw 2014;61:1–36. doi: 10.18637/jss.v061.i06
- 14. Hänzelmann S, Castelo R, Guinney J. GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinf 2013;14:7. doi: 10.1186/1471-2105-14-7
- 15. Subramanian A, Tamayo P, Mootha VK et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA 2005;102:15545–50. doi: 10.1073/pnas.0506580102
- 16. Shah SJ, Katz DH, Selvaraj S et al. Phenomapping for novel classification of heart failure with preserved ejection fraction. Circulation 2015;131:269–79. doi: 10.1161/CIRCULATIONAHA.114.010637
- 17. Sakr L, Small D, Kasymjanova G et al. Phenotypic heterogeneity of potentially curable non-small-cell lung cancer: cohort study with cluster analysis. J Thorac Oncol 2015;10:754–61. doi: 10.1097/JTO.0000000000000505
- 18. Shukla N, Hagenbuchner M, Win KT et al. Breast cancer data analysis for survivability studies and prediction. Comput Methods Programs Biomed 2018;155:199–208. doi: 10.1016/j.cmpb.2017.12.011
- 19. Oh SC, Sohn BH, Cheong JH et al. Clinical and genomic landscape of gastric cancer with a mesenchymal phenotype. Nat Commun 2018;9:1777. doi: 10.1038/s41467-018-04179-8
- 20. Iatropoulos P, Daina E, Curreri M et al. Cluster analysis identifies distinct pathogenetic patterns in C3 glomerulopathies/immune complex-mediated membranoproliferative GN. J Am Soc Nephrol 2018;29:283–94. doi: 10.1681/ASN.2017030258
- 21. Chung CH, Fan J, Lee EY et al. Effects of tumor necrosis factor-alpha on podocyte expression of monocyte chemoattractant protein-1 and in diabetic nephropathy. Nephron Extra 2015;5:1–18. doi: 10.1159/000369576
- 22. Niewczas MA, Pavkov ME, Skupien J et al. A signature of circulating inflammatory proteins and development of end-stage renal disease in diabetes. Nat Med 2019;25:805–13. doi: 10.1038/s41591-019-0415-5
- 23. Al-Lamki RS, Mayadas TN. TNF receptors: signaling pathways and contribution to renal dysfunction. Kidney Int 2015;87:281–96. doi: 10.1038/ki.2014.285
- 24. Niewczas MA, Gohda T, Skupien J et al. Circulating TNF receptors 1 and 2 predict ESRD in type 2 diabetes. J Am Soc Nephrol 2012;23:507–15. doi: 10.1681/ASN.2011060627
- 25. Schrauben SJ, Shou H, Zhang X et al. Association of multiple plasma biomarker concentrations with progression of prevalent diabetic kidney disease: findings from the chronic renal insufficiency cohort (CRIC) study. J Am Soc Nephrol 2021;32:115–26. doi: 10.1681/ASN.2020040487
- 26. Coca SG, Nadkarni GN, Huang Y et al. Plasma biomarkers and kidney function decline in early and established diabetic kidney disease. J Am Soc Nephrol 2017;28:2786–93. doi: 10.1681/ASN.2016101101
- 27. Carlsson AC, Ingelsson E, Sundstrom J et al. Use of proteomics to investigate kidney function decline over 5 years. Clin J Am Soc Nephrol 2017;12:1226–35. doi: 10.2215/CJN.08780816
- 28. Almaani SJ. Placental growth factor in pre-eclampsia: friend or foe? Kidney Int 2019;95:730–2. doi: 10.1016/j.kint.2019.02.002
- 29. Bramham K, Seed PT, Lightstone L et al. Diagnostic and predictive biomarkers for pre-eclampsia in patients with established hypertension and chronic kidney disease. Kidney Int 2016;89:874–85. doi: 10.1016/j.kint.2015.10.012
- 30. Kuo HL, Huang CC, Lin TY et al. IL-17 and CD40 ligand synergistically stimulate the chronicity of diabetic nephropathy. Nephrol Dial Transplant 2018;33:248–56. doi: 10.1093/ndt/gfw397
- 31. Biechele TL, Kulikauskas RM, Toroni RA et al. Wnt/β-catenin signaling and AXIN1 regulate apoptosis triggered by inhibition of the mutant kinase BRAFV600E in human melanoma. Sci Signal 2012;5:ra3. doi: 10.1126/scisignal.2002274
- 32. Zhan T, Rindtorff N, Boutros M. Wnt signaling in cancer. Oncogene 2017;36:1461–73. doi: 10.1038/onc.2016.304
- 33. Klaus A, Birchmeier W. Wnt signalling and its impact on development and cancer. Nat Rev Cancer 2008;8:387–98. doi: 10.1038/nrc2389
- 34. Malik SA, Modarage K, Goggolidou P. The role of Wnt signalling in chronic kidney disease (CKD). Genes (Basel) 2020;11:496. doi: 10.3390/genes11050496
- 35. Zhou L, Liu Y. Wnt/β-catenin signalling and podocyte dysfunction in proteinuric kidney disease. Nat Rev Nephrol 2015;11:535–45. doi: 10.1038/nrneph.2015.88
- 36. Zuo Y, Liu Y. New insights into the role and mechanism of Wnt/β-catenin signalling in kidney fibrosis. Nephrology (Carlton) 2018;23:38–43. doi: 10.1111/nep.13472
- 37. Liao L, Duan L, Guo Y et al. TRIM46 upregulates Wnt/β-catenin signaling by inhibiting axin1 to mediate hypoxia-induced epithelial-mesenchymal transition in HK2 cells. Mol Cell Biochem 2022; doi: 10.1007/s11010-022-04467-4. Online ahead of print.