Key Points
A software tool was developed to perform glomerular and patient-level classification on the basis of clinically relevant biomarkers.
Using ten biomarkers, glomerular and patient-level classification achieved accuracies of 77% and 87%, respectively.
In the future, these tools can be applied to clinical datasets for glomerular biomarker discovery and for insights into disease mechanisms.
Keywords: glomerular and tubulointerstitial diseases, basic science, computational pathology, explainable biomarkers, machine learning, membranous nephropathy, minimal change disease, thin-basement membrane nephropathy
Abstract
Pathologists use multiple microscopy modalities to assess renal biopsy specimens. Beyond the usual diagnostic features, some changes are too subtle to be defined reliably. Computational approaches have the potential to systematically quantitate subvisual clues, provide pathogenetic insight, and link to clinical outcomes. To this end, we present a proof-of-principle study demonstrating that explainable biomarkers, combined with machine learning, can distinguish between glomerular disorders at the light-microscopy level.
The proposed system used image analysis techniques to extract 233 explainable biomarkers related to color, morphology, and microstructural texture. Traditional machine learning was then used to classify minimal change disease (MCD), membranous nephropathy (MN), and thin basement membrane nephropathy (TBMN) at the glomerular and patient levels.
The final model combined the Gini feature importance set and linear discriminant analysis classifier. Six morphologic (nuclei-to-glomerular tuft area, nuclei-to-glomerular area, glomerular tuft thickness greater than ten, glomerular tuft thickness greater than three, total glomerular tuft thickness, and glomerular circularity) and four microstructural texture features (luminal contrast using wavelets, nuclei energy using wavelets, nuclei variance using color vector LBP, and glomerular correlation using GLCM) were, together, the best performing biomarkers. Accuracies of 77% and 87% were obtained for glomerular and patient-level classification, respectively.
Computational methods, using explainable glomerular biomarkers, have diagnostic value and are compatible with our existing knowledge of disease pathogenesis. Furthermore, this algorithm can be applied to clinical datasets for novel prognostic and mechanistic biomarker discovery.
Introduction
Glomerulonephritides (or glomerulopathies) are a group of rare kidney diseases characterized by injury to the glomerular filtration barrier, which, with limited treatment options, often progress to ESKD (1). Renal pathologists examine kidney biopsy tissue by routine light microscopy (LM), immunofluorescence (IF), and electron microscopy (EM) to diagnose, often descriptively, the many glomerular diseases. The disease names are well recognized by treating clinicians, although they usually offer little etiologic or pathogenic insight. Manual assessment cannot glean or quantitate potentially useful subvisual features, leaving clinically informative pathologic findings untapped. Digitization of whole-slide images (WSIs) and novel software tools, including traditional image analysis and machine learning, can uncover objective biomarkers in addition to augmenting pathology workflows. Biomarkers that are designed to be intuitive and explainable, relating to the underlying pathology (tissue micro- and macrostructure), give users confidence in the system and further enhance mechanistic understanding of disease.
Several reports have been published describing machine-learning algorithms in mouse and human renal tissue (2–4). Taken together, these approaches demonstrate that automated tools are feasible for renal pathology classification. For this proof-of-principle study, three glomerular disorders are used: minimal change disease (MCD), membranous nephropathy (MN), and thin basement membrane nephropathy (TBMN). Although complex in overall pathology, these disorders are separated by somewhat simple histologic principles. MCD appears normal by LM, but shows podocyte foot process effacement, which can only be appreciated with EM. MN may show thick glomerular walls on LM, but immune complexes can only be recognized by IF and EM. TBMN shows diffuse thinning of the glomerular wall and can only be reliably identified on EM. MCD and MN are common causes of nephrotic syndrome, whereas TBMN is the major cause of isolated hematuria. Pathologists often suspect, but cannot diagnose, these diseases from LM alone. We reasoned that such suspicious features can be better defined and thus quantitated by computational approaches.
We designed an image analysis and machine-learning algorithm to identify biomarkers that distinguish between these disorders on periodic acid–Schiff (PAS)–stained WSIs, an approach that, to the best of our knowledge, is novel in the field. An explainable panel of 233 biomarker features was designed with future opportunities to correlate with clinical outcomes and expand to other glomerular diseases. The proposed method has four steps: (1) image preprocessing, (2) automatic glomerular structure segmentation, (3) biomarker feature extraction, and (4) glomerular and patient-level classification. Three machine-learning classifiers were evaluated against the pathologist-derived diagnosis, which combined three modalities (LM, IF, and EM) and served as the gold standard.
Materials and Methods
Data Preparation
The dataset is from Toronto General Hospital (TGH) in Toronto, Ontario, Canada, and has institutional research board approval. It consists of WSIs of renal biopsy specimens from n=45 patients, with n=15 WSIs per disease (MCD, TBMN, and MN); diagnoses were derived from pathologist (R.J.) assessment using LM, IF, and EM. Biopsy samples were chosen from cases with classic disease features and without changes associated with other glomerular diseases. One slide from each case, stained by the standard PAS method, was used for analysis. Slides were scanned at 40× magnification with a pixel size of 0.2526×0.2526 µm. A total of 375 regions of interest (ROIs; 1500×1500 pixels) containing glomeruli were manually cropped, and the glomerular boundaries were manually segmented, using Pathcore Sedeen Image Viewer (5). Automated glomerular detection and segmentation were not considered in this work because the main focus was analyzing differences between diseases on the basis of explainable biomarkers.
This dataset also has manual glomerular structure annotations for 150 ROIs (50 from each disease), which were used in two validation methods to evaluate the performance of the glomerular structure segmentation algorithm. Every nucleus in the selected glomerular images was manually annotated. When annotating the luminal space (Bowman space and capillary lumen) and glomerular tuft (including glomerular basement membrane [GBM] and mesangial matrix) structures, an approach similar to that of Ginley et al. (4) was used. k-Means ground truths were generated, with the initial cluster centers set to the average luminal and glomerular tuft annotation intensities. Annotations were performed by a trained biomedical student (M.N.B.) and validated by a pathologist (R.J.) for quality control.
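The seeded k-means step used to build ground truths can be sketched as plain Lloyd iterations whose starting centers come from annotation statistics rather than random initialization. This is a minimal sketch under assumptions: pixels are represented by intensity feature vectors, and `seeded_kmeans` and its arguments are hypothetical names; the paper does not specify the color representation, stopping rule, or iteration limit.

```python
import numpy as np

def seeded_kmeans(values, init_centers, n_iter=100, tol=1e-6):
    """Lloyd's k-means with user-supplied starting centers (hypothetical helper).

    values: (N, F) per-pixel features (e.g., intensities).
    init_centers: (K, F) starting centers, e.g., the mean intensity of each
        annotated structure (assumed input, for illustration only).
    Returns (labels, centers).
    """
    X = np.asarray(values, dtype=float)
    centers = np.asarray(init_centers, dtype=float).copy()
    for _ in range(n_iter):
        # Assign every pixel to its nearest center (squared Euclidean distance).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        # Recompute centers; keep the previous center if a cluster empties.
        new_centers = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(len(centers))
        ])
        converged = np.allclose(new_centers, centers, atol=tol)
        centers = new_centers
        if converged:
            break
    return labels, centers
```

Seeding with annotation means ties each cluster index to a known structure (for example, cluster 0 to lumen), which is what makes the result usable as a labeled ground truth.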
Preprocessing
Preprocessing was used to prepare the data for biomarker measurement. Glomeruli can vary in apparent size depending on the level at which each glomerulus is sectioned (6). On a per-slide basis, glomeruli with a size below Q1 − 1.5×IQR (the first quartile minus 1.5 times the interquartile range) were considered outliers and removed from the analysis. Because specimen preparation can cause large variations in the quality of WSIs, color standardization was performed to facilitate texture and color feature comparisons. Each image was color normalized using a modified version of the Reinhard method, which decreases color variability in WSIs (7–9).
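Both preprocessing steps can be sketched in a few lines of NumPy. The outlier rule keeps glomeruli at or above the lower Tukey fence of the slide's size distribution. The color step shows the *standard* Reinhard transfer (matching per-channel mean and standard deviation in the lαβ space); the authors' modifications are not described, so this is an assumption-labeled stand-in, and all function names are hypothetical.

```python
import numpy as np

def remove_small_outliers(areas):
    """Keep-mask for one slide: False for glomeruli smaller than Q1 - 1.5 * IQR."""
    areas = np.asarray(areas, dtype=float)
    q1, q3 = np.percentile(areas, [25, 75])
    return areas >= q1 - 1.5 * (q3 - q1)

# Standard Reinhard color transfer in the l-alpha-beta space.
# The paper uses a *modified* Reinhard method whose changes are not shown here.
_RGB2LMS = np.array([[0.3811, 0.5783, 0.0402],
                     [0.1967, 0.7244, 0.0782],
                     [0.0241, 0.1288, 0.8444]])
_LMS2RGB = np.linalg.inv(_RGB2LMS)
_LOG2LAB = np.diag([1 / np.sqrt(3), 1 / np.sqrt(6), 1 / np.sqrt(2)]) @ np.array(
    [[1.0, 1.0, 1.0], [1.0, 1.0, -2.0], [1.0, -1.0, 0.0]])
_LAB2LOG = np.linalg.inv(_LOG2LAB)

def _rgb_to_lab(rgb, eps=1e-6):
    lms = rgb.reshape(-1, 3) @ _RGB2LMS.T
    return np.log10(np.maximum(lms, eps)) @ _LOG2LAB.T   # (N, 3): l, alpha, beta

def _lab_to_rgb(lab, shape):
    lms = 10.0 ** (lab @ _LAB2LOG.T)
    return (lms @ _LMS2RGB.T).reshape(shape)

def reinhard_normalize(src, target):
    """Match per-channel mean/std of `src` to `target` (float RGB in [0, 1])."""
    src_lab, tgt_lab = _rgb_to_lab(src), _rgb_to_lab(target)
    mu_s, sd_s = src_lab.mean(axis=0), src_lab.std(axis=0) + 1e-8
    mu_t, sd_t = tgt_lab.mean(axis=0), tgt_lab.std(axis=0)
    out_lab = (src_lab - mu_s) / sd_s * sd_t + mu_t
    return np.clip(_lab_to_rgb(out_lab, src.shape), 0.0, 1.0)
```

The choice of target image (a single reference ROI or pooled statistics across the dataset) is left open here, since the source does not state it.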
Glomerular Structure Segmentation
Subglomerular structures were automatically segmented for feature extraction and analysis. Color normalization was not applied before segmentation because it decreased segmentation performance. Three structures were segmented: (1) luminal (space inside the Bowman capsule and the capillary lumen), (2) glomerular tuft (the GBM and mesangial matrix), and (3) nuclei. For each structure, the ROIs were transformed into the color representation best suited to that structure, followed by Otsu binary thresholding and a 3×3 median filter to remove noise from the estimated structure masks.
Comparing the estimated luminal, glomerular tuft, and nuclei masks from the previous step yielded five pixel categories: (1) unlabeled, (2) luminal, (3) glomerular tuft, (4) nuclei, and (5) double-labeled. A naive Bayesian classifier was implemented to assign a class to the unlabeled and double-labeled pixels (4,10). The classifier was trained on pixels whose class was known from the estimated structural masks and was then used to predict the class of the unlabeled and double-labeled pixels (4,10). The final glomerular structure segmentation masks were used for biomarker feature extraction.
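The threshold-then-denoise step for a single structure can be sketched as follows. The per-structure color transform is not specified, so the sketch assumes a single already-computed channel and a flag stating whether the structure is bright or dark in that channel; `segment_structure` and its helpers are hypothetical names. In practice, `skimage.filters.threshold_otsu` and `scipy.ndimage.median_filter` provide the same operations.

```python
import numpy as np

def otsu_threshold(channel, nbins=256):
    """Otsu's method: pick the cut that maximizes between-class variance."""
    hist, edges = np.histogram(np.ravel(channel), bins=nbins)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist).astype(float)            # pixel count in bins 0..i
    w1 = w0[-1] - w0                              # pixel count above bin i
    s0 = np.cumsum(hist * centers)
    mu0 = s0 / np.maximum(w0, 1)
    mu1 = (s0[-1] - s0) / np.maximum(w1, 1)
    between = w0 * w1 * (mu0 - mu1) ** 2          # proportional to between-class variance
    return edges[1:][np.argmax(between)]          # upper edge of the best lower class

def median3x3(mask):
    """3x3 median filter for a binary mask, replicating border pixels."""
    padded = np.pad(mask.astype(np.uint8), 1, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (3, 3))
    return np.median(windows, axis=(-2, -1)) > 0.5

def segment_structure(channel, bright_foreground=True):
    """Otsu binarization followed by 3x3 median denoising (assumed single channel)."""
    t = otsu_threshold(channel)
    mask = channel >= t if bright_foreground else channel < t
    return median3x3(mask)
```

The median filter removes isolated misclassified pixels (salt-and-pepper noise) while leaving straight structure boundaries intact.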
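Resolving ambiguous pixels can be sketched as training a classifier on the uniquely labeled pixels and predicting the rest. The source does not state the naive Bayes variant or the pixel features, so this minimal sketch assumes a Gaussian naive Bayes model on per-pixel color features; `resolve_ambiguous_pixels` is a hypothetical name. `sklearn.naive_bayes.GaussianNB` is an off-the-shelf alternative.

```python
import numpy as np

def resolve_ambiguous_pixels(features, masks, var_floor=1e-6):
    """Assign unlabeled / double-labeled pixels with Gaussian naive Bayes.

    features: (H, W, F) per-pixel features (assumed, e.g., RGB values).
    masks: list of K boolean (H, W) masks (e.g., luminal, tuft, nuclei).
    Returns an (H, W) integer label map with values 0..K-1.
    Assumes every class has at least one uniquely labeled pixel.
    """
    stack = np.stack(masks, axis=-1)
    H, W, K = stack.shape
    X = features.reshape(H * W, -1).astype(float)
    S = stack.reshape(H * W, K)
    known = S.sum(axis=1) == 1                 # exactly one mask claims the pixel
    y = S[known].argmax(axis=1)
    Xk = X[known]
    log_post = np.empty((H * W, K))
    for k in range(K):
        Xc = Xk[y == k]
        mu, var = Xc.mean(axis=0), Xc.var(axis=0) + var_floor
        log_prior = np.log(len(Xc) / len(Xk))
        # "Naive": features are treated as independent Gaussians within a class.
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * var) + (X - mu) ** 2 / var, axis=1)
        log_post[:, k] = log_prior + log_lik
    # Keep known labels; use the maximum posterior for ambiguous pixels.
    labels = np.where(known, S.argmax(axis=1), log_post.argmax(axis=1))
    return labels.reshape(H, W)
```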
Biomarker Feature Extraction
Image analysis tools were used to extract color, morphologic, and microstructural texture features from the glomerular images, forming a total of 233 biomarkers. The red-green-blue (RGB) images were converted to the hue-saturation-value (HSV) color space, which more closely resembles human color perception (11). Color was characterized using the histogram mean, variance, skewness, kurtosis, energy, and entropy (12). These features describe the relative amount of each structure in the glomerulus by quantifying the hue (color), saturation (purity), and value (intensity) of the image (11).
Morphologic features were extracted from the subglomerular structures to quantify shape and object-based characteristics. These features were organized into four groups: containment, shape, interstructural distance, and intrastructural distance features (4). Containment features measure the fraction of one structure’s area contained within another (e.g., nuclei area divided by glomerular area) (4). Shape features, such as equivalent diameter, were computed for each structure, and circularity was used to quantify the roundness of the glomerulus. Interstructural distance features assess the distance between glomerular structures and describe how structures relate spatially to each other (4); they were measured by finding the centroid of each glomerular component and computing the pairwise distances between structures. Lastly, intrastructural distance features measure the thickness of a structure (e.g., glomerular tuft maximum thickness) (4). To measure thickness, the Euclidean distance transform was computed on each glomerular structure, yielding feature images that quantify spatial thickness, from which maximum, median, and total thickness features were extracted.
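The six histogram statistics can be computed from the normalized histogram of one HSV channel. This is a sketch assuming the channel is already scaled to [0, 1] (for example, via `matplotlib.colors.rgb_to_hsv` or `skimage.color.rgb2hsv`); the bin count and the log base for entropy are assumptions, and `histogram_features` is a hypothetical name.

```python
import numpy as np

def histogram_features(channel, nbins=256):
    """First-order histogram statistics of a channel with values in [0, 1]."""
    hist, edges = np.histogram(np.ravel(channel), bins=nbins, range=(0.0, 1.0))
    p = hist / hist.sum()                      # normalized histogram (probabilities)
    levels = (edges[:-1] + edges[1:]) / 2      # bin centers
    mean = np.sum(levels * p)
    var = np.sum((levels - mean) ** 2 * p)
    sd = np.sqrt(var)
    skew = np.sum((levels - mean) ** 3 * p) / sd ** 3 if sd > 0 else 0.0
    kurt = np.sum((levels - mean) ** 4 * p) / var ** 2 if var > 0 else 0.0
    nz = p[p > 0]                              # avoid log(0) in the entropy
    return {
        "mean": mean, "variance": var, "skewness": skew, "kurtosis": kurt,
        "energy": np.sum(p ** 2),              # high for concentrated histograms
        "entropy": -np.sum(nz * np.log2(nz)),  # high for spread-out histograms
    }
```

A uniform-colored region gives energy 1 and entropy 0, whereas a region split evenly between two colors gives energy 0.5 and entropy 1 bit.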
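A few of the morphologic feature families can be sketched on binary masks. Under stated simplifications, the interstructural distance uses one centroid per whole mask rather than per connected component, and the distance transform is brute force for clarity (`scipy.ndimage.distance_transform_edt` is the practical choice). All function names are hypothetical, and thresholded counts such as "thickness greater than ten" are not reproduced because their exact definition is not given here.

```python
import numpy as np

def containment(inner, outer):
    """Fraction of `outer`'s area covered by `inner` (e.g., nuclei / glomerulus)."""
    return np.logical_and(inner, outer).sum() / outer.sum()

def equivalent_diameter(mask):
    """Diameter (pixels) of a circle with the same area as the mask."""
    return np.sqrt(4.0 * mask.sum() / np.pi)

def centroid_distance(mask_a, mask_b):
    """Euclidean distance (pixels) between the centroids of two structure masks."""
    ca = np.argwhere(mask_a).mean(axis=0)
    cb = np.argwhere(mask_b).mean(axis=0)
    return float(np.linalg.norm(ca - cb))

def thickness_features(mask):
    """Max, median, and total of the Euclidean distance transform inside `mask`.

    Brute force, O(foreground x background): suitable for small examples only.
    Assumes the mask contains at least one background pixel.
    """
    fg, bg = np.argwhere(mask), np.argwhere(~mask)
    d = np.sqrt(((fg[:, None, :] - bg[None, :, :]) ** 2).sum(axis=-1)).min(axis=1)
    return {"max": float(d.max()), "median": float(np.median(d)), "total": float(d.sum())}
```

Each foreground pixel's distance to the nearest background pixel is roughly half the local width of the structure, so a thicker GBM raises all three thickness summaries.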
Microstructural texture features were designed to measure spatial relationships between color or gray-level pixels and describe glomerular microstructure tissue texture. Local and global texture-based biomarkers were evaluated using gray-level co-occurrence matrices (GLCM), color vector local binary patterns (LBP), and wavelet features (13–16).
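As an illustration of one texture family, a symmetric, normalized gray-level co-occurrence matrix and a few Haralick-style properties can be computed directly. This is a sketch assuming a single non-negative pixel offset and gray values quantized to a small number of levels; the offsets, angles, and quantization used in the study are not stated, and the names are hypothetical. `skimage.feature.graycomatrix` and `graycoprops` provide production implementations.

```python
import numpy as np

def quantize(gray, levels):
    """Map gray values in [0, 1] to integer levels 0..levels-1."""
    return np.clip(np.floor(np.asarray(gray) * levels), 0, levels - 1).astype(int)

def glcm(gray_levels, levels, dy=0, dx=1):
    """Symmetric, normalized co-occurrence matrix for a non-negative offset (dy, dx)."""
    g = np.asarray(gray_levels)
    H, W = g.shape
    a = g[:H - dy, :W - dx].ravel()            # reference pixels
    b = g[dy:, dx:].ravel()                    # neighbors at the given offset
    P = np.zeros((levels, levels))
    np.add.at(P, (a, b), 1)
    P = P + P.T                                # count each pair in both directions
    return P / P.sum()

def glcm_features(P):
    """Contrast, correlation, angular second moment, and homogeneity of a GLCM.

    Correlation is undefined for a constant image (zero standard deviation).
    """
    i, j = np.indices(P.shape)
    mu_i, mu_j = np.sum(i * P), np.sum(j * P)
    sd_i = np.sqrt(np.sum((i - mu_i) ** 2 * P))
    sd_j = np.sqrt(np.sum((j - mu_j) ** 2 * P))
    return {
        "contrast": np.sum((i - j) ** 2 * P),
        "correlation": np.sum((i - mu_i) * (j - mu_j) * P) / (sd_i * sd_j),
        "asm": np.sum(P ** 2),
        "homogeneity": np.sum(P / (1.0 + np.abs(i - j))),
    }
```

A checkerboard, where every horizontal neighbor differs, gives correlation −1 and contrast 1; horizontal stripes, where every horizontal neighbor matches, give correlation 1 and contrast 0.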
Classification
A traditional machine-learning approach for glomerular classification was designed to classify glomeruli as either MCD, MN, or TBMN. The dataset was split into training, validation, and testing WSIs on the basis of patients, and glomeruli were individually labeled using the WSI disease label. Five-fold crossvalidation was used to examine which features were most important, followed by classifier hyperparameter fine tuning. Using the optimal configuration for the glomerular classification tool, the held-out testing patients were classified on a patient-level basis.
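Splitting by patient means grouping glomeruli by their patient identifier before assigning them to a split, so that no patient contributes glomeruli to both sides. The sketch below applies the same idea to the crossvalidation folds; whether the study's five folds were also grouped by patient is an assumption, and `patient_grouped_folds` is a hypothetical name (`sklearn.model_selection.GroupKFold` offers a library equivalent).

```python
import numpy as np

def patient_grouped_folds(patient_ids, n_folds=5, seed=0):
    """Yield (train_idx, val_idx) glomerulus indices with no patient on both sides."""
    patient_ids = np.asarray(patient_ids)
    patients = np.unique(patient_ids)
    rng = np.random.default_rng(seed)
    rng.shuffle(patients)                       # randomize patient-to-fold assignment
    for fold_patients in np.array_split(patients, n_folds):
        val = np.isin(patient_ids, fold_patients)
        yield np.flatnonzero(~val), np.flatnonzero(val)
```

Grouping prevents optimistic estimates caused by near-duplicate glomeruli from the same biopsy appearing in both training and validation.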
Feature Importance
For each glomerular image in the training and validation set, feature selection was used to examine which features were most discriminatory. The following four feature selection techniques were examined: using all features, statistical ANOVA F-value feature importance (17), Gini feature importance (GFI) (18), and maximum relevance minimum redundancy (mRMR) (19).
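Of the four options, the ANOVA F-value ranking is the simplest to show: each feature is scored by the ratio of between-class to within-class variance, and the top-scoring features are kept. This sketch computes the one-way ANOVA F statistic per feature, the same statistic reported by `sklearn.feature_selection.f_classif`; the function names are hypothetical, and the GFI and mRMR procedures are not reproduced.

```python
import numpy as np

def anova_f(X, y):
    """One-way ANOVA F statistic for each column of X across the classes in y."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    classes = np.unique(y)
    n, k = len(X), len(classes)
    grand_mean = X.mean(axis=0)
    ss_between = sum((y == c).sum() * (X[y == c].mean(axis=0) - grand_mean) ** 2
                     for c in classes)
    ss_within = sum(((X[y == c] - X[y == c].mean(axis=0)) ** 2).sum(axis=0)
                    for c in classes)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def top_k_features(X, y, k=10):
    """Indices of the k features with the largest F values, most discriminative first."""
    return np.argsort(anova_f(X, y))[::-1][:k]
```

For example, two groups with values 1, 2, 3 and 4, 5, 6 give a between-group sum of squares of 13.5 and a within-group sum of squares of 4, so F = (13.5/1)/(4/4) = 13.5.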
Glomerular Classification
Traditional machine-learning classifiers were chosen because they are more interpretable and require less training data than deep-learning approaches. After selecting the most relevant features, three classifiers were analyzed: linear discriminant analysis (LDA) (20), random forest (12), and logistic regression (21). Each classifier outputs a probability for each disease class, which was converted into a hard decision by selecting the disease with the maximum probability. Glomerular classification performance metrics were then analyzed for all classifiers.
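The LDA classifier that performed best can be sketched from its textbook definition: class means, a pooled covariance matrix, and class priors yield linear discriminant scores, and a softmax over those scores gives per-class posterior probabilities that are then turned into hard decisions by taking the maximum. This is a minimal NumPy sketch, not the study's implementation, which is not specified (for example, `sklearn.discriminant_analysis.LinearDiscriminantAnalysis` would be a common choice); `SimpleLDA` is a hypothetical name.

```python
import numpy as np

class SimpleLDA:
    """Linear discriminant analysis with a pooled (shared) covariance matrix."""

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)                      # sorted class labels
        means = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        priors = np.array([np.mean(y == c) for c in self.classes_])
        centered = X - means[np.searchsorted(self.classes_, y)]
        cov = centered.T @ centered / (len(X) - len(self.classes_))
        precision = np.linalg.pinv(cov)
        # Discriminant: x^T S^-1 mu_k - 0.5 * mu_k^T S^-1 mu_k + log(prior_k)
        self.coef_ = means @ precision
        self.intercept_ = -0.5 * np.sum(self.coef_ * means, axis=1) + np.log(priors)
        return self

    def predict_proba(self, X):
        scores = np.asarray(X, dtype=float) @ self.coef_.T + self.intercept_
        scores -= scores.max(axis=1, keepdims=True)         # numerical stability
        p = np.exp(scores)
        return p / p.sum(axis=1, keepdims=True)

    def predict(self, X):
        # Hard decision: the disease with the maximum posterior probability.
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]
```

Because the covariance is shared, the quadratic term in the Gaussian log-likelihood cancels across classes, which is why the decision boundaries are linear and the fitted weights remain directly inspectable.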
Patient-Level Classification Model
To predict a patient’s disease automatically, each glomerulus in the patient’s WSI was first classified using the optimal feature set and model found previously. Three methods were then investigated for patient-level classification. In the first method, the class probabilities of all glomeruli on a WSI were averaged to form a confidence for each disease, and the disease with the maximum confidence was taken as the predicted patient diagnosis. The second method was similar but averaged only the four glomeruli with the highest probabilities. In the third method, the single glomerulus with the highest probability determined the patient-level diagnosis. These methods were compared on quantitative performance measures and on how closely each mirrors the way a pathologist visually inspects a patient’s WSI.
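The three aggregation rules can share one function that differs only in how many glomeruli are averaged. The sketch interprets "highest probability" as each glomerulus's maximum class probability, which is an assumption; `patient_prediction` and `top_k` are hypothetical names.

```python
import numpy as np

def patient_prediction(glom_probs, top_k=None):
    """Aggregate per-glomerulus class probabilities into a patient-level call.

    glom_probs: (n_glomeruli, n_classes) probabilities for one WSI.
    top_k=None averages all glomeruli; top_k=4 and top_k=1 correspond to the
    other two methods (assumed interpretation of "highest probability").
    Returns (predicted class index, per-class confidence).
    """
    P = np.asarray(glom_probs, dtype=float)
    if top_k is not None:
        order = np.argsort(P.max(axis=1))[::-1]   # most confident glomeruli first
        P = P[order[:top_k]]
    confidence = P.mean(axis=0)
    return int(np.argmax(confidence)), confidence
```

With `top_k=1`, a single very confident but atypical glomerulus can decide the whole biopsy, which is consistent with the higher misclassification risk attributed to the top-glomerulus method.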
Performance Evaluation
The dice similarity coefficient (DSC) was used to measure the overlap between a segmented object and the ground truth, whereas the extra fraction (EF) was used to measure the false positive rate (22). Precision measures the proportion of predicted pixels that are correctly segmented, and recall measures the proportion of ground truth pixels that were correctly identified by the predicted segmentation. Glomerular and patient-level classification were quantified with accuracy, precision, recall, and F1-score. Accuracy is the fraction of correct predictions over the total number of predictions. The F1-score is the harmonic mean of precision and recall and thus summarizes both in a single score; high F1-scores indicate that the classifier predicts with both high precision and high recall.
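The pixel-level metrics follow from counts of true positive (TP), false positive (FP), and false negative (FN) pixels. This sketch assumes the common definition of extra fraction as FP divided by the ground-truth size (TP + FN); the definition in reference 22 should be checked, and the function names are hypothetical.

```python
import numpy as np

def segmentation_metrics(pred, truth):
    """DSC, EF, precision, and recall for two binary masks of equal shape."""
    pred, truth = np.asarray(pred, dtype=bool), np.asarray(truth, dtype=bool)
    tp = np.sum(pred & truth)     # segmented and in the ground truth
    fp = np.sum(pred & ~truth)    # segmented but not in the ground truth
    fn = np.sum(~pred & truth)    # ground truth missed by the segmentation
    return {
        "dsc": 2 * tp / (2 * tp + fp + fn),
        "ef": fp / (tp + fn),     # extra pixels relative to ground-truth size (assumed)
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)
```

For a four-pixel ground truth and a four-pixel prediction that share three pixels, TP = 3, FP = 1, and FN = 1, so DSC = 6/8 = 0.75, EF = 1/4 = 0.25, and precision = recall = 0.75.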
Results
Table 1 describes the dataset for glomerular and patient-level classification. The dataset was split by patient into 67% (n=250 glomeruli, n=30 WSIs) for training/validation and 33% (n=121 glomeruli, n=15 WSIs) for testing, so that all glomeruli from a given patient were in either the training/validation set or the testing set, never both. Supplemental Figure 1 illustrates sample WSI needle biopsy specimens, and Supplemental Table 1 details additional clinical information, such as age, sex, and disease-specific information (Supplemental Appendix 1). The experimental design of the proposed system is shown in Figure 1.
Table 1.
Data configuration for glomeruli and patient-level classification
Glomerular Disease | TGH Dataset: Patients | TGH Dataset: Glomeruli | Training/Validation Set (67%): Patients | Training/Validation Set (67%): Glomeruli | Testing Set (33%): Patients | Testing Set (33%): Glomeruli |
---|---|---|---|---|---|---|
MCD | 15 | 103 | 10 | 66 | 5 | 37 |
MN | 15 | 148 | 9 | 101 | 6 | 45 |
TBMN | 15 | 124 | 11 | 83 | 4 | 39 |
Total | 45 | 375 | 30 | 250 | 15 | 121 |
The TGH dataset was composed of MCD, MN, and TBMN WSIs and was split into training/validation and testing sets on a per-patient basis. Glomerular and patient-level classifiers were trained/validated and tested using this configuration. Glomerulus counts in the TGH Dataset columns are before outlier removal (375 glomeruli); the training/validation and testing columns reflect the 371 glomeruli retained after preprocessing. TGH, Toronto General Hospital; MCD, minimal change disease; MN, membranous nephropathy; TBMN, thin basement membrane nephropathy; WSI, whole-slide image.
Figure 1.
Overview of experimental design. MCD, minimal change disease; MN, membranous nephropathy; ROI, region of interest; TBMN, thin basement membrane nephropathy; TGH, Toronto General Hospital; WSI, whole-slide image.
Preprocessing
In total, four glomeruli were identified as outliers (size below Q1 − 1.5×IQR), reducing the TGH dataset from 375 to 371 glomerular images. See Supplemental Figure 2 (Supplemental Appendix 2) for the distributions of glomerular size and Supplemental Figure 3 for images of the removed glomeruli. All further analysis was conducted on the reduced set.
Glomerular Structure Segmentation Performance
Subglomerular structures were automatically segmented into luminal, glomerular tuft, and nuclei structures. Sparse rectangular regions were annotated for the luminal and glomerular tuft structures, as seen in Supplemental Figure 4 (Supplemental Appendix 4). First, manual annotations and automated segmentations were compared using DSC, EF, precision, and recall; the results are reported in Supplemental Table 2.
To further verify segmentation performance, a semisupervised k-means approach was used to generate ground truths, whose agreement with the gold standard annotations is reported in Supplemental Table 3. Visual results of the automated segmentation and the respective k-means ground truths can be found in Supplemental Figure 5. Figure 2 shows the validation metric distributions (DSC, EF, precision, and recall) for each structure, and the mean metrics are summarized in Table 2. All three structures had high mean DSC values (>0.80), with the luminal structure having the highest and the nuclei structure the lowest mean DSC. Each structure had a relatively low false positive rate, as measured by EF. Lastly, the segmentation model performed well on all structures in terms of both precision and recall. The average segmentation DSC over the luminal, glomerular tuft, and nuclei components across all diseases was 0.893±0.057, indicating high overall agreement between the segmentations and the ground truths.
Figure 2.
Automated glomerular segmentation performance compared with k-means ground truths. (A) Dice similarity coefficient (DSC) measures overlap between the segmented object and the ground truth, (B) extra fraction (EF) measures the false positive rate, (C) precision measures the proportion of segmented pixels that are correct with respect to the ground truth, and (D) recall measures the proportion of the ground truth pixels that were correctly identified by the predicted segmentation.
Table 2.
Mean dice similarity coefficient, extra fraction, precision, and recall for predicted glomerular segmentation with respect to glomerular disease using k-means ground truths
Segmentation Performance Metrics | Minimal Change Disease | Membranous Nephropathy | Thin Basement Membrane Nephropathy |
---|---|---|---|
DSC | 0.890±0.056 | 0.897±0.058ᵃ | 0.893±0.057 |
EF | 0.123±0.106 | 0.120±0.108ᵃ | 0.126±0.116 |
Precision | 0.890±0.086 | 0.893±0.089ᵃ | 0.888±0.092 |
Recall | 0.900±0.075 | 0.909±0.073ᵃ | 0.906±0.070 |
The DSC measures the overlap between the segmented object and the ground truth, whereas EF measures the false positive rate (22). Precision is the proportion of segmented pixels that are correct with respect to the ground truth, and recall is the proportion of ground truth pixels that were correctly identified by the predicted segmentation. DSC, dice similarity coefficient; EF, extra fraction.
ᵃIndicates highest segmentation performance.
Biomarker Feature Extraction
Exploratory analysis of the 233 biomarkers is visualized in Figure 3, and sample glomeruli are shown in Figure 3A. For the luminal, glomerular tuft, and nuclei structures, the average proportion of each structure over all data was computed for each disease (Figure 3B). Important disease phenotypes were observed: relative to glomerular area, MCD had the highest proportion of nuclei, TBMN had the highest proportion of lumen, and MN had the highest proportion of glomerular tuft. These observations reflect what pathologists observe but cannot readily quantify: MCD has larger podocyte nuclei reflecting hypertrophy, TBMN has thin glomerular walls (GBM thinning), and MN has diffuse thickening of the GBM, which increases the area of the glomerular tuft. In the hue color histograms (Figure 3C), the TBMN glomerulus had a higher hue mean, indicating more luminal structure. Figure 3D illustrates the intrastructural distance feature for the glomerular tuft structure. In the zoomed-in regions, the GBM appears yellow (high values) for MN, indicating slight thickening, whereas it appears dark blue (low values) for TBMN, which suggests thinning. Lastly, Figure 3E illustrates the color vector LBP texture maps, which quantify microstructural spatial relationships between pixels. MCD and TBMN are finer in texture than MN, likely because of the increase in GBM and overlapping mesangial matrix in MN.
Figure 3.
Visual representation of extracted biomarker features. (A) Columns represent sample MCD, MN, and TBMN glomeruli. (B) Bar and scatterplots visualizing glomerular structure proportions according to structure and disease. (C) Color features: hue histograms of the three sample images, with their corresponding mean values. (D) Morphologic features: glomerular tuft intrastructural distance feature maps for each corresponding sample image. Thicker structures are shown in red or orange, whereas thinner structures are shown in green and blue. (E) Microstructural texture features: texture maps for sample glomeruli using color vector local binary patterns. Scale bars, 100 µm. MCD, minimal change disease; MN, membranous nephropathy; TBMN, thin basement membrane nephropathy.
Classification Performance
Performance evaluation and hyperparameter tuning were completed on the glomerular training/validation set through five-fold crossvalidation. Patient-level classification was then performed on the held-out test set using the optimal model.
Feature Importance
The following four feature selection algorithms were used to reduce the feature set to small subsets of biomarkers: all features (233 features), statistical ANOVA F-value feature importance (ten features), GFI (ten features), and mRMR (ten features). The top ten features were selected for improved interpretability. See Supplemental Table 4 (Supplemental Appendix 5) for the top selected features from all approaches, along with all features.
Glomerular Classification Performance
Three machine-learning classifiers were applied to each of the four feature sets. Five-fold crossvalidation was performed on the training/validation set for all feature sets and classifiers (Table 3). The GFI set combined with the LDA classifier (GFI-LDA) achieved the highest crossvalidation accuracy, at 68%±9%.
Table 3.
Average classification accuracy (%) from five-fold crossvalidation for feature selection and classifier combinations
Feature Selection Method | Linear Discriminant Analysis | Random Forest | Logistic Regression |
---|---|---|---|
All Features | 36±9 | 64±13 | 58±37 |
ANOVA | 65±19 | 64±6 | 63±15 |
GFI | 68±9ᵃ | 67±12 | 65±31 |
mRMR | 67±11 | 67±13 | 61±24 |
The four feature selection methods analyzed were: all features, ANOVA F-value, GFI, and mRMR. The three classifiers analyzed were LDA, RF, and LR. ANOVA, ANOVA F-value; LDA, linear discriminant analysis; RF, random forest; LR, logistic regression; GFI, Gini feature importance; mRMR, maximum relevance minimum redundancy.
ᵃIndicates the model that achieved the best crossvalidation performance.
Using the GFI-LDA model, glomerular classification performance was measured on the held-out testing set of 121 glomerular images. The accuracy, precision, recall, and F1-score (reported in Table 4) show a testing accuracy of 77% and an F1-score of 76.47. Supplemental Table 5 presents the glomerular confusion matrix, comparing automated predictions with the gold standard disease labels. Although classification accuracy is high for MN, performance is lower for MCD and TBMN, likely because these diseases have a more similar appearance under LM.
Table 4.
Glomerular classification performance (%) for top feature sets and classifiers on the held-out test set
Classification Metric | Feature Selection Method | Linear Discriminant Analysis | Random Forest | Logistic Regression |
---|---|---|---|---|
Accuracy | All features | 60 | 58 | 71 |
Accuracy | ANOVA | 65 | 71 | 65 |
Accuracy | GFI | 77ᵃ | 69 | 75 |
Accuracy | mRMR | 60 | 62 | 65 |
Precision | All features | 58 | 53 | 70 |
Precision | ANOVA | 64 | 71 | 65 |
Precision | GFI | 77 | 68 | 77ᵃ |
Precision | mRMR | 59 | 59 | 64 |
Recall | All features | 60 | 56 | 70 |
Recall | ANOVA | 63 | 70 | 64 |
Recall | GFI | 76ᵃ | 67 | 74 |
Recall | mRMR | 59 | 60 | 64 |
F1-score | All features | 59 | 55 | 70 |
F1-score | ANOVA | 63 | 70 | 64 |
F1-score | GFI | 76ᵃ | 68 | 76 |
F1-score | mRMR | 59 | 60 | 64 |
The four feature selection methods analyzed were: all features, ANOVA F-value, GFI, and mRMR. The three classifiers analyzed were LDA, RF, and LR. Accuracy, precision, recall, and F1-score classification performance metrics were evaluated. ANOVA, ANOVA F-value; GFI, Gini feature importance; LDA, linear discriminant analysis; RF, random forest; LR, logistic regression; mRMR, maximum relevance minimum redundancy.
ᵃIndicates the model that achieved the best classification performance.
Patient-Level Classification
Using the GFI-LDA model, patient-level classification was performed on the held-out testing set of WSIs (n=15 patients). The performance of the three patient-level classification methods is shown in Table 5. All three methods performed similarly; using all glomeruli and using the top four glomeruli both yielded an accuracy of 87% and an F1-score of 85.94. Using all glomeruli produced the lowest confidence across all patients (Supplemental Figure 6). The top-glomerulus method was more susceptible to misclassification than the others because it relies on a single glomerulus from the WSI. When scanning a slide visually, a pathologist may examine a few glomeruli to make diagnostic inferences, so using the top four glomeruli mimics how a pathologist would analyze a biopsy specimen. This method was therefore chosen for patient-level classification. As shown in Supplemental Table 6, this model correctly classified 100% of the MN WSIs and 80% and 75% of the MCD and TBMN WSIs, respectively, indicating that it had difficulty differentiating between TBMN and MCD even at the patient level.
Table 5.
Performance of three patient-level classification methods on the held-out test set
Method | Accuracy | Precision | Recall | F1-score |
---|---|---|---|---|
All glomeruli | 87ᵃ | 87ᵃ | 85ᵃ | 86ᵃ |
Top four glomeruli | 87ᵃ | 87ᵃ | 85ᵃ | 86ᵃ |
Top glomerulus | 80 | 82 | 78 | 80 |
The three patient-level classification methods analyzed were using all glomeruli, top four glomeruli, and top glomerulus. Accuracy, precision, recall, and F1-score classification performance metrics were evaluated.
ᵃIndicates the model that achieved the best classification performance.
Figure 4 illustrates the confidence rating for each held-out patient with respect to each disease. All patients except 8 and 43 were correctly classified: patient 8 was predicted to have MN (TBMN ground truth), and patient 43 was predicted to have TBMN (MCD ground truth). Correctly classified and misclassified WSIs are shown in Figure 5. In Figure 5A, all glomeruli on the WSI were correctly predicted as MN, with a patient-level confidence of 99.65%; Figure 5, A1–A4, shows four glomeruli with high probabilities for MN (>0.98). In Figure 5B, the WSI was incorrectly predicted as TBMN, with a confidence of 62%, although its ground truth was MCD. Figure 5, B1–B4, shows the glomeruli with the highest probabilities (>72%): three of the four were incorrectly predicted as TBMN, and one was correctly predicted as MCD.
Figure 4.
Patient-level confidence results per testing patient. Each patient was assigned a confidence for each of TBMN, MN, and MCD. The four highest glomerular probabilities were averaged to obtain a patient-level confidence, and the disease with the largest confidence determined the patient’s predicted diagnosis. Symbols on the confidence bars indicate whether the patient was predicted correctly (checkmark) or incorrectly (×). TBMN, thin basement membrane nephropathy; MN, membranous nephropathy; MCD, minimal change disease.
Figure 5.
Correctly classified and misclassified patient WSIs, and the glomeruli with the highest probabilities. (A) Patient 16 was correctly predicted to have MN with 99.65% confidence. (A1–A4) Top four glomerular ROIs with the highest probabilities from (A). (B) Patient 43, diagnosed with MCD, was misclassified as having TBMN with 62% confidence. (B1–B4) Top four glomerular ROIs with the highest probabilities from (B). WSI, whole-slide image; TBMN, thin basement membrane nephropathy; MN, membranous nephropathy; MCD, minimal change disease.
Glomerular Biomarker Analysis
The ten biomarkers selected by the GFI algorithm and their distributions across diseases are shown in Figure 6. Four microstructural texture features and six morphologic features were selected as the most discriminative between the pathologies. No color features were selected (although two color features were selected by mRMR feature selection; Supplemental Table 4), which suggests that the color biomarkers were less discriminative than their morphologic and texture counterparts.
Figure 6.
Glomerular biomarker distributions. Top four microstructural texture features and top six morphologic features and their respective disease group distributions corresponding to TBMN, MN, and MCD. CV-LBP, color vector local binary patterns; GLCM, gray-level co-occurrence matrices; GT, glomerular tuft; TBMN, thin basement membrane nephropathy; MN, membranous nephropathy; MCD, minimal change disease.
The four microstructural texture features are interpreted here, namely nuclei energy using wavelets, nuclei variance using color vector LBP, luminal contrast using wavelets, and glomerular correlation using GLCM. The two nuclei features were based on color vector LBP and wavelet energy. The mean color vector LBP nuclei variance was highest for TBMN and lowest for MN, with MCD in between. LBP captures repeating patterns of lines and edges, and a higher variance in this feature indicates that there are similar (and repeating) patterns in the objects being investigated. This feature therefore suggests more consistent texture patterns between nuclei in TBMN. Wavelet energy reflects the magnitude and prevalence of multiscale edges in the image; if the nuclei contain many high-contrast edges, this feature will be low. We found the mean wavelet energy of the nuclei to be highest in MN, followed by MCD and TBMN. Although these nuclei texture features are not clinically reported, they may provide mechanistic insights into differences between pathologies, which is a future avenue of investigation into disease etiologies.
The GLCM correlation biomarker was higher for TBMN than for MN and MCD. GLCM correlation measures the linear dependence between the gray levels of neighboring pixels: homogeneous regions, with few rapid intensity changes, have larger correlation values. The large, continuous luminal regions in TBMN may explain the larger correlation value for this disease type. The variance seen in MN could relate to changes caused by irregular subepithelial deposits, which influence glomerular tuft structures.
The last feature was luminal contrast using wavelets, which measures the variation in edge magnitudes in the image. Luminal contrast was highest for TBMN and lowest for MN, with MCD in between. Because TBMN has thinner membranes that lie on the boundaries of, and are interwoven within, the luminal region, there is likely more, and stronger, edge content, which is reflected by the luminal contrast metric. MN had less edge content and was more homogeneous in texture. Because MCD looks normal on LM, it is interesting that its average texture distribution lies between those of the two basement membrane diseases.
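The texture descriptors discussed above can be illustrated with a minimal NumPy sketch of three underlying operators: a symmetric gray-level co-occurrence matrix (GLCM) with its correlation statistic, basic 3×3 LBP codes, and a one-level-per-step Haar wavelet decomposition whose detail magnitudes are summarized by their standard deviation as a stand-in for "contrast." This is not the study's implementation, and several choices are assumptions made for brevity: it works on a single grayscale channel rather than color vector LBP, uses one horizontal pixel offset for the GLCM, and the Haar wavelet and standard-deviation contrast measure are illustrative rather than the study's wavelet features. All function names are hypothetical.

```python
import numpy as np


def glcm(img, levels, dx=1, dy=0):
    """Symmetric, normalized GLCM for a non-negative (dy, dx) pixel offset."""
    h, w = img.shape
    ref = img[: h - dy, : w - dx]   # reference pixels
    nbr = img[dy:, dx:]             # their neighbours at offset (dy, dx)
    p = np.zeros((levels, levels))
    np.add.at(p, (ref.ravel(), nbr.ravel()), 1.0)  # count gray-level pairs
    p += p.T                        # count each pair in both directions
    return p / p.sum()


def glcm_correlation(p):
    """Linear dependence of neighbouring gray levels (Haralick correlation)."""
    i, j = np.indices(p.shape)
    mu_i, mu_j = (i * p).sum(), (j * p).sum()
    sd_i = np.sqrt(((i - mu_i) ** 2 * p).sum())
    sd_j = np.sqrt(((j - mu_j) ** 2 * p).sum())
    if sd_i * sd_j == 0:            # constant image: treat as perfectly correlated
        return 1.0
    return float(((i - mu_i) * (j - mu_j) * p).sum() / (sd_i * sd_j))


def lbp_codes(img):
    """Basic 3x3 LBP: bit k is set when neighbour k >= the centre pixel."""
    img = img.astype(int)
    h, w = img.shape
    centre = img[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(centre)
    for bit, (dy, dx) in enumerate(offsets):
        nbr = img[1 + dy : h - 1 + dy, 1 + dx : w - 1 + dx]
        codes |= (nbr >= centre).astype(int) << bit
    return codes


def haar_detail_contrast(img, levels=2):
    """Std of Haar detail-coefficient magnitudes at each decomposition level.

    Assumes the image is large enough for the requested number of levels.
    """
    approx = img.astype(float)
    contrasts = []
    for _ in range(levels):
        h, w = approx.shape
        x = approx[: h - h % 2, : w - w % 2]   # crop to an even size
        a, b = x[0::2, 0::2], x[0::2, 1::2]     # top-left, top-right of each 2x2 block
        c, d = x[1::2, 0::2], x[1::2, 1::2]     # bottom-left, bottom-right
        lr = (a - b + c - d) / 2                # left-right differences (vertical edges)
        tb = (a + b - c - d) / 2                # top-bottom differences (horizontal edges)
        diag = (a - b - c + d) / 2              # diagonal differences
        magnitude = np.sqrt(lr**2 + tb**2 + diag**2)
        contrasts.append(float(magnitude.std()))
        approx = (a + b + c + d) / 2            # approximation passed to the next level
    return contrasts


gradient = np.tile(np.arange(8), (8, 1))                  # smooth left-to-right ramp
checker = np.indices((8, 8)).sum(axis=0) % 2              # alternating 0/1 pixels
step = np.tile((np.arange(8) >= 3).astype(int), (8, 1))   # one vertical edge

print(glcm_correlation(glcm(gradient, levels=8)))  # about 0.88: homogeneous ramp
print(glcm_correlation(glcm(checker, levels=2)))   # -1.0: rapid intensity changes
print(lbp_codes(np.full((4, 4), 5)))               # all 255 on a flat patch
print(haar_detail_contrast(step))                  # non-zero: the edge is detected
```

Note that the plain Haar transform is not shift-invariant: an edge that falls exactly on a 2×2 block boundary produces no level-1 detail, which is one reason shift-invariant wavelet variants are often preferred for texture analysis.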
The glomerular structure proportions (Figure 3B) reflected key pathologic findings that were captured by the morphologic features. MCD had the highest proportion of nuclei relative to glomerular area, which likely reflects podocyte nuclear enlargement in MCD. TBMN had the highest proportion of luminal area relative to glomerular area, which can be attributed to thin glomerular walls. Lastly, the mean glomerular tuft composition was highest for MN and lowest for TBMN, with MCD in between. This is consistent with clinical observation: MN has thicker glomerular walls, TBMN has thin glomerular walls, and MCD appears normal on LM. The top morphologic features were the nuclei/glomerular tuft area ratio, nuclei/glomerular area ratio, glomerular tuft thickness greater than ten, glomerular tuft thickness greater than three, total glomerular tuft thickness, and glomerular circularity.
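The area-ratio and circularity features above can be sketched from a segmented glomerulus as follows. This is a minimal illustration under stated assumptions, not the study's feature extractor: the integer label encoding (`BACKGROUND`, `LUMEN`, `TUFT`, `NUCLEI`) is hypothetical, structures are treated as mutually exclusive labels, and the glomerular area is assumed to be the union of all non-background pixels. Circularity uses the standard 4πA/P² definition; the glomerular tuft thickness features are not reproduced because their definition is not given in this section.

```python
import math

import numpy as np

# Hypothetical label encoding for one segmented glomerulus image.
BACKGROUND, LUMEN, TUFT, NUCLEI = 0, 1, 2, 3


def area_ratios(labels):
    """Area ratios from a per-pixel label map (glomerulus = all non-background pixels)."""
    glom = np.count_nonzero(labels != BACKGROUND)
    tuft = np.count_nonzero(labels == TUFT)
    if glom == 0 or tuft == 0:
        raise ValueError("empty glomerulus or tuft mask")
    nuclei = np.count_nonzero(labels == NUCLEI)
    lumen = np.count_nonzero(labels == LUMEN)
    return {
        "nuclei/glomerular tuft": nuclei / tuft,
        "nuclei/glomerular": nuclei / glom,
        "luminal/glomerular": lumen / glom,
    }


def circularity(area, perimeter):
    """4*pi*A / P**2: 1.0 for a perfect circle, smaller for irregular outlines."""
    return 4 * math.pi * area / perimeter**2


labels = np.array([[0, 1, 1, 2],
                   [2, 2, 3, 0]])
print(area_ratios(labels))                    # 1/3, 1/6 and 1/3 for this toy map
print(circularity(math.pi * 4, 4 * math.pi))  # circle of radius 2: 1.0
print(circularity(1.0, 4.0))                  # unit square: pi/4, about 0.785
```

In practice, the perimeter of a pixel mask has to be estimated (for example, by tracing the boundary), and that estimate strongly influences the circularity value, so the same perimeter method should be used for every glomerulus.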
One-way ANOVA and pairwise Tukey post hoc tests were performed to test for significant differences in the means of the top ten biomarkers across the three diseases at a 95% confidence level. Table 6 shows the statistical results for the selected biomarkers, in order of Gini feature importance (GFI). All biomarkers showed statistically significant differences across group means. Tukey post hoc testing showed that most features differed significantly between the MCD-MN and MN-TBMN pairs; however, only the nuclei/glomerular ratio, luminal contrast using wavelets, and glomerular circularity differed significantly between MCD and TBMN.
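To make the ANOVA step concrete, the F statistic for one biomarker can be computed directly from the between-group and within-group sums of squares. The sketch below uses toy values, not study data, and the function name `one_way_anova_f` is hypothetical. In practice, `scipy.stats.f_oneway` returns the same F together with its P value, and `scipy.stats.tukey_hsd` (SciPy 1.8 and later) performs the pairwise Tukey comparisons; which software the study used is not stated here.

```python
import numpy as np


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA on k groups."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    values = np.concatenate(groups)
    grand_mean = values.mean()
    k, n = len(groups), values.size
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return float(f), df_between, df_within


# Toy values for one biomarker in three disease groups (not study data).
mcd, mn, tbmn = [1, 2, 3], [4, 5, 6], [7, 8, 9]
print(one_way_anova_f(mcd, mn, tbmn))   # (27.0, 2, 6)
```

Here the group means (2, 5, 8) are far apart relative to the spread within each group, giving SS_between = 54 and SS_within = 6, so F = (54/2)/(6/6) = 27. A large F indicates that at least one group mean differs; the Tukey test then identifies which pairs differ.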
Table 6.
Biomarkers sorted by Gini feature importance, with one-way ANOVA and Tukey post hoc statistical analysis across disease groups
Gini Feature Importance | Biomarkers | F | Pr>Fcrit | Minimal Change Disease versus Membranous Nephropathy | Minimal Change Disease versus Thin Basement Membrane Nephropathy | Membranous Nephropathy versus Thin Basement Membrane Nephropathy |
---|---|---|---|---|---|---|
0.0350 | Nuclei/glomerular tuft ratio | 125.78 | 2.36×10⁻⁴² | 0.001ᵃ | 0.3716 | 0.001ᵃ |
0.0198 | Nuclei/glomerular ratio | 71.15 | 7.54×10⁻²⁷ | 0.001ᵃ | 0.001ᵃ | 0.001ᵃ |
0.0162 | Glomerular tuft thickness >10 | 37.16 | 2.00×10⁻¹⁵ | 0.001ᵃ | 0.4521 | 0.001ᵃ |
0.0157 | Wavelet: luminal contrast | 27.50 | 7.39×10⁻¹² | 0.0088ᵃ | 0.001ᵃ | 0.001ᵃ |
0.0138 | Wavelet: nuclei energy | 39.56 | 2.74×10⁻¹⁶ | 0.001ᵃ | 0.6407 | 0.001ᵃ |
0.0122 | Glomerular tuft thickness >3 | 63.12 | 2.71×10⁻²⁴ | 0.001ᵃ | 0.3075 | 0.001ᵃ |
0.0115 | Color vector LBP: nuclei variance | 31.53 | 2.30×10⁻¹³ | 0.001ᵃ | 0.5403 | 0.001ᵃ |
0.0103 | Glomerular tuft thickness | 59.81 | 3.22×10⁻²³ | 0.001ᵃ | 0.3446 | 0.001ᵃ |
0.0098 | GLCM: glomerular correlation | 11.02 | 2.26×10⁻⁵ | 0.0849 | 0.0642 | 0.001ᵃ |
0.0097 | Glomerular circularity | 21.26 | 1.84×10⁻⁹ | 0.0048ᵃ | 0.001ᵃ | 0.001ᵃ |
Pr>Fcrit, one-way ANOVA P-values; LBP, local binary patterns; GLCM, gray-level co-occurrence matrices.
ᵃ Indicates that the difference between disease group means is significant (P<0.05).
Discussion
We present a computer-aided diagnosis system for classifying MCD, MN, and TBMN from PAS images that integrates preprocessing, glomerular structure segmentation, biomarker extraction, and glomerular and patient-level classification. The results showed that, of 233 explainable biomarkers, six morphologic and four microstructural texture features are sufficient to obtain high glomerular and patient-level classification accuracy. The best-performing system combined the GFI feature set with an LDA classifier; it achieved an accuracy of 68%±9% in glomerular cross-validation, and a glomerular-level accuracy and F1-score of 77% and 76.47, respectively. For patient-level classification, the model had an accuracy and F1-score of 87% and 85.94, respectively.
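The GFI-plus-LDA combination described above can be sketched with scikit-learn. This is a minimal illustration under assumptions, not the authors' code: Gini feature importance is taken to be the impurity-based `feature_importances_` of a `RandomForestClassifier`, and the 200 trees, the seed, the choice of k=10 (mirroring the ten selected biomarkers), and the synthetic data are all illustrative. Splitting the data before ranking features is a general safeguard against selection bias, not a description of the study's protocol.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier


def top_k_by_gini(X, y, k, seed=0):
    """Rank features by random-forest (Gini impurity) importance; return the top k indices."""
    forest = RandomForestClassifier(n_estimators=200, criterion="gini", random_state=seed)
    forest.fit(X, y)
    return np.argsort(forest.feature_importances_)[::-1][:k]


# Synthetic stand-in for a glomerulus-by-biomarker matrix (not study data):
# 3 classes (0=MCD, 1=MN, 2=TBMN), 20 features, only features 0 and 1 informative.
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 50)
X = rng.normal(size=(y.size, 20))
X[:, 0] += 4.0 * y
X[:, 1] -= 4.0 * y

# Split before selecting features so the held-out data never influence the ranking.
train = rng.permutation(y.size)[:100]
test = np.setdiff1d(np.arange(y.size), train)

top = top_k_by_gini(X[train], y[train], k=10)
lda = LinearDiscriminantAnalysis().fit(X[train][:, top], y[train])
print(sorted(top[:2].tolist()))               # the two informative features rank first
print(lda.score(X[test][:, top], y[test]))    # held-out accuracy on the toy data
```

LDA's `predict_proba` on the selected features then gives per-glomerulus class probabilities, which can feed a patient-level decision.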
The interaction between physicians and computer-aided systems is important because too little or too much trust in a system can affect patient diagnoses (23). Therefore, the proposed confidence rating can be used to encourage appropriate trust in the computer-generated results. The biopsy specimens in Figure 4 and Figure 5 show that patient-level decisions made on the basis of misclassified glomeruli within the same biopsy specimen often had lower prediction confidence. Such cases can be flagged for secondary review using traditional IF/EM approaches. The use of biologically relevant biomarker features is a key advantage over deep learning methods, whose learned features are difficult to interpret. Six morphologic (nuclei/glomerular tuft area ratio, nuclei/glomerular area ratio, glomerular tuft thickness greater than ten, glomerular tuft thickness greater than three, total glomerular tuft thickness, and glomerular circularity) and four microstructural texture features (luminal contrast using wavelets, nuclei energy using wavelets, nuclei variance using color vector LBP, and glomerular correlation using GLCM) were selected as the best performers for discriminating between the glomerular disorders. The selected morphologic features indicate that MCD had larger nuclei, reflecting podocyte hypertrophy; MN had glomerular tuft thickening; and TBMN had glomerular tuft thinning. The microstructural texture features characterized TBMN and MCD as heterogeneous (nuclei and luminal structures) and MN as homogeneous (glomerular tuft thickening). For expanded applications, more of the 233 biomarker features may be relevant.
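The confidence rating is not defined in this section, so the following is only one plausible construction, sketched as an assumption: average the per-glomerulus class probabilities of a biopsy (for example, from an LDA model's `predict_proba`), take the most probable class as the patient-level label, report its mean probability as the confidence, and flag low-confidence biopsies for IF/EM review. The optional `top_k` argument loosely mirrors the "top glomeruli" strategies listed in the supplemental material, but the averaging rule, the function name `patient_decision`, and the 0.6 review threshold are hypothetical.

```python
import numpy as np

CLASSES = ("MCD", "MN", "TBMN")


def patient_decision(glom_probs, top_k=None, review_threshold=0.6):
    """Aggregate per-glomerulus class probabilities into one biopsy-level call.

    glom_probs: array of shape (n_glomeruli, 3) whose rows sum to 1.
    Returns (label, confidence, needs_review).
    """
    p = np.asarray(glom_probs, dtype=float)
    if top_k is not None:
        # Keep only the top_k glomeruli whose own prediction is most confident.
        keep = np.argsort(p.max(axis=1))[::-1][:top_k]
        p = p[keep]
    mean_p = p.mean(axis=0)                  # average probability per class
    label = CLASSES[int(np.argmax(mean_p))]
    confidence = float(mean_p.max())
    return label, confidence, confidence < review_threshold


probs = [[0.80, 0.10, 0.10],   # confidently MCD
         [0.70, 0.20, 0.10],   # MCD
         [0.25, 0.65, 0.10]]   # this glomerulus is called MN
print(patient_decision(probs))            # MCD, confidence about 0.58, flagged for review
print(patient_decision(probs, top_k=2))   # MCD, confidence 0.75, not flagged
```

The disagreeing glomerulus pulls the averaged confidence below the threshold, which is the behavior described above: patient-level decisions involving misclassified glomeruli tend to have lower confidence and can be routed to secondary review.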
Our study has some limitations. The dataset included only three glomerular diseases; in the future, we will apply these tools to other glomerular diseases, such as FSGS. In addition, the cohort was small and obtained from a single center, which we hope to address in future studies that include more diseases. The intent of this work was to determine whether explainable features from PAS-stained biopsy specimens alone, which are routinely used in clinical practice, are sufficient for classification, and we believe this goal has been met. Further analysis of other stains and modalities, such as IF and EM, could improve biomarker feature selection and analysis.
In conclusion, our work shows that image analysis algorithms applied to glomerular diseases can quantify biomarkers that are consistent with existing knowledge of pathogenesis. In the future, these tools can be applied to larger datasets with other glomerular diseases to quantitate subvisual features and seek links with clinical outcomes, for biomarker discovery and for insights into disease mechanisms.
Disclosures
M. Barua reports having ownership interest in AstraZeneca; serving on the editorial board of Glomerular Diseases; receiving honoraria from Natera; and receiving research funding from Otsuka, Regulus, and Sanofi. All remaining authors have nothing to disclose.
Funding
This work was supported by Faculty of Engineering & Architectural Science, Ryerson University (A. Khademi), Alport Syndrome Foundation, and Gouvernement du Canada, Canadian Institutes of Health Research (CIHR).
Acknowledgments
We would like to acknowledge Ryerson University’s Dean’s Research Fund programs for funding this research. Additionally, we thank KRESCENT, Alport Syndrome Foundation, McLaughlin Centre–University of Toronto, NephCure Kidney International–Neptune, Can-SOLVE CKD Network, the TGH Foundation, and CIHR for their support.
Footnotes
See related editorial, “How Whole Slide Imaging and Machine Learning can Partner with Renal Pathology,” on pages 413–415.
Author Contributions
M. Barua and R. John were responsible for data curation; M. Barua, R. John, and A. Khademi were responsible for funding acquisition; M.N. Basso wrote the original draft and was responsible for formal analysis, investigation, methodology, and visualization; M.N. Basso, R. John, and A. Khademi conceptualized the study; A. Khademi provided supervision and was responsible for project administration; and all authors reviewed and edited the manuscript and were responsible for validation.
Data Sharing Statement
Data cannot be shared due to privacy and confidentiality of the patient data.
Supplemental Material
This article contains the following supplemental material online at http://kidney360.asnjournals.org/lookup/suppl/doi:10.34067/KID.0005102021/-/DCSupplemental.
Supplemental Appendix 1: Clinical information.
Supplemental Appendix 2: Preprocessing: Glomeruli size variability.
Supplemental Appendix 3: Glomerular structure segmentation performance.
Supplemental Appendix 4: Classification performance: Glomerular and patient-level.
Supplemental Appendix 5: Biomarker feature extraction: Feature sets.
Supplemental Figure 1: Sample kidney needle biopsies from the TGH dataset. (A) MCD, (B) MN, and (C) TBMN biopsy images.
Supplemental Figure 2: Glomerular size distribution on a patient basis.
Supplemental Figure 3: Glomeruli that were removed from analysis after glomerular size outlier analysis.
Supplemental Figure 4: Sample manual segmentation ground truths for each disease class and structure.
Supplemental Figure 5: Sample segmentation results for each disease class with respect to k-means ground truths.
Supplemental Figure 6: A comparison of the three patient-level classification methods (all glomeruli, top 4 glomeruli, and top glomerulus) for the held-out test set.
Supplemental Table 1: Additional clinical information from the TGH dataset.
Supplemental Table 2: Mean DSC, EF, precision, and recall for predicted glomerular structure segmentation by glomerular disease, compared with manual ground truths.
Supplemental Table 3: Mean DSC, EF, precision, and recall for k-means glomerular structure segmentation by glomerular disease, compared with manual ground truths.
Supplemental Table 4: Confusion matrix of the classification of glomerular images as minimal change disease (MCD), membranous nephropathy (MN), or thin basement membrane nephropathy (TBMN).
Supplemental Table 5: Confusion matrix of the classification of WSIs as minimal change disease (MCD), membranous nephropathy (MN), or thin basement membrane nephropathy (TBMN).
Supplemental Table 6: Biomarker features according to feature group (color, morphological, and microstructural texture).
References
1. Sim JJ, Bhandari SK, Batech M, Hever A, Harrison TN, Shu YH, Kujubu DA, Jonelis TY, Kanter MH, Jacobsen SJ: End-stage renal disease and mortality outcomes across different glomerulonephropathies in a large diverse US population. Mayo Clin Proc 93: 167–178, 2018. 10.1016/j.mayocp.2017.10.021
2. Cascarano GD, Debitonto FS, Lemma R, Brunetti A, Buongiorno D, De Feudis I, Guerriero A, Rossini M, Pesce F, Gesualdo L, Bevilacqua V: An innovative neural network framework for glomerulus classification based on morphological and texture features evaluated in histological images of kidney biopsy. In: Intelligent Computing Methodologies. ICIC 2019. Lecture Notes in Computer Science, Vol. 11645, edited by Huang DS, Huang ZK, Hussain A, Cham, Switzerland, Springer, 2019, pp 727–738
3. Barros GO, Navarro B, Duarte A, Dos-Santos WLC: PathoSpotter-K: A computational tool for the automatic identification of glomerular lesions in histological images of kidneys. Sci Rep 7: 46769, 2017. 10.1038/srep46769
4. Ginley B, Lutnick B, Jen KY, Fogo AB, Jain S, Rosenberg A, Walavalkar V, Wilding G, Tomaszewski JE, Yacoub R, Rossi GM, Sarder P: Computational segmentation and classification of diabetic glomerulosclerosis. J Am Soc Nephrol 30: 1953–1967, 2019. 10.1681/ASN.2018121259
5. Martel AL, Hosseinzadeh D, Senaras C, Zhou Y, Yazdanpanah A, Shojaii R, Patterson ES, Madabhushi A, Gurcan MN: An image analysis resource for cancer research: PIIP–Pathology Image Informatics Platform for visualization, analysis, and management. Cancer Res 77: e83–e86, 2017. 10.1158/0008-5472.CAN-17-0323
6. Hoy WE, Samuel T, Hughson MD, Nicol JL, Bertram JF: How many glomerular profiles must be measured to obtain reliable estimates of mean glomerular areas in human renal biopsies? J Am Soc Nephrol 17: 556–563, 2006. 10.1681/ASN.2005070772
7. Reinhard E, Ashikhmin M, Gooch B, Shirley P: Color transfer between images. IEEE Comput Graph Appl 21: 34–41, 2001. 10.1109/38.946629
8. Gallego J, Pedraza A, Lopez S, Steiner G, Gonzalez L, Laurinavicius A, Bueno G: Glomerulus classification and detection based on convolutional neural networks. J Imaging 4: 20, 2018. 10.3390/jimaging4010020
9. Bueno G, Fernandez-Carrobles MM, Gonzalez-Lopez L, Deniz O: Glomerulosclerosis identification in whole slide images using semantic segmentation. Comput Methods Programs Biomed 184: 105273, 2020. 10.1016/j.cmpb.2019.105273
10. Ginley BG, Tomaszewski JE, Jen K-Y, Fogo A, Jain S, Sarder P: Computational analysis of the structural progression of human glomeruli in diabetic nephropathy. Presented at 2018 SPIE Medical Imaging: Digital Pathology, Houston, TX, February 10–15, 2018. Available at: 10.1117/12.2295249
11. Reinhard E, Khan EA, Oğuz Akyüz A, Johnson G: Color Imaging: Fundamentals and Applications, Natick, MA, A K Peters, 2008
12. Fernández-Carrobles MM, Bueno G, Déniz O, Salido J, García-Rojo M, González-López L: Influence of texture and colour in breast TMA classification. PLoS One 10: e0141556, 2015. 10.1371/journal.pone.0141556
13. Khademi A, Krishnan S: Shift-invariant discrete wavelet transform analysis for retinal image classification. Med Biol Eng Comput 45: 1211–1222, 2007. 10.1007/s11517-007-0273-z
14. Porebski A, Vandenbroucke N, Macaire L: Haralick feature extraction from LBP images for color texture classification. Presented at the 2008 First Workshops on Image Processing Theory, Tools and Applications, Sousse, Tunisia, November 23–26, 2008. Available at: 10.1109/IPTA.2008.4743780. Accessed February 7, 2021
15. Khademi A, Krishnan S: Medical image texture analysis: A case study with small bowel, retinal and mammogram images. Presented at the 2008 Canadian Conference on Electrical and Computer Engineering, Niagara Falls, Canada, May 4–7, 2008. Available at: 10.1109/CCECE.2008.4564884. Accessed February 7, 2021
16. Khademi A, Krishnan S: Multiresolution analysis and classification of small bowel medical images. Annu Int Conf IEEE Eng Med Biol Soc 2007: 4524–4527, 2007. 10.1109/IEMBS.2007.4353345
17. Radovic M, Ghalwash M, Filipovic N, Obradovic Z: Minimum redundancy maximum relevance feature selection approach for temporal gene expression data. BMC Bioinformatics 18: 9, 2017. 10.1186/s12859-016-1423-9
18. Menze BH, Kelm BM, Masuch R, Himmelreich U, Bachert P, Petrich W, Hamprecht FA: A comparison of random forest and its Gini importance with standard chemometric methods for the feature selection and classification of spectral data. BMC Bioinformatics 10: 213, 2009. 10.1186/1471-2105-10-213
19. Ding C, Peng H: Minimum redundancy feature selection from microarray gene expression data. J Bioinform Comput Biol 3: 185–205, 2005. 10.1142/S0219720005001004
20. Balakrishnama S, Ganapathiraju A, Picone J: Linear discriminant analysis for signal processing problems. Presented at the Proceedings IEEE Southeastcon ’99. Technology on the brink of 2000, Lexington, KY, March 25–28, 1999. Available at: 10.1109/SECON.1999.766096. Accessed February 7, 2021
21. Khairunnahar L, Hasib MA, Rezanur RHB, Islam MR, Hosain MK: Classification of malignant and benign tissue with logistic regression [published correction appears in Inform Med Unlocked 20: 100435, 2020]. Inform Med Unlocked 16: 100189, 2019. 10.1016/j.imu.2019.100189
22. Pontalba JT, Gwynne-Timothy T, David E, Jakate K, Androutsos D, Khademi A: Assessing the impact of color normalization in convolutional neural network-based nuclei segmentation frameworks. Front Bioeng Biotechnol 7: 300, 2019. 10.3389/fbioe.2019.00300
23. Jorritsma W, Cnossen F, van Ooijen PMA: Improving the radiologist-CAD interaction: Designing for appropriate trust. Clin Radiol 70: 115–122, 2015. 10.1016/j.crad.2014.09.017