PLoS One. 2022 Jul 11;17(7):e0271161. doi: 10.1371/journal.pone.0271161

Evaluating tubulointerstitial compartments in renal biopsy specimens using a deep learning-based approach for classifying normal and abnormal tubules

Satoshi Hara 1,2,#, Emi Haneda 3,#, Masaki Kawakami 3, Kento Morita 3, Ryo Nishioka 2, Takeshi Zoshima 2, Mitsuhiro Kometani 4, Takashi Yoneda 4,5,6, Mitsuhiro Kawano 2,*, Shigehiro Karashima 7, Hidetaka Nambo 3,*
Editor: Franziska Theilig 8
PMCID: PMC9273082  PMID: 35816495

Abstract

Renal pathology is essential for diagnosing kidney diseases and for assessing their severity and prognosis. Deep learning-based approaches have developed rapidly and have been applied in renal pathology; however, methods for the automated classification of normal and abnormal renal tubules remain scarce. Using a deep learning-based method, we aimed to classify normal and abnormal renal tubules, thereby assisting renal pathologists in the evaluation of renal biopsy specimens. To this end, we developed a U-Net-based segmentation model using randomly selected regions from 21 renal biopsy specimens and verified its multiclass segmentation performance by calculating Dice coefficients (DCs). To assess its applicability as an aid in routine diagnosis, we used 15 cases of tubulointerstitial nephritis and calculated the agreement between the evaluations of two renal pathologists, and the time taken for evaluation, with and without reference to the segmentation output. The glomeruli and interstitium had the highest DCs, whereas normal and abnormal renal tubules had intermediate DCs. In the detailed evaluation of the tubulointerstitial compartments, proximal, distal, atrophied, and degenerated tubules had intermediate DCs, whereas arteries and inflamed tubules had low DCs. The annotated and predicted areas were correlated in each class, strongly for normal structures and moderately for abnormal tubules. Pathological concordance for the glomerular count and the t, ct, and ci scores of the Banff classification of renal allograft pathology remained high with or without the segmented images. However, when renal pathologists referred to the segmentation output, agreement in the quantitative assessment of tubulitis, tubular atrophy, and the interstitium (but not degenerated tubules) improved, and the time required for evaluation was significantly shortened.
Deep learning algorithms can assist renal pathologists in classifying normal and abnormal tubules in renal biopsy specimens, thereby improving renal pathology practice and supporting appropriate clinical decisions.

Introduction

Renal diseases impose a significant global burden on both health and the economy [1, 2], and precise diagnosis is a prerequisite for selecting an appropriate treatment strategy. Renal pathology, the gold standard for diagnosing kidney diseases, is essential: information obtained from renal biopsy specimens is used to confirm the diagnosis and to assess the severity and prognosis of kidney disease. Accurate assessment of renal biopsy specimens is therefore essential for appropriate clinical decisions.

Deep learning-based approaches have developed rapidly in recent years and have been applied extensively in renal pathology [3]. Specifically, convolutional neural networks (CNNs), the most popular deep learning-based techniques, are mainly used for the automated detection and morphometric analysis of histological components and for predicting renal disease prognosis. Applications of CNNs in renal pathology include glomerular counting [4–8], global glomerulosclerosis [9–14], podocyte morphometric analysis [14–17], the classification of diabetic glomerulosclerosis [18], IgA nephropathy [19, 20], glomerular hypercellularity [21], several glomerular changes [22], kidney transplant pathology [23–25], interstitial fibrosis and tubular atrophy [10, 11, 14, 26–28], vascular detection [28], immunofluorescence staining patterns [29], and the classification of normal and abnormal structures in the renal cortex [4, 30–32] (Table 1). However, few CNNs have been successfully applied to the classification of normal and abnormal renal tubules [4, 5, 11, 30], a task that remains challenging even for renal pathologists. Because tubulointerstitial abnormalities significantly predict the outcome of various renal diseases, including acute tubulointerstitial nephritis, diabetic nephropathy, lupus nephritis, and kidney allografts [33–37], the quantitative evaluation of tubulointerstitial abnormalities is crucial.

Table 1. Deep learning methodologies used for renal pathological studies.

| Methodology | Stains | Histological primitive | Number of WSIs or cases | Task | Ref. No. |
| --- | --- | --- | --- | --- | --- |
| U-Net with ResNet34 backbone | PAS (paraffin sections) | Glomerulosclerosis, tubular atrophy | 83 WSIs from human transplant biopsies | Segmentation and classification of glomerular and tubular structures | [11] |
| | PAS, MT (paraffin sections) | Arteries, interstitial fibrosis | 65 WSIs from human transplant biopsies | Segmentation of kidney blood vessels and fibrosis | [28] |
| U-Net | PAS (paraffin section) | Glomeruli, sclerotic glomeruli, empty Bowman’s capsule, proximal tubuli, distal tubuli, atrophic tubuli, undefined tubuli, capsule, arteries | 137 WSIs from 122 human kidney transplant biopsies and 15 human nephrectomy specimens | Multiclass segmentation and classification of histological primitives | [4] |
| | PAS, HE, PAM, MT | Glomerular tuft, glomeruli, proximal tubules, distal tubules, artery, peritubular capillaries | 459 curated WSIs from 125 human biopsies with minimal change disease | Multiclass segmentation of histological primitives | [32] |
| | PAS (paraffin section) | Glomerular tuft, glomeruli, tubules, arteries, arterial lumina, tubular atrophy, glomerular size, interstitial expansion | 168 WSIs from 16 humans, 41 healthy mice, 75 murine disease models, 30 other species, and 6 others | Multiclass segmentation of histological primitives | [31] |
| U-Net | PAS (paraffin sections) | Glomeruli | 22 WSIs from mouse kidneys | Glomerular segmentation | [8] |
| U-Net and Yolo V2 architecture CNN | CD3, CD4, CD8, CD20, T-bet, GATA3, CD68, CD163 | Interstitial infiltration of inflammatory cells | 22 WSIs from human kidney transplant biopsies | Quantitative assessment of the inflammatory infiltrates | [25] |
| U-Net and Mask R-CNN | PAS (paraffin section) | Interstitial fibrosis, tubular atrophy, interstitial inflammation | 789 WSIs from human kidney transplant biopsies | Compartment or mononuclear leukocyte detection and tissue detection to predict Banff scores (ci, ct, ti) and rejection | [24] |
| U-Net, DenseNet, LSTM-GCNet, 2D V-Net | PAS (paraffin section) | Glomeruli, mesangial hypercellularity | 400 WSIs from human kidney biopsies with IgA nephropathy | Detection of glomerular location, lesion identification, glomeruli decomposition, mesangial hypercellularity score calculation | [20] |
| U-Net and U-Net cycleGAN | WT1, DACH1 | Glomeruli, podocytes | 110 WSIs from human kidney biopsies with ANCA-associated glomerulonephritis | Podocyte morphometrics | [17] |
| VGG16 | HE (frozen and paraffin sections) | Glomeruli, glomerulosclerosis | 149 WSIs (98 frozen and 51 paraffin sections) from human kidney biopsies | Quantification of the percentage of global glomerulosclerosis | [9] |
| Inception v3 | PAS, HE, PAM (paraffin sections) | Normal, antibody-mediated rejection, T-cell mediated rejection, mixed rejection, borderline T-cell mediated rejection, other disease | 5,844 WSIs from human kidney transplant biopsies | Classification of Banff category | [23] |
| | PAS, PAM (paraffin sections) | Glomerulosclerosis, segmental sclerosis, endocapillary proliferation, mesangial matrix accumulation, mesangial cell proliferation, crescent, basement membrane structural changes | 15,888 glomeruli images from 283 human kidney biopsies | Classification of multiple glomerular findings | [22] |
| DeepLab V2 | PAS | Nonsclerotic glomeruli, sclerotic glomeruli, IFTA | 223 WSIs from human kidney biopsies (148 diabetic nephropathy and 75 allograft kidneys) | Detection and quantification of the percentages of glomerulosclerosis and IFTA | [10] |
| | PAS, HE (paraffin sections) | Nonsclerotic glomeruli, globally sclerotic glomeruli, podocyte nuclei, other nuclei, interstitial fibrosis, tubular atrophy | WSIs from mouse kidneys and human kidney biopsies | Multiclass segmentation of histological primitives | [14] |
| DeepLab v2 ResNet and RNN | PAS (paraffin section) | Nuclear component, PAS-positive component, luminal component | 54 WSIs from human kidney biopsies and 25 WSIs from mouse kidneys | Detection and segmentation of glomerular boundaries on WSIs; diabetic nephropathy classification/prediction | [18] |
| SegNet and DeepLab v3+ with ResNet backbone | PAS (paraffin section) | Glomerulosclerosis | 26 WSIs from donor kidney biopsies | Glomerular detection and classification | [12] |
| DeepLab v3 and pix2pix GAN | PAS, p57, WT1 (paraffin sections) | Podocyte nuclei | 122 WSIs from mouse, rat, and human kidney specimens | Automated detection and quantification of podocytes | [16] |
| SegNet-VGG19 and fine-tuned AlexNet | PAS (paraffin section) | Glomerulosclerosis | 47 WSIs from human kidney biopsies | Segmentation and classification of glomeruli | [13] |
| ResNet-101 | Immunofluorescence (frozen section) | Appearance (granular, linear, pseudolinear), distribution (focal, diffuse, segmental, global), location (mesangial, capillary wall), intensity (0–3) | 12,259 images from 2,542 subjects undergoing kidney biopsies | Classification of immune deposits on glomeruli | [29] |
| Fine-tuned NASNet | HE (paraffin section) | Unsupervised extracted features | 68 WSIs from human kidney biopsies with IgA nephropathy | Extraction of features associated with clinical parameters; after clustering, multiclass classification of defined clusters to produce scores | [19] |
| CNN and SVM | PAS, HE (paraffin sections) | Endocapillary hypercellularity, mesangial hypercellularity, endoMes (both lesions) hypercellularity, normal glomeruli | 811 images (300 images of normal human glomeruli and 511 images of human glomeruli with hypercellularity) | Classification of glomerular hypercellularity | [21] |
| Google’s Inception v3 | MT (paraffin section) | Interstitial fibrosis | 171 WSIs from human kidney biopsies | Prediction of clinical phenotype | [26] |
| | MT (paraffin section) | Glomeruli | 275 WSIs from 171 human kidney biopsies | Glomerular segmentation and classification | [7] |
| glapathnet (FPN) | MT (paraffin section) | Interstitial fibrosis | 67 WSIs from human kidney biopsies | Prediction of the IFTA grade | [27] |
| AlexNet + SVM | PAS (paraffin section) | Glomeruli, mesangial matrix expansion, tubular nuclei, tubular vacuolization | 98 glomeruli from 17 mouse kidneys, 500 image patches of tubule structure | Glomerular detection; classification of glomeruli and tubules | [5] |
| Region-based CNN (AlexNet) | MT (paraffin section) | Glomeruli | 87 WSIs from rat kidneys and 6 WSIs from human kidney biopsies | Glomerular localization and detection | [6] |
| Pix2pix GAN | PAS, WT-1 (paraffin sections) | Glomeruli, podocytes | 24 WSIs from 14 mouse kidneys | Automated detection of podocytes | [15] |

ANCA, antineutrophil cytoplasmic antibody; CNN, convolutional neural network; FPN, feature pyramid network; GAN, generative adversarial network; GCNet, graph convolutional network; HE, hematoxylin and eosin; IFTA, interstitial fibrosis and tubular atrophy; MT, Masson’s trichrome; NASNet, neural architecture search network; PAM, periodic acid–silver methenamine; PAS, periodic acid–Schiff; ResNet, residual network; RNN, recurrent neural network; SVM, support-vector machine; WSI, whole-slide image; WT-1, Wilms tumor-1

In this study, we aimed to classify normal and abnormal renal tubules precisely by developing a segmentation model based on U-Net [38], a representative CNN-based architecture used mainly for biomedical image segmentation. We improved U-Net by implementing finetuning and a Dice cross-entropy loss [39, 40]. We annotated the abnormal tubules in detail, including atrophic and degenerated tubules as well as tubulitis. Automated classification of renal tubules could help renal pathologists evaluate renal biopsy specimens rapidly and accurately.

Methods

Renal biopsy specimens

We used formalin-fixed, paraffin-embedded needle-core biopsy specimens obtained from 21 patients (7 kidney transplant recipients biopsied 1 h after transplantation and 14 patients with tubulointerstitial nephritis) who underwent renal biopsy between 2000 and 2020 at Kanazawa University Hospital and its affiliated hospitals. Because various kidney diseases can involve the glomeruli in addition to the tubulointerstitial compartments, we needed homogeneous annotation samples in which only the tubulointerstitial compartments were involved. Thus, specimens of tubulointerstitial nephritis without other involvement were used to annotate abnormal tubulointerstitial structures, whereas specimens collected 1 h after renal transplantation served as near-healthy controls for annotating normal kidney structures. From each specimen, a 2-μm section was stained with periodic acid–Schiff (PAS).

This study was approved by the Ethical Committee of Kanazawa University (approval No. 2020–178). The ethics committee waived the requirement for informed consent because the study was retrospective and involved no further tests or treatments of the participants. In addition, all data were fully anonymized before we accessed them. All participants had access to detailed information about the study, including its purpose, subjects, and content, on our website, and could withdraw from the study at any time by submitting a written form. All these processes were approved by the Ethical Committee of Kanazawa University.

Ground truth training and test sets

From the 21 kidney specimens, 311 regions were randomly selected, and images of 500 × 500 μm² (approximately 1,000 × 1,000 pixels) were captured by a human observer. For each image, the corresponding annotation data were generated using the MATLAB Image Labeler (MathWorks, MA), with every pixel labeled by tissue class. Two class schemes were used: (1) five classes, “glomeruli,” “normal tubules,” “abnormal tubules,” “arteries,” and “interstitium”; and (2) eight classes, “glomeruli,” “proximal tubules,” “distal tubules,” “arteries,” “tubulitis,” “degenerated tubules,” “atrophic tubules,” and “interstitium.” The annotation masks were saved as palette-format PNG images.

The annotations were performed by a nephrologist with sufficient experience in renal pathology (S.H.). Because renal pathologists are still few in Japan, nephrologists are trained in and practice renal pathology in most facilities. To improve annotation quality, the annotations by S.H. were double-checked by another nephrologist with sufficient renal pathology experience (M.K.). When the two nephrologists disagreed, they discussed the case and finalized the annotation after reaching consensus.

All normal or abnormal glomeruli were labeled as “glomeruli.” Thin ascending limbs of Henle, distal convoluted tubules, and cortical collecting ducts were labeled as “distal tubules.” The “arteries” included arcuate arteries, interlobular arteries, and arterioles. Tubules with inflammatory cell infiltration and without atrophy or degeneration were defined as “tubulitis.” “Atrophic tubules” showed narrowing of the tubular lumen owing to atrophy or wrinkling of the tubular basement membrane, regardless of inflammatory infiltration, without tubular degeneration. “Degenerated tubules” were defined as tubular abnormalities such as tubular vacuolation, tubular simplification, budding, loss of brush border, and cell detachment, excluding tubular atrophy and tubulitis. All other unlabeled structures were assigned to the “interstitium” category.

The kidney biopsy images were first annotated with the eight classes described above, which were then recategorized into five classes: “proximal tubules” and “distal tubules” were merged into “normal tubules,” whereas “atrophic tubules,” “tubulitis,” and “degenerated tubules” were merged into “abnormal tubules.” The numbers of annotations per class in the training and test sets are listed in Table 2.
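Because the eight-to-five recategorization is a pure relabeling, it can be applied to a whole label mask with a lookup table. The sketch below is illustrative only: it assumes hypothetical integer indices for the classes (the palette indices used in the study's PNG masks are not reported) and uses NumPy for array handling.

```python
import numpy as np

# Hypothetical label indices: the palette indices of the study's PNG masks
# are not reported, so this ordering is illustrative only.
EIGHT_CLASSES = ["interstitium", "glomeruli", "proximal tubules", "distal tubules",
                 "arteries", "tubulitis", "degenerated tubules", "atrophic tubules"]
FIVE_CLASSES = ["interstitium", "glomeruli", "normal tubules", "arteries",
                "abnormal tubules"]

# Recategorization rules from the text: tubule subtypes collapse into
# "normal tubules" or "abnormal tubules"; other classes keep their names.
MERGE = {
    "proximal tubules": "normal tubules",
    "distal tubules": "normal tubules",
    "tubulitis": "abnormal tubules",
    "degenerated tubules": "abnormal tubules",
    "atrophic tubules": "abnormal tubules",
}

# LUT[i] is the five-class index for eight-class index i.
LUT = np.array([FIVE_CLASSES.index(MERGE.get(name, name)) for name in EIGHT_CLASSES],
               dtype=np.uint8)


def to_five_classes(mask8: np.ndarray) -> np.ndarray:
    """Map an H x W eight-class label mask to the five-class scheme."""
    return LUT[mask8]  # fancy indexing relabels every pixel at once


mask8 = np.array([[0, 2, 3, 4],
                  [5, 6, 7, 1]])
print(to_five_classes(mask8))
```

A lookup table keeps the mapping in one place, so the five-class masks are guaranteed to be consistent with the eight-class annotations they were derived from.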

Table 2. Number of annotations per class used in the training and test sets of U-Net.

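| Set | Glomeruli | Proximal tubules (normal) | Distal tubules (normal) | Atrophic tubules (abnormal) | Tubulitis (abnormal) | Degenerated tubules (abnormal) | Arteries |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Train | 141 | 2,798 | 1,877 | 1,465 | 618 | 1,307 | 205 |
| Test | 35 | 700 | 469 | 266 | 155 | 327 | 51 |
| Total | 176 | 3,498 | 2,346 | 1,831 | 773 | 1,634 | 256 |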

CNN design

We used U-Net, a CNN-based architecture, for semantic segmentation [38]. Finetuning was implemented by using the VGG-16 model [41], pretrained on the ImageNet dataset, as the U-Net encoder. The model inputs were the images and annotation data, and the output was the label for each pixel. In a preliminary study comparing the segmentation models FCN, U-Net, PSP-Net, and DeepLab v3, we chose U-Net as the most suitable for the present study because it exhibited the highest accuracy and relatively clear segmented images (S1 Table and S1 Fig).

To train the model, we randomly selected 80% of the prepared images; the remaining 20% were used to evaluate the model’s performance, and no image was used in both sets. The input images were resized to 512×512 pixels, and each RGB channel was normalized using the mean (0.485, 0.456, 0.406) and standard deviation (0.229, 0.224, 0.225) of the ImageNet dataset. Data augmentation was performed during training to improve the model’s generalization performance despite the limited amount of data: in each epoch, within random ranges, we adjusted the contrast, flipped images horizontally at a rate of 50%, and rotated them by −15° to +15°. For contrast adjustment, we calculated the average gray value of the input image in grayscale and created an image “a” uniformly filled with that gray value. We then blended the input image with image “a” using a blending factor alpha (the opacity of the input image) drawn from 0.5 to 1.5, as follows: output = image “a” × (1.0 − alpha) + input image × alpha. An alpha of zero yields a solid gray image, an alpha of one leaves the input image unchanged, and an alpha above one increases contrast. All these processes were performed using Python functions. The number of epochs was set to 200. Adam was used as the optimizer, and Dice cross-entropy was used as the loss function. The output of U-Net was the probability of each label per pixel, and the label with the highest probability was assigned as the predicted label for that pixel.
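A minimal NumPy sketch of the normalization and augmentation steps described above, not the authors' code. It assumes images are RGB float arrays scaled to [0, 1] and uses ITU-R BT.601 luma weights for the grayscale mean (the conversion actually used is not reported). Applying the contrast change and the flip each with probability 0.5 is one reading of the text, and the rotation step is omitted for brevity.

```python
import numpy as np

# ImageNet per-channel statistics quoted in the text (RGB order).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])
# Assumption: ITU-R BT.601 luma weights for the grayscale conversion.
LUMA = np.array([0.299, 0.587, 0.114])


def adjust_contrast(img: np.ndarray, alpha: float) -> np.ndarray:
    """Blend an H x W x 3 image in [0, 1] with a uniform gray image "a".

    output = a * (1 - alpha) + img * alpha, where "a" holds the mean grayscale
    value of img; alpha = 0 gives solid gray and alpha = 1 returns img unchanged.
    """
    gray_level = float((img @ LUMA).mean())
    out = gray_level * (1.0 - alpha) + img * alpha
    return np.clip(out, 0.0, 1.0)  # alpha > 1 can push values outside [0, 1]


def augment(img: np.ndarray, mask: np.ndarray, rng: np.random.Generator):
    """Random contrast (alpha in [0.5, 1.5]) and horizontal flip, each with p = 0.5.

    The -15 to +15 degree rotation is omitted; it must be applied identically to
    image and mask (nearest-neighbour interpolation for the mask).
    """
    if rng.random() < 0.5:
        img = adjust_contrast(img, rng.uniform(0.5, 1.5))
    if rng.random() < 0.5:
        img, mask = img[:, ::-1], mask[:, ::-1]  # flip both along the width axis
    return img, mask


def normalize(img: np.ndarray) -> np.ndarray:
    """Standardize each RGB channel with the ImageNet mean and standard deviation."""
    return (img - IMAGENET_MEAN) / IMAGENET_STD


rng = np.random.default_rng(0)
image = rng.random((512, 512, 3))
label = rng.integers(0, 5, size=(512, 512))
aug_image, aug_label = augment(image, label, rng)
model_input = normalize(aug_image)
```

Contrast changes touch only the image, whereas geometric transforms (flip, rotation) must be applied to the image and its label mask together so that labels stay aligned with pixels.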

Assessment of U-Net’s performance

The Dice coefficient (DC), a measure of the similarity between two sets, was used to evaluate segmentation accuracy. For two sets A and B, the DC ranges from 0 to 1 and is defined as $\mathrm{DC}(A, B) = \frac{2|A \cap B|}{|A| + |B|}$. The more similar the ground truth (A) and the segmentation result (B) are, i.e., the better the model performs, the closer the DC is to one. We calculated the DC for each label. Cross-validation was performed 20 times, and the median DC value was calculated.
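The DC definition above, combined with the per-pixel argmax used to obtain predicted labels, can be sketched as follows. This assumes predictions arrive as a C × H × W array of class probabilities. How classes absent from both masks were handled is not stated, so returning NaN and skipping them when summarizing is an assumption, as is the quartile helper `summarize`.

```python
import numpy as np


def dice_coefficient(a: np.ndarray, b: np.ndarray) -> float:
    """DC(A, B) = 2|A ∩ B| / (|A| + |B|) for two boolean masks."""
    denom = int(a.sum()) + int(b.sum())
    if denom == 0:
        return float("nan")  # assumption: class absent from both masks is not scored
    return 2.0 * int(np.logical_and(a, b).sum()) / denom


def per_class_dice(probs: np.ndarray, truth: np.ndarray) -> np.ndarray:
    """probs: C x H x W class probabilities; truth: H x W integer ground-truth labels."""
    pred = probs.argmax(axis=0)  # label with the highest probability at each pixel
    return np.array([dice_coefficient(pred == c, truth == c)
                     for c in range(probs.shape[0])])


def summarize(runs: np.ndarray):
    """runs: n_runs x C array of per-class DCs; returns (median, Q1, Q3) per class."""
    q1, med, q3 = np.nanpercentile(runs, [25, 50, 75], axis=0)
    return med, q1, q3


truth = np.array([[0, 0], [1, 1]])
probs = np.array([[[0.9, 0.2], [0.1, 0.1]],   # class 0
                  [[0.1, 0.8], [0.9, 0.9]]])  # class 1
print(per_class_dice(probs, truth))
```

Here one class-1 pixel is mislabeled, giving DCs of 2/3 for class 0 and 0.8 for class 1; stacking such per-class vectors from repeated runs and passing them to `summarize` yields the median (Q1, Q3) format used in Table 3.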

Agreement rate and time comparison between renal pathologists referring to and not referring to U-Net-segmented images

To evaluate the usefulness of our U-Net algorithm, we examined the agreement between two nephrologists with sufficient experience in renal pathology (R.N. and T.Z.), with and without U-Net-segmented images. For this evaluation, we selected another 15 tubulointerstitial nephritis specimens obtained through renal biopsies between 2000 and 2020 at Kanazawa University Hospital and its affiliated hospitals. As with the annotation data, we needed homogeneous samples involving only the tubulointerstitial compartments; thus, specimens of tubulointerstitial nephritis without other involvement were used to evaluate abnormal tubulointerstitial structures.

From each sample, a 2-μm section was stained with periodic acid–Schiff, and whole-slide images were created for U-Net segmentation. Each renal pathologist evaluated all the biopsy specimens twice: once without reference to the U-Net-segmented images (U-Net- group) and once with reference to them (U-Net+ group). A washout period of at least two weeks separated the U-Net- and U-Net+ assessments to avoid habituation effects. The order of evaluation was crossed: U-Net-→U-Net+ in nine cases and U-Net+→U-Net- in six cases. In each review, the renal pathologists assessed (1) the glomerular count, (2) the Banff t, ct, and ci scores [42], and (3) the percentages of tubulitis, tubular atrophy, degenerated tubules, and interstitial spaces. Each pathologist recorded the total time taken.

Statistical analysis

Intraclass correlation coefficients, ICC(2,1), were calculated for the agreement of continuous variables between the renal pathologists, and Cohen’s κ was calculated for the agreement of categorical variables. Nonparametric variables were compared between the two groups using the Mann–Whitney U test. The output areas were compared with the annotated areas using linear regression analysis, and the coefficients of determination were calculated. The significance level for all analyses was set at 0.05.
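For reference, the two agreement statistics named above can be computed as in the sketch below: ICC(2,1) (two-way random effects, absolute agreement, single rater, in the Shrout and Fleiss formulation) from an n-subjects × k-raters matrix, and unweighted Cohen’s κ. Whether the study used a weighted κ for the ordinal Banff scores is not stated; this sketch assumes the unweighted form and is not the authors' code.

```python
import numpy as np


def icc_2_1(ratings) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: n subjects x k raters matrix of continuous scores.
    """
    y = np.asarray(ratings, dtype=float)
    n, k = y.shape
    grand = y.mean()
    row_means = y.mean(axis=1, keepdims=True)
    col_means = y.mean(axis=0, keepdims=True)
    msr = k * np.sum((row_means - grand) ** 2) / (n - 1)   # between-subject mean square
    msc = n * np.sum((col_means - grand) ** 2) / (k - 1)   # between-rater mean square
    resid = y - row_means - col_means + grand
    mse = np.sum(resid ** 2) / ((n - 1) * (k - 1))         # residual mean square
    return float((msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n))


def cohen_kappa(r1, r2) -> float:
    """Unweighted Cohen's kappa for two raters' categorical scores."""
    r1, r2 = np.asarray(r1), np.asarray(r2)
    p_o = np.mean(r1 == r2)                                 # observed agreement
    p_e = sum(np.mean(r1 == c) * np.mean(r2 == c)           # chance agreement
              for c in np.union1d(r1, r2))
    if p_e == 1.0:
        return float("nan")  # undefined when both raters use a single category
    return float((p_o - p_e) / (1.0 - p_e))


# Hypothetical example: rows are specimens, columns are the two pathologists.
percent_atrophy = np.array([[10, 15], [30, 25], [50, 55], [20, 20], [70, 60]])
print(icc_2_1(percent_atrophy))
print(cohen_kappa([0, 1, 2, 1, 3], [0, 1, 2, 2, 3]))
```

ICC(2,1) penalizes systematic offsets between raters (through the between-rater mean square), which suits percentage estimates, whereas κ corrects the raw agreement of categorical scores for agreement expected by chance.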

Results

Segmentation performance of U-Net for detecting abnormal tubules

First, we performed semantic segmentation of five classes (glomeruli, normal tubules, abnormal tubules, arteries, and interstitium) to determine whether our U-Net could distinguish between normal and abnormal tubules. Representative examples of the ground truth and segmentation masks from the test set are shown in Fig 1. The multiclass segmentation performance of U-Net was evaluated using the DCs listed in Table 3. The highest DCs were obtained for the interstitium and glomeruli, normal and abnormal tubules had intermediate DCs, and the arteries had a low DC. The confusion matrix (Table 4) shows how often each class was misidentified as another class. Normal tubules were most often misidentified as the interstitium (12%) and less often as abnormal tubules (8.5%), whereas abnormal tubules were often misidentified as normal tubules (19%) or the interstitium (17%). Arteries were mostly misidentified as the interstitium (64%).

Fig 1. Representative images of ground truth and five-class segmentation using U-Net.


(A) Whole-slide image of segmentation using U-Net in a specimen with tubulointerstitial nephritis. (B) PAS-stained slide, ground truth, and segmentation using U-Net. The top row represents a normal specimen; the middle and bottom rows represent specimens with tubulointerstitial nephritis.

Table 3. Dice coefficients per class.

| Class | Five classes, median (Q1, Q3) | Eight classes, median (Q1, Q3) |
| --- | --- | --- |
| Glomeruli | 0.88 (0.55, 0.90) | 0.88 (0.56, 0.90) |
| Normal tubules | 0.76 (0.64, 0.79) | – |
| Proximal tubules | – | 0.69 (0.49, 0.74) |
| Distal tubules | – | 0.65 (0.53, 0.68) |
| Abnormal tubules | 0.67 (0.56, 0.69) | – |
| Atrophied tubules | – | 0.55 (0.38, 0.59) |
| Tubulitis | – | 0.30 (0.094, 0.35) |
| Degenerated tubules | – | 0.48 (0.29, 0.54) |
| Arteries | 0.059 (0, 0.16) | 0.027 (0, 0.29) |
| Interstitium | 0.81 (0.74, 0.83) | 0.81 (0.74, 0.82) |

Q1, Q3: first and third quartiles (the bounds of the interquartile range)

Table 4. Confusion matrix for five-class segmentation using U-Net.

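| Ground truth (rows) / prediction (columns) | Interstitium | Glomeruli | Normal tubules | Arteries | Abnormal tubules |
| --- | --- | --- | --- | --- | --- |
| Interstitium | 0.81 | 0.0013 | 0.12 | 0.0012 | 0.054 |
| Glomeruli | 0.11 | 0.83 | 0.034 | 0.0048 | 0.022 |
| Normal tubules | 0.12 | 0.0021 | 0.79 | 0.00017 | 0.085 |
| Arteries | 0.64 | 0.096 | 0.063 | 0.096 | 0.10 |
| Abnormal tubules | 0.17 | 0.0039 | 0.19 | 0.0005 | 0.63 |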

The ground truth labels are given vertically, and the segmentation model’s predictions are given horizontally.
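Because rows are ground-truth labels and columns are predictions, each row of such a confusion matrix sums to approximately one. A minimal sketch of a row-normalized pixel confusion matrix follows; pooling all test pixels into a single matrix is an assumption, since the text does not say whether pixels were pooled or per-image matrices were averaged.

```python
import numpy as np


def normalized_confusion(truth: np.ndarray, pred: np.ndarray, n_classes: int) -> np.ndarray:
    """Pixel confusion matrix with rows = ground truth and columns = prediction,
    each row divided by the number of ground-truth pixels of that class."""
    idx = truth.ravel().astype(np.int64) * n_classes + pred.ravel().astype(np.int64)
    counts = np.bincount(idx, minlength=n_classes * n_classes).reshape(n_classes, n_classes)
    row_sums = counts.sum(axis=1, keepdims=True)
    # Classes absent from the ground truth keep an all-zero row instead of dividing by zero.
    return np.divide(counts, row_sums, out=np.zeros(counts.shape), where=row_sums > 0)


# Assumption: label masks from all test images are pooled before normalization.
truth = np.array([[0, 0], [1, 1]])
pred = np.array([[0, 1], [1, 1]])
print(normalized_confusion(truth, pred, n_classes=3))
```

The diagonal then gives the fraction of each class's pixels predicted correctly, and off-diagonal entries give the misidentification rates quoted in the text.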

Detection of different types of abnormal tubules using U-Net

Next, we performed semantic segmentation of eight classes (glomeruli, proximal tubules, distal tubules, atrophied tubules, tubulitis, degenerated tubules, arteries, and interstitium) to determine whether our U-Net could distinguish different types of abnormal tubules in detail. Representative examples of the ground truth and segmentation masks from the test set are shown in Fig 2. The multiclass segmentation performance of U-Net was evaluated using the DCs listed in Table 3. As in the five-class segmentation, the highest DCs were obtained for the interstitium and glomeruli. Proximal, distal, atrophied, and degenerated tubules had intermediate DCs, whereas arteries and tubulitis had low DCs. In the confusion matrix (Table 5), proximal tubules were misidentified as the interstitium (13%) or as degenerated tubules (11%). Distal tubules were misidentified as the interstitium (14%). Arteries were mostly misidentified as the interstitium (60%). Tubulitis was misidentified as the interstitium (21%), distal tubules (15%), or degenerated tubules (15%). Degenerated tubules were misidentified as proximal tubules (17%) or the interstitium (16%). Atrophied tubules were misidentified as the interstitium (17%) or as degenerated tubules (11%).

Fig 2. Representative images of ground truth and eight-class segmentation using U-Net.


(A) Whole-slide image of segmentation using U-Net in a specimen with tubulointerstitial nephritis. (B) PAS-stained slide, ground truth, and segmentation using U-Net. The top row represents a normal specimen, and the second through fourth rows represent specimens with tubulointerstitial nephritis.

Table 5. Confusion matrix for eight-class segmentation using U-Net.

| Ground truth (rows) / prediction (columns) | Interstitium | Glomeruli | Proximal tubules | Distal tubules | Arteries | Tubulitis | Degenerated tubules | Atrophic tubules |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Interstitium | 0.82 | 0.015 | 0.077 | 0.036 | 0.0024 | 0.011 | 0.029 | 0.014 |
| Glomeruli | 0.083 | 0.85 | 0.030 | 0.013 | 0.0023 | 0.0050 | 0.012 | 0.0053 |
| Proximal tubules | 0.13 | 0.0030 | 0.70 | 0.033 | 0.00067 | 0.017 | 0.11 | 0.011 |
| Distal tubules | 0.14 | 0.0050 | 0.067 | 0.67 | 0.00096 | 0.077 | 0.017 | 0.015 |
| Arteries | 0.60 | 0.086 | 0.020 | 0.036 | 0.14 | 0.023 | 0.039 | 0.064 |
| Tubulitis | 0.21 | 0.00071 | 0.087 | 0.15 | 0.027 | 0.28 | 0.15 | 0.12 |
| Degenerated tubules | 0.16 | 0.0055 | 0.17 | 0.015 | 0.0033 | 0.065 | 0.52 | 0.054 |
| Atrophic tubules | 0.17 | 0.0026 | 0.063 | 0.035 | 0.0012 | 0.085 | 0.11 | 0.53 |

The ground truth labels are given vertically, and the segmentation model’s predictions are given horizontally.

We also quantified the area of each class using U-Net to determine whether the algorithm could precisely estimate the areas of normal and abnormal tubulointerstitial lesions (Fig 3), because accurate area estimates would directly support a reasonable prediction of renal prognosis. The annotated and predicted areas were strongly correlated for the glomeruli, proximal tubules, distal tubules, and interstitium. The abnormal tubule classes (tubulitis, degenerated tubules, and atrophied tubules) and the arteries were moderately correlated between annotations and segmentation model predictions.
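The area comparison above reduces to counting pixels per class in each annotated and predicted mask and fitting a least-squares line across images. In the sketch below, the masks are hypothetical toy data and areas are in pixels by default; at the capture resolution (about 500 μm over about 1,000 pixels) one pixel would cover roughly 0.25 μm², but the conversion the authors used is not reported.

```python
import numpy as np


def class_areas(mask: np.ndarray, n_classes: int, area_per_pixel: float = 1.0) -> np.ndarray:
    """Area of each class in one label mask (pixel count x area per pixel)."""
    return np.bincount(mask.ravel(), minlength=n_classes) * area_per_pixel


def r_squared(x, y) -> float:
    """Coefficient of determination of the least-squares line y = slope * x + intercept."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    return float(1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2))


# Hypothetical annotated vs. predicted masks for three test images (3 classes).
annotated = [np.array([[0, 1], [1, 2]]), np.array([[1, 1], [1, 2]]), np.array([[0, 0], [0, 1]])]
predicted = [np.array([[0, 1], [2, 2]]), np.array([[1, 1], [1, 1]]), np.array([[0, 0], [1, 1]])]
c = 1  # class whose areas are compared
x = [class_areas(m, 3)[c] for m in annotated]
y = [class_areas(m, 3)[c] for m in predicted]
print(r_squared(x, y))
```

Running the same comparison once per class gives one coefficient of determination per class, matching the per-class panels described for Fig 3.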

Fig 3. Correlation of areas between annotations and segmentation model predictions.


There were high correlations in the interstitium, glomeruli, proximal tubules, and distal tubules. Tubulitis, degenerated tubules, atrophied tubules, and arteries were moderately correlated between annotations and segmentation model predictions.

Application of U-Net-segmented images to diagnostic situations by renal pathologists

Finally, we evaluated the usefulness of U-Net-segmented images as an aid in routine diagnostic work by renal pathologists. We investigated whether referring to five-class U-Net-segmented images would improve the agreement between two renal pathologists in evaluating tubulointerstitial findings in renal biopsy specimens and shorten the time required for evaluation.

The ICCs for the glomerular count were 0.97 and 0.95 in the U-Net- and U-Net+ groups, respectively (Table 6). The Cohen’s κ values for the Banff t, ct, and ci scores were similarly high in both groups, ranging from 0.91 to 0.92 in the U-Net- group and from 0.81 to 0.94 in the U-Net+ group. The ICCs for the quantitative evaluation of the areas of tubulitis, tubular atrophy, degenerated tubules, and interstitium were low in the U-Net- group (0.14–0.59). In the U-Net+ group, however, the ICCs improved significantly (0.52–0.81), except for degenerated tubules (0.17). Furthermore, referring to the U-Net-segmented images shortened the median evaluation time from 317 s to 214 s (U-Net+ group, 214 s [Q1 180, Q3 280] vs. U-Net- group, 317 s [Q1 260, Q3 371]; p = 0.044).

Table 6. Agreement ratios between renal pathologists with and without U-Net-segmented images.

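| Measure | Statistic | U-Net- group | U-Net+ group |
| --- | --- | --- | --- |
| Glomerular count | ICC | 0.97 | 0.95 |
| t score | κ | 0.92 | 0.90 |
| ct score | κ | 0.91 | 0.95 |
| ci score | κ | 0.91 | 0.82 |
| %Tubulitis | ICC | 0.14 | 0.52 |
| %Tubular atrophy | ICC | 0.28 | 0.76 |
| %Degenerated tubules | ICC | 0.18 | 0.17 |
| %Interstitial space | ICC | 0.59 | 0.81 |

ICC, intraclass correlation coefficient; κ, Cohen’s kappa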

Discussion

In this study, we developed a U-Net-based segmentation model to classify multiple compartments of renal biopsy specimens, primarily normal and abnormal tubules. Our U-Net distinguished normal from abnormal tubules with reasonable accuracy (DCs of 0.76 and 0.67, respectively), although identifying the exact type of abnormal tubule remained challenging. Nevertheless, our U-Net was suitable for the quantitative evaluation of the area of each class and helped renal pathologists evaluate tubulointerstitial lesions in renal biopsy specimens.

In this study, we annotated a large number of tubular components to discriminate the types of abnormal tubules, adopting U-Net, which has been widely used for the semantic segmentation of kidney histology [4, 17, 24, 31, 32]. Hermsen et al. achieved multiclass segmentation with U-Net, obtaining high DCs for multiple structures, using whole-slide images from multiple institutions [4]. Normal tubules were detected with high accuracy, but the DCs for atrophic and undefined tubules were low (0.49 and 0.30, respectively) [4]. In this study, we prepared, to our knowledge, the largest amount of annotated data for different types of normal and abnormal tubules, and the detection of atrophic tubules improved (DC, 0.55). Degenerated tubules were detected moderately well, but the model’s performance in detecting tubulitis was low. This may reflect the diversity of abnormal tubular findings and the fact that different types of abnormal tubular findings often coexist within the same tubule.

The second notable point of the present study is that we improved U-Net by implementing finetuning and Dice cross-entropy. For finetuning, we used the VGG-16 model [41], pretrained on the ImageNet dataset, as the U-Net encoder. Finetuning did not change the accuracy but shortened training: approximately 150 epochs were needed to reach high accuracy without finetuning, compared with approximately 90 epochs with finetuning. In addition, we adopted Dice cross-entropy, a combination of Dice loss and cross-entropy [39, 40], as the loss function. In our preliminary study, Dice cross-entropy yielded higher accuracy than other loss functions, such as focal loss and cross-entropy. To our knowledge, Dice cross-entropy has rarely been used in renal pathology studies.

Recently, several studies have detected tubulointerstitial abnormalities using various methodologies. Ginley et al. developed a DeepLab v2-based algorithm to assess interstitial fibrosis and tubular atrophy (IFTA) and glomerulosclerosis in native and transplanted kidneys [10]. They achieved automated detection and quantification of IFTA lesions by treating IFTA as a single class, without considering its individual compartments. Bouteldja et al. performed multiclass segmentation of healthy kidneys and kidneys from five murine disease models using U-Net [31] and extracted tubular dilation and atrophy by measuring the tubular diameter. Yi et al. combined a mask region-based CNN and U-Net to recognize normal and abnormal tissue compartments in transplant kidneys, including those underlying the Banff t, ci, and ct scores, and applied their algorithms to the prediction of graft survival [24]. Furthermore, Salvi et al. employed two different U-Nets, denoted TSC and TCC, and obtained excellent performance in tubular segmentation (DC = 0.92) [11].
Overall, determining the types of abnormal tubules with U-Net alone remains challenging. Beyond increasing the number and validity of annotations, improved deep learning-based methods and their combination with clinical information will likely be required to detect different types of abnormal tubules more accurately and to strengthen the clinical significance of these findings.

Another noteworthy aspect of this study is that referring to U-Net-segmented images can help renal pathologists evaluate tubulointerstitial lesions accurately and rapidly. The five-class segmented images were visually easier to interpret and more accurate than the eight-class segmented images. Therefore, the five-class images were used to assist renal pathologists in evaluating renal biopsy specimens. Agreement on the glomerular count and on the tubulointerstitial components of Banff scoring was high both with and without the U-Net-segmented images. Interestingly, however, U-Net significantly improved the interpathologist agreement ratios for the quantitative evaluation of tubular abnormalities, which are more difficult for renal pathologists to assess, except for degenerated tubules. This may reflect the high correlation between the U-Net-segmented and annotated areas in each class. Abnormal tubulointerstitial areas are associated with worse renal prognoses in various kidney diseases [26, 33–37]. Accurate assessment and quantification of abnormal tubular areas would therefore improve the prediction of renal prognosis. Furthermore, referring to the U-Net-segmented images shortened the time required for evaluation, which may reduce the burden on renal pathologists [10]. Building on this, an application for automated detection and quantification would help renal pathologists estimate renal prognosis promptly. In addition, linking U-Net-based segmentation with clinical information could predict renal prognosis more precisely. This could substantially improve the estimation of renal prognosis compared with the current semi-quantitative assessment of tubulointerstitial compartments, both in native kidney specimens [43] and in the Banff grading system for kidney allografts [42].
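The correlation between annotated and U-Net output areas can be computed in two steps: measure each class's area in every image, then correlate the annotated series with the predicted series. The following sketch assumes Pearson's correlation coefficient and flattened integer label maps. The correlation method, the helper names `class_area` and `pearson_r`, the `pixel_area` scale, and the toy masks are illustrative assumptions rather than details reported in this section.

```python
import math
from typing import Sequence


def class_area(label_map: Sequence[int], cls: int, pixel_area: float = 1.0) -> float:
    """Area of one class in a flattened label map: pixel count times the area of one pixel."""
    return sum(1 for v in label_map if v == cls) * pixel_area


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length samples (undefined if either has zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical example: three tiny images, with class 1 standing in for one tubule class.
annotated = [[1, 1, 0, 0], [1, 1, 1, 0], [1, 0, 0, 0]]
predicted = [[1, 0, 0, 0], [1, 1, 1, 1], [1, 0, 0, 0]]
ann_areas = [class_area(m, 1) for m in annotated]   # [2.0, 3.0, 1.0]
pred_areas = [class_area(m, 1) for m in predicted]  # [1.0, 4.0, 1.0]
r = pearson_r(ann_areas, pred_areas)                # about 0.866
```

In practice, one such coefficient would be computed per class over all test images. A value close to 1 indicates that the segmented area tracks the annotated area even when pixel-level overlap (and hence the DC) is imperfect.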

This study has several limitations. First, our U-Net did not recognize tubules as single structures. As a result, different normal and abnormal classes were mixed within a single tubule, which lowered the DCs. Second, only a small number of renal pathologists participated in validating the usefulness of referring to U-Net-segmented images. Finally, our U-Net had markedly low accuracy for the “arteries” class because few arteries were annotated. Specifically, 256 arteries were annotated in the 311 sampled regions, 80% of which were used for training and the remaining 20% for testing. This is insufficient for U-Net to learn to detect arteries in the test set. In addition, arteries were extremely small compared with other compartments: the area of “arteries” was approximately one-fortieth that of “interstitium.” Thus, “arteries” tended to be misrecognized as “interstitium.” This study focused on tubulointerstitial structures, and further studies that analyze entire renal biopsy specimens, including the arteries, are required.
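The 80/20 split described in this study was made at the level of regions. Reviewer #2 later asked whether regions from the same patient could appear in both the training and test sets. The sketch below shows one way to rule that out by shuffling patients rather than regions. It is a hypothetical illustration, not the procedure reported in this article. The function `split_regions`, the region and patient identifiers, and the fixed seed are all assumptions.

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_regions(
    region_ids: Sequence[str],
    patient_of: Dict[str, str],
    train_frac: float = 0.8,
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Split regions about 80/20 so that all regions from one patient land on the same side."""
    # Shuffle patients (not regions) with a seeded RNG so the split is reproducible.
    patients = sorted(set(patient_of[r] for r in region_ids))
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_train = round(len(patients) * train_frac)
    train_patients = set(patients[:n_train])
    # Every region inherits the assignment of its patient, so no patient spans both sets.
    train = [r for r in region_ids if patient_of[r] in train_patients]
    test = [r for r in region_ids if patient_of[r] not in train_patients]
    return train, test
```

Because whole patients are assigned, the region-level ratio only approximates 80/20. With 21 patients, `round(21 * 0.8)` places 17 patients in training and 4 in testing, and the actual region counts depend on how many regions each patient contributes.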

In conclusion, our deep learning algorithm assisted renal pathologists in detecting and quantifying different types of normal and abnormal tubules in renal biopsy specimens. However, the current algorithm is still insufficient for the automated detection and classification of different types of abnormal tubules, so its predictive accuracy must be improved. Nevertheless, the current algorithm can be expected to help renal pathologists evaluate renal biopsy specimens accurately and rapidly, thereby contributing to more appropriate clinical decisions.

Supporting information

S1 Fig. Representative images of ground truth and eight-class segmentation using various deep learning methods.

PAS-stained slide, ground truth, and segmentation using U-Net. The top row represents a normal specimen, and the second through fourth rows represent specimens with tubulointerstitial nephritis.

(TIF)

S1 Table. Dice coefficients of various deep learning methods.

(DOCX)

S1 File. Dataset of the present study.

(XLSX)

Acknowledgments

We would like to thank Yuya Honda and Hiroka Furuya for their support in annotating the images. We would also like to thank Editage (www.editage.com) for English language editing.

Data Availability

All relevant data are within the paper and its Supporting Information files.

Funding Statement

The authors received no specific funding for this work.

References

  • 1. GBD Chronic Kidney Disease Collaboration. Global, regional, and national burden of chronic kidney disease, 1990–2017: A systematic analysis for the Global Burden of Disease Study 2017. Lancet. 2020;395: 709–733. doi: 10.1016/S0140-6736(20)30045-3
  • 2. Wang V, Vilme H, Maciejewski ML, Boulware LE. The economic burden of chronic kidney disease and end-stage renal disease. Semin Nephrol. 2016;36: 319–330. doi: 10.1016/j.semnephrol.2016.05.008
  • 3. Barisoni L, Lafata KJ, Hewitt SM, Madabhushi A, Balis UGJ. Digital pathology and computational image analysis in nephropathology. Nat Rev Nephrol. 2020;16: 669–685. doi: 10.1038/s41581-020-0321-6
  • 4. Hermsen M, de Bel T, den Boer M, Steenbergen EJ, Kers J, Florquin S, et al. Deep learning-based histopathologic assessment of kidney tissue. J Am Soc Nephrol. 2019;30: 1968–1979. doi: 10.1681/ASN.2019020144
  • 5. Sheehan S, Mawe S, Cianciolo RE, Korstanje R, Mahoney JM. Detection and classification of novel renal histologic phenotypes using deep neural networks. Am J Pathol. 2019;189: 1786–1796. doi: 10.1016/j.ajpath.2019.05.019
  • 6. Bukowy JD, Dayton A, Cloutier D, Manis AD, Staruschenko A, Lombard JH, et al. Region-based convolutional neural nets for localization of glomeruli in trichrome-stained whole kidney sections. J Am Soc Nephrol. 2018;29: 2081–2088. doi: 10.1681/ASN.2017111210
  • 7. Kannan S, Morgan LA, Liang B, Cheung MG, Lin CQ, Mun D, et al. Segmentation of glomeruli within trichrome images using deep learning. Kidney Int Rep. 2019;4: 955–962. doi: 10.1016/j.ekir.2019.04.008
  • 8. Gupta L, Klinkhammer BM, Boor P, Merhof D, Gadermayr M. Iterative learning to make the most of unlabeled and quickly obtained labeled data in histology. Proc Mach Learn Res. 2019;102: 215–224.
  • 9. Marsh JN, Liu T, Wilson PC, Swamidass SJ, Gaut JP. Development and validation of a deep learning model to quantify glomerulosclerosis in kidney biopsy specimens. JAMA Netw Open. 2021;4: e2030939. doi: 10.1001/jamanetworkopen.2020.30939
  • 10. Ginley B, Jen KY, Han SS, Rodrigues L, Jain S, Fogo AB, et al. Automated computational detection of interstitial fibrosis, tubular atrophy, and glomerulosclerosis. J Am Soc Nephrol. 2021;32: 837–850. doi: 10.1681/ASN.2020050652
  • 11. Salvi M, Mogetta A, Gambella A, Molinaro L, Barreca A, Papotti M, et al. Automated assessment of glomerulosclerosis and tubular atrophy using deep learning. Comput Med Imaging Graph. 2021;90: 101930. doi: 10.1016/j.compmedimag.2021.101930 [Epub 2021 May 2].
  • 12. Altini N, Cascarano GD, Brunetti A, Marino F, Rocchetti MT, Matino S, et al. Semantic segmentation framework for glomeruli detection and classification in kidney histological sections. Electronics. 2020;9: 503. doi: 10.3390/electronics9030503
  • 13. Bueno G, Fernandez-Carrobles MM, Gonzalez-Lopez L, Deniz O. Glomerulosclerosis identification in whole slide images using semantic segmentation. Comput Methods Programs Biomed. 2020;184: 105273. doi: 10.1016/j.cmpb.2019.105273 [Epub 2019 Dec 19].
  • 14. Lutnick B, Ginley B, Govind D, McGarry SD, LaViolette PS, Yacoub R, et al. An integrated iterative annotation technique for easing neural network training in medical image analysis. Nat Mach Intell. 2019;1: 112–119. doi: 10.1038/s42256-019-0018-3
  • 15. Govind D, Santo BA, Ginley B, Yacoub R, Rosenberg AZ, Jen KY, et al. Automated detection and quantification of Wilms’ tumor 1-positive cells in murine diabetic kidney disease. Proc SPIE Int Soc Opt Eng. 2021;11603. doi: 10.1117/12.2581387
  • 16. Govind D, Becker JU, Miecznikowski J, Rosenberg AZ, Dang J, Tharaux PL, et al. PodoSighter: A cloud-based tool for label-free podocyte detection in kidney whole-slide images. J Am Soc Nephrol. 2021;32: 2795–2813. doi: 10.1681/ASN.2021050630
  • 17. Zimmermann M, Klaus M, Wong MN, Thebille AK, Gernhold L, Kuppe C, et al. Deep learning-based molecular morphometrics for kidney biopsies. JCI Insight. 2021;6: e144779. doi: 10.1172/jci.insight.144779
  • 18. Ginley B, Lutnick B, Jen KY, Fogo AB, Jain S, Rosenberg A, et al. Computational segmentation and classification of diabetic glomerulosclerosis. J Am Soc Nephrol. 2019;30: 1953–1967. doi: 10.1681/ASN.2018121259
  • 19. Sato N, Uchino E, Kojima R, Sakuragi M, Hiragi S, Minamiguchi S, et al. Evaluation of kidney histological images using unsupervised deep learning. Kidney Int Rep. 2021;6: 2445–2454. doi: 10.1016/j.ekir.2021.06.008
  • 20. Zeng C, Nan Y, Xu F, Lei Q, Li F, Chen T, et al. Identification of glomerular lesions and intrinsic glomerular cell types in kidney diseases via deep learning. J Pathol. 2020;252: 53–64. doi: 10.1002/path.5491
  • 21. Chagas P, Souza L, Araújo I, Aldeman N, Duarte A, Angelo M, et al. Classification of glomerular hypercellularity using convolutional features and support vector machine. Artif Intell Med. 2020;103: 101808. doi: 10.1016/j.artmed.2020.101808
  • 22. Uchino E, Suzuki K, Sato N, Kojima R, Tamada Y, Hiragi S, et al. Classification of glomerular pathological findings using deep learning and nephrologist-AI collective intelligence approach. Int J Med Inform. 2020;141: 104231. doi: 10.1016/j.ijmedinf.2020.104231
  • 23. Kers J, Bülow RD, Klinkhammer BM, Breimer GE, Fontana F, Abiola AA, et al. Deep learning-based classification of kidney transplant pathology: A retrospective, multicentre, proof-of-concept study. Lancet Digit Health. 2022;4: e18–e26. doi: 10.1016/S2589-7500(21)00211-9
  • 24. Yi Z, Salem F, Menon MC, Keung K, Xi C, Hultin S, et al. Deep learning identified pathological abnormalities predictive of graft loss in kidney transplant biopsies. Kidney Int. 2022;101: 288–298. doi: 10.1016/j.kint.2021.09.028
  • 25. Hermsen M, Volk V, Bräsen JH, Geijs DJ, Gwinner W, Kers J, et al. Quantitative assessment of inflammatory infiltrates in kidney transplant biopsies using multiplex tyramide signal amplification and deep learning. Lab Invest. 2021;101: 970–982. doi: 10.1038/s41374-021-00601-w
  • 26. Kolachalama VB, Singh P, Lin CQ, Mun D, Belghasem ME, Henderson JM, et al. Association of pathological fibrosis with renal survival using deep neural networks. Kidney Int Rep. 2018;3: 464–475. doi: 10.1016/j.ekir.2017.11.002
  • 27. Zheng Y, Cassol CA, Jung S, Veerapaneni D, Chitalia VC, Ren KYM, et al. Deep-learning-driven quantification of interstitial fibrosis in digitized kidney biopsies. Am J Pathol. 2021;191: 1442–1453. doi: 10.1016/j.ajpath.2021.05.005
  • 28. Salvi M, Mogetta A, Meiburger KM, Gambella A, Molinaro L, Barreca A, et al. Karpinski score under digital investigation: A fully automated segmentation algorithm to identify vascular and stromal injury of donors’ kidneys. Electronics. 2020;9: 1644. doi: 10.3390/electronics9101644
  • 29. Ligabue G, Pollastri F, Fontana F, Leonelli M, Furci L, Giovanella S, et al. Evaluation of the classification accuracy of the kidney biopsy direct immunofluorescence through convolutional neural networks. Clin J Am Soc Nephrol. 2020;15: 1445–1454. doi: 10.2215/CJN.03210320
  • 30. Eccher A, Neil D, Ciangherotti A, Cima L, Boschiero L, Martignoni G, et al. Digital reporting of whole-slide images is safe and suitable for assessing organ quality in preimplantation renal biopsies. Hum Pathol. 2016;47: 115–120. doi: 10.1016/j.humpath.2015.09.012
  • 31. Bouteldja N, Klinkhammer BM, Bülow RD, Droste P, Otten SW, Freifrau von Stillfried S, et al. Deep learning-based segmentation and quantification in experimental kidney histopathology. J Am Soc Nephrol. 2021;32: 52–68. doi: 10.1681/ASN.2020050597
  • 32. Jayapandian CP, Chen Y, Janowczyk AR, Palmer MB, Cassol CA, Sekulic M, et al. Development and evaluation of deep learning-based segmentation of histologic structures in the kidney cortex with multiple histologic stains. Kidney Int. 2021;99: 86–101. doi: 10.1016/j.kint.2020.07.044
  • 33. Srivastava A, Palsson R, Kaze AD, Chen ME, Palacios P, Sabbisetti V, et al. The prognostic value of histopathologic lesions in native kidney biopsy specimens: Results from the Boston kidney biopsy cohort study. J Am Soc Nephrol. 2018;29: 2213–2224. doi: 10.1681/ASN.2017121260
  • 34. Furuichi K, Shimizu M, Yuzawa Y, Hara A, Toyama T, Kitamura H, et al. Clinicopathological analysis of biopsy-proven diabetic nephropathy based on the Japanese classification of diabetic nephropathy. Clin Exp Nephrol. 2018;22: 570–582. doi: 10.1007/s10157-017-1485-7
  • 35. Bajema IM, Wilhelmus S, Alpers CE, Bruijn JA, Colvin RB, Cook HT, et al. Revision of the International Society of Nephrology/Renal Pathology Society classification for lupus nephritis: Clarification of definitions, and modified National Institutes of Health activity and chronicity indices. Kidney Int. 2018;93: 789–796. doi: 10.1016/j.kint.2017.11.023
  • 36. Park KS, Park SJ, Park H, Kim M, Park J, Chung HC, et al. Association of baseline histopathology and kidney donor risk index with graft outcomes in deceased donor kidney transplantation. Clin Nephrol. 2019;91: 363–369. doi: 10.5414/CN109639
  • 37. Valluri A, Hetherington L, Mcquarrie E, Fleming S, Kipgen D, Geddes CC, et al. Acute tubulointerstitial nephritis in Scotland. QJM. 2015;108: 527–532. doi: 10.1093/qjmed/hcu236
  • 38. Ronneberger O, Fischer P, Brox T. U-net: Convolutional networks for biomedical image segmentation. In: Navab N, Hornegger J, Wells W, Frangi A, editors. Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015. Lect Notes Comput Sci, vol 9351. Cham: Springer; 2015. doi: 10.1007/978-3-319-24574-4_28
  • 39. Isensee F, Petersen J, Klein A, Zimmerer D, Jaeger PF, Kohl S, et al. nnU-Net: Self-adapting framework for U-Net-based medical image segmentation. Available from: arXiv:1809.10486v1; 2018.
  • 40. Patravali J, Jain S, Chilamkurthy S. 2D-3D fully convolutional neural networks for cardiac MR segmentation. Available from: arXiv:1707.09813v1; 2017.
  • 41. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. Available from: arXiv:1409.1556v6; 2014.
  • 42. Roufosse C, Simmonds N, Clahsen-van Groningen M, Haas M, Henriksen KJ, Horsfield C, et al. A 2018 reference guide to the Banff classification of renal allograft pathology. Transplantation. 2018;102: 1795–1814. doi: 10.1097/TP.0000000000002366
  • 43. Sethi S, D’Agati VD, Nast CC, Fogo AB, De Vriese AS, Markowitz GS, et al. A proposal for standardized grading of chronic changes in native kidney biopsy specimens. Kidney Int. 2017;91: 787–789. doi: 10.1016/j.kint.2017.01.002

Decision Letter 0

Franziska Theilig

6 Apr 2022

PONE-D-22-02424

Evaluating tubulointerstitial compartments in renal biopsy specimens using a deep learning-based approach for classifying normal and abnormal tubules

PLOS ONE

Dear Dr. Kawano,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by May 21 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Franziska Theilig

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. In the ethics statement in the Methods and online submission information, please ensure that you have specified (1) whether consent was informed and (2) what type you obtained (for instance, written or verbal, and if verbal, how it was documented and witnessed). If your study included minors, state whether you obtained consent from parents or guardians. If the need for consent was waived by the ethics committee, please include this information.

If you are reporting a retrospective study of medical records or archived samples, please ensure that you have discussed whether all data were fully anonymized before you accessed them and/or whether the IRB or ethics committee waived the requirement for informed consent. If patients provided informed written consent to have data from their medical records used in research, please include this information.

3. Thank you for stating the following financial disclosure:

“The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.”

At this time, please address the following queries:

a)        Please clarify the sources of funding (financial or material support) for your study. List the grants or organizations that supported your study, including funding received from your institution.

b)        State what role the funders took in the study. If the funders had no role in your study, please state: “The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.”

c)        If any authors received a salary from any of your funders, please state which authors and which funders.

d)        If you did not receive any funding for this study, please state: “The authors received no specific funding for this work.”

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

4. In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability.

Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. Any potentially identifying patient information must be fully anonymized.

Important: If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail. Please see our guidelines for more information on what we consider unacceptable restrictions to publicly sharing data: http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access.

We will update your Data Availability statement to reflect the information you provide in your cover letter.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors present a deep learning method for the segmentation of renal structures in histopathological images. The manuscript is easy to follow and results are sound. My comments are listed below:

- Background and related works: several papers have been published on this topic (e.g., doi: 10.1016/j.compmedimag.2021.101930, doi: 10.3390/electronics9030503, doi: 10.1016/j.cmpb.2019.105273, doi: 10.3390/electronics9101644). The authors should at least include these references in the article. To give the reader an idea of current approaches to assessing kidney disease (glomerulosclerosis, tubular atrophy, fibrosis, etc.), the authors could include a table summarizing all state-of-the-art methods.

- Novelty: it is unclear where the novelty lies in the proposed approach, since a well-known segmentation network (UNET) is used for the segmentation task. Was any particular training technique used? Was any kind of pre- or post-processing employed? The authors should highlight the technical novelty (if any) of the work.

- Page 7, Line 106: it is unclear when five classes are used and when eight classes are employed.

- Page 10, Line 140: please specify what kind of operation is performed on RGB images (contrast adjustment).

- Future work?

Reviewer #2: This study sought to distinguish between different kinds of renal tissues on pathology, particularly normal and abnormal tubules using deep learning. To that end, they trained and validated a U-Net based segmentation model. Next, they evaluated the agreement between two pathologists for different tissue types (both with and without the output of the segmentation), as well as the time it took for evaluation.

1) Abstract: Line 47: “whereas the arteries and tubulitis.” Do you mean “arteries and tubules” or do you mean to refer to the pathological condition of “tubulitis?” I presume you mean the latter, but the wording here is a bit confusing when first read, as there appears to be a switch between anatomical structures and a pathological condition.

2) Abstract: Line 49: “The pathological concordance for the glomerular count, Banff t, ct, and ci scores remained high with or without the segmented images.” You may want to clarify if you are referring to the Banff Classification of Renal Allograft Pathology (I presume).

3) Introduction: Line 83: “Because tubulointerstitial abnormalities significantly predict the outcome of renal diseases.” I would consider giving a few examples of these diseases.

4) Methods: Line 95: The Introduction talks about renal diseases in general, but here very specific patients were selected: "We used formalin-fixed, paraffin-embedded needle-core biopsies obtained from 21 patients (7 patients 1 h after renal transplantation and 14 patients with tubulointerstitial nephritis)." It would be helpful to provide an explanation of why these particular patients were selected.

5) Methods, Line 110: “The annotations were carried out by a nephrologist with sufficient experience in renal pathology (S.H.).” Did you consider having more than one nephrologist with renal pathology experience label some of the images to determine their concordance?

6) Methods, Line 133: “We compared the segmentation models FCN, U-Net, PSP-Net, and Deeplab v3 in advance, and we chose U-Net as it was the most suitable for our preliminary data.” Consider citing these other models. Also, please clarify what you mean by “it was the most suitable for our preliminary data.” Did it have the best performance?

7) Methods, Line 135: “To train the model, we used 80% of the prepared images, which were randomly selected, and the remaining 20% were used to evaluate the model’s performance.” Earlier, you state that from 21 kidney specimens, 311 regions were randomly selected. Did regions from the same patient ever end up in both the training set and the test set?

8) Methods, Line 159: “For this evaluation, we selected another 15 specimens of tubulointerstitial nephritis.” Like #4, it would be helpful to have a brief explanation of why this patient population was selected (as opposed to the one referred to earlier).

9) Table 4: Why would you say that the arteries were so frequently identified as interstitium?

10) Results: Line 230, Line 231, Line 234, Figure 3: Please clarify what you mean by “renal outcome” and “output.” Also, in Figure 3, please consider labeling the y-axis with units.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Massimo Salvi

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2022 Jul 11;17(7):e0271161. doi: 10.1371/journal.pone.0271161.r002

Author response to Decision Letter 0


20 May 2022

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

Response:

We appreciate your comment. Accordingly, we have double-checked and confirmed that our revised manuscript meets PLOS ONE’s style requirements. We have also ensured that all our submission files are named according to the PLOS ONE file naming requirements.

2. In the ethics statement in the Methods and online submission information, please ensure that you have specified (1) whether consent was informed and (2) what type you obtained (for instance, written or verbal, and if verbal, how it was documented and witnessed). If your study included minors, state whether you obtained consent from parents or guardians. If the need for consent was waived by the ethics committee, please include this information.

If you are reporting a retrospective study of medical records or archived samples, please ensure that you have discussed whether all data were fully anonymized before you accessed them and/or whether the IRB or ethics committee waived the requirement for informed consent. If patients provided informed written consent to have data from their medical records used in research, please include this information.

Response:

This study was approved by the Ethical Committee of Kanazawa University (approval No. 2020-178). The ethics committee waived the requirement for obtaining informed consent from the participants because our study design is retrospective and does not involve any further tests or treatments of the participants. In addition, all data were fully anonymized before we accessed them. Further, all participants have access to detailed information about the study, including the purpose, subjects, and content, which is available on our website. All subjects were also free to withdraw from the study at any time by submitting a written form. All these processes were approved by the Ethical Committee of Kanazawa University. We included these explanations in the Methods section (lines 117–124) and online submission information.

3. Thank you for stating the following financial disclosure:

“The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.”

At this time, please address the following queries:

a) Please clarify the sources of funding (financial or material support) for your study. List the grants or organizations that supported your study, including funding received from your institution.

b) State what role the funders took in the study. If the funders had no role in your study, please state: “The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.”

c) If any authors received a salary from any of your funders, please state which authors and which funders.

d) If you did not receive any funding for this study, please state: “The authors received no specific funding for this work.”

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

Response:

We apologize for the incorrect statement; in fact, the present study had no funders. Thus, we would like to amend the statement to “The authors received no specific funding for this work.” We included it in the cover letter as well.

4. In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability.

Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. Any potentially identifying patient information must be fully anonymized.

Important: If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail. Please see our guidelines for more information on what we consider unacceptable restrictions to publicly sharing data: http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access.

We will update your Data Availability statement to reflect the information you provide in your cover letter.

Response:

We appreciate your comment. We have added our dataset as S1 File to improve the transparency of the present study.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

________________________________________

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

________________________________________

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: No

________________________________________

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

________________________________________

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors present a deep learning method for the segmentation of renal structures in histopathological images. The manuscript is easy to follow and results are sound. My comments are listed below:

Response:

We appreciate your detailed review and comments. We have revised our manuscript according to your advice, and our point-by-point responses are provided below.

- Background and related works: several papers have been published on this topic (e.g.: doi: 10.1016/j.compmedimag.2021.101930, doi: 10.3390/electronics9030503, doi: 10.1016/j.cmpb.2019.105273, doi: 10.3390/electronics9101644). The authors should at least include these references within the article. To give the reader an idea of current approaches to assessing kidney disease (glomerulosclerosis, tubular atrophy, fibrosis, etc.), the authors could include a table summarizing all state-of-the-art methods.

Response:

We appreciate your comment. According to your advice, we have added all these important related works as references (refs. 11–13, 28) and discussed them in the Introduction and Discussion sections (lines 75–83, 364–365). In addition, we have added a table summarizing the state-of-the-art methods for assessing renal pathology (Table 1).

- Novelty: it is unclear where the novelty lies in the proposed approach, since a well-known segmentation network (U-Net) is used for the segmentation task. Was any particular training technique used? Was any kind of pre- or post-processing employed? The authors should highlight the technical novelty (if any) of the work.

Response:

We appreciate your comment. The technical novelty of the present study lies in fine-tuning and the use of Dice cross-entropy loss.

First, fine-tuning was implemented by using the VGG-16 model (ref. 39), pretrained on the ImageNet dataset, as the U-Net encoder. Fine-tuning did not change the accuracy but shortened training: without fine-tuning, about 150 epochs were needed to reach high accuracy, whereas approximately 90 epochs sufficed with fine-tuning.

Second, we adopted Dice cross-entropy as the loss function. Dice cross-entropy combines Dice loss and cross-entropy loss (refs. 39,40). In our preliminary study, it yielded higher accuracy than other loss functions such as focal loss and cross-entropy. We believe that Dice cross-entropy has seldom been used in renal pathology studies.
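The combined loss described above can be sketched in NumPy as follows. This is a minimal illustration for a single image, assuming a softmax output, a 1:1 weighting of the Dice and cross-entropy terms, and a small smoothing constant `eps`; the authors' exact formulation and weights are not given in this response, and the function names are hypothetical.

```python
import numpy as np


def softmax(logits: np.ndarray, axis: int = 0) -> np.ndarray:
    """Numerically stable softmax over the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def dice_cross_entropy(logits: np.ndarray, target: np.ndarray, num_classes: int,
                       eps: float = 1e-6, dice_weight: float = 1.0,
                       ce_weight: float = 1.0) -> float:
    """Soft Dice loss plus pixel-wise cross-entropy for one image.

    logits: (C, H, W) raw scores; target: (H, W) integer labels in [0, C).
    The 1:1 default weighting is an assumption, not the authors' setting.
    """
    probs = softmax(logits, axis=0)                            # (C, H, W)
    one_hot = np.eye(num_classes)[target].transpose(2, 0, 1)  # (C, H, W)

    # Cross-entropy: mean negative log-probability of the true class per pixel.
    ce = -np.mean(np.sum(one_hot * np.log(np.clip(probs, eps, 1.0)), axis=0))

    # Soft Dice per class, averaged over classes; Dice loss = 1 - mean Dice.
    inter = np.sum(probs * one_hot, axis=(1, 2))
    denom = np.sum(probs, axis=(1, 2)) + np.sum(one_hot, axis=(1, 2))
    dice = (2.0 * inter + eps) / (denom + eps)
    dice_loss = 1.0 - dice.mean()

    return float(dice_weight * dice_loss + ce_weight * ce)
```

The Dice term rewards overlap for each class regardless of its size, while the cross-entropy term gives smooth per-pixel gradients; combining them is a common way to balance small and large structures.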

With regard to preprocessing, we employed data augmentation (left–right flipping, rotation, and contrast adjustment), image resizing, and color standardization. For post-processing, the model output a probability for each label at every pixel, and the label with the highest probability was taken as the model's predicted label.

We included the above explanations in the Introduction section (lines 97–98) and Methods section (lines 164–165, 171–185) and highlighted them in the Discussion section (lines 343–352).
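The post-processing step described above (taking the most probable label at each pixel) amounts to an argmax over the class axis. A minimal sketch, assuming the model's per-pixel probabilities are stored as a NumPy array of shape (classes, height, width); the function name is hypothetical.

```python
import numpy as np


def predict_labels(probabilities: np.ndarray) -> np.ndarray:
    """Convert per-pixel class probabilities (C, H, W) into a label map (H, W)."""
    if probabilities.ndim != 3:
        raise ValueError("expected an array of shape (C, H, W)")
    # For each pixel, pick the class index with the highest probability.
    return np.argmax(probabilities, axis=0)
```

For example, a pixel with probabilities (0.2, 0.1, 0.7) over three classes is assigned label 2. When two classes tie, `np.argmax` returns the lower index.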

- Page 7, Line 106: it is unclear when five classes are used and when eight classes are employed.

Response:

We appreciate your comment. First, the kidney biopsy images were annotated with eight classes as described. Then, the eight classes were recategorized into five classes. “Proximal tubules” and “distal tubules” were recategorized into “normal tubules,” whereas “atrophic tubules,” “tubulitis,” and “degenerated tubules” were recategorized into “abnormal tubules.” For analyses, we first used the five-class set and then the eight-class set.

To make this clearer, we have included these explanations in the Methods section (lines 154–158) and Table 2.
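The recategorization from eight to five classes can be expressed as a lookup table applied to every pixel of a label mask. A minimal sketch: the tubule class names follow the response above, while "glomerulus", "interstitium", and "artery" are taken from the abstract; the integer indices, the label spellings, and the absence of a separate background class are assumptions for illustration only.

```python
# Hypothetical label names and index order; illustrative only.
EIGHT_CLASSES = ["glomerulus", "interstitium", "artery",
                 "proximal tubule", "distal tubule",
                 "atrophic tubule", "tubulitis", "degenerated tubule"]
FIVE_CLASSES = ["glomerulus", "interstitium", "artery",
                "normal tubule", "abnormal tubule"]

# Tubule subtypes collapse into "normal" or "abnormal"; other classes are unchanged.
EIGHT_TO_FIVE = {
    "glomerulus": "glomerulus",
    "interstitium": "interstitium",
    "artery": "artery",
    "proximal tubule": "normal tubule",
    "distal tubule": "normal tubule",
    "atrophic tubule": "abnormal tubule",
    "tubulitis": "abnormal tubule",
    "degenerated tubule": "abnormal tubule",
}


def to_five_class(mask8):
    """Relabel a 2-D list of eight-class indices into five-class indices."""
    lut = [FIVE_CLASSES.index(EIGHT_TO_FIVE[name]) for name in EIGHT_CLASSES]
    return [[lut[value] for value in row] for row in mask8]
```

Because the five-class labels are derived from the eight-class annotations, only one set of annotations needs to be drawn.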

- page 10, Line 140: please specify what kind of operation is performed on RGB images (contrast adjustment)

Response:

We appreciate your comment. For the model's input images, we resized the images and standardized the color of each RGB channel using means of (0.485, 0.456, 0.406) and standard deviations of (0.229, 0.224, 0.225). For augmentation, we adjusted the contrast and flipped images horizontally at a rate of 50%, and rotated them by an angle between -15° and +15°, with random parameters drawn in each epoch.

For contrast adjustment specifically, we converted the input image to grayscale, calculated its mean gray value, and created an image “a” filled uniformly with that gray value. We then blended the input image with image “a” using a blending factor alpha, drawn at random between 0.5 and 1.5. The output image is given by: output = image “a” × (1.0 - alpha) + input image × alpha. An alpha of zero yields a solid gray image, and an alpha of one leaves the input image unchanged; values below one reduce contrast, whereas values above one increase it. All these processes were performed using Python functions.

We have included the above explanations in the Methods section (lines 171–185).
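The color standardization and contrast formula described above can be sketched in NumPy. This is an illustration under stated assumptions, not the authors' code: images are (H, W, 3) RGB arrays scaled to [0, 1]; the grayscale weights (ITU-R BT.601 luma), the clipping to [0, 1], applying the flip with probability 0.5 while always applying contrast, and all function names are assumptions. Rotation and resizing are omitted to keep the sketch dependency-free.

```python
import numpy as np

# Per-channel RGB statistics quoted in the response.
MEAN = np.array([0.485, 0.456, 0.406])
STD = np.array([0.229, 0.224, 0.225])
# Assumed grayscale weights (ITU-R BT.601 luma); not specified in the response.
GRAY_WEIGHTS = np.array([0.299, 0.587, 0.114])


def normalize(image: np.ndarray) -> np.ndarray:
    """Standardize an (H, W, 3) RGB image with values in [0, 1], channel by channel."""
    return (image - MEAN) / STD


def adjust_contrast(image: np.ndarray, alpha: float) -> np.ndarray:
    """Blend the image with a uniform gray image "a" of its mean grayscale value.

    output = a * (1 - alpha) + image * alpha
    alpha = 0 gives solid gray; alpha = 1 leaves the image unchanged.
    Clipping to [0, 1] is an assumption added here.
    """
    mean_gray = float((image @ GRAY_WEIGHTS).mean())
    a = np.full_like(image, mean_gray)
    return np.clip(a * (1.0 - alpha) + image * alpha, 0.0, 1.0)


def augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Random left-right flip (p = 0.5) followed by a contrast factor in [0.5, 1.5]."""
    if rng.random() < 0.5:
        image = image[:, ::-1, :]
    return adjust_contrast(image, rng.uniform(0.5, 1.5))
```

A possible usage is `normalize(augment(img, np.random.default_rng(0)))`: augmentation runs on the [0, 1] image, and standardization is applied last so the model always sees channel-normalized input.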

- Future work?

Response:

We appreciate your comment. We are considering two directions for extending our model.

First, it would be useful to develop an application for automated detection and quantification to support pathologists. In such an application, when pathologists pointed to a tubule on the screen, the application would indicate the type of tubule. Similarly, automated quantification of abnormal tubules would help pathologists estimate renal prognosis promptly.

Second, linking U-Net-based segmentation with clinical information would be useful to predict renal prognosis more precisely, for example, by showing how different types of tubular abnormalities affect renal prognosis. This could improve the estimation of renal prognosis compared with the semi-quantification of tubulointerstitial compartments in both native kidney specimens (ref. 43) and the Banff grading system for kidney allografts (ref. 42).

We have included these discussions in the Discussion section (lines 387–394). 

Reviewer #2: This study sought to distinguish between different kinds of renal tissues on pathology, particularly normal and abnormal tubules using deep learning. To that end, they trained and validated a U-Net based segmentation model. Next, they evaluated the agreement between two pathologists for different tissue types (both with and without the output of the segmentation), as well as the time it took for evaluation.

Response:

We appreciate your detailed review and comments. We have revised our manuscript according to your advice, and our point-by-point responses are provided below.

1) Abstract: Line 47: “whereas the arteries and tubulitis.” Do you mean “arteries and tubules” or do you mean to refer to the pathological condition of “tubulitis?” I presume you mean the latter, but the wording here is a bit confusing when first read, as there appears to be a switch between anatomical structures and a pathological condition.

Response:

We appreciate your comment. As you have pointed out, our intention was the latter. Thus, we changed the expression from “tubulitis” to “inflamed tubules” in the Abstract (line 48).

2) Abstract: Line 49: “The pathological concordance for the glomerular count, Banff t, ct, and ci scores remained high with or without the segmented images.” You may want to clarify if you are referring to the Banff Classification of Renal Allograft Pathology (I presume).

Response:

We appreciate your comment. According to your advice, we have clarified this term as t, ct, and ci scores of the Banff classification of renal allograft pathology in the Abstract (lines 50–51).

3) Introduction: Line 83: “Because tubulointerstitial abnormalities significantly predict the outcome of renal diseases.” I would consider giving a few examples of these diseases.

Response:

We appreciate your comment. Accordingly, we have given some examples of various renal diseases, such as acute tubulointerstitial nephritis, diabetic nephropathy, lupus nephritis, and allograft kidneys (refs. 34–37), in which tubulointerstitial involvement affects renal prognosis, in the Introduction section (lines 85–87).

4) Methods: Line 95: The Introduction talks about renal diseases in general, but here very specific patients were selected: "We used formalin-fixed, paraffin-embedded needle-core biopsies obtained from 21 patients (7 patients 1 h after renal transplantation and 14 patients with tubulointerstitial nephritis)." It would be helpful to provide an explanation of why these particular patients were selected.

Response:

We appreciate your comment. Because various kidney diseases can involve the glomeruli in addition to the tubulointerstitial compartments, we needed to collect homogeneous samples that involved only the tubulointerstitial compartments for annotation. Thus, specimens with tubulointerstitial nephritis without other involvement were used to annotate abnormal tubulointerstitial structures, whereas specimens collected 1 h after renal transplantation served as near-normal controls for annotating normal kidney structures.

We have included these explanations in the Methods section (lines 108–114).

5) Methods, Line 110: “The annotations were carried out by a nephrologist with sufficient experience in renal pathology (S.H.).” Did you consider having more than one nephrologist with renal pathology experience label some of the images to determine their concordance?

Response:

We apologize for the insufficient explanation, which implied that only one nephrologist with renal pathology experience performed the annotation. In fact, the annotations by S.H. were double-checked by another nephrologist with sufficient renal pathology experience (M.K.) to improve annotation quality. When the two nephrologists disagreed, they discussed the issue and finalized the annotation after reaching a consensus.

We have added these explanations in the Methods section (lines 138–142).

6) Methods, Line 133: “We compared the segmentation models FCN, U-Net, PSP-Net, and Deeplab v3 in advance, and we chose U-Net as it was the most suitable for our preliminary data.” Consider citing these other models. Also, please clarify what you mean by “it was the most suitable for our preliminary data.” Did it have the best performance?

Response:

We appreciate your comment. In a preliminary study, we compared the performance of various segmentation models using 229 images, of which 183 were used for training and the remaining 46 for testing. In that experiment, U-Net achieved the highest overall average Dice coefficient and produced relatively clear segmented images. Thus, we considered U-Net the most suitable model for the present study.

We have added these explanations in the Methods section and added the data as S1 Table and S1 Fig (lines 167–169).
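The per-class Dice coefficient used to compare the models, DC = 2|A ∩ B| / (|A| + |B|) for predicted region A and annotated region B, can be computed from label maps as below. This is a minimal sketch: skipping classes absent from both maps (returning NaN) when averaging is an assumption, as is averaging per image; the response does not state how the overall average was pooled, and the function names are hypothetical.

```python
import numpy as np


def dice_per_class(pred: np.ndarray, truth: np.ndarray, num_classes: int) -> np.ndarray:
    """Dice coefficient for each class; NaN if the class is absent from both maps."""
    scores = []
    for c in range(num_classes):
        p = pred == c
        t = truth == c
        denom = p.sum() + t.sum()
        if denom == 0:
            scores.append(np.nan)  # class absent in both: undefined, skipped later
        else:
            scores.append(2.0 * np.logical_and(p, t).sum() / denom)
    return np.array(scores)


def mean_dice(pred: np.ndarray, truth: np.ndarray, num_classes: int) -> float:
    """Average Dice over the classes that are present in either map."""
    return float(np.nanmean(dice_per_class(pred, truth, num_classes)))
```

A DC of 1 means perfect overlap and 0 means none, so small structures that are frequently missed pull their class score down sharply.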

7) Methods, Line 135: “To train the model, we used 80% of the prepared images, which were randomly selected, and the remaining 20% were used to evaluate the model’s performance.” Earlier, you state that from 21 kidney specimens, 311 regions were randomly selected. Did regions from the same patient ever end up in both the training set and the test set?

Response:

We appreciate your comment. The 311 regions were divided into training and test sets at the region level: 80% of the regions were randomly assigned to the training set and the remaining 20% to the test set. As a result, regions from the same patient may have been included in both the training and test sets, but no single region was used in both.

We have added these explanations in the Methods section (lines 171–172).
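The difference between the region-level split described above and a patient-level split, which would keep all regions of a patient on one side, can be sketched in plain Python. This is illustrative only: the region-to-patient mapping, the fixed seed, rounding the 80% cut to the nearest integer, and the function names are assumptions, and the patient-level variant is shown for contrast rather than as the method used in the study.

```python
import random


def split_regions(region_ids, train_frac=0.8, seed=0):
    """Region-level random split, as described: a patient may appear in both sets."""
    ids = list(region_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(round(train_frac * len(ids)))
    return ids[:n_train], ids[n_train:]


def split_by_patient(region_to_patient, train_frac=0.8, seed=0):
    """Alternative patient-level split: no patient contributes to both sets."""
    patients = sorted(set(region_to_patient.values()))
    random.Random(seed).shuffle(patients)
    n_train = int(round(train_frac * len(patients)))
    train_patients = set(patients[:n_train])
    train = [r for r, p in region_to_patient.items() if p in train_patients]
    test = [r for r, p in region_to_patient.items() if p not in train_patients]
    return train, test
```

With 311 regions and this rounding, `split_regions` yields 249 training and 62 test regions. The patient-level variant avoids optimistic test scores from near-duplicate tissue of the same patient, at the cost of less control over the exact 80/20 ratio.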

8) Methods, Line 159: “For this evaluation, we selected another 15 specimens of tubulointerstitial nephritis.” Like #4, it would be helpful to have a brief explanation of why this patient population was selected (as opposed to the one referred to earlier).

Response:

We appreciate your comment. The reason for selecting these patients is the same as in #4: we needed homogeneous samples that involved only the tubulointerstitial compartments for validation. Thus, specimens from patients with tubulointerstitial nephritis without other involvement were used to evaluate abnormal tubulointerstitial structures.

We have included these explanations in the Methods section (lines 206–209).

9) Table 4: Why would you say that the arteries were so frequently identified as interstitium?

Response:

We appreciate your comment. We consider two possible reasons.

First, the number of annotated arteries was small. Specifically, 256 arteries were annotated across the 311 regions, of which 80% were used for training and the remaining 20% for testing. This number was likely insufficient for U-Net to learn to detect arteries in the test set.

Second, arteries are very small compared with the other compartments; their area was about one-fortieth of that of the interstitium. Thus, arteries tended to be misclassified as interstitium.

We have included these discussions in the Discussion section (lines 400–406).

10) Results: Line 230, Line 231, Line 234, Figure 3: Please clarify what you mean by “renal outcome” and “output.” Also, in Figure 3, please consider labeling the y-axis with units.

Response:

We appreciate your comment. By “renal outcome,” we meant renal prognosis, and “output” refers to the segmentation model’s prediction. We have revised these expressions to improve consistency (lines 279–283). In addition, we have labeled the x- and y-axes of Figure 3 with units: the x-axis shows the annotated area divided by the image area, and the y-axis shows the area predicted by the segmentation model divided by the image area.
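The two axes of Figure 3 as defined above can be computed per image and per class from label masks. A minimal sketch, assuming integer label masks stored as NumPy arrays; the helper names are hypothetical, and using Pearson's r to summarize agreement is an assumption, since the correlation statistic used in the paper is not stated in this response.

```python
import numpy as np


def area_ratio(mask: np.ndarray, class_id: int) -> float:
    """Area of one class divided by the image area (fraction of pixels)."""
    return float(np.mean(mask == class_id))


def paired_area_ratios(annotations, predictions, class_id):
    """One (x, y) pair per image: x = annotated area / image area,
    y = predicted area / image area."""
    x = np.array([area_ratio(m, class_id) for m in annotations])
    y = np.array([area_ratio(m, class_id) for m in predictions])
    return x, y


def pearson_r(x, y) -> float:
    """Pearson correlation between annotated and predicted area ratios (assumed statistic)."""
    return float(np.corrcoef(x, y)[0, 1])
```

Because both axes are fractions of the image area, they are dimensionless and directly comparable across images of different sizes.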

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 1

Franziska Theilig

27 Jun 2022

Evaluating tubulointerstitial compartments in renal biopsy specimens using a deep learning-based approach for classifying normal and abnormal tubules

PONE-D-22-02424R1

Dear Dr. Kawano,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Franziska Theilig

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors addressed all my previous comments. The manuscript greatly improved after revision. The revised manuscript is clear and focused.

Reviewer #2: The authors addressed my concerns. They have clarified several points in the methods, which is quite helpful. The detailed addition of prior studies was particularly useful, as well as additional detail regarding areas of technical novelty, model particulars, and future directions.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Hersh Sagreiya

**********

Acceptance letter

Franziska Theilig

1 Jul 2022

PONE-D-22-02424R1

Evaluating tubulointerstitial compartments in renal biopsy specimens using a deep learning-based approach for classifying normal and abnormal tubules

Dear Dr. Kawano:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Franziska Theilig

Academic Editor

PLOS ONE

Associated Data


    Supplementary Materials

    S1 Fig. Representative images of ground truth and eight-class segmentation using various deep learning methods.

    PAS-stained slide, ground truth, and segmentation using U-Net. The top row represents a normal specimen, and the second through fourth rows represent specimens with tubulointerstitial nephritis.

    (TIF)

    S1 Table. Dice coefficients of various deep learning methods.

    (DOCX)

    S1 File. Dataset of the present study.

    (XLSX)


    Data Availability Statement

    All relevant data are within the paper and its Supporting Information files.

