PLOS One. 2022 Nov 23;17(11):e0275378. doi: 10.1371/journal.pone.0275378

Weakly supervised learning for multi-organ adenocarcinoma classification in whole slide images

Masayuki Tsuneki 1,*, Fahdi Kanavati 1
Editor: Andrey Bychkov
PMCID: PMC9683606  PMID: 36417401

Abstract

Primary screening for the presence or absence of adenocarcinoma in biopsy specimens (e.g., endoscopic biopsy, transbronchial lung biopsy, and needle biopsy) of possible primary organs (e.g., stomach, colon, lung, and breast) and in radical lymph node dissection specimens by automated computational pathology algorithms would be a powerful tool to assist surgical pathologists in their routine histopathological diagnostic workflow. In this paper, we trained multi-organ deep learning models to classify adenocarcinoma in whole slide images (WSIs) of biopsy and radical lymph node dissection specimens. We evaluated the models on five independent test sets (stomach, colon, lung, breast, and lymph nodes) from different medical institutions to demonstrate their feasibility on multi-organ and lymph node specimens, achieving receiver operating characteristic areas under the curves (ROC-AUCs) in the range of 0.91–0.98.

Introduction

Adenocarcinoma is a type of carcinoma with a propensity for glandular, ductal, and acinar differentiation that arises in several organs (e.g., stomach, colon, lung, and breast). According to the Global Cancer Statistics 2020 [1], the numbers of new deaths (and percentages of all sites) for stomach, colon, lung, and breast cancers were 768,793 (7.7%), 576,858 (5.8%), 1,796,144 (18.0%), and 684,996 (6.9%), respectively. Adenocarcinoma is the most common type of cancer affecting these four organs, so adenocarcinoma classification in the primary organs, especially on biopsy specimens, is one of the most important histopathological inspections in the clinical workflow for determining cancer treatment strategies. Moreover, lymph nodes are the most common site of metastatic adenocarcinoma, and lymph node metastasis can constitute the first clinical manifestation of the cancer. An important task of the surgical pathologist is to identify the presence or absence of a malignant process in the lymph node. If cancer cells are identified within the efferent lymph vessels or extra-nodal tissues, this must be noted in the pathology report because of its possible prognostic significance. Histopathological evaluation of lymph node metastasis is very important for tumor staging, documentation of tumor recurrence, and prediction of the most probable primary site for a metastatic cancer of uncertain origin. However, in routine practical diagnosis, there are frequently numerous lymph nodes to inspect on a single glass slide and numerous radical lymph node dissection slides for the same patient, which imposes a heavy workload on surgical pathologists.

The incorporation of deep learning models into the routine histopathological diagnostic workflow is on the horizon and is a promising technology, with the potential to reduce the burden of time-consuming diagnosis and increase the detection rate of anomalies, including cancers. Deep learning has been widely applied to tissue classification and adenocarcinoma detection on whole-slide images (WSIs), cellular detection and segmentation, and the stratification of patient outcomes [2–15]. Previous works have applied deep learning models to adenocarcinoma classification separately for different organs, such as stomach [15–17], colon [15, 18], lung [16, 19], and breast [20, 21] histopathological specimen WSIs. Although these existing models exhibited very high ROC-AUCs for each organ, they cannot accurately classify adenocarcinoma across organs.

In this study, we trained deep learning models using weakly-supervised learning to predict adenocarcinoma in WSIs of stomach, colon, lung, and breast biopsy specimens for primary tumors as well as in radical lymph node dissection specimens for metastatic carcinoma, using training sets of stomach, colon, lung, and breast biopsy specimen WSIs without annotations. We evaluated the models on biopsy specimens of each primary organ (stomach, colon, lung, and breast) as well as on radical lymph node dissection specimens for the presence or absence of metastatic adenocarcinoma, achieving ROC-AUCs from 0.91 to 0.98. Our results suggest that deep learning algorithms could serve as histopathological diagnostic aids for adenocarcinoma classification in primary organs and for lymph node metastatic cancer screening.

Materials and methods

Clinical cases and pathological records

In the present retrospective study, a total of 8,896 H&E (hematoxylin & eosin) stained histopathological specimen slides of human adenocarcinoma and non-adenocarcinoma (adenoma and non-neoplastic) were collected from the surgical pathology files of five hospitals: International University of Health and Welfare (IUHW), Mita Hospital (Tokyo, Japan) and the Kamachi Group Hospitals (four hospitals in total: Wajiro, Shinkuki, Shinkomonji, and Shinmizumaki Hospital) (Fukuoka, Japan), after histopathological review by surgical pathologists. Adenoma cases were included because adenoma is a common differential diagnosis and exhibits some similarities to adenocarcinoma. The histopathological specimens were selected randomly to reflect real clinical settings as much as possible. Prior to the experimental procedures, each WSI diagnosis was reviewed by at least two pathologists, with final checking and verification performed by senior pathologists. All WSIs were scanned at a magnification of x20 using the same Leica Aperio AT2 Digital Whole Slide Scanner (Leica Biosystems, Tokyo, Japan) and were saved in SVS file format with JPEG2000 compression.

Dataset

Hospitals which provided histopathological specimen slides were anonymised by randomly assigning a letter (e.g., Hospital-A, B, C, D, and E). Table 1 breaks down the distribution of training sets from four domestic hospitals (Hospital-A, B, C, and D). Table 2 shows the distribution of the 1K (1,000 WSIs), 2K (2,000 WSIs), and 4K (4,000 WSIs) training sets. Validation sets were selected randomly from the training sets, and their numbers are given in parentheses (Table 2). The distribution of test sets from five domestic hospitals (Hospital-A, B, C, D, and E) is summarized in Table 3. In both training and test sets, the stomach, colon, lung, and breast WSIs consisted solely of biopsy specimens (stomach and colon: endoscopic biopsy; lung: transbronchial lung biopsy (TBLB); breast: needle biopsy), and the lymph node WSIs consisted of radical dissection specimens (Tables 1–3). The distribution of the lymph node test sets is summarized in Table 4. None of the training set WSIs were manually annotated; the training algorithm used only the WSI labels, which were extracted from the histopathological diagnostic reports after review by surgical pathologists. This means that the only information available for training was whether a WSI contained adenocarcinoma or non-adenocarcinoma, with no information about the location of the cancerous lesions.

Table 1. Distribution of cases in the training sets obtained from different hospitals (A-D).

Organ Specimen type Class Diagnosis Hospital-A Hospital-B Hospital-C Hospital-D total
Stomach Endoscopic biopsy Adenocarcinoma Adenocarcinoma 250 100 150 0 500
Non-adenocarcinoma Adenoma 40 20 10 0 70
Non-neoplastic 200 100 130 0 430
total 490 220 290 0 1000
Colon Endoscopic biopsy Adenocarcinoma Adenocarcinoma 150 150 0 200 500
Non-adenocarcinoma Adenoma 30 30 0 40 100
Non-neoplastic 200 100 0 100 400
total 380 280 0 340 1000
Lung TBLB Adenocarcinoma Adenocarcinoma 200 100 0 100 400
Non-adenocarcinoma Non-neoplastic 300 200 0 100 600
total 500 300 0 200 1000
Breast Needle biopsy Adenocarcinoma Invasive ductal carcinoma 200 100 100 0 400
Non-adenocarcinoma Non-neoplastic 300 200 100 0 600
total 500 300 200 0 1000
total 1870 1100 490 540 4000

Table 2. Distribution of cases in the training sets and validation sets.

The numbers of validation cases are given in parentheses.

Organ Specimen type Class Diagnosis 1K-training sets 2K-training sets 4K-training sets
Stomach Endoscopic biopsy Adenocarcinoma Adenocarcinoma 130 (8) 250 (8) 500 (8)
Non-adenocarcinoma Adenoma 20 (3) 40 (3) 70 (3)
Non-neoplastic 100 (4) 210 (4) 430 (4)
total 250 (15) 500 (15) 1000 (15)
Colon Endoscopic biopsy Adenocarcinoma Adenocarcinoma 130 (8) 250 (8) 500 (8)
Non-adenocarcinoma Adenoma 30 (3) 50 (3) 100 (3)
Non-neoplastic 90 (4) 200 (4) 400 (4)
total 250 (15) 500 (15) 1000 (15)
Lung TBLB Adenocarcinoma Adenocarcinoma 120 (8) 200 (8) 400 (8)
Non-adenocarcinoma Non-neoplastic 130 (7) 300 (7) 600 (7)
total 250 (15) 500 (15) 1000 (15)
Breast Needle biopsy Adenocarcinoma Invasive ductal carcinoma 120 (8) 200 (8) 400 (8)
Non-adenocarcinoma Non-neoplastic 130 (7) 300 (7) 600 (7)
total 250 (15) 500 (15) 1000 (15)
total 1000 (60) 2000 (60) 4000 (60)

Table 3. Distribution of cases in the test sets obtained from hospitals (A-E).

Organ Specimen type Class Diagnosis Hosp-A Hosp-B Hosp-C Hosp-D Hosp-E total
Stomach Endoscopic biopsy Adenocarcinoma Adenocarcinoma 108 120 57 0 52 337
Non-adenocarcinoma Adenoma 33 35 27 0 32 127
Non-neoplastic 263 86 78 0 109 536
total 404 241 162 0 193 1000
Colon Endoscopic biopsy Adenocarcinoma Adenocarcinoma 125 158 0 74 42 399
Non-adenocarcinoma Adenoma 61 55 0 83 78 277
Non-neoplastic 136 109 0 62 17 324
total 322 322 0 219 137 1000
Lung TBLB Adenocarcinoma Adenocarcinoma 211 156 0 103 0 470
Non-adenocarcinoma Non-neoplastic 259 198 0 73 0 530
total 470 354 0 176 0 1000
Breast Needle biopsy Adenocarcinoma Invasive ductal carcinoma 289 44 59 0 0 392
Non-adenocarcinoma Non-neoplastic 233 166 177 0 0 576
total 522 210 236 0 0 968
Lymph node Radical dissection Adenocarcinoma Adenocarcinoma 57 79 10 0 0 146
Non-adenocarcinoma Non-neoplastic 222 314 246 0 0 782
total 279 393 256 0 0 928
total 1997 1520 654 395 330 4896

Table 4. Distribution of whole slide images (WSIs) in the lymph nodes test sets.

Resected organ Clinical diagnosis Histopathological diagnosis WSI
Stomach Advanced gastric cancer Adenocarcinoma 18
Non-neoplastic 97
Colon Advanced colon cancer Adenocarcinoma 21
Non-neoplastic 166
Lung Lung cancer Adenocarcinoma 38
Non-neoplastic 181
Lung Metastatic colon cancer Adenocarcinoma 27
Non-neoplastic 172
Breast Breast cancer Invasive ductal carcinoma 42
Non-neoplastic 166

Deep learning models

In this study, we used EfficientNetB1 [22] as the architecture of our models; we observed no further improvements from using larger models. We used the partial fine-tuning approach [23] to train them. This method starts with a model pre-trained on ImageNet and fine-tunes only the affine parameters of the batch normalization layers and the final classification layer, leaving the remaining weights frozen. Fig 1 shows an overview of the training method.
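
The following is a minimal sketch of this partial fine-tuning setup in TensorFlow/Keras; the authors' exact training code is not public, so variable names and the pooling choice are illustrative:

    import tensorflow as tf

    # EfficientNetB1 backbone pre-trained on ImageNet; global average pooling
    # produces one feature vector per 224x224 input tile.
    base = tf.keras.applications.EfficientNetB1(
        include_top=False, weights="imagenet",
        input_shape=(224, 224, 3), pooling="avg")

    # Freeze everything except the batch normalization layers, whose affine
    # parameters (gamma/beta) remain trainable under partial fine-tuning.
    for layer in base.layers:
        layer.trainable = isinstance(layer, tf.keras.layers.BatchNormalization)

    # The newly added classification head is trainable.
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.Dense(1, activation="sigmoid"),  # P(adenocarcinoma)
    ])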

Fig 1. Overview of training method.

Fig 1

(a) shows a zoomed-in example of a tile from a WSI. (b) During training, we alternated between an inference step and a training step. During the inference step, the model weights were frozen, and the model was applied over the entire tissue regions of each WSI to compute tile probabilities. The top k tiles with the highest probabilities were then selected from each WSI and placed into a queue. During the training step, the selected tiles from multiple WSIs formed a training batch and were used to train the model.

As we only had WSI labels, we used a weakly supervised method to train the models. The training method is similar to the one described in [24].

WSIs typically have large areas of white background that are not required for training the model and can easily be eliminated by preprocessing via thresholding using Otsu's method [25]. This creates a mask of the tissue regions, from which tiles can then be sampled in real time using the OpenSlide library [26] by providing coordinates within the tissue regions.
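
A minimal sketch of this preprocessing step, assuming Otsu thresholding is applied to a low-magnification thumbnail (the file name and thumbnail size are hypothetical):

    import numpy as np
    import openslide
    from skimage.color import rgb2gray
    from skimage.filters import threshold_otsu

    slide = openslide.OpenSlide("example.svs")
    thumb = np.array(slide.get_thumbnail((1024, 1024)).convert("RGB"))

    # Tissue is darker than the white slide background.
    gray = rgb2gray(thumb)
    mask = gray < threshold_otsu(gray)

    # Sample one tile at a tissue coordinate, scaled back to level-0 space.
    ys, xs = np.nonzero(mask)
    scale = slide.dimensions[0] / thumb.shape[1]
    x0, y0 = int(xs[0] * scale), int(ys[0] * scale)
    tile = slide.read_region((x0, y0), 0, (224, 224)).convert("RGB")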

For a given WSI, we obtained a single slide-level prediction using the following approach: we divided the WSI into a grid with a fixed stride and applied the model in a sliding-window fashion over the grid, resulting in predictions for the entire tissue regions. We then took the maximum probability over all tiles and used that as the slide-level probability of the WSI containing adenocarcinoma (ADC).
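
A sketch of this slide-level aggregation, assuming the tissue tiles of one WSI have already been extracted into an array for a Keras-style model:

    import numpy as np

    def slide_level_probability(model, tiles):
        """Return the slide-level ADC probability for one WSI.

        tiles: array of shape (n_tiles, 224, 224, 3) covering the
        tissue grid of the slide.
        """
        probs = model.predict(tiles, batch_size=32).ravel()
        return float(probs.max())  # max over all tiles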

During training, we initially performed balanced random sampling of tiles from the tissue regions for the first two epochs; that is, we alternated between a positive WSI and a negative WSI, selecting an equal number of tiles from each. After the second epoch, we switched to hard mining of tiles, whereby we still alternated between a positive WSI and a negative WSI, but this time we performed sliding-window inference over the entire tissue regions and then selected the top k tiles with the highest probabilities of being positive. If the WSI is negative, this effectively selects the tiles most likely to be false positives. The selected tiles were placed in a training subset, and once that subset contained N tiles, a training step was run, updating the model weights. We used k = 8, N = 256, and a batch size of 32.
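
A sketch of the hard-mining selection for a single WSI; k and N follow the values stated above, and the queueing logic around this function is omitted:

    import numpy as np

    K = 8           # top tiles kept per WSI
    SUBSET_N = 256  # tiles accumulated before one training step
    BATCH_SIZE = 32

    def top_k_tiles(model, tiles, k=K):
        # Score every tissue tile of the WSI with the current model and
        # keep the k tiles with the highest positive probability. For a
        # negative WSI these are the most likely false positives.
        probs = model.predict(tiles, batch_size=BATCH_SIZE).ravel()
        top_idx = np.argsort(probs)[-k:]
        return tiles[top_idx]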

In addition, during training, we performed data augmentation by applying random shifts in brightness, contrast, hue, and saturation, random rotations, and horizontal and vertical flips.
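
A sketch of these augmentations with tf.image; the paper does not state the exact shift ranges, so the values below are illustrative:

    import tensorflow as tf

    def augment(image):
        image = tf.image.random_flip_left_right(image)
        image = tf.image.random_flip_up_down(image)
        image = tf.image.rot90(image, k=tf.random.uniform([], 0, 4, tf.int32))
        image = tf.image.random_brightness(image, max_delta=0.1)
        image = tf.image.random_contrast(image, 0.9, 1.1)
        image = tf.image.random_hue(image, max_delta=0.05)
        image = tf.image.random_saturation(image, 0.9, 1.1)
        return image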

We optimised the model weights by minimising the binary cross-entropy loss using the Adam optimization algorithm [27] with the following parameters: beta1 = 0.9, beta2 = 0.999, and a learning rate of 0.001. We applied a learning rate decay of 0.95 every two epochs. We used early stopping by tracking the performance of the model on a validation set, stopping the training when no improvement was observed for more than 10 epochs. The model with the lowest validation loss was chosen as the final model.
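
A sketch of this optimisation setup in Keras, reusing the model from the earlier snippet; the callback-based formulation is an assumption, as the paper does not describe the implementation:

    import tensorflow as tf

    model.compile(
        optimizer=tf.keras.optimizers.Adam(
            learning_rate=0.001, beta_1=0.9, beta_2=0.999),
        loss="binary_crossentropy")

    callbacks = [
        # Multiply the learning rate by 0.95 every 2 epochs.
        tf.keras.callbacks.LearningRateScheduler(
            lambda epoch, lr: lr * 0.95 if epoch > 0 and epoch % 2 == 0 else lr),
        # Stop after 10 epochs without improvement; keep the best weights.
        tf.keras.callbacks.EarlyStopping(
            monitor="val_loss", patience=10, restore_best_weights=True),
    ]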

Software and statistical analysis

The deep learning models were implemented and trained using TensorFlow [28]. AUCs were calculated in Python using the scikit-learn package [29] and plotted using matplotlib [30]. The 95% confidence intervals (CIs) of the AUCs were estimated using the bootstrap method [31] with 1,000 iterations.
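
A sketch of the bootstrap CI estimation, assuming per-slide ground-truth labels y_true and predicted probabilities y_prob are available as NumPy arrays (names are hypothetical):

    import numpy as np
    from sklearn.metrics import roc_auc_score

    def bootstrap_auc_ci(y_true, y_prob, n_iter=1000, alpha=0.95, seed=0):
        rng = np.random.default_rng(seed)
        n, aucs = len(y_true), []
        for _ in range(n_iter):
            idx = rng.integers(0, n, n)  # resample slides with replacement
            if len(np.unique(y_true[idx])) < 2:
                continue  # AUC is undefined without both classes
            aucs.append(roc_auc_score(y_true[idx], y_prob[idx]))
        lo, hi = np.percentile(aucs, [(1 - alpha) / 2 * 100,
                                      (1 + alpha) / 2 * 100])
        return lo, hi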

The true positive rate (TPR) was computed as

TPR = TP / (TP + FN) (1)

and the false positive rate (FPR) was computed as

FPR = FP / (FP + TN) (2)

where TP, FP, TN, and FN represent true positives, false positives, true negatives, and false negatives, respectively. The ROC curve was computed by varying the probability threshold from 0.0 to 1.0 and computing the TPR and FPR at each threshold.

Compliance with ethical standards

The experimental protocol was approved by the ethics boards of the International University of Health and Welfare (No. 19-Im-007) and the Kamachi Group Hospitals (No. 173). All research activities complied with all relevant ethical regulations and were performed in accordance with the relevant guidelines and regulations of all the hospitals mentioned above.


Results

Insufficient AUC performance of WSI adenocarcinoma evaluation using existing stomach adenocarcinoma classification model

Prior to training the multi-organ adenocarcinoma models, we evaluated the AUC performance of the existing stomach adenocarcinoma classification model [15] on the test sets (Table 3). Table 5 and Fig 2A show that it achieved high ROC-AUC and low log loss values on the stomach and colon endoscopic biopsy WSIs, but not on the lung TBLB, breast needle biopsy, and radical lymph node dissection WSIs. Thus, we trained models using training sets of different sizes (Table 2).

Table 5. ROC-AUC and log loss results for adenocarcinoma classification on test sets using existing stomach adenocarcinoma classification model.

Existing stomach adenocarcinoma model
test sets ROC-AUC log loss
Stomach endoscopic biopsy 0.937 [0.918–0.953] 0.450 [0.364–0.557]
Colon endoscopic biopsy 0.986 [0.977–0.992] 0.192 [0.142–0.252]
Lung TBLB 0.698 [0.665–0.726] 1.807 [1.680–1.960]
Breast needle biopsy 0.888 [0.864–0.907] 1.225 [1.111–1.329]
Lymph node radical dissection 0.804 [0.771–0.832] 1.940 [1.787–2.091]

Fig 2. ROC curves with AUCs from four different models (A-D) on the test sets: (A) the existing stomach adenocarcinoma classification model and the weakly supervised (WS) learning models trained on the 1K (B), 2K (C), and 4K (D) training sets, with a tile size of 224 px at x10 magnification.

Fig 2

High AUC performance of WSI evaluation of adenocarcinoma histopathology images

We trained models using weakly-supervised (WS) learning, which can be used with weak labels (WSI labels) [24]. We trained the EfficientNetB1 convolutional neural network (CNN) architecture at x10 magnification. The models were applied in a sliding-window fashion with input tiles of 224x224 pixels and a stride of 256 (Fig 1). To train the deep learning models, we used a total of 1,000 (1K), 2,000 (2K), and 4,000 (4K) training set WSIs (Table 2). This resulted in three models: (1) WS-1K: 224, x10 EfficientNetB1; (2) WS-2K: 224, x10 EfficientNetB1; and (3) WS-4K: 224, x10 EfficientNetB1. We evaluated the models on test sets from domestic hospitals (Table 3). For each test set (stomach endoscopic biopsy, colon endoscopic biopsy, lung TBLB, breast needle biopsy, and radical lymph node dissection), we computed the ROC-AUC, log loss, accuracy, sensitivity, and specificity (using a probability threshold of 0.5) and summarized the results in Tables 6 and 7 and Fig 2B–2D. The models trained on the 2K and 4K training sets achieved higher ROC-AUCs than the model trained on the 1K training set and the existing stomach adenocarcinoma model (Table 6, Fig 2). However, there was no obvious difference between the models trained on the 2K and 4K training sets (Table 6, Fig 2C and 2D). On the test sets from domestic hospitals, the model (WS-4K: 224, x10 EfficientNetB1) achieved very high ROC-AUCs (0.912–0.978) with low log loss values (0.203–0.437) (Table 6). On all test sets, the model (WS-4K: 224, x10 EfficientNetB1) achieved high accuracy (0.853–0.929), sensitivity (0.796–0.911), and specificity (0.825–0.931) (Table 7). As shown in Fig 2 and Tables 5–7, the model (WS-4K: 224, x10 EfficientNetB1) is applicable for adenocarcinoma classification across a wide variety of organ (stomach, colon, lung, breast, and lymph node) WSIs. Figs 3–7 show representative true positive, true negative, false positive, and false negative cases, respectively, from the model (WS-4K: 224, x10 EfficientNetB1).

Table 6. ROC-AUC and log loss results for adenocarcinoma classification on test sets using trained models.

WS-1K: 224, x10 EfficientNetB1
test sets ROC-AUC log loss
Stomach endoscopic biopsy 0.886 [0.864–0.912] 0.415 [0.356–0.473]
Colon endoscopic biopsy 0.973 [0.964–0.981] 0.209 [0.175–0.242]
Lung TBLB 0.879 [0.859–0.900] 0.501 [0.443–0.555]
Breast needle biopsy 0.919 [0.900–0.935] 0.358 [0.315–0.404]
Lymph node radical dissection 0.929 [0.903–0.951] 0.427 [0.380–0.486]
WS-2K: 224, x10 EfficientNetB1
test sets ROC-AUC log loss
Stomach endoscopic biopsy 0.913 [0.894–0.932] 0.351 [0.301–0.396]
Colon endoscopic biopsy 0.977 [0.969–0.936] 0.197 [0.167–0.226]
Lung TBLB 0.931 [0.915–0.946] 0.342 [0.300–0.386]
Breast needle biopsy 0.919 [0.901–0.936] 0.371 [0.325–0.423]
Lymph node radical dissection 0.953 [0.939–0.978] 0.228 [0.188–0.257]
WS-4K: 224, x10 EfficientNetB1
test sets ROC-AUC log loss
Stomach endoscopic biopsy 0.914 [0.890–0.931] 0.355 [0.315–0.404]
Colon endoscopic biopsy 0.978 [0.970–0.984] 0.203 [0.173–0.236]
Lung TBLB 0.933 [0.917–0.946] 0.437 [0.391–0.494]
Breast needle biopsy 0.912 [0.894–0.930] 0.374 [0.330–0.421]
Lymph node radical dissection 0.962 [0.942–0.978] 0.309 [0.272–0.356]

Table 7. Scores of accuracy, sensitivity, and specificity on test sets using the best model (WS-4K: 224, x10 EfficientNetB1).

WS-4K: 224, x10 EfficientNetB1
test sets Accuracy Sensitivity Specificity
Stomach endoscopic biopsy 0.859 [0.837–0.877] 0.813 [0.766–0.850] 0.882 [0.856–0.905]
Colon endoscopic biopsy 0.929 [0.912–0.944] 0.907 [0.878–0.935] 0.943 [0.924–0.960]
Lung TBLB 0.853 [0.831–0.875] 0.885 [0.861–0.915] 0.825 [0.792–0.855]
Breast needle biopsy 0.853 [0.831–0.876] 0.796 [0.755–0.837] 0.892 [0.868–0.917]
Lymph node radical dissection 0.928 [0.912–0.944] 0.911 [0.866–0.955] 0.931 [0.913–0.948]

Fig 3. Representative true positive adenocarcinoma classification of stomach, colon, lung, and breast biopsy test cases using the model (WS-4K: 224, x10 EfficientNetB1).

Fig 3

In the adenocarcinoma whole slide images (WSIs) of stomach endoscopic biopsy (A), colon endoscopic biopsy (D), lung transbronchial lung biopsy (TBLB) (H), and breast core needle biopsy (J) specimens, the heatmap images show true positive predictions of adenocarcinoma cells (B, E, F, G, I, K), which correspond to the H&E histopathology (A, C, D, F, G, H, J). The heatmap images show true negative predictions on non-neoplastic tissue fragments (#2 in (B) and #3 in (E)) and true positive predictions on adenocarcinoma tissue fragments (#1 in (B) and #1–#2 in (E)), which correspond to the H&E histopathology of the adenocarcinoma areas (C, F, G). The heatmaps use the jet color map, where blue indicates low probability and red indicates high probability.

Fig 7. Representative example of metastatic adenocarcinoma false negative prediction output on a case from the radical lymph node dissection (lymphadenectomy) test set using the model (WS-4K: 224, x10 EfficientNetB1).

Fig 7

According to the histopathological diagnostic report, this case (A) has metastatic adenocarcinoma foci in (C) but not in other areas. The heatmap image exhibits no positive adenocarcinoma prediction (B). The heatmap uses the jet color map, where blue indicates low probability and red indicates high probability.

True positive adenocarcinoma prediction of stomach, colon, lung, and breast biopsy WSIs

Our model (WS-4K: 224, x10 EfficientNetB1) satisfactorily predicted adenocarcinoma in stomach endoscopic biopsy (Fig 3A–3C), colon endoscopic biopsy (Fig 3D–3G), lung TBLB (Fig 3H and 3I), and breast needle biopsy (Fig 3J and 3K) specimens. Importantly, the heatmap images showed true negative predictions on internal non-neoplastic tissue fragments (#2 in Fig 3A and 3B; #3 in Fig 3D and 3E; Fig 3H–3K), which were confirmed by surgical pathologists.

True positive adenocarcinoma prediction of radical lymph node dissection (lymphadenectomy) WSIs

A lymphadenectomy (radical lymph node dissection) is a surgical procedure performed to evaluate evidence of metastatic cancer. In routine histopathological diagnosis, the inspection of lymph nodes is a very important but time-consuming task necessary to avoid the risk of medical oversight. Therefore, in clinical settings, a multi-organ adenocarcinoma model is especially useful for the histopathological diagnosis of lymphadenectomy specimen WSIs. Our model (WS-4K: 224, x10 EfficientNetB1) correctly predicted metastatic lung adenocarcinoma (Fig 4A–4D) and breast invasive ductal carcinoma (Fig 4E–4J). The heatmap images showed true negative predictions (Fig 4B) on internal non-neoplastic lymph nodes (Fig 4A). Importantly, the areas of adenocarcinoma localization in both metastatic lung adenocarcinoma (Fig 4C) and breast invasive ductal carcinoma (Fig 4G) were positively predicted in the heatmap images (Fig 4D and 4H).

Fig 4. Representative examples of metastatic adenocarcinoma true positive prediction outputs on cases from radical lymph node dissection (lymphadenectomy) test sets using the model (WS-4K: 224, x10 EfficientNetB1).

Fig 4

In the metastatic lung adenocarcinoma (A) and breast invasive ductal carcinoma (E) whole slide images (WSIs) of radical lymph node dissection specimens, the heatmap images show true positive predictions of metastatic lung adenocarcinoma (B, D) and breast invasive ductal carcinoma (F, H) cells, which correspond to the H&E histopathology (A, C, E, G, I, J). According to the histopathological diagnostic report, in (A), only one lymph node (circled with a blue dotted line) was positive for metastatic lung adenocarcinoma (C). The heatmap image (B) shows a true positive prediction consistent with the areas of metastatic lung adenocarcinoma invasion in the same lymph node (D). The heatmap image (B) also shows no positive predictions in the lymph nodes without evidence of cancer metastasis (A). Compared to (A), it was histopathologically not easy to determine the metastatic cancer areas in (E) at low-power view. According to the histopathological report, in (E), metastatic breast invasive ductal carcinoma was localized in (G). The heatmap image (F) shows true positive predictions in (H), which coincide with the metastatic carcinoma infiltrating areas (G, I, J). The heatmap uses the jet color map, where blue indicates low probability and red indicates high probability.

True negative adenocarcinoma prediction of radical lymph node dissection (lymphadenectomy) WSIs

Our model (WS-4K: 224, x10 EfficientNetB1) showed true negative predictions of metastatic adenocarcinoma in lymph nodes without evidence of cancer metastasis (Fig 5). In Fig 5A, there were numerous lymph nodes with a broad range of sizes (small to large) and shapes (round to irregular), none of which were predicted as metastatic (Fig 5B). Moreover, in Fig 5C, the lymph node was enlarged due to lymphadenitis (Fig 5E) but showed no evidence of metastatic adenocarcinoma and was not predicted as metastatic (Fig 5D).

Fig 5. Representative true negative metastatic adenocarcinoma classification of radical lymph node dissection (lymphadenectomy) test sets using the model (WS-4K: 224, x10 EfficientNetB1).

Fig 5

Histopathologically, in (A), there were lymph nodes of diverse sizes (small to large) and shapes (round to irregular) without evidence of metastatic adenocarcinoma. The heatmap image (B) shows a true negative prediction of metastatic adenocarcinoma. Histopathologically, in (C), there were lymph nodes with lymphadenitis (E) but without evidence of metastatic adenocarcinoma (C, E). The heatmap image (D) shows a true negative prediction of metastatic adenocarcinoma. The heatmaps use the jet color map, where blue indicates low probability and red indicates high probability.

False positive adenocarcinoma prediction of radical lymph node dissection (lymphadenectomy) WSIs

Histopathologically, Fig 6A shows no evidence of metastatic adenocarcinoma, yet our model (WS-4K: 224, x10 EfficientNetB1) exhibited false positive predictions of adenocarcinoma (Fig 6B, 6D and 6F). These tissue areas (Fig 6C and 6E) showed dense hematoxylic artifacts induced by crushing during specimen handling, which could be the primary cause of the false positives due to their morphological similarity to the irregularly shaped, dense nuclei of adenocarcinoma cells.

Fig 6. Representative example of metastatic adenocarcinoma false positive prediction outputs on a case from the radical lymph node dissection (lymphadenectomy) test set using the model (WS-4K: 224, x10 EfficientNetB1).

Fig 6

Histopathologically, (A) has no sign of metastatic adenocarcinoma. The heatmap image (B) exhibits false positive predictions of adenocarcinoma (D, F) where the tissue consists of dense hematoxylic artifacts induced by crushing during specimen handling (C, E); this is most likely the primary cause of the false positive predictions, owing to the artifacts' morphological similarity to adenocarcinoma cells with irregularly shaped and dense nuclei. The heatmap uses the jet color map, where blue indicates low probability and red indicates high probability.

False negative adenocarcinoma prediction of radical lymph node dissection (lymphadenectomy) WSIs

In Fig 7A, histopathologically, only two metastatic colon adenocarcinoma foci were observed, in the left-most lymph node (Fig 7C). After double-checking by two independent pathologists, no additional metastatic adenocarcinoma cells were found in Fig 7A. However, the heatmap image did not predict any adenocarcinoma cells (Fig 7B).

Discussion

In the present study, we trained multi-organ deep learning models for the classification of adenocarcinoma in WSIs using weakly-supervised learning. The models were trained on WSIs obtained from four medical institutions and were then applied to multi-organ test sets obtained from five medical institutions to demonstrate the generalisation of the models on unseen data. The deep learning model (WS-4K: 224, x10 EfficientNetB1) achieved ROC-AUCs in the range of 0.91–0.98.

So far, we have investigated adenocarcinoma classification on histopathological WSIs in diverse organs (e.g., stomach [15–17], colon [15, 18], lung [19, 24], and breast [20, 21]). These models are specific to each organ, and a versatile adenocarcinoma classification model that can be applied across multiple organs had not been developed to date. A multi-organ adenocarcinoma classification model may play a key role in first-screening processes in clinical laboratories, especially for radical lymph node dissection specimens, which contain a large number of lymph nodes in a single WSI.

Prior to the training, we assessed the versatility of the existing models. For example, the existing stomach adenocarcinoma classification model [15] exhibited high ROC-AUC and low log loss scores on the stomach and colon endoscopic biopsy test sets, but not on the lung, breast, and lymph node test sets (Table 5). Therefore, in this study we trained new deep learning models using the weakly-supervised learning approach.

We collected H&E stained histopathological specimens from as many medical institutions as possible to ensure diversity of histopathological features and specimen quality in the training sets (Table 1). We did not include radical lymph node dissection specimens in the training sets because we aimed to train the model on the primary organs and then predict metastatic adenocarcinoma in lymph nodes. In all training sets (1K, 2K, and 4K), the WSIs from each organ (stomach, colon, lung, and breast) were equally distributed (Table 2).

In this study, we showed that it was possible to exploit moderately sized training sets of 2,000 (2K) and 4,000 (4K) WSIs to train deep learning models using weakly-supervised learning, and we obtained high ROC-AUC performance on the primary organ (stomach, colon, lung, and breast) and radical lymph node dissection test sets, which is highly promising in terms of the generalisation performance of our models for classifying adenocarcinoma across organs. The weakly-supervised learning method allowed us to train on our datasets and obtain high performance without manual annotations. This means that it is possible to train a high-performance model for cancer classification across organs without detailed cellular-level or even rough annotations and without an extremely large number of WSIs. We previously demonstrated the usefulness of the weakly-supervised learning approach for lung carcinoma classification [24]. Importantly, there was no significant difference in ROC-AUC and log loss between the 2K and 4K training sets, meaning that a relatively small training set (2,000 WSIs in total) was sufficient for multi-organ adenocarcinoma classification.

Our model satisfactorily predicted adenocarcinoma areas not only in the primary organs (stomach, colon, lung, and breast) (Fig 3) but also in radical lymph node dissection specimens (Fig 4). In routine histopathological diagnosis, inspecting lymph nodes for cancer metastasis is laborious because there are usually many lymph nodes of widely varying sizes and shapes on a glass slide. Our model can localise predicted adenocarcinoma invasion and visualise it as heatmap images (Fig 4), which would be a useful tool for primary screening or double-checking purposes in the clinical workflow. Importantly, our model can evaluate adenocarcinoma-free (non-metastatic) lymph nodes (Fig 5), as reflected in its high specificity (0.931) (Table 7). This is an important finding for applying our model in the clinical workflow. This study is not without limitations. One limitation is the use of a single scanner type for the majority of collected cases. Another limitation is the presence of false positives/negatives. The false positives seem to be primarily caused by dense hematoxylic artifacts induced by crushing during specimen handling, which have morphological similarities to adenocarcinoma cell clusters with irregularly shaped and dense nuclei (Fig 6). Another major limitation is that the models were not validated on independent cohorts from different institutions.

Acknowledgments

We are grateful for the support provided by Dr. Shin Ichihara at Department of Surgical Pathology, Sapporo Kosei General Hospital (Sapporo, Japan); Dr. Makoto Abe at Department of Pathology, Tochigi Cancer Center (Tochigi, Japan); Dr. Shigeo Nakano at Kamachi Group Hospitals (Fukuoka, Japan); Professor Takayuki Shiomi at Department of Pathology, Faculty of Medicine, International University of Health and Welfare (Tokyo, Japan); Dr. Ryosuke Matsuoka at Diagnostic Pathology Center, International University of Health and Welfare, Mita Hospital (Tokyo, Japan). We thank pathologists who have been engaged in reviewing cases and clinicopathological discussion for this study.

Data Availability

The datasets generated and/or analysed during the current study are not publicly available due to specific institutional requirements governing privacy protection but are available from the corresponding author on reasonable request. The datasets that support the findings of this study are available from International University of Health and Welfare, Mita Hospital (Tokyo, Japan) and Kamachi Group Hospitals (Fukuoka, Japan), but restrictions apply to the availability of these data, which were used under a data use agreement made according to the Ethical Guidelines for Medical and Health Research Involving Human Subjects as set by the Japanese Ministry of Health, Labour and Welfare, and so are not publicly available. The data contain potentially sensitive information. However, the data are available from the authors upon reasonable request for private viewing, with permission from the corresponding medical institutions, within the terms of the data use agreement, and if compliant with the ethical and legal requirements as stipulated by the Japanese Ministry of Health, Labour and Welfare. (1) Ethical board of International University of Health and Welfare, Mita Hospital (Tokyo, Japan). Contact person: Professor Dr. Takayuki Shiomi, Department of Pathology, Faculty of Medicine, International University of Health and Welfare (Tokyo, Japan). Phone: +81-476-20-7701. E-mail: t_shiomi@iuhw.ac.jp. (2) Ethical board of Kamachi Group Hospitals (Wajiro, Shinkuki, Shinkomonji, and Shinmizumaki Hospital). Contact person: Dr. Shigeo Nakano, Head of Department of Surgical Pathology at Kamachi Group Hospitals (Fukuoka, Japan). Phone: +81-92-608-0001. E-mail: sdnakano@harajuku-reha.com

Funding Statement

This study is based on results obtained from a project, JPNP14012, subsidized by the New Energy and Industrial Technology Development Organization (NEDO). The funder provided support in the form of salaries for authors M.T. and F.K., but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.

References

  • 1. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA: A Cancer Journal for Clinicians. 2021;71(3):209–249.
  • 2. Yu KH, Zhang C, Berry GJ, Altman RB, Ré C, Rubin DL, et al. Predicting non-small cell lung cancer prognosis by fully automated microscopic pathology image features. Nature Communications. 2016;7:12474. doi: 10.1038/ncomms12474
  • 3. Hou L, Samaras D, Kurc TM, Gao Y, Davis JE, Saltz JH. Patch-based convolutional neural network for whole slide tissue image classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2016. p. 2424–2433.
  • 4. Madabhushi A, Lee G. Image analysis and machine learning in digital pathology: Challenges and opportunities. Medical Image Analysis. 2016;33:170–175. doi: 10.1016/j.media.2016.06.037
  • 5. Litjens G, Sánchez CI, Timofeeva N, Hermsen M, Nagtegaal I, Kovacs I, et al. Deep learning as a tool for increased accuracy and efficiency of histopathological diagnosis. Scientific Reports. 2016;6:26286. doi: 10.1038/srep26286
  • 6. Kraus OZ, Ba JL, Frey BJ. Classifying and segmenting microscopy images with deep multiple instance learning. Bioinformatics. 2016;32(12):i52–i59. doi: 10.1093/bioinformatics/btw252
  • 7. Korbar B, Olofson AM, Miraflor AP, Nicka CM, Suriawinata MA, Torresani L, et al. Deep learning for classification of colorectal polyps on whole-slide images. Journal of Pathology Informatics. 2017;8. doi: 10.4103/jpi.jpi_34_17
  • 8. Luo X, Zang X, Yang L, Huang J, Liang F, Rodriguez-Canales J, et al. Comprehensive computational pathological image analysis predicts lung cancer prognosis. Journal of Thoracic Oncology. 2017;12(3):501–509. doi: 10.1016/j.jtho.2016.10.017
  • 9. Coudray N, Ocampo PS, Sakellaropoulos T, Narula N, Snuderl M, Fenyö D, et al. Classification and mutation prediction from non-small cell lung cancer histopathology images using deep learning. Nature Medicine. 2018;24(10):1559–1567. doi: 10.1038/s41591-018-0177-5
  • 10. Wei JW, Tafe LJ, Linnik YA, Vaickus LJ, Tomita N, Hassanpour S. Pathologist-level classification of histologic patterns on resected lung adenocarcinoma slides with deep neural networks. Scientific Reports. 2019;9(1):1–8. doi: 10.1038/s41598-019-40041-7
  • 11. Gertych A, Swiderska-Chadaj Z, Ma Z, Ing N, Markiewicz T, Cierniak S, et al. Convolutional neural networks can accurately distinguish four histologic growth patterns of lung adenocarcinoma in digital slides. Scientific Reports. 2019;9(1):1483. doi: 10.1038/s41598-018-37638-9
  • 12. Bejnordi BE, Veta M, Van Diest PJ, Van Ginneken B, Karssemeijer N, Litjens G, et al. Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA. 2017;318(22):2199–2210. doi: 10.1001/jama.2017.14585
  • 13. Saltz J, Gupta R, Hou L, Kurc T, Singh P, Nguyen V, et al. Spatial organization and molecular correlation of tumor-infiltrating lymphocytes using deep learning on pathology images. Cell Reports. 2018;23(1):181–193. doi: 10.1016/j.celrep.2018.03.086
  • 14. Campanella G, Hanna MG, Geneslaw L, Miraflor A, Silva VWK, Busam KJ, et al. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nature Medicine. 2019;25(8):1301–1309. doi: 10.1038/s41591-019-0508-1
  • 15. Iizuka O, Kanavati F, Kato K, Rambeau M, Arihiro K, Tsuneki M. Deep learning models for histopathological classification of gastric and colonic epithelial tumours. Scientific Reports. 2020;10(1):1–11. doi: 10.1038/s41598-020-58467-9
  • 16. Kanavati F, Tsuneki M. A deep learning model for gastric diffuse-type adenocarcinoma classification in whole slide images. arXiv preprint arXiv:2104.12478. 2021.
  • 17. Kanavati F, Ichihara S, Rambeau M, Iizuka O, Arihiro K, Tsuneki M. Deep learning models for gastric signet ring cell carcinoma classification in whole slide images. Technology in Cancer Research & Treatment. 2021;20:15330338211027901. doi: 10.1177/15330338211027901
  • 18. Tsuneki M, Kanavati F. Deep learning models for poorly differentiated colorectal adenocarcinoma classification in whole slide images using transfer learning. Diagnostics. 2021;11(11):2074. doi: 10.3390/diagnostics11112074
  • 19. Kanavati F, Toyokawa G, Momosaki S, Takeoka H, Okamoto M, Yamazaki K, et al. A deep learning model for the classification of indeterminate lung carcinoma in biopsy whole slide images. Scientific Reports. 2021;11(1):1–14. doi: 10.1038/s41598-021-87644-7
  • 20. Kanavati F, Tsuneki M. Breast invasive ductal carcinoma classification on whole slide images with weakly-supervised and transfer learning. Cancers. 2021;13(21):5368. doi: 10.3390/cancers13215368
  • 21. Kanavati F, Ichihara S, Tsuneki M. A deep learning model for breast ductal carcinoma in situ classification in whole slide images. Virchows Archiv. 2022:1–14.
  • 22. Tan M, Le Q. EfficientNet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning. PMLR; 2019. p. 6105–6114.
  • 23. Kanavati F, Tsuneki M. Partial transfusion: on the expressive influence of trainable batch norm parameters for transfer learning. arXiv preprint arXiv:2102.05543. 2021.
  • 24. Kanavati F, Toyokawa G, Momosaki S, Rambeau M, Kozuma Y, Shoji F, et al. Weakly-supervised learning for lung carcinoma classification using deep learning. Scientific Reports. 2020;10(1):1–11. doi: 10.1038/s41598-020-66333-x
  • 25. Otsu N. A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man, and Cybernetics. 1979;9(1):62–66. doi: 10.1109/TSMC.1979.4310076
  • 26. Goode A, Gilbert B, Harkes J, Jukic D, Satyanarayanan M. OpenSlide: A vendor-neutral software foundation for digital pathology. Journal of Pathology Informatics. 2013;4. doi: 10.4103/2153-3539.119005
  • 27. Kingma DP, Ba J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980. 2014.
  • 28. Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, et al. TensorFlow: Large-scale machine learning on heterogeneous systems; 2015. Available from: https://www.tensorflow.org/.
  • 29. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research. 2011;12:2825–2830.
  • 30. Hunter JD. Matplotlib: A 2D graphics environment. Computing in Science & Engineering. 2007;9(3):90–95. doi: 10.1109/MCSE.2007.55
  • 31. Efron B, Tibshirani RJ. An introduction to the bootstrap. CRC Press; 1994.

Decision Letter 0

Andrey Bychkov

6 Jul 2022

PONE-D-22-13432
Weakly supervised learning for multi-organ adenocarcinoma classification in whole slide images
PLOS ONE

Dear Dr. Tsuneki,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

ACADEMIC EDITOR

I wish to reinforce the following comments picked up by the peer reviewers: 1) code availability; 2) issues with figures. When evaluating computational pathology submissions, each reviewer is routinely asked to run the provided code and assess its performance independently, which was not possible in this case.

Please submit your revised manuscript by Aug 18 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Andrey Bychkov

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please provide additional details regarding participant consent. In the Methods section, please ensure that you have specified (1) whether consent was informed and (2) what type you obtained (for instance, written or verbal). If your study included minors, state whether you obtained consent from parents or guardians. If the need for consent was waived by the ethics committee, please include this information.

3. Thank you for stating the following in the Competing Interests section: 

"Fahdi Kanavati and Masayuki Tsuneki are employees of Medmain Inc. (Fukuoka, Japan)." 

   

We note that one or more of the authors are employed by a commercial company: Medmain Inc.

a. Please provide an amended Funding Statement declaring this commercial affiliation, as well as a statement regarding the Role of Funders in your study. If the funding organization did not play a role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript and only provided financial support in the form of authors' salaries and/or research materials, please review your statements relating to the author contributions, and ensure you have specifically and accurately indicated the role(s) that these authors had in your study. You can update author roles in the Author Contributions section of the online submission form.

Please also include the following statement within your amended Funding Statement. 

“The funder provided support in the form of salaries for authors [insert relevant initials], but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.”

If your commercial affiliation did play a role in your study, please state and explain this role within your updated Funding Statement. 

b. Please also provide an updated Competing Interests Statement declaring this commercial affiliation along with any other relevant declarations relating to employment, consultancy, patents, products in development, or marketed products, etc.  

Within your Competing Interests Statement, please confirm that this commercial affiliation does not alter your adherence to all PLOS ONE policies on sharing data and materials by including the following statement: "This does not alter our adherence to  PLOS ONE policies on sharing data and materials.” (as detailed online in our guide for authors http://journals.plos.org/plosone/s/competing-interests) . If this adherence statement is not accurate and  there are restrictions on sharing of data and/or materials, please state these. Please note that we cannot proceed with consideration of your article until this information has been declared.

Please include both an updated Funding Statement and Competing Interests Statement in your cover letter. We will change the online submission form on your behalf.

4. We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For more information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. 

In your revised cover letter, please address the following prompts:

a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially sensitive information, data are owned by a third-party organization, etc.) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.

b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings as either Supporting Information files or to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories.

We will update your Data Availability statement on your behalf to reflect the information you provide.

5. Your ethics statement should only appear in the Methods section of your manuscript. If your ethics statement is written in any section besides the Methods, please move it to the Methods section and delete it from any other section. Please ensure that your ethics statement is included in your manuscript, as the ethics statement entered into the online submission form will not be published alongside your manuscript.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Partly

Reviewer #3: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No

Reviewer #2: No

Reviewer #3: I Don't Know

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors use a weakly supervised approach for training an algorithm for adenocarcinoma detection.

The study is of potential interest.

There are, however, some important issues:

- Page 2: “(e.g., Hospital-A, B, C, D, and E)”

I would remove the word "anonymization", as these data cannot be anonymized in such a way. You can always identify the datasets by their number of slides, staining, cutting peculiarities, etc. What is the meaning of anonymizing hospitals? You probably have to anonymize cases, but not hospitals.

- Page 3: What kind of slides did the authors use from TCGA cohort?

As they state in Table 5 that they have 246 slides for the lung cancer cohort containing only benign tissue, these are probably slides of fresh frozen sections. This should be clarified. If the authors used fresh frozen sections, they should provide a clarification for this, as they train on FFPE sections and frozen sections are a very different type of material.

- Table 4: Histopathological diagnosis for lymph nodes without metastasis is probably not “Non-neoplastic lesion”, just free of tumor.

- Tumors are extremely heterogeneous. How did the pathologists select the cases, and how did they consider representativeness? E.g., the lung is an organ with very complex morphology of the benign tissue surrounding the tumor. Sometimes you cannot tell where the tumor starts and where it ends. Fibrosis. Inflammation. Not including enough training data can reduce the generalizability of the algorithms.

- The authors used EfficientNetB1 as the neural network architecture. It is a relatively small network and not a "state-of-the-art" network for digital pathology tasks, as it is known empirically that larger networks (with 20–50 million parameters) might perform better. Importantly, the authors refer to their other paper (ref. 24), from which they inherit the technical methodology of the current study. And in ref. 24 they used EfficientNetB3. Given this fact, the authors should clarify their selection and support it with experimental data.

- The authors train only the batch normalization layers and the classification layer, leaving the rest of the model untrained (the authors should state whether they used ImageNet weights or not). This seems strange, as in most computational pathology situations you have to retrain the whole model. Although ImageNet-pretrained models can work as feature extractors on pathology tasks, one normally gets up to 5% more accuracy when the whole model is retrained (in a transfer learning setting).

- Page 4: “We used k = 8, N = 256, and a batch size of 32”. The selection of exactly these hyperparameters should be clarified and supported by experimental evidence.

- Page 4: The authors start by randomly selecting some patches from the training slides (they do not, however, state how many patches they use – this should be stated) and train only 1 epoch before switching to hard mining. With such an approach, the first selection of random patches can have enormous effects on the result. The authors need to perform some form of bootstrapping, or at least present the results from several independent trainings, to allow comparison among them.

- Page 4: “We used early stopping by tracking the performance of the model on a validation set; this allows stopping the training when no improvement was observed for more than 10 epochs. The model with the lowest validation loss was chosen as the final model.”

This seems unclear to me – the validation set is practically several reserved biopsies for each tumor entity. The tumor-bearing biopsies contain regions with tumor and regions with benign tissue. If the authors did not annotate them, the only way to validate their trained models is at the case/slide level. But the authors state they used validation loss as the trigger for stopping training. How exactly did they calculate the loss if they have access to validation only at the case/slide level?

This should be clarified in all details.

- Code availability:

“To train the classification model in this study we adapted the publicly available TensorFlow training script available at https://github.com/tensorflow/models/tree/master/official/vision/image_classification.”

By this, the authors mean that they will not release their code. If so, this should be stated clearly, not by referring readers to the TensorFlow repo! Moreover, this link is not working. My personal opinion is that this is unacceptable for a research publication in the field of computational pathology, because others then have absolutely no chance to reproduce the results. The methodology of the study is very simple, so at this point there is, in my opinion, no substantial interference with a "commercial background", even though the authors are affiliated with a company. In any case, if the authors do not provide code, there should be a robust justification for this.

- In Table 6 the authors show the test of the existing (ref. 15) model for stomach adenocarcinoma detection on the test datasets of the current study. On the test dataset from the same domain (stomach) the AUC is 0.937. It is totally unclear how the authors calculated the AUC – whether they used probability or tumor-area thresholds, etc. – as they test at the case/slide level (non-annotated slides). The same is true for the log loss.

The authors show AUC = 0.991 for the TCGA breast cancer cohort – however, this seems totally implausible, as this is an absolutely different pathology domain.

- The same issue (the exact method for calculating the AUC, log loss, and other metrics is unclear):

“For each test set (stomach endoscopic biopsy, colon endoscopic biopsy, lung TBLB, breast needle biopsy, radical lymph node dissection, lung TCGA, and breast TCGA), we computed the ROC-AUC, log loss, accuracy, sensitivity, and specificity and summarized the results in Table 7 and 8 and Fig. 2B-D.”

- Table 7, Table 8

Again, it is not believable that you can get an AUC of 0.999 on such a hard problem as breast cancer, training in weakly supervised mode on several hundred biopsies. Such results probably stem from the fact that the authors use only 4 images without tumor for the TCGA breast cancer cohort and test at the case level. This is a major flaw in the design, and the authors should completely remove the TCGA breast dataset from their work.

- Figures: The authors should clarify what the different colors mean (probabilities?).

All figures are barely interpretable (the tissue behind the color maps cannot be seen).

- The authors used datasets from different institutes, yet they do not adopt any normalization strategies. In this case, different numbers of cases from different institutes can introduce bias and substantially reduce generalizability. The authors should ideally provide independent tests using stain normalization strategies; normally, the results should improve.

- The main achievements of the study (for best models) are presented in Table 8.

Here, very low sensitivity and specificity are reported for all tumor entities (TCGA breast should be removed) – actually unacceptable for clinical practice. This reported accuracy corresponds to the inference maps presented in the figures.

This probably stems at least partially from a very low number of cases (selected for the study – histologically very complex and heterogeneous tumor entities). It is known from other publications that for weakly supervised approaches 4,000 slides are not enough; the number should definitely be > 10k, ideally 15k-20k slides. The methodology is also sometimes suboptimal: a very small CNN, with only the classification and batch normalization layers trained rather than the full model (!). The authors are encouraged to provide results for training with stain normalization or style transfer, training of the full model, as well as training in bootstrapping mode (iterations of random selection of training and test cases).

Reviewer #2: The authors present a unique model that is able to classify adenocarcinomas originating from multiple organs in a single model.

Major

1. Breast dataset from TCGA is not appropriate as a test set for cancer detection task, because it is extremely imbalanced (more than 99% are invasive ductal carcinoma).

2. In the heat maps in Figs. 3-7, some tiles have frames while others do not, even in areas where the color does not represent the lowest level. In addition, unnatural white-to-purple grids are seen in the areas without frames. These raise the suspicion of arbitrary image editing. Please provide the original images for evaluation by the editor and reviewers, along with relevant explanations.

3. In the validation and test steps, how are the inference results at the patch level aggregated into the WSI-level diagnosis?

4. Also, please explain how the thresholds for the inference results were set when calculating sensitivity and specificity.

Minor

1. Why were the TCGA colon cancer and TCGA gastric cancer datasets not tested?

2. What are the “non-neoplastic lesion” slides in the TCGA dataset? The TCGA dataset is, by definition, dealing with neoplastic lesions. How were the test sets in this study selected from the TCGA dataset? To clarify the dataset, the "manifest file" used to download the data from the GDC data portal should be made available.

3. Please explain what the cases of metastatic colon cancer of the lungs in Table 4 are. Such cases should rarely occur in usual clinical practice.

4. There should be cases where there are multiple WSIs in a single case. How were these cases handled?

5. On which dataset is the deep learning model pretrained?

6. In “code availability” section, please provide author-generated code.

Reviewer #3: The manuscript presents research with an important impact and objective, i.e., to provide an auxiliary tool for diagnostic pathology and improve global adenocarcinoma primary screening. I found the approach original in aiming to identify adenocarcinoma with considerable accuracy in several primary sites and metastatic lymph nodes, hence matching the diagnostic practice and protocol in multidisciplinary centers with high throughput. Using weakly supervised learning is a strong point of this research for a list of reasons: it takes less time to achieve the results without relying on pathologists' annotation efforts, it remains comprehensible and explainable for the target user, and, most importantly, it achieves very good performance. Below is a list of minor improvements I would suggest:

- The writing used is straight to the point and gets the message across very well; however, I would suggest replacing one sentence in the Introduction: “Adenocarcinoma is the major cancer arise in these organs” with “Adenocarcinoma is the most common type of cancer affecting these organs”.

- “Histopathological cancer classification” is used consistently throughout the manuscript, which I understand is linked to the technical realm of computational pathology; but from a practicing pathologist's perspective, cancer classification goes beyond the identification of adenocarcinoma and can be misinterpreted by the reader. I would suggest replacing it with “tissue classification and adenocarcinoma detection”.

- I appreciated the Methods and materials section, as it is detailed and well explained, and the figures show the right data at a glance; however, I was not clear about the validation set size. I found it to be small, but I am not an image analysis expert, and it would be great to add a sentence about the validation set size criteria if possible.

- It was not entirely clear why the adenomas were placed in the non-adenocarcinoma class for the GI organs (stomach and colon). I would suggest explaining the rationale for that in a sentence or two, e.g., a high-risk feature, the most common differential, etc.

- I would suggest being broader when listing the limitations of the study, in addition to the false positives and false negatives. Considering the aim of being impactful in primary screening, the use of only one type of scanner is another limitation to mention, for example.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2022 Nov 23;17(11):e0275378. doi: 10.1371/journal.pone.0275378.r002

Author response to Decision Letter 0


18 Jul 2022

Reviewer #1: The authors use a weakly supervised approach for training an algorithm for adenocarcinoma detection.

The study is of potential interest.

There are, however, some important issues:

- Page 2: “(e.g., Hospital-A, B, C, D, and E)”

I would remove the word "anonymization", as these data cannot be anonymized in such a way. You can always identify the datasets by number of slides, staining, cutting peculiarities, etc. What is the meaning of anonymizing hospitals? You probably have to anonymize cases, but not hospitals.

Response: We had a set of five hospitals from which we obtained datasets. By anonymising the dataset, we mean that we do not explicitly mention which dataset originated from a given hospital. So as to avoid associating a particular result to an explicitly named hospital.

- Page 3: What kind of slides did the authors use from TCGA cohort?

Response: We’ve expanded the sentence to state that they were H&E-stained surgical FFPE specimens.

As they state in Table 5 that the lung cancer cohort has 246 slides containing only benign tissue, these are probably slides of fresh frozen sections. This should be clarified. If the authors used fresh frozen sections, they should explain this, as they train on FFPE sections and frozen sections are a very different type of material.

Response: We have added a mention that the dataset from TCGA contained both FFPE and frozen sections.

- Table 4: Histopathological diagnosis for lymph nodes without metastasis is probably not “Non-neoplastic lesion”, just free of tumor.

Response: We have replaced all mentions of “non-neoplastic lesion” with “non-neoplastic”.

- Tumors are extremely heterogeneous. How did the pathologists select the cases, and how did they consider representativeness? E.g., the lung is an organ with very complex morphology of peritumoral benign tissue. Sometimes you cannot tell where the tumor starts and where it ends: fibrosis, inflammation. Not including enough training data can reduce the generalizability of the algorithms.

Response: The criterion for selection was simply whether the WSI contained adenocarcinoma; otherwise, it was assumed to be non-neoplastic.

- The authors used EfficientNetB1 as the neural network architecture. It is a relatively small network and not a "state-of-the-art" network for digital pathology tasks, as it is known empirically that larger networks (with 20-50 million parameters) might perform better. Importantly, the authors refer to their other paper (ref. 24), from which they inherit the technical methodology of the current study, and in ref. 24 they used EfficientNetB3. Given this, the authors should clarify their selection and support it with experimental data.

Response: We saw no further improvements from using a larger model, which is why we used the B1 model. In a previous study, we also used B1 [1].

[1] Masayuki Tsuneki and Fahdi Kanavati. "Deep learning models for poorly differentiated colorectal adenocarcinoma classification in whole slide images using transfer learning." Diagnostics 11.11 (2021): 2074.

- The authors train only the batch normalization layers and the classification layer, leaving the rest of the model untrained (the authors should state whether they used ImageNet weights or not). This seems strange, as in most computational pathology situations you have to retrain the whole model. Although ImageNet-pretrained models can work as feature extractors on pathology tasks, one normally gets up to 5% more accuracy when the whole model is retrained (in a transfer learning setting).

Response: We have added a mention that these are ImageNet-pretrained models. Based on the extensive results provided in the partial fine-tuning paper [1], it can be sufficient to fine-tune only the affine weights of the batch normalization layers and the final classification layers. That paper includes comparisons of various fine-tuning/transfer learning approaches on histopathology images.

[1] https://proceedings.mlr.press/v143/kanavati21a.html
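To illustrate the scheme, here is a minimal Keras sketch (ours, for illustration only; the head and optimizer settings are assumptions, not the exact configuration used in the study):

    import tensorflow as tf

    # ImageNet-pretrained backbone without the original classification top.
    base = tf.keras.applications.EfficientNetB1(
        include_top=False, weights="imagenet", pooling="avg")

    # Freeze everything except the batch normalization layers. (With
    # trainable BN layers the moving statistics also update, a slight
    # superset of training only the affine gamma/beta parameters.)
    for layer in base.layers:
        layer.trainable = isinstance(layer, tf.keras.layers.BatchNormalization)

    # New trainable binary classification head.
    model = tf.keras.Sequential(
        [base, tf.keras.layers.Dense(1, activation="sigmoid")])
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
                  loss="binary_crossentropy",
                  metrics=[tf.keras.metrics.AUC()])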

- Page 4: “We used k = 8, N = 256, and a batch size of 32”. The selection of exactly these hyperparameters should be clarified and supported by experimental evidence.

Response: These parameters had no impact on the final result. They were selected simply to make use of the maximum available GPU memory.

- Page 4: The authors start by randomly selecting some patches from the training slides (they do not, however, state how many patches they use – this should be stated) and train only 1 epoch before switching to hard mining. With such an approach, the first selection of random patches can have enormous effects on the result. The authors need to perform some form of bootstrapping, or at least present the results from several independent trainings, to allow comparison among them.

Response: It is the same parameter k that determines the number of selected patches. We are not proposing a new method here; we are simply using a method that was validated in previous studies. In addition, in deep learning with large training and test datasets, the general consensus is that a single initial split of the data and a single run are enough (see, for example, Machine Learning Yearning by Andrew Ng), especially when training a single model can take days.
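For readers unfamiliar with the cited method, a rough sketch of what one top-k selection step could look like (our illustrative reading of the approach, not the authors' code; the names are ours):

    import numpy as np

    def select_tiles_for_next_epoch(tile_probs: np.ndarray,
                                    slide_label: int, k: int = 8):
        # Rank a slide's tiles by predicted tumor probability and keep the
        # top k. On a positive slide these are the tiles most likely to
        # carry the label; on a negative slide they are the hardest false
        # positives. All selected tiles inherit the slide-level label.
        top_k = np.argsort(tile_probs)[::-1][:k]
        return top_k, np.full(len(top_k), slide_label)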

- Page 4: “We used early stopping by tracking the performance of the model on a validation set; this allows stopping the training when no improvement was observed for more than 10 epochs. The model with the lowest validation loss was chosen as the final model.”

This seems unclear to me – the validation set is practically several reserved biopsies for each tumor entity. The tumor-bearing biopsies contain regions with tumor and regions with benign tissue. If the authors did not annotate them, the only way to validate their trained models is at the case/slide level. But the authors state they used validation loss as the trigger for stopping training. How exactly did they calculate the loss if they have access to validation only at the case/slide level? This should be clarified in all details.

Response: This was done at the slide level. The loss is calculated as in any classification problem, using binary cross entropy. For each case in the validation set, we simply applied the model to the entire slide, then took the maximum probability. This gives a single probability value for the entire slide to compare with the ground-truth label, which allows us to compute the loss with binary cross entropy. We have added a sentence to clarify this.
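To make this concrete, a minimal sketch of the slide-level validation loss as just described (our illustration; the function names are ours, not from the study's code):

    import numpy as np

    def slide_probability(tile_probs: np.ndarray) -> float:
        # One probability per slide: the maximum over all tile predictions.
        return float(np.max(tile_probs))

    def slide_bce(slide_label: int, tile_probs: np.ndarray,
                  eps: float = 1e-7) -> float:
        # Binary cross entropy between the slide label and the aggregated
        # probability; averaging this over the validation slides gives the
        # validation loss tracked for early stopping.
        p = np.clip(slide_probability(tile_probs), eps, 1.0 - eps)
        return -(slide_label * np.log(p)
                 + (1 - slide_label) * np.log(1.0 - p))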

- Code availability:

“To train the classification model in this study we adapted the publicly available TensorFlow training script available at https://github.com/tensorflow/models/tree/master/official/vision/image_classification.”

By this, the authors mean that they will not release their code. If so, this should be stated clearly, not by referring readers to the TensorFlow repo! Moreover, this link is not working. My personal opinion is that this is unacceptable for a research publication in the field of computational pathology, because others then have absolutely no chance to reproduce the results. The methodology of the study is very simple, so at this point there is, in my opinion, no substantial interference with a "commercial background", even though the authors are affiliated with a company. In any case, if the authors do not provide code, there should be a robust justification for this.

Response: It seems that there have been changes to the repo paths; the correct link is https://github.com/tensorflow/models/tree/master/official/vision

At this point, there exist various implementations that validate weakly supervised training with multiple instance learning. For instance, there is a publicly available implementation from a seminal Nature paper here: https://github.com/MSKCC-Computational-Pathology/MIL-nature-medicine-2019

We are not proposing a new methodology in this paper. It is simply a clinical application paper, and the code is not central to the claim. We did mention that we adapted the code from the TensorFlow vision implementation; to avoid any confusion, we have removed this statement.

As far as we are aware, based on the policy of PLOS ONE, sharing code is not a requirement when the code is not central to the manuscript. https://journals.plos.org/plosone/s/materials-software-and-code-sharing#loc-sharing-code

“In cases where code is central to the manuscript, we may require the code to be made available as a condition of publication. Authors are responsible for ensuring that the code is reusable and well documented”

- In Table 6 the authors show the test of the existing (ref. 15) model for stomach adenocarcinoma detection on the test datasets of the current study. On the test dataset from the same domain (stomach) the AUC is 0.937. It is totally unclear how the authors calculated the AUC – whether they used probability or tumor-area thresholds, etc. – as they test at the case/slide level (non-annotated slides). The same is true for the log loss.

Response: We hope the additional paragraph we added to clarify how the slide-level probabilities are computed also answers this question.
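In other words, once each slide is reduced to a single max-aggregated probability, the AUC and log loss are ordinary binary-classification metrics, e.g. (toy values, for illustration only):

    import numpy as np
    from sklearn.metrics import roc_auc_score, log_loss

    y_true = np.array([1, 0, 1, 0])              # slide-level ground truth
    y_prob = np.array([0.92, 0.12, 0.70, 0.40])  # max-aggregated slide probabilities
    print(roc_auc_score(y_true, y_prob))         # ROC-AUC over slides
    print(log_loss(y_true, y_prob))              # slide-level log loss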

The authors show AUC = 0.991 for the TCGA breast cancer cohort – however, this seems totally implausible, as this is an absolutely different pathology domain.

- The same issue (the exact method for calculating the AUC, log loss, and other metrics is unclear):

“For each test set (stomach endoscopic biopsy, colon endoscopic biopsy, lung TBLB, breast needle biopsy, radical lymph node dissection, lung TCGA, and breast TCGA), we computed the ROC-AUC, log loss, accuracy, sensitivity, and specificity and summarized the results in Table 7 and 8 and Fig. 2B-D.”

- Table 7, Table 8

Again, it is not believable that you can get an AUC of 0.999 on such a hard problem as breast cancer, training in weakly supervised mode on several hundred biopsies. Such results probably stem from the fact that the authors use only 4 images without tumor for the TCGA breast cancer cohort and test at the case level. This is a major flaw in the design, and the authors should completely remove the TCGA breast dataset from their work.

Response: They do in fact stem from only using 4 images without tumor for the TCGA breast cancer cohort. We have removed this dataset from the results.

- Figures: The authors should clarify what the different colors mean (probabilities?).

Response: This is stated at the end of each figure caption: “the heatmap uses the jet color map where blue indicates low probability and red indicates high probability”.
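For reference, producing such a heatmap amounts to passing the grid of tile probabilities through the jet colormap; a minimal sketch (ours, assuming matplotlib is available):

    import numpy as np
    from matplotlib import cm

    def jet_overlay(tile_probs_2d: np.ndarray) -> np.ndarray:
        # Map a 2-D grid of tile probabilities through the jet colormap:
        # blue for low probability, red for high, as in the figure captions.
        return cm.jet(np.clip(tile_probs_2d, 0.0, 1.0))  # RGBA array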

All figures are barely interpretable (the tissue behind the color maps cannot be seen).

Response: The exact same image on the right shows the tissue without the color-map overlay, which allows the reader to see the tissue.

- The authors used datasets from different institutes, yet they do not adopt any normalization strategies. In this case, different numbers of cases from different institutes can introduce bias and substantially reduce generalizability. The authors should ideally provide independent tests using stain normalization strategies; normally, the results should improve.

Response: We have added a clarification paragraph stating that we performed data augmentation on brightness, contrast, hue, and saturation, which helps account for differences in stain. This had previously been mentioned in the earlier paper that described the method in more detail.
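A sketch of that kind of color augmentation with tf.image (the jitter ranges here are illustrative, not the exact settings used in the study):

    import tensorflow as tf

    def color_augment(image: tf.Tensor) -> tf.Tensor:
        # Randomly jitter brightness, contrast, hue, and saturation so the
        # model becomes less sensitive to staining differences between sites.
        image = tf.image.random_brightness(image, max_delta=0.1)
        image = tf.image.random_contrast(image, lower=0.9, upper=1.1)
        image = tf.image.random_hue(image, max_delta=0.05)
        image = tf.image.random_saturation(image, lower=0.9, upper=1.1)
        return image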

- The main achievements of the study (for best models) are presented in Table 8.

Here, very low sensitivity and specificity are reported for all tumor entities (TCGA breast should be removed) – actually unacceptable for clinical practice. This reported accuracy corresponds to the inference maps presented in the figures.

Response: We have removed the breast TCGA dataset.

This probably stems at least partially from a very low number of cases (selected for the study – histologically very complex and heterogeneous tumor entities). It is known from other publications that for weakly supervised approaches 4,000 slides are not enough; the number should definitely be > 10k, ideally 15k-20k slides. The methodology is also sometimes suboptimal: a very small CNN, with only the classification and batch normalization layers trained rather than the full model (!). The authors are encouraged to provide results for training with stain normalization or style transfer, training of the full model, as well as training in bootstrapping mode (iterations of random selection of training and test cases).

Response: We again refer the reviewer to a previous publication on the partial transfer learning method, https://proceedings.mlr.press/v143/kanavati21a.html, for which code is also available (https://github.com/fk128/batchnorm-transferlearning) to reproduce the results of fine-tuning only the affine parameters of the batch normalization layers. We have also added a clarification that we performed color data augmentation to make the model less sensitive to stain.

Reviewer #2: The authors present a unique model that is able to classify adenocarcinomas originating from multiple organs in a single model.

Major

1. Breast dataset from TCGA is not appropriate as a test set for cancer detection task, because it is extremely imbalanced (more than 99% are invasive ductal carcinoma).

Response: We have removed the breast TCGA dataset.

2. In the heat maps in Figs. 3-7, some tiles have frames while others do not, even in areas where the color does not represent the lowest level. In addition, unnatural white-to-purple grids are seen in the areas without frames. These raise the suspicion of arbitrary image editing. Please provide the original images for evaluation by the editor and reviewers, along with relevant explanations.

Response: These images were extracted directly from a web-based viewer based on OpenSeadragon (https://openseadragon.github.io/). The viewer was adapted to display tile overlays of the prediction heatmaps. Predictions were only generated for tissue regions, while the rest was automatically assumed to be background. To make the tiles easier to visualise at different zoom levels, borders were added to delineate the tiles with predictions, making the grid structure easier to see. The border line width is proportional to the zoom level: at low magnifications the border lines are thicker and thus more visible; the more one zooms in, the thinner they become; and at maximum magnification there are no border lines, as one can easily see the grid structure of the tiles. This is purely a visual aid. Here is a video showcasing the viewer that shows these borders: https://drive.google.com/file/d/167uW8JDOJyfzyRcMP8nxiXx8WuSBmQab/view?usp=sharing

3. In the validation and test steps, how are the inference results at the patch level aggregated into the WSI-level diagnosis?

Response: We have added a paragraph to clarify this.

4. Also, please explain how the thresholds for the inference results were set when calculating sensitivity and specificity.

Response: We used the standard threshold of 0.5.
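Concretely, with the slide-level probabilities binarized at 0.5, sensitivity and specificity follow from the confusion matrix; a minimal sketch (ours, for illustration):

    import numpy as np
    from sklearn.metrics import confusion_matrix

    def sensitivity_specificity(y_true, slide_probs, threshold=0.5):
        # Binarize slide-level probabilities at the threshold, then read
        # sensitivity and specificity off the confusion matrix.
        y_pred = (np.asarray(slide_probs) >= threshold).astype(int)
        tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
        return tp / (tp + fn), tn / (tn + fp)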

Minor

1. Why were the TCGA colon cancer and TCGA gastric cancer datasets not tested?

Response: We only had the breast and lung TCGA datasets, which we had downloaded months prior and had reviewed by pathologists. At the time we carried out this study, the TCGA website was not allowing the download of any further images, which is why we were unable to obtain other datasets. https://forum.image.sc/t/tcga-slides-not-available-anymore/52532/3

2. What are the “non-neoplastic lesion” slides in the TCGA dataset? The TCGA dataset is, by definition, dealing with neoplastic lesions. How were the test sets in this study selected from the TCGA dataset? To clarify the dataset, the "manifest file" used to download the data from the GDC data portal should be made available.

Response: We referred to non-neoplastic lesions as anything that was not neoplastic, as reviewed by pathologists. This included things like inflammatory tissue. To make this clearer, we have removed “lesions”, so that non-neoplastic simply refers to anything that is not neoplastic, e.g., inflammation and normal tissue.

3. Please explain what the cases of metastatic colon cancer of the lungs in Table 4 are. Such cases should rarely occur in usual clinical practice.

Response: Here are two representative examples of metastatic colon adenocarcinoma of the lung that we used:

According to the histopathological report, highly atypical columnar epithelial cells proliferate in a fused tubular structure, with extensive necrosis. The patient suffered from colon adenocarcinoma and stomach adenocarcinoma as well. Immunohistochemically, the metastatic lesion in the lung was positive for CK20 but negative for CK7. In the same patient, the stomach adenocarcinoma was positive for CK7 but negative for CK20, and the colon adenocarcinoma was positive for CK20 but negative for CK7. Therefore, the final diagnosis for the metastatic lesion was “metastatic colon adenocarcinoma of the lung”.

According to the histopathological report, highly atypical columnar epithelial cells take on a fused tubular structure and proliferate nodularly with extensive necrosis. Histologically, this is a finding of colorectal cancer metastasis. The same patient had colon adenocarcinoma in his/her clinical history.

4. There should be cases where there are multiple WSIs in a single case. How were these cases handled?

Response: We only used a single WSI from a given case.

5. On which dataset is the deep learning model pretrained?

Response: Tables 1 and 2 show which datasets the model was trained on.

6. In “code availability” section, please provide author-generated code.

Response: At this point, there exist various implementations that validate weakly supervised training with multiple instance learning. For instance, there is a publicly available implementation from a seminal Nature paper here: https://github.com/MSKCC-Computational-Pathology/MIL-nature-medicine-2019

We are not proposing a new methodology in this paper. It is simply a clinical application paper, and the code is not central to the claim. We did mention that we adapted the code from the TensorFlow vision implementation; to avoid any confusion, we have removed this statement.

As far as we are aware, based on the policy of PLOS ONE, sharing code is not a requirement when the code is not central to the manuscript. https://journals.plos.org/plosone/s/materials-software-and-code-sharing#loc-sharing-code

“In cases where code is central to the manuscript, we may require the code to be made available as a condition of publication. Authors are responsible for ensuring that the code is reusable and well documented”

Reviewer #3: The manuscript presents research with an important impact and objective, i.e., to provide an auxiliary tool for diagnostic pathology and improve global adenocarcinoma primary screening. I found the approach original in aiming to identify adenocarcinoma with considerable accuracy in several primary sites and metastatic lymph nodes, hence matching the diagnostic practice and protocol in multidisciplinary centers with high throughput. Using weakly supervised learning is a strong point of this research for a list of reasons: it takes less time to achieve the results without relying on pathologists' annotation efforts, it remains comprehensible and explainable for the target user, and, most importantly, it achieves very good performance. Below is a list of minor improvements I would suggest:

- The writing used is straight to the point and gets the message across very well; however, I would suggest replacing one sentence in the Introduction: “Adenocarcinoma is the major cancer arise in these organs” with “Adenocarcinoma is the most common type of cancer affecting these organs”.

Response: Done.

- “Histopathological cancer classification” is used consistently throughout the manuscript, which I understand is linked to the technical realm of computational pathology; but from a practicing pathologist's perspective, cancer classification goes beyond the identification of adenocarcinoma and can be misinterpreted by the reader. I would suggest replacing it with “tissue classification and adenocarcinoma detection”.

Response: Done.

- I appreciated the Methods and materials section, as it is detailed and well explained, and the figures show the right data at a glance; however, I was not clear about the validation set size. I found it to be small, but I am not an image analysis expert, and it would be great to add a sentence about the validation set size criteria if possible.

Response: Tables 1 and 2 show the size of the validation sets in parentheses. The test set represents the hold out sets and they’re detailed in Tables 3,4, and 5.

- It was not entirely clear why the adenomas were placed in the non-adenocarcinoma class for the GI organs (stomach and colon). I would suggest explaining the rationale for that in a sentence or two, e.g., a high-risk feature, the most common differential, etc.

Response: We have included it because it’s the most common differential diagnosis as well as to ensure that the model predicts correctly the difference between adenoma and adenocarcinoma due to potential similarity in some features. We have added a sentence to highlight this.

- I would suggest being broader when listing the limitations of the study, in addition to the false positives and false negatives. Considering the aim of being impactful in primary screening, the use of only one type of scanner is another limitation to mention, for example.

Response: We have added a sentence about the limitations due to the scanner and false positives/negatives.

Attachment

Submitted filename: 20220716_Tsuneki_Response to Reviewers.pdf

Decision Letter 1

Andrey Bychkov

16 Aug 2022

PONE-D-22-13432R1
Weakly supervised learning for multi-organ adenocarcinoma classification in whole slide images
PLOS ONE

Dear Dr. Tsuneki,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Sep 30 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Andrey Bychkov

Academic Editor

PLOS ONE

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Partly

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: No

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors addressed most of the points and discussed those they were not able to address.

The manuscript can be accepted in its current form.

Reviewer #2: The authors removed the TCGA breast data this round; however, some of the data remain in the manuscript. The authors should fix them all before submission.

As the reviewer commented before, colon cancer rarely metastasizes to lung lymph nodes. (Of course, metastasis to the lung itself is very common, in contrast.) The two examples the authors have shown also seem to be cases of metastasis to the lung (rather than lymph node metastasis). Please discuss this again with your consultant pathologists.

It is still unclear what the “non-neoplastic” slides from TCGA are. The authors need to open the lists of cases for both the adenocarcinoma and the non-neoplastic subgroups, ideally with the inference result for each case. Otherwise, there is no sense in using publicly available data. (The authors did not respond to the original comment asking them to open the manifest file from the GDC data portal used for this analysis.)

The results/figures the authors present are all based on an algorithm the authors developed themselves. (If there is nothing new in the methodology, then just use the existing code that is publicly available.) There is no doubt that the code is very important in this regard.

PLOS expects all researchers to share author-generated code for reproducibility and reuse. The reviewer hopes the authors will share their code.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

PLoS One. 2022 Nov 23;17(11):e0275378. doi: 10.1371/journal.pone.0275378.r004

Author response to Decision Letter 1


23 Aug 2022

6. Review Comments to the Author

Reviewer #1: The authors addressed most of the points and discussed those they were not able to address.

The manuscript can be accepted in its current form.

Response: Thank you.

Reviewer #2: The authors removed the TCGA breast data this round; however, some of the data remain in the manuscript. The authors should fix them all before submission.

Response: Thank you so much. We found the remaining breast TCGA data and removed it. The remaining mentions of breast refer to the needle biopsy set, which is not from TCGA.

As the reviewer commented before, colon cancer rarely metastasizes to lung lymph nodes. (Of course, metastasis to the lung itself is very common, in contrast.) The two examples the authors have shown also seem to be cases of metastasis to the lung (rather than lymph node metastasis). Please discuss this again with your consultant pathologists.

Response: We have discussed it again with the pathologists. Though it might be rare, those two examples do correspond to colon cancer metastasis to lung lymph nodes.

It is still unclear what the “non-neoplastic” slides from TCGA are. The authors need to open the lists of cases for both the adenocarcinoma and the non-neoplastic subgroups, ideally with the inference result for each case. Otherwise, there is no sense in using publicly available data. (The authors did not respond to the original comment asking them to open the manifest file from the GDC data portal used for this analysis.)

Response: We had downloaded the data a while back and imported a copy into our cloud-based viewer. Unfortunately, we did not keep a copy of the manifest file. In line with that, we have also removed the results from the lung TCGA dataset and amended Figure 2 to exclude the lung TCGA set.

The results/figures the authors present are all based on an algorithm the authors developed themselves. (If there is nothing new in the methodology, then just use the existing code that is publicly available.) There is no doubt that the code is very important in this regard.

PLOS expects all researchers to share author-generated code for reproducibility and reuse. The reviewer hopes the authors will share their code.

Response: We believe that we are still in line with the PLOS policy, as the code is not central to the manuscript: we are not proposing a new methodology, and publicly available code that implements a similar methodology already exists [1].

https://journals.plos.org/plosone/s/materials-software-and-code-sharing#loc-sharing-code

“In cases where code is central to the manuscript, we may require the code to be made available as a condition of publication. Authors are responsible for ensuring that the code is reusable and well documented”

[1] https://github.com/MSKCC-Computational-Pathology/MIL-nature-medicine-2019

Attachment

Submitted filename: 20220821_Tsuneki_Response to Reviewers.pdf

Decision Letter 2

Andrey Bychkov

12 Sep 2022

PONE-D-22-13432R2
Weakly supervised learning for multi-organ adenocarcinoma classification in whole slide images
PLOS ONE

Dear Dr. Tsuneki,

Thank you for submitting your manuscript to PLOS ONE. We invite you to submit a revised version of the manuscript that addresses the minor points raised during the review process.

Please submit your revised manuscript by Oct 27 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Andrey Bychkov

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: No

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: 

- Authors need to clearly state the major limitation in the Discussion that the models were not validated in independent cohort(s) from different institutions.

- Fix in the Abstract that the models were evaluated on five (not seven) independent test sets.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2022 Nov 23;17(11):e0275378. doi: 10.1371/journal.pone.0275378.r006

Author response to Decision Letter 2


14 Sep 2022

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Response: It is complete and correct.

Reviewers' comments:

6. Review Comments to the Author

Reviewer #2:

- Authors need to clearly state the major limitation in the Discussion that the models were not validated in independent cohort(s) from different institutions.

Response: We have added this as the last sentence in the discussion.

- Fix in the Abstract that the models were evaluated on five (not seven) independent test sets.

Response: Fixed. Thank you so much.

Attachment

Submitted filename: 20220914_Tsuneki_Response to Reviewers.pdf

Decision Letter 3

Andrey Bychkov

15 Sep 2022

Weakly supervised learning for multi-organ adenocarcinoma classification in whole slide images

PONE-D-22-13432R3

Dear Dr. Tsuneki,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Andrey Bychkov

Academic Editor

PLOS ONE

Acceptance letter

Andrey Bychkov

21 Sep 2022

PONE-D-22-13432R3

Weakly supervised learning for multi-organ adenocarcinoma classification in whole slide images

Dear Dr. Tsuneki:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Andrey Bychkov

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    Attachment

    Submitted filename: 20220716_Tsuneki_Response to Reviewers.pdf

    Attachment

    Submitted filename: 20220821_Tsuneki_Response to Reviewers.pdf

    Attachment

    Submitted filename: 20220914_Tsuneki_Response to Reviewers.pdf

    Data Availability Statement

    The datasets generated and/or analysed during the current study are not publicly available due to specific institutional requirements governing privacy protection, but they are available from the corresponding author on reasonable request. The datasets that support the findings of this study are available from International University of Health and Welfare, Mita Hospital (Tokyo, Japan) and Kamachi Group Hospitals (Fukuoka, Japan); restrictions apply to the availability of these data, which were used under a data use agreement made according to the Ethical Guidelines for Medical and Health Research Involving Human Subjects as set by the Japanese Ministry of Health, Labour and Welfare, and so they are not publicly available. The data contain potentially sensitive information. However, the data are available from the authors upon reasonable request for private viewing, with permission from the corresponding medical institutions, within the terms of the data use agreement, and if compliant with the ethical and legal requirements as stipulated by the Japanese Ministry of Health, Labour and Welfare. (1) Contact person: Professor Dr. Takayuki Shiomi, Department of Pathology, Faculty of Medicine, International University of Health and Welfare (Tokyo, Japan). Phone: +81-476-20-7701. E-mail: t_shiomi@iuhw.ac.jp. (2) Ethical board of Kamachi Group Hospitals (Wajiro, Shinkuki, Shinkomonji, and Shinmizumaki Hospitals). Contact person: Dr. Shigeo Nakano, Head of the Department of Surgical Pathology at Kamachi Group Hospitals (Fukuoka, Japan). Phone: +81-92-608-0001. E-mail: sdnakano@harajuku-reha.com


