PLoS One. 2024 Aug 29;19(8):e0304702. doi: 10.1371/journal.pone.0304702

PathEX: Make good choice for whole slide image extraction

Xinda Yang 1,*,#, Ranze Zhang 2,3,#, Yuan Yang 4,#, Yu Zhang 1, Kai Chen 2,3,5,*
Editor: Alberto Marchisio6
PMCID: PMC11361590  PMID: 39208135

Abstract

Background

The tile-based approach has been widely used for slide-level predictions in whole slide image (WSI) analysis. However, the irregular shapes and variable dimensions of tumor regions pose challenges for the process. To address this issue, we proposed PathEX, a framework that integrates intersection over tile (IoT) and background over tile (BoT) algorithms to extract tile images around boundaries of annotated regions while excluding the blank tile images within these regions.

Methods

We developed PathEX, which incorporated IoT and BoT into tile extraction, for training a classification model in CAM (239 WSIs) and PAIP (40 WSIs) datasets. By adjusting the IoT and BoT parameters, we generated eight training sets and corresponding models for each dataset. The performance of PathEX was assessed on the testing set comprising 13,076 tile images from 48 WSIs of CAM dataset and 6,391 tile images from 10 WSIs of PAIP dataset.

Results

PathEX extracted tile images around the boundaries of annotated regions differently as the IoT parameter was adjusted, while blank tile images within annotated regions were excluded by setting the BoT parameter. By varying IoT from 0.1 to 1.0 and 1 − BoT from 0.0 to 0.5, we obtained 8 training sets. Experimentation revealed that set C was the most promising candidate, although combinations of IoT values from 0.2 to 0.5 with 1 − BoT values from 0.2 to 0.5 also yielded favorable outcomes.

Conclusions

In this study, we proposed PathEX, a framework that integrates IoT and BoT algorithms for tile image extraction at the boundaries of annotated regions while excluding blank tiles within these regions. Researchers can conveniently set the thresholds for IoT and BoT to facilitate tile image extraction in their own studies. The insights gained from this research provide valuable guidance for tile image extraction in digital pathology applications.

Introduction

Whole slide imaging (WSI), also known as digital pathology, involves scanning pathological tissue sections and subsequently converting them into high-fidelity digital images [1]. The technology provides medical practitioners with a comprehensive, high-resolution approach to anatomical assessment. The resultant digital assets are amenable to storage, transmission, and computer-aided analysis, thereby enabling remote diagnostics, distance learning, and digital curation [2]. It has become indispensable in modern clinical practice and is increasingly regarded as a technological prerequisite within laboratory settings [3]. WSIs are characterized by their exceptionally large image size and high resolution, often reaching dimensions of up to 100,000×100,000 pixels at a magnification of 40X. As the adoption of whole slide imaging continues to expand, there is a growing demand for effective and efficient gigapixel image analysis.

Deep learning (DL) is at the forefront of computer vision, showcasing significant improvements over previous methodologies in visual understanding [4]. This technology has found application across various domains, including tissue segmentation, mutation prediction, and slide classification in WSI [5–9]. Artificial neural networks (ANNs) and convolutional neural networks (CNNs) are capable of learning in both supervised and unsupervised manners. Supervised learning relies on labeled data, such as histopathological diagnoses and manual annotations, to instruct the network on detecting and classifying features in unknown datasets. However, the size and variability of manual annotations present challenges in training deep neural networks.

A WSI can generally be as large as 100,000×100,000 pixels at 40X magnification, presenting challenges for training deep neural networks end-to-end. Consequently, contemporary methods often adopt a tile-based approach, where WSIs are partitioned into numerous small image tiles, from which models extract and aggregate features to make slide-level predictions [10–13]. Tile images are usually square regions with dimensions ranging from 32 × 32 to 10,000 × 10,000 pixels, with a majority of approaches employing tiles of around 256×256 pixels [4, 14–16].

Grid tiling, in which the WSI is partitioned into multiple non-overlapping tile images in a grid-like pattern starting from the top-left corner of the slide image, is the most common tiling strategy and is well understood. Tools such as Histolab [17], PyHIST [18], and SliDL [19] can perform this holistic WSI grid tiling strategy.

However, these tools present challenges for extracting annotated tile images, particularly in the case of tumor regions, whose irregular shapes and variable dimensions complicate the process. Researchers have attempted to address these difficulties with methods such as the eight-points method for extracting candidate tile images within the boundaries of annotated regions [20]. The limitations of this method become apparent when dealing with concave annotated regions, where it fails to ensure the full containment of candidate tile images, as shown in Fig 1a. This could be alleviated by extracting tile images at a smaller tile size, but a smaller tile size is not always a good option.

Fig 1. Three types of tile images we might overlook.


(a) The pink region is annotated and the blue square is a tile image; the eight yellow dots mark the eight points, showing that the eight-points method will not extract this tile image. (b) The green contour indicates the annotated region, and the blue squares are candidate tile images; the red squares and red arrow indicate tile images that are not candidates for extraction. (c) The green contour again indicates the annotated region, and the blue squares indicate candidate tile images; however, some candidate tile images contain large blank areas or fat, shown in the shaded region.

Although the eight-points method can mostly extract tile images inside the boundaries of annotated regions, how to obtain tile images in the case shown in Fig 1a and how to obtain tiles around the boundaries shown in Fig 1b remain unsettled. Moreover, within the annotated regions there are still blank background areas or areas with abundant fat or other tissue, shown in Fig 1c. If these three types of noise tile images are not handled carefully, irrelevant background information and noise tissue may be included in the DL training dataset, affecting the performance of the neural network [21].

To solve the above problems, we first introduced the concept of intersection over tile (IoT) to handle the extraction of tile images around the boundaries of annotated regions, particularly when a region is concave. Additionally, we proposed the concept of background over tile (BoT) to mitigate issues with tile images containing substantial blank areas. We then incorporated both into an algorithm for tile image extraction. In the end, we extracted tile images into datasets to evaluate the performance of a DL model and determined the most effective combination of IoT and BoT.

Related work

Several tools have been developed that provide a variety of functions. While a complete description and comparison of these tools is beyond the scope of this manuscript, we briefly describe them here, noting whether they can extract tile images using manual annotation files or perform overall WSI extraction.

Histolab [17] is a Python library that offers a range of functionalities aimed at facilitating tissue analysis. These include classical image analysis techniques for tissue detection, artifact removal, and slide tiling. The library also supports the extraction of tile images from WSIs and includes its own tiling strategies, such as grid tiling, score tiling, and random tiling.

PyHIST [18] is a semi-automatic Python command-line tool for WSI segmentation and preprocessing that produces a mask for the input WSI and generates tile images from the mask. However, as with Histolab, the tiling method requires a mask function to localize the tiles, which entails computationally expensive intersection calculations between the tiles and the mask.

MONAI [22] is an extensive Python library, part of the PyTorch ecosystem, designed as a unified framework for performing deep learning on medical imaging data. MONAI supports tiling of WSIs and provides extensive support for model evaluation metrics. However, it does not provide tiling of WSIs using manual annotations.

SliDL [19] is a Python library with a straightforward API for artefact and background detection, tile extraction from annotations, as well as model training and inference on WSIs. SliDL enables the extraction of tiles from annotations, which is particularly valuable for deep learning applications; however, it extracts all tile images within the annotations, even if a tile image is background.

In contrast to the aforementioned tools, the proposed approach abstracts these ideas into algorithms applicable to a broader range of applications: tiling for holistic WSIs and for manual annotation files. With the proposed approach, we can control how much a tile must intersect the boundaries of annotated regions to be included, and exclude tile images based on the amount of blank area they contain.

Proposed method

Intersection over tile image

We analyzed the tile images that are not extracted around the boundaries of annotated regions. These tile images are not 100% within the annotated regions: they lie partially inside and partially outside the region, which suggests they could be noise tile images. The eight-points method likewise treated such tile images as noise to be excluded. However, these noise tile images are still partially within the annotated region and contain some positive tissue, so they might be useful for training a DL model. Intuitively, we can use the proportion of the intersection area between the tile image and the annotated region to decide whether to extract tile images around the boundaries. Intersection over union (IoU) [23] is commonly used to measure the accuracy of object detection and segmentation. Inspired by IoU, we proposed intersection over tile (IoT), the intersection of the tile image and the annotated region over the area of the tile image:

IoT = area(A ∩ T) / area(T)

where A, T ⊆ S ⊆ ℝⁿ, A represents an arbitrary annotated region, and T a tile image. We first calculate the area of T. We then test whether the two polygons T and A intersect; if they do, we calculate the area of the intersection. Finally, we compute the ratio of the intersection area to the tile area. The calculation of IoT is summarized in Algorithm 1.

Algorithm 1: Intersection over Tile

Input: Two arbitrary polygons: A,TSRn, A for annotation, T for tile

Output: IoT

1: S_T ← area(T)

2: state ← intersects(T, A)

3: if state then

4:  S_intersection ← area(T ∩ A)

5: else

6:  S_intersection ← 0

7: end if

8: IoT ← S_intersection / S_T
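A minimal sketch of Algorithm 1 with Shapely (the geometry library the implementation relies on); the polygon coordinates below are illustrative only, not taken from the paper:

```python
# Sketch of Algorithm 1 (IoT) with Shapely; coordinates are illustrative.
from shapely.geometry import Polygon, box

def calculate_iot(annotation: Polygon, tile: Polygon) -> float:
    """Intersection over tile: area(A ∩ T) / area(T)."""
    if not tile.intersects(annotation):
        return 0.0
    return tile.intersection(annotation).area / tile.area

# A 512 x 512 tile straddling the boundary of a rectangular annotation.
annotation = Polygon([(0, 0), (1024, 0), (1024, 768), (0, 768)])
tile = box(768, 512, 1280, 1024)        # (minx, miny, maxx, maxy)
print(calculate_iot(annotation, tile))  # 0.25: a quarter of the tile overlaps
```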

IoT as a metric has the following properties:

  1. IoT is invariant to the scale of tile image.

  2. IoT is non-negative, ranging from 0 to 1.0.
    • (a) The value 1 occurs only when the tile image lies entirely within the annotated region, i.e. it is inside or touches the annotated region from the inside: T ∩ A = T ⇒ T ⊆ A.
    • (b) The value 0 occurs only when the tile image does not intersect the annotated region or touches it from the outside: T ∩ A = ⌀ ⇒ T ⊄ A.
    • (c) The value lies in (0, 1.0) when the tile image partially intersects the annotated region: T ∩ A = S and S ≠ ⌀.

Background area over tile image

For some tile images, although they lie within the annotated regions, a substantial amount of blank space or fat could render them unsuitable as positive tile images. These areas, which we term the background, may contain very little tissue or no cells at all. To address this issue, we introduced background over tile (BoT), analogous to IoT. BoT is the proportion of blank or background area over the entire tile image, which determines whether the tile image is eligible for extraction. Given a tile image T ⊆ ℝⁿ, normally an RGB image, we first apply a Gaussian blur with a 5 × 5 kernel (or a similar blurring algorithm). Next, we convert the blurred image from RGB color space to HSV color space and obtain a background mask using an HSV threshold. The background mask may have minor defects, so we refine it by applying a dilation operation with a 3 × 3 structuring element. On the refined background mask, we compute the mask area as the contour area of the mask. Finally, we obtain BoT by dividing the background area by the area of the tile image, which is the product of its height and width. The calculation of BoT is summarized in Algorithm 2.

BoT = area(B) / area(T)

where B, T ⊆ S ⊆ ℝⁿ, B is the background of the tile image, and T is the tile image. We do not use BoT directly; instead we use 1 − BoT, i.e. the tissue over tile image (ToT).

BoT as a metric has the following properties:

  1. BoT is invariant to the scale of tile image.

  2. Similar to IoT, BoT is non-negative, ranging from 0 to 1.0.
    • (a) The value 1 occurs only when the tile image is entirely background, i.e. contains no tissue at all.
    • (b) The value 0 occurs only when the tile image is entirely covered with tissue.
    • (c) The value lies in (0, 1) when the tile image contains both background and tissue, which is the most common case.

Algorithm 2: Background over Tile

Input: A tile: TSRn

Output: BoT

1: S_T ← area(T)

2: Gaussian blur, 5 × 5 kernel size on T

3: Convert RGB space to HSV space

4: Get background mask from HSV space

5: Dilate with 3 × 3 structuring element on mask

6: S_bg ← area(background mask)

7: BoT ← S_bg / S_T
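A minimal sketch of Algorithm 2 with OpenCV. The HSV bounds below are an assumption (low saturation approximating white background), since the paper does not publish its exact threshold values:

```python
# Sketch of Algorithm 2 (BoT) with OpenCV; HSV bounds are assumed values.
import cv2
import numpy as np

def calculate_bot(tile_rgb: np.ndarray) -> float:
    """Background over tile: area(B) / area(T) for an RGB tile image."""
    h, w = tile_rgb.shape[:2]
    blurred = cv2.GaussianBlur(tile_rgb, (5, 5), 0)         # step 2
    hsv = cv2.cvtColor(blurred, cv2.COLOR_RGB2HSV)          # step 3
    mask = cv2.inRange(hsv, (0, 0, 0), (180, 25, 255))      # step 4, assumed bounds
    mask = cv2.dilate(mask, np.ones((3, 3), np.uint8))      # step 5
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    s_bg = sum(cv2.contourArea(c) for c in contours)        # step 6
    return min(s_bg / (h * w), 1.0)                         # step 7
```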

PathEX: WSI tiling algorithm

According to the properties of both IoT and BoT, a floating-point threshold for each metric is required to determine tile image extraction. We incorporated IoT and BoT, together with some other operations, into an algorithm that extracts tile images from a holistic WSI or from manual annotation files (ANO). Annotated region extraction requires an annotation file (ANO) containing the coordinates of the annotated regions, while holistic WSI extraction does not.

For the annotated region extraction

Holistic WSI tile image extraction uses BoT only, while annotated region extraction requires both IoT and BoT. We specify annotated region extraction as follows. First, we parse the ANO to obtain the coordinates. Next, we partition the WSI into the intended tile size and obtain the locations of the tile images. We then iterate over all locations and calculate IoT and BoT; if both metrics meet their respective thresholds, the tile image is extracted. The WSI tiling algorithm (PathEX) is summarized in Algorithm 3.

Algorithm 3: Tiles extraction

Input: ANO, WSI, IoT_thresh and BoT_thresh ∈ [0, 1]; W for the WSI, A for the annotated coordinates, A, W ⊆ S ⊆ ℝⁿ

Output: Tiles

1: parse ANO get A

2: locations_tiles ← partition(W)

3: for loc ∈ locations_tiles do

4:  IoT = calculate_iot(A, loc)

5:  if IoT ≥ IoT_thresh then

6:   T = get_tile(loc)

7:   BoT = calculate_bot(T)

8:   if 1 − BoT ≥ BoT_thresh then

9:    Save tile

10:   end if

11:  end if

12: end for

As mentioned above, we incorporated IoT and BoT into the algorithm so that we could extract tile images around the boundaries of annotated regions and exclude background tile images within them. One might expect that excluding both the tile images around the boundaries and the background tile images would yield a cleaner dataset for training a DL model. In this work, however, we found some interesting results, described in the Results section.
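Putting the two metrics together, the extraction loop of Algorithm 3 can be sketched as follows. Here `calculate_iot` and `calculate_bot` are the functions sketched above, the annotation is assumed to be already parsed into a Shapely polygon, and reading is simplified to level 0 of the slide; the actual library additionally handles annotation formats and multi-resolution reads:

```python
# Sketch of Algorithm 3 (PathEX extraction loop); annotation parsing and
# multi-resolution handling are simplified away.
import numpy as np
import openslide
from shapely.geometry import Polygon, box

def extract_tiles(wsi_path: str, annotation: Polygon, tile_size: int = 512,
                  iot_thresh: float = 0.2, bot_thresh: float = 0.2):
    slide = openslide.OpenSlide(wsi_path)
    width, height = slide.dimensions
    for y in range(0, height - tile_size + 1, tile_size):      # partition(W)
        for x in range(0, width - tile_size + 1, tile_size):
            loc = box(x, y, x + tile_size, y + tile_size)
            if calculate_iot(annotation, loc) < iot_thresh:
                continue                                       # too little overlap
            region = slide.read_region((x, y), 0, (tile_size, tile_size))
            tile_rgb = np.array(region.convert("RGB"))
            if 1.0 - calculate_bot(tile_rgb) >= bot_thresh:    # enough tissue
                yield (x, y), tile_rgb                         # "save tile"
```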

Experiments

Implementation

We implemented the algorithm in Python 3.9, using OpenSlide [24] and tifffile [25] for low-level WSI operations, NumPy [26] for fast numerical computations, Shapely [27] for geometry processing, and OpenCV [28] and Pillow [29] for image processing algorithms.

Dataset

CAMELYON16

CAMELYON16 (CAM) [30] was a challenge aimed at the automated detection of metastases in hematoxylin and eosin (H&E) stained WSIs of lymph node sections. The challenge data contain a total of 400 WSIs of sentinel lymph nodes from two independent datasets collected at Radboud University Medical Center (Nijmegen, the Netherlands) and the University Medical Center Utrecht (Utrecht, the Netherlands). The training set comprises 111 slides containing metastases and 159 normal slides, while the testing set includes 129 slides with and without metastases. The slides are all at 40x magnification, with handcrafted annotations around the positive regions. For our experiment, we used 128 of the 159 negative slides and the 111 positive slides for the training set; 12 negative and 36 positive slides out of the 129 were used for the testing set.

PAIP

The PAIP 2021 (PAIP) challenge involved multiple organ cancers, including colon, prostate, and pancreatobiliary tract, with 150 training slides, 30 validation slides, and 60 testing slides, all at 20x magnification. The challenge aimed to promote the development of a common algorithm for the automatic detection of perineural invasion in resected specimens of multi-organ cancers. The manual annotation regions are mostly rectangles within the designated tissue type and differ from those of the CAM dataset. We used the colon slides for our experiment. There are 50 slides in total with colon cancer annotations: 40 for the training set and 10 for the testing set. Each slide has four rectangular annotation layers (nerve without tumor, perineural invasion junction, tumor without nerve, non-tumor without nerve). We extracted tumor without nerve as the positive class and non-tumor without nerve as the negative class, together forming a two-class dataset.

Training set

We incorporated IoT and BoT into tile extraction, and the extracted tiles were used to train a classification model. We extracted tile images at 512 × 512 pixels for both the CAM and PAIP datasets, instead of the more common 256 × 256 pixels. For the CAM negative slides, we randomly extracted 10% of the tile images of each slide. For the positive slides, we set different IoT and BoT thresholds for tile image extraction. The IoT threshold was varied at 0.1, 0.2, 0.5, and 1.0, corresponding to inclusion in the positive data of tiles with at least 10%, 20%, 50%, and 100% intersection with annotated regions. Similarly, the 1 − BoT threshold was set at 0.0, 0.2, and 0.5, denoting inclusion of tile images with at least 0%, 20%, and 50% tissue content, respectively. Owing to slide preparation, areas without tissue or with fat commonly occur across the holistic WSI. We did not set 1 − BoT to 1.0, since requiring 100% tissue content would greatly reduce the number of tile images; we therefore capped 1 − BoT at 0.5. Tile images were also extracted from all PAIP training slides with these combinations. With this setup, we obtained the 8 combinations in Table 1, resulting in a total of 8 training sets and 8 models for each dataset.

Table 1. Setup of IoT and 1 − BoT for tile image extraction of the training and testing sets, with the number of tile images for the CAM and PAIP datasets.

          Set   IoT   1 − BoT   CAM negative   CAM positive   PAIP negative   PAIP positive
training  A     0.1   0.0       114,668        78,890         16,779          22,734
          B     0.2   0.0       114,668        75,858         15,969          21,206
          C     0.2   0.2       114,668        75,022         10,841          21,101
          D     0.2   0.5       114,668        72,268         8,525           20,166
          E     0.5   0.2       114,668        68,663         9,511           16,943
          F     0.5   0.5       114,668        66,760         7,515           16,279
          G     1.0   0.2       114,668        57,292         6,940           10,943
          H     1.0   0.5       114,668        55,942         5,465           10,613
testing         0.3   0.3       8,453          4,623          1,816           4,575

Hold-out testing set

In practice, we need to classify all tile images of a holistic WSI, and in our experiment we did not want an arbitrary threshold for the testing slides, so we specified an IoT threshold of 0.3 and a 1 − BoT threshold of 0.3. Both thresholds differ slightly from those of the training sets so as to evaluate the model's performance fairly. With these settings, we extracted 4,623 tile images from the 36 positive slides of the CAM testing set and 8,453 tile images from its 12 negative slides, employing the same tiling strategy as for the training sets. From the 10 slides of the PAIP dataset, we obtained 1,816 negative and 4,575 positive tile images. These test sets were held out to evaluate model performance; details are summarized in Table 1.

Training protocol

During the model training phase, the ResNet-50 [31] architecture was utilized as the backbone, and optimization was performed using the Adam optimizer [32] with a weight decay of 0.00002 and betas set to (0.9, 0.99). Training was carried out on a computing infrastructure of 8 GeForce RTX 3090 GPUs with a batch size of 80 per GPU. The initial learning rate was set to 0.0001, and a linear warm-up cosine decay schedule with 5 warm-up epochs was implemented. Mixup [33] with alpha 0.15 was applied during batch processing and discontinued after 60 epochs, with cross entropy serving as the loss function [34]. The total number of training epochs was set to 100. The training data were partitioned such that 80% of each class was used for training and 20% of each class was allocated for validation.
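As a hedged sketch, the configuration above might be set up in PyTorch as follows; the warm-up/decay schedule is an approximation of the stated "linear warm-up cosine decay", and the mixup step is omitted for brevity:

```python
# Sketch of the training configuration (PyTorch / torchvision assumed).
import math
import torch
from torchvision.models import resnet50

model = resnet50(num_classes=2)                    # ResNet-50 backbone, 2 classes
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4,
                             betas=(0.9, 0.99), weight_decay=2e-5)

warmup_epochs, total_epochs = 5, 100
def lr_lambda(epoch: int) -> float:
    if epoch < warmup_epochs:                      # linear warm-up
        return (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * (1.0 + math.cos(math.pi * progress))  # cosine decay

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
criterion = torch.nn.CrossEntropyLoss()
```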

Augmentation

We used a stochastic data augmentation module that randomly transforms any given data example, denoted x. The module includes normal augmentation and strong augmentation.

Normal augmentations include the following: (1) Random crop and resize back to the original size. (2) Random choice between vertical and horizontal flips. (3) Random choice between Gaussian and ISO noise injection. (4) Random gamma transformation. Normal augmentation is applied at every training iteration.

Strong augmentations include the following: (1) Random choice of one of color jitter, random brightness with contrast, and conversion to gray. (2) Random choice of one of Gaussian blur, defocus, and motion blur. (3) Affine transform. (4) Salt-and-pepper noise injection. (5) Coarse dropout. (6) Random grid shuffle. (7) Elastic transform. (8) Sharpen. (9) Image compression. From the transforms above, we randomly sample two strong augmentations at each training iteration to avoid overly intensive augmentation.

All augmentations were implemented with the albumentations library [35]. We trained on all datasets generated with Algorithm 3 (PathEX) following the aforementioned protocol.
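A hedged sketch of this pipeline with albumentations (API as of the 1.x releases); the probabilities and per-transform parameters are assumptions, only the choice of transforms follows the text, and salt-and-pepper noise is omitted since the 1.x releases did not ship a direct transform for it:

```python
# Sketch of the augmentation pipeline; parameter values are assumed.
import albumentations as A

# Normal augmentations, applied at every training iteration.
normal_aug = A.Compose([
    A.RandomResizedCrop(height=512, width=512),
    A.OneOf([A.HorizontalFlip(p=1.0), A.VerticalFlip(p=1.0)], p=0.5),
    A.OneOf([A.GaussNoise(p=1.0), A.ISONoise(p=1.0)], p=0.5),
    A.RandomGamma(p=0.5),
])

# Strong augmentations: two are sampled per iteration via A.SomeOf.
strong_aug = A.SomeOf([
    A.OneOf([A.ColorJitter(), A.RandomBrightnessContrast(), A.ToGray()], p=1.0),
    A.OneOf([A.GaussianBlur(), A.Defocus(), A.MotionBlur()], p=1.0),
    A.Affine(),
    A.CoarseDropout(),
    A.RandomGridShuffle(),
    A.ElasticTransform(),
    A.Sharpen(),
    A.ImageCompression(),
], n=2)

# Usage on a tile image (H x W x 3 uint8 array named tile_rgb):
# augmented = strong_aug(image=normal_aug(image=tile_rgb)["image"])["image"]
```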

Result

Time required for tiling

First, we report the processing time for tile image extraction by PathEX. IoT was set to 0.1 and 1 − BoT to 0.0, the same as set A (see the Training set section). The total time to process the 111 positive slides was 1,729 seconds, approximately 28 minutes. The variability in processing time stems from the diverse dimensions of the manually annotated regions, with a minimum of 3.24 seconds per slide, a maximum of 95.19 seconds, and a mean of 15.58 seconds per WSI. In total, 78,890 tile images were extracted. Processing times are summarized in Table 2.

Table 2. Processing time for the 111 CAM positive slides and the 50 PAIP slides, in seconds.

Q1 is 25th percentile, Median is 50th percentile, Q3 is 75th percentile.

      min    max    mean ± std      variance   Q1     Median   Q3     Total
CAM   3.24   95.19  15.58 ± 14.16   200.61     10.47  11.41    13.6   1729.48
PAIP  2.46   15.99  6.78 ± 2.75     7.57       4.99   6.32     7.6    339.16

For the 50 PAIP slides, the tumor-without-nerve layer was extracted with the same configuration as the CAM dataset. The total processing time was 339 seconds, less than 6 minutes, with an average of 6.78 seconds per slide. Processing was faster than for the CAM dataset, mainly because the PAIP slides are at 20x magnification and the annotated regions are smaller than those of the CAM dataset. In total, 22,734 positive tile images were extracted. Details are summarized in Table 2.

Annotated region extraction

PathEX can extract tile images around the boundaries of annotated regions when IoT is set appropriately. As shown in Fig 2, by adjusting IoT we can extract tile images around the boundaries differently: setting IoT from lower to higher extracts fewer tile images. For instance, to extract only tile images 100% within annotated areas, we can set IoT to 1.0 (Fig 2a). To extract nearly all tile images around the annotated boundaries, we can lower IoT to 0.1 (Fig 2b). To extract more tile images but with fewer noise tiles, we can set IoT somewhat higher, e.g. 0.2 or 0.5 (Fig 2c and 2d).

Fig 2. Visualization of boundary extraction.


Blue squares are the tile images to be extracted. (a) IoT set to 1.0: tile images are 100% within annotated regions, but some annotated area cannot be extracted. (b) IoT set to 0.1: tiles with more than 10% intersecting area are included. (c) IoT set to 0.2, with very little difference from (b). (d) IoT set to 0.5, including tiles with more than 50% intersecting area; the red square marks a tile not included compared with (b).

Furthermore, blank tile images within annotated regions can be excluded by setting the BoT parameter, effectively eliminating irrelevant or noise-containing tile images, as shown in Fig 3. With BoT set to 0.1, tile images with 90% blank area are excluded (Fig 3b), so only a few tile images are dropped. As BoT is set higher, more tile images are excluded, as shown in Fig 3c and 3d with BoT set to 0.2 and 0.5, respectively.

Fig 3. Visualization of the exclusion of blank tile images within an annotated region.


Blue square tile images are extracted. (a) In the red-shaded area, blank tile images are also included; (b), (c), and (d) zoom in on the red-shaded area. (b) BoT set to 0.1: tile images with 90% blank area are excluded. (c) BoT set to 0.2: tile images with 80% blank area are excluded, so more tile images are excluded than in (a) and (b). (d) BoT set to 0.5: many more tile images in the red-shaded area are excluded.

The combined utilization of IoT and BoT in PathEX provides a flexible approach to extract tile images around boundaries and exclude blank tile images, demonstrating its adaptability across diverse applications.

Results for training model

As shown in Table 1, with IoT set from 0.1 to 1.0 and 1 − BoT from 0.0 to 0.5, we had 8 combinations of the two parameters, giving 8 training sets, A through H. To find the best combination of IoT and BoT, we evaluated the performance of the model trained on each training set on the hold-out test set. We mainly considered accuracy, but also precision, recall, F-score (F1 score), and specificity. For each training set, we took the model with the best validation accuracy and ran inference on the hold-out test set to obtain the evaluation metrics. Our findings are as follows:

Higher IoT and 1 − BoT, fewer noise tile images

The number of positive tile images decreased, as shown in Fig 4a, as IoT and 1 − BoT were set from lower to higher, revealing a reduction in the number of noisy tile images containing a mix of positive and other tissue. Set A yielded the most positive tile images, while set H yielded the fewest; comparing set H with set A showed a 30% and 47% reduction in tile images for CAM and PAIP, respectively. Additionally, the transition from set A to set H showed an overall improvement in all metrics alongside the decrease in noise tile images, indicating reduced uncertainty in the training and validation sets, as shown in Fig 4b. The Pearson correlation coefficients in Table 3 confirmed a strong negative correlation between the number of positive tile images and the validation-set metrics on the CAM dataset, with accuracy (r = −0.8918), precision (r = −0.8503), recall (r = −0.7565), and F-score (r = −0.8476), all with p-values less than 0.05. The Pearson correlation coefficients also revealed a strong correlation on the PAIP test set, shown in Table 4.
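For reference, a correlation of this kind can be computed with SciPy; the positive tile counts below come from Table 1, while the accuracy values are placeholders rather than the paper's measured results:

```python
# Sketch of the correlation analysis; accuracies are placeholders.
from scipy.stats import pearsonr

n_positive = [78890, 75858, 75022, 72268, 68663, 66760, 57292, 55942]    # sets A-H, CAM
val_accuracy = [0.950, 0.952, 0.955, 0.957, 0.959, 0.961, 0.964, 0.966]  # placeholder

r, p = pearsonr(n_positive, val_accuracy)
print(f"r = {r:.4f}, p = {p:.4f}")  # a strong negative r is expected here
```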

Fig 4. Metrics for CAM dataset.


(a) The number of positive tile images decreases from set A to set H. (b) Model performance on the validation set. (c) Model performance on the hold-out testing set. (d) The relative difference in performance between the validation set and the test set (z-score).

Table 3. Pearson correlation coefficients for both the validation and testing sets of the CAM dataset.

                  number of positive
                  coefficient    p-value
valid accuracy    -0.8918**      0.0029
valid precision   -0.8503**      0.0075
valid recall      -0.7565*       0.0298
valid fscore      -0.8476**      0.0079
test accuracy      0.2422        0.5633
test precision    -0.9367**      0.0006
test recall        0.7543*       0.0305
test fscore        0.3108        0.4536

*p < 0.05

**p < 0.01

Table 4. Pearson correlation coefficients for both the validation and testing sets of the PAIP dataset.

                  number of positive
                  coefficient    p-value
valid accuracy    -0.5518        0.1561
valid precision   -0.3026        0.4664
valid recall      -0.6364        0.0898
valid fscore      -0.6742        0.0667
test accuracy      0.9433**      0.0004
test precision    -0.8509**      0.0074
test recall        0.9714**      5.702e-05
test fscore        0.9474**      0.0004

*p < 0.05

**p < 0.01

Higher IoT and 1 − BoT do not yield better performance on the test set

As mentioned above, with higher IoT and 1 − BoT we obtain a dataset with less uncertainty, on which the trained model may perform well on the validation set. However, the model may not perform well on the hold-out test set, which clearly indicates overfitting. We visualized the difference in accuracy between the validation set and the hold-out test set (z-score) in Fig 4d for the CAM dataset and Fig 5d for the PAIP dataset. For instance, the greatest accuracy difference was 7.05% at set G, whereas the smallest was 4.96% at set C (IoT 0.2, 1 − BoT 0.2) for the CAM dataset. The model trained on set C showed the least overfitting on the CAM test set and achieved good test results, though not the best: set F was slightly better and was the best for the CAM dataset. The model trained on set C of the PAIP dataset showed a similar result: its accuracy difference was about 2.098%, the second smallest of the eight sets, and its accuracy was also second best, both only slightly different from the best. From the results on these two datasets, we conclude that set C is probably the best candidate for tile image extraction, although combinations of IoT from 0.2 to 0.5 and 1 − BoT from 0.2 to 0.5 would also be worth trying.

Fig 5. Metrics for PAIP dataset.


(a) The number of positive tile images decreases from set A to set H. (b) Model performance on the validation set. (c) Model performance on the hold-out testing set. (d) The relative difference in performance between the validation set and the test set (z-score).

Reducing 100% blank tile images improves performance

The difference between set B and set C is the 1 − BoT threshold. For set B, 1 − BoT was 0, meaning set B may contain some 100% blank tile images, while for set C it was 0.2, so set C contains no 100% blank tile images. As shown in Figs 4d and 5d, taking accuracy as a proxy metric, there was some improvement from set B to set C on both datasets, indicating that removing 100% blank tile images benefits model performance. We should avoid including 100% blank tile images in the positive set, as such tile images are definitely noise.

Some noise tile images in the dataset improve robustness

As mentioned in the Introduction, a noise tile image is one that is partially positive and partially negative. In both the CAM and PAIP datasets, from set A to set H, fewer noise tile images reduce the total number of tile images; in the PAIP dataset in particular, both negative and positive tile images decrease. However, with fewer tile images, the trained model is more likely to overfit the data, whereas with some noise tile images the dataset is larger. For instance, the model trained on set C showed less overfitting, indicating a more robust model. We could therefore consider extracting some noise tile images in our research, especially when the dataset is small.

Discussion

In our study, we proposed PathEX, a novel approach that combines the IoT and BoT algorithms. The primary objective of PathEX is to efficiently extract tile images at the boundaries of annotated regions while excluding blank tile images within these regions. Our Python library implementation caters to grid tiling and manually annotated files. Our study involved processing 111 digital pathology images from the CAM dataset, with an average processing time of 15.58 seconds per slide. This efficiency allows 1,000 slides to be processed in less than 5 hours, significantly accelerating the execution of downstream tasks.

In our study, we utilized the PathEX algorithm to extract tile images from two distinct data types, creating eight training datasets for each type. Our experiments revealed that different IoT and BoT values have a significant impact on the downstream classification task. We found that set C (with IoT and 1 − BoT both set at 0.2) may be the optimal combination for tile image extraction from annotated files. However, we also suggest that combinations of IoT from 0.2 to 0.5 and 1 − BoT from 0.2 to 0.5 could produce satisfactory results. These adjustable thresholds allow researchers to tailor the tile image extraction process to their specific studies.

A comprehensive comparison of the PathEX algorithm with other extraction tools was not a primary objective of this study. The main emphasis was on examining the influence of different extraction settings on downstream task performance and on the performance of models trained on datasets produced with PathEX. The investigation focused predominantly on performance; other aspects, such as potential model biases, were outside the scope of this study.

Looking to the future, we intend to broaden the utilization of the PathEX algorithm across a more diverse array of downstream tasks. A significant objective is the development of a feature enabling the concurrent extraction of mask and tile images, a capability that could be particularly beneficial for image segmentation tasks. This enhancement would allow researchers to employ both mask and tile images for their specific downstream tasks, extending the potential applications of our algorithm. We anticipate that these improvements will further underscore the adaptability and efficiency of the PathEX algorithm.

Supporting information

S1 Table. Table of comparisons to related methods.

(PDF)


Acknowledgments

We appreciate the assistance from the Artificial Intelligence Lab and the Big Data Center of Sun Yat-sen Memorial Hospital, Sun Yat-sen University.

De-identified pathology images and annotations of the PAIP dataset used in this research were prepared and provided by Seoul National University Hospital under a grant from the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI).

Data Availability

The code and associated documentation are uploaded to the GitHub repository. The repository URL is https://github.com/jasnei/PathEX.

Funding Statement

This work was supported by grants from the Natural Science Foundation of China (#82271650), the Guangdong Science and Technology Department (2020B1212060018), and the Guangzhou Science Technology and Innovation Commission (#202102010221, #20212200003).

References

  • 1. Farahani N, Parwani AV, Pantanowitz L. Whole Slide Imaging in Pathology: Advantages, Limitations, and Emerging Perspectives. Pathology and Laboratory Medicine International. 2015;7:23–33. doi: 10.2147/PLMI.S59826 [DOI] [Google Scholar]
  • 2. Mercan E, Aksoy S, Shapiro LG, Weaver DL, Brunyé TT, Elmore JG. Localization of Diagnostically Relevant Regions of Interest in Whole Slide Images: A Comparative Study. Journal of Digital Imaging. 2016;29(4):496–506. doi: 10.1007/s10278-016-9873-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3. Niazi MKK, Parwani AV, Gurcan MN. Digital Pathology and Artificial Intelligence. The Lancet Oncology. 2019;20(5):e253–e261. doi: 10.1016/S1470-2045(19)30154-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4. Dimitriou N, Arandjelović O, Caie PD. Deep Learning for Whole Slide Image Analysis: An Overview. Frontiers in Medicine. 2019;6. doi: 10.3389/fmed.2019.00264 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Hou L, Samaras D, Kurc TM, Gao Y, Davis JE, Saltz JH. Patch-based convolutional neural network for whole slide tissue image classification. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2016. p. 2424–2433. [DOI] [PMC free article] [PubMed]
  • 6. Zheng Y, Gindra RH, Green EJ, Burks EJ, Betke M, Beane JE, et al. A Graph-Transformer for Whole Slide Image Classification. IEEE Transactions on Medical Imaging. 2022;41(11):3003–3015. doi: 10.1109/TMI.2022.3176598 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7. Wang X, Chen H, Gan C, Lin H, Dou Q, Tsougenis E, et al. Weakly supervised deep learning for whole slide lung cancer image analysis. IEEE transactions on cybernetics. 2019;50(9):3950–3962. doi: 10.1109/TCYB.2019.2935141 [DOI] [PubMed] [Google Scholar]
  • 8. Wang S, Yang DM, Rong R, Zhan X, Xiao G. Pathology image analysis using segmentation deep learning algorithms. The American journal of pathology. 2019;189(9):1686–1698. doi: 10.1016/j.ajpath.2019.05.007 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9. Coudray N, Ocampo PS, Sakellaropoulos T, Narula N, Snuderl M, Fenyö D, et al. Classification and mutation prediction from non–small cell lung cancer histopathology images using deep learning. Nature medicine. 2018;24(10):1559–1567. doi: 10.1038/s41591-018-0177-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10. Zhang J, Zhang X, Ma K, Gupta R, Saltz J, Vakalopoulou M, et al. Gigapixel Whole-Slide Images Classification Using Locally Supervised Learning. In: Wang L, Dou Q, Fletcher PT, Speidel S, Li S, editors. Medical Image Computing and Computer Assisted Intervention—MICCAI 2022. vol. 13432. Cham: Springer Nature Switzerland; 2022. p. 192–201. [Google Scholar]
  • 11. Tellez D, Litjens G, Van Der Laak J, Ciompi F. Neural Image Compression for Gigapixel Histopathology Image Analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2021;43(2):567–578. doi: 10.1109/TPAMI.2019.2936841 [DOI] [PubMed] [Google Scholar]
  • 12. Shao Z, Bian H, Chen Y, Wang Y, Zhang J, Ji X, et al. TransMIL: Transformer Based Correlated Multiple Instance Learning for Whole Slide Image Classification. [Google Scholar]
  • 13.Li B, Li Y, Eliceiri KW. Dual-Stream Multiple Instance Learning Network for Whole Slide Image Classification with Self-supervised Contrastive Learning. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville, TN, USA: IEEE; 2021. p. 14313–14323. [DOI] [PMC free article] [PubMed]
  • 14. Janowczyk A, Madabhushi A. Deep Learning for Digital Pathology Image Analysis: A Comprehensive Tutorial with Selected Use Cases. Journal of Pathology Informatics. 2016;7(1):29. doi: 10.4103/2153-3539.186902 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15. Aresta G, Araújo T, Kwok S, Chennamsetty SS, Safwan M, Alex V, et al. BACH: Grand Challenge on Breast Cancer Histology Images. Medical Image Analysis. 2019;56:122–139. doi: 10.1016/j.media.2019.05.010 [DOI] [PubMed] [Google Scholar]
  • 16. Chang HY, Jung CK, Woo JI, Lee S, Cho J, Kim SW, et al. Artificial Intelligence in Pathology. Journal of Pathology and Translational Medicine. 2019;53(1):1–12. doi: 10.4132/jptm.2018.12.16 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17. Marcolini A, Bussola N, Arbitrio E, Amgad M, Jurman G, Furlanello C. Histolab: A Python Library for Reproducible Digital Pathology Preprocessing with Automated Testing. SoftwareX. 2022;20:101237. doi: 10.1016/j.softx.2022.101237 [DOI] [Google Scholar]
  • 18. Muñoz-Aguirre M, Ntasis VF, Rojas S, Guigó R. PyHIST: A Histological Image Segmentation Tool. PLoS computational biology. 2020;16(10):e1008349. doi: 10.1371/journal.pcbi.1008349 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19. Berman AG, Orchard WR, Gehrung M, Markowetz F. SliDL: A Toolbox for Processing Whole-Slide Images in Deep Learning. PloS One. 2023;18(8):e0289499. doi: 10.1371/journal.pone.0289499 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20. Skrede OJ, De Raedt S, Kleppe A, Hveem TS, Liestøl K, Maddison J, et al. Deep Learning for Prediction of Colorectal Cancer Outcome: A Discovery and Validation Study. The Lancet. 2020;395(10221):350–360. doi: 10.1016/S0140-6736(19)32998-8 [DOI] [PubMed] [Google Scholar]
  • 21. Liang J, Zhang W, Yang J, Wu M, Dai Q, Yin H, et al. Deep Learning Supported Discovery of Biomarkers for Clinical Prognosis of Liver Cancer. Nature Machine Intelligence. 2023;5(4):408–420. doi: 10.1038/s42256-023-00635-3 [DOI] [Google Scholar]
  • 22. Cardoso MJ, Li W, Brown R, Ma N, Kerfoot E, Wang Y, et al. MONAI: An Open-Source Framework for Deep Learning in Healthcare. 2022. doi: 10.48550/ARXIV.2211.02701 [DOI] [Google Scholar]
  • 23.Rezatofighi H, Tsoi N, Gwak J, Sadeghian A, Reid I, Savarese S. Generalized Intersection Over Union: A Metric and a Loss for Bounding Box Regression. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, CA, USA: IEEE; 2019. p. 658–666.
  • 24. Goode A, Gilbert B, Harkes J, Jukic D, Satyanarayanan M. OpenSlide: A vendor-neutral software foundation for digital pathology. Journal of pathology informatics. 2013;4(1):27. doi: 10.4103/2153-3539.119005 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25. Gohlke C. cgohlke/tifffile: v2023.9.26. 2023.
  • 26. Harris CR, Millman KJ, Van Der Walt SJ, Gommers R, Virtanen P, Cournapeau D, et al. Array programming with NumPy. Nature. 2020;585(7825):357–362. doi: 10.1038/s41586-020-2649-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27. Gillies S. The Shapely user manual. URL https://pypi.org/project/Shapely. 2013.
  • 28. Bradski G. The openCV library. Dr Dobb’s Journal: Software Tools for the Professional Programmer. 2000;25(11):120–123. [Google Scholar]
  • 29. Umesh P. Image processing in python. CSI Communications. 2012;23(2). [Google Scholar]
  • 30. Bejnordi BE, Veta M, van Diest PJ, van Ginneken B, Karssemeijer N, Litjens G, et al. Diagnostic Assessment of Deep Learning Algorithms for Detection of Lymph Node Metastases in Women With Breast Cancer. JAMA. 2017;318(22):2199–2210. doi: 10.1001/jama.2017.14585 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.He K, Zhang X, Ren S, Sun J. Deep Residual Learning for Image Recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, USA: IEEE; 2016. p. 770–778.
  • 32. Kingma DP, Ba J. Adam: A Method for Stochastic Optimization. CoRR. 2014. [Google Scholar]
  • 33. Zhang H, Cisse M, Dauphin YN, Lopez-Paz D. Mixup: Beyond Empirical Risk Minimization. 2017. doi: 10.48550/ARXIV.1710.09412 [DOI] [Google Scholar]
  • 34. De Boer PT, Kroese DP, Mannor S, Rubinstein RY. A tutorial on the cross-entropy method. Annals of operations research. 2005;134:19–67. doi: 10.1007/s10479-005-5724-z [DOI] [Google Scholar]
  • 35. Buslaev A, Iglovikov VI, Khvedchenya E, Parinov A, Druzhinin M, Kalinin AA. Albumentations: Fast and Flexible Image Augmentations. Information. 2020;11(2):125. doi: 10.3390/info11020125 [DOI] [Google Scholar]

Decision Letter 0

Alberto Marchisio

Transfer Alert

This paper was transferred from another journal. As a result, its full editorial history (including decision letters, peer reviews and author responses) may not be present.

25 Mar 2024

PONE-D-24-08457
PathEX: Make Good Choice for Whole Slide Image Extraction
PLOS ONE

Dear Dr. Yang,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

The reviewers have raised several comments. Please address all of them in the revised version.

Please submit your revised manuscript by May 09 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Alberto Marchisio

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf.

2. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, all author-generated code must be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.

3. We suggest you thoroughly copyedit your manuscript for language usage, spelling, and grammar. If you do not know anyone who can help you do this, you may wish to consider employing a professional scientific editing service.

 Whilst you may use any professional scientific editing service of your choice, PLOS has partnered with both American Journal Experts (AJE) and Editage to provide discounted services to PLOS authors. Both organizations have experience helping authors meet PLOS guidelines and can provide language editing, translation, manuscript formatting, and figure formatting to ensure your manuscript meets our submission guidelines. To take advantage of our partnership with AJE, visit the AJE website (http://aje.com/go/plos) for a 15% discount off AJE services. To take advantage of our partnership with Editage, visit the Editage website (www.editage.com) and enter referral code PLOSEDIT for a 15% discount off Editage services. If the PLOS editorial team finds any language issues in text that either AJE or Editage has edited, the service provider will re-edit the text for free.

 Upon resubmission, please provide the following:

 The name of the colleague or the details of the professional service that edited your manuscript

 A copy of your manuscript showing your changes by either highlighting them or using track changes (uploaded as a *supporting information* file)

 A clean copy of the edited manuscript (uploaded as the new *manuscript* file).

4. Please update your submission to use the PLOS LaTeX template. The template and more information on our requirements for LaTeX submissions can be found at http://journals.plos.org/plosone/s/latex.

5. Thank you for stating the following in the Acknowledgments Section of your manuscript:

“This work was supported by grants from the Natural Science Foundation of China (#82271650), Guangdong Science and Technology Department (2020B1212060018) and Guangzhou Science Technology and Innovation Commission (#202102010221, #20212200003).”

We note that you have provided additional information within the Acknowledgements Section that is not currently declared in your Funding Statement. Please note that funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form.

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows:

“the Natural Science Foundation of China (#82271650), Guangdong Science and Technology Department (2020B1212060018) and Guangzhou Science Technology and Innovation Commission (#202102010221, #20212200003).”

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: 1. The explanation of BoT in the methodology section is lacking. Compared to IoT, obtaining BoT requires additional algorithms to calculate the area of adipocytes. The main text and pseudocode only contain textual explanations. Please provide more detailed explanations, such as adding formulas and principle explanations. In particular, if algorithms that have already been developed are used, they should be referenced; otherwise the contribution would be insufficient.

2. What is the impact of the patch extraction algorithm proposed in this study on downstream tasks? For example, common WSI classification tasks. The author should add comparative experiments in this section to demonstrate the contribution of this study.

Reviewer #2: Dear Authors,

Your manuscript titled “PathEX: Make Good Choice for Whole Slide Image Extraction” presents an innovative framework for improving tile image extraction in whole slide image (WSI) analysis. The introduction of Intersection over Tile (IoT) and Background over Tile (BoT) as effective metrics to tackle specific challenges in digital pathology is noteworthy and represents a significant contribution to the field. Below are some comments and suggestions aimed at further strengthening your paper.

While the originality of PathEX is clear, and the conclusion notes that this algorithm is not compared to other methods, the paper would greatly benefit from a more explicit comparison with existing tile extraction methods. In enhancing the comparative analysis of PathEX's performance, algorithms ranging from traditional grid tiling to state-of-the-art DL-based segmentation techniques can be utilized. Moreover, tools like Histolab and SliDL, noted for their specific functionalities in WSI analysis, also serve as good benchmarks. Additionally, evaluating PathEX alongside adaptive tiling methods and any novel, annotation-driven approaches could offer comprehensive insights into its efficacy.

The results section is well-detailed and demonstrates the efficacy of the PathEX algorithm through a variety of metrics. To further bolster these findings, consider including statistical analyses such as p-values or confidence intervals, especially when comparing the performance of different IoT and BoT settings. This would provide a stronger empirical basis for the claims made and help validate the optimal parameter configurations.

The paper touches on the implementation details and the computational resources used, which is appreciated. However, a more thorough discussion on the practical aspects of deploying PathEX, particularly its scalability to larger datasets or its integration into existing digital pathology workflows, would be valuable. Discuss any encountered limitations or challenges and how they might be overcome in future iterations of the framework.

In conclusion, your manuscript makes a valuable contribution to the field of digital pathology. Addressing these points will not only strengthen the paper but also broaden its appeal and applicability. I look forward to seeing the evolution of your work and its impact on the community.

Best regards.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Abdul Basit

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2024 Aug 29;19(8):e0304702. doi: 10.1371/journal.pone.0304702.r002

Author response to Decision Letter 0


1 May 2024

Dear Editor and Review Committee,

I hope this letter finds you well. I am Xinda Yang, and I am writing to address the concerns raised by the reviewers in the evaluation of my research paper titled “PathEX: Make Good Choice for Whole Slide Image Extraction,” for which revision was recently requested.

The primary concerns are the several comments from the reviewers. To address them, I have made revisions to the paper and attached supplementary information to support my claims. All changes are listed after the main text, in the same order as the comments received from the reviewers, and are marked up in ORANGE in the "Revised Manuscript with Track Changes" file.

Understanding the importance of rigorous academic standards, I deeply respect the review process and appreciate the opportunity to clarify these points. I kindly request that the committee reconsider its decision, taking into account the additional information provided.

Thank you for your time and consideration. I am hopeful for a positive re-evaluation and am available for any further discussion or clarification that may be needed.

Sincerely,

Xinda Yang

The responses are listed below:

Responses to letter dated 2024-05-01

1. Please upload a Response to Reviewers letter which should include a point by point response to each of the points made by the Editor and / or Reviewers. (This should be uploaded as a 'Response to Reviewers' file type.) Please follow this link for more information: http://blogs.PLOS.org/everyone/2011/05/10/how-to-submit-your-revised-manuscript/

Response:

A point-by-point response to each of the points is provided below in the section "Responses to letter dated 2024-03-26".

2. When completing the data availability statement of the submission form, you indicated that you will make your data available on acceptance. We strongly recommend all authors decide on a data sharing plan before acceptance, as the process can be lengthy and hold up publication timelines. Please note that, though access restrictions are acceptable now, your entire data will need to be made freely accessible if your manuscript is accepted for publication. This policy applies to all data except where public deposition would breach compliance with the protocol approved by your research ethics board. If you are unable to adhere to our open data policy, please kindly revise your statement to explain your reasoning and we will seek the editor's input on an exemption. Please be assured that, once you have provided your new statement, the assessment of your exemption will not hold up the peer review process.

Response:

The data availability statement has been revised; please see the section "Responses to letter dated 2024-03-26". All authors agreed to make their data available upon acceptance.

3. Please ensure that you refer to Table 4 in your text as, if accepted, production will need this reference to link the reader to the Table.

Response:

The reference to Table 4 has been added to the text of the manuscript.

Responses to letter dated 2024-03-26

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOneformattingsamplemainbody.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOneformattingsampletitleauthors_affiliations.pdf.

Response:

We checked the formatting samples and made changes accordingly; the marked-up version can be found in the "Revised Manuscript with Track Changes" file.

2. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, all author-generated code must be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.

Response:

We have reviewed the PLOS ONE guidelines on code sharing and ensured that all author-generated code is shared in a way that follows best practice and facilitates reproducibility and reuse. All code and associated documentation have been uploaded to a GitHub repository. The repository URL, https://github.com/jasnei/PathEX, is currently private and has been added to the manuscript. We will make it public once the manuscript is accepted; if reproducibility needs to be verified first, we can make it public at any time convenient to you. The marked-up version can be found in the "Revised Manuscript with Track Changes" file.

3. We suggest you thoroughly copyedit your manuscript for language usage, spelling, and grammar. If you do not know anyone who can help you do this, you may wish to consider employing a professional scientific editing service.

Response:

We have carefully reviewed the manuscript and corrected the language, spelling, and grammar. We have also made minor changes to the text to improve its flow and readability, and added explanations to the methods section to make it clearer.

4. Please update your submission to use the PLOS LaTeX template. The template and more information on our requirements for LaTeX submissions can be found at http://journals.plos.org/plosone/s/latex.

Response:

We have updated the manuscript to use the PLOS LaTeX template.

5. Thank you for stating the following in the Acknowledgments Section of your manuscript:

"This work was supported by grants from the Natural Science Foundation of China (#82271650), Guangdong Science and Technology Department (2020B1212060018) and Guangzhou Science Technology and Innovation Commission (#202102010221, #20212200003)."

Response:

We have removed the funding information from the Acknowledgments section. We apologize for any inconvenience caused.

Comments to the Author

The following quotes the comments from Reviewer #1:

Reviewer #1: 1. The explanation of BoT in the methodology section is lacking. Compared to IoT, obtaining BoT requires additional algorithms to calculate the area of adipocytes, yet the main text and pseudocode contain only textual explanations. Please provide more detailed explanations, such as formulas and explanations of the underlying principles. In particular, if previously developed algorithms are used, they should be referenced; otherwise the contribution may be considered insufficient.

2. What is the impact of the patch extraction algorithm proposed in this study on downstream tasks, for example common WSI classification tasks? The author should add comparative experiments in this section to demonstrate the contribution of this study.

Response to Reviewer #1’s comments

1. Yes, obtaining BoT requires an algorithm to calculate the area of adipocytes (referred to as the blank area in the paper), and performing that calculation is the whole point of the BoT algorithm. We believe the paper already explains this, but in case the explanation was not clear enough, we have added a description of how the blank area is calculated, namely as the contour area of the binary mask. We hope this dispels your doubts about the BoT computation.
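For concreteness, the following minimal sketch illustrates one way such a blank-area computation could look. The function name, the OpenCV thresholding step, and the default brightness threshold are assumptions made here for illustration, not the exact PathEX implementation:

    import cv2
    import numpy as np

    def blank_over_tile(tile_rgb: np.ndarray, blank_thresh: int = 230) -> float:
        # Hypothetical sketch of BoT: the fraction of a tile covered by
        # blank area (background or adipocytes).
        gray = cv2.cvtColor(tile_rgb, cv2.COLOR_RGB2GRAY)
        # Treat bright pixels as blank and build a binary mask.
        _, mask = cv2.threshold(gray, blank_thresh, 255, cv2.THRESH_BINARY)
        # As described above, the blank area is taken as the contour area of the mask.
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        blank_area = sum(cv2.contourArea(c) for c in contours)
        tile_area = float(tile_rgb.shape[0] * tile_rgb.shape[1])
        return blank_area / tile_area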

2. In our research, we aimed to investigate whether different combinations of IoT and BoT affect downstream tasks, specifically WSI classification. As our experiments demonstrate, varying combinations of IoT and BoT do influence WSI classification, as is evident in the Results section. First, higher IoT and 1 - BoT thresholds admit fewer noisy tile images and thus produce a cleaner dataset; however, a classification model trained on such a clean dataset is prone to overfitting. Moreover, removing tile images that are 100% blank does improve the WSI classification task. We also found that retaining some noisy tile images (tiles that are partially positive and partially negative) makes the classification model more robust.
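To make the roles of the two thresholds concrete, the following sketch shows how IoT could be computed for a candidate tile against an annotation polygon and combined with BoT to decide whether the tile enters a training set. It uses shapely; the function names, parameters, and default values are hypothetical illustrations, not the published PathEX API:

    from shapely.geometry import Polygon, box

    def intersection_over_tile(tile_x: float, tile_y: float, tile_size: float,
                               annotation: Polygon) -> float:
        # IoT: area of the tile covered by the annotation, divided by tile area.
        tile = box(tile_x, tile_y, tile_x + tile_size, tile_y + tile_size)
        return tile.intersection(annotation).area / tile.area

    def keep_tile(iot: float, bot: float,
                  iot_min: float = 0.3, one_minus_bot_min: float = 0.3) -> bool:
        # Keep a tile only if it overlaps the annotation enough (IoT >= iot_min)
        # and is not dominated by blank area (1 - BoT >= one_minus_bot_min).
        return iot >= iot_min and (1.0 - bot) >= one_minus_bot_min

Setting iot_min and one_minus_bot_min within the 0.2 to 0.5 ranges reported as favorable in the Results would reproduce the kind of filtering described above.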

The following quotes the comments from Reviewer #2:

Reviewer #2: Dear Authors,

Your manuscript titled “PathEX: Make Good Choice for Whole Slide Image Extraction” presents an innovative framework for improving tile image extraction in whole slide image (WSI) analysis. The introduction of Intersection over Tile (IoT) and Background over Tile (BoT) as effective metrics to tackle specific challenges in digital pathology is noteworthy and represents a significant contribution to the field. Below are some comments and suggestions aimed at further strengthening your paper.

While the originality of PathEX is clear, and the conclusion notes that the algorithm has not been compared to other methods, the paper would greatly benefit from a more explicit comparison with existing tile extraction methods. To enhance the comparative analysis of PathEX's performance, algorithms ranging from traditional grid tiling to state-of-the-art DL-based segmentation techniques could be utilized. Moreover, tools like Histolab and SliDL, noted for their specific functionalities in WSI analysis, would also serve as good benchmarks. Additionally, evaluating PathEX alongside adaptive tiling methods and any novel, annotation-driven approaches could offer comprehensive insights into its efficacy.

The results section is well-detailed and demonstrates the efficacy of the PathEX algorithm through a variety of metrics. To further bolster these findings, consider including statistical analyses such as p-values or confidence intervals, especially when comparing the performance of different IoT and BoT settings. This would provide a stronger empirical basis for the claims made and help validate the optimal parameter configurations.

The paper touches on the implementation details and the computational resources used, which is appreciated. However, a more thorough discussion on the practical aspects of deploying PathEX, particularly its scalability to larger datasets or its integration into existing digital pathology workflows, would be valuable. Discuss any encountered limitations or challenges and how they might be overcome in future iterations of the framework.

In conclusion, your manuscript makes a valuable contribution to the field of digital pathology. Addressing these points will not only strengthen the paper but also broaden its appeal and applicability. I look forward to seeing the evolution of your work and its impact on the community.

Best regards.

Response to Reviewer #2’s comments

Thank you very much for your insightful comments on our research, which have highlighted some shortcomings in our paper. We have proactively adopted your suggestions and made corresponding amendments in our manuscript. Below are our responses to your comments.

Our research mainly aims to determine whether different combinations of IoT and BoT lead to different performance on downstream classification tasks, so we have only implemented the extraction algorithm itself. Although the algorithm can already perform some tasks, we have not added further functions to turn it into a wider-use application. Your suggestion is a good one, and we have added this explicit comparison to the Supporting information.

The results section does include statistical analyses such as p-values; these analyses are already included in Table 3 and Table 4.

Thank you for your advice. We have accepted your suggestions and diligently rewritten the discussion section, including the deployment of PathEX and its application to larger datasets. We have also explained the tasks we plan to undertake under the PathEX framework in the future.

Responses to the reviewer and editor comments are included in the file labeled 'Response to Reviewers'. Please check that file. Thank you.

Attachment

Submitted filename: Response to Reviewers.pdf

pone.0304702.s002.pdf (91.3KB, pdf)

Decision Letter 1

Alberto Marchisio

17 May 2024

PathEX: Make Good Choice for Whole Slide Image Extraction

PONE-D-24-08457R1

Dear Dr. Yang,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Please note that all the minor comments raised by the reviewers should be addressed.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager® and clicking the ‘Update My Information' link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Alberto Marchisio

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Thank you to the authors for the meticulous revisions and improvements made to the manuscript. However, there are still some minor issues that need to be addressed:

1. I recommend adding more recent references to enrich the "related work" section to ensure the timeliness and relevance of the citations. This would not only strengthen the theoretical foundation of your paper but also provide a more comprehensive overview of the latest advancements in the field. Please consider including, but not limited to, the following articles:

-Quan H, Li X, Hu D, et al. Dual-Channel Prototype Network for Few-Shot Pathology Image Classification[J]. IEEE Journal of Biomedical and Health Informatics, 2024.

-Zheng T, Chen W, Li S, et al. Learning how to detect: A deep reinforcement learning method for whole-slide melanoma histopathology images[J]. Computerized Medical Imaging and Graphics, 2023, 108: 102275.

-Ma Y, Luo X, Fu K, et al. Transformer-Based Video-Structure Multi-Instance Learning for Whole Slide Image Classification[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2024, 38(13): 14263-14271.

By referencing these publications, your paper could more fully reflect the current state of research in the field and potentially draw further inspiration for your studies.

Reviewer #2: Dear Authors,

Thank you for your resubmission of the manuscript titled “PathEX: Make Good Choice for Whole Slide Image Extraction.” I am pleased to inform you that the reviewers' comments have been adequately addressed and your work represents a valuable contribution to the field.

I would like to bring to your attention a small edit required on page 9 in Table 2, where the 75th percentile heading is clipping out of the table. Please ensure this formatting issue is corrected for the final version.

Looking forward to your forthcoming publication.

Best regards.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Abdul Basit

**********

Acceptance letter

Alberto Marchisio

7 Jun 2024

PONE-D-24-08457R1

PLOS ONE

Dear Dr. Yang,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission

* There are no issues that prevent the paper from being properly typeset

If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Alberto Marchisio

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Table. Table of comparisons to related methods.

    (PDF)

    pone.0304702.s001.pdf (81.5KB, pdf)
    Attachment

    Submitted filename: Response to Reviewers.pdf

    pone.0304702.s002.pdf (91.3KB, pdf)

    Data Availability Statement

    The code and associated documentation are uploaded to the GitHub repository. The repository URL is https://github.com/jasnei/PathEX.

