Multiresolution semantic segmentation of biological structures in digital histopathology

Sina Salsabili; Adrian D C Chan; Eranga Ukwatta

doi:10.1117/1.JMI.11.3.037501

. 2024 May 9;11(3):037501. doi: 10.1117/1.JMI.11.3.037501

Multiresolution semantic segmentation of biological structures in digital histopathology

Sina Salsabili ^a, Adrian D C Chan ^a,^b,^c,^*, Eranga Ukwatta ^a,^d

PMCID: PMC11086667 PMID: 38737492

Abstract.

Purpose

Semantic segmentation in high-resolution, histopathology whole slide images (WSIs) is an important fundamental task in various pathology applications. Convolutional neural networks (CNN) are the state-of-the-art approach for image segmentation. A patch-based CNN approach is often employed because of the large size of WSIs; however, segmentation performance is sensitive to the field-of-view and resolution of the input patches, and balancing the trade-offs is challenging when there are drastic size variations in the segmented structures. We propose a multiresolution semantic segmentation approach, which is capable of addressing the threefold trade-off between field-of-view, computational efficiency, and spatial resolution in histopathology WSIs.

Approach

We propose a two-stage multiresolution approach for semantic segmentation of histopathology WSIs of mouse lung tissue and human placenta. In the first stage, we use four different CNNs to extract the contextual information from input patches at four different resolutions. In the second stage, we use another CNN to aggregate the extracted information in the first stage and generate the final segmentation masks.

Results

The proposed method reported 95.6%, 92.5%, and 97.1% in our single-class placenta dataset and 97.1%, 87.3%, and 83.3% in our multiclass lung dataset for pixel-wise accuracy, mean Dice similarity coefficient, and mean positive predictive value, respectively.

Conclusions

The proposed multiresolution approach demonstrated high accuracy and consistency in the semantic segmentation of biological structures of different sizes in our single-class placenta and multiclass lung histopathology WSI datasets. Our study can potentially be used in automated analysis of biological structures, facilitating the clinical research in histopathology applications.

Keywords: multiresolution semantic segmentation, histopathology whole slide image, placenta, mouse lung tissue

1. Introduction

In the modern pathology, analysis of digital histopathology images plays a vital role in assessing the status and function of organisms and in the diagnosis of different diseases. Manual assessment of the digital histopathology images by expert pathologists is expensive, time-consuming, and subject to inter- and intrarater variability.¹ As such, extensive efforts have been made to automate the various analysis steps, including the segmentation of regions of interest. Digital histopathology whole slide images (WSIs) are very high resolution in nature, and often we need to segment both the small and large biological structures for subsequent analysis. For example, in lung tissue WSIs, there is a large size difference between the major biological structures, such as alveoli sacs, bronchi, and blood vessels. Segmentation of the various biological structures in histopathology images of lung tissue is a necessary step for automated estimation of mean linear intercept score,² which is a common metric used for quantification of lung injury in respiratory diseases, such as bronchopulmonary dysplasia.³^,⁴ Another example is placental WSIs, where the size gap between different placental structures can be noticeably large (a few μm² in syncytial knots to several mm² in large terminal villi). The segmentation of villi structures in placenta histopathology images facilitates the extraction of important features that are used in the analysis of placenta-mediated diseases, such as preeclampsia.⁵

In recent years, deep convolutional neural networks (CNNs) have shown promising performance in segmenting biological structures in histopathological WSIs.⁶^,⁷ However, there are various challenges associated with the semantic segmentation of WSIs using CNNs. One challenge is the image size; high-resolution WSIs (e.g., $20 \times$ magnification, corresponding to $0.5 μ m / pixel$ ) can often have image sizes around $80,000 \times 80,000 pixels$ . Large images cannot be directly fed into the CNN because of the need for large memory resources and high computational cost of the resulting model. As a result, patch-based approaches are often used⁸^–¹⁰ where the WSI is divided into small patches (typically around $256 \times 256 pixels$ ), the patches are individually fed into the CNN, and then the processed patches are combined to create a segmentation of the entire WSI.

Selecting an optimal patch size can be difficult. Small patches have a limited field-of-view, which can be problematic for the model to make accurate predictions, particularly during the segmentation of larger biological structures, because it may lack context from a broader range of texture patterns. Increasing the patch size will provide a wider field-of-view but at the cost of reducing the computation efficiency of the model, which can potentially lead to memory issues and a longer model training time. An alternative solution is to downsample the input WSI before extracting patches, providing a wider field-of-view for a fixed patch size; however, the reduced resolution of the patches may affect the performance of the model, particularly in the segmentation of small biological structures.

In an attempt to overcome the threefold trade-off between the field-of-view, computational efficiency, and spatial resolution, various multiresolution approaches for WSI analyses have been proposed, which aggregate the contextual information from each resolution. Multiresolution approaches mimic the technique often taken by pathologists, analyzing the WSI at different magnifications. Alsubaie et al.¹¹ performed tumor classification by training a single CNN using image patches at different magnifications and compared the performance of their proposed method against CNNs that were trained on $40 \times$ , $20 \times$ , and $10 \times$ magnifications, which improved the performance by 5%, 1%, and 1% in terms of pixel accuracy, respectively. Kosaraju et al.¹² proposed a multimagnification deep-learning model to classify adenocarcinoma in high-resolution histopathology images of the colon and stomach. The authors extracted $\sim 160 k$ patches of $256 \times 256 pixels$ from $20 \times$ and $5 \times$ magnification WSIs. Due to the high complexity of the model, they were not able to process $40 \times$ magnification WSIs. Their proposed method achieved an $F 1$ -score of 93.4% in the classification of patches, which showed 3%, 7%, and 6% improvement against well-known deep-learning models, CAT-Net,¹³ MRD-Net,¹¹ and Dense-Net,¹⁴ respectively. These works demonstrated the benefits of a multiresolution approach, even though they were focused on histopathology image classification rather than segmentation.

Sirinukunwattana et al.¹⁵ implemented different CNN architectures and trained the CNNs using patches of different resolutions, addressing the challenges associated with a narrow field-of-view in patch-based semantic segmentation approaches of prostate and breast histopathology WSIs. The authors applied multiple setups of the long short-term memory (LSTM) units to integrate the results of each CNN to improve the overall segmentation performance. Their proposed bidirectional LSTM setup was able to achieve class-average $F 1$ -score of 0.789 and 0.523 for prostate and breast histopathology WSIs. Tokunaga et al.¹⁶ proposed a multiresolution approach for semantic segmentation of adenocarcinoma in lung WSIs, which extracted contextual information from different magnifications using multiple expert CNNs (ECNNs) and generated the results by adaptively weighting the heatmaps from the ECCNNs and aggregating the extracted information using a shallow CNN. Their proposed method achieved 82.1% and 53.6% mean intersection over union (IOU), which showed 3% and 19% improvement against hard-switch-CNN¹⁷ method for single-class and multiclass segmentation tasks, respectively. van Rijthoven et al.¹⁸ proposed HookNet, which attempted to address the trade-off between resolution and field-of-view in lung and breast histopathology WSIs, which utilized skip connections between two parallel CNNs to aggregate the contextual information from different magnifications. Their proposed method achieved 0.91 and 0.72 $F 1$ -score for semantic segmentation in breast and lung datasets, respectively. Although the multiresolution schemes of these methods enable them to improve the performance of the segmentation model by resolving the trade-off between field-of-view and resolution, the segmentation of biological structures with large size differences was neither the objective of these works nor evaluated in histopathology WSIs.

In this work, we propose a two-stage multiresolution approach for the semantic segmentation of biological structures in histopathological WSIs. In the first stage, we use multiple ECNNs to extract contextual information at multiple resolutions of the WSIs. In the second stage, we use another CNN to aggregate the extracted contextual information, using structure size to weight the ECNN heatmaps. The proposed multiresolution approach addresses the trade-off between field-of-view, computational efficiency, and spatial resolution in conventional patch-based CNN models. The proposed method utilizes small image patch sizes ( $256 \times 256 pixels$ ) for training, which enables the model to run on small GPUs. The performance of the proposed method is evaluated over two different histopathology image datasets: (1) the histopathology WSIs of human placenta (single-class dataset) and (2) histopathology WSIs of mouse lung tissue (multiclass dataset). Both of these datasets consist of biological structures with drastic size variability, which can potentially demonstrate the performance of our proposed method in the segmentation of large and small biological structures. The proposed method is compared against different state-of-the-art CNN baselines at different resolutions.

2. Multiresolution Semantic Segmentation

There is a trade-off between field-of-view, computational efficiency, and resolution in patch-based CNNs. Figure 1 illustrates image patch selection with respect to patch size and resolution. Increasing the patch size (moving down the vertical axis in Fig. 1), with a fixed resolution, increases the field-of-view, with the trade-off of decreasing computational efficiency. Small patches may have a limited field-of-view, lacking sufficient contextual information for the CNN model to make accurate decisions. Large patches, however, reduce the computational efficiency of the CNN model, which may result in memory issues and slow training speed. To resolve this trade-off between field-of-view and computational efficiency, one can reduce the resolution of the image in order to obtain an adequately large field-of-view while maintaining a computationally efficient patch size. Decreasing the resolution of the input image (right to left in the horizontal axis in Fig. 1), with a fixed patch size, increases the field-of-view; however, the lower spatial resolution can reduce the segmentation performance, particularly with smaller biological structures.

Fig. 1 — Trade-off between the field-of-view, computational efficiency, and spatial resolution. $D_{1}$ is used to indicate a resolution that is the same as the original image (i.e., $20 \times$ magnification), and $D_{2}$ and $D_{4}$ indicate the decreased resolutions obtained by downsampling the original image by factors of 2 and 4, respectively. Increasing the spatial resolution (moving from left to right), with a fixed patch size, narrows the field-of-view. Increasing the patch size (moving from top to bottom), with a fixed spatial resolution, widens the field-of-view while decreasing the computational efficiency.

We propose a two-stage, multiresolution semantic segmentation method to address the trade-off between the field-of-view, computational efficiency, and spatial resolution (Fig. 2). The first stage is comprised of a number of parallel CNNs, each using the same fixed patch size but at different spatial resolutions, which provides different field-of-view sizes for each CNN. Each of these CNNs extracts the contextual information from the input WSI, generating a segmentation heatmap (probability map) for its particular spatial resolution. In the second stage, we weigh the structures in each heatmap based on the size of the structure. The weighting process enables the efficient inclusion of required contextual information for accurate segmentation of a structure based on its size. The weighted heatmaps are concatenated and an aggregating CNN is used to generate the final segmentation map (binary images associated with each class).

Fig. 2 — Overview of the proposed multiresolution semantic segmentation pipeline. In contextual information extraction stage, four different ECNNs are used to generate the heatmaps associated with each class at different resolutions as shown in green, yellow, purple, blue, and red for background, alveoli lumen, alveoli border wall, bronchi, and blood vessel, respectively. In data aggregation stage, the size-based structure weighting step is applied to the heatmaps associated with each class at each resolution. The generated weighted heatmaps associated with all classes are processed using the aggregating CNN (ACNN) to generate the final heatmaps for each class.

2.1. Contextual Information Extraction

In this stage, contextual information is extracted using parallel CNNs, each using a different resolution of the WSI; these CNNs will be referred to as ECNNs. The number of ECNNs ( $N$ ) can be adjusted to vary the range of resolutions appropriate for the given application. In this work, we use $N = 4$ resolutions that are separated by a factor of $2 \times$ , obtained by downsampling. By downsampling the original WSI, which had a $20 \times$ magnification, four images with different resolutions are generated. $D_{1}$ is used to indicate the resolution of the original WSI. $D_{2}$ , $D_{4}$ , and $D_{8}$ are used to indicate the resolution of the images obtained by downsampling the original image by factors of 2, 4, and 8, respectively. For a particular resolution, the WSI is divided into nonoverlapping image patches of a fixed size (in this work, we used a patch size of $256 \times 256$ ), and the associated ECNN is used to generate heatmaps for each image patch. Given an input patch of shape $H \times W \times 3$ , the output of each ECNN is a $H \times W \times N_{c}$ heatmap patch, where $H$ is the patch height, $W$ is the patch width, and the $N_{c}$ corresponds to the number of classes in each dataset (e.g., the output of the ECNNs in lung dataset is $H \times W \times 5$ heatmap patches, which corresponds to blood vessel, bronchus, alveoli sac, alveoli wall, and background classes). The generated heatmap patches of each class from a particular ECNN are put together to construct the heatmaps of the classes for that resolution (see Fig. 2).

The vanilla U-Net¹⁹ architecture is used for the ECNN models. The effective use of skip connections between downsampling layers to upsampling layers in U-Net architecture enables the model to preserve the spatial resolution. This is particularly important for our application because our aim is to segment the large biological structures as well as small biological structures, where resolution loss may result in misclassification of the smaller structures. The U-Nets were trained from scratch, with four convolution layers in the contracting path and four transpose convolution layers in the expansive path. Hyperparameter tuning was performed over the training data. The U-Net was trained using the Nadam optimizer²⁰ with an initial learning rate of $10^{- 3}$ . The Nadam offers adaptive learning rate adjustment, which simplifies the hyperparameter tuning and accelerates gradient calculations, which moderately increases convergence speed. The weighted categorical cross entropy was used for calculating the loss:

{\begin{cases} L^{wcce} = - \frac{1}{M} \sum_{c = 1}^{N_{c}} \sum_{n = 1}^{N_{p}} \sum_{m = 1}^{M} w_{c} \times y_{m}^{c} (n) \log ({\hat{y}}_{m}^{c} (n)), \\ w_{c} = \frac{N_{p}}{M \times N_{p c}}, \end{cases}

(1)

where $L^{wcce}$ is the weighted categorical cross-entropy loss function; $y_{m}^{c} (n)$ and ${\hat{y}}_{m}^{c} (n)$ are the $n$ ’th pixels of the $m$ ’th training patch from $c$ ’th class of the ground truth and predicted heatmap, respectively; $M$ is the number of training patches; $N_{p}$ is the number of pixels in the training patch; $N_{c}$ is the number of classes; $w_{c}$ is the weight for $c$ ’th class; and $N_{p c}$ is the number of pixels in the $c$ ’th class.

To account for the overfitting problem, we perform dropout by a factor of 0.25 at each layer. The dataset was broken down into train-validation sets with the ratio of 10:1 (train on 90% of the training data and validation on the remaining 10%). At each epoch, the validation set loss was monitored to save the best model with the minimum validation loss. Batch normalization is applied to each layer—to reduce the training time and prevent diverging gradients—followed by a rectified linear unit activation function. We train our model for 50 epochs, with a batch size of 20. A total number of 8,643,458 trainable parameters are optimized in each ECNN network.

2.2. Data Aggregation

In this stage, we aggregate the extracted contextual information from different resolutions to produce the final heatmaps, which are performed in three steps: (1) upsampling, (2) size-based structure weighting, and (3) aggregating using a CNN.

2.2.1. Upsampling

The heatmaps, generated by the ECNNs from the contextual information extraction stage, have different sizes. Heatmaps of lower resolutions are upsampled to the same size as the largest heatmap, which corresponds to the size of the original WSI (e.g., $D_{4}$ is upsampled by a factor of 4). The bicubic interpolation method with $4 \times 4$ kernel size is used to upsample the heatmaps.

2.2.2. Size-based structure weighting

As previously discussed, the segmentation of the larger structures benefits from a larger field-of-view, whereas the segmentation of smaller structures benefits from a higher spatial resolution. As such, we weight the heatmaps to favor larger structures from lower resolutions and smaller structures from higher resolutions, generating a weighted heatmap for each class (see Fig. 3). In addition, the size-based structure weighting step enables to efficiently integrate the extracted contextual information from different resolutions to a single weighted heatmap, which improves the scalability of inputs in second stage. Although the main objective for the size-based weighting step is to integrate the heatmaps containing structures with large size variations, it can still be effective if the size of the structures does not change drastically. As such, Eq. (2) is used to generate the weights for the structures of different size at each resolution:

{\begin{cases} W_{M_{i}}^{j} = \frac{1}{1 + e^{- Δ (A_{M_{i}}^{j} - τ_{i})}} - \frac{1}{1 + e^{- Δ (A_{M_{i}}^{j} - τ_{i + 1})}} \\ i \in {0, 1, 2, 3} \\ M_{i} \in {D_{1}, D_{2}, D_{4}, D_{8}} \\ τ_{i} \in {τ_{0} = - \infty, τ_{1}, τ_{2}, τ_{3}, τ_{4} = + \infty} \end{cases},

(2)

Fig. 3 — Size-based structure weighting of the heatmaps of different resolutions for each class. An example WSI of lung dataset is presented, which contains five different classes. For simplicity, only the heatmap weighting process for one class (blood vessel) is visualized.

In Eq. (2), $W_{M_{i}}^{j}$ is the weight that applies to the $j$ ’th structure at resolution $M_{i}$ , which has the size of $A$ . $τ_{1}$ , $τ_{2}$ , and $τ_{3}$ are hyperparameters used to adjust the weighting at each resolution. $Δ$ is a constant value, representing the slope of attenuation at threshold $τ_{i}$ .

Equation (2) ensures that the weighted heatmaps maintain the same range of values (between 0 and 1). We heuristically optimized the values of $τ_{1}$ , $τ_{2}$ , and $τ_{3}$ as part of the hyperparameter tuning process in the second stage.

At each resolution, the calculated weights in Eq. (2) are multiplied by the pixel values of the corresponding structures to weight the heatmap of that resolution. The weighted heatmaps, from each resolution, are superimposed to generate the overall weighted heatmap [see Eq. (3)]:

{\begin{cases} H^{weighted} (x, y) = \sum_{i = 0}^{3} \sum_{j = 1}^{N_{M_{i}}} W_{M_{i}}^{j} \times H_{M_{i}} (x, y), \\ (x, y) \in pixel coordinates of the j^{'} th structure in H_{M_{i}}, \end{cases}

(3)

where $H_{M_{i}}$ is upsampled heatmap outputted by the ECNN at $M_{i}$ resolution, $H^{weighted}$ is the weighted heatmap, $W_{M_{i}}^{j}$ is the weight for $j$ ’th structure at $M_{i}$ resolution [Eq. (2)], and $N_{M_{i}}$ is the number of structures in $H_{M_{i}}$ .

2.2.3. Aggregating CNN

In the size-based structure weighting step, we integrated the contextual information on resolution and field-of-view of four different resolutions (i.e., $D_{1}$ , $D_{2}$ , $D_{4}$ , and $D_{8}$ ) to generate an overall weighted heatmap at the highest resolution (i.e., $D_{1}$ ) for each class. Here our aim is to improve the overall segmentation performance of our proposed method by developing an aggregating CNN (ACNN), which utilizes the corresponding overall weighted heatmaps of each class and the original WSI and generates the final heatmap for that class. As such, for each class, we train a separate ACNN model, which is an expert in the segmentation of the biological structures in that class. For our single-class placenta dataset, only one ACNN is used. For our multiclass lung dataset, which consists of five classes (i.e., blood vessel, bronchus, alveoli sac, alveoli wall, and background), five ACNNs are used.

To prepare the input data for the ACNNs, for each class, the original WSIs and the corresponding overall weighted heatmap of that class are concatenated to form a four-dimensional matrix (i.e., one dimension corresponds to the weighted heatmap and three dimensions correspond to the color channels of the input WSI). ACNN inputs are patches of size $256 \times 256 pixels$ from the 4D matrix (i.e., the patch size is $256 \times 256 \times 4$ ). Although the extracted patches from highest resolution offer a limited field-of-view for the segmentation of larger biological structures, the limited field-of-view is mitigated by the information provided by the overall weight heatmaps.

To construct the final heatmap for each class, the outputs of the corresponding ACNN model (i.e., the heatmap patches) are put together. For our single-class placenta dataset and multiclass lung dataset, one and five final heatmaps are constructed for each input WSI by training a single ACNN and five ACNNs, respectively. To counter the discontinuities in the output heatmap patches, the input patches are extracted with 50% overlap in vertical and horizontal axis, in the testing phase. The final heatmaps are constructed by averaging over the intersecting regions in heatmap patches.

To generate the final segmentation maps (i.e., binary masks) for each input subimage in our single-class placenta dataset, we use 0.5 threshold to capture villi and nonvilli pixels from the output of ACNN (probability heatmap). For segmentation maps in our multiclass lung dataset, we use score voting among the outputs of ACNNs of each class (i.e., classifying a pixel by which class has the highest $H^{weighted} (x, y)$ value).

In our experiments, we realized that due to the difficulty of the segmentation task in this step, the vanilla U-Net is not a suitable architecture for the ACNN models. As such, we implemented the ACNNs with U-Net as the core segmentation architecture and EfficientNetB0 backbone as the encoder of our U-Net model. The EfficientNet²¹ variants are the state-of-the-art classification models, which demonstrated the best classification performance on ImageNet dataset²² while maintaining a small number of parameters in comparison to other state-of-the-art classification models, such as ResNet²³ variants. Although EfficientNetB0 is the simplest variant within the EfficientNet family, EfficinetNetB0 is a suitable architecture for our ACNN models due to the lower complexity of the segmentation task in our datasets in comparison to ImageNet (i.e., 1000 class classifications on ImageNet in contrast to a single-class classification in our dataset). To implement the EfficientNetB0-UNet model, the fully connected layers of the EfficientNetB0 are removed, and the output of the EfficientNetB0 tower is connected to the input of the upsampling units in the contracting path of the U-Net model. The ACNN architecture consists of five skip connections, which corresponds to the five main convolutional layers in EfficientNetB0 architecture.

The ACNNs are trained from scratch using the Nadam optimizer with an adaptive learning rate with an initial learning rate of 10⁻³. We noticed that our datasets were large enough for optimizing the ACNN network. We implemented the weighted binary cross-entropy loss [see Eq. (4)] for the training of our ACNN model, which enables us to address the class imbalance in our datasets:

L^{wbce} = - \frac{1}{M} \sum_{n = 1}^{N} \sum_{m = 1}^{M} [w \times y_{m} (n) \times \log ({\hat{y}}_{m} (n)) + (1 - y_{m} (n)) \times \log (1 - {\hat{y}}_{m} (n))],

(4)

In Eq. (4), $L^{wbce}$ is the weighted binary cross-entropy loss function. $y_{m} (n)$ and ${\hat{y}}_{m} (n)$ are the $n$ ’th pixels of the $m$ ’th training patch of the ground truth and predicted heatmap, respectively. The $M$ , $N$ , $N_{p}$ , and $w$ are the number of training examples, the total number pixels, the number of pixels of the positive class, and the weight for villi class, respectively.

To validate the training process, the dataset is broken down with the ratio of 3:1 (training on 75% of the training data and validation on the remaining 25%). At each epoch, the validation set loss was monitored to save the best model with the minimum validation loss. Batch normalization was applied to each layer to reduce the training time and prevent diverging gradients. We trained our model for 100 epochs with a batch size of 10. A total number of 22,851,229 trainable parameters were optimized in our ACNN network.

3. Experiment

3.1. Dataset

The performance of the proposed multiresolution semantic segmentation method is evaluated on two different datasets: (1) high-resolution histopathology WSIs of human placenta and (2) high-resolution histopathology WSIs of mouse lung tissue. The human placenta images are used to evaluate the method in single-class semantic segmentation (i.e., villi and nonvillous). The mouse lung images are used to evaluate the method in multiclass semantic segmentation (i.e., blood vessel, bronchus, alveoli sac, alveoli wall, and background).

3.1.1. Histopathology images of human placenta

The single-class dataset comprises high-resolution digital scans of 10 placental histopathology specimens obtained from the Research Centre for Women’s and Infants Health Biobank (Mount Sinai Hospital, Toronto, Ontario, Canada). The ethics approval to perform subanalyses on the Biobank samples was obtained from the Carleton University Research Ethics Board, Ottawa Health Science Network Research Ethics Board, and the Children’s Hospital of Eastern Ontario (CHEO) Research Ethics Board. The placental specimens were fixed in paraffin wax, stained with hematoxylin, washed in a 0.3% acid alcohol solution, and counterstained with eosin following the standard protocol for hematoxylin and eosin (H&E) staining in the Department of Pathology and Laboratory Medicine at the CHEO. The slides were scanned using an Aperio CS2 slide scanner (Leica), and high-resolution color images at $20 \times$ magnification (i.e., $0.5 μ m / pixel$ ) were obtained. All images were taken from different patients and the sizes of the images were up to $39,963 \times 31,959 pixels$ . From each placental WSI, five subimages of size $2740 \times 3964 pixels$ were extracted, amounting to a total of 50 subimages, which were used as our single-class dataset.

To generate the ground-truth segmentation for the placenta dataset, each placental subimage was segmented using a previously published algorithm.²⁴ Then the algorithm-generated segmentation maps were refined by the first author Sina Salsabili using ImageJ software.²⁵ The average (±standard deviation) IOU between the algorithm-generated segmentation maps and refined segmentation maps was 0.688 (±0.094). In total, 10,809 Villi structures were annotated for the ground-truth segmentation of our single-class placenta dataset.

3.1.2. Histopathology images of mice lung tissue

The multiclass semantic segmentation dataset comprises high-resolution WSIs of 20 lung histopathology specimens of mice obtained from the Sinclair Centre for Regenerative Medicine (Ottawa Hospital Research Institute, Ottawa, Ontario, Canada). All animal experiments were conducted in accordance with protocols approved by the University of Ottawa animal care committee. The lung specimens were inflation fixed through the trachea with 10% buffered formalin, under 20 cm $H_{2} O$ pressure, for 5 min. After the trachea was ligated, the lungs were immersion fixed in 10% buffered formalin for 48 h at room temperature and then immersed in 70% ethanol for 24 h at room temperature. Lungs were then paraffin-embedded, cut into $4 μ m$ sections, and stained with H&E. The slides were scanned using an Aperio CS2 slide scanner (Leica), and high-resolution color images at $20 \times$ magnification (i.e., $0.5 μ m / pixel$ ) were obtained. All WSIs were taken from different mice, and the WSI size was as large as $22,430 \times 22,900 pixels$ .

Manual segmentation for these images was generated using a semiautomated approach by the first author Sina Salsabili. To generate the labels, as an initial step, the background regions were manually segmented and then, the bronchi and blood vessels were manually annotated using the ImageJ software.²⁵ Using the segmentation pipeline in a previously published algorithm,²⁶ the labels for the remaining structures (i.e., alveoli sacs and alveoli walls) are automatically generated at $20 \times$ resolution and then manually refined. The average (±standard deviation) IOU between the algorithm-generated segmentation maps and refined segmentation maps was 0.687 ( $\pm 0.084$ ). In total, 8447 blood vessel structures, 1071 bronchus structures, and 620,646 alveoli sac structures were annotated for the manual segmentation of our multiclass mouse lung tissue dataset.

3.2. Preprocessing

We apply a preprocessing step to address color variations and imaging artifacts, which may negatively affect segmentation performance. A number of factors can contribute to variations of the color content in histological images (e.g., histochemical staining time, amount of histology stain used) across different WSIs. We apply color normalization²⁷ to the input WSIs to mitigate such variations. In our dataset, there exist imaging artifacts that appear as darkened areas of the image, which result in misclassification of the region. Examples of these regions in our placental villi dataset and the approach taken to suppress these darkened regions are based on a previously published work.²⁴

3.3. Evaluation

Semantic segmentation was performed using the proposed multiresolution approach (Sec. 2) and compared against three state-of-the-art segmentation models: (1) U-Net,¹⁹ (2) SegNet,²⁸ and (3) DilatedNet.²⁹ We trained our proposed methods and each of these models on our single-class and multiclass datasets using fivefold cross validation, which enables the models to be tested on our entire dataset.

For single-class segmentation, within each fold, we used subimages of eight WSIs from the dataset (i.e., 40 subimages) as the training dataset, and the remaining two WSIs (i.e., 10 subimages) were used as the test dataset. This was repeated five times such that the subimages of each WSI was used as the test set. As our proposed method consists of two main training stages, at each fold, the training dataset is randomly divided into two subtraining datasets (i.e., each subtraining dataset contains 20 subimages). The first subtraining dataset is used for training of ECNNs and the second subtraining dataset is used for training of the ACNN. Similar datasets are utilized to train the evaluation models (i.e., U-Net, SegNet, and DilatedNet) at each fold; however, we used both subtraining datasets (40 subimages) in the training phase. The distribution of the classes in our single-class dataset was [villi, nonvillous] = [0.38, 0.62].

For multiclass segmentation, within each fold, we used 16 WSIs from the dataset as the training dataset, and the remaining four WSIs were used as the test dataset. This was repeated five times such that each WSI was used in the test set. For the proposed method, at each fold, the training dataset is randomly divided into two subtraining datasets (i.e., each subtraining dataset contains eight WSIs) and each subtraining dataset is used for training of ECNN and ACNN stages. For evaluation of U-Net, SegNet, and DilatedNet models, both subtraining datasets are used for training. The distribution of the classes in our multiclass dataset was [blood vessel, bronchus, alveoli sac, alveoli wall, and background] = [0.02, 0.03, 0.26, 0.14, 0.55] (see Fig. 4). Due to the large class imbalance in our multiclass dataset, multiple steps had been undertaken. First, we excluded patches that were mainly extracted from the background class by removing patches with a mean pixel value close to white ([R, G, B] = [255, 255, 255], where R, G, and B are the red, green, and blue components in RGB color mode). Second, we used data augmentation (90 deg rotations and image flipping) to increase the number of patches of minority classes (i.e., blood vessel and bronchus classes). The distribution of the classes used for training was [blood vessel, bronchus, alveoli sac, alveoli wall, and background] = [0.11, 0.17, 0.31, 0.16, 0.25].

Fig. 4 — Example of histopathology WSI of mouse lung tissue and the distribution of different classes in multiclass lung dataset.

3.4. Performance Metrics

Segmentation performance is evaluated in terms of pixel-wise accuracy (PA),³⁰ mean Dice similarity coefficient (DSC),³¹ and mean of the mean positive predictive value (PPV):³⁰

PA = \sum_{c} \frac{{TP}_{c} + {TN}_{c}}{{TP}_{c} + {TN}_{c} + {FP}_{c} + {FN}_{c}},

(5)

DSC = 1 / C \times \sum_{c} \frac{2 \times {TP}_{c}}{2 \times {TP}_{c} + {FP}_{c} + {FN}_{c}},

(6)

V = 1 / C \times \sum_{c} \frac{{TP}_{c}}{{TP}_{c} + {FP}_{c}} .

(7)

In Eqs. (57), $C$ is the number of classes, and ${TP}_{c}$ , ${TN}_{c}$ , ${FP}_{c}$ , and ${FN}_{c}$ are the numbers of true positives, true negatives, false positives, and false negatives for class $c$ , respectively.

We also evaluated the performance of the proposed method in terms of its ability to segment the structures based on their size, on a natural log scale. To evaluate our proposed method in this setting, we initially define $N$ different size bins, where each bin represents the structures in the ground truth masks that their size falls in the range of the bin. The number of bins may vary between our single-class dataset and different classes in our multiclass dataset because the distribution of the structures’ size are different in these classes. Then, for the final segmentation map of each class, the average sensitivity (SE) of the class for each bin is calculated using Eq. (8). We report the average SE with respect to the structures’ size:

{\begin{cases} {SE}_{c}^{B_{i}} = \frac{{TP}_{c}^{B_{i}}}{{TP}_{c}^{B_{i}} + {FN}_{c}^{B_{i}}}, i \in {1,2, \dots, N}, \\ B_{i} \in {S_{j} | b_{i - 1} \leq A_{j} < b_{i}}, S_{j} \in {S_{1}, S_{2}, \dots, S_{M}}, b_{i} \in {b_{0}, b_{1}, \dots, b_{N}}, \end{cases}

(8)

where $B_{i}$ represents $i$ ’th bin, which contains the structures $S_{j}$ whose size $A_{j}$ is larger or equal to $b_{i - 1}$ and smaller than $b_{i}$ , $N$ denotes the total number of bins, and $M$ is the total number of structures, respectively. ${SE}_{c}^{B_{i}}$ , ${TP}_{c}^{B_{i}}$ , and ${FN}_{c}^{B_{i}}$ are the average SE, the number of true positives, and false negatives for class $c$ for the structures in $B_{i}$ , respectively.

4. Results

Figure 5 visualizes examples of semantic segmentation of our single-class dataset (i.e., histopathology subimages of placental villi) at the highest resolution (i.e., $D_{1}$ ). All the segmentation models show comparably high accuracy in the segmentation of placental villi structures. This behavior is also confirmed by the performance metrics calculated over our single-class placenta dataset in Table 1. Based on the results in Table 1, our proposed method shows the best performance in terms of PA and DSC metrics, and the PPV metric is 0.017 lower than the best performing model (U-Net $D_{2}$ ). In comparison to the previously published work,²⁴ the proposed method improved the results by 12.7%, 21.8%, and 24.6% in terms of PA, DSC, and PPV, respectively. Figure 6 visualizes the average SE of each method with respect to the size of the placental villi structures. In Fig. 6, our proposed method shows a consistently high performance in the segmentation of large structures as well as small structures. However, the performance of the baseline methods is subject to a high variability as the size of the structures changes, specifically in the segmentation of small structures at lower resolution.

Fig. 5 — Example semantic segmentation results in single-class placental villi dataset. The ground-truth segmentation and the predictions for villi class are labeled as green and blue, respectively.

Table 1.

PA, DSC, and PPV metrics for our single-class placenta dataset. The metrics are reported for mean ± standard deviation over each sample. PM refers to our proposed method. The highest calculated values for each metric is shown in bold.

Model/resolution	PA	DSC	PPV
U-Net ( $D_{1}$ )	0.917 ± 0.028	0.846 ± 0.016	0.986 ± 0.011
U-Net ( $D_{2}$ )	0.925 ± 0.024	0.861 ± 0.020	0.988 ± 0.080
U-Net ( $D_{4}$ )	0.941 ± 0.018	0.895 ± 0.013	0.982 ± 0.012
U-Net ( $D_{8}$ )	0.901 ± 0.032	0.810 ± 0.035	0.981 ± 0.011
Dilated-Net ( $D_{1}$ )	0.948 ± 0.016	0.910 ± 0.014	0.982 ± 0.011
Dilated-Net ( $D_{2}$ )	0.930 ± 0.019	0.874 ± 0.023	0.965 ± 0.045
Dilated-Net ( $D_{4}$ )	0.945 ± 0.021	0.906 ± 0.013	0.970 ± 0.025
Dilated-Net ( $D_{8}$ )	0.911 ± 0.036	0.840 ± 0.041	0.955 ± 0.031
SegNet ( $D_{1}$ )	0.934 ± 0.026	0.883 ± 0.015	0.983 ± 0.010
SegNet ( $D_{2}$ )	0.936 ± 0.023	0.892 ± 0.021	0.936 ± 0.047
SegNet ( $D_{4}$ )	0.910 ± 0.032	0.833 ± 0.022	0.970 ± 0.022
SegNet ( $D_{8}$ )	0.841 ± 0.053	0.681 ± 0.031	0.903 ± 0.045
PM ( $D_{1}$ )	0.927 ± 0.017	0.902 ± 0.047	0.949 ± 0.028
PM ( $D_{1}$ and $D_{2}$ )	0.941 ± 0.024	0.888 ± 0.043	0.965 ± 0.031
PM ( $D_{1}$ , $D_{2}$ , and $D_{4}$ )	0.949 ± 0.043	0.917 ± 0.012	0.979 ± 0.043
PM ( $D_{1}$ , $D_{2}$ , $D_{4}$ , and $D_{8}$ )	0.956 ± 0.021	0.925 ± 0.025	0.971 ± 0.018

Open in a new tab

Fig. 6 — Average SE of each method with respect to the structures’ size in single-class placental villi dataset.

Figure 7 visualizes an example of multiclass semantic segmentation of a lung histopathology WSI at the highest resolution, in comparison to baseline methods at $D_{1}, D_{2}, D_{4}$ , and $D_{8}$ . As shown in Fig. 7, unlike the baseline methods that perform well in segmentation of large structures at lower resolutions and small structures at higher resolutions, our proposed method is capable of accurate segmentation of both large and small structures. Table 2 contains the performance metrics for semantic segmentation of our multiclass lung dataset, which confirms the superior performance of our proposed method against state-of-the-art segmentation methods at different resolutions over PA, DSC, and PPV performance metrics. The average SE of the bronchus, the blood vessel, and the alveoli sac classes with respect to the structure size are visualized in Fig. 8. In Fig. 8(a), large differences can be observed between the performances of the baseline methods, particularly at lower resolutions (i.e., $D_{4}$ and $D_{8}$ ) when segmenting small structures in the alveoli class. The proposed method shows consistently high performance for the segmentation of small and large structures in this class. As it can be seen in Figs. 8(b) and 8(c), by moving along the size axis, the performance of the baseline models drastically changes for the segmentation of structures in bronchus and blood vessel classes. The proposed method demonstrates a comparably consistent and superior performance in the segmentation across different size structures.

Fig. 7 — Example semantic segmentation of a lung histopathology WSI using proposed method and SegNet,²⁸ DilatedNet,²⁹ and U-Net¹⁹ at different resolutions. The blood vessel, bronchus, alveoli sac, alveoli wall, and background classes are labeled as red, green, blue, yellow, and gray, respectively. The black arrow and the black rectangle point out to a large and small blood vessel structures, respectively. By comparing the segmentation of these structures across our proposed method and the baseline algorithms, it is evident that unlike baseline algorithms that show better performance in segmentation of small blood vessel structure at higher resolutions and vice versa, the proposed method shows a more balanced segmentation for both small and large structures.

Table 2.

PA, DSC, and mean PPV metrics for our multiclass lung dataset. The metrics are reported for mean ± standard deviation over each sample. PM refers to our proposed method. The highest calculated values for each metric is shown in bold.

Model/resolution	PA	DSC	PPV
U-Net ( $D_{1}$ )	0.959 ± 0.034	0.786 ± 0.044	0.783 ± 0.037
U-Net ( $D_{2}$ )	0.962 ± 0.028	0.815 ± 0.031	0.829 ± 0.029
U-Net ( $D_{4}$ )	0.962 ± 0.021	0.804 ± 0.017	0.826 ± 0.019
U-Net ( $D_{8}$ )	0.955 ± 0.043	0.764 ± 0.038	0.806 ± 0.031
Dilated-Net ( $D_{1}$ )	0.959 ± 0.028	0.793 ± 0.026	0.751 ± 0.048
Dilated-Net ( $D_{2}$ )	0.961 ± 0.022	0.804 ± 0.024	0.775 ± 0.033
Dilated-Net ( $D_{4}$ )	0.965 ± 0.017	0.831 ± 0.019	0.806 ± 0.024
Dilated-Net ( $D_{8}$ )	0.958 ± 0.038	0.789 ± 0.031	0.773 ± 0.051
SegNet ( $D_{1}$ )	0.945 ± 0.027	0.723 ± 0.033	0.670 ± 0.057
SegNet ( $D_{2}$ )	0.960 ± 0.037	0.796 ± 0.027	0.767 ± 0.047
SegNet ( $D_{4}$ )	0.963 ± 0.025	0.817 ± 0.028	0.787 ± 0.039
SegNet ( $D_{8}$ )	0.961 ± 0.054	0.811 ± 0.051	0.782 ± 0.061
PM ( $D_{1}$ )	0.965 ± 0.041	0.814 ± 0.033	0.808 ± 0.019
PM ( $D_{1}$ and $D_{2}$ )	0.964 ± 0.029	0.844 ± 0.048	0.835 ± 0.023
PM ( $D_{1}$ , $D_{2}$ , and $D_{4}$ )	0.969 ± 0.055	0.859 ± 0.042	0.840 ± 0.037
PM ( $D_{1}$ , $D_{2}$ , $D_{4}$ , and $D_{8}$ )	0.971 ± 0.024	0.873 ± 0.033	0.833 ± 0.028

Open in a new tab

Fig. 8 — Average SE of each method with respect to the structures’ size for (a) Alveoli class, (b) bronchus class, and (c) blood vessel class in multiclass lung dataset.

5. Discussion

In this work, we aim to resolve the trade-off between field-of-view, spatial resolution, and computational complexity in semantic segmentation of histopathology WSIs, which contain biological structures with drastic size variations. To enable our model to yield high accuracy in segmentation of very large as well as very small biological structures, we extracted the patches of the same size ( $256 \times 256 pixels$ ) from the WSI at its original resolution (i.e., $D_{1}$ ) as well as three lower resolutions, obtained by downsampling the original WSI by the factor of 2, 4, and 8 (i.e., $D_{2}$ , $D_{4}$ , and $D_{8}$ , respectively).

We proposed a two-stage multiresolution semantic segmentation approach, which extracts the contextual information from different resolutions in the first stage and aggregates the extracted information in the second stage to improve the overall segmentation performance. Implementation of the proposed method in two separate stages enables us to independently use all possible contextual information at different resolutions for training of our model. A common approach in end-to-end multiresolution approaches is to match the field-of-view by center cropping the heatmaps at lower resolutions,¹⁶^,³² which makes the usage of the contextual information at lower resolution inefficient and may limit the overall performance.³³^,³⁴

For the single-class dataset, our proposed method outperformed the baseline methods in terms of DSC and PA metrics. The PPV score of our proposed method was slightly lower than the highest PPV score (1.7% lower than U-Net at $D_{2}$ ). Due to the low complexity of the segmentation task in our single-class dataset (i.e., distinct visual differences between villi structures and the background regions, and relatively low size variability of the villi structures), the baseline methods perform comparably well when using patches extracted at $D_{4}$ . However, by analyzing the performance of the models based on the size of the structures in Fig. 6, it can be observed that the size of the structures influences the performance of the baseline methods. This behavior is more evident in the segmentation of small structures, where lowering the resolution of the input patches caused the performance of the SegNet and U-Net models at resolutions $D_{4}$ and $D_{8}$ to drastically decline. The SE of most baseline methods steadily increases with increasing structure’s size, and SegNet and U-Net at resolution $D_{4}$ show a comparable performance against our proposed method in the segmentation of large structures ( $\sim 0.5 {mm}^{2}$ ). Overall, our proposed method was able to demonstrate a slight improvement against the baseline methods, by offering a consistently high performance in segmentation of different size structures in the single-class dataset.

The overall performance of the semantic segmentation in our multiclass lung dataset indicates that our proposed method considerably outperforms other baseline methods and the previously published works,²⁴^,²⁶ obtaining the highest scores in PA, DSC, and PPV performance metrics (Table 2). The size-based SE scores showing similar trends as of the segmentation in our single-class dataset [Fig. 8(a)]. This can be justified based on similar segmentation complexities in our single-class dataset and Alveoli class, where the structures’ size distribution and visual distinguishability are similar. However, in the segmentation of structures in the blood vessel [Fig. 8(b)] and bronchus class [Fig. 8(c)], the baseline methods show high variability in the segmentation performance of structures of different sizes. In Fig. 8(b), the baseline methods tend to segment small bronchus structures better at higher resolutions, and segment larger Bronchus structures better at lower resolutions. This confirms the SE of the segmentation task in bronchus class to trade-off between resolution and field-of-view of the extracted patches. Our proposed method shows a consistently higher performance against the baseline methods across all structure sizes. For very large structures (i.e., $\sim 2 {mm}^{2}$ ), the performance increase is smaller. This can be explained by complexity of the segmentation task in the bronchus class, where large terminal bronchi are being segmented and not enough visual cues are present even for human interpretation. This can also be confirmed by the sharp decline in all of our baseline methods’ performance in segmentation of very large Bronchus structures.

As shown in Fig. 8(c), similar to segmentation in bronchus class, the baseline methods at high resolutions tend to show comparably higher performance in segmentation of small structures. However, different trends can be observed in average SE of the methods as the size of the structures increase. By increasing the size in blood vessel class, all of the methods demonstrating a steady increase followed by a sharp decline for structures of larger than $0.2 {mm}^{2}$ . However, for structures larger than $0.2 {mm}^{2}$ , the performance of most baseline methods drastically increases. This may be related to the complexity of the segmentation task in blood vessel class, where resolution plays a key role to capture complex patterns of the structures in this class. This can be confirmed by the controversial behavior of baseline methods at lower resolution, where U-Net at resolution $D_{8}$ scores the best average SE among other baseline methods. Despite following a similar trend, our proposed method outperforms baseline methods in segmentation of the blood vessel class across all structure sizes by a large margin and exhibits lower variability as the size of the structures change. This remarkable performance is influenced by two main factors: (1) accessing contextual information from different resolutions and (2) using EfficientNetB0 encoder in our ACNN model.

In our experiment to develop the ACNN stage, the EfficientNetB0 was used as the encoder. Our experiment showed that by using the vanilla U-Net encoder, the proposed segmentation method acquires 0.947, 0.871, and 0.929 in placenta dataset and 0.962, 0.822, and 0.794 in lung dataset, in terms of PA, DCS, and PPV metrics, respectively. By comparing these results with implementation of the proposed method with EfficientNetB0 encoder (see Tables 1 and 2), we can conclude that the utilization of EfficientNetB0 encoder resulted in considerable improvements across all performance metrics in both datasets. This improvement may be related to the fact that the EfficientNetB0-UNet model offers a more convex objective function, which makes the optimization process faster and prevents diverging gradients.

An important limitation of our work was to implement our proposed method in two separate stages, which prevents optimization of our model and makes the training process inefficient and time-consuming. We expect further improvements in terms of training speed and performance can be achieved by restructuring our model into a single stage. Another limitation of this work was using the sample images as input to perform segmentation in our single-class placenta dataset instead of using the entire WSIs. Each of our placental histopathology WSIs consisted of thousands villous structures, which could take months to be completely annotated.

6. Conclusion

The proposed multiresolution semantic segmentation method is capable of addressing the trade-off between field-of-view, spatial resolution, and computational efficiency in the analysis of high-resolution histopathology. High resolution is helpful for the segmentation of smaller structures. Decreasing the resolution, increases the field-of-view, which is helpful for the segmentation of larger structures. Using a small, fixed patch size ( $256 \times 256$ ) enables to train the model on small size GPUs. We demonstrated that the performance of the patch-based CNNs in segmentation of histopathological structures with drastic size variations is limited by the trade-off between field-of-view and resolution of the extracted patches. Our method outperformed the state-of-the-art patch-based CNNs at different resolutions on two histopathology datasets. We demonstrated that our proposed method is able to segment histopathological structures with drastic size variations with high accuracy and consistency.

Acknowledgments

We would like to thank Dr. Bernard Thébaud and Marissa Lithopoulos, from the University of Ottawa, Ottawa Hospital Research Institute, and Children’s Hospital of Eastern Ontario Research Institute, for providing the histopathology WSIs of mouse lung tissue, and Dr. Shannon Bainbridge-White, from the University of Ottawa, for providing the histopathology WSIs of human placenta.

Biographies

Sina Salsabili: Biography is not available.

Adrian D.C. Chan is a professor with the Department of Systems and Computer Engineering at Carleton University. He is a biomedical engineering researcher with expertise in biomedical signal processing, biomedical image processing, noninvasive sensor systems, assistive devices, and accessibility. He is a registered professional engineer, a senior member of IEEE, a member of the Canadian Medical and Biological Engineering Society, a member of the Biomedical Engineering Society, and a 3M teaching fellow.

Eranga Ukwatta received the master’s and PhD degrees in electrical and computer engineering and biomedical engineering from Western University, Canada, in 2009 and 2013, respectively. From 2013 to 2015, he was a multicenter postdoctoral fellow with Johns Hopkins University and University of Toronto. He is currently an associate professor with the School of Engineering, University of Guelph, Canada, and an adjunct professor in the Department of Systems and Computer Engineering with Carleton University, Canada.

Contributor Information

Sina Salsabili, Email: sina.salsabili@carleton.ca.

Adrian D. C. Chan, Email: adrianchan@cunet.carleton.ca.

Eranga Ukwatta, Email: eukwatta@uoguelph.ca.

Disclosures

The authors declare no conflicts of interest.

Code and Data Availability

The archived version of the code described in this article can be freely accessed at https://github.com/ssalsabili/Multi-Resolution-Histopathology-Image-Segmentation. The data utilized in this study were obtained from the Research Centre for Women’s and Infants Health (RCWIH) Biobank (Mount Sinai Hospital, Toronto, Ontario, Canada) for placenta dataset, and the Sinclair Centre for Regenerative Medicine (Ottawa Hospital Research Institute, Ottawa, Ontario, Canada) for lung dataset. Data are available from the authors upon request and permission from Ottawa Hospital Research Institute and Research Centre for Women’s and Infants’ Health.

References

1.Mosli M. H., et al. , “Reproducibility of histological assessments of disease activity in UC.” Gut 64(11), 1765–1773 (2014). 10.1136/gutjnl-2014-307536 [DOI] [PubMed] [Google Scholar]
2.Politis D., Salsabili S., Chan A. D. C.. “An automated tool to assess air space size in histopathology images of lung tissue,” in IEEE Int. Instrum. and Meas. Technol. Conf. (I2MTC), IEEE, pp. 1–6 (2022). 10.1109/I2MTC48687.2022.9806556 [DOI] [Google Scholar]
3.Jobe A. H., “The new bronchopulmonary dysplasia,” Curr. Opin. Pediatr. 23, 167–172 (2011). 10.1097/MOP.0b013e3283423e6b [DOI] [PMC free article] [PubMed] [Google Scholar]
4.Thébaud B., et al. , “Bronchopulmonary dysplasia,” Nat. Rev. Dis. Primers 5(1), 78 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
5.Longtine M. S., et al. , “Villous trophoblast apoptosis is elevated and restricted to cytotrophoblasts in pregnancies complicated by preeclampsia, IUGR, or preeclampsia with IUGR,” Placenta 33(5), 352–359 (2012). 10.1016/j.placenta.2012.01.017 [DOI] [PMC free article] [PubMed] [Google Scholar]
6.Chen L., et al. , “A review for cervical histopathology image analysis using machine vision approaches,” Artif. Intell. Rev. 53(7), 4821–4862 (2020). 10.1007/s10462-020-09808-7 [DOI] [Google Scholar]
7.Zhou X., et al. , “A comprehensive review for breast histopathology image analysis using classical and deep neural networks,” IEEE Access 8, 90931–90956 (2020). 10.1109/ACCESS.2020.2993788 [DOI] [Google Scholar]
8.Wang D., et al. , “Deep learning for identifying metastatic breast cancer,” arXiv:1606.05718 (2016).
9.Qaiser T., et al. , “Fast and accurate tumor segmentation of histology images using persistent homology and deep convolutional features,” Med. Image Anal. 55, 1–14 (2019). 10.1016/j.media.2019.03.014 [DOI] [PubMed] [Google Scholar]
10.Xing F., Xie Y., Yang L., “An automatic learning-based framework for robust nucleus segmentation,” IEEE Trans. Med. Imaging 35, 550–566 (2016). 10.1109/TMI.2015.2481436 [DOI] [PubMed] [Google Scholar]
11.Alsubaie N., et al. , “A multi-resolution deep learning framework for lung adenocarcinoma growth pattern classification,” Commun. Comput. Inf. Sci. 894, 3–11(2018). 10.1007/978-3-319-95921-4_1 [DOI] [Google Scholar]
12.Kosaraju S. C., et al. , “Deep-Hipo: multi-scale receptive field deep learning for histopathological image analysis,” Methods 179, 3–13 (2020). 10.1016/j.ymeth.2020.05.012 [DOI] [PubMed] [Google Scholar]
13.Tsaku N. Z., et al. , “Texture-based deep learning for effective histopathological cancer image classification,” in Proc. IEEE Int. Conf. Bioinf. Biomed. BIBM 2019, pp. 973–977 (2019). 10.1109/BIBM47256.2019.8983226 [DOI] [Google Scholar]
14.Huang G., et al. , “Densely connected convolutional networks,” in Proc. IEEE Conf. Comput. Vision and Pattern Recognit., pp. 4700–4708 (2017). [Google Scholar]
15.Sirinukunwattana K., et al. , “Improving whole slide segmentation through visual context - a systematic study,” Lect. Notes Comput. Sci. 11071, 192–200 (2018). 10.1007/978-3-030-00934-2_22 [DOI] [Google Scholar]
16.Tokunaga H., et al. , “Adaptive weighting multi-field-of-view CNN for semantic segmentation in pathology,” in IEEE/CVF Conf. Comput. Vision and Pattern Recognit. (CVPR) (2019). [Google Scholar]
17.Sam D. B., Surya S., Babu R. V., “Switching convolutional neural network for crowd counting,” in IEEE Conf. Comput. Vision and Pattern Recognit. (CVPR) (2017). 10.1109/CVPR.2017.429 [DOI] [Google Scholar]
18.van Rijthoven M., et al. , “HookNet: multi-resolution convolutional neural networks for semantic segmentation in histopathology whole-slide images,” Med. Image Anal. 68, 101890 (2021). 10.1016/j.media.2020.101890 [DOI] [PubMed] [Google Scholar]
19.Ronneberger O., Fischer P., Brox T., “U-Net: convolutional networks for biomedical image segmentation,” Lect. Notes Comput. Sci. 9351, 234–241 (2015). 10.1007/978-3-319-24574-4_28 [DOI] [Google Scholar]
20.Dozat T., “Incorporating nesterov momentum into adam,” in Proc. 4th Int. Conf. Learn. Represent. (ICLR) Workshop, pp. 1–4 (2016). [Google Scholar]
21.Tan M., Le Q., “EfficientNet: rethinking model scaling for convolutional neural networks,” in PMLR, pp. 6105–6114 (2019). [Google Scholar]
22.Deng J., et al. , “ImageNet: a large-scale hierarchical image database,” in IEEE Conf. Comput. Vision and Pattern Recognit., pp. 248–255 (2010). [Google Scholar]
23.He K., et al. , “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vision and Pattern Recognit., pp. 770–778 (2016). [Google Scholar]
24.Salsabili S., et al. , “Automated segmentation of villi in histopathology images of placenta,” Comput. Biol. Med. 113, 103420 (2019). 10.1016/j.compbiomed.2019.103420 [DOI] [PubMed] [Google Scholar]
25.Schneider C. A., et al. , “NIH Image to ImageJ: 25 years of image analysis,” Nat. Methods 9, 671–675 (2012). 10.1038/nmeth.2089 [DOI] [PMC free article] [PubMed] [Google Scholar]
26.Salsabili S., et al. , “Fully automated estimation of the mean linear intercept in histopathology images of mouse lung tissue,” J. Med. Imaging 8(2), 027501 (2021). 10.1117/1.JMI.8.2.027501 [DOI] [PMC free article] [PubMed] [Google Scholar]
27.Reinhard E., et al. , “Color transfer between images,” IEEE Comput. Graphics Appl. 21(5), 34–41 (2001). 10.1109/38.946629 [DOI] [Google Scholar]
28.Badrinarayanan V., Kendall A., Cipolla R., “SegNet: a deep convolutional encoder-decoder architecture for image segmentation,” IEEE Trans. Pattern Anal. Mach. Intell. 39(12), 2481–2495 (2017). 10.1109/TPAMI.2016.2644615 [DOI] [PubMed] [Google Scholar]
29.Yu F., Koltun V., Funkhouser T., “Dilated residual networks,” in IEEE Conf. Comput. Vision and Pattern Recognit. (CVPR) (2017). [Google Scholar]
30.Csurka G., et al. , “What is a good evaluation measure for semantic segmentation?” in Proc. Br. Mach. Vision Conf. 2013 (2013). [Google Scholar]
31.Crum W. R., Camara O., Hill D. L. G., “Generalized overlap measures for evaluation and validation in medical image analysis,” IEEE Trans. Med. Imaging 25(11), 1451–1461 (2006). 10.1109/TMI.2006.880587 [DOI] [PubMed] [Google Scholar]
32.Gu F., et al. , “Multi-resolution networks for semantic segmentation in whole slide images,” Lect. Notes Comput. Sci. 11039, 11–18 (2018). 10.1007/978-3-030-00949-6_2 [DOI] [Google Scholar]
33.Ho D. J., et al. , “Deep multi-magnification networks for multi-class breast cancer image segmentation,” Comput. Med. Imaging Graphics 88, 101866 (2021). 10.1016/j.compmedimag.2021.101866 [DOI] [PMC free article] [PubMed] [Google Scholar]
34.Takahama S., et al. , “Multi-stage pathological image classification using semantic segmentation,” in IEEE/CVF Int. Conf. Comput. Vision (ICCV) (2019). [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Availability Statement

[r1] 1.Mosli M. H., et al. , “Reproducibility of histological assessments of disease activity in UC.” Gut 64(11), 1765–1773 (2014). 10.1136/gutjnl-2014-307536 [DOI] [PubMed] [Google Scholar]

[r2] 2.Politis D., Salsabili S., Chan A. D. C.. “An automated tool to assess air space size in histopathology images of lung tissue,” in IEEE Int. Instrum. and Meas. Technol. Conf. (I2MTC), IEEE, pp. 1–6 (2022). 10.1109/I2MTC48687.2022.9806556 [DOI] [Google Scholar]

[r3] 3.Jobe A. H., “The new bronchopulmonary dysplasia,” Curr. Opin. Pediatr. 23, 167–172 (2011). 10.1097/MOP.0b013e3283423e6b [DOI] [PMC free article] [PubMed] [Google Scholar]

[r4] 4.Thébaud B., et al. , “Bronchopulmonary dysplasia,” Nat. Rev. Dis. Primers 5(1), 78 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]

[r5] 5.Longtine M. S., et al. , “Villous trophoblast apoptosis is elevated and restricted to cytotrophoblasts in pregnancies complicated by preeclampsia, IUGR, or preeclampsia with IUGR,” Placenta 33(5), 352–359 (2012). 10.1016/j.placenta.2012.01.017 [DOI] [PMC free article] [PubMed] [Google Scholar]

[r6] 6.Chen L., et al. , “A review for cervical histopathology image analysis using machine vision approaches,” Artif. Intell. Rev. 53(7), 4821–4862 (2020). 10.1007/s10462-020-09808-7 [DOI] [Google Scholar]

[r7] 7.Zhou X., et al. , “A comprehensive review for breast histopathology image analysis using classical and deep neural networks,” IEEE Access 8, 90931–90956 (2020). 10.1109/ACCESS.2020.2993788 [DOI] [Google Scholar]

[r8] 8.Wang D., et al. , “Deep learning for identifying metastatic breast cancer,” arXiv:1606.05718 (2016).

[r9] 9.Qaiser T., et al. , “Fast and accurate tumor segmentation of histology images using persistent homology and deep convolutional features,” Med. Image Anal. 55, 1–14 (2019). 10.1016/j.media.2019.03.014 [DOI] [PubMed] [Google Scholar]

[r10] 10.Xing F., Xie Y., Yang L., “An automatic learning-based framework for robust nucleus segmentation,” IEEE Trans. Med. Imaging 35, 550–566 (2016). 10.1109/TMI.2015.2481436 [DOI] [PubMed] [Google Scholar]

[r11] 11.Alsubaie N., et al. , “A multi-resolution deep learning framework for lung adenocarcinoma growth pattern classification,” Commun. Comput. Inf. Sci. 894, 3–11(2018). 10.1007/978-3-319-95921-4_1 [DOI] [Google Scholar]

[r12] 12.Kosaraju S. C., et al. , “Deep-Hipo: multi-scale receptive field deep learning for histopathological image analysis,” Methods 179, 3–13 (2020). 10.1016/j.ymeth.2020.05.012 [DOI] [PubMed] [Google Scholar]

[r13] 13.Tsaku N. Z., et al. , “Texture-based deep learning for effective histopathological cancer image classification,” in Proc. IEEE Int. Conf. Bioinf. Biomed. BIBM 2019, pp. 973–977 (2019). 10.1109/BIBM47256.2019.8983226 [DOI] [Google Scholar]

[r14] 14.Huang G., et al. , “Densely connected convolutional networks,” in Proc. IEEE Conf. Comput. Vision and Pattern Recognit., pp. 4700–4708 (2017). [Google Scholar]

[r15] 15.Sirinukunwattana K., et al. , “Improving whole slide segmentation through visual context - a systematic study,” Lect. Notes Comput. Sci. 11071, 192–200 (2018). 10.1007/978-3-030-00934-2_22 [DOI] [Google Scholar]

[r16] 16.Tokunaga H., et al. , “Adaptive weighting multi-field-of-view CNN for semantic segmentation in pathology,” in IEEE/CVF Conf. Comput. Vision and Pattern Recognit. (CVPR) (2019). [Google Scholar]

[r17] 17.Sam D. B., Surya S., Babu R. V., “Switching convolutional neural network for crowd counting,” in IEEE Conf. Comput. Vision and Pattern Recognit. (CVPR) (2017). 10.1109/CVPR.2017.429 [DOI] [Google Scholar]

[r18] 18.van Rijthoven M., et al. , “HookNet: multi-resolution convolutional neural networks for semantic segmentation in histopathology whole-slide images,” Med. Image Anal. 68, 101890 (2021). 10.1016/j.media.2020.101890 [DOI] [PubMed] [Google Scholar]

[r19] 19.Ronneberger O., Fischer P., Brox T., “U-Net: convolutional networks for biomedical image segmentation,” Lect. Notes Comput. Sci. 9351, 234–241 (2015). 10.1007/978-3-319-24574-4_28 [DOI] [Google Scholar]

[r20] 20.Dozat T., “Incorporating nesterov momentum into adam,” in Proc. 4th Int. Conf. Learn. Represent. (ICLR) Workshop, pp. 1–4 (2016). [Google Scholar]

[r21] 21.Tan M., Le Q., “EfficientNet: rethinking model scaling for convolutional neural networks,” in PMLR, pp. 6105–6114 (2019). [Google Scholar]

[r22] 22.Deng J., et al. , “ImageNet: a large-scale hierarchical image database,” in IEEE Conf. Comput. Vision and Pattern Recognit., pp. 248–255 (2010). [Google Scholar]

[r23] 23.He K., et al. , “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vision and Pattern Recognit., pp. 770–778 (2016). [Google Scholar]

[r24] 24.Salsabili S., et al. , “Automated segmentation of villi in histopathology images of placenta,” Comput. Biol. Med. 113, 103420 (2019). 10.1016/j.compbiomed.2019.103420 [DOI] [PubMed] [Google Scholar]

[r25] 25.Schneider C. A., et al. , “NIH Image to ImageJ: 25 years of image analysis,” Nat. Methods 9, 671–675 (2012). 10.1038/nmeth.2089 [DOI] [PMC free article] [PubMed] [Google Scholar]

[r26] 26.Salsabili S., et al. , “Fully automated estimation of the mean linear intercept in histopathology images of mouse lung tissue,” J. Med. Imaging 8(2), 027501 (2021). 10.1117/1.JMI.8.2.027501 [DOI] [PMC free article] [PubMed] [Google Scholar]

[r27] 27.Reinhard E., et al. , “Color transfer between images,” IEEE Comput. Graphics Appl. 21(5), 34–41 (2001). 10.1109/38.946629 [DOI] [Google Scholar]

[r28] 28.Badrinarayanan V., Kendall A., Cipolla R., “SegNet: a deep convolutional encoder-decoder architecture for image segmentation,” IEEE Trans. Pattern Anal. Mach. Intell. 39(12), 2481–2495 (2017). 10.1109/TPAMI.2016.2644615 [DOI] [PubMed] [Google Scholar]

[r29] 29.Yu F., Koltun V., Funkhouser T., “Dilated residual networks,” in IEEE Conf. Comput. Vision and Pattern Recognit. (CVPR) (2017). [Google Scholar]

[r30] 30.Csurka G., et al. , “What is a good evaluation measure for semantic segmentation?” in Proc. Br. Mach. Vision Conf. 2013 (2013). [Google Scholar]

[r31] 31.Crum W. R., Camara O., Hill D. L. G., “Generalized overlap measures for evaluation and validation in medical image analysis,” IEEE Trans. Med. Imaging 25(11), 1451–1461 (2006). 10.1109/TMI.2006.880587 [DOI] [PubMed] [Google Scholar]

[r32] 32.Gu F., et al. , “Multi-resolution networks for semantic segmentation in whole slide images,” Lect. Notes Comput. Sci. 11039, 11–18 (2018). 10.1007/978-3-030-00949-6_2 [DOI] [Google Scholar]

[r33] 33.Ho D. J., et al. , “Deep multi-magnification networks for multi-class breast cancer image segmentation,” Comput. Med. Imaging Graphics 88, 101866 (2021). 10.1016/j.compmedimag.2021.101866 [DOI] [PMC free article] [PubMed] [Google Scholar]

[r34] 34.Takahama S., et al. , “Multi-stage pathological image classification using semantic segmentation,” in IEEE/CVF Int. Conf. Comput. Vision (ICCV) (2019). [Google Scholar]

PERMALINK

Multiresolution semantic segmentation of biological structures in digital histopathology

Sina Salsabili

Adrian D C Chan

Eranga Ukwatta

Abstract.

Purpose

Approach

Results

Conclusions

1. Introduction

2. Multiresolution Semantic Segmentation

Fig. 1.

Fig. 2.

2.1. Contextual Information Extraction

2.2. Data Aggregation

2.2.1. Upsampling

2.2.2. Size-based structure weighting

Fig. 3.

2.2.3. Aggregating CNN

3. Experiment

3.1. Dataset

3.1.1. Histopathology images of human placenta

3.1.2. Histopathology images of mice lung tissue

3.2. Preprocessing

3.3. Evaluation

Fig. 4.

3.4. Performance Metrics

4. Results

Fig. 5.

Table 1.

Fig. 6.

Fig. 7.

Table 2.

Fig. 8.

5. Discussion

6. Conclusion

Acknowledgments

Biographies

Contributor Information

Disclosures

Code and Data Availability

References

Associated Data

Data Availability Statement

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases