Abstract.
Purpose
Differentiating primary central nervous system lymphoma (PCNSL) and glioblastoma (GBM) is crucial because their prognosis and treatment differ substantially. Manual examination of their histological characteristics is considered the gold standard in clinical diagnosis. However, this process is tedious and time-consuming, and the morphological similarity between their histology and the tumor heterogeneity might lead to misdiagnosis. Existing research focuses on radiological differentiation, mostly using multi-parametric magnetic resonance imaging. By contrast, we investigate the pathological differentiation between the two types of tumors using whole slide images (WSIs) of postoperative formalin-fixed paraffin-embedded samples.
Approach
To learn the specific and intrinsic histological feature representations from the WSI patches, a self-supervised feature extractor is trained. Then, the patch representations are fused by feeding into a weakly supervised multiple-instance learning model for the WSI classification. We validate our approach on 134 PCNSL and 526 GBM cases collected from three hospitals. We also investigate the effect of feature extraction on the final prediction by comparing the performance of applying the feature extractors trained on the PCNSL/GBM slides from specific institutions, multi-site PCNSL/GBM slides, and large-scale histopathological images.
Results
Different feature extractors perform comparably, with the overall area under the receiver operating characteristic curve exceeding 85% for each dataset and close to 95% for the combined multi-site dataset. Using the institution-specific feature extractors generally yields the best overall prediction, with both the PCNSL and GBM classification accuracies reaching 80% for each dataset.
Conclusions
The excellent classification performance suggests that our approach can be used as an assistant tool to reduce the pathologists’ workload by providing an accurate and objective second diagnosis. Moreover, the discriminant regions indicated by the generated attention heatmap improve the model interpretability and provide additional diagnostic information.
Keywords: primary central nervous system lymphoma and glioblastoma, pathological differentiation, whole slide images, computer-aided diagnosis, weakly supervised deep learning
1. Introduction
Primary central nervous system lymphoma (PCNSL) and glioblastoma (GBM) are two important but quite different malignant brain tumors, accounting for 1.9% and 14.6% of all brain neoplasms, respectively.1 The identification of the two is crucial as their prognosis and treatment differ substantially. For patients with PCNSL, high-dose methotrexate-based chemotherapy is the standard treatment, whereas patients with GBM usually undergo surgical resection followed by chemotherapy and radiotherapy with temozolomide.2,3 The 5-year survival rate for patients diagnosed with PCNSL is 35.3%, whereas the rate is only 6.8% for patients with GBM.1 Histopathology is considered the gold standard and mandatory in the clinical diagnosis of PCNSL and GBM.4,5 Precise and timely differentiation of PCNSL and GBM in histopathology may avoid unnecessary craniotomy through pre-operative stereotactic biopsy, assist in making intra-operative surgical strategies, and guide the postoperative immunohistochemistry and molecular testing. Morphologically, the hematoxylin and eosin (H&E)-stained PCNSL tissue exhibits an angiocentric growth pattern with sheets of tumor cells clustering within and around the blood vessels.2 The PCNSL cells have monotonous nuclei with prominent nucleoli and scant basophilic cytoplasm.6,7 By contrast, the GBM tissue often shows a glial fibrillary background and is composed of pleomorphic tumor cells with predominant astrocytic differentiation.8 Microscopically, GBM appears extremely heterogeneous and shows characteristics of cellularity, nuclear atypia, cellular pleomorphism, mitotic activity, microvascular proliferation, and palisading necrosis.9 Figure 1 shows typical PCNSL and GBM histology. Manual examination of the tissues obtained in the surgical resection is challenging and might cause misdiagnosis because of the similarities between the histological characteristics of the two types of tumors.
First, both PCNSL and GBM are highly proliferating, rich in cells, and diffusely infiltrate the brain parenchyma.2,10 Second, in H&E-stained histology images, the mottled pattern of dense hyperchromatic tumor nuclei alternating with lightly stained neurofibrous stroma is typical for PCNSL but this feature imitates the necrosis which is common in GBM (Fig. 2). Third, unlike the lightly stained appearance of most GBM owing to the glial fibrillary background, small cell GBM appears hyperchromatic, which is easily confused with PCNSL (Fig. 2). All these issues pose challenges to the pathologists. Particularly, most pathologists in China are overworked and responsible for the diagnosis covering various systems of the human body due to the pathologist shortage;11 the inadequate subspecialty training further increases the misdiagnosis rate, especially for rare diseases such as PCNSL.
Fig. 1.
Typical PCNSL (left) and GBM (right) at the same magnification.
Fig. 2.
Typical mottled pattern in PCNSL (a) and necrosis in GBM (b). Typical PCNSL (c) and small cell GBM (d).
Computer-aided differentiation of PCNSL and GBM has received great attention in the last decade. However, the research mainly focuses on radiological differentiation, which mostly uses multi-parametric magnetic resonance imaging.12–17 In addition, very recent work discriminates PCNSL from non-PCNSL lesions, especially glioma, using frozen whole slide images (WSIs).18 By contrast, pathological differentiation of PCNSL and GBM by conducting computational analysis of histology imaging is of high significance in assisting clinical diagnosis but has not yet been fully investigated.
Deep learning has been widely applied in histology image classification. To tackle the gigapixel size of the WSIs, the background has to be excluded and the remaining tissue region has to be cropped into small patches.18,19 Some researchers assume that all patches share the same label with the WSI from which they are extracted.18,20–22 Therefore, a patch-level classifier can be trained by feeding the extracted patches into deep learning algorithms such as Inception V3,20 ResNet18,21 and other convolutional neural networks (CNNs).18,22 Finally, the patch-level predictions are aggregated to generate the slide-level classification.18,20 By contrast, more research considers the WSI classification problem under the theory of multiple instance learning (MIL), which assumes that the WSI should be classified as positive if at least one patch is positive and the WSI belongs to the negative class only if all patches are negative.23 Many MIL variants have been proposed to make it more flexible and adaptive to binary and multi-class classification tasks.19,24–30 These approaches often consist of two stages: patch feature extraction using CNN to reduce the data dimensionality and patch feature aggregation by applying attention-based MIL19,24–26 or transformer27 approaches for WSI diagnosis. 
Besides, some approaches select top-ranking patches with more representative features manually or by employing some selection strategies before applying the MIL-based patch aggregation for the WSI classification.24,26,28–30 Recently, self-supervised learning strategies such as SimCLR31 and SimSiam32 have drawn attention for pretraining a feature extractor on unlabeled patches.33–38 Compared with a deep learning model pretrained on ImageNet such as ResNet5019,25,27 or a custom CNN pretrained on the patches with WSI labels,26 self-supervised feature extractor is expected to extract domain-invariant and intrinsic patch features, which potentially mitigates the data imbalance problem and benefits the generalization of the WSI classification model.39
In this work, we propose a weakly supervised deep learning approach to differentiate the PCNSL and GBM WSIs obtained from postoperative formalin-fixed paraffin-embedded (FFPE) samples. To extract inherent morphological features from the patches and handle the data imbalance problem, a self-supervised feature extractor is trained in the SimSiam framework. Then, the patch features are fed into an attention-based MIL algorithm for the WSI classification. We validate our approach on the WSI datasets collected from three hospitals. Excellent and interpretable PCNSL/GBM differentiation is achieved in both the multi-center study and the validation of the combined dataset. The high performance suggests that our approach can be used as an assistant tool to reduce the pathologists’ workload by providing an accurate and objective second diagnosis, which helps guide the postoperative immunohistochemistry and molecular testing. In addition, it is straightforward to extend our approach to differentiate the PCNSL and GBM WSIs of biopsy and frozen samples, which potentially benefits treatment planning in the pre- and intra-operative scenarios.
2. Methods
2.1. Datasets
Three WSI datasets were collected, namely the JiNan, QingDao, and SiChuan datasets, named according to the locations of the hospitals from which they were collected. All slides were obtained from H&E-stained postoperative FFPE samples with tumors clinically proven by histopathological and immunohistochemical examinations. Table 1 displays the number of PCNSL/GBM cases and slides included in each dataset. The WSI resolutions of the JiNan, QingDao, and SiChuan datasets are about 0.27, 0.27, and 0.24 microns per pixel (mpp), respectively. The size of the WSIs reaches the gigapixel level.
Table 1.
PCNSL/GBM data distribution and splitting for CLAM training/testing on each and the combined datasets.
| Dataset | PCNSL cases/slides | GBM cases/slides | Folds | PCNSL cases for training/testing | GBM cases for training/testing |
|---|---|---|---|---|---|
| JiNan | 45/163 | 416/417 | 4 | 36/9 | 315/101 |
| QingDao | 59/59 | 50/50 | 2 | 35/24 | 30/20 |
| SiChuan | 30/30 | 60/60 | 2 | 20/10 | 35/25 |
| Combined | 134/252 | 526/527 | 5 | 110/24 | 424/102 |
2.2. Tissue Segmentation and Patching
The tissue segmentation and patching process is illustrated in Fig. 3. To exclude the background, including whitespace, dirt, and artifacts, from the WSI, a tissue segmentation process is applied. The WSI is first downsampled to 1/32 of its original size and converted from RGB to the hue-saturation-value (HSV) color space. The tissue contour is then generated by successively applying median blurring, saturation channel thresholding, morphological closing, and area thresholding.19 Because the WSIs are gigapixel-sized and the cellular morphology is the key to differentiating PCNSL and GBM, the segmented tissue region is cropped into non-overlapping patches at the highest magnification for further processing. In our study, the number of patches extracted from the tissue region of each slide varies between 248 and 107,475.
Fig. 3.
Tissue segmentation and patching.
2.3. Self-Supervised Feature Learning
The unlabeled patches are used to train the feature extractor in a self-supervised learning scheme named SimSiam,32 which has fewer requirements on the training data and a simpler architecture but achieves competitive performance compared with other schemes such as SimCLR,31 BYOL,40 and SwAV.41 As illustrated in Fig. 4, SimSiam works with Siamese networks. For each training patch, two augmented views are generated and passed through the shared Siamese encoder consisting of a backbone and a projector. The standard ResNet50,42 specifically the five convolutional stages and the max-pooling layer, is used as the backbone, and thus its output dimension is 2048. The projector consists of three fully connected layers, each containing 2048 neurons. A predictor is then applied on one branch, and a stop-gradient operation is applied on the other branch. The predictor has two fully connected layers with 512 and 2048 neurons, respectively. During training, the model is optimized by maximizing the similarity between the outputs of the two branches. After training, the first four convolutional stages of the backbone plus the max-pooling layer, as shown in Fig. 4, are used as the feature extractor in the subsequent WSI classification. It encodes each patch into a 1024-dimensional feature.
Fig. 4.
Self-supervised feature learning using SimSiam.
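The two-branch forward pass with stop-gradient can be sketched in PyTorch as follows. The `backbone` argument is a placeholder (the paper uses ResNet-50), and details such as batch-normalization placement follow the original SimSiam paper rather than this work; the dimensions (2048-d encoder output, three 2048-wide projector layers, a 2048→512→2048 predictor) match the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimSiam(nn.Module):
    """Minimal SimSiam sketch: shared encoder (backbone + projector),
    predictor on one branch, stop-gradient (detach) on the other."""

    def __init__(self, backbone, feat_dim=2048, pred_dim=512):
        super().__init__()
        self.backbone = backbone
        self.projector = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.BatchNorm1d(feat_dim), nn.ReLU(inplace=True),
            nn.Linear(feat_dim, feat_dim), nn.BatchNorm1d(feat_dim), nn.ReLU(inplace=True),
            nn.Linear(feat_dim, feat_dim), nn.BatchNorm1d(feat_dim),
        )
        self.predictor = nn.Sequential(
            nn.Linear(feat_dim, pred_dim), nn.BatchNorm1d(pred_dim), nn.ReLU(inplace=True),
            nn.Linear(pred_dim, feat_dim),
        )

    def forward(self, x1, x2):
        # Both augmented views pass through the shared encoder.
        z1 = self.projector(self.backbone(x1))
        z2 = self.projector(self.backbone(x2))
        p1, p2 = self.predictor(z1), self.predictor(z2)
        # Stop-gradient: the target branch is detached from the graph.
        return p1, p2, z1.detach(), z2.detach()

def simsiam_loss(p1, p2, z1, z2):
    """Symmetrized negative cosine similarity (lower is better)."""
    def d(p, z):
        return -F.cosine_similarity(p, z, dim=1).mean()
    return 0.5 * d(p1, z2) + 0.5 * d(p2, z1)
```

Maximizing the similarity between the branches is implemented as minimizing this negative cosine similarity, which is the loss referred to in the training details below.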
To enrich the training dataset and capture intrinsic patch representations, we design a sophisticated data augmentation strategy. After extensively investigating the commonly used data augmentation techniques,43–45 a series of options relevant to histology image processing are selected and categorized into three groups. The first group contains basic transforms such as rotation, flipping, and identity (i.e., no transform). The second group consists of identity and transforms in sharpness (e.g., adding Gaussian noise and Gaussian blurring), contrast, and brightness. The third group includes image color transforms in the original RGB, HSV, and hematoxylin-eosin-DAB color spaces. For each augmentation operation, one transform is randomly selected from each of the three groups, and the three selected transforms are applied in sequence to generate the augmented image. Figure 5 illustrates the effect of our augmentation strategy.
Fig. 5.
An original patch and its augmented versions.
For SimSiam training, the number of epochs was empirically set to 100, by which point the training loss had stabilized in our observations. The batch size was set to 128, considering the limited computational resources. The stochastic gradient descent (SGD) optimizer was adopted to update the model parameters by minimizing the negative cosine similarity loss. The initial learning rate was 0.025 and gradually decreased according to a schedule as training proceeded. The momentum was set to 0.9, and the weight decay was 0.0001.
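The reported optimizer settings might be configured as below. The cosine decay is an assumption: the text only says the learning rate gradually decreased with a certain schedule, and cosine annealing is the schedule used in the original SimSiam work.

```python
import torch

def make_optimizer(model, base_lr=0.025, epochs=100, steps_per_epoch=1000):
    """SGD with the momentum/weight decay reported in the text.

    The cosine decay over the full training run is an assumed
    schedule, not confirmed by the paper.
    """
    opt = torch.optim.SGD(model.parameters(), lr=base_lr,
                          momentum=0.9, weight_decay=1e-4)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(
        opt, T_max=epochs * steps_per_epoch)
    return opt, sched
```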
2.4. WSI Classification
For a WSI consisting of $N$ patches extracted from the tissue region, the $k$'th patch is encoded into its instance-level representation $\mathbf{z}_k$ by the previously trained feature extractor. Then, the set of feature representations $\{\mathbf{z}_1, \ldots, \mathbf{z}_N\}$ is passed into the clustering-constrained-attention multiple instance learning (CLAM)19 method to perform the WSI classification. CLAM is a data-efficient algorithm with a simple architecture for weakly supervised WSI classification. As illustrated in Fig. 6, it employs an attention network to rank the feature representations and assigns scores according to their respective importance to the final prediction. First, a fully connected layer compresses $\mathbf{z}_k$ into a 512-dimensional vector $\mathbf{h}_k$. Then, the feature embeddings pass through two parallel fully connected layers $\mathbf{V}$ and $\mathbf{U}$, and the element-wise product is calculated between their outputs. After that, an additional fully connected layer $\mathbf{w}$ is applied to generate the attention score $a_k$ for the $k$'th patch, which can be expressed as
$$a_k = \frac{\exp\{\mathbf{w}^\top(\tanh(\mathbf{V}\mathbf{h}_k) \odot \mathrm{sigm}(\mathbf{U}\mathbf{h}_k))\}}{\sum_{j=1}^{N}\exp\{\mathbf{w}^\top(\tanh(\mathbf{V}\mathbf{h}_j) \odot \mathrm{sigm}(\mathbf{U}\mathbf{h}_j))\}}, \tag{1}$$
where $\tanh(\cdot)$ and $\mathrm{sigm}(\cdot)$ are the hyperbolic tangent and sigmoid activation functions, respectively, and $\odot$ denotes the element-wise product. Finally, the slide-level representation $\mathbf{h}_{\mathrm{slide}}$ is calculated as
$$\mathbf{h}_{\mathrm{slide}} = \sum_{k=1}^{N} a_k \mathbf{h}_k. \tag{2}$$
Fig. 6.
WSI classification using CLAM.
The slide-level PCNSL/GBM prediction is given by
$$\hat{\mathbf{y}} = \mathrm{softmax}(\mathbf{W}_c \mathbf{h}_{\mathrm{slide}}), \tag{3}$$
where $\mathbf{W}_c$ denotes a fully connected layer with two neurons.
During inference, the trained CLAM model makes the diagnosis, and the attention scores can be used to generate a slide heatmap in which high scores indicate the regions most relevant to the diagnosis, revealing the essential morphology.
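Equations (1)–(3) amount to a gated-attention pooling module, which can be sketched in PyTorch with the dimensions given in the text (1024-d patch features compressed to 512-d, two parallel gating branches, a two-neuron head). The 256-d attention hidden dimension is an assumption.

```python
import torch
import torch.nn as nn

class GatedAttentionPool(nn.Module):
    """CLAM-style gated attention MIL pooling (a sketch, not the
    authors' exact implementation)."""

    def __init__(self, in_dim=1024, hid_dim=512, attn_dim=256, n_classes=2):
        super().__init__()
        self.compress = nn.Linear(in_dim, hid_dim)   # z_k -> h_k
        self.V = nn.Linear(hid_dim, attn_dim)        # tanh branch
        self.U = nn.Linear(hid_dim, attn_dim)        # sigmoid gate branch
        self.w = nn.Linear(attn_dim, 1)              # attention score layer
        self.classifier = nn.Linear(hid_dim, n_classes)

    def forward(self, feats):                        # feats: (N, in_dim)
        h = self.compress(feats)                     # (N, hid_dim)
        # Eq. (1): gated attention, normalized over the N patches.
        a = self.w(torch.tanh(self.V(h)) * torch.sigmoid(self.U(h)))
        a = torch.softmax(a, dim=0)                  # (N, 1)
        slide = (a * h).sum(dim=0)                   # Eq. (2): weighted sum
        logits = self.classifier(slide)              # Eq. (3) before softmax
        return logits, a.squeeze(-1)
```

The returned attention weights are exactly what the slide heatmaps visualize: one score per patch, summing to 1 over the slide.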
Empirically, the CLAM model was trained for at least 200 epochs and at most 400 epochs, with training stopped early if no performance improvement was observed within 20 consecutive epochs. The batch size was set to 8. The Adam optimizer was adopted to update the model parameters by minimizing the cross-entropy loss. The learning rate was 0.0001, and the weight decay was 0.00001.
2.5. Dataset Splitting in Model Training and Validation
First, we trained a feature extractor on the JiNan dataset and applied it to the other two datasets for feature extraction. Then, considering the variation in WSI scanning and staining across different institutions, the models (i.e., the feature extractor and the prediction model) trained on one dataset might not generalize well to the other datasets. Hence, we performed a multi-center study, in which an institution-specific feature extractor was trained for each individual dataset. For the SimSiam model training, to mitigate the data imbalance between the PCNSL and GBM classes in each dataset, the same number (specifically, 10 in our experiment) of PCNSL and GBM WSIs were randomly selected to extract patches. Each selected slide was taken from a unique patient to increase the data diversity. To capture the inherent morphological characteristics that are essential for PCNSL/GBM differentiation, the patches were extracted at the highest magnification. Finally, 498,480, 547,902, and 703,348 patches were extracted to train the SimSiam models for the three datasets, respectively.
To fully validate the algorithms, we also combined the three datasets and trained a multi-institution feature extractor on the entire dataset. A similar training procedure was followed as training the institution-specific feature extractors, except that 15 PCNSL and 15 GBM slides (five PCNSL and five GBM slides from each dataset) were used for SimSiam model training.
After feature extraction, cross-validation was applied to each individual dataset and the combined dataset to perform the PCNSL/GBM classification. To avoid bias in the CLAM model training/prediction, the cases used for SimSiam training were first excluded and the remaining data were split into the training and test sets. Then, the excluded slides were added back to the training set because their labels were not exposed in training the feature extractor. Because of the very limited number of slides included in our datasets, we did not create a validation split to tune the hyper-parameters. $k$-fold cross-validation was applied to reduce the impact of the variability in training and testing data splits, allowing for a more accurate assessment of the model's generalization capabilities and thus reducing the risk of overfitting. The test data were kept unseen during model training in each fold. For the JiNan, QingDao, and SiChuan datasets, four-, two-, and two-fold cross-validation were applied, respectively, considering the different numbers of cases included in each dataset. For the combined dataset, five-fold cross-validation was applied. Patient-wise splitting ensures that all slides belonging to one patient are included in either the training or the test set. Table 1 displays the approximate number of PCNSL/GBM cases for training and testing in each fold.
2.6. Computational Hardware and Software
We implemented all the algorithms in Python and took advantage of image processing and deep learning libraries such as OpenSlide, PIL, OpenCV, and PyTorch. All the data including the WSI raw files, the temporary and results files, which take up about 2.5 TB, were stored on the hard drives of our local workstations.
The WSI segmentation and patching were performed on Intel Xeon CPUs (central processing units) on our local workstation. For a typical slide in our datasets, the tissue segmentation from the thumbnail takes only 0.14 s. From such a slide, 46,233 patches can be extracted, and the patching takes about 0.61 s if only the patch coordinates are saved; if the patch images are saved, the patching takes about 6 min. The SimSiam model training was accelerated by utilizing two NVIDIA Tesla V100 PCIe GPUs (graphics processing units) with 32 GB of memory each. Training a model takes about 7 to 9 days. Both patch feature extraction and CLAM training/prediction were accelerated by two NVIDIA TITAN V GPUs with 12 GB of memory each. For a typical slide, feature extraction takes about 65 s. The five-fold cross-validation of the CLAM model on the combined dataset takes about 35 h. Once training is completed, the slide label prediction can be made almost instantly.
3. Results
We validated the PCNSL/GBM differentiation of our approach on each individual and the combined datasets. Based on the same dataset splitting in the cross-validation of CLAM, we compared the performance of the JiNan-specific, institution-specific, and multi-institution feature extractors described in Sec. 2.5. Besides, we also compared our feature extractors trained through SimSiam with the clustering-guided contrastive learning (CCL)-based feature extractor pretrained on large-scale unlabeled histopathological images.46 The CCL-based feature extractor also used ResNet50 as the backbone model.
The classification results of the PCNSL/GBM WSIs were measured by the area under the receiver operating characteristic curve (AUC) and the accuracy (AC). Table 2 lists the overall AUC and AC, together with the per-class AC of PCNSL and GBM, on each individual dataset and the combined dataset using different feature extractors. Comparing the JiNan-specific and institution-specific feature extractors demonstrates that for most metrics, especially the PCNSL discrimination accuracy, training a dedicated feature extractor for each dataset produces better classification. The results also show that the institution-specific, multi-institution, and CCL-based feature extractors perform comparably, each with strengths in certain metrics. The CCL-based feature extractor performs best in differentiating PCNSL but worst for GBM, whereas the multi-institution feature extractor generates the best GBM discrimination in most cases but the worst for PCNSL. By contrast, using the institution-specific feature extractors generally obtains the best overall classification AUC and AC, with both the PCNSL and GBM classification accuracies reaching 80% for all three datasets. Regardless of the feature extractor used, for each dataset, the overall classification AUC exceeds 85% and the AC is close to or exceeds 80%. On the combined dataset, employing either the multi-institution or the CCL-based feature extractor produces an overall AUC and AC close to 95% and 90%, respectively. The best classification results are achieved for the JiNan dataset regardless of the feature extractor used, probably because it has many more slides than the other two datasets and is more homogeneous than the combined dataset.
For the JiNan, SiChuan, and the combined datasets, the classification AC of the GBM is much higher than that of the PCNSL, most probably owing to the severe data imbalance between the two classes.
Table 2.
PCNSL/GBM classification results on each and the combined datasets using different feature extractors.
| Feature extractor | Dataset | AUC | AC | PCNSL_AC | GBM_AC |
|---|---|---|---|---|---|
| Institution-specific | JiNan | 0.976 ± 0.011 | 0.940 ± 0.011 | 0.840 ± 0.075 | 0.968 ± 0.008 |
| QingDao | 0.854 ± 0.004 | 0.809 ± 0.009 | 0.817 ± 0.017 | 0.800 ± 0.000 | |
| SiChuan | 0.934 ± 0.026 | 0.871 ± 0.014 | 0.800 ± 0.100 | 0.900 ± 0.020 | |
| JiNan-specific | QingDao | 0.838 ± 0.002 | 0.764 ± 0.014 | 0.735 ± 0.015 | 0.800 ± 0.050 |
| SiChuan | 0.914 ± 0.042 | 0.871 ± 0.014 | 0.750 ± 0.150 | 0.920 ± 0.040 | |
| Multi-institution | JiNan | 0.972 ± 0.007 | 0.923 ± 0.047 | 0.781 ± 0.127 | 0.976 ± 0.015 |
| QingDao | 0.865 ± 0.025 | 0.787 ± 0.031 | 0.736 ± 0.056 | 0.850 ± 0.000 | |
| SiChuan | 0.868 ± 0.096 | 0.829 ± 0.029 | 0.800 ± 0.100 | 0.840 ± 0.000 | |
| Combined | 0.940 ± 0.046 | 0.897 ± 0.048 | 0.778 ± 0.087 | 0.951 ± 0.023 | |
| CCL-based | JiNan | 0.966 ± 0.019 | 0.909 ± 0.038 | 0.852 ± 0.145 | 0.949 ± 0.047 |
| QingDao | 0.891 ± 0.016 | 0.809 ± 0.009 | 0.838 ± 0.038 | 0.775 ± 0.025 | |
| SiChuan | 0.872 ± 0.028 | 0.829 ± 0.029 | 0.800 ± 0.100 | 0.840 ± 0.080 | |
| Combined | 0.954 ± 0.033 | 0.904 ± 0.027 | 0.808 ± 0.109 | 0.947 ± 0.030 |
Each value represents the average and standard deviation within folds.
The value shown in bold indicates the best performance obtained across various feature extractors for the corresponding dataset with respect to the corresponding metric.
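The metrics reported in Table 2 (overall AUC and AC plus per-class AC) can be computed with scikit-learn as below. Treating PCNSL as the positive class and thresholding the predicted probability at 0.5 are assumptions for illustration.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, accuracy_score

def evaluate(y_true, prob_pcnsl):
    """Overall AUC/AC and per-class AC for a binary PCNSL-vs-GBM task.

    y_true: 1 for PCNSL, 0 for GBM (an assumed convention).
    prob_pcnsl: model probability of the PCNSL class per slide.
    """
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(prob_pcnsl) >= 0.5).astype(int)
    return {
        "AUC": roc_auc_score(y_true, prob_pcnsl),
        "AC": accuracy_score(y_true, y_pred),
        "PCNSL_AC": accuracy_score(y_true[y_true == 1], y_pred[y_true == 1]),
        "GBM_AC": accuracy_score(y_true[y_true == 0], y_pred[y_true == 0]),
    }
```

Per-class accuracy makes the effect of the class imbalance visible: a model can score a high overall AC while still missing many PCNSL slides, which is exactly the pattern discussed below for the imbalanced datasets.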
Figures 7 and 8 illustrate the attention heatmaps generated for the PCNSL and GBM slides of three datasets and representative patches from the selected region with the highest and lowest attention scores. The three slide-level attention heatmaps were produced by applying the institution-specific, multi-institution, and CCL-based feature extractors, respectively, for training the CLAM models on the corresponding dataset. It is shown that for each slide, the three heatmaps demonstrate generally consistent highly and slightly attended areas, which indicates that the models using the three different feature extractors produced consistent results. For both PCNSL and GBM, the hyperchromatic areas which contain mainly dense tumor cells are the most highly attended, whereas the relatively hypochromatic areas that are composed of neurofibrous stroma, blood vessels, hemorrhage, artifacts, necrosis, etc., receive the lowest attention. This indicates that our models primarily relied on the characteristics of the tumor cells to differentiate the PCNSL and GBM. This is understandable because the models were trained on the patches at the highest magnification, and the features at the cell level were more likely to be captured than those at the tissue level. From the representative patches with high attention scores displayed in the figures, we can see that the PCNSL cells have monotonous nuclei and high nuclear-to-cytoplasmic ratios, whereas the GBM cells exhibit high degrees of cellular pleomorphism and nuclear atypia.
Fig. 7.
Attention heatmaps and representative patches generated for example PCNSL slides of three sites. For each site, the top row shows the WSI thumbnail with the segmentation contours and three slide attention heatmaps respectively generated by the models trained on the corresponding dataset by applying the institution specific (the 2nd column), the multi-institution (the 3rd column) and the CCL-based (the 4th column) feature extractors. The slide heatmaps were produced by computing the attention scores for PCNSL over patches tiled with a spatial overlap of 50%, where the high-ranking and low-ranking patches are indicated in red and blue colors, respectively. To see more details, a representative region labeled by the black box in both the slide thumbnail and the heatmap (the 3rd column) is enlarged (the 5th column). Its corresponding regional attention heatmap (the 6th column) was generated using a 95% overlap and overlaid onto the original image. The bottom row displays the representative patches from the selected region with the highest and lowest attention scores indicated by the red and blue borders, respectively.
Fig. 8.
Attention heatmaps and representative patches generated for example GBM slides of three sites. Please refer to the caption of Fig. 7 for interpretation.
4. Discussion
Owing to the differences in scanners and staining protocols used for collecting the multi-site histopathology images, the classification model trained on one dataset did not perform comparably when directly transferred to other datasets. Therefore, we trained the models separately for each site and the combined multi-site datasets. To investigate the effect of the feature extraction on the final prediction, we compared the results generated by applying the institution-specific, multi-institution, and CCL-based feature extractors. Quantitative and qualitative results have shown that the models performed comparably and achieved satisfactory and interpretable classification.
Although our models have achieved good overall performance in all experiments, the misclassification remains non-negligible, for several reasons. First, the PCNSL/GBM differentiation task itself is very challenging, as explained before. A non-specialist without adequate training can hardly make the differentiation simply by examining the WSIs; even pathologists need to verify their diagnosis by conducting immunohistochemical analysis. For deep learning models, it is likewise difficult to learn the essential differentiating characteristics from a huge amount of image content, especially considering that only the WSI labels are given to supervise model training. Second, our datasets are relatively small for training deep learning models, and severe data imbalance exists in the datasets due to the difficulty of histology image collection and the low PCNSL incidence rate. Moreover, because all slides used in our experiments were taken from postoperative FFPE samples, only the PCNSL cases that were misdiagnosed in all previous examinations, such as radiological imaging, pre-operative biopsy, and intra-operative frozen section examination, could be collected for our study. Compared with other feature extractors pretrained on thousands of WSIs,46 only a small number of slides were selected for the self-supervised learning of our institution-specific and multi-institution feature extractors, which might be insufficient to capture the histopathological diversity of PCNSL and GBM. Furthermore, the datasets for training the classification models were also relatively small and severely imbalanced. These issues seriously affect the classification performance. In addition, some slides contain only a small amount of tissue, and the artifacts within the tissue are difficult to remove in the preprocessing step. These problems also affect the differentiation.
In this work, the attention heatmaps generated by the classification model improve the model interpretability. In the future, for the PCNSL/GBM differentiation, more specific and sophisticated cellular features, tissue patterns, and slide appearances have to be annotated by the pathologists to enhance the model explainability.47 In terms of the methodology, we would like to investigate the domain generalization and stain normalization algorithms, which potentially benefit the domain adaptation in the multi-site study. The histopathological images can also be fused with radiological data such as the multi-parametric magnetic resonance images and the medical reports if available for training a more robust model. Besides, hierarchical attention mechanisms can be employed to focus on not only the discriminant patches but also the areas within the patches, which further enhances the model’s interpretability.
In summary, we investigated the computer-aided pathological differentiation of PCNSL and GBM using WSIs taken from the postoperative FFPE samples. The results generated for each site and the combined multi-site datasets have demonstrated that our approach has achieved excellent performance on the PCNSL/GBM differentiation task. The generated attention heatmaps enhanced the model interpretability and could provide the pathologists with additional diagnostic information. Our study has established a good foundation for the pathological differentiation of the PCNSL and GBM. By applying some domain adaptation algorithms, the approach can potentially be transferred to differentiate the PCNSL/GBM slides of the stereotactic biopsy samples and the intra-operative frozen sections, which are more challenging due to the limited size of the samples and the poor quality in histomorphology but more meaningful with regard to avoiding unnecessary craniotomy and making surgical strategies.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China (Grant No. 82473485) and the Taishan Scholars Program of Shandong Province (Grant No. tstp20231252). We would like to acknowledge Dr. Shibing Guan and Kun Wang for their valuable suggestions and assistance in revising the paper.
Biographies
Liping Wang is a lecturer at the School of Information Science and Engineering, Shandong Normal University. Her research interest lies in artificial intelligence in medical image analysis.
Lin Chen graduated from Southwest Medical University, China. He is a resident neurosurgeon at the Provincial Hospital Affiliated to Shandong First Medical University and is now a PhD student supervised by Prof. Yingchao Liu. His research interests are neuropathology and deep learning applications in clinical medicine.
Kaixi Wei was a postgraduate at the Department of Neurosurgery of the Affiliated Hospital of Southwest Medical University. His research interests involve intracranial tumors, neuro-functional diseases, and cerebral hemorrhage. He is currently working at Hejiang County Traditional Chinese Medicine Hospital.
Huiyu Zhou heads the AI and Machine Learning Group and leads the Biomedical Image Processing Lab at the University of Leicester, United Kingdom. He is the PGR director and deputy director of the Research Center for Artificial Intelligence, Data Analytics, and Modeling. Prior to this appointment, he was employed as a reader at the University of Leicester and a lecturer at the School of Electronics, Electrical Engineering, and Computer Science, Queen’s University Belfast. He has published widely in the field.
Reyer Zwiggelaar’s research interests concentrate on the development of computer vision and machine learning techniques applied to medical image analysis. He has published over 350 papers in JCR journals and peer-reviewed conferences. According to Google Scholar, his h-index is 38. He has attracted more than £5,000,000 in research funding, mainly as principal investigator, from RCUK/HEFCW. He is an associate editor of the IEEE Journal of Biomedical and Health Informatics and of Pattern Recognition.
Weiwei Fu, MD, is an associate deputy chief physician in the Department of Pathology, Affiliated Hospital of Qingdao University. She has been engaged in pathological diagnosis for nearly 20 years. Her current research interests are mainly in the area of neuropathology.
Yingchao Liu is a chief neurosurgeon at the Provincial Hospital Affiliated to Shandong First Medical University in Jinan, China. In addition to his daily neurosurgical practice, he has been actively engaged in brain functional MRI and Gamma Knife radiosurgery since 2013 and has published more than 40 peer-reviewed research articles on glioma, brain metastasis, and functional MRI in journals such as Nature Biomedical Engineering, Radiology, and Neuro-Oncology.
Contributor Information
Liping Wang, Email: wangliping19872011@gmail.com.
Lin Chen, Email: chenlin3589@163.com.
Kaixi Wei, Email: 1370077136@qq.com.
Huiyu Zhou, Email: hz143@leicester.ac.uk.
Reyer Zwiggelaar, Email: rrz@aber.ac.uk.
Weiwei Fu, Email: eer-df@163.com.
Yingchao Liu, Email: yingchaoliu@email.sdu.edu.cn.
Disclosures
The authors declare that they have no conflict of interest.
Ethics Approval
This research study was conducted retrospectively from data obtained for clinical purposes. Ethical approval was waived by the Institutional Review Board (IRB) in view of the retrospective nature of the study.
Consent to Participate
This study was performed on histopathological slides for which the participants’ information had been anonymized, and the paper does not include any images that may identify the participants. Therefore, consent is not required according to the guidelines of the journal.
Code and Data Availability
The data are not currently publicly accessible because they are being used in an ongoing study. Interested parties may obtain the dataset from the corresponding author upon reasonable request once the study is completed. The code can be shared upon request; please contact Liping Wang at wangliping19872011@gmail.com.
References
- 1.Ostrom Q. T., et al., “CBTRUS statistical report: primary brain and other central nervous system tumors diagnosed in the United States in 2012–2016,” Neuro Oncol. 21(Suppl. 5), v1–v100 (2019). 10.1093/neuonc/noz150
- 2.Grommes C., DeAngelis L. M., “Primary CNS lymphoma,” J. Clin. Oncol. 35, 2410–2418 (2017). 10.1200/JCO.2017.72.7602
- 3.Stupp R., et al., “Radiotherapy plus concomitant and adjuvant temozolomide for glioblastoma,” N. Engl. J. Med. 352(10), 987–996 (2005). 10.1056/NEJMoa043330
- 4.Hoang-Xuan K., et al., “Diagnosis and treatment of primary CNS lymphoma in immunocompetent patients: guidelines from the European Association for Neuro-Oncology,” Lancet Oncol. 16(7), e322 (2015). 10.1016/S1470-2045(15)00076-5
- 5.Weller M., et al., “EANO guideline for the diagnosis and treatment of anaplastic gliomas and glioblastoma,” Lancet Oncol. 15(9), e395–e403 (2014). 10.1016/S1470-2045(14)70011-7
- 6.Commins D. L., “Pathology of primary central nervous system lymphoma,” Neurosurg. Focus 21(5), E2 (2006). 10.3171/foc.2006.21.5.3
- 7.Cha Y. J., Choi J., Kim S. H., “Presence of apoptosis distinguishes primary central nervous system lymphoma from glioblastoma during intraoperative consultation,” Clin. Neuropathol. 37(5), 105 (2018). 10.5414/NP301075
- 8.Figarella-Branger D., et al., “Morphological classification of glioblastomas,” Neurochirurgie 56(6), 459–463 (2010). 10.1016/j.neuchi.2010.07.014
- 9.Wirsching H. G., Weller M., “Glioblastoma,” in Malignant Brain Tumors: State-of-the-Art Treatment, pp. 265–288 (2017).
- 10.Puchalski R. B., et al., “An anatomic transcriptional atlas of human glioblastoma,” Science 360(6389), 660–663 (2018). 10.1126/science.aaf2666
- 11.Xu C., et al., “A survey on the attitudes of Chinese medical students towards current pathology education,” BMC Med. Educ. 20(1), 259 (2020). 10.1186/s12909-020-02167-5
- 12.Malikova H., et al., “Can morphological MRI differentiate between primary central nervous system lymphoma and glioblastoma?,” Cancer Imaging 16(1), 40 (2016). 10.1186/s40644-016-0098-9
- 13.Kickingereder P., et al., “Primary central nervous system lymphoma and atypical glioblastoma: multiparametric differentiation by using diffusion-, perfusion-, and susceptibility-weighted MR imaging,” Radiology 272(3), 843–850 (2014). 10.1148/radiol.14132740
- 14.Nakagawa M., et al., “Machine learning based on multi-parametric magnetic resonance imaging to differentiate glioblastoma multiforme from primary cerebral nervous system lymphoma,” Eur. J. Radiol. 108, 147–154 (2018). 10.1016/j.ejrad.2018.09.017
- 15.Xia W., et al., “Multiparametric-MRI-based radiomics model for differentiating primary central nervous system lymphoma from glioblastoma: development and cross-vendor validation,” J. Magn. Reson. Imaging 53(1), 242–250 (2021). 10.1002/jmri.27344
- 16.Xia W., et al., “Deep learning for automatic differential diagnosis of primary central nervous system lymphoma and glioblastoma: multi-parametric magnetic resonance imaging based convolutional neural network model,” J. Magn. Reson. Imaging 54(3), 880–887 (2021). 10.1002/jmri.27592
- 17.Han Y., et al., “Differentiation between primary central nervous system lymphoma and atypical glioblastoma based on MRI morphological feature and signal intensity ratio: a retrospective multicenter study,” Front. Oncol. 12, 811197 (2022). 10.3389/fonc.2022.811197
- 18.Zhang X., et al., “A multicenter proof-of-concept study on deep learning-based intraoperative discrimination of primary central nervous system lymphoma,” Nat. Commun. 15(1), 3768 (2024). 10.1038/s41467-024-48171-x
- 19.Lu M. Y., et al., “Data-efficient and weakly supervised computational pathology on whole-slide images,” Nat. Biomed. Eng. 5(6), 555–570 (2021). 10.1038/s41551-020-00682-w
- 20.Coudray N., et al., “Classification and mutation prediction from non-small cell lung cancer histopathology images using deep learning,” Nat. Med. 24(10), 1559–1567 (2018). 10.1038/s41591-018-0177-5
- 21.Kather J. N., et al., “Deep learning can predict microsatellite instability directly from histology in gastrointestinal cancer,” Nat. Med. 25(7), 1054–1056 (2019). 10.1038/s41591-019-0462-y
- 22.Li D., et al., “A deep learning diagnostic platform for diffuse large B-cell lymphoma with high accuracy across multiple hospitals,” Nat. Commun. 11(1), 6004 (2020). 10.1038/s41467-020-19817-3
- 23.Maron O., Lozano-Pérez T., “A framework for multiple-instance learning,” Adv. Neural Inf. Process. Syst. 10, 570–576 (1997).
- 24.Kalra S., et al., “Pay attention with focus: a novel learning scheme for classification of whole slide images,” in Proc. 24th Int. Conf. Med. Image Comput. Comput. Assisted Interv., Part VIII, pp. 350–359, Springer, Strasbourg, France (2021).
- 25.Li W., et al., “Patch transformer for multi-tagging whole slide histopathology images,” in Proc. 22nd Int. Conf. Med. Image Comput. Comput. Assisted Interv., Part I, pp. 532–540, Springer, Shenzhen, China (2019).
- 26.Su Z., et al., “Attention2majority: weak multiple instance learning for regenerative kidney grading on whole slide images,” Med. Image Anal. 79, 102462 (2022). 10.1016/j.media.2022.102462
- 27.Shao Z., et al., “TransMIL: transformer based correlated multiple instance learning for whole slide image classification,” Adv. Neural Inf. Process. Syst. 34, 2136–2147 (2021).
- 28.Campanella G., et al., “Clinical-grade computational pathology using weakly supervised deep learning on whole slide images,” Nat. Med. 25(8), 1301–1309 (2019). 10.1038/s41591-019-0508-1
- 29.Gao Z., et al., “Instance-based vision transformer for subtyping of papillary renal cell carcinoma in histopathological image,” in Proc. 24th Int. Conf. Med. Image Comput. Comput. Assisted Interv., Part VIII, pp. 299–308, Springer, Strasbourg, France (2021).
- 30.Zhang J., et al., “A joint spatial and magnification based attention framework for large scale histopathology classification,” in Proc. IEEE/CVF Conf. Comput. Vision Pattern Recognit. Workshops, pp. 3776–3784 (2021).
- 31.Chen T., et al., “A simple framework for contrastive learning of visual representations,” in Proc. 37th Int. Conf. Mach. Learning, PMLR 119, pp. 1597–1607 (2020).
- 32.Chen X., He K., “Exploring simple siamese representation learning,” in Proc. IEEE/CVF Conf. Comput. Vision Pattern Recognit., pp. 15750–15758 (2021).
- 33.Huang Z., et al., “Integration of patch features through self-supervised learning and transformer for survival analysis on whole slide images,” in Proc. 24th Int. Conf. Med. Image Comput. Comput. Assisted Interv., Part VIII, pp. 561–570, Springer, Strasbourg, France (2021).
- 34.Koohbanani N. A., et al., “Self-path: self-supervision for classification of pathology images with limited annotations,” IEEE Trans. Med. Imaging 40(10), 2845–2856 (2021). 10.1109/TMI.2021.3056023
- 35.Schirris Y., et al., “DeepSMILE: contrastive self-supervised pre-training benefits MSI and HRD classification directly from H&E whole-slide images in colorectal and breast cancer,” Med. Image Anal. 79, 102464 (2022). 10.1016/j.media.2022.102464
- 36.Chen R. J., et al., “Scaling vision transformers to gigapixel images via hierarchical self-supervised learning,” in Proc. IEEE/CVF Conf. Comput. Vision Pattern Recognit., pp. 16144–16155 (2022).
- 37.Wang X., et al., “TransPath: transformer-based self-supervised learning for histopathological image classification,” in Proc. 24th Int. Conf. Med. Image Comput. Comput. Assisted Interv., Part VIII, pp. 186–195, Springer, Strasbourg, France (2021).
- 38.Wang X., et al., “Transformer-based unsupervised contrastive learning for histopathological image classification,” Med. Image Anal. 81, 102559 (2022). 10.1016/j.media.2022.102559
- 39.Liu H., et al., “Self-supervised learning is more robust to dataset imbalance,” in Proc. 10th Int. Conf. Learning Represent. (2022).
- 40.Grill J. B., et al., “Bootstrap your own latent: a new approach to self-supervised learning,” Adv. Neural Inf. Process. Syst. 33, 21271–21284 (2020).
- 41.Caron M., et al., “Unsupervised learning of visual features by contrasting cluster assignments,” Adv. Neural Inf. Process. Syst. 33, 9912–9924 (2020).
- 42.He K., et al., “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vision Pattern Recognit., pp. 770–778 (2016).
- 43.Tellez D., et al., “Quantifying the effects of data augmentation and stain color normalization in convolutional neural networks for computational pathology,” Med. Image Anal. 58, 101544 (2019). 10.1016/j.media.2019.101544
- 44.Faryna K., van der Laak J., Litjens G., “Tailoring automated data augmentation to H&E-stained histopathology,” in Proc. Medical Imaging with Deep Learning, PMLR, pp. 168–178 (2021).
- 45.Cubuk E. D., et al., “RandAugment: practical automated data augmentation with a reduced search space,” in Proc. IEEE/CVF Conf. Comput. Vision Pattern Recognit. Workshops, pp. 702–703 (2020).
- 46.Wang X., et al., “RetCCL: clustering-guided contrastive learning for whole-slide image retrieval,” Med. Image Anal. 83, 102645 (2023). 10.1016/j.media.2022.102645
- 47.Tavolara T. E., et al., “One label is all you need: interpretable AI-enhanced histopathology for oncology,” Semin. Cancer Biol. 97, 70–85 (2023). 10.1016/j.semcancer.2023.09.006