Abstract
Medical image processing and analysis on whole slide imaging (WSI) are notoriously difficult due to the gigapixel, high-resolution nature of the images. Multiplex immunofluorescence (MxIF), a spatial single-cell level iterative imaging technique that collects dozens of WSIs on the same histological tissue, makes the data analysis an order of magnitude more complicated. The rigor of downstream single-cell analyses (e.g., cell type annotation) depends on the quality of the image processing (e.g., multi-WSI alignment and cell segmentation). Unfortunately, the high-resolution and high-dimensional nature of MxIF data prevents researchers from performing comprehensive manual data curation, which can lead to misleading biological findings. In this paper, we propose a learning-based MxIF quality score (MxIF Q-score) that integrates automatic image segmentation and single-cell clustering methods to conduct biology-informed MxIF image data curation. To the best of our knowledge, this is the first study to provide an automatic quality assurance score of MxIF image alignment and segmentation from a biological knowledge-informed standpoint. The proposed method was validated on 245 MxIF image regions of interest (ROIs) from 49 WSIs and achieved 0.99 recall and 0.86 precision when compared with manual visual checks on spatial alignment validation. We present extensive experimental results to show the efficacy of the Q-score system. We conclude that a biological knowledge-driven scoring framework is a promising direction for assessing complicated MxIF data.
Keywords: MxIF, Data curation, scRNA-seq, Quality assurance
1. Introduction
Crohn’s disease (CD) leads to chronic, relapsing and remitting bowel inflammation [3], with high prevalence [5]. To study this disease, we acquired formalin-fixed paraffin-embedded tissues from the terminal ileum (TI) and ascending colon (AC), followed by Multiplexed Immunofluorescence (MxIF) staining and imaging [2]. MxIF provides a unique opportunity to understand cell composition, functional state, cell-to-cell mapping, cell distribution, and protein expression profiles as a function of the spatial and molecular information about the inflammation associated with CD.
MxIF performs iterative staining and imaging using dozens of protein markers on a whole slide tissue section to investigate cellular types and function at the spatial and single-cell level. Our MxIF acquisition pipeline starts with staining cell nuclei using DAPI in each round. All DAPI images are computationally co-registered, allowing each round of fluorescent images to be overlaid. The registered structural markers are projected and merged together to supply complementary cellular structure information for the subsequent cell segmentation tasks. However, the unprecedented information obtained via MxIF is accompanied by imaging challenges related to image alignment or registration, segmentation, and stain quality. This variability may introduce biases into downstream cell type annotation tasks. Manual/visual quality control (QC) on all aligned marker images and cell segmentation has been regarded as the de facto standard for MxIF dataset curation. However, such curation procedures are highly subjective and resource-intensive at the cellular level. Figure 1 shows the current QC process on an MxIF sample with selected structural markers. The manual contours were placed on β-catenin. Unfortunately, there are no existing objective and quantitative metrics to evaluate (1) the quality of spatial alignment and (2) the quality of segmentation beyond visual inspection. Thus, there is an urgent need for automatic and quantitative metrics for QC on high-resolution, high-dimensionality MxIF data.
Fig. 1.
Here, we present an extreme case in which, at the micron level (500 μm), two DAPI images are visibly not well co-registered. The membrane projections are generated using the β-catenin, PanCK, and NaKATPase images. After zooming in, we can observe how severe the misalignment is, which introduces bias into the cell segmentation of the sample. Ideally, all marker images should be co-registered together. The yellow contours are manually annotated on PanCK. (Color figure online)
A few related works have addressed quality control of digital pathology images in terms of image quality, quantitative metrics for automatic pipelines, and benchmarking. Dima et al. proposed a method that quantifies cell edge character and estimates how accurately a segmentation algorithm would perform on fluorescence microscopy images of cells [6]. Yu et al. provided a measurement to detect or avoid cell cross-contamination for cell line annotation [17]. Feng et al. developed quality metrics to assess fluorescence microscopy images synthesized by different methods [7]. Janowczyk et al. proposed HistoQC to identify image artifacts and compute quantitative metrics describing visual attributes of WSIs in the Nephrotic Syndrome Study Network digital pathology repository [8]. Kose et al. introduced a quality assurance method to assess the lesional area using reflectance confocal microscopy images together with co-registered dermoscopy images [9]. However, none of the above works attempted to provide image processing quality control starting from the usefulness of the fluorescence images, specifically, cell type annotation for MxIF images.
In this paper, we propose a scoring framework, MxIF Q-score, that integrates an automatic image segmentation pipeline and a single-cell clustering method to provide MxIF image processing quality guidance. The contribution of this paper is three-fold:
A biological knowledge-informed metric, MxIF Q-score, is proposed for objective and automatic QC on high-resolution and high-dimensionality MxIF data.
Comprehensive experiments on different MxIF Q-score thresholds have been conducted to demonstrate the usefulness of the MxIF Q-score framework.
To our knowledge, MxIF Q-score is the first quantitative and objective metric to meet the emerging need for curating MxIF data.
2. Methods
In the current workflow, we typically need to visually inspect cellular level overlays of dozens of markers. Such a quality check is accurate but unscalable. The proposed MxIF Q-score aims to alleviate the currently resource-intensive QC process via biological knowledge informed single cell clustering.
2.1. Q-score for Spatial Alignment
The first step of Q-score is to inspect the quality of spatial alignment. We hypothesize that a coarse nuclei segmentation on the DAPI channels can serve as a reference standard, because well-aligned DAPI images should generate consistent nuclei segmentation overlays. Herein, we aim to use the spatial alignment Q-score to filter out stain samples with misalignment issues. We set up Ilastik to automatically perform nuclei segmentation for each DAPI image. Then we compare each segmentation output to the first-round DAPI segmentation mask using the Dice similarity coefficient (DSC). If the group-wise DSC is smaller than the selected DSC threshold, the spatial alignment Q-score is set to true.
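The comparison described above can be illustrated with a minimal sketch. It assumes binary nuclei masks stored as nested lists and takes the group-wise DSC to be the mean DSC of all later rounds against the first round; the function names `dice` and `alignment_flag` and the averaging rule are illustrative assumptions, not part of the published pipeline.

```python
from typing import List, Sequence

Mask = Sequence[Sequence[int]]  # 2D binary mask: 1 = nucleus pixel, 0 = background


def dice(mask_a: Mask, mask_b: Mask) -> float:
    """Dice similarity coefficient between two binary masks of equal shape."""
    intersection = 0
    total = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            intersection += 1 if (a and b) else 0
            total += (1 if a else 0) + (1 if b else 0)
    # Convention chosen here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def alignment_flag(round_masks: List[Mask], dsc_threshold: float) -> bool:
    """Return True when the mean DSC against the first round is below the threshold."""
    if len(round_masks) < 2:
        raise ValueError("need the first-round mask plus at least one later round")
    reference = round_masks[0]
    scores = [dice(reference, mask) for mask in round_masks[1:]]
    group_dsc = sum(scores) / len(scores)  # assumption: group-wise DSC = mean over rounds
    return group_dsc < dsc_threshold


# Toy example: round 2 matches round 1 exactly, round 3 is shifted by one pixel.
round1 = [[0, 1, 1, 0], [0, 1, 1, 0]]
round2 = [[0, 1, 1, 0], [0, 1, 1, 0]]
round3 = [[0, 0, 1, 1], [0, 0, 1, 1]]
print(dice(round1, round3))
print(alignment_flag([round1, round2, round3], dsc_threshold=0.8))
```

In this toy case the shifted round lowers the mean DSC, so a stricter threshold flags the sample while a looser one does not.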
2.2. Making Reference Data for Q-score Cell Clustering
Broadly, we can use MxIF marker images to classify the WSI into epithelial, stromal, and immune compartments. More detailed cell types can be defined as fibroblasts, proliferating, stem, goblet, endocrine, leukocyte, myeloid, monocytic, and lymphocytic cells. For MxIF cell type annotation analysis, once the cell segmentation task is completed, a common practice is to extract cell image intensity features across all marker images, apply a clustering method, and run cell type annotation [4,14]. In this work, we hypothesize that if the quality of the stains is acceptable, nuclei segmentation can generate adequate features to classify cells into two broad groups: epithelial and stromal/immune. Thus, we first derive a cell type annotation for a reference dataset, and then employ the reference data to drive the grading mechanism. If the stain is problematic, the cell clustering fails to map to the reference data and no cell type is assigned, which is the key design criterion for Q-score on cell clustering. The full workflow for making the reference data is illustrated in Fig. 2.
Fig. 2.
The workflow for creating the reference data. Briefly, we perform cell segmentation on a reference patch and create a reference cell type matrix using marker gene expression and the total number of target clusters (epithelial, stromal, immune). The reference matrix is then used by incoming testing MxIF data to identify cell types with an automated single-cell annotation pipeline.
Construct the Reference MxIF ROI.
We found an epithelial region (500 μm), ran z-stack max projection on β-catenin, NaKATPase, PanCK, and Vimentin in the same ROI [12,13], and merged the stacked membrane images with the DAPI channel. We utilized the Ilastik random forest pixel classification model (trained interactively by a domain expert using eight -Actinin patches (at 50 μm) with partial manually traced labels) to process the merged image and generate the membrane and nuclei segmentation masks [16]. Next, we overlaid the nuclei segmentation masks on the images of ten markers (epithelial group: BetaCatenin, NaKATPase, PanCK; stromal group: Vimentin and SMA; immune group: CD3D, CD4, CD8, CD68 and CD11B) to build up nuclei-based mean intensity features.
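As a rough illustration of the nuclei-based mean intensity features, the sketch below averages each marker image over the pixels of each labeled nucleus. The label-mask encoding (0 for background, positive integers for nuclei), the function name `mean_intensity_features`, and the toy values are assumptions made for this example only.

```python
from collections import defaultdict
from typing import Dict, Sequence

Image = Sequence[Sequence[float]]
LabelMask = Sequence[Sequence[int]]  # 0 = background, k > 0 = pixels of nucleus k


def mean_intensity_features(
    label_mask: LabelMask, marker_images: Dict[str, Image]
) -> Dict[int, Dict[str, float]]:
    """Return {nucleus_id: {marker_name: mean intensity inside that nucleus}}."""
    sums: Dict[int, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    counts: Dict[int, int] = defaultdict(int)
    for r, row in enumerate(label_mask):
        for c, label in enumerate(row):
            if label == 0:
                continue  # skip background pixels
            counts[label] += 1
            for name, image in marker_images.items():
                sums[label][name] += image[r][c]
    return {
        label: {name: sums[label][name] / counts[label] for name in marker_images}
        for label in counts
    }


# Toy example with two nuclei and two marker images (hypothetical values).
labels = [[1, 1, 0], [0, 2, 2]]
markers = {
    "PanCK": [[10.0, 20.0, 0.0], [0.0, 2.0, 4.0]],
    "Vimentin": [[1.0, 3.0, 0.0], [0.0, 30.0, 50.0]],
}
features = mean_intensity_features(labels, markers)
print(features)
```

Each row of the resulting feature matrix corresponds to one nucleus and each column to one marker, which is the shape a clustering step would consume.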
Reference ROI Cell Clustering and Annotation.
We modeled cell states as neighborhoods on a K-nearest neighbor (KNN) graph and used a graph-based approach for clustering with resolution = 0.3, where cells are embedded in a graph structure and then partitioned into highly connected communities. Then, we used the scMRMA pipeline to map clusters to a collection of average marker gene expression features to determine cell types [10]. In scMRMA, each cell type is described by a collection of marker genes, and the average expression of each marker gene is computed within each cluster. For a given cell type and cluster, scMRMA calculates a cell type activity score by
| (1) |
where is a constant factor to adjust the score for the total number of markers in each cell type, is the weights of marker ,
| (2) |
where is the frequency of a marker .
To train the reference annotation model, we empirically integrated nine markers (β-catenin as mentioned above) and grouped the reference data into six groups: three epithelium groups and three immune groups. One of the immune groups was empirically assigned as a stromal group due to the lack of stromal signal in the reference ROI. Log-normalization is then applied to the whole reference matrix for further cell annotation. Finally, cells with low expression that fall outside the six cell groups are marked as a seventh, ‘unassigned’ group.
2.3. Q-score for Cell Clustering
For a given MxIF ROI, we retrieve post stats from any cell or nuclei segmentation approach and process the marker feature matrix with the same KNN clustering method. To annotate cell types, we applied a canonical automatic scRNA-seq annotation tool, SingleR [1]. A p-value is computed for each cell/nucleus on each group; the cell/nucleus is assigned to a specific group if p-value < 0.05, and otherwise it is marked as unassigned. The overall Q-score grading workflow for cell clustering is shown in Fig. 3. It is biologically informed, with multiple thresholds based mainly on (1) a threshold on the popularity of local unassigned cells, (2) a threshold on local unassigned cells, and (3) a threshold on cell type conflicts between epithelial cells and immune/stromal cells. The workflow maintains a local warning count, which is incremented when any of the above three threshold conditions is met. At the end of the workflow, if the warning count is larger than the warning threshold, the cell clustering Q-score is set to true. In summary, the overall binary Q-score is modeled as Eq. 3.
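The grading logic can be sketched roughly as follows. This is an illustration under stated assumptions rather than the exact workflow of Fig. 3: when several groups have p-value < 0.05 the group with the smallest p-value is chosen, the three per-cluster ratios are taken as precomputed inputs (how each ratio is derived is not specified here), and a cluster contributes one warning when any ratio exceeds its threshold. The names `assign_cell_type` and `clustering_flag` are hypothetical.

```python
from typing import Dict, List, Tuple


def assign_cell_type(p_values: Dict[str, float], alpha: float = 0.05) -> str:
    """Assign a cell to the group with the smallest p-value if it is below alpha."""
    best_group, best_p = min(p_values.items(), key=lambda item: item[1])
    return best_group if best_p < alpha else "unassigned"


def clustering_flag(
    cluster_ratios: List[Tuple[float, float, float]],
    thresholds: Tuple[float, float, float],
    warning_threshold: int,
) -> Tuple[int, bool]:
    """Count clusters that violate any threshold and compare with the warning threshold.

    cluster_ratios holds, per cluster, three precomputed ratios that correspond
    to the three threshold conditions (an assumed input format).
    """
    warnings = 0
    for ratios in cluster_ratios:
        if any(ratio > limit for ratio, limit in zip(ratios, thresholds)):
            warnings += 1  # assumption: at most one warning per cluster
    return warnings, warnings > warning_threshold


# Toy example: three clusters, all thresholds set to 0.1 (hypothetical values).
print(assign_cell_type({"epithelial": 0.01, "stromal/immune": 0.40}))
print(assign_cell_type({"epithelial": 0.20, "stromal/immune": 0.40}))
ratios = [(0.02, 0.01, 0.00), (0.30, 0.10, 0.00), (0.01, 0.01, 0.25)]
print(clustering_flag(ratios, thresholds=(0.1, 0.1, 0.1), warning_threshold=1))
```

Returning the warning count alongside the binary flag keeps the finer-grained value available for ranking, which a binary score alone cannot provide.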
Fig. 3.
The Q-score grading workflow for cell clustering is biologically informed, with multiple thresholds at the local cluster level and the global cluster level, based on (1) the popularity of unassigned cells and (2) cell type conflicts between epithelial cells and immune/stromal cells.
| (3) |
3. Data and Experimental Setting
Forty-nine sample biopsies were collected from 20 CD patients and 5 healthy controls across three batches of imaging, where 24 tissues are from the terminal ileum and 25 tissues are from the ascending colon (paired samples from each patient). To date, the sites and disease spectrum employed in this study make it the most representative MxIF study of the gastrointestinal tract. The 49 slides have scores ranging across normal, quiescent, mild, moderate, and severe, all of which are from CD patients. The selected MxIF markers for MxIF Q-score were stained in the following order: DAPI (every round in which the following markers are included), CD45, CD11B, β-catenin, HLA-A, CD4, PanCK, CD3D, CD8, NaKATPase, CD68, Vimentin, ERBB2, and SMA, with group-wise linear normalization. We computed tissue masks covering the tissue pixels that contained all markers across all staining rounds to ensure effective learning. Then, for experiment design purposes and to ensure the generalizability of cell clustering, for each sample we visually selected five ROIs of size 500 μm with a clear view of groups of epithelial, immune, and stromal cells when overlaying the relevant marker images in different color channels. In summary, a total of 245 ROIs are available to be graded by MxIF Q-score.
To validate the spatial alignment Q-score, we used the same segmentation model as defined in Sect. 2.2. We processed all rounds of DAPI images without merging any structural marker images. The goal is to obtain a coarse nuclei segmentation to verify the overlap of the DAPI images; the effectiveness of cell separation is not considered. The manual alignment check of the registration serves as the ground truth.
To validate the cell clustering Q-score, given the lack of true labels for the cell membrane, we take a step back: rather than validating the accuracy of individual segmentation pipelines, we provide manual nuclei labels to serve as the reference truth for cell count tasks, and use them to rank the performance of the different segmentation approaches. The logic of the cell clustering Q-score validation is then to see whether the Q-score grade ranking correlates strongly with the reference-truth cell count ranking of the different segmentation methods. Although the cell clustering Q-score is a binary value, we use the local warning value to rank segmentation performance.
For each 500 μm ROI patch, we reviewed the DAPI image and searched for nuclei in the epithelial region that were easy to annotate, at a scale of 25 μm. Two researchers manually annotated the nuclei on separate sets of ROIs. If the cell count was the same, we used the DSC metric to rank the segmentation pipelines. We used the C-index score to verify how well the rankings match [15]. Two nuclei segmentation pipelines were chosen to generate cell features; one was the model described in Sect. 2.2. The other was an Ilastik-based segmentation pipeline with more sophisticated morphological post-processing steps to reconstruct membrane lines and identify the cell separation [11]. We only used nuclei masks for the cell count and cell feature selection.
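To make the ranking check concrete, the following sketch computes a standard pairwise concordance index between a reference ordering and a predicted ordering. It is a generic formulation that may differ from the exact variant used in this study; the function name `concordance_index` and the toy numbers are illustrative assumptions.

```python
from itertools import combinations
from typing import Sequence


def concordance_index(reference: Sequence[float], predicted: Sequence[float]) -> float:
    """Fraction of comparable pairs whose predicted order matches the reference order.

    Pairs tied in the reference are skipped; pairs tied only in the prediction
    count as half concordant.
    """
    concordant = 0.0
    comparable = 0
    for (ref_i, pred_i), (ref_j, pred_j) in combinations(zip(reference, predicted), 2):
        if ref_i == ref_j:
            continue  # not comparable
        comparable += 1
        if pred_i == pred_j:
            concordant += 0.5
        elif (ref_i < ref_j) == (pred_i < pred_j):
            concordant += 1.0
    if comparable == 0:
        raise ValueError("no comparable pairs in the reference values")
    return concordant / comparable


# Toy example: reference cell count errors vs. warning counts for three
# hypothetical segmentation results.
reference_error = [3, 1, 2]
warning_counts = [5, 0, 2]
print(concordance_index(reference_error, warning_counts))
```

A value of 1 means every comparable pair is ordered the same way by both rankings, and a value of 0 means every pair is reversed.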
4. Results
Spatial alignment Q-score classification results based on batch-based, group-based DSC values are shown in Table 1. When the DSC threshold is 0.5, the recall is 0.99 with precision 0.858. Figure 4 shows the cellular-level alignment of selected markers using different Q-score thresholds. The lowest batch-based mean DSC threshold yields higher accuracy and F1 score across all 245 testing ROIs. We empirically set the warning threshold from 1 (a single failed cluster is enough to exceed the threshold) to 1/3 of the total number of clusters. Then we tuned the three cell clustering thresholds, both separately and together, with threshold values from 0.05 to 0.2 in steps of 0.01. All testing cases showed a C-index of 1, which means the warning levels predict a ranking order consistent with the segmentation cell count ranking. If we set the DSC threshold to 0.5, 32 false positive samples passed the spatial alignment Q-score.
Table 1.
Spatial alignment Q-score classification performance using different DSC thresholds
| DSC threshold | Accuracy | AUC | Precision | Recall | F1 |
|---|---|---|---|---|---|
| 0.8 | 0.245 | 0.526 | 1.000 | 0.051 | 0.098 |
| 0.7 | 0.592 | 0.706 | 0.952 | 0.513 | 0.667 |
| 0.6 | 0.886 | 0.794 | 0.911 | 0.949 | 0.930 |
| 0.5 | 0.861 | 0.675 | 0.858 | 0.990 | 0.919 |
| 0.4 | 0.820 | 0.567 | 0.818 | 0.995 | 0.898 |
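As a quick consistency check on Table 1, the F1 column can be recomputed from the reported precision and recall using F1 = 2PR / (P + R). The snippet below is only a worked arithmetic example on the tabulated values; the helper name `f1_score` is chosen here for illustration.

```python
# Rows of Table 1: DSC threshold -> (precision, recall, reported F1).
table_1 = {
    0.8: (1.000, 0.051, 0.098),
    0.7: (0.952, 0.513, 0.667),
    0.6: (0.911, 0.949, 0.930),
    0.5: (0.858, 0.990, 0.919),
    0.4: (0.818, 0.995, 0.898),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


for threshold, (precision, recall, reported_f1) in table_1.items():
    recomputed = f1_score(precision, recall)
    print(f"threshold {threshold}: recomputed F1 = {recomputed:.3f}, reported F1 = {reported_f1:.3f}")
```

The recomputed values agree with the reported column to within about 0.001, which is what rounding of the three-decimal precision and recall values would explain.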
Fig. 4.
Qualitative spatial alignment Q-score results, filtering good and bad ROIs using different thresholds. The manually labeled nuclei boundaries (yellow contours) are overlaid to help assess the quality of cell boundary alignment across three membrane markers. (Color figure online)
For the cell clustering Q-score, if the ratio value is set from 0.05 to 0.2 and the three thresholds are tuned individually, the Q-score could filter the questionable registration samples identified by visual check, with average misclassification rate (mean ± SD): , , , respectively. If we assign the same ratio to the three thresholds simultaneously, the average misclassification rate is 0.05 ± 0.03, which is significantly different from tuning the three thresholds separately. Figure 5 shows a sample’s marker gene expression matrix as qualitative guidance on the accuracy of the scoring. From visual inspection, we barely see significant marker gene expression in sub-clusters 0 and 2, which explains the high number of unassigned cells.
Fig. 5.
A qualitative cell clustering result for a sample that passed the spatial alignment Q-score but was marked as questionable by the cell clustering Q-score. The cell type annotation uses the reference data; the marker gene expression matrix is shown as qualitative guidance on the potential issue with the ROI, due either to segmentation or to stains, which needs to be further validated.
5. Conclusion
In this paper, we propose a learning-based MxIF quality assurance tool (MxIF Q-score) that integrates automatic image segmentation and single-cell clustering methods to conduct biology-informed MxIF image data curation at whole slide scale. The results show that the Q-score spatial alignment check is a feasible way to find potential shift issues in MxIF data, and that the Q-score cell clustering QC correlates strongly with the cell count performance ranking of different segmentation methods. The framework has the potential to identify problem ROIs within the stains, and the Q-score cell clustering component can also be used to filter out questionable cells before further downstream analysis. Our next step is to provide a reference Q-score range and to train a data-driven learning model that learns and tests the scoring thresholds and extracts the reference dataset automatically.
Acknowledgements.
This research was supported by the Leona M. and Harry B. Helmsley Charitable Trust grant G-1903-03793 and G-2103-05128, NSF CAREER 1452485, NSF 2040462, and in part using the resources of the Advanced Computing Center for Research and Education (ACCRE) at Vanderbilt University. This project was supported in part by the National Center for Research Resources, Grant UL1 RR024975-01, and is now at the National Center for Advancing Translational Sciences, Grant 2 UL1 TR000445-06, the National Institute of Diabetes and Digestive and Kidney Diseases, the Department of Veterans Affairs I01BX004366, and I01CX002171. The de-identified imaging dataset(s) used for the analysis described were obtained from ImageVU, a research resource supported by the VICTR CTSA award (ULTR000445 from NCATS/NIH), Vanderbilt University Medical Center institutional funding and Patient-Centered Outcomes Research Institute (PCORI; contract CDRN-1306-04869). This work is supported by NIH grant T32GM007347 and grant R01DK103831.
References
1. Aran D, et al.: Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage. Nat. Immunol. 20(2), 163–172 (2019)
2. Bao S, et al.: A cross-platform informatics system for the gut cell atlas: integrating clinical, anatomical and histological data. In: Medical Imaging 2021: Imaging Informatics for Healthcare, Research, and Applications, vol. 11601, pp. 8–15. SPIE (2021)
3. Baumgart DC, Sandborn WJ: Crohn’s disease. The Lancet 380(9853), 1590–1605 (2012)
4. Berens ME, et al.: Multiscale, multimodal analysis of tumor heterogeneity in IDH1 mutant vs wild-type diffuse gliomas. PLoS ONE 14(12), e0219724 (2019)
5. Dahlhamer JM, Zammitti EP, Ward BW, Wheaton AG, Croft JB: Prevalence of inflammatory bowel disease among adults aged ≥ 18 years - United States, 2015. Morb. Mortal. Wkly Rep. 65(42), 1166–1169 (2016)
6. Dima AA, et al.: Comparison of segmentation algorithms for fluorescence microscopy images of cells. Cytometry A 79(7), 545–559 (2011)
7. Feng Y, Chai X, Ba Q, Yang G: Quality assessment of synthetic fluorescence microscopy images for image segmentation. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 814–818. IEEE (2019)
8. Janowczyk A, Zuo R, Gilmore H, Feldman M, Madabhushi A: HistoQC: an open-source quality control tool for digital pathology slides. JCO Clin. Cancer Inform. 3, 1–7 (2019)
9. Kose K, et al.: Utilizing machine learning for image quality assessment for reflectance confocal microscopy. J. Investig. Dermatol. 140(6), 1214–1222 (2020)
10. Li J, Sheng Q, Shyr Y, Liu Q: scMRMA: single cell multiresolution marker-based annotation. Nucleic Acids Res. 50(2), e7–e7 (2022)
11. McKinley ET, et al.: Machine and deep learning single-cell segmentation and quantification of multi-dimensional tissue images. BioRxiv, p. 790162 (2019)
12. McKinley ET, et al.: MIRIAM: a machine and deep learning single-cell segmentation and quantification pipeline for multi-dimensional tissue images. Cytometry Part A (2022)
13. McKinley ET, et al.: Optimized multiplex immunofluorescence single-cell analysis reveals tuft cell heterogeneity. JCI Insight 2(11) (2017)
14. Rashid R, et al.: Highly multiplexed immunofluorescence images and single-cell data of immune markers in tonsil and lung cancer. Sci. Data 6(1), 1–10 (2019)
15. Samplaski MK, Hernandez A, Gill IS, Simmons MN: C-index is associated with functional outcomes after laparoscopic partial nephrectomy. J. Urol. 184(6), 2259–2263 (2010)
16. Sommer C, Straehle C, Koethe U, Hamprecht FA: Ilastik: interactive learning and segmentation toolkit. In: 2011 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, pp. 230–233. IEEE (2011)
17. Yu M, et al.: A resource for cell line authentication, annotation and quality control. Nature 520(7547), 307–311 (2015)





