Journal of Medical Imaging. 2025 Sep 4;12(5):057501. doi: 10.1117/1.JMI.12.5.057501

Fine-grained multiclass nuclei segmentation with molecular empowered all-in-SAM model

Xueyuan Li a, Can Cui b, Ruining Deng b, Yucheng Tang c, Quan Liu b, Tianyuan Yao b, Shunxing Bao d, Naweed Chowdhury e, Haichun Yang f, Yuankai Huo a,b,c,f,*
PMCID: PMC12410749  PMID: 40918610

Abstract.

Purpose

Recent developments in computational pathology have been driven by advances in vision foundation models (VFMs), particularly the Segment Anything Model (SAM). This model facilitates nuclei segmentation through two primary methods: prompt-based zero-shot segmentation and the use of cell-specific SAM models for direct segmentation. These approaches enable effective segmentation across a range of nuclei and cells. However, general VFMs often face challenges with fine-grained semantic segmentation, such as identifying specific nuclei subtypes or particular cells.

Approach

In this paper, we propose the molecular empowered all-in-SAM model to advance computational pathology by leveraging the capabilities of VFMs. This model incorporates a full-stack approach, focusing on (1) annotation—engaging lay annotators through molecular empowered learning to reduce the need for detailed pixel-level annotations, (2) learning—adapting the SAM model to emphasize specific semantics, which utilizes its strong generalizability with SAM adapter, and (3) refinement—enhancing segmentation accuracy by integrating molecular oriented corrective learning.

Results

Experimental results from both in-house and public datasets show that the all-in-SAM model significantly improves cell classification performance, even when faced with varying annotation quality.

Conclusions

Our approach not only reduces the workload for annotators but also extends the accessibility of precise biomedical image analysis to resource-limited settings, thereby advancing medical diagnostics and automating pathology image analysis.

Keywords: deep learning, image annotation, cell segmentation, molecular empowered learning, foundation model

1. Introduction

The field of computational pathology1 is undergoing transformative advancements by integrating computational algorithms with whole slide imaging (WSI). This integration has shown promising results, improving diagnostic precision and advancing personalized medical treatments.2 Computational pathology focuses on analyzing digital pathological images to support clinical decisions and personalized therapies.3 However, accurately segmenting and classifying cell nuclei in these images remains challenging, especially in oncology, where it significantly impacts diagnostic and therapeutic planning.

Recently, vision foundation models (VFMs)4–6 have made significant strides in the area of medical image segmentation. These models are typically trained on large and diverse datasets to achieve superior performance across different tasks. The Segment Anything Model (SAM),5,7,8 as an example, has been particularly notable for its adaptability and efficiency, offering effective segmentation across various scenarios with minimal need for detailed annotations.9,10 In digital pathology, however, VFMs face challenges in performing fine-grained semantic segmentation that is necessary for distinguishing subtle differences among cell types or accurately segmenting specific cells in heterogeneous tissues.11,12 This limitation is critical in clinical studies and scientific research.

In this paper, we propose the molecular empowered all-in-SAM model, a holistic framework designed for precise cell segmentation and nuclei classification (Fig. 1). Our model adopts a full-stack approach, which comprises (1) annotation—employing molecular empowered learning to engage lay annotators and minimize the need for intricate pixel-level annotations, (2) learning—modifying the SAM model to focus on specific semantics, thereby capitalizing on its robust generalizability through the use of an SAM adapter, and (3) refinement—boosting segmentation accuracy by incorporating molecular oriented corrective learning (MOCL).

Fig. 1.

Overall idea of our work: this diagram illustrates the distinctions between our approach (bottom panel) and existing methods. (1) Traditional: expert annotators manually label cells using only PAS images. (2) MOCL: lay annotators provide pixel-level labels under the guidance of IF molecular images, followed by the application of deep learning for segmentation. (3) SAM-L: the SAM technique is utilized to expedite the annotation process, requiring only minimal (box) annotations. (4) All-in-SAM (our method): we integrate SAM in the annotation phase and adaptively fine-tune it during the training of the model.

In the proposed all-in-SAM framework, the VFM leverages both annotation generation and model fine-tuning stages to reduce annotation costs while maintaining high segmentation performance, for instance, nuclei segmentation. To address the key challenge of fine-grained instance cell segmentation for different cell types, our MOCL approach incorporates molecular imaging data in training (but not needed in inference), multimodal registration, and corrective learning, providing a holistic solution for this challenging task.

The contribution of the proposed all-in-SAM model is fourfold:

  • Scheme. Molecular empowered learning allows fine-grained nuclei annotation: We present molecular empowered learning, a method that reduces the need for extensive domain expertise in detailed annotation. This approach allows nonexperts, such as undergraduate students, to accurately annotate fine-grained nuclei from histopathology data using minimal domain knowledge. This is achieved by incorporating paired molecular immunofluorescence (IF) images, lowering the training costs associated with procuring annotated datasets from domain experts.

  • Annotation. SAM for annotation: The SAM model is used to reduce the annotation workload by shifting from detailed pixel-level contour delineation to more efficient weak annotations, such as box annotations.

  • Learning. SAM adapter for label-efficient fine-tuning: The incorporation of the SAM adapter enables the model to efficiently adapt a pre-trained SAM model using the mentioned annotations, reducing the necessity for extensive retraining with newly labeled data.

  • Refinement. Advanced corrective learning techniques for segmentation refinement: MOCL is introduced to further refine segmentation by correcting errors using molecular insights and partially annotated data, enabling more precise identification and segmentation of cell types, especially in complex and heterogeneous samples.

The proposed all-in-SAM model has been tested on both public and in-house datasets. The results show its superior performance on multiclass instance segmentation. It offers a promising direction for deploying pathological nuclei segmentation and classification, leveraging the latest advancements in VFM.

2. Related Work

The convergence of molecular biology and digital imaging within the realm of biomedical analysis has led to significant strides in disease diagnosis and research. This section delves into pivotal advancements in nuclei identification and segmentation, as well as the evolution of foundation models that form the scaffolding for molecular analysis, framing the backdrop for the novel contributions of our study.

2.1. Cell and Nuclei Segmentation

Digital pathology has made significant strides in cell and nuclei segmentation, which is vital for detailed pathological analysis. The Glo-In-One toolkit13 simplifies the detection of glomerular structures, integrating complex tasks into a user-friendly interface. Similarly, Juang et al.14 have enhanced cellular segmentation within renal pathology by combining deep learning with generative morphology techniques.

Advancements in self-supervised learning have revolutionized nuclei segmentation. Xie et al.15 introduced a model that uses data’s intrinsic properties to facilitate training without extensive labeled datasets, improving both automation and accuracy. In addition, Sahasrabudhe et al.16 have implemented attention mechanisms within their self-supervised learning framework to significantly enhance nuclei segmentation, adapting well to the variability of nuclear morphology.

These innovations are crucial for accurate diagnosis in medical imaging, as seen in Kumar et al.,17 who provided essential resources for reliable nuclear segmentation. Despite these advancements, challenges persist due to the variability in morphology across different slides and the vast amount of data in each image.18,19

Overall, the continuous evolution of segmentation technologies promises to refine the precision and utility of digital pathology, driving forward more advanced and automated methods for handling complex pathological data.

2.2. Vision Foundation Models

VFMs have revolutionized many areas within computer vision due to their ability to generalize effectively from vast, diverse datasets to specific tasks with minimal additional training.5,6 These models, extensively pre-trained on large image datasets, are exceptionally adaptable and capable of managing complex visual tasks, which makes them indispensable in fields ranging from general computer vision to specialized medical imaging applications.

In medical imaging and pathology, VFMs such as convolutional neural networks (CNNs), vision transformers (ViTs), and more recently, models such as Swin Transformers and Perceiver IO have shown significant promise. Each of these models brings unique strengths to the challenges inherent in pathological image analysis.

  • CNNs: They have been foundational in image analysis, renowned for their efficiency in processing pixel data and extracting hierarchical features. In pathology, they are predominantly utilized for tasks such as tumor detection and tissue classification, showcasing their adeptness in handling detailed and nuanced analyses.20,21

  • ViTs: Adapted from the transformer architecture initially developed for natural language processing, ViTs apply self-attention mechanisms to capture global dependencies within images. This characteristic enables them to excel in identifying intricate patterns and anomalies in medical images, which is crucial for complex diagnostic tasks.22,23

  • Advanced models (Swin Transformers and Perceiver IO): These newer VFMs enhance the capacity to manage diverse visual representations and complexities, making them highly suitable for the multifaceted nature of pathological image analysis.

  • Segment Anything Model (SAM): SAM is a groundbreaking advancement in VFMs, designed to offer highly flexible and generalizable image segmentation capabilities. Its proficiency in segmenting diverse and intricate objects within images is particularly beneficial in pathology, where distinguishing between normal and abnormal tissues is often challenging.

The integration of VFMs into pathology workflows significantly enhances diagnostic accuracy, speeds up decision-making, and automates routine analysis tasks, thereby reducing the workload for medical professionals and improving the scalability of assessments. As these models continue to evolve, their adoption in clinical pathology is poised to expand, promising to transform medical diagnostics and enable the development of more sophisticated, personalized treatments. The future of VFMs in pathology is bright, with ongoing technological advancements likely to introduce new capabilities and reshape the landscape of the field.

2.3. Foundation Models for Nuclei Segmentation

Advancements in deep learning, particularly through the development of models such as SAM and UNet, have significantly enhanced the segmentation of nuclei in pathology. SAM, as assessed for digital pathology by Deng et al.,24 marks a shift toward adaptable, scalable models with zero-shot capabilities, reducing the dependency on heavily annotated datasets. This allows for effective generalization across various nuclei segmentation tasks without extensive dataset-specific training.

Further advancements by Kaur et al.25 have refined the UNet architecture to autonomously segment nuclei in whole slide images (WSIs), highlighting how automation in deep learning can streamline the detection and analysis of key structures in medical imaging. These automated methods are crucial for improving the throughput and diagnostic accuracy in pathological analysis.

In addition, the integration of molecular data has been instrumental in refining the segmentation accuracy. Techniques such as Cellpose 2.0 (Ref. 9) and StarDist (Ref. 10), reported by Pachitariu and Stevens, respectively, enhance nuclei detection by providing high-precision segmentation in complex tissue, demonstrating significant improvements in tasks such as circulating tumor cell detection.

The Segment Any Cell framework by Na et al. and the CellViT model by Hörst et al.12 further exemplify the evolution of foundation models. These models use advanced machine learning techniques to segment and classify different types of cells with remarkable accuracy, even under challenging conditions.11 This highlights the potential of foundation models not just in enhancing existing applications but also in pioneering new methods for medical diagnostics.

These technological innovations have set a new standard in the field, combining the robustness of foundation models with the precision required for effective nuclei segmentation. The synergy between these models and MOCL addresses key challenges in the field, paving the way for groundbreaking advancements in biomedical image analysis.

3. Method

Our proposed framework is illustrated in Fig. 2. Step 1 transforms weak annotations into pixel-level annotations using both periodic acid-Schiff (PAS) and IF images. SAM allows annotators to provide only weak annotations (e.g., boxes) by converting them to pixel-level annotations. Then, the SAM adapter is introduced to fine-tune the SAM model for multiclass nuclei segmentation. Last, corrective learning is applied to further enhance the segmentation performance.

Fig. 2.

Framework: this figure shows the framework of the proposed all-in-SAM method. The framework consists of three main steps: (1) molecular empowered annotation, where experts or lay annotators provide box-level annotations to PAS images, which serve as prompts for the pre-trained SAM; (2) SAM adaptation, which utilizes image embeddings processed through multiple transformer blocks and adapters, integrating prompts to fine-tune the segmentation masks; and (3) MOCL, where corrective learning processes are applied to refine segmentation based on top-k pixel features from images, enhancing the specificity and accuracy of the predictions.

3.1. Molecular Empowered SAM Annotation

Following our recent work,26 the first major innovation of our pipeline is the integration of molecular empowered learning into the annotation process. Specifically, we provide PAS and IF images simultaneously, enabling lay annotators to perform fine-grained, multiclass instance segmentation of cell nuclei. This approach simplifies the segmentation task from identifying complex morphologies to merely recognizing different colors. Subsequently, all cell types are categorized using multiple binary masks.

In our all-in-SAM framework, SAM is employed to further simplify and accelerate the above molecular empowered annotation process. Specifically, the lay annotators only draw the bounding boxes for cell nuclei. Then, such weak annotations are converted to pixel-level annotations using the SAM model.

3.2. SAM Adapter for Cell Segmentation

Although SAM demonstrates strong performance in generic segmentation tasks, its efficacy may diminish when applied to specific tasks, potentially resulting in suboptimal outcomes or failures. To address this challenge, we adopt the pipeline shown as step 2 in Fig. 2. Rather than fine-tuning the entire model, this approach performs a targeted adaptation of its latter layers.

More specifically, this strategy involves automatically extracting and encoding texture information from each image as handcrafted features, which are subsequently incorporated into multiple layers of the encoder. Supervised by approximate segmentation masks, this prompt-based fine-tuning is applied to the pre-trained SAM model. During inference, nuclei can be segmented directly from images without the need for box prompts.

3.3. MOCL-Assisted Segmentation Refinement

MOCL aims to further improve the segmentation performance of the SAM adapter as a post-processing step. Specifically, as illustrated in Fig. 2 at step 3, MOCL utilizes top-k pixel feature embeddings from the annotation regions. These regions are selected based on higher confidence from the prediction probability W, defined as the confidence score in Eq. (1), indicative of critical areas for the current cell type identified from the decoder in Eq. (2)

$$W = f(X;\theta)[:,1], \tag{1}$$
$$\mathrm{topk}(k,E,W,Y) = \{(e_1,w_1),(e_2,w_2),\ldots,(e_k,w_k)\} \in Y(E,W). \tag{2}$$

In this model, k represents the number of selected embedding features, E denotes the embedding map from the last layer of the decoder, and Y is the lay annotation. Each element in this map, e, represents the embedded features of a specific area in the image. w represents the confidence scores associated with each embedding e.
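The selection in Eq. (2) can be illustrated with a minimal sketch, offered under stated assumptions rather than as the authors' implementation. It assumes the embedding map E, the confidence map W, and the lay annotation Y have been flattened to per-pixel lists, and that "annotation regions" means pixels where Y equals 1. The function name `select_topk` is hypothetical.

```python
def select_topk(embeddings, confidences, annotation, k):
    """Return the k (embedding, confidence) pairs with the highest confidence
    among pixels that the annotation marks as foreground (Y == 1).

    Assumption: all three inputs are flattened, equally long per-pixel lists.
    """
    candidates = [
        (e, w)
        for e, w, y in zip(embeddings, confidences, annotation)
        if y == 1
    ]
    # Highest-confidence pixels first.
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:k]


# Toy example: 5 pixels with 2-channel embeddings.
E = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.8, 0.2], [0.1, 0.9]]
W = [0.95, 0.60, 0.99, 0.80, 0.30]
Y = [1, 1, 0, 1, 0]
topk = select_topk(E, W, Y, k=2)
```

In this toy input, the pixel with confidence 0.99 is skipped because it lies outside the annotated region, so the two selected pairs come from the pixels with confidence 0.95 and 0.80.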

Next, we calculate a cosine similarity score S between the embedding from an arbitrary pixel and the embedding from critical embedding features, as shown in Eq. (3). Here, m indexes the channels of the feature embeddings, and M is the number of channels.

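$$S(e_i,e_{\mathrm{topk}}) = \frac{\sum_{m=1}^{M}(e_i \times e_{\mathrm{topk}})}{\sqrt{\sum_{m=1}^{M}(e_i)^2} \times \sqrt{\sum_{m=1}^{M}(e_{\mathrm{topk}})^2}}. \tag{3}$$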
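As a concrete reading of Eq. (3), the following sketch computes the cosine similarity between one pixel embedding and one top-k embedding, each given as a list of M channel values. The function name and the zero-norm fallback are assumptions made for illustration; the source does not say how degenerate embeddings are handled.

```python
import math


def cosine_similarity(e_i, e_topk):
    """Cosine similarity between two embeddings with the same number of channels."""
    dot = sum(a * b for a, b in zip(e_i, e_topk))
    norm_i = math.sqrt(sum(a * a for a in e_i))
    norm_k = math.sqrt(sum(b * b for b in e_topk))
    if norm_i == 0.0 or norm_k == 0.0:
        # Assumption: treat an all-zero embedding as having no similarity.
        return 0.0
    return dot / (norm_i * norm_k)


same_direction = cosine_similarity([3.0, 4.0], [6.0, 8.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Embeddings that point in the same direction score close to 1 regardless of magnitude, and orthogonal embeddings score 0.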

Given that labels from lay annotators can be noisy and erroneous, W and S in Eq. (4) are utilized in the subsequent equations to highlight the regions where both the model’s predictions and lay annotations concur on the cell type, improving the accuracy of the segmentation. For expert annotations, higher confidence W and similarity S values lead to stronger weighting, maximizing precision. For lay annotations, where variability is higher, W and S help to mitigate noise by selectively emphasizing areas with agreement, thereby improving the robustness of the segmentation pipeline. This weighting approach is then integrated into the calculation of the loss function in Eq. (5)

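$$\omega(W) = \exp(W) \times Y, \quad \omega(S) = S \times Y, \tag{4}$$
$$\mathcal{L}(Y,f(X;\theta)) = \left(\mathcal{L}_{\mathrm{Dice}}(Y,f(X;\theta)) + \mathcal{L}_{\mathrm{BCE}}(Y,f(X;\theta))\right) \times \omega(W) \times \omega(S). \tag{5}$$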
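The weighting in Eqs. (4) and (5) can be sketched as follows. This is an illustration under assumptions, not the authors' training code. It assumes flattened per-pixel lists, a scalar soft Dice loss, a per-pixel binary cross-entropy, and a mean over pixels as the final reduction, which the equations leave unspecified. All function names are hypothetical.

```python
import math


def dice_loss(y_true, y_prob, eps=1e-6):
    """Scalar soft Dice loss over all pixels."""
    inter = sum(y * p for y, p in zip(y_true, y_prob))
    total = sum(y_true) + sum(y_prob)
    return 1.0 - (2.0 * inter + eps) / (total + eps)


def bce_per_pixel(y, p, eps=1e-7):
    """Binary cross-entropy for one pixel, with clamping for numerical safety."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def mocl_loss(y_true, y_prob, W, S):
    """(Dice + BCE) weighted per pixel by omega(W) * omega(S), as in Eqs. (4)-(5).

    Assumption: the weighted per-pixel terms are averaged over all pixels.
    """
    dice = dice_loss(y_true, y_prob)
    total = 0.0
    for y, p, w, s in zip(y_true, y_prob, W, S):
        omega_w = math.exp(w) * y  # Eq. (4), confidence weight
        omega_s = s * y            # Eq. (4), similarity weight
        total += (dice + bce_per_pixel(y, p)) * omega_w * omega_s
    return total / len(y_true)


low_conf = mocl_loss([1, 1], [0.6, 0.7], W=[0.1, 0.1], S=[0.5, 0.5])
high_conf = mocl_loss([1, 1], [0.6, 0.7], W=[0.9, 0.9], S=[0.5, 0.5])
```

Because both weights in Eq. (4) are multiplied by Y, pixels outside the annotation receive zero weight in this reading, and among annotated pixels a higher confidence W or similarity S increases the pixel's contribution to the loss.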

4. Data and Experimental Design

We employed both public and in-house datasets for our experiments.

4.1. In-House Dataset

For this experiment, we utilized PAS-stained glomerular images paired with corresponding WT1 (podocyte marker) or GATA3 (mesangial cell marker)-stained IF images to enhance analytical accuracy. Although IF imaging is somewhat more expensive than PAS staining, both are affordable. It is important to note that once the model is trained, IF images are no longer required for new patients. The trained model relies only on PAS images as input, significantly reducing the ongoing costs. The dataset comprised 11 WSIs, including three slides of injured glomeruli. Digital scans of these stained tissue samples were conducted at 20× magnification. Using a comprehensive multimodality multiscale registration process, we created a dataset consisting of 1147 patches featuring podocytes and 789 patches with mesangial cells. Each patch measured 512×512 pixels, derived from specific glomeruli or molecular structures within the WSIs.

WT1 and GATA3 exhibit heterogeneous expression in normal and diseased states. To mitigate the inconsistency, experienced pathologists provided us with the optimal threshold for different markers at the WSI level. Moreover, the introduction of the SAM method alleviates the inter-rater variability when applied to different stains.

Annotations were provided by one experienced renal pathologist and three computer science students, utilizing ImageJ (version v1.53t). The “Synchronize Windows” feature was employed for cursor synchronization across different modalities, and the “ROI Manager” was used to manage binary masks of each cell type. The dataset was randomly split into training, validation, and testing subsets in a 6:1:3 ratio, ensuring a balanced representation of both injured and normal glomeruli.
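A 6:1:3 random split like the one described above can be sketched as follows. This is a minimal illustration with assumptions: the function name, the fixed seed, and splitting at the patch level are not stated in the text, and a plain shuffle does not by itself guarantee the balanced representation of injured and normal glomeruli that the authors report.

```python
import random


def split_dataset(items, ratios=(6, 1, 3), seed=0):
    """Shuffle items and split them into train/validation/test by integer ratios."""
    items = list(items)
    random.Random(seed).shuffle(items)
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test_set = items[n_train + n_val:]  # remainder goes to the test subset
    return train, val, test_set


# Example with 1147 patch indices (the number of podocyte patches in the dataset).
train, val, test_set = split_dataset(range(1147))
```

With 1147 items, this rounding scheme yields 688 training, 114 validation, and 345 testing items. A stratified variant would apply the same function separately to injured and normal patches and concatenate the results.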

4.2. Public Dataset

For a more focused and detailed experiment on all-in-SAM, we utilized the MICCAI 2018 MoNuSeg dataset,17 which includes 30 training images and 14 testing images. Each image has dimensions of 1000×1000 pixels and is accompanied by corresponding masks of nuclei.

The MICCAI dataset has inherent limitations in terms of its size, which may restrict the generalizability of our findings. We explicitly acknowledge this as a constraint and highlight the necessity of validating our results using larger and more diverse datasets in future studies. To address this limitation, we plan to expand our dataset by incorporating additional WSIs and annotated patches. Such efforts will facilitate more comprehensive evaluations and enhance the robustness of our findings in subsequent research.

4.3. Environment and Evaluation Metrics

The experiment involved delineating cellular structures on WSIs at a workstation equipped with a 12-core Intel Xeon W-2265 Processor and an NVIDIA RTX A6000 graphics processing unit (GPU). Separate annotation of cell contours was carried out using an 8-core AMD Ryzen 7 5800X Processor and an XP-PEN Artist 15.6 Pro Wacom tablet. Annotating a single cell type on one WSI took 9 h, whereas the batch processing of staining and scanning 24 IF WSIs took 3 h.

In this study, we use the Dice similarity coefficient (Dice) to evaluate pixel-level segmentation accuracy, and the F1-score is employed to assess instance-level detection.
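For reference, the two metrics can be written out as a short sketch. It uses the standard definitions: Dice on flattened binary masks, and F1 as the harmonic mean of precision and recall computed from instance-level true positives, false positives, and false negatives. How predicted instances are matched to ground-truth instances is not specified in the text, so the counts are taken as given. The function names and the empty-mask convention are assumptions.

```python
def dice_coefficient(pred, target):
    """Dice similarity between two flattened binary masks (lists of 0/1)."""
    inter = sum(p * t for p, t in zip(pred, target))
    denom = sum(pred) + sum(target)
    # Assumption: two empty masks are treated as a perfect match.
    return 1.0 if denom == 0 else 2.0 * inter / denom


def f1_score(tp, fp, fn):
    """Instance-level F1 from true-positive, false-positive, false-negative counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)


dice = dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])
f1 = f1_score(tp=8, fp=2, fn=2)
```

In the toy example, the masks overlap in one pixel out of three foreground pixels in total, giving a Dice of 2/3, and 8 correct detections with 2 false positives and 2 misses give precision and recall of 0.8 and therefore an F1 of 0.8.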

5. Results

5.1. All-in-SAM on In-House Multiclass Cell Segmentation Dataset

We present a comprehensive evaluation of the all-in-SAM method applied to multiclass cell segmentation. As shown in Fig. 3 and Table 1, we compare four methods, namely MOCL, SAM-L with tight bounding boxes, SAM-L with random bounding boxes, and all-in-SAM, on the segmentation of podocytes and mesangial cells.

Fig. 3.

Annotation results: this figure presents the annotation accuracy using different methods, notably SAM-L and all-in-SAM, highlighting their effectiveness in various test conditions.

Table 1.

Performance of multiclass cell segmentation (F1-score). The difference between the reference (Ref.) method and benchmarks is statistically evaluated by the Wilcoxon signed rank test.

Method        Annotator  Pod.    Mes.    Pod.    Mes.    Avg. Pod.  p (Pod.)  Avg. Mes.  p (Mes.)
MOCL          Expert     0.7124  0.7022  0.7547  0.6674  0.7321     p<0.05    0.6685     p<0.05
MOCL          Students   0.7198  0.7157  0.7657  0.6830  0.7411     p<0.05    0.7028     p<0.05
SAM-L tight   Expert     0.7105  0.7027  0.7657  0.6683  0.7362     p<0.05    0.6891     p<0.05
SAM-L tight   Students   0.7043  0.7014  0.7390  0.6513  0.7205     p<0.05    0.6817     p<0.05
SAM-L random  Expert     0.7226  0.6994  0.7673  0.6713  0.7434     p<0.05    0.6883     p<0.05
SAM-L random  Students   0.7170  0.7096  0.7565  0.6723  0.7354     p<0.05    0.6949     p<0.05
All-in-SAM    Expert     0.7218  0.7087  0.7699  0.7380  0.7462     Ref.      0.7146     Ref.
All-in-SAM    Students   0.7364  0.6987  0.7421  0.7187  0.7329     p<0.05    0.7137     p<0.05

Note: p<0.05 indicates statistical significance.

The F1-score represents the harmonic mean of precision and recall, two critical indicators of a model’s accuracy. Each entry in the F1-score columns of Table 1 reflects the F1-score calculated from the outcomes achieved by each segmentation method under specific conditions, such as “injured podocyte” and “normal mesangial.”

We conducted a statistical analysis using the Wilcoxon signed rank test to evaluate the performance differences in multiclass cell segmentation F1-scores. Specifically, we used the average podocyte F1-score derived from expert annotations in the all-in-SAM method as the reference value (0.7462). Similarly, the average mesangial F1-score derived from expert annotations in the all-in-SAM method (0.7146) was also used as a reference for mesangial comparisons. For each case, we calculated the differences between these reference values and the average F1-scores obtained from other methods.

The statistical test revealed that all comparisons for both podocytes and mesangial cells yielded p-values<0.05, indicating statistically significant differences between the all-in-SAM method and the other benchmarks. With expert annotations, all-in-SAM achieves the highest average F1-scores for both podocyte (0.7462) and mesangial cell (0.7146) segmentation, and its average scores with student annotations (0.7329 and 0.7137) remain close to these values.

5.2. All-in-SAM on Public Dataset

Building on the insights from the initial results, we extend our analysis to compare all-in-SAM with other leading methods under varying training regimes and annotation strengths. This comparison aims to assess the robustness and adaptability of all-in-SAM in less controlled environments.

Table 2 presents a further comparative analysis involving other state-of-the-art (SOTA) methods such as nnUNet, LViT, and BEDs. The evaluation spans multiple training volumes and annotation qualities, from weak to complete, providing a comprehensive overview of each method’s performance under diverse operational conditions.

Table 2.

Comparison with other SOTA methods when using different numbers of training data with weak or complete annotation.

Label     Method      Training data  Dice    IoU     ADJ
Complete  Xie         All            -       -       0.7063
Complete  Xie         30%            -       -       0.6031
Complete  Xie         10%            -       -       0.5501
Complete  LViT        All            0.8033  0.6724  -
Complete  LViT        25%            0.7994  0.6680  -
Complete  BEDs        All + More     0.8177  -       -
Complete  nnUNet      All            0.8244  0.6976  0.7028
Complete  nnUNet      4%             0.7920  0.6540  0.6580
Complete  All-in-SAM  All            0.8254  0.6974  0.7036
Complete  All-in-SAM  4%             0.8134  0.6810  0.6867
Weak      nnUNet      All            0.8212  0.6935  0.6974
Weak      nnUNet      4%             0.7913  0.6527  0.6562
Weak      All-in-SAM  All            0.8246  0.6973  0.7024
Weak      All-in-SAM  4%             0.8099  0.6760  0.6814

When trained with the entire dataset and complete annotation, nnUNet and all-in-SAM exhibit comparable performance (Dice 0.8244 versus 0.8254), with all-in-SAM marginally ahead on Dice and ADJ. When the training data is reduced to 4%, all-in-SAM outperforms nnUNet on all three metrics (e.g., Dice 0.8134 versus 0.7920). Notably, when weak labels are used for training, all-in-SAM outperforms nnUNet at both data volumes.

5.3. Ablation Studies

In Table 3, we select the two best-performing methods from Table 2, nnUNet and all-in-SAM, and assess their performance using both complete and weak labels across varying percentages of training data. We also report additional metrics, such as recall and precision, to evaluate the models' performance.

Table 3.

Comparison of different methods trained by different numbers of weakly/completely annotated data.

Label     Method      Training data  Dice    AUC     Recall  Precision  bestF1  IoU     ADJ
Complete  nnUNet      All            0.8244  0.9604  0.8282  0.8255     0.8321  0.6976  0.7028
Complete  nnUNet      4%             0.7920  0.9410  0.8490  0.7460     0.7970  0.6540  0.6580
Complete  nnUNet      0.50%          0.7623  0.9249  0.8679  0.6861     0.7797  0.6135  0.6186
Complete  All-in-SAM  All            0.8254  0.9717  0.8474  0.8081     0.8304  0.6974  0.7036
Complete  All-in-SAM  4%             0.8134  0.9550  0.8492  0.7853     0.8190  0.6810  0.6867
Complete  All-in-SAM  0.50%          0.7816  0.9440  0.8457  0.7351     0.7917  0.6379  0.6430
Weak      nnUNet      All            0.8212  0.9565  0.9126  0.7498     0.8368  0.6935  0.6974
Weak      nnUNet      4%             0.7913  0.9474  0.9276  0.6922     0.8154  0.6527  0.6562
Weak      nnUNet      0.50%          0.7500  0.9264  0.9336  0.6199     0.7757  0.5881  0.5918
Weak      All-in-SAM  All            0.8246  0.9732  0.8947  0.7678     0.8339  0.6973  0.7024
Weak      All-in-SAM  4%             0.8099  0.9522  0.8845  0.7509     0.8187  0.6760  0.6814
Weak      All-in-SAM  0.50%          0.7873  0.9430  0.8704  0.7248     0.7982  0.6457  0.6502

When trained with all of the training data, all-in-SAM performs on par with nnUNet, with slightly higher Dice and AUC. Its advantage becomes clearer as the training data is reduced to 4% and 0.5%, where it achieves higher Dice, IoU, and ADJ under both complete and weak labels. In addition, when comparing all-in-SAM's performance between complete and weak labels, it maintains its effectiveness without any substantial drop, indicating its robustness across different annotation qualities.

6. Discussion

The empirical analysis conducted in this study highlights the efficacy of the molecular empowered all-in-SAM model in addressing the challenging task of multiclass cell segmentation within high-resolution WSIs. Our investigation spans a variety of testing scenarios characterized by different levels of annotation completeness and volumes of training data. The results in Tables 1–3 show that the proposed model matches or outperforms current SOTA methods, and they particularly emphasize the robustness of our approach when working with suboptimal annotations. Moreover, the proposed all-in-SAM model maintains high performance under constrained conditions, highlighting its adaptability and readiness for broader clinical adoption, which could significantly impact the future of digital pathology by making high-quality diagnostics more accessible and reliable.

The significance of this research lies not only in its technical contributions but also in its potential to transform pathological workflows. The ability of the molecular empowered all-in-SAM model to reliably and efficiently segment WSIs addresses a critical bottleneck in digital pathology. By automating this traditionally labor-intensive process, the model enables faster, more accurate pathological assessments, which are vital for timely and effective medical diagnoses.

Moreover, this innovation holds particular promise for resource-limited settings where access to expert pathologists and advanced diagnostic tools may be scarce. By enhancing the accessibility and precision of digital pathology, the all-in-SAM model can democratize high-quality diagnostics, bridging gaps in healthcare inequities and improving outcomes on a global scale. Its ability to operate effectively with limited training data and annotations further amplifies its utility in settings with restricted resources, accelerating the pace of diagnostic processes while maintaining reliability.

In conclusion, the molecular empowered all-in-SAM model represents a significant step forward in digital pathology. Its robust performance, adaptability, and accessibility underscore its potential to advance medical diagnostics, streamline workflows, and ultimately contribute to improving healthcare outcomes worldwide. Future work will focus on expanding the dataset and validating the model across diverse clinical and pathological contexts to further enhance its generalizability and impact.

7. Conclusion

In this study, we have introduced the molecular empowered all-in-SAM model, an AI framework designed to address the challenges of multiclass cell segmentation within high-resolution WSIs in computational pathology. By leveraging SAM and MOCL, the framework's key contributions lie in its ability to utilize lay annotators for cost-effective data collection with only weak labels and molecular guided segmentation, employ the SAM adapter for label-efficient fine-tuning, and apply advanced corrective learning techniques for precise segmentation tasks. Through comprehensive evaluations, the proposed all-in-SAM model has demonstrated superior performance while reducing reliance on specialized annotations from domain experts.

Acknowledgments

This research was supported by the Department of Defense [DoD Grant No. HT9425-23-1-0003 (Yang)] and the National Institutes of Health [NIH Grant No. R01DK135597 (Huo)] as well as NIH Grant Nos. R01EB033385, R01DK132338, REB017230, and R01MH12593. This research was supported by the National Science Foundation (NSF Grant No. 2040462) and the NSF National Artificial Intelligence Research Resource (NAIRR) Pilot (Award No. NAIRR240055). The work was also supported by the Vanderbilt Seed Success Grant, the Vanderbilt Discovery Grant, and the VISE Seed Grant. The project was also supported by the Leona M. and Harry B. Helmsley Charitable Trust (Grant Nos. G-1903-03793 and G-2103-05128). We extend our gratitude to NVIDIA for their support by means of the NVIDIA hardware grant.

Biographies

Xueyuan Li is a master’s student in data science at Vanderbilt University. She is advised by Prof. Yuankai Huo at HRLB Lab. Her main research interests are medical image analysis, deep learning, and computer vision.

Biographies of the other authors are not available.


Contributor Information

Xueyuan Li, Email: xueyuan.li@ucsf.edu.

Can Cui, Email: can.cui.1@vanderbilt.edu.

Ruining Deng, Email: r.deng@vanderbilt.edu.

Yucheng Tang, Email: yuchengt@nvidia.com.

Quan Liu, Email: quan.liu@vanderbilt.edu.

Tianyuan Yao, Email: tianyuan.yao@vanderbilt.edu.

Shunxing Bao, Email: shunxing.bao@vanderbilt.edu.

Naweed Chowdhury, Email: naweedc@gmail.com.

Haichun Yang, Email: haichun.yang@vumc.org.

Yuankai Huo, Email: yuankai.huo@vanderbilt.edu.

Disclosures

The authors of the paper have no conflicts of interest to report.

Code and Data Availability

The code used to generate the results and figures is available in a GitHub repository (https://github.com/Xueyuan33/Molecular-Empowered-All-in-SAM). Two datasets were used: an in-house dataset and a public dataset. The in-house dataset that supports the findings of this article is not publicly available due to privacy concerns; it can be requested from the author at lixueyuan69@gmail.com. The public dataset is the MICCAI 2018 MoNuSeg dataset, available at https://doi.org/10.57702/e1r4ueix.




Articles from Journal of Medical Imaging are provided here courtesy of Society of Photo-Optical Instrumentation Engineers
