Scientific Data. 2024 Mar 14;11:295. doi: 10.1038/s41597-024-03117-2

NuInsSeg: A fully annotated dataset for nuclei instance segmentation in H&E-stained histological images

Amirreza Mahbod 1,2, Christine Polak 2, Katharina Feldmann 2, Rumsha Khan 2, Katharina Gelles 2, Georg Dorffner 3, Ramona Woitek 1, Sepideh Hatamikia 1,4, Isabella Ellinger 2
PMCID: PMC10940572  PMID: 38486039

Abstract

In computational pathology, automatic nuclei instance segmentation plays an essential role in whole slide image analysis. While many computerized approaches have been proposed for this task, supervised deep learning (DL) methods have shown superior segmentation performances compared to classical machine learning and image processing techniques. However, these models need fully annotated datasets for training, which are challenging to acquire, especially in the medical domain. In this work, we release one of the largest fully manually annotated datasets of nuclei in Hematoxylin and Eosin (H&E)-stained histological images, called NuInsSeg. This dataset contains 665 image patches with more than 30,000 manually segmented nuclei from 31 human and mouse organs. Moreover, for the first time, we provide additional ambiguous area masks for the entire dataset. These vague areas represent the parts of the images where precise and deterministic manual annotations are impossible, even for human experts. The dataset and detailed step-by-step instructions to generate related segmentation masks are publicly available on the respective repositories.

Subject terms: Computational models, Super-resolution microscopy, Data acquisition, Data processing

Background & Summary

With the advent of brightfield and fluorescent digital scanners that produce and store whole slide images (WSIs) in digital form, there is a growing trend to exploit computerized methods for semi- or fully-automatic WSI analysis1. In digital pathology and biomedical image analysis, nuclei segmentation plays a fundamental role in image interpretation2. Specific nuclei characteristics such as nuclei density or nucleus-to-cytoplasm ratio can be used for cell and tissue identification or for diagnostic purposes such as cancer grading2–4. Nuclei instance segmentation masks enable the extraction of valuable statistics for each nucleus5. While experts can manually segment nuclei, this is a tedious and complex procedure, as thousands of instances can appear in a small patch of a WSI4,6. It is also worth mentioning that due to various artifacts such as folded tissues, out-of-focus scanning, considerable variations of nuclei staining intensities within a single image, and the complex nature of some histological samples (e.g., high density of nuclei), accurate and deterministic manual annotation is not always possible, even for human experts. The inter- and intra-observer variability reported in previous studies, which showed a low level of agreement in the annotation of cell nuclei by medical experts, confirms this general problem5,7.

In recent years, many semi- and fully-automatic computer-based methods have been proposed to perform nuclei instance segmentation automatically and more efficiently. A wide range of approaches, from classical image processing to advanced machine learning methods, have been proposed for this task4,7. Up to this point, supervised deep learning (DL) methods such as Mask R-CNN and its variants8,9, distance-based methods10,11, and multi encoder-decoder approaches6,12,13 have shown the best instance segmentation performances. However, to train these models, fully annotated datasets are required, which are difficult to acquire in the medical domain4,5,14.

A number of fully annotated nuclei instance segmentation datasets are available. These datasets were introduced for various types of staining such as Hematoxylin and Eosin (H&E), immunohistochemical, and immunofluorescence stainings4,15–17. The most common staining type in routine pathology is H&E-staining; therefore, most introduced datasets were based on this staining method. Although these datasets are valuable contributions to the research field and help researchers develop better segmentation models, providing more annotated datasets from different organs and centers to cover more data variability is still of high importance. Table 1 shows the most prominent fully manually annotated H&E-stained nuclei segmentation datasets that have been actively used by the research community in the past few years. Besides these datasets, some semi-automatically generated datasets such as PanNuke18, Lizard (used in the CoNIC challenge)19,20, and the Hou et al. dataset21 have also been introduced in the past. To generate these datasets, various approaches, such as using trained backbone models or point annotation, were exploited20,22,23. However, training models based on semi-automatically generated datasets may introduce a hidden bias towards the reference model instead of learning the true human expert style for nuclei instance segmentation. It is also worth mentioning that other publicly available datasets, such as the OCELOT challenge dataset24 or the DigestPath dataset25, have been introduced for different nuclei analysis tasks, including nuclei detection and classification. However, the primary focus of this study is on the task of nuclei instance segmentation.

Table 1.

Publicly available H&E-stained nuclei segmentation datasets.

dataset | vague mask | # image tiles | # nuclei | magnification | # organs | tile size (pixels) | source
Kumar et al.4 | — | 30 | 21,623 | 40× | 7 | 1000 × 1000 | TCGA
MoNuSeg7 | — | 44 | 28,846 | 40× | 9 | 1000 × 1000 | TCGA
MoNuSAC17 | partial | 209 | 31,411 | 40× | 4 | 81 × 113 to 1422 × 2162 | TCGA
CoNSeP6 | — | 41 | 24,319 | 40× | 1 | 1000 × 1000 | UHCW
CPM-1540 | — | 15 | 2,905 | 40×, 20× | 2 | 400 × 400, 600 × 1000 | TCGA
CPM-1740 | — | 32 | 7,570 | 40×, 20× | 4 | 500 × 500 to 600 × 600 | TCGA
TNBC10 | — | 50 | 4,022 | 40× | 1 | 512 × 512 | Curie Inst.
CRCHisto41 | — | 100 | 29,756 | 20× | 1 | 500 × 500 | UHCW
Janowczyk42 | — | 143 | 12,000 | 40× | 1 | 2000 × 2000 | n/a
Crowdsource43 | — | 64 | 2,532 | 40× | 1 | 400 × 400 | TCGA
CryoNuSeg5 | — | 30 | 7,596 | 40× | 10 | 512 × 512 | TCGA
NuInsSeg26 | full | 665 | 30,698 | 40× | 31 | 512 × 512 | MUV

In the table, TCGA refers to The Cancer Genome Atlas, UHCW refers to University Hospitals Coventry and Warwickshire, and MUV refers to Medical University of Vienna. The last row of the table represents the NuInsSeg dataset introduced in this work.

In this paper, we present NuInsSeg26, one of the most comprehensive, fully manually annotated, publicly available datasets for nuclei segmentation in H&E-stained histological images. The primary statistics of this dataset are presented in the last row of Table 1. Our dataset can be used on its own to develop, test, and evaluate machine learning-based algorithms for nuclei instance segmentation, or it can be used as an independent test set to estimate the generalization capability of already developed nuclei instance segmentation methods27.

Methods

Sample preparation

The NuInsSeg dataset26 contains fully annotated brightfield images for nuclei instance segmentation. The H&E-stained sections of 23 different human tissues were provided by Associate Professor Adolf Ellinger, PhD, from the specimen collection of the Department of Cell Biology and Ultrastructural Research, Center for Anatomy and Cell Biology, Medical University of Vienna. We only obtained the stained tissue sections, not the original tissues. These sections had long been used exclusively for teaching purposes, for which no ethics vote was required. Some of the human tissues were formaldehyde-fixed, embedded in celloidin, and sectioned at ≈15–20 μm (jejunum, kidney, liver, oesophagus, palatine tonsil, pancreas, placenta, salivary gland, spleen, tongue). The other human tissues were formaldehyde-fixed, paraffin-embedded (FFPE), and sectioned at ≈4–5 μm (cerebellum, cerebrum, colon, epiglottis, lung, melanoma, muscle, peritoneum, stomach (cardia), stomach (pylorus), testis, umbilical cord, and urinary bladder). Mouse tissue samples from bone (femur), fat (subscapularis), heart, kidney, liver, muscle (tibialis anterior muscle), spleen, and thymus were obtained from 8-week-old male C57BL/6J mice28. 4 μm sections of the FFPE tissue samples were stained with H&E (ROTH, Austria) and coverslipped with Entellan (Merck, Germany). With one exception (human melanoma), all tissues in our dataset are healthy.

Sample acquisition

WSIs were generated with a TissueFAXS (TissueGnostics, Austria) scanning system composed of an Axio Imager Z1 (Zeiss, Oberkochen, Germany), equipped with a Plan-Neofluar 40×/0.75 objective (40× air) in combination with the TissueFAXS Image Acquisition and Management Software (Version 6.0, TissueGnostics, Austria). Images were acquired at 8-bit resolution using a colour camera (Baumer HXG40c).

Field of view and patch selection

The scanning system stores the individual 2048 × 2048 pixel fields of view (FOVs) together with their respective locations so that they can be combined into a WSI. Instead of using the WSIs, we utilized the FOVs to generate the dataset. A senior cell biologist selected the most representative FOVs for each human and mouse WSI. From each FOV, a 512 × 512 pixel image patch was extracted by central cropping. These patches were saved in the lossless Portable Network Graphics (PNG) format. In total, 665 raw image patches were created to build the NuInsSeg dataset.
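As an illustration, central cropping of a 512 × 512 pixel patch from a 2048 × 2048 pixel FOV can be performed along the following lines. This is only a minimal sketch; the file paths, output naming, and the use of the Pillow library are our assumptions and not part of the original acquisition pipeline.

```python
from pathlib import Path
from PIL import Image


def center_crop_fov(fov_path: str, out_dir: str, size: int = 512) -> Path:
    """Extract the central size x size patch from a FOV image and save it
    losslessly as PNG. Paths and naming here are illustrative only."""
    fov = Image.open(fov_path)
    w, h = fov.size  # expected to be 2048 x 2048 for the scanner FOVs
    left, top = (w - size) // 2, (h - size) // 2
    patch = fov.crop((left, top, left + size, top + size))

    out_path = Path(out_dir) / (Path(fov_path).stem + "_crop.png")
    patch.save(out_path)  # PNG is a lossless format
    return out_path
```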

Generation of ground truth, auxiliary, and ambiguous area segmentation masks

We used ImageJ28 (version 1.53, National Institutes of Health, USA) to generate the ground truth segmentation masks. We followed the same procedure suggested by Mahbod et al.5 to label nuclei. We used the region of interest (ROI) manager tool (available in the Analyze menu) and the freehand selection option to delineate the nuclei borders. We manually drew the border of each instance until all nuclei were segmented for a given image patch. Although semi-automatic tools such as AnnotatorJ with a U-Net backbone29 could have been used to speed up the annotation, we adhered to fully manual segmentation to prevent any hidden bias toward a semi-automatic annotation method. The delineated ROIs were saved as a zip file, and Matlab (version 2020a) was then used to create binary and labeled segmentation images (as PNG files). Besides the original raw image patches and the binary and labeled segmentation masks, we also publish a number of auxiliary segmentation masks that can be useful for developing computer-based segmentation models. These auxiliary masks, including border-removed binary masks, Euclidean distance maps of the nuclei, and weighted binary masks (where higher weights are assigned at the borders of touching objects), are published along with our dataset. The code developed to generate these masks is available in the published GitHub repository. Moreover, we annotated the ambiguous areas in all images of the dataset for the first time. Ambiguous region masks were provided for part of the test set of the MoNuSAC challenge30, but in this work, we provide them for the entire dataset. We used an identical procedure and software to create the ambiguous segmentation masks. These vague areas consist of image parts with very complex appearances where accurate and reliable manual annotation is impossible. They are potentially helpful for in-depth analysis and evaluation of any automatic model for nuclei instance segmentation. Manual segmentation of nuclei and detection of ambiguous areas were performed by three students with a background in cell biology. The annotations were then reviewed by a senior cell biologist and corrected where necessary. Some example images, along with the related segmentation and vague masks, are shown in Fig. 1.
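The sketch below illustrates how such auxiliary masks can be derived from a labeled instance mask. It is only a minimal example using SciPy; the parameter values (w0, sigma) and the exact weighting scheme are assumptions in the spirit of the U-Net weight map, and the actual code used for the dataset is the one in the published GitHub repository.

```python
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt


def auxiliary_masks(labeled, w0=10.0, sigma=5.0):
    """Derive auxiliary masks from a labeled instance mask
    (0 = background, 1..N = nuclei). Returns the binary mask, a
    border-removed binary mask, a per-nucleus Euclidean distance map,
    and a weight map emphasising borders between touching nuclei."""
    labels = np.unique(labeled)
    labels = labels[labels > 0]

    binary = (labeled > 0).astype(np.uint8)

    # Border-removed mask: erode every instance by one pixel so that
    # touching nuclei become separable objects.
    border_removed = np.zeros_like(binary)
    for lab in labels:
        border_removed |= binary_erosion(labeled == lab).astype(np.uint8)

    # Euclidean distance map, normalised to [0, 1] within each nucleus.
    dist_map = np.zeros(labeled.shape, dtype=np.float32)
    for lab in labels:
        d = distance_transform_edt(labeled == lab)
        if d.max() > 0:
            dist_map[labeled == lab] = (d / d.max())[labeled == lab]

    # Weight map in the spirit of the U-Net weighting scheme: background
    # pixels close to two different nuclei receive higher weights.
    weights = np.ones(labeled.shape, dtype=np.float32)
    if len(labels) >= 2:
        dists = np.stack([distance_transform_edt(labeled != lab) for lab in labels])
        d1, d2 = np.sort(dists, axis=0)[:2]  # two smallest distances per pixel
        weights += w0 * np.exp(-((d1 + d2) ** 2) / (2 * sigma**2)) * (labeled == 0)

    return binary, border_removed, dist_map, weights
```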

Fig. 1.

Example images and manual segmentation masks of three human organs from the NuInsSeg dataset. The first three columns show the original image, the labeled mask, and the binary mask, respectively. The images in the fourth to sixth columns show auxiliary segmentation masks that can be beneficial for developing segmentation algorithms. The last column shows the vague areas where accurate and deterministic manual segmentation is impossible. Some images, such as the spleen image in the last row, do not contain ambiguous regions.

Data Records

The NuInsSeg dataset26 is publicly available on Zenodo (10.5281/zenodo.10518968) and on the Kaggle platform (https://www.kaggle.com/datasets/ipateam/nuinsseg). The code to generate the binary, labeled, and auxiliary segmentation masks from the ImageJ ROI files is available in the published NuInsSeg GitHub repository (https://github.com/masih4/NuInsSeg). The dataset contains 665 image patches with 30,698 segmented nuclei from 31 human and mouse organs. The organ-specific details of the generated dataset are shown in Table 2. As shown in the table, the nuclei density in some tissues/organs (e.g., mouse spleen) is much higher than in others (e.g., mouse muscle). This diversity allows future studies to investigate in more depth how automatic models handle different training and testing set sizes for the nuclei instance segmentation task.

Table 2.

Details of the NuInsSeg dataset per human and mouse organ.

Organ Type # Images # Nuclei Avg. #Nuclei per image
Cerebellum human 12 549 45.8
Cerebrum human 12 146 12.2
Colon human 12 349 29.1
Epiglottis human 11 228 20.7
Jejunum human 10 874 87.4
Kidney human 11 1,222 111.1
Liver human 40 1,370 34.3
Lung human 11 318 28.9
Melanoma human 12 533 44.4
Muscle human 9 127 14.1
Oesophagus human 47 2,046 43.5
Palatine tonsil human 12 1,045 87.1
Pancreas human 44 2,178 49.5
Peritoneum human 12 468 39.0
Placenta human 40 1,966 49.2
Salivary gland human 44 3,129 71.1
Spleen human 34 3,286 96.7
Stomach (cardia) human 12 671 55.9
Stomach (pylorus) human 12 441 36.8
Testis human 12 380 31.7
Tongue human 40 1,415 35.4
Umbilical cord human 11 106 9.6
Urinary bladder human 12 400 33.3
Bone (femur) mouse 6 757 126.2
Fat (subscapularis) mouse 42 549 13.1
Heart mouse 28 738 26.4
Kidney mouse 40 1,597 39.9
Liver mouse 36 646 17.9
Muscle (tibialis anterior muscle) mouse 28 165 5.9
Spleen mouse 7 1,657 236.7
Thymus mouse 6 1,342 223.7
All human 472 23,247 49.3
All mouse 193 7,451 38.6
All human + mouse 665 30,698 46.2

Technical Validation

To create a baseline segmentation benchmark, we randomly split the dataset into five folds with an equal number of images per fold (i.e., 133 images per fold). We used the Scikit-learn Python package to create the folds with a fixed random state so that the results can be reproduced (the splitting code is available on the Kaggle and GitHub pages). Based on the created folds, we developed a number of DL-based segmentation models and evaluated their performance with five-fold cross-validation. To facilitate the use of our dataset and the development of segmentation models, we published our code for two standard segmentation models, namely a shallow U-Net and a deep U-Net31, on the Kaggle platform (https://www.kaggle.com/datasets/ipateam/nuinsseg/code?datasetId=1911713). The architectures of the shallow and deep U-Net are very similar to the original U-Net model, but we added dropout layers between all convolutional layers in both the encoder and decoder parts. Four and five convolutional blocks were used in the encoder and decoder parts of the shallow and deep U-Net, respectively. The architectures of these two models are publicly available in our published kernels on the NuInsSeg page on the Kaggle platform. Besides these two models, we also evaluated the performance of the attention U-Net32, residual attention U-Net32,33, two-stage U-Net34, and dual decoder U-Net13 models. The architectural details of these models were published in the respective articles. We performed an identical five-fold cross-validation scheme in all experiments to compare the results. For evaluation, we utilized the Dice similarity score, the aggregated Jaccard index (AJI), and the panoptic quality (PQ) score, as suggested in former studies5,6,35. The segmentation performance of the aforementioned models is reported in Table 3. As the results show, the residual attention U-Net delivers the best overall Dice score among these models, but the dual decoder U-Net provides the best average AJI and PQ scores. Interestingly, the dual decoder model achieved the best overall PQ score in the MoNuSAC post-challenge leaderboard17,36, and it also achieved the best instance-based segmentation scores on the NuInsSeg dataset. It should be noted that these results could potentially be improved by well-known strategies such as ensembling37, stain augmentation27,38, or test-time augmentation39, but achieving the best segmentation scores is beyond the focus of this study. Instead, these results can serve as baseline segmentation scores for comparison with other segmentation models in future work, provided that the same five-fold cross-validation scheme is used.
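For reference, the fold creation can be reproduced along the following lines. This is a minimal sketch only: the folder layout and the random seed shown here are assumptions, and the exact splitting code used for the published benchmark is provided on the Kaggle and GitHub pages.

```python
import glob

from sklearn.model_selection import KFold

# Assumed layout: one sub-folder per organ containing the raw image patches.
image_paths = sorted(glob.glob("tissue images/*/*.png"))

# A fixed random_state makes the folds reproducible; the seed here is illustrative.
kf = KFold(n_splits=5, shuffle=True, random_state=42)

for fold, (train_idx, val_idx) in enumerate(kf.split(image_paths)):
    train_files = [image_paths[i] for i in train_idx]
    val_files = [image_paths[i] for i in val_idx]
    print(f"fold {fold}: {len(train_files)} training / {len(val_files)} validation images")
```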

Table 3.

NuInsSeg segmentation benchmark results based on five-fold cross-validation.

Model Reference # Parameters Avg. Dice (%) Avg. AJI (%) Avg. PQ (%)
Shallow U-Net 31 1.9 million 78.8 50.5 42.7
Deep U-Net 31 7.7 million 79.7 49.4 40.4
Attention U-Net 32 2.3 million 80.5 45.7 36.4
Residual attention U-Net 32,33 2.4 million 81.4 46.2 36.9
Two-stage U-Net 34 3.9 million 76.6 52.8 47.2
Dual decoder U-Net 13 3.5 million 79.4 55.9 51.3
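As an illustration of the semantic-level metric reported in Table 3, a minimal Dice score implementation over binary masks is sketched below. The instance-level AJI and PQ scores require matching predicted and ground-truth instances and are computed with the evaluation code described in the cited studies; they are not reproduced here.

```python
import numpy as np


def dice_score(pred, gt):
    """Dice similarity coefficient between two binary masks (semantic level).
    A minimal sketch for illustration only."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    intersection = np.logical_and(pred, gt).sum()
    denom = pred.sum() + gt.sum()
    return 1.0 if denom == 0 else 2.0 * intersection / denom
```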

Usage Notes

Our dataset, including the raw image patches, the binary and labeled segmentation masks, and the other auxiliary segmentation masks, is publicly available on the published NuInsSeg pages on Zenodo26 and the Kaggle platform. Step-by-step instructions to perform the manual annotations and the related code to generate the main and auxiliary segmentation masks are available in our published GitHub repository. We also provide three kernels on the Kaggle platform to facilitate the use of our dataset. One kernel is devoted to exploratory data analysis (EDA), where interested researchers can visualize and explore different statistics of the NuInsSeg dataset. The other two kernels contain the code to perform five-fold cross-validation with the two DL-based models, namely the shallow U-Net and the deep U-Net, described in the previous section. Different Python packages were used in these kernels. To report statistics and visualize data in the EDA kernel, we mainly used the Pandas (version 1.3.5) and Matplotlib (version 3.5.1) Python packages. For the DL-based model development, we mainly used the TensorFlow (version 2.6.2) and Keras (version 2.6.0) frameworks, and for cross-validation, pre- and post-processing, and augmentation, we used Scikit-learn (version 0.23.2), Scikit-image (version 0.19.1), and Albumentations (version 1.1.0), respectively.
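As an example of the augmentation step, a minimal Albumentations pipeline applied jointly to an image patch and its segmentation mask might look as follows. The specific transforms and probabilities are illustrative assumptions and not necessarily those used in the published kernels.

```python
import albumentations as A

# Spatial augmentations are applied identically to the image and its mask.
augment = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
    A.RandomRotate90(p=0.5),
    A.ShiftScaleRotate(shift_limit=0.05, scale_limit=0.1, rotate_limit=15, p=0.5),
])

# Usage: augmented = augment(image=image, mask=mask)
#        image_aug, mask_aug = augmented["image"], augmented["mask"]
```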

In addition to publishing the dataset on a well-known repository (i.e., Zenodo26), we have also made it available on the Kaggle platform, which offers limited free computational resources. Therefore, interested researchers can directly access our dataset and develop ML- or DL-based algorithms to perform nuclei instance segmentation on the NuInsSeg dataset. However, there is no limitation to downloading and saving the dataset on local systems and performing analysis using local or other cloud-based computational resources.

It is worth mentioning that the NuInsSeg dataset can be used alone to train, validate, and test any segmentation algorithm, or it can be used as an independent test set to measure the generalization capability of already developed segmentation models.

Acknowledgements

This work was supported by the Austrian Research Promotion Agency (FFG), No. 872636. We would like to thank NVIDIA for their generous GPU donation and the TissueGnostics support team (https://www.tissuegnostics.com/) for their valuable advice to generate the NuInsSeg dataset. Moreover, we would like to thank Adolf Ellinger (MedUni Vienna) for providing the human tissue sections and Peter Pietschmann (MedUni Vienna) who provided the mouse samples.

Author contributions

A.M. and I.E. conceptualized the paper idea, K.G. prepared the H&E-stained mouse sections and scanned all tissue sections, A.M., C.P., R.K., K.F., and I.E. performed annotations and controlled the segmentation masks, I.E. obtained funding, A.M. conducted the experiments and reported the results, and G.D., S.H., R.W., and I.E. supervised the entire work. All authors reviewed the manuscript.

Code availability

The dataset is publicly available on Zenodo26 and on the Kaggle platform (https://www.kaggle.com/datasets/ipateam/nuinsseg), and the code required to generate the segmentation masks is available on GitHub (https://github.com/masih4/NuInsSeg).

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Cui M, Zhang DY. Artificial intelligence and computational pathology. Lab. Invest. 2021;101:412–422. doi: 10.1038/s41374-020-00514-0.
2. Skinner BM, Johnson EE. Nuclear morphologies: their diversity and functional relevance. Chromosoma. 2017;126:195–212. doi: 10.1007/s00412-016-0614-5.
3. Chan JKC. The wonderful colors of the hematoxylin-eosin stain in diagnostic surgical pathology. International Journal of Surgical Pathology. 2014;22:12–32. doi: 10.1177/1066896913517939.
4. Kumar N, et al. A dataset and a technique for generalized nuclear segmentation for computational pathology. IEEE Trans. Med. Imag. 2017;36:1550–1560. doi: 10.1109/TMI.2017.2677499.
5. Mahbod A, et al. CryoNuSeg: A dataset for nuclei instance segmentation of cryosectioned H&E-stained histological images. Comput. Biol. Med. 2021;132:104349. doi: 10.1016/j.compbiomed.2021.104349.
6. Graham S, et al. Hover-Net: Simultaneous segmentation and classification of nuclei in multi-tissue histology images. Med. Image Anal. 2019;58:101563. doi: 10.1016/j.media.2019.101563.
7. Kumar N, et al. A multi-organ nucleus segmentation challenge. IEEE Trans. Med. Imag. 2020;39:1380–1391. doi: 10.1109/TMI.2019.2947628.
8. He, K., Gkioxari, G., Dollár, P. & Girshick, R. Mask R-CNN. In International Conference on Computer Vision, 2980–2988 (2017).
9. Bancher B, Mahbod A, Ellinger I, Ecker R, Dorffner G. Improving Mask R-CNN for nuclei instance segmentation in hematoxylin & eosin-stained histological images. MICCAI Workshop on Computational Pathology. 2021;156:20–35.
10. Naylor P, Laé M, Reyal F, Walter T. Segmentation of nuclei in histopathology images by deep regression of the distance map. IEEE Trans. Med. Imag. 2019;38:448–459. doi: 10.1109/TMI.2018.2865709.
11. Naylor, P., Laé, M., Reyal, F. & Walter, T. Nuclei segmentation in histopathology images using deep neural networks. In IEEE International Symposium on Biomedical Imaging, 933–936, 10.1109/ISBI.2017.7950669 (2017).
12. Zhao B, et al. Triple U-net: Hematoxylin-aware nuclei segmentation with progressive dense feature aggregation. Med. Image Anal. 2020;65:101786. doi: 10.1016/j.media.2020.101786.
13. Mahbod, A. et al. A dual decoder U-Net-based model for nuclei instance segmentation in hematoxylin and eosin-stained histological images. Frontiers in Medicine 9, 10.3389/fmed.2022.978146 (2022).
14. Mahmood F, Chen R, Durr NJ. Unsupervised reverse domain adaptation for synthetic medical images via adversarial training. IEEE Trans. Med. Imag. 2018;37:2572–2581. doi: 10.1109/TMI.2018.2842767.
15. Kromp F, et al. An annotated fluorescence image dataset for training nuclear segmentation methods. Sci. Data. 2020;7:1–8. doi: 10.1038/s41597-020-00608-w.
16. Mahbod, A. et al. Investigating the impact of the bit depth of fluorescence-stained images on the performance of deep learning-based nuclei instance segmentation. Diagnostics 11, 10.3390/diagnostics11060967 (2021).
17. Verma, R. et al. MoNuSAC2020: A multi-organ nuclei segmentation and classification challenge. IEEE Trans. Med. Imag. 10.1109/TMI.2021.3085712 (2021).
18. Gamper, J., Alemi Koohbanani, N., Benet, K., Khuram, A. & Rajpoot, N. PanNuke: An open pan-cancer histology dataset for nuclei instance segmentation and classification. In Reyes-Aldasoro, C. C., Janowczyk, A., Veta, M., Bankhead, P. & Sirinukunwattana, K. (eds.) Digital Pathology, 11–19, 10.1007/978-3-030-23937-4_2 (2019).
19. Graham, S. et al. Lizard: A large-scale dataset for colonic nuclear instance segmentation and classification. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, 684–693 (2021).
20. Graham S, et al. CoNIC challenge: Pushing the frontiers of nuclear detection, segmentation, classification and counting. Med. Image Anal. 2024;92:103047. doi: 10.1016/j.media.2023.103047.
21. Hou L, et al. Dataset of segmented nuclei in hematoxylin and eosin stained histopathology images of ten cancer types. Sci. Data. 2020;7:1–12. doi: 10.1038/s41597-020-0528-1.
22. Lin Y, et al. Nuclei segmentation with point annotations from pathology images via self-supervised learning and co-training. Med. Image Anal. 2023;89:102933. doi: 10.1016/j.media.2023.102933.
23. Alemi Koohbanani N, Jahanifar M, Zamani Tajadin N, Rajpoot N. NuClick: A deep learning framework for interactive segmentation of microscopic images. Med. Image Anal. 2020;65:101771. doi: 10.1016/j.media.2020.101771.
24. Ryu, J. et al. OCELOT: Overlapped cell on tissue dataset for histopathology. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 23902–23912 (2023).
25. Da Q, et al. DigestPath: A benchmark dataset with challenge review for the pathological detection and segmentation of digestive-system. Med. Image Anal. 2022;80:102485. doi: 10.1016/j.media.2022.102485.
26. Mahbod A. NuInsSeg: A fully annotated dataset for nuclei instance segmentation in H&E-stained histological images. Zenodo, 10.5281/zenodo.10518968 (2024).
27. Mahbod A, Dorffner G, Ellinger I, Woitek R, Hatamikia S. Improving generalization capability of deep learning-based nuclei instance segmentation by non-deterministic train time and deterministic test time stain normalization. Comput. Struct. Biotechnol. J. 2024;23:669–678. doi: 10.1016/j.csbj.2023.12.042.
28. Schindelin J, et al. Fiji: an open-source platform for biological-image analysis. Nat. Methods. 2012;9:676. doi: 10.1038/nmeth.2019.
29. Hollandi R, Diósdi A, Hollandi G, Moshkov N, Horváth P. AnnotatorJ: an ImageJ plugin to ease hand annotation of cellular compartments. Mol. Biol. Cell. 2020;31:2179–2186. doi: 10.1091/mbc.E20-02-0156.
30. Verma R, et al. Author’s reply to “MoNuSAC2020: A multi-organ nuclei segmentation and classification challenge”. IEEE Trans. Med. Imag. 2022;41:1000–1003. doi: 10.1109/TMI.2022.3157048.
31. Ronneberger, O., Fischer, P. & Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, 234–241, 10.1007/978-3-319-24574-4_28 (2015).
32. Oktay, O. et al. Attention U-Net: Learning where to look for the pancreas. arXiv preprint arXiv:1804.03999 (2018).
33. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition, 770–778, 10.1109/CVPR.2016.90 (2016).
34. Mahbod, A. et al. A two-stage U-Net algorithm for segmentation of nuclei in H&E-stained tissues. In European Congress on Digital Pathology, 75–82, 10.1007/978-3-030-23937-4_9 (2019).
35. Kirillov, A., He, K., Girshick, R., Rother, C. & Dollar, P. Panoptic segmentation. In Conference on Computer Vision and Pattern Recognition, 9404–9413 (2019).
36. Foucart A, Debeir O, Decaestecker C. Comments on “MoNuSAC2020: A multi-organ nuclei segmentation and classification challenge”. IEEE Trans. Med. Imag. 2022;41:997–999. doi: 10.1109/TMI.2022.3156023.
37. Mahbod, A., Schaefer, G., Ecker, R. & Ellinger, I. Pollen grain microscopic image classification using an ensemble of fine-tuned deep convolutional neural networks. In International Conference on Pattern Recognition, 344–356, 10.1007/978-3-030-68763-2_26 (2021).
38. Li, F., Hu, Z., Chen, W. & Kak, A. A Laplacian pyramid based generative H&E stain augmentation network. IEEE Trans. Med. Imag. 10.1109/TMI.2023.3317239 (2023).
39. Wang, C. et al. FUSeg: The foot ulcer segmentation challenge. Information 15(3), 140, 10.3390/info15030140 (2024).
40. Vu QD, et al. Methods for segmentation and classification of digital microscopy tissue images. Front. Bioeng. Biotechnol. 2019;7:53. doi: 10.3389/fbioe.2019.00053.
41. Sirinukunwattana K, et al. Locality sensitive deep learning for detection and classification of nuclei in routine colon cancer histology images. IEEE Trans. Med. Imag. 2016;35:1196–1206. doi: 10.1109/TMI.2016.2525803.
42. Janowczyk A, Madabhushi A. Deep learning for digital pathology image analysis: A comprehensive tutorial with selected use cases. Journal of Pathology Informatics. 2016;7:29. doi: 10.4103/2153-3539.186902.
43. Irshad, H. et al. Crowdsourcing image annotation for nucleus detection and segmentation in computational pathology: evaluating experts, automated methods, and the crowd. In Pacific Symposium on Biocomputing, 294–305, 10.1142/9789814644730_0029 (2014).


