Commun Biol. 2022 Jul 9;5:688. doi: 10.1038/s42003-022-03634-z

Table 1.

Overview of the deep learning models used in this study.
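(Minimal, illustrative code sketches of each model's core idea are provided below the table.)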

Network | Description
U-Net | The U-Net, an encoder-decoder type of convolutional neural network (CNN), was first proposed by Olaf Ronneberger and colleagues to segment microscopy images [41]. It represents a milestone in the field of computer vision (CV) and, particularly, in bioimage analysis. The subdivision into two parts, the encoder and the decoder, is the main difference from earlier CNNs. The encoder extracts features at different scales by successively processing and downscaling the input image; through this process, the content of the input image is projected into an increasingly abstract feature space (i.e., feature encoding). These features are then upscaled, processed again and synthesised until an image of the same size as the original is reached, containing only the information of interest (i.e., feature decoding). During decoding, skip connections concatenate the feature maps of the corresponding encoder level with the decoder features, allowing the output to be better adjusted to the initial image.
CARE | Content-aware image restoration (CARE) is a supervised DL-based image-processing workflow developed by Weigert et al. for image restoration [14]. It uses a U-Net as its backbone architecture, with the training configuration modified so that the network retains intensity differences instead of producing probability masks for segmentation. CARE's main applications are image denoising and resolution enhancement, both in 2D and 3D. CARE is accessible through the CSBDeep toolbox, which also allows trained models to be deployed in Fiji.
StarDist | StarDist was developed by Schmidt et al. for the supervised segmentation of star-convex objects (e.g., ellipse-like shapes) in 2D and 3D [9]. StarDist uses a U-Net-like CNN to detect the centroid of each object in an image together with the distances from that centroid to the object boundary along a fixed set of directions. These features allow each object boundary to be represented as a unique polygon, which is then used to obtain the object segmentation. By treating each detected object independently, StarDist achieves excellent performance at high object densities. The StarDist Python package is optimised for fast and reliable training and prediction. StarDist is available as Fiji and Napari plugins and is also integrated into QuPath [85]. The StarDist software suite ships with pretrained models for the segmentation of cell nuclei in fluorescence and histology images.
SplineDist | SplineDist was developed by the Uhlmann group and extends StarDist to detect objects that are not star-convex [42]. This is achieved by substituting splines for the polygons, which enables the reconstruction of more complex structures than ellipse-like shapes. SplineDist comes with a Python package for training and deploying its models.
pix2pix | The supervised pix2pix, developed in the lab of Alexei Efros, belongs to the class of generative adversarial networks (GANs) [18]. Two separate networks, a U-Net-like generator and a convolutional discriminator, are trained in parallel to perform image-to-image translation (e.g., converting daylight photos to night scenes, or DIC to fluorescence images). The generator performs the image-translation task, while the discriminator tries to distinguish the predicted image from the ground truth. Model training is considered successful when the discriminator can no longer tell the generated image apart from the original. In microscopy, pix2pix has been employed to generate super-resolved images from sparse PALM data (ANNA-PALM) [86] or to convert low-resolution images (widefield, confocal) into high-resolution ones (confocal, STED) [16].
Noise2Void | Noise2Void is a self-supervised network for image denoising, proposed for microscopy by the Jug lab [15]. The approach exploits the fact that noise varies independently from pixel to pixel, whereas the underlying signal does not: during training, a small fraction of the image's pixels is masked, and the network learns to predict their values from the surrounding context, which enables it to denoise the entire image without clean ground truth. Training and prediction are fast. Noise2Void is also part of the CSBDeep toolbox and, in this particular case, both training and deployment are available in Fiji.
YOLOv2 | YOLOv2 was developed by Redmon and Farhadi for the supervised (real-time) detection and classification of objects in images [12]. Training requires paired data consisting of images and corresponding bounding boxes (i.e., rectangles drawn around the objects of interest). For fast performance, YOLOv2 divides each image into a grid of cells, where each cell can contain the centroid of only a single object.
fnet | fnet was developed by the group of Gregory Johnson for the artificial labelling of bright-field/transmitted-light images [20]. In the original work, fnet generated pseudo-fluorescence images of different organelles from individual image stacks, increasing multiplexing capability and reducing phototoxicity. As a U-Net-type network, it is trained in a supervised manner and requires paired bright-field/transmitted-light and fluorescence images as input. Although fnet was originally designed for 3D images, we deploy a variant that can be used for the artificial labelling of 2D images.
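
To make the encoder-decoder idea from the U-Net row concrete, here is a minimal two-level U-Net sketch in PyTorch. It is illustrative only (the published architecture is deeper and adds further refinements): the encoder extracts and downscales features, the bottleneck processes them at the coarsest scale, and a skip connection concatenates encoder features into the decoder before the final projection.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    def __init__(self, c_in=1, c_out=1, base=16):
        super().__init__()
        self.enc = conv_block(c_in, base)        # encoder: feature extraction
        self.down = nn.MaxPool2d(2)              # downscale by a factor of 2
        self.mid = conv_block(base, base * 2)    # bottleneck at the coarsest scale
        self.up = nn.ConvTranspose2d(base * 2, base, 2, stride=2)  # upscale
        self.dec = conv_block(base * 2, base)    # decoder: skip + upsampled features
        self.head = nn.Conv2d(base, c_out, 1)    # project to the output channels

    def forward(self, x):
        e = self.enc(x)
        m = self.mid(self.down(e))
        u = self.up(m)
        d = self.dec(torch.cat([e, u], dim=1))   # the skip connection
        return self.head(d)

x = torch.randn(1, 1, 64, 64)                    # (batch, channels, H, W)
print(TinyUNet()(x).shape)                       # -> torch.Size([1, 1, 64, 64])
```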
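
For CARE, the decisive difference from a segmentation U-Net is the objective: the network regresses real image intensities from matched (low-quality, high-quality) pairs instead of predicting probability masks. Below is a minimal sketch of that objective, with a small stand-in network and random tensors in place of CARE's actual backbone and data pipeline.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Small stand-in for the U-Net backbone; the point here is the loss, not depth.
net = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(16, 1, 3, padding=1))
opt = torch.optim.Adam(net.parameters(), lr=4e-4)

noisy = torch.randn(8, 1, 64, 64)    # stand-in low-SNR acquisitions
clean = torch.randn(8, 1, 64, 64)    # matched high-SNR ground truth

pred = net(noisy)
loss = F.l1_loss(pred, clean)        # regress intensities: no sigmoid, no masks
opt.zero_grad(); loss.backward(); opt.step()
```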
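
For StarDist, the sketch below shows how the star-convex parameterisation turns a prediction (an object centroid plus distances along a fixed set of rays) into a polygon. The numbers are invented for illustration; this is geometry only, not the StarDist package API.

```python
import numpy as np

n_rays = 8
center = np.array([50.0, 40.0])                      # (y, x) object centroid
dist = np.array([12, 14, 15, 13, 11, 10, 9, 11.0])   # predicted ray lengths

angles = 2 * np.pi * np.arange(n_rays) / n_rays      # fixed ray directions
polygon = np.stack([center[0] + dist * np.sin(angles),
                    center[1] + dist * np.cos(angles)], axis=-1)
print(polygon.round(1))   # (n_rays, 2) boundary vertices of one object
```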
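
For SplineDist, a closed spline through a handful of control points illustrates why splines can represent outlines that fixed-ray polygons cannot. Note the caveat: SplineDist learns the control points, whereas this SciPy sketch merely fits a smooth closed curve through hand-picked ones.

```python
import numpy as np
from scipy.interpolate import splprep, splev

# Hand-picked control points forming a closed loop (first point repeated last).
pts = np.array([[0, 0], [2, 1], [3, 3], [1, 4], [-1, 2], [0, 0.0]])

tck, _ = splprep(pts.T, s=0, per=True)   # periodic (closed) cubic B-spline
u = np.linspace(0, 1, 100)
x, y = splev(u, tck)                     # 100 points on a smooth closed contour
print(x[:3], y[:3])
```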
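
For pix2pix, a heavily simplified single training step shows the adversarial setup. Tiny stand-in networks replace the real generator and discriminator, and random tensors stand in for paired images; the L1 term with weight 100 mirrors the reconstruction loss of the original paper, but everything else here is illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

G = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(8, 1, 3, padding=1))            # stand-in generator
D = nn.Sequential(nn.Conv2d(2, 8, 3, stride=2, padding=1), nn.ReLU(),
                  nn.Conv2d(8, 1, 3, padding=1))            # stand-in discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

src = torch.randn(4, 1, 32, 32)   # e.g. DIC input
tgt = torch.randn(4, 1, 32, 32)   # matched fluorescence target

# Discriminator step: real (src, tgt) pairs -> 1, generated pairs -> 0.
fake = G(src).detach()
d_real = D(torch.cat([src, tgt], 1))
d_fake = D(torch.cat([src, fake], 1))
loss_d = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
          + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: fool the discriminator while staying close to the target.
fake = G(src)
d_fake = D(torch.cat([src, fake], 1))
loss_g = (F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
          + 100 * F.l1_loss(fake, tgt))   # L1 weight 100 as in the pix2pix paper
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```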
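
For Noise2Void, the core blind-spot trick fits in a few lines of NumPy: hide a small set of pixels by replacing each with a random neighbour, then train the network so that its output matches the original noisy values at exactly those positions. The snippet only shows the masking; the network and training loop are omitted, and the scheme is a simplified version of the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.normal(size=(64, 64)).astype(np.float32)   # a noisy training patch

n_mask = 32                                          # blind-spot pixels per patch
ys = rng.integers(1, 63, n_mask)
xs = rng.integers(1, 63, n_mask)

masked = img.copy()
for y, x in zip(ys, xs):
    dy, dx = 0, 0
    while dy == 0 and dx == 0:                       # exclude the pixel itself
        dy, dx = rng.integers(-1, 2, 2)
    masked[y, x] = img[y + dy, x + dx]               # replace with a neighbour

# Training would minimise e.g. ((net(masked) - img)[ys, xs] ** 2).mean(),
# i.e. the loss is evaluated only at the masked positions, so the network
# cannot learn the identity mapping and must infer the signal from context.
```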
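
For YOLOv2, a toy computation shows how a ground-truth box is assigned to the grid cell that contains its centroid, which is why each cell can own at most one object centre. The 416-pixel input size and the 13 x 13 grid are YOLOv2's defaults; the box coordinates are invented.

```python
S = 13                                   # grid size (13 x 13 for a 416 px input)
img_w = img_h = 416                      # YOLOv2's default input resolution

x1, y1, x2, y2 = 120, 200, 180, 260      # an invented ground-truth box (pixels)
cx, cy = (x1 + x2) / 2, (y1 + y2) / 2    # box centroid

col = int(cx / img_w * S)                # grid column owning this object
row = int(cy / img_h * S)                # grid row owning this object
print(row, col)                          # -> 7 4
```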
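
For fnet, the training objective is the same supervised pixelwise regression as in the CARE sketch above, except that the input is a transmitted-light stack and the target a fluorescence stack, typically in 3D. A minimal shape-level sketch with a stand-in 3D network in place of fnet's actual architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in 3D network; fnet's actual model is a 3D U-Net-type architecture.
net = nn.Sequential(nn.Conv3d(1, 8, 3, padding=1), nn.ReLU(),
                    nn.Conv3d(8, 1, 3, padding=1))

bright = torch.randn(2, 1, 16, 64, 64)   # (batch, channel, Z, Y, X) stacks
fluo = torch.randn(2, 1, 16, 64, 64)     # matched fluorescence ground truth

loss = F.mse_loss(net(bright), fluo)     # pixelwise regression to pseudo-labels
print(loss.item())
```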