PLOS One. 2024 Mar 11;19(3):e0297356. doi: 10.1371/journal.pone.0297356

Automatic detection of cell-cycle stages using recurrent neural networks

Abin Jose 1,*,#, Rijo Roy 1,#, Daniel Moreno-Andrés 2, Johannes Stegmaier 1,*
Editor: Xiao Luo 3
PMCID: PMC10927108  PMID: 38466708

Abstract

Mitosis is the process by which eukaryotic cells divide to produce two similar daughter cells with identical genetic material. Research into the process of mitosis is therefore of critical importance both for the basic understanding of cell biology and for the clinical approach to manifold pathologies resulting from its malfunctioning, including cancer. In this paper, we propose an approach to study mitotic progression automatically using deep learning. We used neural networks to predict different mitosis stages. We extracted video sequences of cells undergoing division and trained a Recurrent Neural Network (RNN) to extract image features that incorporate information from previous frames. The RNN-based approach gave better performance than classifier-based feature extraction methods, which do not use time information. Evaluation of precision, recall, and F-score indicates the superiority of the proposed model compared to the baseline. To study the loss in performance due to confusion between adjacent classes, we also plotted the confusion matrix. In addition, we visualized the feature space to understand why RNNs are better at classifying the mitosis stages than other classifier models. The visualization showed the formation of strong clusters for the different classes, confirming the advantage of the proposed RNN-based approach.

Introduction

Chromatin segregation errors during mitosis are a source of chromosomal instability and a hallmark of diverse pathologies [1, 2], including cancer [3, 4]. Studying the highly dynamic process of mitosis in detail, as well as the fate of chromosomes during cell division, is therefore of great importance for both basic and clinical research. The study of mitotic cytology and its pathological phenotypes dates back more than a century to the pioneering work of Walther Flemming [5]. However, the quantitative study of morphological parameters in fixed samples [6–8] or time-lapse images of living cells has recently emerged as a powerful tool to elucidate underlying molecular mechanisms [9–11]. In the pursuit of less biased and less time-consuming quantitative approaches, an immense effort is being made to generalise algorithms capable of automatically segmenting, tracking, extracting and quantifying different features of mitotic events in time-lapse microscopy image sequences. Here, a sequence denotes a series of frames. Such tools for unbiased and comprehensive cytological analysis reflecting natural and pathological changes in different phases of the cell-cycle are urgently needed in both basic and translational research [12]. Technological advances in machine learning with deep neural networks can potentially support biological and clinical researchers. For example, these automated methods could give a faster and more precise analysis of cell behaviors under different drug treatments [13, 14].

Related work

State-of-the-art methods for identifying cell-cycle stages in time-lapse microscopy recordings extract image features, and expert biologists then train machine learning algorithms on them to generate cytological classifiers. For example, CellCognition [10] annotates time-resolved mitosis stages in live-cell image sequences. The high similarity between images in some stages of mitosis, together with the smooth transitions between stages, introduces high classification noise at the state transitions. CellCognition demonstrates that incorporating time information into the annotation can reduce this classification noise and the confusion between images of similar morphology. In this approach, an object detection method is first applied to locate each cell: local adaptive thresholding [15] with watershed split-and-merge error correction [16] detects individual cells with high accuracy. Then, for each object, features describing texture and shape are calculated. On these features, a supervised machine learning classifier is combined with a Hidden Markov Model (HMM) to reduce misclassification by incorporating time information. Even though this method performs well, some of the mitosis stages are wrongly classified, owing to the high degree of similarity between some classes and to the fact that some classes have fewer training samples than others. The authors assumed that the state of the cell at a given time point depends on the previous state and used an HMM [17] for error correction. All parameters of the HMM, such as prior probabilities, transition probabilities, and prediction probabilities, are derived automatically from the output of a Support Vector Machine classifier. The overall maximum likelihood path for the sequence is then derived with the Viterbi algorithm [18]. The HMM increased the overall accuracy of the model by eliminating misclassifications at stage transitions.
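The HMM error-correction step can be illustrated with a minimal Viterbi sketch. The three states, the transition matrix, the prior and the per-frame classifier probabilities below are made-up values chosen for illustration; they are not the parameters that CellCognition derives from its classifier output, and the function name `viterbi` is our own.

```python
import math


def viterbi(emission_probs, transition, prior):
    """Return the most likely state path given per-frame class probabilities.

    emission_probs: list of per-frame lists, one probability per state
    transition:     transition[i][j] = P(state j at t+1 | state i at t)
    prior:          prior[i] = P(state i at t = 0)
    """
    n_states = len(prior)

    def log(p):
        # Forbidden transitions (probability 0) get a score of -infinity.
        return math.log(p) if p > 0 else float("-inf")

    score = [log(prior[s]) + log(emission_probs[0][s]) for s in range(n_states)]
    backpointers = []
    for frame in emission_probs[1:]:
        new_score, pointers = [], []
        for j in range(n_states):
            best_i = max(range(n_states),
                         key=lambda i: score[i] + log(transition[i][j]))
            new_score.append(score[best_i] + log(transition[best_i][j]) + log(frame[j]))
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)

    # Backtrack from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical 3-state cycle: 0 = interphase, 1 = mitosis, 2 = post-mitosis.
transition = [[0.8, 0.2, 0.0],
              [0.0, 0.8, 0.2],
              [0.2, 0.0, 0.8]]
prior = [1.0, 0.0, 0.0]
# The second frame is noisy: a frame-wise argmax would jump to state 2 and back.
frames = [[0.8, 0.1, 0.1],
          [0.3, 0.3, 0.4],
          [0.7, 0.2, 0.1],
          [0.1, 0.8, 0.1],
          [0.1, 0.8, 0.1]]

framewise = [max(range(3), key=f.__getitem__) for f in frames]
path = viterbi(frames, transition, prior)
print(framewise)  # [0, 2, 0, 1, 1]
print(path)       # [0, 0, 0, 1, 1]
```

In this toy case the invalid jump from interphase to post-mitosis and back is removed because the transition matrix assigns it zero probability.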

In 2012, Zhong et al. [19] proposed an unsupervised method for identifying the stages of mitosis from images captured using time-lapse microscopy. Their paper introduces a clustering algorithm based on a temporally-constrained combinatorial clustering (TC3) method as a module in the CellCognition [10] software. This approach uses features extracted from time-lapse microscopy images of human tissue culture cells (HeLa ‘Kyoto’ cells) and classifies the cells into interphase and five stages of mitosis: prophase, prometaphase, metaphase, anaphase, and telophase. The CellCognition software calculates the synchronized time series of cell features based on the shape and texture of the tracked cells over time. The authors then reduce the dimensionality of these features using principal component analysis (PCA) [20] and use the reduced data as the starting point of the TC3 algorithm. For each cell trajectory, the TC3 algorithm clusters temporally linked features into a user-defined number of classes. The authors first used a binary clustering algorithm to divide the PCA features into 3 clusters based on mitotic subgraph properties and then used the TC3 algorithm to cluster these into subclasses. For each subcluster, the TC3 algorithm performs an exhaustive search over all possible combinations of labels. Within each set, it calculates the distance measures of features assigned to the same class of labels and selects the set of labels with the smallest distance measure as the final output. On the set of features used for this dataset, the TC3 algorithm gives results similar to the user annotations. The authors also state that the performance of the model can be increased by using the TC3 results to initialize a Gaussian Mixture Model (GMM), and further by extending the GMM results to an HMM. The resulting model predicted cell behaviors closer to the user annotations.
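The idea of an exhaustive, temporally constrained search can be sketched as follows. This is a simplified illustration under our own assumptions (one scalar feature per frame, classes that must appear as contiguous runs in time, squared distance to the segment mean as the distance measure); it is not the TC3 implementation from [19], and the function names are hypothetical.

```python
from itertools import combinations


def within_segment_cost(values):
    """Sum of squared distances to the segment mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_temporal_segmentation(features, n_classes):
    """Try every way to cut the sequence into n_classes contiguous runs
    and return the labeling with the smallest total within-class cost."""
    n = len(features)
    best_cost, best_labels = float("inf"), None
    for cuts in combinations(range(1, n), n_classes - 1):
        bounds = (0,) + cuts + (n,)
        segments = list(zip(bounds, bounds[1:]))
        cost = sum(within_segment_cost(features[a:b]) for a, b in segments)
        if cost < best_cost:
            best_cost = cost
            best_labels = [k for k, (a, b) in enumerate(segments)
                           for _ in range(b - a)]
    return best_labels


# Made-up one-dimensional feature trajectory of a single cell.
trajectory = [1.0, 1.1, 0.9, 5.0, 5.2, 9.0, 9.1]
labels = best_temporal_segmentation(trajectory, n_classes=3)
print(labels)  # [0, 0, 0, 1, 1, 2, 2]
```

The exhaustive enumeration is only practical for short trajectories and few classes, which is one reason a sketch like this should not be read as a scalable method.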

LiveCellMiner [11] is an open-source tool developed to analyse live-cell time-lapse recordings obtained on different microscopy platforms in a quantitative and unbiased manner. This software tool can track single-cell fates and extracts, analyses, and visualises biologically relevant image features from 2D+t microscopy images. The study of human cells passing through mitosis under different experimental conditions served as the proof-of-principle application of this tool. Its functionalities include fully automatic segmentation and tracking of cells as well as the extraction of quantitative features of the tracked cells. The tool is available as an extension package in the MATLAB toolbox SciXMiner [21]. Its object detection functionality is used to locate the cell nuclei in an image containing many cells and then crop a square region around each cell. One of the datasets used in this paper was extracted with this tool. Segmentation labels for each cell are extracted using a modified version of Otsu’s method [22]. The cell trajectory synchronization application of LiveCellMiner is used to identify the duration of different mitotic stages. With this tool, mitosis as well as wrongly detected trajectories can be identified automatically. The default settings for mitotic analysis in LiveCellMiner divide the cell-cycle into three classes: interphase, early mitosis (prophase to the end of metaphase), and late mitosis (anaphase, telophase and until G1, when the nucleus recovers to interphase). LiveCellMiner offers three different methods for synchronizing cell trajectories. The first method is the TC3 clustering method [19]. In this method, classical image features such as the area, circularity, and intensity of each cell are calculated, and clusters with minimum within-class variance are extracted to detect the interphase-to-prophase transition. The metaphase-to-anaphase transition is obtained from the tracking, using a user-defined distance between sister chromatin masses to identify anaphase onset. The second method builds on the first and additionally uses trainable LSTM [23] networks, which evaluate each trajectory as a whole and identify erroneous trajectories. The third method uses Convolutional Neural Network (CNN) features extracted with GoogleNet and LSTMs to predict the state sequence for all time points. These predictions are post-processed with an HMM that allows only valid state transitions, and the most likely sequence of stages is then identified using the Viterbi algorithm [18].

A Recurrent Neural Network (RNN) [24] is a type of deep learning model used with temporal sequence data. Artificial neural networks such as CNNs are designed for single data points that are independent of each other. In sequential or time series data, however, each data point depends on the previous data points. Since RNNs are temporally connected, they can store information from prior inputs and use it to influence how the current input is processed. Thus, the output of an RNN depends not only on the current time point but also on the previous time points. This type of network is commonly used in speech recognition and natural language processing. RNNs can also be combined with convolutional layers to extend their application to video data. When the sequence is very long, the gradient information during the training of an RNN cannot propagate back to the earlier time points. This problem is called vanishing gradients. Cho et al. [25] proposed a type of RNN called the Gated Recurrent Unit (GRU) to address the vanishing gradient problem. The GRU uses an update gate and a reset gate, which decide what information is passed to the output.
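The role of the two gates can be made concrete with a minimal sketch of a single GRU step. For readability it uses one scalar input and one scalar hidden unit instead of vectors or feature maps, and all parameter values are arbitrary placeholders rather than trained weights.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x, h_prev, p):
    """One step of a scalar GRU (single input, single hidden unit)."""
    z = sigmoid(p["wz"] * x + p["uz"] * h_prev + p["bz"])  # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h_prev + p["br"])  # reset gate
    # Candidate state: the reset gate scales how much of the past is used.
    h_cand = math.tanh(p["wh"] * x + p["uh"] * (r * h_prev) + p["bh"])
    # The update gate interpolates between the old state and the candidate.
    return z * h_prev + (1.0 - z) * h_cand


# Arbitrary illustrative parameters (not trained).
params = {"wz": 0.5, "uz": 0.3, "bz": 0.0,
          "wr": 0.4, "ur": 0.2, "br": 0.0,
          "wh": 1.0, "uh": 0.8, "bh": 0.0}

h = 0.0
states = []
for x in [0.5, -1.0, 2.0, 0.0]:
    h = gru_step(x, h, params)
    states.append(h)
print(states)
```

When the update gate saturates near 1, the previous hidden state is copied forward almost unchanged, which is the mechanism that lets information and gradients survive over many time steps.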

In 2016, Ondruska et al. [26] proposed ‘Deep Tracking’, an RNN-based approach for end-to-end object tracking directly from raw sensor data for robots. The idea was to use raw sensor data without any feature engineering as the input and to produce an output that also included the detection of occluded objects. The authors followed an unsupervised training approach to achieve this result and stated that theirs was the first approach to use unsupervised end-to-end tracking of objects from sensor data. Classical approaches at the time solved the problem in separate stages, with an object detection stage followed by a tracking stage. The deep tracking approach is trained end-to-end, thereby avoiding the hand-engineering required in such separately executed models. This is achieved by exploiting a sequential model in the form of RNNs to learn the complex dynamics that map raw data to object tracks. In this model, the hidden states of the RNNs can capture certain appearances and motion patterns of objects. Inspired by the Bayesian filtering approach and considering a generative model, the authors stated that there exists a Markov process that completely captures the state of the world. From the hidden state representation, the network is trained to predict the location of each object at each time frame using a binary cross-entropy loss. In a later work, Ondruska et al. [27] proposed that the object classes can be learned from this hidden state distribution by using RNNs. The network can perform a semantic classification from the rich information learned in the hidden states, and the authors proposed that it can be trained end-to-end with a small amount of labeled data. This classification network is introduced once the deep tracking network has been trained. The method outperformed many semantic classification methods available at that time. To track moving objects throughout the time sequence, the network must remember the location and other properties of each object. To achieve this, the authors used GRUs as the processing steps at each layer of the RNN. Convolutional GRUs are used to maintain the resolution. From the learned hidden layer, two convolutional networks predict the semantic segmentation as well as the class of the object. The network is trained with a softmax loss for the classification and a binary cross-entropy loss for the segmentation.

This paper

In this paper, we propose an approach that identifies different stages of a cell during mitosis using deep learning techniques from tracked cell images. The tracked cell images follow a single cell in different time frames and avoid background clutter from the surrounding cells. The aim is to find the phases of mitosis of the cell in different time frames. In other words, the aim is to find the temporal segmentation of a video sequence of cell data. This means that the class labels are assigned to each frame of the video sequence to classify the mitotic phases. Inspired by the model proposed by [27], we modify the network architecture and incorporate time-related propagation of features for better classification of mitosis stages. The main contributions of this paper are summarized here:

  • We propose a novel network architecture, the Time Encoded ResNet18 model, which uses GRUs to capture the inherent time dependency between different frames during mitosis, resulting in better classification.

  • We compare the performance of the proposed RNN-based approaches to the original deep learning-based classification networks such as ResNet18 [28].

  • Experiments were conducted on two different datasets: a) the LiveCellMiner dataset [11], which contains images in three different phases of mitosis, and b) Zhong et al.’s dataset [19], which contains images in six different phases of mitosis.

  • By visualizing the feature space using PCA [20], the effect of the incorporation of time information is studied.

  • We also plotted the confusion matrices to identify the classes that cause confusion and identified potential areas for further improvement in classification accuracy.

  • For quantitative evaluation, we have measured the precision, recall, and F-score.

  • The reconstruction of the center cell by the tracking network is visualized for images in different mitotic phases.
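The evaluation measures named in the list above can be derived from a confusion matrix as sketched below. The matrix values are invented for illustration and are not results from this paper; the function name `per_class_metrics` is our own.

```python
def per_class_metrics(confusion):
    """confusion[i][j] = number of frames of true class i predicted as class j.

    Returns one (precision, recall, f_score) tuple per class.
    """
    n = len(confusion)
    metrics = []
    for c in range(n):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(n)) - tp  # predicted c, true other
        fn = sum(confusion[c]) - tp                       # true c, predicted other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f_score = (2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
        metrics.append((precision, recall, f_score))
    return metrics


# Made-up 3-class confusion matrix (rows: ground truth, columns: prediction).
confusion = [[90, 10, 0],
             [5, 40, 5],
             [0, 10, 40]]
metrics = per_class_metrics(confusion)
for name, (p, r, f) in zip(["interphase", "mitosis", "post-mitosis"], metrics):
    print(f"{name}: precision={p:.3f} recall={r:.3f} F-score={f:.3f}")
```

Off-diagonal entries next to the diagonal correspond to confusion between adjacent stages, which is where errors at stage transitions would show up.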

Proposed approach

In cell-biology research, finding the temporal segmentation of cell-cycle stages in a given cell mitosis video sequence has become of utmost importance. While there are several successful methods to track cells in a video sequence [29, 30], identifying time-dependent cell-cycle stages or cytological phenotypes using deep learning methods is still a challenging task. Most existing approaches [10, 11, 19] for cell stage classification start by extracting features separately for each frame and then try to identify the stage sequence by a similarity-based grouping of the feature vectors. However, these approaches do not consider the temporal dependency of features extracted from successive frames. Normal CNNs are not sufficient to incorporate this time dependency. We therefore propose an RNN-based approach to address the problem of time-related features. Since the layers of an RNN are connected in time, it can effectively transfer information across temporal data.

Base model

Each cell follows a set of unique steps during cell division. Thus, the different phases of cell mitosis always occur in the same order and the current state can be considered as being dependent on its previous states. Therefore, a cell in any frame of a cell mitosis video sequence is dependent on the same cell from earlier frames. To this end, inspired by the networks proposed by [26, 27], we propose our first architecture by considering time-related features for the classification of the cell mitosis stages. The network design of the model is given in Fig 1. Our proposed model consists of four main layers:

Fig 1. Network architecture. Illustration of the architecture of our proposed base model applied to the nth frame of a video sequence [31].

  • A backbone network for feature extraction.

  • A time encoding network to incorporate time information into the features.

  • A tracking network to track the cell.

  • A classification network for the mitosis stage prediction.

The initial results and experiments using this base model are summarized in our previous work [31]. The experiments show that adding time information to the features helps to better identify cell-cycle stages. Each of these modules is explained in detail in [31].

We further modified this baseline architecture, on the premise that deeper networks can create deep feature representations with complex functions and are more efficient. The modified model, time encoded ResNet18, which we elaborate in this paper, also has the aforementioned modules. The details of this model are explained in the next subsection.

Time encoded ResNet18

This model introduces deeper layers between the GRU layers by combining the architecture of ResNet18 [28] with RNN layers. Combining these two modules yields a deeper architecture with convolutional layers between the RNN layers, which enhances the feature extraction capability of the model. It also helps to propagate the features extracted at different layers and increases the classification performance. The RNN combines the features of the current frame with the information from the previous frames in the video. In a process like cell mitosis, this encoding of time information helps in predicting the class to which the cell belongs, because each stage depends on the previous stages and the stages always occur in the same order. After the time encoded features are extracted, the tracking and classification networks are used for reconstructing the center-cell and predicting the class of the cell, respectively.

Time encoded backbone network

The main difference from the base model [31] is that there is no separate backbone and time encoding network. They are combined into a single module, as shown in Fig 2. The time encoded backbone network extracts features with time information for each frame in the video sequence. This is achieved by combining a state-of-the-art model, ResNet18, with the properties of an RNN. ResNet18 contains eighteen layers with eight residual blocks. The proposed network uses the ResNet18 architecture with convolutional GRUs between the residual blocks to transfer time information between frames. Similar to the base model, this model has three convolutional GRUs. The GRU layers are positioned such that the dimensions of each GRU layer are different, which helps to propagate information at multiple scales. In the ResNet18 architecture, a max-pooling layer reduces the dimensionality after every two residual blocks. Thus, each convolutional GRU is placed between two residual blocks of the ResNet18 architecture. The addition of convolutional GRU layers does not change the dimensionality of the ResNet18 features, but helps to propagate time information by combining features from previous time frames with those of the present frame. The channel sizes of the convolutional GRU layers are 128, 256, and 512. The original grayscale image is rescaled and converted to an RGB image to match the input requirements of the ResNet18 architecture. Starting with the initial image $x_{img} \in \mathbb{R}^{3 \times H_0 \times W_0}$ (RGB image), the time encoded backbone generates a lower-resolution activation map $f \in \mathbb{R}^{C \times H \times W}$. Typical values used for channels and image resolution are $C = 512$ and $H, W = \frac{H_0}{32}, \frac{W_0}{32}$, respectively. The ResNet18 layers are initialized with the weights of a network pre-trained on the ImageNet dataset, and the complete network is fine-tuned on the cell sequence dataset.
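The resolution arithmetic can be checked with a short sketch. The channel sizes 128, 256, and 512 and the overall factor of 32 come from the text; pairing them with downsampling factors of 8, 16, and 32 is our assumption based on the standard ResNet18 stage layout, and the exact placement of the GRUs should be taken from Fig 2.

```python
# (channels, cumulative downsampling factor) for the three convolutional GRUs.
# The factors 8, 16, 32 are an assumption based on the standard ResNet18 layout.
GRU_STAGES = [(128, 8), (256, 16), (512, 32)]


def gru_feature_shapes(h0, w0):
    """Return the assumed (C, H, W) shape seen by each convolutional GRU."""
    return [(c, h0 // s, w0 // s) for c, s in GRU_STAGES]


# Input resolution used for the time encoded ResNet18.
shapes = gru_feature_shapes(224, 224)
print(shapes)       # [(128, 28, 28), (256, 14, 14), (512, 7, 7)]
print(shapes[-1])   # backbone output f with C = 512 and H = W = 224 / 32 = 7
```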

Fig 2. Network architecture. Illustration of the architecture of our proposed time encoded ResNet18 model applied to the nth frame of a video sequence where GRU blocks are connected between residual blocks of ResNet18.

Tracking network

Since the time encoded backbone network has a deeper architecture than that of the base model, the output features have a lower spatial dimension. Whereas the base model uses three transpose convolutional layers, this tracking network therefore uses four transpose convolutional layers to reconstruct the center-cell at the original image dimension from the output of the time encoded backbone network. 3×3 kernels are used in all transpose convolutional layers. The typical values of the channel dimensions used in the transpose convolutional layers are 256, 128, 64, and 3. The final layer has an output with three channels matching the RGB image size, with each channel representing the grayscale content. Thus a resolution equal to that of the RGB input image is reconstructed, and the output is $x_{tracking} \in \mathbb{R}^{3 \times H_0 \times W_0}$.

Classification network

A shallow classification network is enough to predict the class from the output features of the time-encoding backbone network. A convolutional layer, a max-pool layer, and two fully-connected layers make up this network. The convolutional layer has 1024 filters and uses 3×3 kernels. The feature dimensions are then reduced by a 2×2 max-pooling layer. The features are flattened after the max-pooling layer. Then a series of two fully-connected networks is added that predicts the class to which each frame belongs. The classification network’s last layer will provide an output whose size is equal to the number of classes.

ResNet18 classifier

The two models discussed previously use RNNs to propagate information between subsequent frames. To assess the performance of a model without an RNN, we use a state-of-the-art deep learning classification model for comparison. This model considers each frame in the video as independent of the other frames in the same sequence. The approach uses transfer learning on a ResNet18 [28] model: a pre-trained architecture is used, and the number of outputs of its last layer is replaced with the number of classes in our dataset. The complete network is then trained with the images of the cell mitosis dataset, so that it predicts the stage of mitosis for each input image. In this way, the network can learn more information related to microscopy images, whereas the pre-trained ResNet18 model was trained with non-microscopy images. The network consists of seventeen convolutional layers and one fully-connected layer. All the convolutional layers have a kernel size of 3×3. Each of the eight residual blocks is implemented with two convolutional layers. A max-pooling operation reduces the dimension after every two residual blocks. Finally, the fully-connected layer predicts the class to which each image belongs.

Training details

This section explains training with the supervised approaches proposed in this paper. The models are trained end-to-end. The proposed models that use GRUs to deal with time-based features have a tracking network and a classification network. The tracking network reconstructs only the center-cell out of many cells in the frame so that the feature space will contain information about this cell. The classification network predicts the stage of mitosis of the cell. The fully-connected network in the final layer of the classification network has an output dimension equal to the number of classes to predict. The losses for these two networks are combined simultaneously by using a weighting factor. The deep learning classification models, which do not have a GRU, have only the classification network that predicts the stage of the mitosis of a cell, and the loss is optimized during training. The following section explains how these models are trained in different scenarios.

Training base model and time encoded ResNet18

Since time encoding networks are used to help the flow of information across time frames, each image in a sequence is dependent on previous images. Therefore in these models, the sequence of images belonging to the same cell is given as the input.

Tracking network loss

The input images used for training the base model are grayscale images with a resolution of 96×96. The time encoded ResNet18 uses 3-channel RGB images with a resolution of 224×224 as input. Hence, the original images are converted to 3 channels and resized for training this model. The output of the tracking network in both models has a dimensionality equal to that of the input training images. The tracking network reconstructs only the center-cell from the input image. This is achieved by calculating the loss between the predicted output and a masked image, which is the input image masked with the segmentation of the center-cell. The last layer of the tracking network uses a sigmoid activation. The loss between the predicted output and the masked image is then calculated using a binary cross-entropy loss [32] as given below:

$$L_{track} = -\frac{1}{W_0 \cdot H_0} \sum_{w}^{W_0} \sum_{h}^{H_0} \Big( y(w,h) \cdot \log(\hat{y}(w,h)) + (1 - y(w,h)) \cdot \log(1 - \hat{y}(w,h)) \Big), \tag{1}$$

where $y(w,h)$ and $\hat{y}(w,h)$ are the input and predicted values of the network at pixel location $(w,h)$, and $W_0$ and $H_0$ are the input image dimensions. The segmentation mask of new images can also be extracted using a simple intensity threshold operation on the predicted output of the tracking network.
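Eq 1 and the threshold-based mask extraction can be written out as a minimal sketch on nested lists. The clipping constant `eps`, the threshold value 0.5, and the tiny 2×2 example image are our own assumptions for illustration; a real implementation would operate on image tensors.

```python
import math


def tracking_loss(target, predicted, eps=1e-7):
    """Mean binary cross-entropy over all pixels, following Eq 1."""
    h0, w0 = len(target), len(target[0])
    total = 0.0
    for row_t, row_p in zip(target, predicted):
        for y, y_hat in zip(row_t, row_p):
            y_hat = min(max(y_hat, eps), 1.0 - eps)  # avoid log(0)
            total += y * math.log(y_hat) + (1.0 - y) * math.log(1.0 - y_hat)
    return -total / (w0 * h0)


def threshold_mask(predicted, threshold=0.5):
    """Binary segmentation mask from the tracking output (assumed threshold)."""
    return [[1 if p > threshold else 0 for p in row] for row in predicted]


# Made-up 2x2 masked target and sigmoid output of the tracking network.
target = [[0.0, 1.0],
          [1.0, 0.0]]
predicted = [[0.1, 0.9],
             [0.8, 0.2]]

loss = tracking_loss(target, predicted)
mask = threshold_mask(predicted)
print(round(loss, 4))
print(mask)  # [[0, 1], [1, 0]]
```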

Classification network loss

In the classification network, the final layer has a number of outputs equal to the number of mitosis stages. This layer has a sigmoid activation function. The network is trained with a one-hot vector of the ground-truth values. This helps the network to train to predict the probability of the image belonging to each class. The training loss function used is binary cross-entropy loss [32] as shown below:

$$L_{cls} = -\frac{1}{N_{class}} \sum_{c}^{N_{class}} \Big( y_c \cdot \log(\hat{y}_c) + (1 - y_c) \cdot \log(1 - \hat{y}_c) \Big), \tag{2}$$

where $y_c$ is the one-hot embedding of the ground truth values and $\hat{y}_c$ is the predicted output of the classification network for the $c$th class. $N_{class}$ is the total number of classes in the dataset. The class with the highest probability is chosen as the predicted class during inference. The total loss is calculated as a weighted sum of the tracking and classification losses. A parameter $\lambda_{wt}$ is used to weigh the losses.

$$L_{tot} = L_{track} + \lambda_{wt} \cdot L_{cls}. \tag{3}$$
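A minimal sketch of Eqs 2 and 3 for a single frame is given below. The sigmoid outputs, the tracking loss value, and the weighting factor 0.5 are placeholder numbers chosen for illustration; the source does not state the value of the weighting parameter.

```python
import math


def classification_bce(one_hot, probs, eps=1e-7):
    """Binary cross-entropy averaged over classes, following Eq 2."""
    total = 0.0
    for y, y_hat in zip(one_hot, probs):
        y_hat = min(max(y_hat, eps), 1.0 - eps)  # avoid log(0)
        total += y * math.log(y_hat) + (1.0 - y) * math.log(1.0 - y_hat)
    return -total / len(one_hot)


def total_loss(l_track, l_cls, lambda_wt):
    """Weighted sum of tracking and classification losses, following Eq 3."""
    return l_track + lambda_wt * l_cls


def predict(probs):
    """Index of the class with the highest predicted probability."""
    return max(range(len(probs)), key=probs.__getitem__)


one_hot = [0.0, 1.0, 0.0]   # ground truth: second class
probs = [0.2, 0.7, 0.1]     # made-up sigmoid outputs, one per class

l_cls = classification_bce(one_hot, probs)
l_tot = total_loss(l_track=0.3, l_cls=l_cls, lambda_wt=0.5)  # placeholder values
print(round(l_cls, 4), round(l_tot, 4), predict(probs))
```

Because each output passes through its own sigmoid, the per-class probabilities need not sum to one; the prediction is still taken as the class with the largest value.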

Training ResNet18 classifier

The deep learning architectures that use the ResNet18 classifier have only the classification network loss. In these models, each image is considered to be independent of the previous image in the same sequence. The final layer of these networks has a number of outputs equal to the number of classes to predict. A one-hot embedding vector of the ground truth is used to train this network. This network is trained with the cross-entropy loss [32] to predict the probability of the image belonging to each class as shown below:

$$L_{cls} = -\frac{1}{N_{class}} \sum_{c}^{N_{class}} y_c \cdot \log(\hat{y}_c), \tag{4}$$

where $y_c$ is the one-hot embedding of the ground truth values and $\hat{y}_c$ is the predicted output of the classification network for the $c$th class. $N_{class}$ is the total number of classes in the dataset. The class with the highest probability is chosen as the predicted class during inference. The total loss of models with deep learning architectures is the same as the classification loss as shown below:

$$L_{tot} = L_{cls}. \tag{5}$$
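For a one-hot target, Eq 4 reduces to the negative log-probability of the true class divided by the number of classes. The sketch below illustrates this; turning the network outputs into probabilities with a softmax is our assumption, since the text does not name the final activation of the ResNet18 classifier, and the logits are invented.

```python
import math


def softmax(logits):
    """Assumed final activation: converts raw outputs into probabilities."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]


def cross_entropy(one_hot, probs, eps=1e-7):
    """Cross-entropy averaged over classes, following Eq 4."""
    n_class = len(one_hot)
    return -sum(y * math.log(max(p, eps)) for y, p in zip(one_hot, probs)) / n_class


one_hot = [0.0, 1.0, 0.0]
probs = softmax([1.0, 3.0, 0.5])  # made-up network outputs for one frame
l_tot = cross_entropy(one_hot, probs)  # Eq 5: total loss equals L_cls
predicted_class = max(range(len(probs)), key=probs.__getitem__)
print(round(l_tot, 4), predicted_class)
```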

Datasets

For experimental evaluation, we have used two main datasets. The first dataset is provided by Moreno-Andrés et al. [11] and the second dataset is provided by [19]. Both datasets contain microscopic image sequences of the mitotic process from human HeLa cells expressing H2B-mCherry as fluorescent chromatin marker. These images are acquired using time-lapse microscopy. In the following subsections, these two datasets are explained.

LiveCellMiner dataset

As explained earlier, the LiveCellMiner [11] tool allows the analysis of mitotic phases in 2D+t microscopy images. These images contain human tissue culture cells as they undergo mitosis and are acquired using widefield and confocal microscopy. The acquired images have many cells in each time frame. The software tracks the position of each cell in the image and then extracts the tracked single-cell image. The software then checks for and eliminates cells that do not undergo cell division. If a cell undergoes cell division, a region is cropped around the tracked cell in each time frame. Each of these cropped images has a resolution of 96 × 96 pixels, and the target cell is in the center of the cropped image. For each cell division sequence, 90 frames are available at this resolution. This dataset is divided into three classes. The software uses the interphase to early prophase transition and the metaphase/early anaphase to late anaphase transition as reference patterns for the alignment of interphase or postmitotic frames, and then automatically detects the interphase to prophase and metaphase to anaphase transitions. It then divides the data into interphase, mitosis, and post-mitosis classes. A sequence of images belonging to different classes is illustrated in Fig 3. Experts corrected the predicted labels of the LiveCellMiner tool and released them as ground truth annotations. Four different image datasets were published along with this tool. All four datasets were acquired from human HeLa cells expressing H2B-mCherry transfected with indicated siRNA oligonucleotides in eight-well µ-slide chambers. These datasets contain images that were taken three minutes apart. The total number of training sequences in each of these datasets and a detailed description of the datasets are given below. The first dataset is the LSM710 dataset. In this dataset, the cells are imaged using an LSM710 confocal microscope (Zeiss) and ZEN software (Zeiss). 
1458 sequences are available for training with the LSM710 dataset. The second dataset is the LSD1 dataset [33]. It was acquired using an LSM5 live confocal microscope (Zeiss) and ZEN software. With this dataset 1042 sequences are available for training. RecQL4 dataset [34] is the third dataset and was acquired using LSM5 live confocal microscope and ZEN software. 1214 sequences with this dataset are available for training. The last dataset is the NikonXLight dataset [35]. This dataset is imaged with a widefield module of a Ti2 Eclipse (Nikon) equipped with a LED light engine SpectraX and GFP/mCherry filter sets and using elements software (Nikon). This dataset contains 1229 sequences for training. Fig 4 illustrates a few images of these four datasets at different stages of cell splitting.

Fig 3. Illustration of the images of a cell undergoing cell division.

Images of a cell undergoing cell division, assigned to the three cell-cycle classes: interphase, mitosis (prophase, prometaphase, metaphase), and post-mitosis (early and late anaphase and telophase) [11]. Two sequences are available after the cell splits, each following one daughter cell after mitosis.

Fig 4. Images from four different datasets.

Images belonging to four datasets of LiveCellMiner at different cell-cycle stages [11].

Zhong et al.’s dataset

The dataset provided by Zhong et al. [19] also uses time-lapse microscopy images of human tissue culture cells (HeLa cells) as they undergo cell division. Each frame in the dataset is labeled as interphase or one of the five mitosis stages. Cell images belonging to these six stages are shown together with their state diagram in Fig 5. Labeling by different biologists can be inconsistent despite the well-defined chromatin morphology: a dissimilarity study of the biologists’ annotations showed minor inconsistencies between annotations made by the same person on different days, but significant differences between annotations made by different users. The gold standard for the labels is therefore chosen by a majority vote among these user annotations. The dataset consists of seven image sequences with a total of 326 cell division events, uniformly sampled at an interval of 4.6 minutes. Since the mitosis stages are much shorter than the interphase stage, the distribution of cells across the classes is highly uneven. In this dataset, each image has a resolution of 96 × 96 pixels, and each cell sequence has a length of 40 frames.
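The majority vote used to derive the gold-standard labels can be illustrated with a minimal sketch. The function name, the label strings, and the tie-breaking rule (the label seen first among the annotators wins) are assumptions made for illustration; the source does not specify how ties are resolved.

```python
from collections import Counter


def majority_vote(annotations):
    """Return the per-frame majority label across several annotators.

    annotations: list of label sequences, one per annotator, all of equal length.
    Ties go to the label encountered first among the annotators (assumption).
    """
    n_frames = len(annotations[0])
    if any(len(labels) != n_frames for labels in annotations):
        raise ValueError("all annotators must label the same number of frames")
    consensus = []
    for frame_labels in zip(*annotations):
        # most_common(1) returns the most frequent label for this frame
        consensus.append(Counter(frame_labels).most_common(1)[0][0])
    return consensus


# Three hypothetical annotators labeling the same three frames
votes = [
    ["inter", "pro", "pro"],
    ["inter", "inter", "pro"],
    ["inter", "pro", "meta"],
]
print(majority_vote(votes))
```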

Fig 5. Cell mitosis stages in the form of a state diagram.

Illustration of the different stages of cell-cycle [19] as a state diagram. Cell images belonging to different classes are labeled with different colors.

Evaluation criteria

This section explains the evaluation techniques used to compare the models. We compared our results with some of the state-of-the-art methods as well as between our different models. We measured classification performance using the classification accuracy, the confusion matrix, and measures derived from the confusion matrix. In multi-class classification, each prediction falls into one of three cases with respect to a given class: true positive (TP), false positive (FP, type I error), or false negative (FN, type II error) [36]. The TPs of a class are the samples correctly assigned to it. The FNs are the samples that belong to the class but are predicted as one of the other classes. The FPs are the samples that are predicted as this class but belong to another class.

Accuracy

Accuracy is defined as the sum of all TPs from all the classes divided by the total number of data points. In this paper, we calculate the frame-to-frame accuracy of each sequence and average it over the total length of the data. The reported accuracy is the average obtained by comparing the ground-truth labels with the predictions on the test dataset. A higher accuracy means better performance. The value lies in the range of 0 to 1; the tables in this paper report it as a percentage.
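As a minimal sketch of this definition, the frame-to-frame accuracy can be computed by counting matching frames over all test sequences. The function name and the integer class encoding are illustrative assumptions, not taken from the published code.

```python
def frame_to_frame_accuracy(true_sequences, predicted_sequences):
    """Fraction of frames whose predicted class equals the ground-truth class.

    Both arguments are lists of label sequences (one list of labels per cell
    sequence). The result lies in the range 0 to 1.
    """
    correct = 0
    total = 0
    for true_seq, pred_seq in zip(true_sequences, predicted_sequences):
        if len(true_seq) != len(pred_seq):
            raise ValueError("each prediction needs exactly one label per frame")
        correct += sum(t == p for t, p in zip(true_seq, pred_seq))
        total += len(true_seq)
    if total == 0:
        raise ValueError("no frames to evaluate")
    return correct / total


# Two hypothetical sequences of four frames each (0: interphase, 1: mitosis, 2: post-mitosis)
y_true = [[0, 0, 1, 2], [0, 1, 1, 2]]
y_pred = [[0, 0, 1, 2], [0, 0, 1, 2]]
print(frame_to_frame_accuracy(y_true, y_pred))  # 7 of 8 frames match
```

When all sequences have the same length, as in the datasets used here, this pooled value equals the mean of the per-sequence accuracies.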

Confusion matrix

The confusion matrix visualizes the three cases of prediction, that is, TPs, FPs, and FNs. The TPs for each class lie on the diagonal of the confusion matrix [36]. With the true class on the y-axis and the predicted class on the x-axis, the FPs of a class lie in its column and the FNs in its row, excluding the diagonal entry. In this paper, we present the normalized confusion matrix.
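The layout described above can be sketched as follows. Rows index the true class and columns the predicted class; normalizing each row by the number of samples of its true class is an assumption, since the text does not state which normalization is applied.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Count matrix with the true class as row index and the predicted class as column index."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def normalize_rows(matrix):
    """Divide each row by its sum so that every non-empty row adds up to 1 (assumed normalization)."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([count / total if total else 0.0 for count in row])
    return normalized


# Hypothetical frame labels (0: interphase, 1: mitosis, 2: post-mitosis)
y_true = [0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 2, 0]
counts = confusion_matrix(y_true, y_pred, n_classes=3)
for row in normalize_rows(counts):
    print(row)
```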

Precision and recall

The precision is the proportion of correctly classified images of a class among all images assigned to that class. It is the ratio of TP to the sum of TP and FP, as shown in Eq 6. A higher precision means better performance. Recall is the ratio of TP to the sum of TP and FN, as shown in Eq 7.

$$\text{precision} = \frac{TP}{TP + FP} \tag{6}$$

$$\text{recall} = \frac{TP}{TP + FN} \tag{7}$$

F-score

The F-score is the harmonic mean of precision and recall. Its values lie in the range of 0 to 1, and a higher F-score means better classification performance. The F-score is defined in Eq 8.

$$\text{F-score} = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \tag{8}$$
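Eqs 6 to 8 can be evaluated per class directly from a confusion matrix of raw counts. The following is a minimal sketch, assuming the true class is the row index and the predicted class is the column index; the function name and the convention of returning 0 for an undefined ratio are illustrative choices.

```python
def per_class_scores(matrix):
    """Return a (precision, recall, F-score) tuple for every class of a count matrix."""
    n_classes = len(matrix)
    scores = []
    for k in range(n_classes):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                               # class k predicted as another class
        fp = sum(matrix[r][k] for r in range(n_classes)) - tp  # other classes predicted as k
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denominator = precision + recall
        f_score = 2 * precision * recall / denominator if denominator else 0.0
        scores.append((precision, recall, f_score))
    return scores


# Hypothetical counts for three classes
counts = [[1, 1, 0],
          [0, 2, 0],
          [1, 0, 3]]
for class_index, (p, r, f) in enumerate(per_class_scores(counts)):
    print(class_index, round(p, 3), round(r, 3), round(f, 3))
```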

Experiments

In line with our previous work [31], we conducted the experiments on each of the five datasets described above. We compared the performance of the proposed RNN-based methods with a ResNet18 classifier. The classification model treats each image independently, so its batch size is the number of images in the batch. Models with GRU layers treat each image in a sequence as dependent on the previous images of the same sequence. To produce this dependency, each sequence is used as a whole during training, so in this case the batch size equals the number of sequences. In our experiments, a batch size of two or four sequences performed better than a batch size of one. The typical batch for the models with GRU layers contains 4 sequences, because the memory limitations of the graphics processing unit prevent some deeper models from using batch sizes greater than 4. For the models using the ResNet18 classifier, the batch size is 64 images.

For training the base model, for which no pre-trained model is available, a learning rate of 0.01 gave better training than lower or higher values. The typical learning rate for models with a pre-trained model is 0.001. Inspired by [27], the learning rate is scheduled to reduce to 10 percent of its current value after 2500 training iterations. The weighting parameter controls the weighting between the multiple losses. Using a basic grid search [37], we observed that a weighting parameter of 0.01 gives better accuracy with the base model and a value of 0.1 with the time encoded ResNet18 model. In addition to these hyperparameters, data augmentation is used in all experiments to increase robustness and avoid overfitting. The augmentations are rotations by multiples of 90 degrees and flips in the vertical and horizontal directions. We also split the datasets into separate training and test datasets. The typical train-test ratio is 0.85, such that 85% of the total data is used to train the models and the remaining 15% is used as the test dataset. 10% of the test images were used as the validation set. Each experiment was repeated 20 times and the standard deviation was measured. Table 1 shows the typical hyperparameter values used in the experiments, which were determined by grid search-based parameter tuning.

After mitosis, the cells grow back to the interphase stage. The expert annotation of Zhong et al.’s dataset takes this into account, whereas the LiveCellMiner dataset does not: in the LiveCellMiner datasets, cells that have returned to the interphase stage are still annotated as part of the post-mitosis class. Because these cells resemble those in the interphase class, this poses problems during training. Studies with LiveCellMiner [11] showed that the recovery back to the interphase stage happens around 15 to 20 frames after mitosis. To overcome this issue, during training we relabeled all frames later than around 15 to 20 frames into the post-mitosis stage as the interphase class. During inference, frames predicted as recovered interphase after the mitosis stage are assigned back to the post-mitosis class before evaluation. This also resolves the problem of image similarity for deep learning classification networks like ResNet18. Thus, the recovery back to interphase does not affect the prediction accuracies of these models.
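The relabeling of recovered cells can be sketched in two steps, one applied to the training labels and one applied to the predictions at inference. This is an illustration under stated assumptions, not the published implementation: the integer class codes, the function names, the fixed cut-off of 15 frames (taken from the reported range of 15 to 20), and the use of the last predicted mitosis frame as the reference point are all hypothetical choices.

```python
INTERPHASE, MITOSIS, POST_MITOSIS = 0, 1, 2  # illustrative class codes


def relabel_recovered_frames(labels, recovery_frames=15):
    """Training labels: post-mitosis frames at least `recovery_frames` after the
    onset of post-mitosis are treated as interphase."""
    relabeled = list(labels)
    if POST_MITOSIS not in relabeled:
        return relabeled
    onset = relabeled.index(POST_MITOSIS)
    for i in range(onset + recovery_frames, len(relabeled)):
        if relabeled[i] == POST_MITOSIS:
            relabeled[i] = INTERPHASE
    return relabeled


def restore_post_mitosis(predictions):
    """Inference: interphase predicted after the last mitosis frame is mapped
    back to post-mitosis before evaluation."""
    restored = list(predictions)
    if MITOSIS not in restored:
        return restored
    last_mitosis = len(restored) - 1 - restored[::-1].index(MITOSIS)
    for i in range(last_mitosis + 1, len(restored)):
        if restored[i] == INTERPHASE:
            restored[i] = POST_MITOSIS
    return restored


# Short hypothetical sequence with a cut-off of 3 frames for readability
original = [0, 0, 0, 1, 1, 2, 2, 2, 2, 2]
training_labels = relabel_recovered_frames(original, recovery_frames=3)
print(training_labels)
print(restore_post_mitosis(training_labels))
```

Applying the second function to the output of the first recovers the original three-class annotation, which is why the evaluation can still be carried out against the LiveCellMiner labels.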

Table 1. Hyperparameter values.

The hyperparameter values chosen for experiments after initial grid-search based tuning.

Model
Hyperparameter Base model Time encoded Classifier
Batch Size 4 Sequences 4 Sequences 64 Images
Learning Rate 0.01 0.001 0.001
Learning Rate Scheduler 0.001 0.0001 0.0001
Loss Regularization 0.01 0.1 -
Train-Test Ratio 0.85 0.85 0.85
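The learning rate rows of Table 1 are consistent with a step decay to 10 percent of the initial value. A minimal sketch follows; it assumes that the decay is applied once, from iteration 2500 onwards, which the text does not state explicitly, and the dictionary keys and function name are illustrative.

```python
# Initial learning rates as listed in Table 1 (keys are illustrative names)
INITIAL_LR = {"base_model": 0.01, "time_encoded": 0.001, "classifier": 0.001}


def scheduled_learning_rate(initial_lr, iteration, decay_after=2500, factor=0.1):
    """Learning rate in effect at a given training iteration.

    Assumption: the rate is reduced once, to `factor` times its initial value,
    starting at iteration `decay_after`.
    """
    if iteration < 0:
        raise ValueError("iteration must be non-negative")
    return initial_lr * factor if iteration >= decay_after else initial_lr


for name, lr in INITIAL_LR.items():
    print(name, scheduled_learning_rate(lr, 0), scheduled_learning_rate(lr, 2500))
```

In a deep learning framework the same behavior would normally come from a built-in step scheduler rather than a hand-written function.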

Experimental results

This section presents the results of the proposed approaches on the different datasets. The code is available at https://github.com/Rijo756/cell-cycle-stages-identification. Accuracy, confusion matrix, precision, recall, and F-score are used to quantify the classification performance. The results are reported for two models using time encoding with GRU layers and for the ResNet18 classifier. The models are trained for 12 epochs on the LiveCellMiner datasets and 40 epochs on Zhong et al.’s dataset (both around 10,000 batch iterations). We also plotted the label matrix, which helps in comparing the class predictions of the proposed models to the user annotations. In addition, we studied the learned feature space of the different models on the different datasets by plotting its first three principal components, which shows the feature space in a lower-dimensional space.

Results on LiveCellMiner dataset

In this section, the results are presented in four parts. Each part belongs to results from one of the four datasets available from LiveCellMiner.

LSM710 dataset

The results of the proposed models trained with the LSM710 dataset from LiveCellMiner are presented in this subsection. Fig 6 visualizes the label matrix of the ground truth annotation and the labels generated by the proposed models on 50 test sequences. The y-axis denotes these sequences, and the x-axis is the length of each sequence. All models produced excellent results on this dataset. The quality of the predicted labels is quantified by the frame-to-frame accuracy of the predictions.

Fig 6. Label matrices.

Label matrices of user annotation and the predictions by proposed models for 50 sequences selected from the test data of the LSM710 dataset. The y-axis represents different cell trajectories, and the x-axis represents the length of each trajectory. Green, magenta, and red represent the interphase, mitosis, and post-mitosis classes respectively.

Table 2 shows the frame-to-frame accuracy of each of the proposed models. With the LSM710 dataset, the ResNet18 classifier had the lowest accuracy, and the time encoded ResNet18 model had the highest. Both proposed models have higher accuracy than the ResNet18 architecture. This suggests that introducing features from the previous time step using GRU layers helps increase the performance.

Table 2. The frame-to-frame accuracy values.

The frame-to-frame accuracy value of various proposed models using the LSM710 dataset. Boldface indicates the best performance.

Model Accuracy
LiveCellMiner 99.39
Base Model 99.529±0.094
Time Encoded ResNet18 99.565± 0.040
ResNet18 99.347±0.039

The LSM710 dataset is labeled with interphase, mitosis, and post-mitosis stages. The confusion matrix represents the correct as well as the wrong predictions. Fig 7 visualizes the normalized confusion matrices for the proposed models. The dominant diagonal elements show that all the models classified the images into the correct stages in most cases. The precision, recall, and F-score of the three classes for the various models are presented in Table 3. For the interphase and post-mitosis stages, the time encoded ResNet18 model has the highest precision and F-score; its recall is the highest for interphase and marginally below LiveCellMiner for post-mitosis (99.864 vs. 99.883). For the mitosis stage, LiveCellMiner has slightly better recall and F-score, but the time encoded ResNet18 model is comparable.

Fig 7. Confusion matrix plot.

Normalized confusion matrices of the prediction with the LSM710 test dataset.

Table 3. Average precision, recall, and F-score for the LSM710 dataset.

Average precision, recall, and F-score for each stage of mitosis for the LSM710 dataset. Boldface indicates the best performance.

Model Precision
Interphase Mitosis Post-mitosis
LiveCellMiner 98.569 98.374 99.745
Base Model 99.129±0.207 98.302±0.318 99.893±0.035
Time Encoded ResNet18 99.199±0.19 98.479±0.166 99.912±0.031
ResNet18 99.125±0.061 97.563±0.281 99.763±0.021
Recall
Interphase Mitosis Post-mitosis
LiveCellMiner 98.492 98.315 99.883
Base Model 99.46±0.108 98.097±0.42 99.83±0.038
Time Encoded ResNet18 99.769±0.119 98.172±0.268 99.864±0.032
ResNet18 99.182±0.063 97.451±0.114 99.767±0.062
F-score
Interphase Mitosis Post-mitosis
LiveCellMiner 98.531 98.344 99.814
Base Model 99.294±0.141 98.2±0.351 99.862±0.03
Time Encoded ResNet18 99.452±0.088 98.325±0.167 99.898±0.011
ResNet18 99.153±0.044 97.507±0.137 99.765±0.028

Fig 8 visualizes the reconstructed output image from the tracking network. The base model and the time encoded ResNet18 model have a tracking network that reconstructs the center-cell from the input image. The ResNet18 model is a classification network and only outputs the stage to which each input image belongs. Both models with a tracking network were able to reconstruct the center-cell. The reconstruction from the base model appears better than that from the time encoded ResNet18 model, possibly because the embedding space of the base model has a higher spatial dimension. The tracking network output can be turned into a segmentation of the center-cell with a simple intensity thresholding operation, as shown in Fig 8.

The embedding space, or feature space, captures characteristics of the input data. Viewing this feature space in a lower dimension gives an approximate picture of how the classes are separated. Fig 9 illustrates the first three principal components of the feature space of our proposed models. The time encoded ResNet18 model shows a clear separation between the features belonging to each class, which could explain its higher performance scores. The features of the base model also look well separated. During the postmitotic stage, the daughter cells grow back to the interphase stage. Since ResNet18 operates on individual images instead of sequences, less separation between interphase and post-mitosis features is visible. This does not occur in the models with GRU layers because their feature space includes the time information, which is the main advantage of the proposed model. Hence, as seen in Fig 9, the time encoded ResNet18 model achieves better separation of the feature embeddings of the classes.
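The projection of an embedding onto its first three principal components can be sketched with a singular value decomposition. This is a generic PCA sketch rather than the plotting code used for the figures; the function name, the random stand-in embeddings, and their dimensionality are assumptions.

```python
import numpy as np


def project_to_principal_components(features, n_components=3):
    """Project feature vectors (one row per frame) onto their leading principal components."""
    features = np.asarray(features, dtype=float)
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing singular value
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Hypothetical embeddings: 200 frames with 64 features each
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 64))
projected = project_to_principal_components(embeddings)
print(projected.shape)  # one 3D point per frame, ready for a scatter plot colored by class
```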

Fig 8. Tracking network output.

Illustration of the output of the tracking network of the proposed models on the LSM710 dataset. The tracking network reconstructs the center-cell given an input image. Also shown are the ground-truth mask and the segmentation obtained by intensity thresholding of each output.

Fig 9. PCA plot for the images from the LSD1 dataset.

Illustration of the first three principal components of the embeddings of the proposed models. a) Base Model b) Time Encoded ResNet18 c) ResNet18.

LSD1 dataset

The results of the proposed models trained with the LSD1 dataset from LiveCellMiner are presented in this subsection. Fig 10 visualizes the label matrix of the ground truth annotation and the labels generated by the proposed models on 50 test sequences from the LSD1 dataset. Each sequence has a length of 90 frames.

Fig 10. Label matrix.

Label matrices of user annotation and the predictions by proposed models for 50 sequences selected from the test data of the LSD1 dataset. The y-axis represents different cell trajectories, and the x-axis represents the length of each trajectory. Green, magenta, and red represent the interphase, mitosis, and post-mitosis classes respectively.

The predicted label matrices of the proposed models look similar to the user annotation labels, which means that all models produced excellent results on this dataset. A more detailed evaluation is obtained by computing the frame-to-frame accuracy of the predictions on the test dataset, shown in Table 5. In this dataset, the ResNet18 classifier had the lowest accuracy, and the time encoded ResNet18 model had the highest. Thus it seems that introducing features from the previous time step using GRU layers helps in increasing the performance. The precision, recall, and F-score of the three classes for the various models are presented in Table 4. Except for the precision values in the mitosis and post-mitosis classes, the proposed time encoded ResNet18 model has the highest scores in all cases. Thus it seems that introducing GRU layers between the residual blocks of the ResNet18 architecture helps in increasing the performance of the model.

Table 4. Average precision, recall, and F-score for the LSD1 dataset.

Average precision, recall, and F-score for each stage of mitosis for the LSD1 dataset. Boldface indicates the best performance.

Model Precision
Interphase Mitosis Post-mitosis
LiveCellMiner 97.837 98.549 99.989
Base Model 98.857±0.192 98.353±0.131 99.896±0.153
Time Encoded ResNet18 98.983±0.129 98.436±0.187 99.957±0.047
ResNet18 96.05±0.363 98.256±0.317 99.622±0.069
Recall
Interphase Mitosis Post-mitosis
LiveCellMiner 99.539 97.792 99.018
Base Model 99.224±0.406 97.639±0.463 99.802±0.017
Time Encoded ResNet18 99.654±0.132 97.964±0.252 99.823±0.035
ResNet18 99.517±0.04 92.569±0.697 99.404±0.058
F-score
Interphase Mitosis Post-mitosis
LiveCellMiner 98.681 98.169 99.501
Base Model 99.04±0.265 97.994±0.288 99.899±0.079
Time Encoded ResNet18 99.314±0.072 98.199±0.097 99.902±0.024
ResNet18 97.849±0.182 95.327±0.441 99.513±0.048
Table 5. The frame-to-frame accuracy values.

The frame-to-frame accuracy value of various proposed models using the LSD1 dataset. Boldface indicates the best performance.

Model Accuracy
LiveCellMiner 98.98
Base Model 99.508±0.119
Time Encoded ResNet18 99.572±0.030
ResNet18 98.677±0.103

The confusion matrix represents the correct as well as the wrong predictions. Fig 11 visualizes the normalized confusion matrices for the proposed models. The diagonal values of the confusion matrix represent the true positive predictions for each class. With the LSD1 dataset, the diagonal elements of the confusion matrix have higher values than the off-diagonal elements. This implies that the proposed models were able to classify the images into the correct classes. The PCA plot in Fig 12 indicates better clustering of the feature embeddings for the time encoded ResNet18 model compared to the classifier model.

Fig 11. Confusion matrix plot.

Normalized confusion matrices of the prediction with the LSD1 test dataset.

Fig 12. Illustration of the first three principal components of the embeddings of the proposed models.

a) Base Model b) Time Encoded ResNet18 c) ResNet18.

RecQL4 dataset

The results of the proposed models trained with the RecQL4 dataset from LiveCellMiner are presented in this subsection. Fig 13 visualizes the label matrix of the ground-truth annotation and the labels generated by the proposed models on test sequences. For some sequences, the label matrix generated by the ResNet18 model contains misclassifications in the post-mitosis class and in the mitosis class, which are eliminated by the proposed RNN-based model. The same behaviour is observed for the other datasets, as shown in the respective subsections. The predicted label matrices of the proposed models show a very good correspondence with the user annotation. This indicates that both proposed models produced good results on this dataset. A further assessment of the performance is obtained by calculating the frame-to-frame accuracy of the predictions on the test dataset.

Fig 13. Label matrix.

Label matrices of user annotation and the predictions by proposed models for 50 sequences selected from the test data of the RecQL4 dataset. The y-axis represents different cell trajectories, and the x-axis represents the length of each trajectory. Green, magenta and red represent the interphase, mitosis, and post-mitosis classes respectively.

Table 6 shows the frame-to-frame accuracy of each of the proposed models. With the RecQL4 dataset, the base model and the time encoded ResNet18 model have similar accuracies, with a slightly higher value for the time encoded ResNet18 model. The ResNet18 model has the lowest frame-to-frame accuracy, which shows the advantage of the proposed model. The precision, recall, and F-score of the three classes for the various models on the RecQL4 dataset are presented in Table 7. These values help in analyzing each model’s performance in predicting each class.

Table 6. The frame-to-frame accuracy values.

The frame-to-frame accuracy values of various proposed models using the RecQL4 dataset. Boldface indicates the best performance.

Model Accuracy
LiveCellMiner 98.93
Base Model 99.322±0.055
Time Encoded ResNet18 99.345±0.038
ResNet18 98.850±0.114
Table 7. Average precision, recall, and F-score values.

Average precision, recall, and F-score for each stage of mitosis when trained using the RecQL4 dataset on various proposed models. Boldface indicates the best performance.

Model Precision
Interphase Mitosis Post-mitosis
LiveCellMiner 98.314 98.224 99.946
Base Model 98.465±0.239 98.532±0.197 99.795±0.1
Time Encoded ResNet18 98.856±0.317 98.633±0.364 99.865±0.119
ResNet18 97.101±0.201 97.969±0.105 99.54±0.12
Recall
Interphase Mitosis Post-mitosis
LiveCellMiner 99.237 98.245 98.987
Base Model 99.346±0.149 97.091±0.475 99.857±0.032
Time Encoded ResNet18 99.422±0.049 97.234±0.19 99.867±0.299
ResNet18 99.197±0.07 94.468±0.665 99.698±0.057
F-score
Interphase Mitosis Post-mitosis
LiveCellMiner 98.773 98.235 99.464
Base Model 98.903±0.151 97.806±0.313 99.826±0.06
Time Encoded ResNet18 99.124±0.147 97.957±0.121 99.879±0.049
ResNet18 98.138±0.103 96.185±0.372 99.619±0.069

It is evident from Table 7 that the time encoded ResNet18 model gave the highest scores for most metrics among all models. The LiveCellMiner approach is slightly better in a few cases, namely the precision for the post-mitosis class and the recall and F-score for the mitosis class. The RecQL4 dataset is also annotated into three stages of the cell-cycle. The confusion matrix portrays the correct as well as the incorrect predictions. Fig 14 visualizes the normalized confusion matrices for the proposed models. The diagonal values of the confusion matrix represent the true positive predictions for each of the three classes. With the RecQL4 dataset, the diagonal elements of the confusion matrix have higher values than the off-diagonal elements. This implies that the proposed models were mostly able to classify the images into the correct classes. The PCA plot (Fig 15) behaves similarly to those of the other two datasets, with the embeddings of the time encoded ResNet18 model showing time continuity as well as clear clustering in the embedding space.

Fig 14. Normalised confusion matrix.

Normalized confusion matrices of the prediction with the RecQL4 test dataset with proposed models.

Fig 15. PCA plot.

Illustration of the first three principal components of the embeddings of the proposed models. a) Base Model b) Time Encoded ResNet18 c) ResNet18.

NikonXLight dataset

The results of the proposed models trained with the NikonXLight dataset from LiveCellMiner are discussed below. Fig 16 visualizes the label matrix of the ground-truth annotation and the labels generated by the proposed models on test sequences. The predicted label matrices of the proposed models are comparable to the user annotation labels. This indicates that with the NikonXLight dataset, each of the three proposed models produced predictions close to the user annotation. A better assessment of the performance is obtained by computing the frame-to-frame accuracy of the predictions on the test dataset.

Fig 16. Label matrix.

Label matrices of user annotation and the predictions by proposed models for 51 sequences selected from the test data of the Zhong et al.’s dataset. The y-axis represents different cell trajectories, and the x-axis represents the length of each trajectory. Green, yellow, orange, violet, blue, and red represent interphase, prophase, prometaphase, metaphase, anaphase, and telophase classes respectively.

Table 8 shows the frame-to-frame accuracy of the proposed models. For the NikonXLight dataset, the time encoded ResNet18 model has slightly lower frame-to-frame accuracy than the LiveCellMiner approach, and the ResNet18 model has the lowest. This again indicates that introducing GRU layers to propagate features with time information between the frames of a sequence helps in increasing the frame-to-frame accuracy. The precision, recall, and F-score of the three classes for the various models on the NikonXLight dataset are presented in Table 9. These values help in analyzing each model’s performance in predicting each class. It is evident from Table 9 that the time encoded ResNet18 model gave the highest F-scores except for the mitosis class, where the LiveCellMiner approach performs better. The precision and recall values are also highest for the time encoded ResNet18 model in most cases; the exceptions are the precision in the mitosis class and the recall in the post-mitosis class, where the LiveCellMiner approach slightly outperforms the other models.

Fig 17 visualizes the normalized confusion matrices for the proposed models. The diagonal values of the confusion matrix represent the true positive predictions for each of the three classes. With the NikonXLight dataset, the diagonal elements of the confusion matrix have higher values than the off-diagonal elements. This implies that the proposed models were mostly able to classify the images into the correct classes. The feature space embeddings in Fig 18 show very compact clusters of the different classes for the time encoded ResNet18 model, clearly confirming the advantage of the proposed model.

We also measured the classification accuracy of the proposed model on the four datasets for two train-test ratios, 0.85 and 0.5. The results are given in Table 10. With the smaller training set, the accuracy drops by about 2 to 3.5 percentage points. Thus, even with fewer annotated samples, the network is able to give reasonably good performance.

Table 8. The frame-to-frame accuracy values.

The frame-to-frame accuracy value of various proposed models using the NikonXLight dataset. Boldface indicates the best performance.

Model Accuracy
LiveCellMiner 99.37
Base Model 99.269±0.140
Time Encoded ResNet18 99.292±0.058
ResNet18 98.436±0.127
Table 9. Average precision, recall, and F-score values.

Average precision, recall, and F-score for each stage of mitosis when trained using the NikonXLight dataset on various proposed models. Boldface indicates the best performance.

Model Precision
Interphase Mitosis Post-mitosis
LiveCellMiner 97.229 99.190 99.009
Base Model 98.943±0.124 97.977±0.377 99.624±0.162
Time Encoded ResNet18 98.952±0.179 98.386±0.412 99.671±0.071
ResNet18 96.694±0.377 97.652±0.22 99.138±0.066
Recall
Interphase Mitosis Post-mitosis
LiveCellMiner 99.231 96.172 99.984
Base Model 99.081±0.131 96.666±0.763 99.846±0.08
Time Encoded ResNet18 99.526±0.093 96.723±0.436 99.854±0.083
ResNet18 99.327±0.084 92.269±0.452 99.389±0.153
F-score
Interphase Mitosis Post-mitosis
LiveCellMiner 98.219 97.657 99.494
Base Model 99.012±0.118 97.317±0.504 99.735±0.113
Time Encoded ResNet18 99.233±0.11 97.492±0.204 99.797±0.026
ResNet18 97.992±0.198 94.884±0.312 99.264±0.099
Fig 17. Normalised confusion matrix.

Normalized confusion matrices of the prediction with the NikonXLight test dataset.

Fig 18. PCA plot.

Illustration of the first three principal components of the embeddings of the proposed models. a) Base Model b) Time Encoded ResNet18 c) ResNet18.

Table 10. Classification accuracy for different train test ratios.

Classification accuracy in 4 datasets, for two different train test ratios.

Dataset 0.85 0.5
LSM710 99.39 96.65
LSD1 98.98 96.82
RecQL4 98.93 96.72
NikonXLight 99.37 95.85
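The two train-test ratios compared in Table 10 can be illustrated with a small splitting sketch. Splitting at the level of whole sequences, the fixed random seed, and the function name are assumptions made for illustration; the text does not state how the split was drawn.

```python
import random


def split_sequences(sequence_ids, train_ratio=0.85, seed=0):
    """Shuffle sequence identifiers and split them into training and test parts."""
    ids = list(sequence_ids)
    random.Random(seed).shuffle(ids)  # seeded for a reproducible split
    n_train = round(len(ids) * train_ratio)
    return ids[:n_train], ids[n_train:]


# 100 hypothetical sequence identifiers
train_ids, test_ids = split_sequences(range(100), train_ratio=0.85)
print(len(train_ids), len(test_ids))

half_train, half_test = split_sequences(range(100), train_ratio=0.5)
print(len(half_train), len(half_test))
```

Keeping all frames of a sequence on the same side of the split avoids leaking frames of one cell division event from the training set into the test set.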

Results on Zhong et al.’s dataset

This section compares the performance of the models on Zhong et al.’s dataset. As for the LiveCellMiner datasets, the base model, the time encoded ResNet18 model, and the ResNet18 model are evaluated. Fig 19 visualizes the label matrix of the ground-truth annotation and the labels generated by the proposed models on test sequences. The y-axis in this plot represents the 51 test sequences, and the x-axis is the length of each cell sequence. Each cell sequence in this dataset has 40 frames. The predicted label matrices of the proposed models are comparable to the user annotation labels. This indicates that all the models were able to predict labels close to the ground-truth annotation.

Fig 19. Label matrix.

Label matrices of user annotation and the predictions by proposed models for 51 sequences selected from the test data of the Zhong et al.’s dataset. The y-axis represents different cell trajectories, and the x-axis represents the length of each trajectory. Green, yellow, orange, violet, blue, and red represent interphase, prophase, prometaphase, metaphase, anaphase, and telophase classes respectively.

Table 11 shows the precision, recall, and F-score of the proposed models for each class of the Zhong et al. dataset, which is labeled with six cell-cycle stages. The first row of Table 11 shows the best model proposed by Zhong et al. [19]. This state-of-the-art method is not a deep learning approach but a combination of feature extraction and clustering algorithms. The table compares its results with those of our proposed models, which use deep learning techniques. Depending on the class and the metric, our models perform slightly worse or slightly better than the method proposed by Zhong et al. Among our approaches, the maximum values for each class are distributed across the models. In general, our proposed models with GRU layers achieve higher scores than the ResNet18 model.

A more precise evaluation can be made by analyzing the frame-to-frame accuracy of each model, which is shown in Table 12. Among the proposed models, the time encoded ResNet18 model has the highest frame-to-frame accuracy. This indicates that our proposed model with GRU layers is able to propagate features between different time instances and thereby increase performance. The base model, which is a very shallow network, also has a higher accuracy than the ResNet18 model.

The embedding space, or feature space, reflects properties of the input data. The first three principal components from principal component analysis (PCA) [20] are used to visualize the embedding space of the proposed models, as illustrated in Fig 20. The time encoded ResNet18 has the most clearly separated features for each class, which explains the advantage of the proposed model over the other approaches.

The confusion matrix displays the correct as well as the incorrect predictions. Fig 21 visualizes the normalized confusion matrices for the proposed models. The diagonal values of the confusion matrix represent the true-positive predictions for each of the six classes. For all the proposed models, the diagonal values are higher than the off-diagonal values, which implies that the models assign most of the images to their correct classes. The prophase class has the lowest true-positive rate, which can be attributed to the high visual similarity between prophase and interphase cell images. The true-positive values are higher for the other classes, which indicates that our proposed models give good classification results. For the proposed model, misclassifications occur mostly between adjacent classes, and the misclassification values are quite low. This is an additional advantage of including time information. In contrast, for the ResNet18 model without time information, misclassifications are evenly distributed across all classes; including time information mitigates this issue.
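The confusion-matrix analysis described above can be sketched in a few lines. This is a minimal illustration, not the authors' evaluation code: the function names, the toy labels, and the definition of "adjacent" as a stage-index difference of one are all assumptions made for this example.

```python
# Hypothetical stage order, following the six classes of the Zhong et al. dataset.
STAGES = ["inter", "pro", "prometa", "meta", "ana", "telo"]

def confusion_matrix(true_labels, pred_labels, n_classes):
    """Count matrix: rows are true classes, columns are predicted classes."""
    counts = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, pred_labels):
        counts[t][p] += 1
    return counts

def normalize_rows(counts):
    """Divide each row by its total so the diagonal holds the per-class true-positive rate."""
    normalized = []
    for row in counts:
        total = sum(row)
        normalized.append([c / total if total else 0.0 for c in row])
    return normalized

def adjacent_error_fraction(counts):
    """Fraction of all misclassified frames that fall into a neighbouring stage."""
    errors = adjacent = 0
    for t, row in enumerate(counts):
        for p, c in enumerate(row):
            if t != p:
                errors += c
                if abs(t - p) == 1:  # assumption: adjacency = index difference of 1
                    adjacent += c
    return adjacent / errors if errors else 0.0

# Toy labels (illustrative only, not taken from the paper's data).
y_true = [0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 5, 5]
y_pred = [0, 0, 1, 1, 0, 2, 3, 3, 3, 4, 5, 3]
cm = confusion_matrix(y_true, y_pred, len(STAGES))
cm_norm = normalize_rows(cm)
print(adjacent_error_fraction(cm))  # 0.75: three of the four errors are adjacent
```

Note that this sketch does not treat the telophase-to-interphase transition as adjacent; whether it should count is a modelling choice.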

Table 11. Average precision, recall, and F-score.

Average precision, recall, and F-score for each stage of mitosis for the various proposed models trained on the Zhong et al. dataset, compared to the TC3 model proposed by Zhong et al. [19]. Boldface indicates the best performance.

Precision
Model | Inter | Pro | Prometa | Meta | Ana | Telo
TC3 | 95.97±0.83 | 83.53±2.07 | 91.47±2.45 | 96.82±0.92 | 80.57±7.67 | 84.57±5.28
Base Model | 95.53±0.525 | 88.838±3.035 | 91.004±2.342 | 89.494±1.424 | 91.05±1.057 | 86.763±1.358
Time Encoded | 96.261±0.494 | 85.811±1.566 | 86.729±1.62 | 91.32±1.054 | 90.514±1.531 | 84.507±1.952
ResNet18 | 93.905±0.234 | 83.863±1.211 | 85.751±2.479 | 90.85±0.87 | 86.199±1.965 | 81.959±1.667

Recall
Model | Inter | Pro | Prometa | Meta | Ana | Telo
TC3 | 99.51±0.32 | 82.75±4.13 | 84.43±2.96 | 88.24±3.63 | 80.22±6.24 | 79.50±5.09
Base Model | 98.123±0.383 | 71.053±3.419 | 85.613±4.591 | 96.731±1.036 | 79.757±1.496 | 78.807±2.295
Time Encoded | 96.98±0.372 | 81.711±2.387 | 89.34±2.313 | 94.538±0.784 | 85.664±1.89 | 79.727±2.152
ResNet18 | 92.273±0.224 | 80.263±2.372 | 79.835±1.823 | 91.538±0.999 | 81.305±1.359 | 70.568±0.745

F-score
Model | Inter | Pro | Prometa | Meta | Ana | Telo
TC3 | 97.69±0.36 | 82.84±2.62 | 87.64±2.35 | 92.05±2.00 | 80.03±6.79 | 81.51±4.70
Base Model | 96.808±0.27 | 78.877±2.311 | 88.163±2.98 | 92.959±0.642 | 85.018±0.914 | 82.571±1.409
Time Encoded | 96.622±0.23 | 83.681±1.338 | 87.95±1.646 | 92.897±0.659 | 87.995±0.818 | 82.015±1.263
ResNet18 | 95.559±0.133 | 82.003±1.425 | 82.668±1.737 | 91.187±0.578 | 83.67±1.357 | 75.829±0.861
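
As a reading aid for Table 11, the per-class metrics can be computed from frame labels as sketched below. This is an illustrative sketch that assumes the balanced F-score (harmonic mean of precision and recall); the function name and the toy labels are hypothetical and not taken from the authors' code.

```python
def per_class_scores(true_labels, pred_labels, n_classes):
    """Return one (precision, recall, f_score) tuple per class."""
    pairs = list(zip(true_labels, pred_labels))
    scores = []
    for c in range(n_classes):
        tp = sum(1 for t, p in pairs if t == c and p == c)  # correct frames of class c
        fp = sum(1 for t, p in pairs if t != c and p == c)  # frames wrongly assigned to c
        fn = sum(1 for t, p in pairs if t == c and p != c)  # frames of c that were missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f_score = (2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
        scores.append((precision, recall, f_score))
    return scores

# Toy three-class example (illustrative only).
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 0, 2]
scores = per_class_scores(y_true, y_pred, 3)
for c, (prec, rec, f) in enumerate(scores):
    print(f"class {c}: precision={prec:.3f} recall={rec:.3f} F={f:.3f}")
```

The values in Table 11 are percentages averaged over repeated runs, so a single call like this corresponds to one repetition before averaging.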

Table 12. The frame-to-frame accuracy values.

The frame-to-frame accuracy values of the various proposed models on the Zhong et al. dataset. Boldface indicates the best performance.

Model Accuracy
TC3 94.1
Base Model 93.142±0.117
Time Encoded ResNet18 93.315±0.252
ResNet18 91.237±0.273
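
The mean ± standard deviation entries in Table 12 can be illustrated with the following sketch of frame-to-frame accuracy aggregated over repeated runs. The helper name and the toy sequences are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev

def frame_accuracy(true_seqs, pred_seqs):
    """Percentage of frames whose predicted stage equals the annotated stage."""
    correct = total = 0
    for true_seq, pred_seq in zip(true_seqs, pred_seqs):
        for t, p in zip(true_seq, pred_seq):
            correct += (t == p)
            total += 1
    return 100.0 * correct / total

# Two annotated toy trajectories and the predictions of three hypothetical runs.
true_seqs = [[0, 0, 1, 1], [0, 1, 2, 2]]
runs = [
    [[0, 0, 1, 1], [0, 1, 2, 2]],  # 8 of 8 frames correct
    [[0, 1, 1, 1], [0, 1, 2, 2]],  # 7 of 8 frames correct
    [[0, 0, 1, 1], [0, 1, 1, 0]],  # 6 of 8 frames correct
]
accs = [frame_accuracy(true_seqs, run) for run in runs]
print(f"{mean(accs):.3f}±{stdev(accs):.3f}")  # 87.500±12.500
```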

Fig 20. PCA plot.

Fig 20

Illustration of the first three principal components of the embeddings of the proposed models a) Base Model b) Time Encoded ResNet18 c) ResNet18.
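
A projection of this kind can be sketched with NumPy as shown below. This is a minimal example under stated assumptions: the embeddings are synthetic stand-ins for network features, and `pca_project` is a hypothetical helper, not part of the authors' code.

```python
import numpy as np

def pca_project(features, n_components=3):
    """Project row-vector features onto their first principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:n_components].T
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    return projected, explained[:n_components]

# Synthetic embeddings: 60 frames, 16 dimensions, three artificial clusters.
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(3, 16))
labels = np.repeat(np.arange(3), 20)
embeddings = centers[labels] + rng.normal(size=(60, 16))

coords, ratio = pca_project(embeddings)
print(coords.shape)  # (60, 3)
print(ratio)         # explained-variance ratio of the three components
```

Plotting `coords` colored by `labels` gives a three-dimensional scatter of the kind shown in Fig 20; well-separated clusters in such a plot suggest that the classes are separable in the embedding space.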

Fig 21. Normalized confusion matrix.

Fig 21

Normalized confusion matrices of the prediction for the six class dataset.

Tracking network results

We also evaluated the reconstruction results from the network for all the datasets. Figs 22 and 23 show the reconstruction of the center-cell tracked by the proposed models at different stages of cell division for the four LiveCellMiner datasets. The background information from other cells is suppressed in the tracker output, which ensures that the features used for classification do not contain background information. This is because we train the model with a loss function that calculates the loss between the predicted output and a masked image. Fig 24 shows the output of the tracking network of the proposed models on the Zhong et al. dataset for six different mitotic phases. It shows that the neural tracking module of the proposed model is able to track the center-cell, avoid the background clutter from other cells, and extract the features of the cell undergoing mitosis. Training together with the tracker module boosts the performance of the classification model, because the features for classification are extracted from the neural features of the tracked cell and the background clutter from other cells is completely suppressed. This is the main advantage of our proposed model.
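
The idea of computing the loss against a masked image can be illustrated as follows. This is only a sketch: the text states that the loss is calculated between the predicted output and a masked image, but the mean squared error used here, the binary mask, and the function names are assumptions made for illustration.

```python
import numpy as np

def masked_target(image, center_mask):
    """Keep only the center-cell pixels; everything else is set to zero."""
    return image * center_mask

def reconstruction_loss(predicted, image, center_mask):
    """Mean squared error between the prediction and the masked image (assumed loss)."""
    target = masked_target(image, center_mask)
    return float(np.mean((predicted - target) ** 2))

# Toy 3x3 "frame": the center pixel is the tracked cell, the rest are neighbours.
image = np.array([[0.9, 0.1, 0.0],
                  [0.2, 0.8, 0.7],
                  [0.0, 0.1, 0.6]])
mask = np.zeros_like(image)
mask[1, 1] = 1.0

clean_prediction = image * mask  # reconstructs the center cell only
copied_input = image.copy()      # also reproduces the neighbouring cells

print(reconstruction_loss(clean_prediction, image, mask))  # 0.0
print(reconstruction_loss(copied_input, image, mask))      # > 0, background is penalised
```

Under such a loss, a network that reproduces neighbouring cells is penalised, which is consistent with the suppression of background clutter described above.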

Fig 22. Reconstruction result of the center-cell.

Fig 22

Center-cell tracking for NikonXLight and RecQL4 datasets.

Fig 23. Reconstruction result of the center-cell.

Fig 23

Center-cell tracking for LSM710 and LSD1 datasets.

Fig 24. Tracking network output.

Fig 24

Illustration of the output of the tracking network of the proposed models for input images belonging to different classes. Both models were able to approximately reconstruct the center-cell from input images belonging to the various classes. The base model performs the center-cell reconstruction more effectively than the other model since its feature space has a higher dimension.

Conclusions

In this work, we studied mitosis, the process in which a parent cell divides into two identical daughter cells, and tracked the cell division process automatically using deep learning methods. To train the neural network models, we extracted video sequences of cells undergoing division and then trained an RNN model to extract the feature vectors. The use of the RNN helped to extract better feature vectors than the conventional classifier models. We carried out experiments to visualize the feature space to better understand why RNNs are better at identifying the mitosis stages. These showed that the feature space has time continuity in the high-dimensional space and that clusters form when the model is trained with the RNN. In addition, we measured the precision, recall, and F-score, which indicate the superiority of the proposed method over the ResNet18 baseline. By plotting the confusion matrix, we quantified the amount of misclassification between adjacent classes for both the 3-class and 6-class datasets. Furthermore, the reconstruction results were evaluated to understand how the neural network reconstructs the center-cell during training. Since only the center cell is reconstructed, the network can easily extract the features of the cell undergoing division and suppress the clutter from the background cells. This is because, during training, the loss is calculated between the predicted output and a masked image.

A notable drawback of the proposed method is that it does not consistently outperform the state-of-the-art across all phases of mitosis. In experiments with the 6-class dataset, the proposed method achieves the highest F-score only for prophase and anaphase. For the other phases, the numbers are comparable to those of the state-of-the-art model. Similarly, for the 3-class dataset, the F-score for the mitotic phase is lower. A possible reason is the lower number of training frames in this phase. As future work, we plan to investigate frame interpolation techniques to increase the number of frames in this class and mitigate this issue.

Data Availability

The data underlying the results presented in this study are available at https://osf.io/8qdgm/files/osfstorage. For information about the data, please refer to https://osf.io/b6gy5/wiki/home/.

Funding Statement

This work is supported by internal funds.

References

  • 1. Noatynska A, Gotta M, Meraldi P. Mitotic spindle (DIS) orientation and Disease: cause or consequence? Journal of Cell Biology. 2012;199(7):1025–35. doi: 10.1083/jcb.201209015
  • 2. Potapova T, Gorbsky GJ. The Consequences of chromosome segregation errors in mitosis and meiosis. Biology (Basel). 2017;6(1). doi: 10.3390/biology6010012
  • 3. Tijhuis AE, Johnson SC, McClelland SE. The emerging links between chromosomal instability (CIN), metastasis, inflammation and tumour immunity. Mol Cytogenet. 2019;12(1):17. doi: 10.1186/s13039-019-0429-1
  • 4. Simonetti G, Bruno S, Padella A, Tenti E, Martinelli G. Aneuploidy: Cancer strength or vulnerability? Int J Cancer. 2019;144(1):8–25. doi: 10.1002/ijc.31718
  • 5. Paweletz N. Walther Flemming: pioneer of mitosis research. Nat Rev Mol Cell Biol. 2001;2(1):72–5. doi: 10.1038/35048077
  • 6. er EG. Nuclear Morphology and the Biology of Cancer Cells. Acta Cytol. 2020;64(6):511–519. doi: 10.1159/000508780
  • 7. Katayama A, Toss MS, Parkin M, Sano T, Oyama T, Quinn CM, et al. Nuclear morphology in breast lesions: refining its assessment to improve diagnostic concordance. Histopathology. 2021.
  • 8. Way GP, Kost-Alimova M, Shibue T, Harrington WF, Gill S, Piccioni F, et al. Predicting cell health phenotypes using image-based morphology profiling. Mol Biol Cell. 2021;32(9):995–1005. doi: 10.1091/mbc.E20-12-0784
  • 9. Neumann B, Walter T, Heriche JK, Bulkescher J, Erfle H, Conrad C, et al. Phenotypic profiling of the human genome by time-lapse microscopy reveals cell division genes. Nature. 2010;464(7289):721–7. doi: 10.1038/nature08869
  • 10. Held M, Schmitz MH, Fischer B, Walter T, Neumann B, Olma MH, et al. CellCognition: time-resolved phenotype annotation in high-throughput live cell imaging. Nat Methods. 2010;7(9):747–54. doi: 10.1038/nmeth.1486
  • 11. Moreno-Andrés D, Bhattacharyya A, Scheufen A, Stegmaier J. LiveCellMiner: A new tool to analyze mitotic progression. PloS one. 2022;17(7):e0270923. doi: 10.1371/journal.pone.0270923
  • 12. Wang W, Douglas D, Zhang J, Kumari S, Enuameh MS, Dai Y, et al. Live-cell imaging and analysis reveal cell phenotypic transition dynamics inherently missing in snapshot data. Sci Adv. 2020;6(36). doi: 10.1126/sciadv.aba9319
  • 13. Cheng X, Dai C, Wen Y, Wang X, Bo X, He S, et al. NeRD: a multichannel neural network to predict cellular response of drugs by integrating multidimensional data. BMC medicine. 2022;20(1):1–16. doi: 10.1186/s12916-022-02549-0
  • 14. Xia F, Allen J, Balaprakash P, Brettin T, Garcia-Cardona C, Clyde A, et al. A cross-study analysis of drug response prediction in cancer cell lines. Briefings in Bioinformatics. 2021;23(1):bbab356. doi: 10.1093/bib/bbab356
  • 15. Neumann B, Held M, Liebel U, Erfle H, Rogers P, Pepperkok R, et al. High-throughput RNAi screening by time-lapse imaging of live human cells. Nat Methods. 2006;3(5):385–390. doi: 10.1038/nmeth876
  • 16. Wählby C, Sintorn IM, Erlandsson F, Borgefors G, Bengtsson E. Combining intensity, edge and shape information for 2D and 3D segmentation of cell nuclei in tissue sections. J Microsc. 2004;215(1):67–76. doi: 10.1111/j.0022-2720.2004.01338.x
  • 17. Durbin R, Eddy SR, Krogh A, Mitchison G. Biological sequence analysis: probabilistic models of proteins and nucleic acids. Camb Univ Press. 1998.
  • 18. Forney GD. The viterbi algorithm. Proceedings of the IEEE. 1973;61(3):268–278. doi: 10.1109/PROC.1973.9030
  • 19. Zhong Q, Busetto AG, Fededa JP, Buhmann JM, Gerlich DW. Unsupervised modeling of cell morphology dynamics for time-lapse microscopy. Nat Methods. 2012;9(7):711–713. doi: 10.1038/nmeth.2046
  • 20. Wold S, Esbensen K, Geladi P. Principal component analysis. Chemometr Intell Lab Syst. 1987;2(1-3):37–52. doi: 10.1016/0169-7439(87)80084-9
  • 21. Mikut R, Bartschat A, Doneit W, Ordiano JÁG, Schott B, Stegmaier J, et al. The MATLAB toolbox SciXMiner: User’s manual and programmer’s guide. arXiv preprint arXiv:1704.03298. 2017.
  • 22. Otsu N. A threshold selection method from gray-level histograms. IEEE Trans Syst Man Cybern Syst. 1979;9(1):62–66. doi: 10.1109/TSMC.1979.4310076
  • 23. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9(8):1735–1780. doi: 10.1162/neco.1997.9.8.1735
  • 24. Medsker LR, Jain LC. Recurrent neural networks. Design and Applications. 2001;5:64–67.
  • 25. Chung J, Gulcehre C, Cho K, Bengio Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555. 2014.
  • 26. Ondruska P, Posner I. Deep tracking: Seeing beyond seeing using recurrent neural networks. Thirtieth AAAI con on art intell. 2016.
  • 27. Ondruska P, Dequaire J, Wang DZ, Posner I. End-to-end tracking and semantic segmentation using recurrent neural networks. arXiv preprint arXiv:1604.05091. 2016.
  • 28. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2016. p. 770–778.
  • 29. Ulman V, et al. An objective comparison of cell-tracking algorithms. Nat Methods. 2017;14(12):1141–1152. doi: 10.1038/nmeth.4473
  • 30. Payer C, Štern D, Neff T, Bischof H, Urschler M. Instance segmentation and tracking with cosine embeddings and recurrent hourglass networks. In: Int Conf on Med Image Compu and Comp Inter. 2018. p. 3–11.
  • 31. Jose A, Roy R, Eschweiler D, Laube I, Azad R, Moreno-Andrés D, et al. End-To-End Classification Of Cell-Cycle Stages With Center-Cell Focus Tracker Using Recurrent Neural Networks. bioRxiv. 2022. Cold Spring Harbor Laboratory.
  • 32. Ho Y, Wookey S. The real-world-weight cross-entropy loss function: Modeling the costs of mislabeling. IEEE Access. 2019;8:4806–4813. doi: 10.1109/ACCESS.2019.2962617
  • 33. Schooley A, Moreno-Andrés D, De Magistris P, Vollmer B, Antonin W. The lysine demethylase LSD1 is required for nuclear envelope formation at the end of mitosis. J of cell sci. 2015;128(18):3466–3477.
  • 34. Yokoyama H, Moreno-Andrés D, Astrinidis SA, Hao Y, Weberruss M, Schellhaus AK, et al. Chromosome alignment maintenance requires the MAP RECQL4, mutated in the Rothmund–Thomson syndrome. Life sci all. 2019;2(1). doi: 10.26508/lsa.201800120
  • 35. Moreno-Andrés D, Yokoyama H, Scheufen A, Holzer G, Lue H, Schellhaus AK, et al. VPS72/YL1-mediated H2A.Z deposition is required for nuclear reassembly after mitosis. Cells. 2020;9(7):1702. doi: 10.3390/cells9071702
  • 36. Tharwat A. Classification assessment methods. Appl Comput and Informat. 2020.
  • 37. Brownlee J. How to grid search hyperparameters for deep learning models in python with keras. Available online. 2016.

Decision Letter 0

Xiao Luo

13 Sep 2023

PONE-D-23-27382

Automatic Detection of Cell-cycle Stages using Recurrent Neural Networks

PLOS ONE

Dear Dr. Jose,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Oct 28 2023 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Xiao Luo

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at 

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and 

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We note that the grant information you provided in the ‘Funding Information’ and ‘Financial Disclosure’ sections do not match. 

When you resubmit, please ensure that you provide the correct grant numbers for the awards you received for your study in the ‘Funding Information’ section.

3. In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability.

Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. Any potentially identifying patient information must be fully anonymized.

Important: If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail. Please see our guidelines for more information on what we consider unacceptable restrictions to publicly sharing data: http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access.

We will update your Data Availability statement to reflect the information you provide in your cover letter.

4. We noted in your submission details that a portion of your manuscript may have been presented or published elsewhere. 

"Yes, Fig. 1. was published in ICASSP 2023.This is a pre-work of this Journal paper. The Journal paper covers advanced model architecture and more experiments." 

Please clarify whether this [publication] was peer-reviewed and formally published. If this work was previously peer-reviewed and published, in the cover letter please provide the reason that this work does not constitute dual publication and should be included in the current manuscript.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: N/A

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors propose a novel network architecture, called Time Encoded ResNet18 Model, and conduct a series of experiments. Here, i have some questions that need to be answered.

1. Did the authors use any statistical methods at any points in the manuscript? Please updated and include the details in the respective statistical analysis section.

2. Whether the change of hyperparameters will have a drastic effect on the model's performance is hoped that the authors can prove it experimentally.

3.From the performance of label matrix, the superiority of the model proposed by authors seems not obvious, and it is hoped that this results can be further explained.

4. The baseline proposed by authors is relatively few, and it is hoped that some baseline other than RNN-based can be added.

Reviewer #2: Jose et al. proposed a novel deep-learning model named Time Encoded ResNet18 to distinguish different mitotic stage. They evaluated the model by comparing it with the base model and other algorithms and proved the overall superior performance. In real datasets, Time Encoded ResNet18 achieved high accuracy and precision. I have some concerns listed below.

Major:

1. The accuracies, precisions, recalls, and F-scores showed in Table 3, 5, 7, 9, and 11 did not prove the consistent superior performance of Time Encoded ResNet18. In Table 11, the model reached the top F-score in only 2/6 states, which showed that in some cases, the model could not even outperform the base model. Since the evaluation criteria were quite close among four methods (close to 1.000), I didn’t see the advantage of Time Encoded ResNet18.

2. Fig13 shows the predictions of each method and the true labels. But I can hardly tell which method was better. Authors may compare the key difference of each method and plot the figures focusing on the difference.

Minor:

3. Please define the term “sequence” clearly.

4. Please explain why models treat each image independently in line 423.

5. The typo in equation 8.

Reviewer #3: In the paper, the authors developed a deep-learning framework to predict mitosis stages from time-lapse microscopy images. They adopted an RNN to capture the connections between successive frames. With several in-depth experiments on two public datasets, they demonstrated that their methods performed better than the baseline methods. Besides, they also studied the features extracted by different models and explained why their RNN model had superior performance.

Strength:

1. Instead of viewing each frame as an independent sample, the authors use RNN to capture the time dependency between successive frames.

2. Features extracted by their method could clearly distinguish different mitosis stages.

3. The tracking network in their model can automatically track the center-cell and suppress information from other cells, which brings convenience for users.

Weakness:

1. In their literature review, the authors mentioned a state-of-the-art methods to identify cell-cycle stages, CellCognition and LiveCellMiner. However, they did not compare the performance of their model with CellCognition and only compared with LiverCellMiner on the "LiveCellMiner Dataset". In my point of view, other competing methods they adopted were more like an ablation study. Therefore, I suggest adding some experiments to compare the performance of their Time Encoded ResNet18 with CellCognition and LiveCellMiner on both the "LiveCellMiner Dataset" and the "Zhong et al.’s Dataset".

2. In their experiments, the training and testing were performed within the same dataset. However, in practice, after getting some new time-lapse microscopy images, it is usually time-consuming to annotate some of them to get a training set. Therefore, I suggest the authors to test the performance of their model when training on one dataset, say, the LSM710 dataset and predicting the cell-cycle stages of another dataset containing the same type of cells (maybe after fine-tuning on a small fraction of data), say, the LSD1 dataset (also capturing the human HeLa cells). Or, maybe they can test their model on one dataset they used but with a smaller train-test ratio, say, 0.5 or even 0.2. I think these settings will make their model more attractive and helpful for users.

3. It seems that the improvements in accuracy, recall and F-score were quite marginal in some experiments, for example, Table 2, Table 8 and Table 10. Also, I noticed that in several settings, their Time Encoded ResNet18 had a lower F-score than LiveCellMiner for the identification of cells belonging to the mitosis stage (Table 3, Table 7, Table 9). Therefore, I was wondering if the improvements in classification brought by their model can help obtain some biological findings. For example, the authors of LiveCellMiner [1] analyzed the NikonXLight dataset and found that downregulation of INO80, SRCAP, EP400 and H2A.Z consistently lengthens early mitotic progression. They also detailedly analyzed the LSD1 dataset and performed some experiments to support their findings. Hence, I suggest the authors carefully examine the results of at least one of the datasets in their experiments and explain or discuss whether any possible new biological findings can be found when applying their Time Encoded ResNet18 model.

4. In the "Conclusions" part, the authors did not discuss the limitations of their model, which are important for users to correctly use their model and avoid possible pitfalls.

Minor points:

1. The authors did not provide their codes for this model, making it difficult to try their methods.

2. In Table 8, the accuracy of LiveCellMiner was higher than the mean accuracy of the Time Encoded ResNet18 model. Therefore, LiveCellMiner should be in boldface rather than Time Encoded ResNet18.

3. In Table 10 (the analysis of Zhong et al.’s Dataset), I suggest adding the frame-to-frame accuracy of TC3, since it performed the best in predicting cells belonging to the interphase.

4. In the last line on page 3, the names of the authors of reference [26] were missing (after the words "in 2016").

References:

[1] Moreno-Andrés D, Bhattacharyya A, Scheufen A, Stegmaier J. LiveCellMiner: A new tool to analyze mitotic progression. PloS one. 2022;17(7):e0270923.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2024 Mar 11;19(3):e0297356. doi: 10.1371/journal.pone.0297356.r002

Author response to Decision Letter 0


17 Dec 2023

Response to the editor.

2. We note that the grant information you provided in the ‘Funding Information’ and ‘Financial Disclosure’ sections do not match.

When you resubmit, please ensure that you provide the correct grant numbers for the awards you received for your study in the ‘Funding Information’ section.

We have updated this section.

3. In your Data Availability statement:

The code is now available in the repository https://github.com/Rijo756/cell-cycle-stages-identification. The dataset will be added soon; we are working on it now.

4. We noted in your submission details that a portion of your manuscript may have been presented or published elsewhere.

"Yes, Fig. 1. was published in ICASSP 2023. This is a pre-work of this Journal paper. The Journal paper covers advanced model architecture and more experiments."

Please clarify whether this [publication] was peer-reviewed and formally published.

We have mentioned in the cover letter that only Fig. 1 is reused.

Response to the reviewers.

We thank the reviewers for their critical assessment of our approach. In the following, we address

the major concerns raised point by point, and the corresponding revisions.

Comments from Reviewer 1

Comment: 1. Did the authors use any statistical methods at any points in the manuscript? Please update and include the details in the respective statistical analysis section.

Reply: Thanks for your comment. Kindly note that the numbers reported in the paper are the average performance over 20 repetitions, and we have also included the standard deviation of each experiment in the tables. Please check line number 447.
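To make the reporting convention in this reply concrete, here is a minimal Python sketch of summarizing repeated runs as a mean and a standard deviation. The function name `summarize_runs` and the accuracy values are invented for illustration and are not results from the manuscript; the sketch also assumes the sample standard deviation, which the reply does not specify.

```python
from statistics import mean, stdev


def summarize_runs(scores):
    """Return (mean, sample standard deviation) over repeated runs."""
    return mean(scores), stdev(scores)


# Invented accuracies from five hypothetical repetitions
# (the reply describes 20 repetitions per experiment).
toy_accuracies = [0.981, 0.979, 0.983, 0.980, 0.982]

avg, spread = summarize_runs(toy_accuracies)
print(f"accuracy = {avg:.3f} +/- {spread:.3f}")
```

Reporting the spread next to the mean lets a reader judge whether a small gap between two methods exceeds the run-to-run variation.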

Comment: 2. Whether the change of hyperparameters will have a drastic effect on the model's performance is hoped that the authors can prove it experimentally.

Reply: Thanks for the comment. The results reported have already undergone grid search-based hyperparameter tuning. The optimum hyperparameters selected by the grid search are provided in Table 1, and the standard deviation of 20 repetitions is already mentioned in the paper. Please check line 447 and 448 in the manuscript.

Comment: 3. From the performance of label matrix, the superiority of the model proposed by authors seems not obvious, and it is hoped that this results can be further explained.

Reply: It is true that visually finding the differences in the label matrix can be challenging. To address this, we have increased the size of the label matrix plots to make the differences more clear. Please refer to Fig. 6, Fig. 7, Fig. 10, Fig. 11, and Fig. 13 for visual representations. Additionally, we have included quantitative metrics such as precision, recall, and F-score. To further illustrate the advantages of incorporating time information, we have included the PCA plots. Furthermore, the accuracy, which is a qualitative measure of the label matrix, is clearly demonstrated in the accuracy table, highlighting the advantages of the proposed method. We have further added the following sentence in the results section. "The label matrix generated by the ResNet18 model contains some misclassifications in the post-mitosis class and also in the mitosis class for some sequences which is eliminated by the proposed RNN-based model. The same behaviour is observed for the other datasets as well which is shown in the respective subsections." (Please refer to line numbers 552 to line 555 in the manuscript).

Comment: 4. The baseline proposed by authors is relatively few, and it is hoped that some baseline other than RNN-based can be added.

Reply: Kindly note that we have included the ResNet18 model, which is not an RNN-based approach, for comparison. We have also compared our model with the LiveCellMiner paper and the baseline model we have developed. We are confident that the proposed baseline reflects the state-of-the-art.

Comments from Reviewer 2

Comment: 1. The accuracies, precisions, recalls, and F-scores showed in Table 3, 5, 7, 9, and 11 did not prove the consistent superior performance of Time Encoded ResNet18. In Table 11, the model reached the top F-score in only 2/6 states, which showed that in some cases, the model could not even outperform the base model. Since the evaluation criteria were quite close among four methods (close to 1.000), I didn't see the advantage of Time Encoded ResNet18.

Reply: Kindly note that even though the results for the Inter, Prometa, Meta, and Telo phases are somewhat lower, they are quite close to those of the best-performing model. In the two states where the results are better, we observe that the numbers are significantly higher compared to the second best-performing model. Also, please refer to Tables 3, 7, and 9 in the results section. It is quite clear that the results are lower only for the mitosis state, and that for the other two states the F-score is already higher.
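The per-state comparison discussed in this exchange relies on precision, recall, and F-score computed separately for each class. A minimal Python sketch of that computation follows, using the standard definitions. The function name `per_class_scores`, the class names, and the frame labels are invented for illustration and are not taken from the manuscript's data.

```python
def per_class_scores(y_true, y_pred, classes):
    """Return {class: (precision, recall, f_score)} from frame-wise labels."""
    scores = {}
    pairs = list(zip(y_true, y_pred))
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)  # true positives
        fp = sum(1 for t, p in pairs if t != c and p == c)  # false positives
        fn = sum(1 for t, p in pairs if t == c and p != c)  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f_score = 2 * precision * recall / denom if denom else 0.0
        scores[c] = (precision, recall, f_score)
    return scores


# Invented toy labels for six frames of one hypothetical sequence.
classes = ["inter", "mitosis", "post"]
y_true = ["inter", "inter", "mitosis", "mitosis", "post", "post"]
y_pred = ["inter", "mitosis", "mitosis", "mitosis", "post", "post"]

for name, (p, r, f) in per_class_scores(y_true, y_pred, classes).items():
    print(f"{name}: precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

Because the F-score is computed per class, one model can lead on some states and trail on others, which is the situation the reviewer and the authors are discussing.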

Comment: 2. Fig. 13 shows the predictions of each method and the true labels. But I can hardly tell which method was better. Authors may compare the key difference of each method and plot the figures focusing on the difference.

Reply: Thanks for pointing this out. It is true that the label matrix plots were small. We have now increased the size of the plots and updated the label matrix plots and the latent space plots. Kindly refer to the response to comment 3 of reviewer 1 in the rebuttal letter.

Minor comments:

Comment: 3. Please define the term "sequence" clearly.

Reply: Kindly note that a "Sequence" typically refers to a series of related or connected frames. We have indicated that now in the introduction. Please refer to line 13 in the manuscript.

Comment: 4. Please explain why models treat each image independently in line 423.

Reply: The image is treated independently here as it is a classifier model and time information is not used. Please refer to line 422 in the manuscript.

Comment: 5. The typo in equation 8.

Reply: We have now corrected the typo in equation 8. Please refer to line 418.

Comments from Reviewer 3

Comment: 1. In their literature review, the authors mentioned a state-of-the-art methods to identify cell-cycle stages, CellCognition and LiveCellMiner. However, they did not compare the performance of their model with CellCognition and only compared with LiveCellMiner on the "LiveCellMiner Dataset". In my point of view, other competing methods they adopted were more like an ablation study. Therefore, I suggest adding some experiments to compare the performance of their Time Encoded ResNet18 with CellCognition and LiveCellMiner on both the "LiveCellMiner Dataset" and the "Zhong et al.'s Dataset".

Reply: CellCognition is an annotation tool used for finding complex cellular dynamics. It uses a whole-slide 2D+t image which contains multiple cells to detect objects and classify them into cell-cycle stages. However, in our case, both datasets are single-cell datasets in which each image contains only one cell. Since our focus is on single-cell image datasets, it is not reasonable to compare with CellCognition. Furthermore, Zhong et al.'s dataset even uses CellCognition for object detection and feature extraction from the whole-slide images. Kindly note that the comparison results of the performance of the Time Encoded ResNet18 with LiveCellMiner and TC3 are already available.

Comment: 2. In their experiments, the training and testing were performed within the same dataset. However, in practice, after getting some new time-lapse microscopy images, it is usually time-consuming to annotate some of them to get a training set. Therefore, I suggest the authors to test the performance of their model when training on one dataset, say, the LSM710 dataset and predicting the cell-cycle stages of another dataset containing the same type of cells (maybe after fine-tuning on a small fraction of data), say, the LSD1 dataset (also capturing the human HeLa cells). Or, maybe they can test their model on one dataset they used but with a smaller train-test ratio, say, 0.5 or even 0.2. I think these settings will make their model more attractive and helpful for users.

Reply: Thanks for the comment. We expect that the performance will drop quite a bit if we change the modality. The proposed experiment of checking what fraction of ground truth is needed to obtain good results would, however, be quite interesting and also a hint for potential users as to how many images they should annotate to obtain good results. We have done the experiment for a train-test ratio of 0.5, and the results are now updated in Table 10. It shows that the numbers do not drop significantly, indicating that even with a smaller dataset, the model is able to give good performance.

Comment: 3. It seems that the improvements in accuracy, recall and F-score were quite marginal in some experiments, for example, Table 2, Table 8 and Table 10. Also, I noticed that in several settings, their Time Encoded ResNet18 had a lower F-score than LiveCellMiner for the identification of cells belonging to the mitosis stage (Table 3, Table 7, Table 9). Therefore, I was wondering if the improvements in classification brought by their model can help obtain some biological findings. For example, the authors of LiveCellMiner [1] analyzed the NikonXLight dataset and found that downregulation of INO80, SRCAP, EP400 and H2A.Z consistently lengthens early mitotic progression. They also detailedly analyzed the LSD1 dataset and performed some experiments to support their findings. Hence, I suggest the authors carefully examine the results of at least one of the datasets in their experiments and explain or discuss whether any possible new biological findings can be found when applying their Time Encoded ResNet18 model.

Reply: Thank you for your comment. We were not able to draw any new biological conclusions from the experiments. However, in our experiments with the 6-class dataset, we observed that there is a higher rate of misclassification between adjacent classes in the case of the time-encoded ResNet18, and the misclassification value is quite low for the other classes. This phenomenon comes as an additional advantage stemming from the inclusion of time information. In contrast, for the ResNet18 model without the incorporation of time information, misclassifications are distributed more evenly across all classes. We have added this information now in the paper. Please refer to line numbers 654-659 in the manuscript.
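The adjacent-class observation in this reply can be quantified from a confusion matrix. The minimal Python sketch below computes the share of all misclassified frames that land in a class directly before or after the true one, assuming the classes are ordered by cell-cycle stage. The function name `adjacent_error_share` and both matrices are invented toy values, not results from the manuscript.

```python
def adjacent_error_share(confusion):
    """Fraction of misclassified frames assigned to a neighbouring class.

    confusion[i][j] counts frames of true class i predicted as class j,
    with classes ordered by their position in the cell cycle.
    """
    errors = 0
    adjacent = 0
    n = len(confusion)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # correct predictions are not errors
            errors += confusion[i][j]
            if abs(i - j) == 1:
                adjacent += confusion[i][j]
    return adjacent / errors if errors else 0.0


# Invented 3-class toy matrices (rows: true class, columns: predicted class).
time_aware = [
    [50, 3, 0],
    [2, 40, 2],
    [0, 1, 60],
]
frame_wise = [
    [50, 2, 1],
    [1, 40, 3],
    [2, 1, 60],
]

print(adjacent_error_share(time_aware))
print(adjacent_error_share(frame_wise))
```

A share close to 1 means that nearly all errors are confusions between neighbouring stages, the pattern the reply attributes to the time-encoded model; a lower share indicates errors spread across non-adjacent classes.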

Comment: 4. In the "Conclusions" part, the authors did not discuss the limitations of their model, which are important for users to correctly use their model and avoid possible pitfalls.

Reply: We have updated the 'Conclusions' section of the paper and included the limitations of the paper and future work. Please refer to line numbers 690-702.

Minor points:

Comment: 5. The authors did not provide their codes for this model, making it difficult to try their methods.

Reply: The github link (https://github.com/Rijo756/cell-cycle-stages-identification) is added now. (See the footnote on page 13.)

Comment: 6. In Table 8, the accuracy of LiveCellMiner was higher than the mean accuracy of the Time Encoded ResNet18 model. Therefore, LiveCellMiner should be in boldface rather than Time Encoded ResNet18.

Reply: Thanks for noting this mistake. We have updated Table 8.

Comment: 7. In Table 10 (the analysis of Zhong et al.'s Dataset), I suggest adding the frame-to-frame accuracy of TC3, since it performed the best in predicting cells belonging to the interphase.

Reply: We have referred to the TC3 [19] paper for the accuracy numbers. The accuracy value is not reported as a number, but the supplementary file contains a graph showing how the number of features affects the accuracy, and the highest value reported for the TC3 model in the paper is 94.1%. We have added that now in the manuscript. Please refer to Table 11.

It is true that the accuracy is slightly lower for the proposed method. Also, please notice that for the pro, prometa, meta, anaphase, and telo-phases the F-score is higher than the TC3 approach. These phases occur much more rarely than the interphase and it is thus great that our method performs better here.

Comment: 8. In the last line on page 3, the names of the authors of reference [26] were missing (after the words "in 2016").

Reply: Thanks for noting this typo. We have updated it. Please check line 113.

Attachment

Submitted filename: Rebuttal_letter.pdf

pone.0297356.s001.pdf (77.1KB, pdf)

Decision Letter 1

Xiao Luo

4 Jan 2024

Automatic Detection of Cell-cycle Stages using Recurrent Neural Networks

PONE-D-23-27382R1

Dear Dr. Abin Jose,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Xiao Luo

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: All comments have been addressed

Reviewer #3: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: Yes

Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes

Reviewer #3: N/A

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

Reviewer #3: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: All my concerns were addressed. Nice work! No more comments.

All my concerns were addressed. Nice work! No more comments. (Twice for meeting the character count requirement)

Reviewer #3: I thank the authors for their detailed responses of all the reviewer comments. I have no further concerns and feel that it is now suitable for being published.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

Reviewer #3: No

**********

Acceptance letter

Xiao Luo

15 Feb 2024

PONE-D-23-27382R1

PLOS ONE

Dear Dr. Jose,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission,

* There are no issues that prevent the paper from being properly typeset

If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Xiao Luo

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    Attachment

    Submitted filename: Rebuttal_letter.pdf

    pone.0297356.s001.pdf (77.1KB, pdf)

    Data Availability Statement

    The data underlying the results presented in this study are available at https://osf.io/8qdgm/files/osfstorage. For information about the data, please refer to https://osf.io/b6gy5/wiki/home/.


    Articles from PLOS ONE are provided here courtesy of PLOS
