Abstract
3D fluorescence microscopy of living organisms has increasingly become an essential and powerful tool in biomedical research and diagnosis. An exploding amount of imaging data has been collected, yet efficient and effective computational tools to extract information from it still lag behind, largely due to the challenges in analyzing biological data. Interesting biological structures are not only small but often morphologically irregular and highly dynamic. Although tracking cells in live organisms has been studied for years, existing cell tracking methods are not effective for subcellular structures, such as protein complexes, which undergo continuous morphological changes, including split and merge, in addition to fast migration and complex motion. In this paper, we first define the multi-object portion tracking problem to model the protein object tracking process. A multi-object tracking method with portion matching is then proposed based on 3D segmentation results. The proposed method distills deep feature maps from deep networks, then recognizes and matches objects’ portions using an extended search. Experimental results confirm that the proposed method achieves 2.96% higher consistent tracking accuracy and 35.48% higher event identification accuracy than state-of-the-art methods.
1. Introduction
Fluorescence microscopy has undergone an evolution in the past decades. Using a variety of fluorescent indicators, specific targets such as proteins, lipids, or ions can be labeled and therefore visualized in living specimens [1], [2]. Rather than relying on physical sections of chemically fixed tissues, technologies such as confocal and multi-photon microscopy enable the acquisition of optical sections of thick biological objects. By excluding out-of-focus light or specifically activating fluorophores in the focal plane, a 2D image can be obtained. This allows accurate reconstruction of the 3D structures of biological samples and continuous imaging of living cells or organisms. Time-lapse movies of 3D images result in 4D fluorescence microscopy data with temporal information as the additional dimension.
The current bottleneck in biomedical research and diagnosis is to effectively extract information from increasingly large and complex biological image datasets in quantitative ways. In doing so, there are two types of challenges. One type is the physical limitations of image acquisition, including the number of pixels per image and the signal-to-noise ratio (SNR) [3]. Besides the physical size and imaging speed of the equipment, the number of pixels per image also depends on the temporal resolution of live imaging, which in turn is dictated by the time scale of the observed biological activities. Biological samples often suffer from a low SNR. There is always a limit to how many fluorophores an object of interest can be labeled with. In addition, all fluorophores are subject to photobleaching [4], a phenomenon in which the fluorophore stops emitting fluorescence after repeated exposure to lasers. Fluorescent proteins used in live imaging are especially sensitive to photobleaching. Frequent and long-term imaging leads to a severe reduction in fluorescent signals, resulting in a low SNR.
The other type of challenge comes from the intrinsically dynamic behaviors of the biological objects of interest. Living objects can often change their morphologies rapidly, which is especially true for protein machinery inside cells. Furthermore, many biological objects display constant movement and even interact with other objects of the same type. One well-studied example is tracking cells in living organisms [5]. Different from macro objects, e.g. cars or bicycles, cells may present deformations such as elongation, expansion, and shrinkage [6], as well as demonstrate complex motion patterns in a short time period [7]. However, compared to subcellular structures, which are often the machinery that perform tasks for cells, cells largely maintain constant volumes and have a nucleus that is often large and easily trackable. Subcellular structures, such as protein clusters, grow or shrink much more rapidly. They can also interact with each other through split and merge.
Conventional tracking methods used for biomedical image analysis employ either nearest neighbor linking or a motion detector, such as a Kalman filter (KF) [5]. These methods are limited in tracking fast migration and complex motions, including split and merge. In recent years, deep learning methods have been applied to biomedical image analysis [8]. Due to the lack of distinctive features, object recognition in biomedical images is rather difficult compared with the detection of macro objects. It has been shown that tracking by object recognition is more reliable than relying on object position or motion paths, as nearest neighbor linking and KF do [9]. However, the small physical scale and highly dynamic properties of functional protein clusters make conventional cell tracking methods ineffective at producing reliable results.
In this paper, to tackle the challenges faced by cell/protein tracking, we propose a multi-object portion tracking method with portion matching for 3D fluorescence microscopy images. Based on 3D segmentation results, the proposed deep feature map tracking method distills deep feature maps from deep learning networks, then recognizes and matches objects’ portions using an extended search. The proposed method addresses the challenges of rapid deformation, object birth and death, and object split and merge. The method is evaluated on 3D fluorescence images of E-Cadherin fused with Green Fluorescent Protein (GFP) from developing Drosophila (fruit fly) embryos and compared with five configurations of three existing tracking methods.
The remainder of the paper is organized as follows: Section 2 reviews the related work in object segmentation and tracking. Section 3 defines the multi-object portion tracking problem and presents the deep feature map tracking method with portion matching. Section 4 presents the experimental results. Section 5 concludes the paper.
2. Related Work
A critical challenge in using fluorescence microscopy images is the presence of noise. Denoising filters, such as the Gaussian filter and the average filter, are widely employed to reduce such noise [10]–[12]. In the literature, diverse cell and nuclei segmentation methods have been studied, including active contour, watershed, and thresholding methods, as well as deep neural networks. The active contour model, also known as snakes, delineates the outline of an object. By means of energy minimization, an active contour adapts to intensity differences and noise, but requires a known approximation of the boundary to initialize the search [13], [14]. By extending active contours to 3D active surfaces, cellular objects in sequences of microscopy images can be identified [15]. The watershed method excels at separating touching objects [16]. By defining foreground and background locations, marker-controlled watershed segmentation achieves improved performance in nuclei segmentation [17]. Thresholding is another frequently applied segmentation method. In [18], Otsu thresholding [19] is adapted for 2D microscopy image segmentation.
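The Otsu thresholding mentioned above [19] selects the threshold that maximizes the between-class variance of the intensity histogram. The following is a minimal NumPy sketch of the idea; the bin count is an arbitrary illustration choice, not a setting taken from [18]:

```python
import numpy as np

def otsu_threshold(image, bins=256):
    """Otsu's method: pick the threshold that maximizes the
    between-class variance of background/foreground pixels."""
    hist, edges = np.histogram(image, bins=bins)
    p = hist.astype(float) / hist.sum()        # bin probabilities
    centers = (edges[:-1] + edges[1:]) / 2     # bin midpoints
    w0 = np.cumsum(p)                          # background weight up to each cut
    w1 = 1.0 - w0                              # foreground weight
    mu0 = np.cumsum(p * centers)               # unnormalized background mean
    mu_total = mu0[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        m0 = mu0 / w0                          # background class mean
        m1 = (mu_total - mu0) / w1             # foreground class mean
        var_between = w0 * w1 * (m0 - m1) ** 2
    var_between = np.nan_to_num(var_between)   # empty classes contribute nothing
    return centers[np.argmax(var_between)]
```

For a strongly bimodal image, the returned threshold separates the two intensity clusters; real microscopy data usually requires denoising first, as noted above.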
Recently, convolutional neural networks (CNN) [20] have led the state of the art in object classification and semantic segmentation. Since 2015, several 2D semantic segmentation architectures have been proposed, including Fully Convolutional Networks (FCN) [21], SegNet [22], U-Net [23], and Fully Convolutional DenseNets (FCDN) that adopt residual and dense blocks [24]–[26]. Among these networks, U-Net demonstrates a strong capacity for biomedical image segmentation using skip connections that directly link down-sampling and up-sampling layers [8]. However, stacking 2D segmentation results into a 3D volume may cause misalignment along the depth axis. Extending to 3D data, a 3D U-Net [27] is built to learn volumetric information. To mitigate the labeling burden on annotators, 3D labels can be generated with Generative Adversarial Networks (GAN) and used to train 3D segmentation [28].
Tracking is a complex process that follows object detection and segmentation. Popular cell or particle tracking methods can be divided into three categories: nearest neighbor linking [29]–[33], Kalman filter with data association [34]–[36], and deep learning [7], [10]. The nearest neighbor method is the simplest approach: it links every segmented object to the nearest object in the next frame [37]. It has been applied to track label-free single cells in 3D matrices [29], as well as for particle and whole cell tracking [30]. However, as shown in [29], the nearest neighbor method fails when a cell migrates fast. A tumor cell tracking platform is developed in [31] to recreate a tightly interconnected system of cancer and immune cells with 3D environmental properties. As a method proposed for multiple object tracking (MOT) [32], [33], the intersection-over-union tracker, which overlaps objects in two frames, is applied to a cell tracking benchmark [38] to perform multi-cell tracking in 2D and 3D space. A Kalman filter (KF) [39] tracks an object by projecting the current state forward in time and estimating the error covariance to obtain a priori estimates for the next frame [7]. KF is commonly combined with data association to improve performance, e.g., maximum likelihood or an acceptance gate associated with KF [34], [35]. To improve KF estimation in plant cell tracking, where only cell movement is relevant, local graph matching is applied after KF [36]. Further, deep learning architectures can be used to solve tracking problems. The common approach converts the tracking task into a 2D classification that takes the state of an object at time t−1 as input and predicts its state at time t [7], [10].
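The nearest neighbor linking described above can be sketched in a few lines. The greedy one-pass association and the `max_dist` gating parameter are illustration choices, not part of any specific cited method:

```python
import math

def nearest_neighbor_link(prev_centroids, next_centroids, max_dist=float("inf")):
    """Link each object in frame t-1 to the closest unclaimed
    object centroid in frame t (greedy nearest-neighbor association).
    Centroid dicts map object ID -> (x, y, z)."""
    links = {}
    claimed = set()
    for i, p in prev_centroids.items():
        best_j, best_d = None, max_dist
        for j, q in next_centroids.items():
            if j in claimed:
                continue
            d = math.dist(p, q)
            if d < best_d:
                best_j, best_d = j, d
        if best_j is not None:       # unlinked objects model disappearance
            links[i] = best_j
            claimed.add(best_j)
    return links
```

As noted above, this scheme breaks down for fast migration: when an object moves farther than the distance to an unrelated neighbor, the greedy link goes to the wrong object.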
Despite the aforementioned methods proposed for cell tracking, cell or protein cluster tracking that supports split and merge is rarely studied. According to [5], among 28 tracking methods reviewed, only two [40], [41], which use watershed segmentation and nearest neighbor tracking, support split and merge. In [42], split and merge measurements are represented by a sparse matrix and solved by a Markov chain Monte Carlo based auxiliary particle filter. However, this approach assumes that (1) objects are almost non-deformable, and (2) size and shape remain the same after cell events. A Markov chain Monte Carlo data association method is then proposed in [43], which performs multiple GFP cluster split and merge tracking on 2D frames. Object events are categorized into six conditions (born, vanish, remain, merge, split, and edge), and the conditions of objects are measured as distances, which are then input to the method.
3. Methods
3.1. Multi-object portion tracking problem
As discussed in Section 1, one important problem in cell/protein tracking is how to track split and merge behaviors. By breaking whole-object tracking into object portion tracking, any type of object relationship, including one-to-many (split) and many-to-one (merge), can be modeled as a one-to-one mapping. That is, if a portion is selected to be sufficiently small, it matches either one object or none in the next time frame. The multi-object portion tracking problem can be modeled probabilistically as follows:
Assume Ω is the collection of all tracks in time period T and Y is the total observation. A single track is defined as

$$\omega_j = \{\omega_j(t) \mid t = 1, 2, \ldots, T\},$$

where j is the track ID, and a single track at time t is ωj(t).
Let $O_t$ be the observation set of objects at time t, and assume that $o_t^{i}$ is an object in $O_t$ and $o_t^{i,n}$ is a portion of $o_t^{i}$, where i is the object ID at time t and n is the portion ID. For a history observation $O_{t-\tau}$, τ = 1, 2, …, T − 1, $o_{t-\tau}^{i'}$ is an object in $O_{t-\tau}$ and $o_{t-\tau}^{i',n'}$ is a portion of $o_{t-\tau}^{i'}$, where τ is the frame gap, i′ is the object ID at time t−τ, and n′ is the portion ID.
Assuming that each portion $o_t^{i,n}$ is small enough that, as discussed above, it matches at most one portion $o_{t-\tau}^{i',n'}$, the probability of portion matching, i.e., that portion $o_t^{i,n}$ is matched with object $o_{t-\tau}^{i'}$, is set as

$$P\!\left(o_t^{i,n} \in o_{t-\tau}^{i'}\right) = \max_{n'} P\!\left(o_t^{i,n} = o_{t-\tau}^{i',n'}\right). \tag{1}$$
Then, for all τ, the probability that $o_t^{i}$ is in the track of $o_{t-\tau}^{i'}$ can be formulated as

$$P\!\left(o_t^{i} \in \omega\!\left(o_{t-\tau}^{i'}\right)\right) = \max_{n} P\!\left(o_t^{i,n} \in o_{t-\tau}^{i'}\right). \tag{2}$$
For all i′, one can determine that $o_t^{i}$ is in the track of $o_{t-\tau}^{i'^*}$ by

$$i'^{*} = \arg\max_{i'} P\!\left(o_t^{i} \in \omega\!\left(o_{t-\tau}^{i'}\right)\right), \tag{3}$$

which gives

$$P\!\left(o_t^{i} \in \omega_j\right) = \max_{i'} P\!\left(o_t^{i} \in \omega\!\left(o_{t-\tau}^{i'}\right)\right). \tag{4}$$
Assuming that object birth/death has no impact on the tracking of an existing object, the probability of one consistent track over T can be modeled as

$$P\!\left(\omega_j \mid Y\right) = \prod_{t=2}^{T} P\!\left(o_t^{i} \in \omega_j\right). \tag{5}$$
The objective of the multi-object portion tracking problem is to maximize P(ωj|Y) ∀ωj ∈ Ω. Hence, to optimize the multi-object portion tracking, one has to maximize the sum of the maximal portion matching probabilities over the intervals τ, i.e.,

$$\max \sum_{t=2}^{T} \max_{\tau} \max_{i', n, n'} P\!\left(o_t^{i,n} = o_{t-\tau}^{i',n'}\right).$$
The optimization can be achieved by (1) reliable segmentation, because false segmentation disturbs portion matching results, and (2) an efficient matching approach that improves the successful matching rate.
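Under the model above, the optimization reduces to picking, for each portion, the candidate object with the highest matching probability across all frame gaps τ. The following is a minimal sketch with a hypothetical probability table; the acceptance bound `gamma` mirrors the lower bound γ used later in Eq. (7):

```python
def best_track_assignment(portion_probs, gamma=0.5):
    """For each portion at time t, pick the candidate object over all
    frame gaps tau with the highest matching probability; probabilities
    below gamma are treated as no match (e.g. a newborn object).
    portion_probs: {portion_id: {(tau, candidate_id): probability}}"""
    matches = {}
    for n, candidates in portion_probs.items():
        (tau, cid), p = max(candidates.items(), key=lambda kv: kv[1])
        matches[n] = cid if p >= gamma else None
    return matches
```

Because each portion is matched independently, split (two objects matching one parent) and merge (one object matching two parents) fall out of the per-portion assignments naturally, as formalized in Section 3.3.3.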
In the proposed approach, segmentation and tracking are based on deep feature maps extracted by a deep learning architecture. The merits of deep feature maps are: (1) Abundant: deep feature maps are extracted from deep learning architectures with multiple map layers; taking U-Net as an example, 64 feature map layers are preserved as effective ones before pixel-wise classification, which carries far more information than a single contour or surface. (2) Reliable: deep feature maps are weight matrices selected and optimized by the network; distilled through multiple encoders and decoders, they are optimized via global training and a loss function, which improves error convergence. (3) Supportive of complex object events: birth/death and split/merge can be identified by recognition through deep feature maps. Traditionally, nearby search and motion prediction accelerate tracking and reduce errors, but they are limited by high migration speed, high object density, and complex motion models; deep feature maps enable reliable object recognition, so these limitations are relaxed. (4) Multidimensional: a feature map has size (X, Y, Z, D), where X, Y, and Z are the length, width, and height of the image, respectively, and D is the depth of the feature maps. (5) Computationally efficient: although 64-layer deep feature maps are extracted by U-Net, maps with fewer layers can be employed in tracking to save hardware resources and time.
3.2. 3D Segmentation
Since ResNet and DenseNet have proven effective in classification, semantic segmentation architectures that adopt residual and dense blocks have demonstrated efficient pixel-wise classification in recent years. Fully Convolutional DenseNets (FCDN) [26] is one such efficient semantic segmentation architecture. Here, the 103-layer FCDN-103 is expanded into a 3D architecture to segment 3D fluorescence microscopy images. Figure 1 shows the architecture of the 3D asymmetric FCDN-103.
Figure 1.
3D asymmetric FCDN-103: (a) network architecture; (b) Dense block
In [27] and [28], the authors employ symmetric convolution and pooling filters (e.g., filter sizes [3, 3, 3] or [2, 2, 2]) when building 3D deep networks. In this application, however, the 3D fluorescence microscopy images have an asymmetric size ratio of 280:512:13. At this resolution, resizing along the third dimension may compromise the data, so the depth dimension is fully preserved in the network. The layer arrangement and the other 2D parameter settings are identical to those in [26]. The parameter settings are shown in Table 1. Figure 2 presents the segmentation results.
Table 1.
3D Asymmetric FCDN-103 Detail
| | Layer | Transition Down | Transition Up |
|---|---|---|---|
| Sub-architecture | Batch Norm. | Batch Norm. | 3×3×1 Transposed Convolution, stride=2 |
| | ReLU | ReLU | |
| | 3×3×1 Convolution | 1×1×1 Convolution | |
| | Dropout p=0.2 | Dropout p=0.2 | |
| | | 2×2×1 Max Pooling | |
Figure 2.
Segmentation results of 3D fluorescence image: (a) 3D view; (b) 2D view
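The asymmetric filtering idea of Table 1, mixing x-y neighborhoods with 3×3×1 kernels and 2×2×1 pooling so the shallow z axis is never down-sampled, can be illustrated with a naive NumPy sketch. The loop-based single-channel implementation is for clarity only and is not the trained network:

```python
import numpy as np

def conv3d_asym(vol, kernel):
    """'Valid' sliding-window filtering (the deep-learning 'convolution');
    a (3, 3, 1) kernel mixes x-y neighborhoods while leaving the z axis
    of the volume untouched."""
    kx, ky, kz = kernel.shape
    X, Y, Z = vol.shape
    out = np.zeros((X - kx + 1, Y - ky + 1, Z - kz + 1))
    for x in range(out.shape[0]):
        for y in range(out.shape[1]):
            for z in range(out.shape[2]):
                out[x, y, z] = np.sum(vol[x:x+kx, y:y+ky, z:z+kz] * kernel)
    return out

def maxpool_asym(vol, pool=(2, 2, 1)):
    """(2, 2, 1) max pooling halves x and y but preserves every z-slice,
    e.g. all 13 depth slices of the 280x512x13 stacks."""
    px, py, pz = pool
    X, Y, Z = vol.shape
    X2, Y2, Z2 = X // px, Y // py, Z // pz
    v = vol[:X2 * px, :Y2 * py, :Z2 * pz]
    return v.reshape(X2, px, Y2, py, Z2, pz).max(axis=(1, 3, 5))
```

After any number of such transition-down stages, the depth dimension keeps its original 13 slices, which is the design intent stated above.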
3.3. Deep feature map tracking
The proposed deep feature map tracking (DFMT) method distills deep feature maps from the 3D segmentation network, then recognizes and matches objects’ portions using an extended search.
3.3.1. Portions of 4D deep feature maps
After training with the Adam optimizer and the softmax cross-entropy loss function, the final parameter matrix of DB5 is $\Theta_t$, which is a 64-layer 4D feature map at time t, and the m feature maps of all observations $O_t$ at time t can be defined as

$$\Theta_t = \left\{\theta_t^{1}, \theta_t^{2}, \ldots, \theta_t^{m}\right\}.$$
Instead of using direct image information (e.g., intensity), portion matching is realized using deep feature maps. The deep feature map portions corresponding to the object portions $o_t^{i,n}$ and $o_{t-\tau}^{i',n'}$ are denoted as $\theta_t^{i,n}$ and $\theta_{t-\tau}^{i',n'}$, respectively.
3.3.2. Extended search and portion matching
Rather than matching only a pair of portions from two time frames, an extended search employs a set of portions to cover the surrounding volume. It is applied to $\Theta_t$ to accommodate motion and fluctuation.
Taking a pixel $p(x_p, y_p, z_p)$ as the center point in the x, y, and z dimensions of object $o_t^{i}$, the deep feature map portion is defined as

$$\theta_t^{i,n} = \Theta_t\!\left[x_p - r_x : x_p + r_x,\; y_p - r_y : y_p + r_y,\; z_p - r_z : z_p + r_z\right],$$

where $2r_x$, $2r_y$, and $2r_z$ are the lengths of each portion in the x, y, and z dimensions, respectively. For the extended search in frame t − τ, a pixel group is obtained by

$$p_\tau \in \left\{(x, y, z) : |x - x_p| \le e_x,\; |y - y_p| \le e_y,\; |z - z_p| \le e_z\right\},$$

where $e_x$, $e_y$, and $e_z$ are the extended ranges in the x, y, and z dimensions, respectively. Then an object’s deep feature map portion in frame t − τ is defined, ∀ $p_\tau(x_{p_\tau}, y_{p_\tau}, z_{p_\tau})$, as

$$\theta_{t-\tau}^{i',n'} = \Theta_{t-\tau}\!\left[x_{p_\tau} - r_x : x_{p_\tau} + r_x,\; y_{p_\tau} - r_y : y_{p_\tau} + r_y,\; z_{p_\tau} - r_z : z_{p_\tau} + r_z\right].$$
The portion matching between $\theta_t^{i,n}$ and $\theta_{t-\tau}^{i',n'}$ is defined by Pearson’s correlation coefficient as

$$\rho\!\left(\theta_t^{i,n}, \theta_{t-\tau}^{i',n'}\right) = \frac{E\!\left[\left(\theta_t^{i,n} - \mu_t^{i,n}\right)\left(\theta_{t-\tau}^{i',n'} - \mu_{t-\tau}^{i',n'}\right)\right]}{\sigma_t^{i,n}\, \sigma_{t-\tau}^{i',n'}}, \tag{6}$$

where $\mu$ and $\sigma$ are the expectation and standard deviation of a portion, respectively. If a portion is associated with l objects in the extended search space among the τ time frames, it has an array of coefficients $[\rho_1, \ldots, \rho_l]$ describing its correlation with all l objects. The best matched object portion is selected by

$$\hat{I} = \left\{\arg\max_{i' \in [1, l]} \rho_{i'} \;\middle|\; \rho_{i'} > \gamma\right\}, \tag{7}$$

where γ is a lower bound of acceptance for portion matching and $\hat{I}$ is the set of matched object IDs. Figure 3 shows a 3D view of the 4D extended search and portion matching.
Figure 3.
3D view of the 4D extended search and portion matching using deep feature map(DFM)
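The portion matching of Eqs. (6) and (7) can be sketched as follows, with toy feature maps standing in for the real 64-layer DB5 output; the portion radii, candidate centers, and γ value are illustrative assumptions:

```python
import numpy as np

def portion(theta, center, r=(2, 2, 1)):
    """Cut a (2rx, 2ry, 2rz, D) portion out of the 4D feature map
    theta (shape X x Y x Z x D) around a center pixel."""
    (x, y, z), (rx, ry, rz) = center, r
    return theta[x - rx:x + rx, y - ry:y + ry, z - rz:z + rz, :]

def pearson(a, b):
    """Pearson correlation coefficient of two flattened portions."""
    a, b = a.ravel(), b.ravel()
    a = a - a.mean()
    b = b - b.mean()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_portion(theta_t, center, theta_prev, candidates, gamma=0.6, r=(2, 2, 1)):
    """Extended search: correlate the portion at `center` in frame t
    against portions around each candidate center in frame t-tau and
    keep the best match above the acceptance bound gamma."""
    ref = portion(theta_t, center, r)
    best_id, best_rho = None, gamma
    for obj_id, c in candidates.items():
        rho = pearson(ref, portion(theta_prev, c, r))
        if rho > best_rho:
            best_id, best_rho = obj_id, rho
    return best_id          # None means no candidate passed gamma
```

In the actual method the candidate centers come from the extended pixel group around $p$, so fast migration is covered by enlarging the extended ranges rather than by a motion model.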
3.3.3. Object events identification
Besides one-to-one object mapping, the object events of cells and protein clusters include four types: birth, death, split, and merge, as shown in Figure 4.
Figure 4.
Object events (a) birth (b) death (c) split (d) merge
Birth: An object is newborn if none of its portions are matched with any object portion in the τ former time frames, i.e.,

$$\hat{I}\!\left(o_t^{i,n}\right) = \emptyset \quad \forall n. \tag{8}$$
New IDs are assigned to newborn objects.
Death: Observations for dead objects at time point t are not available. These objects’ IDs are retained and never reassigned to other objects, in case the object reappears.
Split: Two or more objects are considered split from one parent if the intersection of their matched object ID sets is not empty, i.e., for objects $o_t^{a}$ and $o_t^{b}$,

$$\hat{I}\!\left(o_t^{a}\right) \cap \hat{I}\!\left(o_t^{b}\right) \neq \emptyset. \tag{9}$$

New object IDs are assigned to the child objects, but the parent’s ID is recorded for tracking.
Merge: An object is considered merged if its matched object ID set contains multiple elements, i.e.,

$$\left|\hat{I}\!\left(o_t^{i}\right)\right| > 1. \tag{10}$$
The object ID of the merged object is inherited from the major (larger size) parent object, though all parents’ IDs are recorded.
Via DFMT, the matching relationships between portions and their corresponding objects are identified. The collection of all matched objects at time t is defined as

$$M_t = \left\{\left(o_t^{i}, \hat{I}\!\left(o_t^{i}\right)\right) \;\middle|\; o_t^{i} \in O_t\right\}. \tag{11}$$
Figure 5 plots the sample correlation map of protein objects in the time interval [t, t−τ]. Through portion matching with extended search in τ time frames, DFMT successfully identifies the split and merge relations among multiple objects. Object IDs are inherited from the parents following the matching relationships.
Figure 5.
Correlation maps. Each object is in a unique color.
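The event rules of Eqs. (8)-(10) can be applied directly to the matched-ID sets; the dictionary layout below is an assumption for illustration:

```python
def classify_events(matched_ids):
    """Derive object events from per-object matched parent ID sets.
    matched_ids: {object_id_at_t: set of parent IDs from frames t-tau}.
    birth -> empty parent set          (Eq. 8)
    split -> two objects share a parent (Eq. 9)
    merge -> more than one parent       (Eq. 10)"""
    events = {"birth": [], "merge": [], "split": []}
    items = list(matched_ids.items())
    for i, parents in items:
        if not parents:
            events["birth"].append(i)
        elif len(parents) > 1:
            events["merge"].append(i)
    for a in range(len(items)):
        for b in range(a + 1, len(items)):
            i, pa = items[a]
            j, pb = items[b]
            if pa & pb:                  # shared parent => split siblings
                events["split"].append((i, j))
    return events
```

Dead objects simply never appear as keys at time t, matching the ID-retention rule above.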
4. Experimental Results
4.1. Dataset
The 4D dataset used to evaluate the proposed method is a time-lapse movie taken of developing Drosophila (fruit fly) embryos. The surface of each embryo is covered by a single layer of columnar epithelial cells, with the top of the cells on the surface of the embryo. Each columnar cell is approximately 6.5 microns wide and 30 microns tall. E-Cadherin fused with Green Fluorescent Protein (GFP) is expressed in the entire embryo to visualize adherens junctions, the major cell-to-cell junctions that physically connect neighboring cells. E-Cadherin is the membrane component of adherens junctions and therefore localizes exclusively on the cell membrane. At this point of development, E-Cadherin molecules are organized into small clusters of various sizes on the lateral membrane of the columnar cells.
Images were taken on a Leica SP5 laser scanning confocal microscope. A 488 nm laser was used to excite the GFP. At each time point, a 60-micron by 30-micron area was imaged, and a z-stack of optical sections was taken from the top of the cells to six microns below, creating a 3D data set. The developing embryo was imaged this way every five seconds, generating a 4D data set.
4.2. Evaluation
Based on the same segmentation results generated by the asymmetric 3D FCDN in Section 3.2, six tracking methods, including the proposed DFMT method, are tested and compared against the ground truth. Among the compared methods, Method1 (M1) [32] employs intersection-over-union (IOU), which scores the cell relations by

$$\text{IOU}(a, b) = \frac{|a \cap b|}{|a \cup b|}, \tag{12}$$

where a and b are two objects. Based on its performance, the threshold σIOU is set to 0 to maximize the tracking result. Method2 (M2) [43] uses object linking and expands the linking range (by 10 pixels) when objects move; overtime tracking is also applied in Method2. For object events, a split is declared if two objects a and b in frame t are linked to one object c in frame t−1, and a merge is declared if object a in frame t has two sources b and c in frame t−1 and size(a) ≥ 1.5 × average size(b, c). Since Method1 does not support object event identification, it is evaluated a second time with the object event identification approach of Method2, denoted M1+M2. Method3 (M3) [44] is applied through Fiji [45] and ImageJ [46]; it supports tracking with (M3 (S&M)) or without (M3) split and merge detection, and both versions are tested. As Figure 6 shows, the evaluation is based on the tracking of 12 objects through 69 time points. Because it is difficult to determine a single time point for an object event by the naked eye, a time range is set for each event. Figure 7 presents the ground truth of split and merge events within these time ranges. For evaluation purposes, if a method detects a corresponding object event within the time range, the detection is counted as a true positive (TP); any missed event is a false negative (FN), and any spurious detection is a false positive (FP). The evaluation metrics include accuracy, sensitivity, and precision, as defined by the following equations.
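For concreteness, the IOU score of Eq. (12) on boolean segmentation masks might look like the following sketch; with σIOU = 0, any positive overlap links two objects across frames:

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boolean segmentation masks,
    as used by the IOU tracker to score object overlap across frames."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0
```

This score drops to zero as soon as an object moves farther than its own extent between frames, which is why the IOU tracker alone struggles with fast-migrating protein clusters.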
Figure 6.
Ground truth objects (a) t=1 (b) t=69
Figure 7.
Object tracking and event identification results
$$\text{Accuracy} = \frac{TP}{TP + FN + FP} \tag{13}$$

$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{14}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{15}$$
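The metrics can be computed directly from the event counts. Note that since no true negatives exist for event detection, the accuracy form below (TP over all counted outcomes) is one plausible reading of Eq. (13):

```python
def event_metrics(tp, fn, fp):
    """Accuracy, sensitivity, and precision from event-level true
    positives, false negatives, and false positives; guards against
    division by zero when a count combination is empty."""
    accuracy = tp / (tp + fn + fp) if (tp + fn + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, sensitivity, precision
```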
Figure 7 demonstrates the tracking of six example objects. In Table 2, tracking accuracy describes the capacity for consistent tracking over the time period T. The split/merge accuracy represents the accuracy of split/merge event identification. Split and merge (S&M) accuracy, sensitivity, and precision show the comprehensive performance over both event types. From Figure 7 and Table 2, one can see that without split and merge detection, M1 and M3 lose tracks more frequently. The proposed DFMT method achieves the highest long-term tracking success rate among all methods and significantly improves object event identification; both splits and merges of protein clusters are accurately detected with few false negatives and false positives. The proposed method achieves 2.96% higher consistent tracking accuracy than Method2 and 35.48% higher event identification accuracy than Method3, respectively. The effectiveness of DFMT derives from valid object recognition based on the accuracy and reliability of the deep feature maps. The results confirm that deep feature map tracking is powerful in cases of fast migration, dense distribution, and complex motion. Figure 8 shows the 3D tracking result of the proposed method.
Table 2.
Object tracking and event identification results
| Methods | Tracking Accuracy | Split Accuracy | Merge Accuracy | S&M Accuracy | S&M Sensitivity | S&M Precision |
|---|---|---|---|---|---|---|
| DFMT (proposed) | 98.52% | 81.82% | 85.00% | 83.87% | 83.87% | 96.30% |
| M1 [32] | 43.28% | - | - | - | - | - |
| M1+M2 [32], [41] | 77.31% | 63.64% | 40.00% | 48.39% | 50.00% | 83.33% |
| M2 [41] | 95.56% | 54.55% | 40.00% | 45.16% | 53.85% | 82.35% |
| M3 [44] | 65.84% | - | - | - | - | - |
| M3 (S&M) [44] | 93.09% | 63.64% | 40.00% | 48.39% | 51.72% | 88.24% |
Figure 8.
3D tracking by the proposed approach. Object IDs and colors are inherited if tracking is successful.
5. Conclusion
In this paper, we proposed the deep feature map tracking method for tracking objects in 4D fluorescence imagery. The DFMT method targets objects with complex motion, such as protein clusters, which may demonstrate dense distribution, fast migration, rapid volume alteration, split, and merge. The proposed method works in two main steps. First, based on the 3D segmentation result, the 4D deep feature map is extracted from the last dense block for each 3D fluorescence frame. Next, portion matching between objects in two frames is performed by finding the highest correlation within the extended search space. Consequently, partial or whole object tracking, covering object events such as birth, death, split, and merge, is realized by the proposed method. The experimental results demonstrate that the DFMT method achieves a high success rate in long-term tracking and object event identification.
Acknowledgements
We acknowledge the support for this research: UNLV TTGRA and NIH Pathway to Independence Award (K99/R00).
References
- [1].Giepmans BN, Adams SR, Ellisman MH, and Tsien RY. The fluorescent toolbox for assessing protein location and function. Science, 312(5771):217–24, April 2006. [DOI] [PubMed] [Google Scholar]
- [2].Sanderson MJ, Smith I, Parker I, and Bootman MD. Fluorescence microscopy. Cold Spring Harb Protoc, 2014(10):pdb.top071795, October 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [3].Hell SW. Toward fluorescence nanoscopy. Nature Biotechnology, 21(11):1347–1355, October 2003. [DOI] [PubMed] [Google Scholar]
- [4].Song L, Hennink EJ, Young IT, and Tanke HJ. Photobleaching kinetics of fluorescein in quantitative fluorescence microscopy. Biophysical J, 68(6):2588–2600. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [5].Chenouard N et al. Objective comparison of particle tracking methods. Nature Methods, 11(3):281, January 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [6].Yang F, Mackey MA, Ianzini F, Gallardo G, and Sonka M. Cell segmentation, tracking, and mitosis detection using temporal context. Proc. 2nd Int. Conf. Medical Image Computing and Computer-assisted Intervention, pp. 302–309, 2005. [DOI] [PubMed] [Google Scholar]
- [7].He T, Mao H, Guo J, and Yi Z. Cell tracking using deep neural networks with multi-task learning. Image and Vision Computing, 60:142–153, April 2017. [Google Scholar]
- [8].Litjens G et al. A survey on deep learning in medical image analysis. Medical Image Analysis, 42:60–88, December 2017. [DOI] [PubMed] [Google Scholar]
- [9].Muller M, Bibi A, Giancola S, Alsubaihi S, and Ghanem B. TrackingNet: a large-Scale dataset and benchmark for object tracking in the wild. Proc. 15th European Conf. on Computer Vision. pp. 300–317, 2018. [Google Scholar]
- [10].Abousamra S, Adar S, Elia N, and Shilkrot R. Localization and tracking in 4D fluorescence microscopy imagery. Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition Workshops, pp. 2290–2298, 2018. [Google Scholar]
- [11].Ghazal M, Amer A, and Ghrayeb A. Structure-oriented multidirectional wiener filter for denoising of image and video signals. IEEE Trans. on Circuits and Systems for Video Technology, 18(12):1797–1802, December 2008. [Google Scholar]
- [12].Buades A, Coll B, and Morel JM. A review of image denoising algorithms, with a new one. Multiscale Modeling & Simulation, 4(2):490–530, July 2006. [Google Scholar]
- [13].Kass M, Witkin A, Terzopoulos D. Snakes: Active contour models. Int. J. of Computer Vision, 1(4):321–331, January 1988. [Google Scholar]
- [14].Delgado-Gonzalo R, Uhlmann V, Schmitter D, and Unser M. Snakes on a plane: a perfect snap for bioimage analysis. IEEE Signal Processing Magazine, 32(1):41–48, January 2015. [Google Scholar]
- [15].Lorenz KS, Salama P, Dunn KW, and Delp EJ. Three dimensional segmentation of fluorescence microscopy images using active surfaces. Proc. IEEE Int. Conf. on Image Processing, pp. 1153–1157, 2013. [Google Scholar]
- [16].Vincent L, and Soille P. Watersheds in digital spaces: an efficient algorithm based on immersion simulations. IEEE Trans. Pattern Analysis and Machine Intelligence, 13(6):583–598, June 1991. [Google Scholar]
- [17].Yang X, Li H, and Zhou X. Nuclei segmentation using marker-controlled watershed, tracking using mean-shift, and kalman filter in time-lapse microscopy. IEEE Trans. Circuits and Systems I: Regular Papers, 53(11):2405–2414, November 2006. [Google Scholar]
- [18].Jiao Y, Derakhshan H, Schneider B, Regentova E, and Yang M. Automated quantification of white blood cells in light microscopy images of injured skeletal muscle. Proc. IEEE 8th Annu. Computing and Communication Workshop and Conf., pp. 1–7, 2018. [Google Scholar]
- [19].Otsu N. A threshold selection method from gray-level histograms. IEEE Trans. Systems, Man, and Cybernetics, 9(1):62–66, January 1979. [Google Scholar]
- [20].Lecun Y, Bottou L, Bengio Y, and Haffner P. Gradient-based learning applied to document recognition. Proc. IEEE, 86(11): 2278–2324, November 1998. [Google Scholar]
- [21].Long J, Shelhamer E and Darrell T. Fully convolutional networks for semantic segmentation Proc. IEEE Conf. on Computer Vision and Pattern Recognition, pp. 3431–3440, 2015. [DOI] [PubMed] [Google Scholar]
- [22].Badrinarayanan V, Kendall A, and Cipolla R. SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Analysis and Machine Intelligence, 39(12):2481–2495, December 2017. [DOI] [PubMed] [Google Scholar]
- [23].Ronneberger O, Fischer P, and Brox T. U-Net: convolutional networks for biomedical image segmentation Proc. Int. Conf. on Medical Image Computing and Computer-assisted Intervention, pp. 234–241, 2015. [Google Scholar]
- [24].Huang G, Liu Z, van der Maaten L, and Weinberger KQ. Densely connected convolutional networks Proc. IEEE Conf. on Computer Vision and Pattern Recognition, pp. 4700–4708, 2017. [Google Scholar]
- [25].He K, Zhang X, Ren S, and Sun J. Deep residual learning for image recognition Proc. IEEE Conf. On Computer Vision and Pattern Recognition, pp. 770–778, 2015. [Google Scholar]
- [26].Jégou S, Drozdzal M, Vazquez D, Romero A, and Bengio Y. The one hundred layers Tiramisu: fully convolutional DenseNets for semantic segmentation Proc. IEEE Conf. on Computer Vision and Pattern Recognition Workshops, pp. 11–19, 2017. [Google Scholar]
- [27].Çiçek Ö, Abdulkadir A, Lienkamp SS, Brox T, and Ronneberger O. 3D U-Net: learning dense volumetric segmentation from sparse annotation Proc. Int. Conf. on Medical Image Computing and Computer-Assisted Intervention, pp. 424–432, Springer, 2016. [Google Scholar]
- [28].Fu C et al. Three dimensional fluorescence microscopy image synthesis and segmentation Proc. IEEE Conf. on Computer Vision and Pattern Recognition Workshops, pp. 2221–2229, 2018. [Google Scholar]
- [29].Sapudom J, Waschke J, Franke K, Hlawitschka M, and Pompe Tilo. Quantitative label-free single cell tracking in 3D biomimetic matrices. Scientific Reports, 7(1):14135, October 2017. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [30].Yang FW et al. A computational framework for particle and whole cell tracking applied to a real biological dataset. J. of Biomechanics, 49(8):1290–1304, May 2016. [DOI] [PubMed] [Google Scholar]
- [31].Parlato S et al. 3D Microfluidic model for evaluating immunotherapy efficacy by tracking dendritic cell behaviour toward tumor cells. Scientific Reports, 7(1):1093, April 2017. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [32].Bochinski E, Eiselein V, and Sikora T. High-speed tracking-by-detection without using image information Proc. 14th IEEE Int. Conf. on Advanced Video and Signal Based Surveillance (AVSS), pp. 1–6, 2017. [Google Scholar]
- [33].Bochinski E, Senst T, and Sikora T. Extending IOU based multi-object tracking by visual information Proc. 15th IEEE Int. Conf. on Advanced Video and Signal Based Surveillance (AVSS), pp. 1–6, 2018. [Google Scholar]
- [34].Landau M, Koltsova E, Ley K, Acton ST. Multi-cell 3D tracking with adaptive acceptance gates. Proc. IEEE Southwest Symp. on Image Analysis & Interpretation (SSIAI), pp. 49–52, 2017. [Google Scholar]
- [35].Broeke JHP et al. Automated quantification of cellular traffic in living cells. J. of Neuroscience Methods, 178(2):378–384, April 2009. [DOI] [PubMed] [Google Scholar]
- [36].Liu M, He Y, Wei Y, and Xiang P. Plant cell tracking using Kalman filter based local graph matching. Image and Vision Computing, 60:154–161, April 2017. [Google Scholar]
- [37].Meijering E, Dzyubachyk O, Smal I. Methods for cell and particle tracking. Methods in Enzymology, 504:183–200, 2012. [DOI] [PubMed] [Google Scholar]
- [38].Maška M et al. A benchmark for comparison of cell tracking algorithms. Bioinformatics, 30(11):1609–1617, February 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [39].Kalman RE. A new approach to linear filtering and prediction problems. J. of Basic Engineering, 82(1):35–45, March 1960. [Google Scholar]
- [40].Crocker JC, Grier DG. Methods of digital video microscopy for colloidal studies. J. of Colloid and Interface Science, 179(1):298–310, April 1996. [Google Scholar]
- [41].Celler K, van Wezel GP, and Willemse J. Single particle tracking of dynamically localizing TatA complexes in Streptomyces coelicolor. Biochemical and Biophysical Research Communications, 438(1):38–42, August 2013. [DOI] [PubMed] [Google Scholar]
- [42].Khan Z, Balch T, and Dellaert F. Multitarget tracking with split and merged measurements Proc. IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, pp. 605–610, 2015. [Google Scholar]
- [43].Quan W, Jean G, and Luby-Phelps K. Markov chain Monte Carlo data association for merge and split detection in tracking protein clusters Proc. 18th Int. Conf. on Pattern Recognition, pp. 1030–1033, 2006. [Google Scholar]
- [44].Tinevez JY et al. TrackMate: an open and extensible platform for single-particle tracking. Methods, 115:80–90, February 2017. [DOI] [PubMed] [Google Scholar]
- [45].Schindelin J et al. Fiji: an open-source platform for biological-image analysis. Nature Methods, 9(7):676–682, June 2012. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [46].Schneider CA, Rasband WS, Eliceiri KW. NIH Image to ImageJ: 25 years of image analysis. Nature Methods, 9(7):671–675, July 2012. [DOI] [PMC free article] [PubMed] [Google Scholar]








