Abstract
This study aimed to propose an efficient method for automated liver segmentation on magnetic resonance imaging–derived proton density fat fraction (MRI-PDFF) maps using deep active learning. We developed an active learning framework for liver segmentation that uses both labeled and unlabeled MRI-PDFF data. A total of 77 MRI-PDFF liver scans were obtained from patients with nonalcoholic fatty liver disease. For training, tuning, and testing, the ground truth of 71 internal and 6 external MRI-PDFF scans was verified by an expert reviewer. For 100 randomly selected slices, manual and deep learning (DL) segmentations were visually graded on a four-point scale from very accurate to inaccurate. The dice similarity coefficients (DSCs) for the three steps were 0.69 ± 0.21, 0.85 ± 0.12, and 0.94 ± 0.01, respectively (paired t-test: p = 0.1389 between the first and second steps and p = 0.0144 between the first and third steps), indicating that active learning provides superior performance compared with non-active learning. The biases in the Bland-Altman plots for the three steps were −24.22% (from −82.76 to −2.70), −21.29% (from −59.52 to 3.06), and −0.67% (from −10.43 to 4.06). Additionally, active learning reduced the required annotation time more than fivefold (2 min with active learning vs. 13 min without it, in the first step). The number of very accurate slices was greater for DL (46 slices) than for manual segmentation (6 slices). Deep active learning enables efficient training for liver segmentation on limited MRI-PDFF datasets.
Keywords: Active learning, Abdominal image analysis, Convolutional neural network, Deep learning, Proton density fat fraction
Introduction
Semantic segmentation assigns a predefined class label to each pixel of an image; that is, its purpose is pixelwise labeling. It has been widely researched for analyzing biomedical images [1–5]. Early semantic segmentation algorithms relied on image-processing techniques such as edge detection filters and region growing. With hardware improvements over the past several years, methods based on convolutional neural networks (CNNs) have substantially improved semantic segmentation, including when combined with other machine learning methods such as recurrent neural networks (RNNs) [6] and conditional random fields (CRFs) [7]. The fully convolutional network (FCN) [8] and DeepLab [9] were developed to improve segmentation performance. The 3D U-net, which comprises a contracting path and a symmetric expanding path, has demonstrated effective segmentation of 3D volumes [10, 11]. Cascaded (two-stage) architectures, such as the cascaded 3D FCN and segmentation-by-detection networks, have been proposed to improve performance by applying a region proposal network (RPN) before segmentation [12–16]. Deep CNNs achieve their best performance when trained on high-quality input. However, several challenges remain in medical imaging. First, annotated medical imaging datasets are typically scarce because high-quality annotation is costly in clinical environments. Second, acquiring sufficient and reliable annotations for semantic segmentation is difficult because manual delineations vary between annotators. Reliable annotations are particularly difficult to acquire when segmenting medical images (2D slices and 3D volumes) of rare diseases or of complex cardiac and abdominal structures.
Several studies have introduced active learning frameworks that ease manual annotation and help models generalize from limited datasets by adding annotations in a human-in-the-loop manner [17–24]. Some researchers have applied interactive CNN-based segmentation driven by user scribbles and bounding boxes, enabling generalization to previously unseen object classes [25]. Magnetic resonance imaging–derived proton density fat fraction (MRI-PDFF) allows noninvasive and accurate quantification of hepatic steatosis in patients with liver disease, and accurate liver segmentation on MRI-PDFF is important for estimating the presence and grade of hepatic steatosis [26].
In this study, we propose a deep active learning framework for MRI-PDFF that reduces annotation costs while gradually enlarging the labeled MRI-PDFF dataset, enabling effective segmentation of the complex structure of the liver on MRI-PDFF in patients with nonalcoholic fatty liver disease (NAFLD).
Materials and Methods
Datasets and Pre-processing
The study protocol was approved by the Institutional Review Board for Human Investigations at the Korea University Medical Center, which waived the requirement for informed consent owing to the retrospective nature of this study. The entire dataset was anonymized to protect patient information. We confirmed that all the methods were performed in accordance with the relevant guidelines and regulations.
A total of 77 MRI-PDFF liver scans (71 internal and 6 external datasets) from consecutively admitted patients with biopsy-confirmed NAFLD, acquired at the Korea University Medical Center between January 2018 and May 2020, were used for training and testing. For validation, six internal (Korea University Anam Hospital; KUAH) and six external (Korea University Ansan Hospital; KUANH) MRI-PDFF cases were evaluated at each step.
Images were acquired using a 3.0-T MRI scanner (MAGNETOM Skyra; Siemens Healthcare, Erlangen, Germany) with a 30-channel body coil. Multi-echo gradient echo sequences were obtained using multi-echo Dixon and VIBE-Dixon (both Siemens Healthcare, Erlangen, Germany) with online reconstruction. During a 15-s breath-hold, six fractional echo magnitude images were acquired at echo times of 1.09, 2.46, 3.69, 4.92, 6.15, and 7.38 ms. The repetition time, slice thickness, field of view, and flip angle were 9.00 ms, 3.0 mm (3.5 mm for the external dataset), 304 × 380 mm, and 4°, respectively (Table 1). Screening Dixon sequences were used to obtain a rapid, rough estimate of each patient's liver fat fraction. All patients underwent ultrasound-guided percutaneous liver biopsy targeting segment 5/6, using an 18-gauge coaxial biopsy system (Bard Mission needle; Bard Biopsy, Tempe, AZ, USA) with a 20-mm penetration depth. Liver biopsy specimens were fixed in formalin and embedded in paraffin.
Table 1.
Demographic and acquisition parameters of study population by group
| Characteristic | Training and tuning (N = 65 patients) | Internal validation (N = 6 patients) | External validation (N = 6 patients) |
|---|---|---|---|
| Age (years) | 46 ± 14 | 38 ± 17.1 | 26 ± 6.3 |
| Gender, male (patients) | 42 | 2 | 1 |
| Gender, female (patients) | 23 | 4 | 5 |
| Multi-echo gradient echo sequences | ME Dixon, VIBE-Dixon | ME Dixon, VIBE-Dixon | ME Dixon, VIBE-Dixon |
| Flip angle (°) | 4 | 4 | 4 |
| Echo times (ms) | 1.09, 2.46, 3.69, 4.92, 6.15, and 7.38 | 1.09, 2.46, 3.69, 4.92, 6.15, and 7.38 | 1.09, 2.46, 3.69, 4.92, 6.15, and 7.38 |
| Field of view (mm) | 304 × 380 | 304 × 380 | 304 × 380 |
| Repetition time (ms) | 9.00 | 9.00 | 9.00 |
| Slice thickness (mm) | 3.0 | 3.0 | 3.5 |

Liver segmentation was performed on the PDFF maps derived from the multi-echo Dixon sequence. We randomly selected 71 cases for deep active learning, as shown in Fig. 1b. For the first step and for testing, the ground truth of 36 internal and 6 external MRI-PDFF scans was annotated and verified by an expert reviewer (M. J. K., with 18 years of experience in abdominal imaging) using the ITK-SNAP software. Each labeled voxel on an MRI-PDFF slice was assigned to either background or liver by abdominal radiologists using ITK-SNAP. In the second and third steps, 35 additional MRI-PDFF scans were used for human-in-the-loop active learning.
Fig. 1.
a Overall process of the active learning framework for liver segmentation on MRI-PDFF; b dataset details for active learning in MRI-PDFF
All input images were downsampled to 208 × 208 pixels. Images were resampled using third-order spline interpolation, whereas segmentation labels were resampled separately, one label at a time, using linear interpolation. The intensities of each MRI-PDFF dataset were then z-score normalized by subtracting the mean and dividing by the standard deviation. We applied aggressive data augmentation using the batchgenerators framework [30], including spatial augmentations (mirroring, random elastic deformations, random scaling, and random rotations), intensity (color) augmentations (gamma correction and contrast), and noise augmentation (Gaussian noise).
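The resampling and normalization steps above can be sketched as follows. This is a minimal illustration assuming NumPy/SciPy and a (z, y, x) array layout in which only the two in-plane axes are resampled to 208 × 208; the function names are hypothetical, and the study itself relied on the nnU-net pipeline rather than this code. Labels are resampled one class at a time as binary masks with linear interpolation and recombined by assigning each voxel the class with the highest interpolated value.

```python
import numpy as np
from scipy.ndimage import zoom


def zscore(volume):
    """Z-score normalization: subtract the mean and divide by the SD."""
    volume = volume.astype(np.float32)
    std = float(volume.std())
    return (volume - volume.mean()) / (std if std > 0 else 1.0)


def resample_inplane(volume, label, target=(208, 208)):
    """Resample the y and x axes of a (z, y, x) volume and its label map.

    The image uses third-order spline interpolation. Each label class is
    resampled separately as a binary mask with linear interpolation, and
    every voxel takes the class with the highest interpolated value.
    """
    factors = (1.0, target[0] / volume.shape[1], target[1] / volume.shape[2])
    image_rs = zoom(volume.astype(np.float32), factors, order=3)
    classes = np.unique(label)
    masks = np.stack([zoom((label == c).astype(np.float32), factors, order=1)
                      for c in classes])
    label_rs = classes[np.argmax(masks, axis=0)]
    return image_rs, label_rs
```

For example, `image, label = resample_inplane(volume, label)` followed by `image = zscore(image)` yields a 208 × 208 in-plane volume with zero mean and unit variance. Resampling each label separately avoids the invalid intermediate class values that spline interpolation of an integer label map would produce.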
Training Architecture
We customized the 3D U-net architecture of nnU-net [11], which has demonstrated high accuracy for pixelwise segmentation across various medical images, for liver segmentation on MRI-PDFF. Fig. 2 shows the architecture of the customized 3D U-net in nnU-net [11]. The architecture uses 30 convolutional filters in the first layer and doubles this number at each pooling layer, where pooling is performed by max pooling (3 × 3 × 3). It comprises an encoder network and a decoder network, the latter using transposed convolutional layers for upsampling. The left side (encoder) reduces the spatial dimensionality of the input, whereas the right side (decoder) recovers the original dimensionality. The encoder is similar to a traditional CNN: it aggregates semantic information at the cost of reduced spatial information and localization accuracy. In pixelwise segmentation, both spatial and semantic information are important for training on medical datasets and making inferences from them. To recover the lost spatial information, the U-net upsamples in the decoder and uses skip connections: semantic information from the bottom vertex of the "U" is recombined with encoder features of the same resolution on the right side (decoder network). A prominent feature of the customized 3D nnU-net is this concatenation of encoder (left) and decoder (right) features, which improves pixelwise segmentation by reintroducing spatial detail lost during downsampling. In this study, 250 batches constituted an epoch. We replaced the leaky rectified linear unit (ReLU) activation functions of the original 3D nnU-net with randomized ReLU and combined the cross-entropy, dice, and boundary loss functions. We also added adaptive layer-instance normalization (AdaLin) [29] to align the attention-guided model to the shape transformation.
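AdaLin mixes instance-normalized and layer-normalized features with a learnable weight ρ and then applies a scale γ and shift β. The NumPy sketch below follows the commonly used formulation for a 3D feature map of shape (N, C, D, H, W); how ρ, γ, and β are parameterized or predicted in the customized nnU-net is not described here, so they are passed in as plain per-channel arrays (an assumption). With ρ = 1 the layer reduces to instance normalization, and with ρ = 0 to layer normalization.

```python
import numpy as np


def adalin(x, rho, gamma, beta, eps=1e-5):
    """Adaptive layer-instance normalization for a 3D feature map.

    x     : array of shape (N, C, D, H, W)
    rho   : per-channel mixing weights, shape (C,), clipped to [0, 1]
    gamma : per-channel scale, shape (C,)  (assumed plain parameters)
    beta  : per-channel shift, shape (C,)  (assumed plain parameters)
    """
    # Instance normalization: statistics per sample and per channel.
    spatial = (2, 3, 4)
    x_in = (x - x.mean(axis=spatial, keepdims=True)) / np.sqrt(
        x.var(axis=spatial, keepdims=True) + eps)
    # Layer normalization: statistics per sample over channels and space.
    layer = (1, 2, 3, 4)
    x_ln = (x - x.mean(axis=layer, keepdims=True)) / np.sqrt(
        x.var(axis=layer, keepdims=True) + eps)
    # Blend the two normalizations channel by channel, then scale and shift.
    shape = (1, -1, 1, 1, 1)
    rho = np.clip(np.asarray(rho, dtype=float), 0.0, 1.0).reshape(shape)
    mixed = rho * x_in + (1.0 - rho) * x_ln
    return np.asarray(gamma).reshape(shape) * mixed + np.asarray(beta).reshape(shape)
```

Letting ρ be learned per channel lets the network decide how much channel-wise (instance) versus whole-feature-map (layer) statistics to use, which is the motivation usually given for this normalization.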
Adam was used as the optimizer, with an initial learning rate of 3 × 10−4 and an L2 weight decay of 3 × 10−5. If the exponential moving average of the training loss did not improve within the previous 30 epochs, the learning rate was multiplied by 0.2. Training was stopped after 1000 epochs or when the learning rate fell below 10−6.
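The plateau rule above can be expressed as a small scheduler. This is a sketch, not nnU-net's own code: the class name `PlateauScheduler` and the EMA smoothing factor `alpha = 0.9` are assumptions, while the remaining defaults (initial learning rate 3 × 10−4, factor 0.2, patience of 30 epochs, minimum learning rate 10−6, 1000 epochs) are taken from the text.

```python
class PlateauScheduler:
    """Multiply the learning rate by `factor` when the exponential moving
    average (EMA) of the training loss has not improved for `patience`
    consecutive epochs; report when training should stop."""

    def __init__(self, lr=3e-4, factor=0.2, patience=30, alpha=0.9,
                 min_lr=1e-6, max_epochs=1000):
        self.lr, self.factor, self.patience = lr, factor, patience
        self.alpha = alpha  # assumed EMA smoothing factor
        self.min_lr, self.max_epochs = min_lr, max_epochs
        self.ema = None
        self.best_ema = float("inf")
        self.epochs_since_best = 0
        self.epoch = 0

    def step(self, train_loss):
        """Update with one epoch's training loss; return True to continue."""
        self.epoch += 1
        if self.ema is None:
            self.ema = train_loss
        else:
            self.ema = self.alpha * self.ema + (1 - self.alpha) * train_loss
        if self.ema < self.best_ema:
            self.best_ema = self.ema
            self.epochs_since_best = 0
        else:
            self.epochs_since_best += 1
            if self.epochs_since_best >= self.patience:
                self.lr *= self.factor  # reduce the learning rate fivefold
                self.epochs_since_best = 0
        return self.epoch < self.max_epochs and self.lr >= self.min_lr
```

Call `step(loss)` once per epoch (250 batches) and stop when it returns False. With a loss that never improves, four reductions are needed to drop below 10−6, so training would stop at epoch 121. PyTorch's `torch.optim.lr_scheduler.ReduceLROnPlateau` implements a similar rule on a monitored metric, but without the EMA smoothing.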
Fig. 2.
Architecture of the customized 3D U-net in the nnU-net
Active Learning
Our active learning framework comprises three steps. At each step, a new dataset is added to improve the model, and its labels are first inferred automatically using the model trained in the previous step. This procedure was applied iteratively to a limited labeled dataset and an unlabeled dataset.
In the first step, 30 MRI-PDFF scans were manually labeled by an expert radiologist to establish the ground truth, and the model was initially trained on this limited labeled dataset to segment the liver on MRI-PDFF. After this initial training, the ground truth for the new unlabeled scans of the next step was obtained by CNN-assisted segmentation followed by manual modification. In the second step, the 30 MRI-PDFF scans from the first step were reused and combined with 15 new scans for training (45 scans in total), as shown in Fig. 1b. After the second step, the CNN-assisted segmentations of the next unlabeled scans were again manually modified for training in the next step. In the final step, 65 scans (45 reused from the second step and 20 new ones) were used to train and improve the model, whereas the remaining six scans (manually labeled in the first step) were used to test each model. After each step, liver segmentation accuracy was evaluated on 6 internal and 6 external scans. CNN-assisted and post-modified segmentation was conducted using the open-source ITK-SNAP software.
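The three-step procedure above can be summarized as a loop. In this sketch, `train`, `predict`, and `correct` are hypothetical callbacks standing in for nnU-net training, CNN inference, and manual editing in ITK-SNAP, respectively; only the bookkeeping of which scans enter each training round follows the text.

```python
from typing import Any, Callable, List, Sequence, Tuple


def run_active_learning(
    labeled: List[Tuple[Any, Any]],
    unlabeled_batches: Sequence[Sequence[Any]],
    train: Callable[[List[Tuple[Any, Any]]], Any],
    predict: Callable[[Any, Any], Any],
    correct: Callable[[Any, Any], Any],
):
    """Human-in-the-loop active learning (sketch).

    labeled            initial manually labeled (image, label) pairs
    unlabeled_batches  new unlabeled images added at each later step
    train(pairs)       -> model            (hypothetical callback)
    predict(model, x)  -> draft label      (hypothetical callback)
    correct(x, draft)  -> corrected label  (hypothetical: expert edits)
    """
    model = train(labeled)                      # step 1: manual labels only
    history = [len(labeled)]
    for batch in unlabeled_batches:             # steps 2, 3, ...
        drafts = [predict(model, x) for x in batch]
        corrected = [(x, correct(x, d)) for x, d in zip(batch, drafts)]
        labeled = labeled + corrected           # reuse all earlier scans
        model = train(labeled)                  # retrain on the enlarged set
        history.append(len(labeled))
    return model, history
```

With 30 initially labeled scans and new batches of 15 and 20 scans, the returned history is [30, 45, 65], matching the training-set sizes of the three steps.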
For visual assessment, 100 slices were selected from the 12 test cases (537 slices in total: 368 slices from the 6 internal cases and 169 slices from the 6 external cases), and all manual and deep active learning–based segmentations of these slices were graded on a four-point scale from very accurate (grade 1) to inaccurate (grade 4).
Experimental Setup
To generate segmentation labels on the 3D MRI-PDFF volumes, each axial image in the volume was presented sequentially to the model, and the resulting 2D segmentation maps were stacked along the z-axis. The experiment was conducted on Ubuntu 18.04 with Python 3.6 and a TensorFlow 1.15.0 backend; PyTorch 1.4.0 was used as the deep learning (DL) framework. The model was trained on an NVIDIA Titan RTX graphics card (24 GB). To maximize training speed and make efficient use of GPU memory, we attempted to use larger input tiles and set the batch size to 6. In the first step, training saturated after approximately 100 epochs owing to the small dataset (N = 30). The second and third steps required 120 to 180 epochs owing to the larger datasets (N = 45 and N = 65). In the final model (step 3), the difference in overall DSC between the tuning and test datasets was 2.1, suggesting that the deep active learning model did not overfit the MRI-PDFF data. We also evaluated the efficiency of CNN-corrected segmentation at each step by measuring the time consumed, and we performed a qualitative visual assessment. Lastly, we compared the 2D U-net (with and without active learning), trained on z-axis slices containing liver annotations by abdominal radiologists, with the 3D U-net in nnU-net [10] trained on 3D volumes using active learning.
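Slice-wise inference along the z-axis amounts to applying a 2D predictor to every axial slice and stacking the outputs. The sketch below assumes a (z, y, x) NumPy array and a hypothetical `predict_slice` callable that maps one 2D slice to a 2D label map.

```python
import numpy as np


def segment_volume_slicewise(volume, predict_slice):
    """Apply a 2D predictor to each axial slice of a (z, y, x) volume and
    stack the resulting 2D label maps back along the z-axis."""
    slices = [predict_slice(volume[z]) for z in range(volume.shape[0])]
    return np.stack(slices, axis=0)
```

For instance, `segment_volume_slicewise(volume, lambda s: (s > 0.5).astype(np.uint8))` returns a binary label volume with the same shape as the input; in practice `predict_slice` would wrap the trained 2D network.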
Statistical Analysis
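Segmentation performance was evaluated using the dice similarity coefficient (DSC), as shown in Eq. (1). The loss functions, dice loss (DLS), boundary loss (BLS) [31], and binary cross-entropy (BCE), are defined in Eqs. (2), (3), and (4), respectively. The volumes of the ground truth and the CNN segmentation are denoted as $G$ and $S$, respectively; in Eq. (2), $g_i \in \{0, 1\}$ is the ground-truth label and $s_i \in [0, 1]$ the predicted liver probability of voxel $i$.

$$\mathrm{DSC} = \frac{2\,|G \cap S|}{|G| + |S|} \tag{1}$$

$$\mathrm{DLS} = 1 - \frac{2\sum_i g_i s_i}{\sum_i g_i + \sum_i s_i} \tag{2}$$

$$\mathrm{BLS} = \mathrm{Dist}(\partial G, \partial S) \approx 2\int_{\Delta S} D_G(q)\,dq \tag{3}$$

Here, $\Delta S$ denotes the region between $\partial G$ and $\partial S$; $D_G: \Omega \to \mathbb{R}^{+}$ is a distance map with respect to boundary $\partial G$; that is, $D_G(q)$ evaluates the distance between point $q \in \Omega$ and the nearest point $z_{\partial G}(q)$ on contour $\partial G$: $D_G(q) = \lVert q - z_{\partial G}(q) \rVert$.

$$\mathrm{BCE} = -\frac{1}{N}\sum_{i=1}^{N}\left[f_i \log p_i + (1 - f_i)\log(1 - p_i)\right] \tag{4}$$

where $p_i$ and $f_i$ denote the inferred probability and the corresponding desired output for voxel $i$ of $N$ voxels, respectively.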
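For reference, the DSC and the three loss terms can be computed on NumPy arrays as follows. This is a minimal sketch, not the training code used in the study: `gt` is a binary ground-truth mask and `prob` a predicted liver probability map of the same shape. The boundary loss is implemented in its widely used integral form, the mean of a signed distance map of the ground truth (negative inside the liver, positive outside) multiplied by the predicted probabilities; treating this as the approximation of the boundary distance is an assumption, since it differs from the boundary distance by a factor and a term that does not depend on the prediction.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

EPS = 1e-7


def dsc(gt, seg):
    """Dice similarity coefficient of two binary masks."""
    gt, seg = gt.astype(bool), seg.astype(bool)
    denom = gt.sum() + seg.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(gt, seg).sum() / denom


def dice_loss(gt, prob):
    """Soft dice loss: 1 - 2*sum(g*s) / (sum(g) + sum(s))."""
    return 1.0 - 2.0 * (gt * prob).sum() / (gt.sum() + prob.sum() + EPS)


def bce(gt, prob):
    """Binary cross-entropy averaged over all voxels."""
    p = np.clip(prob, EPS, 1.0 - EPS)
    return float(-np.mean(gt * np.log(p) + (1 - gt) * np.log(1 - p)))


def signed_distance(gt):
    """Euclidean distance to the boundary of gt, negative inside and positive
    outside (assumes gt contains both foreground and background)."""
    gt = gt.astype(bool)
    return distance_transform_edt(~gt) - distance_transform_edt(gt)


def boundary_loss(gt, prob):
    """Integral form of the boundary loss: mean of phi_G * s over voxels."""
    return float(np.mean(signed_distance(gt) * prob))
```

In a combined objective, the three losses would typically be summed with weights; the weighting used in this study is not specified here.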
In addition, differences in DSC, consumption time, and visual assessment grades between steps were compared using paired t-tests in the open-source R software (version 3.5.1; R Foundation for Statistical Computing, Vienna, Austria). Statistical significance was set at P < 0.05. The reproducibility of the segmented liver volume was evaluated using the Bland-Altman method [32].
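The Bland-Altman and paired t-test computations can be sketched in Python as follows (the study itself used R). The sketch assumes percentage differences relative to the mean of the two volumes and conventional 95% limits of agreement (bias ± 1.96 SD); it is not necessarily the exact computation behind the intervals reported in this study. `scipy.stats.ttest_rel` performs the two-sided paired t-test.

```python
import numpy as np
from scipy import stats


def bland_altman_percent(measured, reference):
    """Bias and 95% limits of agreement of the percentage differences
    100 * (measured - reference) / mean(measured, reference)."""
    measured = np.asarray(measured, dtype=float)
    reference = np.asarray(reference, dtype=float)
    diff = 100.0 * (measured - reference) / ((measured + reference) / 2.0)
    bias = diff.mean()
    sd = diff.std(ddof=1)  # sample standard deviation
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)


def paired_t(a, b):
    """Two-sided paired t-test, e.g. on per-case DSCs of two steps."""
    return stats.ttest_rel(a, b)
```

For example, `bland_altman_percent(volumes_step3, volumes_gold)` would return the bias and limits for the last step, and `paired_t(dsc_step1, dsc_step3).pvalue` the corresponding p value, where these array names are hypothetical.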
Results
Performance Evaluation
To determine the effectiveness of active learning on MRI-PDFF, we analyzed our customized 3D nnU-net at all three steps. DSCs between the ground truth and the inferences were calculated only on the 6 internal and 6 external test scans, which were not among the 65 scans segmented with the human in the loop. The worst and best results are presented in Fig. 3, which shows the liver segmentations as 3D volumes on MRI-PDFF. The active learning framework yielded the best DSC in the final step.
Fig. 3.
Worst (first rows) and best (second rows) 3D volume image examples from the test dataset at different analysis points: a first step, b second step, and c last step
Comparison of Segmentation per Step
As the steps progressed, liver segmentation on MRI-PDFF noticeably improved, and erroneously segmented areas outside the liver decreased. The DSCs at each step were 0.69 ± 0.21, 0.85 ± 0.12, and 0.94 ± 0.01, respectively (paired t-test: p = 0.1389 between the first and second steps and p = 0.0144 between the first and last steps), as shown in Table 2. The average DSC for liver segmentation increased after each step, and the final segmentation in the last step showed the best results. Agreement between the liver volumes obtained at each step and the gold standard was evaluated using Bland-Altman analysis. The biases (95% confidence intervals) in the Bland-Altman plots at each step were −24.22% (from −82.76 to −2.70), −21.29% (from −59.52 to 3.06), and −0.67% (from −10.43 to 4.06), respectively (Fig. 4).
Table 2.
Dice similarity coefficients and limit of agreement of Bland-Altman analysis for the first, second, and third steps of the test dataset (six scans) in Fig. 1b; 3D nnU-net
| | First step | Second step | Third step |
|---|---|---|---|
| Liver (DSC) | 0.69 ± 0.21 (0.45–0.94) | 0.85 ± 0.12 (0.68–0.95) | 0.94 ± 0.01 (0.92–0.95) |
| Liver (Bland-Altman plot analysis) | −24.22% (from −82.76 to −2.70) | −21.29% (from −59.52 to 3.06) | −0.67% (from −10.43 to 4.06) |

p value = 0.1389 (paired t-test) for the first and second steps; p value = 0.0144 (paired t-test) for the first and third steps
DSC dice similarity coefficient
Fig. 4.
Bland-Altman plots on 3D volumes for the inference of each step and the gold standard at the a first step, b second step, and c third step
Comparison Between 2D and 3D U-net
Furthermore, to evaluate the performance of the proposed method, we compared the inferences of the 3D U-net with those of a recently published competitive network, the 2D U-net in nnU-net [11]. The DSCs for the 2D and 3D U-nets were 0.93 ± 0.02 and 0.94 ± 0.01, respectively, on the internal dataset (p = 0.54) and 0.78 ± 0.10 and 0.80 ± 0.06 on the external dataset (p = 0.75), as shown in Table 3.
Table 3.
Dice similarity coefficients and limit of agreement of Bland-Altman analysis for the 2D U-net and the final model for active learning for internal and external test dataset (six scans)
| Metric | Dataset | 2D U-net (last step for active learning) | 3D U-net (last step for active learning) |
|---|---|---|---|
| Liver (DSC) | Internal | 0.93 ± 0.02 (0.89–0.95) | 0.94 ± 0.01 (0.92–0.95) |
| Liver (DSC) | External | 0.78 ± 0.10 (0.61–0.86) | 0.80 ± 0.06 (0.76–0.88) |
| Liver (Bland-Altman plot analysis) | Internal | −3.11% (from −16.64 to 2.56) | −0.67% (from −10.46 to 4.06) |
| Liver (Bland-Altman plot analysis) | External | 2.26% (from −66.09 to 36.6) | −5.09% (from −49.80 to 23.78) |

p value = 0.54 (internal) and p value = 0.75 (external) for the 2D DSC vs. the 3D DSC
DSC dice similarity coefficient
On the internal dataset, the Bland-Altman biases for the 2D and 3D U-nets were −3.11% (from −16.64 to 2.56) and −0.67% (from −10.46 to 4.06), respectively (Fig. 5a, b). Figure 6 shows the 2D and 3D inferences in axial, sagittal, and coronal views and as volume renderings. The comparison between the 2D and 3D nnU-nets on the six external datasets is also shown in Table 3; the corresponding Bland-Altman biases were 2.26% (from −66.09 to 36.6) and −5.09% (from −49.80 to 23.78), respectively (Fig. 5c, d).
Fig. 5.
Bland-Altman plots on 3D volumes comparing the last-step inference with the gold standard for a active learning in 2D and b active learning in 3D on the internal validation datasets, and for c active learning in 2D and d active learning in 3D on the external validation datasets
Fig. 6.
Example plots on 3D MRI-PDFF for the inference of 2D, 3D, and the gold standard; the axial images (first column), sagittal images (second column), coronal images (third column), and volume rendering (last column) including the last step for a active learning in 2D, b active learning in 3D, and c the gold standard
Comparison of Consumption Time and Qualitative Results from Visual Assessment
The liver segmentation times of CNN-assisted and manual segmentation are compared in Table 4. Compared with the 780 s required for manual segmentation in the first step, the time required for CNN-assisted and manually modified segmentation decreased to approximately 120 s in the second step (45 scans) and approximately 115 s in the last step (65 scans).
Table 4.
Comparison of segmentation times between the manual and CNN-assisted and manually modified segmentation approaches
| | First step | Second step | Last step |
|---|---|---|---|
| Approach | Manual segmentation | CNN-assisted and manually modified segmentation | CNN-assisted and manually modified segmentation |
| Time | 780 s | 120 s | 115 s |

p value = 0.138 (paired t-test) for the 2D DSC and 3D DSC or p value = 0.014 (paired t-test) for the 2D volume and 3D volume
CNN convolutional neural network
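The times in Table 4 translate directly into speed-up factors. The helper below is only arithmetic on the reported values: 780 s versus 120 s gives a 6.5-fold speed-up (about 84.6% less time), and 780 s versus 115 s gives about a 6.8-fold speed-up (about 85.3% less time).

```python
def time_savings(manual_s, assisted_s):
    """Return the speed-up factor and the percentage of time saved."""
    return manual_s / assisted_s, 100.0 * (1.0 - assisted_s / manual_s)


# Segmentation times reported in Table 4 (seconds).
for step, seconds in [("second step", 120), ("last step", 115)]:
    factor, saved = time_savings(780, seconds)
    print(f"{step}: {factor:.1f}x faster, {saved:.1f}% less time")
```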
Most manual and DL segmentations were graded very accurate to mostly accurate, and inaccurate grades were rare (Table 5). The number of very accurate slices was larger for DL segmentation than for manual segmentation (46 vs. 6). A paired t-test over the 100 slices, graded from 1 (very accurate) to 4 (inaccurate), comparing manual drawings by an abdominal radiologist with 18 years of experience with the DL segmentations showed no significant difference (p = 0.409).
Table 5.
Qualitative results from visual assessment of automatic liver segmentation on MRI-PDFF from 100 randomly selected slices
| Grade | Manual | 3D U-net (last step for active learning) |
|---|---|---|
| 1—very accurate | 6 | 46 |
| 2—accurate | 37 | 37 |
| 3—mostly accurate | 47 | 17 |
| 4—inaccurate | 10 | 0 |
Assessment by one radiologist (18 years of experience); p value = 0.409 (paired t-test) for manual vs. DL segmentation
Four-point scale:
1. Very accurate: when the labeled liver part completely matches the original liver (over 95%)
2. Accurate: when the labeled liver almost completely matches the original liver (85–95%)
3. Mostly accurate: when the labeled liver part depicts the site of the original liver area (over 50%)
4. Inaccurate: when the labeled part is outside of the liver or only matches small area of original liver (under 50%)
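The four-point scale can be written as a simple threshold rule. Grades in this study were assigned visually by a radiologist; the hypothetical function below only illustrates the rubric, assuming an overlap percentage were available. How the boundary values (exactly 95% or exactly 50%) are handled is an assumption, because the scale leaves them ambiguous.

```python
def visual_grade(overlap_percent):
    """Map the percentage of the original liver matched by the labeled region
    to the four-point scale (1 = very accurate ... 4 = inaccurate).
    Boundary handling (95 -> grade 2, 50 -> grade 4) is an assumption."""
    if overlap_percent > 95:
        return 1  # very accurate
    if overlap_percent >= 85:
        return 2  # accurate
    if overlap_percent > 50:
        return 3  # mostly accurate
    return 4      # inaccurate
```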
Discussion
In this study, we proposed an active learning framework for liver segmentation on MRI-PDFF using a customized 3D nnU-net [11]. The scans added at each step were randomly selected, according to heuristic parameters, from the 65 training scans. Active learning helped improve abdominal liver segmentation with limited clinical training datasets. In addition, the proposed framework can reduce the effort required to generate new ground truth, and the variability of its segmentation results may be lower than that of manual annotations.
The 3D U-net has demonstrated superior performance in medical image processing and has become one of the most popular methods for pixelwise semantic segmentation [16]. Several researchers have further developed this network by combining it with detection architectures in a cascade [12–16]. Tang et al. proposed a cascade framework comprising a VGG-16-based detection architecture followed by a segmentation module [16]. Roth et al. also demonstrated a second-stage cascaded FCN model that focuses on target boundary areas [14].
Because the liver is a large organ, we did not use a cascade method; instead, we verified the performance of the proposed model by comparing it at each step. As shown in Table 2 and Fig. 4, the best DSC for liver segmentation was observed in the last step (0.94 ± 0.01), and the average bias (95% confidence interval) of the Bland-Altman plot for the last step was −0.67% (from −10.43 to 4.06). The average DSC improved after each step (Fig. 3). We also compared the performance of our framework with that of the recently published 2D U-net in nnU-net [11]. On MRI-PDFF, our model achieved slightly higher DSCs than the 2D U-net, although the differences were not statistically significant (Table 3). Although the DSCs of 2D and 3D nnU-net-based active learning were similar, Bland-Altman analysis showed that the 3D nnU-net produced liver volumes closer to those of the gold standard segmentation (Figs. 5 and 6). These results suggest that using 3D volumes is important for training complex segmentation tasks, such as segmentation of the abdominal liver on MRI-PDFF. Similar results were obtained when datasets from another hospital (Korea University Ansan Hospital, 6 scans) were used for external validation: active learning based on the 3D nnU-net achieved a slightly higher DSC than that based on the 2D nnU-net (Table 3).
As shown in Table 4, the overall segmentation time was noticeably reduced after active learning was incorporated, from 780 s for manual segmentation to approximately 115 s in the last step, a reduction of more than 80%. Although the difference between manual drawing by an expert abdominal radiologist with 18 years of experience and DL segmentation was not statistically significant (Table 5), most slices on which DL performed better corresponded to errors in the manually drawn liver area on MRI-PDFF (Fig. 6). In the qualitative visual assessment, grade 1 (very accurate) was assigned more often to DL segmentations than to manual segmentations. We randomly selected 100 out of 905 slices (12 patients) for visual assessment. Manual segmentation by radiologists is generally considered the most accurate. However, we used only the PDFF maps as the original images, on which it is very difficult to manually delineate the exact liver edge at the inferior tip of the liver and at the hepatic dome, where the liver overlaps with other organs. In other words, the annotation tools used to generate the gold standards for the test dataset limited the precision of those gold standards. Nevertheless, the DL inferences improved through the first, second, and third steps. This finding suggests that although manual segmentation is used as the reference standard for initial training in active learning, DL-based segmentation may become more accurate. One possible explanation is that the training labels in later steps, which were initially generated by DL and then manually corrected (CNN-assisted and post-modified segmentation), were more consistent with liver anatomy and the specific features of the MRI-PDFF datasets.
Self-supervised learning and active learning have previously been proposed in medical image processing and other fields to segment effectively while reducing annotation effort [17, 20, 21, 24]. Specifically, Kim et al. proposed a deep active learning framework based on cascaded 3D U-nets for computed tomography (CT) images, including those of the kidney. Their algorithm demonstrated impressive performance for multiple labels (five classes) on abdominal CT datasets, and within their active learning framework, the suggested approach for multiple annotations led to the most effective manual annotations [24].
Finally, active learning with manual correction of CNN segmentations can be used to obtain new ground truth labels and iteratively train a deep CNN on limited datasets. Our proposed method enables fast and effective segmentation of complex abdominal organs when medical datasets are insufficient. Because human-produced annotations are not always consistent owing to inter- and intraobserver variability, active learning can be a useful alternative to conventional DL with limited labeling in clinical environments.
However, this study has several limitations. First, segmentation performance needs to be improved by increasing the training datasets and using better networks to resolve ambiguities [27, 28]. Second, we used limited datasets (six internal and six external) to test active learning at each step, and segmentation performance differed between the internal (KUAH) and external (KUANH) datasets. Although the KUAH and KUANH datasets share similar protocols and image characteristics, the actual image sequences differ slightly (e.g., a slice thickness of 3.0 vs. 3.5 mm; Table 1). In the future, we will develop our algorithm using multicenter datasets or various image types to improve segmentation performance in medical imaging. Third, further validation using additional or multicenter datasets, as well as comparison with other segmentation networks such as cascaded networks, should be performed to verify the efficiency and stability of the proposed approach. Fourth, because this was a preliminary study of active learning on MRI-PDFF, we plan to perform machine learning–based radiomics analysis if segmentation using active learning proves superior to other approaches in the clinical environment. Fifth, because only one radiologist performed manual liver segmentation for the initial training in the first step, we could not measure interobserver variability. Finally, as in other fat quantification studies, we segmented the liver on single-channel MRI-PDFF images, using DL-based active learning to generate initial gold standards or annotations more efficiently. In the future, we will perform liver segmentation using multiple channels.
Conclusion
In conclusion, our results demonstrate that a deep active learning framework (human-in-the-loop) could alleviate annotation efforts and costs by training efficiently on limited 3D MRI-PDFF datasets. A further step should be to apply active learning–based automatic segmentation to fat quantification, for example, to estimate the presence and grade of hepatic steatosis [26].
Abbreviations
- BLS
Boundary loss
- BCE
Binary cross-entropy
- CNN
Convolutional neural network
- CRFs
Conditional random fields
- CT
Computed tomography
- DL
Deep learning
- DLS
Dice loss
- DSC
Dice similarity coefficient
- FCN
Fully convolutional network
- MRI-PDFF
Magnetic resonance imaging–derived proton density fat fraction
- RPN
Region proposal network
- 3D
Three dimensional
Author Contribution
Y.C. and M.J.K. wrote the main manuscript. Y.C. performed the experiments and prepared the figures. B.J.P., K.C.S., Y.S.K., Y.E.H., D.J.S., and N.Y.H. prepared and confirmed the datasets. M.J.K. also confirmed the datasets. All authors reviewed the manuscript, were involved in writing the paper, and approved the final submitted and published versions.
Funding
The authors received funding for this study from the Basic Science Research Program through the National Research Foundation of Korea funded by the Ministry of Education (2020R1I1A1A01071600).
Availability of Data and Material and Code Availability
Although public release of our datasets (KUAH and KUANH) is restricted by the Personal Information Protection Act of South Korea, the datasets and source code may be shared upon request.
Declarations
Ethics Approval
Our institutional review board approved this retrospective case-controlled study, and informed consent was waived. Experiments involving humans and/or the use of human tissue samples have not been performed in this study. In addition, no organs/tissues were procured from prisoners in this study.
Conflict of Interest
The authors declare no competing interests.
Footnotes
Key Points
• Our study enables efficient self-automated segmentation of the liver on magnetic resonance imaging–derived proton density fat fraction (MRI-PDFF) images through deep active learning.
• Active learning improved abdominal liver segmentation when only limited clinical training data were available. The 3D nnU-Net with active learning outperformed the 2D nnU-Net with active learning.
• Our results demonstrate that a deep active learning (human-in-the-loop) framework can lower the cost and effort of annotation by training efficiently on limited 3D MRI-PDFF datasets.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Rao M, et al. Comparison of human and automatic segmentations of kidneys from CT images. Int J Radiat Oncol Biol Phys. 2005;61:954–960. doi: 10.1016/j.ijrobp.2004.11.014.
- 2. Pham DL, Xu C, Prince JL. Current methods in medical image segmentation. Annu Rev Biomed Eng. 2000;2:315–337. doi: 10.1146/annurev.bioeng.2.1.315.
- 3. Chen H, Qi X, Cheng JZ, et al. Deep contextual networks for neuronal structure segmentation. AAAI’16: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence. February 2016;1167–1173.
- 4. Cai H, Verma R, Ou Y, et al. Probabilistic segmentation of brain tumors based on multi-modality magnetic resonance images. 4th IEEE International Symposium on Biomedical Imaging: From Nano to Macro. IEEE. 2007;600–603.
- 5. Ronneberger O, Fischer P, Brox T. U-net: Convolutional networks for biomedical image segmentation. In: Medical Image Computing and Computer Assisted Intervention. 2015;234–241.
- 6. Zheng S, et al. Conditional random fields as recurrent neural networks. In Proceedings of the IEEE International Conference on Computer Vision. 2015;1529–1537.
- 7. Noh H, Hong S, Han B. Learning deconvolution network for semantic segmentation. In Proceedings of the IEEE International Conference on Computer Vision. 2015;1520–1528.
- 8. Shelhamer E, Long J, Darrell T. Fully convolutional networks for semantic segmentation. IEEE Trans Pattern Anal Mach Intell. 2017;39:640–651. doi: 10.1109/TPAMI.2016.2572683.
- 9. Chen LC, Papandreou G, Kokkinos I, et al. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans Pattern Anal Mach Intell. 2018;40:834–848. doi: 10.1109/TPAMI.2017.2699184.
- 10. Çiçek Ö, Abdulkadir A, Lienkamp SS, et al. 3D U-Net: learning dense volumetric segmentation from sparse annotation. International Conference on Medical Image Computing and Computer-assisted Intervention. 2016;424–432.
- 11. Isensee F, Petersen J, Kohl SAA, et al. nnU-Net: breaking the spell on successful medical image segmentation. ArXiv e-prints https://arxiv.org/abs/1904.08128; 2019.
- 12. Christ PF, et al. Automatic liver and lesion segmentation in CT using cascaded fully convolutional neural networks and 3D conditional random fields. International Conference on Medical Image Computing and Computer-Assisted Intervention. 2016;415–423.
- 13. Tang M, Zhang Z, Cobzas D, et al. Segmentation-by-detection: a cascade network for volumetric medical image segmentation. 2018 IEEE 15th International Symposium on Biomedical Imaging. 2018;1356–1359.
- 14. Roth HR, et al. An application of cascaded 3D fully convolutional networks for medical image segmentation. Comput Med Imaging Graph. 2018;66:90–99. doi: 10.1016/j.compmedimag.2018.03.001.
- 15. Cui S, Mao L, Jiang J, et al. Automatic semantic segmentation of brain gliomas from MRI images using a deep cascaded neural network. J Healthc Eng. 2018;4940593.
- 16. He Y, et al. Towards topological correct segmentation of macular OCT from cascaded FCNs. Fetal Infant Ophthalmic Med Image Anal. 2017;10554:202–209. doi: 10.1007/978-3-319-67561-9_23.
- 17. Gorriz M, Carlier A, Faure E, et al. Cost-effective active learning for melanoma segmentation. ArXiv e-prints https://arxiv.org/abs/1711.09168; 2017.
- 18. Yang L, Zhang Y, Chen J, et al. Suggestive annotation: a deep active learning framework for biomedical image segmentation. International Conference on Medical Image Computing and Computer-assisted Intervention. 2017;399–407.
- 19. Kasarla T, Nagendar G, Hegde G, et al. Region-based active learning for efficient labeling in semantic segmentation. IEEE Winter Conference on Applications of Computer Vision (WACV). 2019. doi: 10.1109/WACV.2019.00123.
- 20. Mackowiak R, et al. CEREALS - Cost-effective region-based active learning for semantic segmentation. ArXiv e-prints https://arxiv.org/abs/1810.09726; 2018.
- 21. Lubrano di Scandalea M, Perone CS, Boudreau M, et al. Deep active learning for axon-myelin segmentation on histology data. ArXiv e-prints https://arxiv.org/abs/1907.05143v1; 2019.
- 22. Wen S, et al. Comparison of different classifiers with active learning to support quality control in nucleus segmentation in pathology images. AMIA Jt Summits Transl Sci Proc. 2018;227–236.
- 23. Sourati J, Gholipour A, Dy JG, et al. Active deep learning with fisher information for patch-wise semantic segmentation. Deep Learn Med Image Anal Multimodal Learn Clin Decis Support. 2018;11045:83–91. doi: 10.1007/978-3-030-00889-5_10.
- 24. Kim T, et al. Active learning for accuracy enhancement of semantic segmentation with CNN-corrected label curations: evaluation on kidney segmentation in abdominal CT. Sci Rep. 2020;10:366. doi: 10.1038/s41598-019-57242-9.
- 25. Wang G, et al. Interactive medical image segmentation using deep learning with image-specific fine tuning. IEEE Trans Med Imaging. 2018;37:1562–1573. doi: 10.1109/TMI.2018.2791721.
- 26. Idilman IS, Aniktar H, Idilman R, et al. Hepatic steatosis: quantification by proton density fat fraction with MR imaging versus liver biopsy. Radiology. 2013;267(3):767–775. doi: 10.1148/radiol.13121360.
- 27. Kohl S, et al. A probabilistic U-Net for segmentation of ambiguous images. Adv Neural Inf Process Syst. 2018;31:6965–6975.
- 28. Zhang D, Meng D, Han J. Co-saliency detection via a self-paced multiple-instance learning framework. IEEE Trans Pattern Anal Mach Intell. 2016;39:865–878. doi: 10.1109/TPAMI.2016.2567393.
- 29. Kim J, Kim M, Kang H, Lee K. U-GAT-IT: unsupervised generative attentional networks with adaptive layer-instance normalization for image-to-image translation. ArXiv e-prints arXiv:1907.10830; 2019.
- 30. Qin Y, Kamnitsas K, Ancha S, et al. Autofocus layer for semantic segmentation. ArXiv e-prints https://arxiv.org/abs/1805.08403; 2018.
- 31. Kervadec H, Bouchtiba J, Desrosiers C, et al. Boundary loss for highly unbalanced segmentation. ArXiv e-prints https://arxiv.org/pdf/1812.07032.pdf (arXiv:1812.07032v2); 2018.
- 32. Altman DG, Bland JM. Measurement in medicine: the analysis of method comparison studies. The Statistician. 1983;32(3):307–317. doi: 10.2307/2987937.