PLoS One. 2022 Aug 12;17(8):e0272602. doi: 10.1371/journal.pone.0272602

Data augmentation using image translation for underwater sonar image segmentation

Eon-ho Lee 1,#, Byungjae Park 2,#, Myung-Hwan Jeon 3, Hyesu Jang 4, Ayoung Kim 4, Sejin Lee 1,*
Editor: Mahdi Abbasi5
PMCID: PMC9374219  PMID: 35960747

Abstract

In underwater environments, object recognition is an important foundation for implementing unmanned underwater vehicles. Training a deep learning model for this purpose requires abundant experimental data, but such data are very difficult to obtain because underwater experiments are severely limited in preparation time and resources. In this study, the image translation model Pix2Pix is used to generate data similar to the experimental data obtained with our ROV, SPARUS, in a pool and a reservoir. The generated data are then used to train another deep learning model, an FCN, for pixel-level image segmentation. Training an image segmentation model normally requires an original sonar image and a corresponding mask image for every training sample, which demands considerable effort when all training data are real sonar images. Fortunately, this burden is relieved here because pairs of mask images and synthesized sonar images are already produced in the image translation step. The validity of the proposed procedure is verified through the performance of the resulting image segmentation. When only real sonar images are used for training, the mean accuracy is 0.7525 and the mean IoU is 0.7275. When both synthetic and real data are used for training, the mean accuracy is 0.81 and the mean IoU is 0.7225. Comparing the results, the mean accuracy increases by about 6%, while the mean IoU remains at a similar value.

Introduction

Recognition of objects underwater is essential for rescue or evidence search operations [1, 2]. However, cameras, which are widely used on land, are difficult to use underwater for object recognition because visibility is poor due to insufficient lighting and suspended matter in the water [3]. Unlike cameras, underwater sonar can be used in water because its signals travel long distances without being affected by lighting or suspended solids [4, 5]. However, images obtained from imaging sonar are difficult to use for object recognition because their resolution is low and they contain noise. Several object recognition methods have been proposed to address this issue [6]. For example, a spectral analysis method has been proposed for seafloor sediment classification [7]. Another study proposed using a measure called lacunarity to classify the characteristics of the seafloor [8].

This paper expands on the results of the previous work and proposes a method that uses a neural network (NN) model for image translation [9] to synthesize realistic underwater sonar images (USIs) and then uses them for data augmentation. With image translation, the style of a synthetic USI can be transformed to resemble that of a real USI. In the previous work, the variability of the synthesized images was limited. In contrast, in this paper, the variability of the image translation results is ensured, which is advantageous for data augmentation.

Furthermore, in the previous work, only a limited synthesis effect for the background noise could be expected. In contrast, in this paper, the gradation of the background noise and the object shadowing effect are reproduced realistically by image translation according to the fan shape and the location of the background noise, which are basic characteristics of multibeam imaging sonar images.

To validate the effectiveness of the proposed image translation-based data augmentation, we evaluated the quantitative performance of a semantic segmentation NN trained using the proposed method. Semantic segmentation finds not only the location of an object but also its shape in a given image by performing pixel-level classification. Because semantic segmentation performs a more complex task than object classification or detection, its NN has more parameters than the NNs used for classification or detection. Since more data are required to train a semantic segmentation NN than other NNs, it is a good task for validating the effectiveness of the proposed image translation-based data augmentation.

The remainder of this paper is structured as follows. Section 2 briefly introduces the proposed image translation-based data augmentation and the pipeline that performs semantic segmentation for the verification of its performance. Sections 3 and 4 describe the image translation-based data augmentation and image segmentation, respectively. Section 5 introduces information related to the underwater sonar dataset and the training of the image translation NN and semantic segmentation NN. Section 6 presents the experimental results and qualitative performance evaluations, and Section 7 summarizes the conclusions of this paper and the future work.

Related work

Recently, deep-learning-based object recognition methods have been proposed. For instance, a method that uses a convolutional neural network (CNN) to extract features and then a support vector machine to perform object classification has been proposed [10]. Furthermore, some researchers have proposed end-to-end methods that use a CNN to extract features for object detection or classification [11–13]. However, there is a critical limitation when deep-learning-based methods are used underwater for object recognition. An NN, the core component of deep learning, consists of numerous parameters that are trained from data. If the data are insufficient, the parameters are not trained properly; overfitting occurs and robust operation cannot be expected.

Unfortunately, collecting abundant data to train an NN is challenging because of the characteristics of the underwater environment. First, although the underwater environment is vast and varied, the region where data can be collected is limited. Second, considerable time and resources are required for data collection underwater. There are two common solutions for training an NN with a small dataset: transfer learning [14, 15] and data augmentation. Transfer learning reuses a model pretrained on a large dataset as the backbone of an NN for a specific task: the lower layers of the NN copy the parameters of the pretrained model, and the NN is then trained on the small dataset. Data augmentation refers to transforming the data in a dataset, for example by cropping or resizing [16]. Transfer learning and transform-based data augmentation help the NN train better; however, when the dataset is small, these solutions provide only limited supplementation.

Some researchers have proposed a method that performs data augmentation by synthesizing data instead of data augmentation using a transform [16, 17]. This method creates synthetic data by transforming styles, such as the background patterns, as if the data have been obtained underwater. Then, the synthetic data are used with the real underwater data together when training the NN. In our previous work [18], we used a supervised style transfer to transform styles, such as the background patterns of a synthetic USI generated via simulation, into styles similar to those of the real USI; subsequently, we used them to augment the training data of the NN for object detection.

Overview

As shown in Fig 1, the pipeline that performs underwater sonar semantic segmentation using the proposed data augmentation method consists of two stages overall: (1) the training of the image translation NN and the generation of a synthetic USI using the trained image translation NN; (2) the training of the semantic segmentation NN using both the real USI and the synthetic USI generated in the previous stage.

Fig 1. Pipeline of the proposed method.


Data augmentation using image translation

Image translation model

The proposed method uses Pix2Pix, an image translation NN based on a conditional generative adversarial network (cGAN) [9, 19]. Pix2Pix consists of two sub-NNs: a generator (G) and a discriminator (D). G generates a fake image y from an input source image x and random noise z, while D distinguishes whether an input image is real or fake.

$y = G(x, z)$. (1)

During the training of the image translation NN, G is trained to generate synthetic USIs that are difficult to distinguish from real images. Simultaneously, D is trained adversarially to distinguish real and synthetic USIs properly. In the proposed method, a binary mask is used as the input of G to synthesize a USI; the real USI and the synthetic USI generated by G are used as the inputs of D, as shown in Fig 2.

Fig 2. Image translation NN used in the proposed method.


The generator generates a synthetic USI from the binary mask; the discriminator distinguishes the real USI and the synthetic USI generated by the generator. The generator and discriminator are trained together.

G has a structure similar to the U-Net model [20], in which multiple skip-connections exist between the encoder and the decoder. These skip-connections deliver contexts of multiple levels directly from the encoder to the decoder, improving the quality of the synthetic USI generated by G. D uses a PatchGAN model [9]. PatchGAN prevents the parameter updates of G during training from focusing more on deceiving D than on making the synthetic USI similar to the real USI; if that happens, G generates blurry synthetic USIs. When comparing the real USI and the synthetic USI, the PatchGAN model does not compare the entire image but compares patches of certain regions. With the PatchGAN model, G generates sharper synthetic USIs.

When the G and D of the image translation NN are trained simultaneously, the following two losses are used together: (1) adversarial loss; (2) L1 loss. The adversarial loss is used to train G and D simultaneously, which are two sub-NNs that have adversarial goals in cGAN [9]:

$\mathcal{L}_{adv}(G, D) = \mathbb{E}_{x,y}[\log D(x, y)] + \mathbb{E}_{x,z}[\log(1 - D(x, G(x, z)))]$. (2)

The L1 loss is used together to not only deceive D when updating the parameters of G in the training process but also generate synthetic USIs similar to the real USIs [9]:

$\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y,z}[\lVert y - G(x, z) \rVert_{1}]$. (3)

The final loss function can be defined as follows [9]:

$G^{*} = \arg\min_{G} \max_{D} \mathcal{L}_{adv}(G, D) + \lambda \cdot \mathcal{L}_{L1}(G)$. (4)

In the above equation, λ is a hyperparameter for controlling the influence of the L1 loss.
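For illustration, the two losses in Eqs (2)–(4) can be combined as in the following minimal sketch, written for TensorFlow/Keras (the framework used later for training); the weight LAMBDA and the assumption that D outputs logits are illustrative choices, not the authors' exact implementation.

import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
LAMBDA = 100  # lambda in Eq (4); an illustrative value, not necessarily the paper's setting

def generator_loss(disc_fake_output, fake_image, target_image):
    # Adversarial term: G tries to make D label the synthetic USI as real.
    adv = bce(tf.ones_like(disc_fake_output), disc_fake_output)
    # L1 term of Eq (3): keep the synthetic USI close to the paired real USI.
    l1 = tf.reduce_mean(tf.abs(target_image - fake_image))
    return adv + LAMBDA * l1

def discriminator_loss(disc_real_output, disc_fake_output):
    # D should label (mask, real USI) pairs as real and (mask, G(mask, z)) pairs as fake.
    real = bce(tf.ones_like(disc_real_output), disc_real_output)
    fake = bce(tf.zeros_like(disc_fake_output), disc_fake_output)
    return real + fake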

G consists of 8 encoder blocks and 8 decoder blocks. Each encoder block uses a convolution layer, batch normalization, and a Leaky ReLU activation. Each decoder block uses a transposed convolution layer and dropout, with batch normalization and a ReLU activation. D consists of 3 blocks, each of which uses a convolution layer, batch normalization, and a Leaky ReLU activation.
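As a concrete illustration of these blocks, the following is a minimal Keras sketch of one encoder block and one decoder block; the filter counts, kernel size, stride, and dropout rate are assumptions for illustration rather than the authors' exact settings.

from tensorflow.keras import layers

def encoder_block(x, filters):
    # Convolution -> batch normalization -> Leaky ReLU, as in each of G's encoder blocks.
    x = layers.Conv2D(filters, 4, strides=2, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    return layers.LeakyReLU(0.2)(x)

def decoder_block(x, skip, filters, dropout=True):
    # Transposed convolution -> batch normalization -> dropout -> ReLU,
    # with the corresponding encoder feature map attached through a skip-connection.
    x = layers.Conv2DTranspose(filters, 4, strides=2, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    if dropout:
        x = layers.Dropout(0.5)(x)
    x = layers.ReLU()(x)
    return layers.Concatenate()([x, skip])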

Underwater sonar image segmentation

Image segmentation model

Semantic segmentation predicts the class of each pixel in a given image. Therefore, if semantic segmentation is applied to a USI, we can determine which pixels are occupied by the underwater object we want to find, as shown in Fig 3. A fully convolutional network (FCN) [21] is applied for USI segmentation. The FCN is built by modifying VGG16 [22], a well-known image classification NN: the fully connected layers of VGG16 are removed, and then 1 × 1 convolution layers, upsampling layers, and skip-connections [20] are added to enable dense prediction, i.e., a pixel-level classification output with the same size as the input image.

Fig 3. USI segmentation using a fully convolutional neural network.

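The FCN construction described above can be sketched in Keras roughly as follows; since the exact skip-connections and upsampling factors used by the authors are not specified, this sketch assumes the common FCN-8s pattern and an input size of 640 × 480.

import tensorflow as tf
from tensorflow.keras import layers, Model

def build_fcn(input_shape=(480, 640, 3), n_classes=2):
    # VGG16 backbone with its fully connected layers removed.
    vgg = tf.keras.applications.VGG16(include_top=False, input_shape=input_shape)
    pool3 = vgg.get_layer("block3_pool").output
    pool4 = vgg.get_layer("block4_pool").output
    pool5 = vgg.get_layer("block5_pool").output

    # 1x1 convolutions produce class scores at each resolution.
    s5 = layers.Conv2D(n_classes, 1)(pool5)
    s4 = layers.Conv2D(n_classes, 1)(pool4)
    s3 = layers.Conv2D(n_classes, 1)(pool3)

    # Upsample and fuse the scores through skip-connections (FCN-8s pattern),
    # then upsample to the input resolution for dense, pixel-level prediction.
    x = layers.Conv2DTranspose(n_classes, 4, strides=2, padding="same")(s5)
    x = layers.Add()([x, s4])
    x = layers.Conv2DTranspose(n_classes, 4, strides=2, padding="same")(x)
    x = layers.Add()([x, s3])
    x = layers.Conv2DTranspose(n_classes, 16, strides=8, padding="same")(x)
    out = layers.Softmax()(x)
    return Model(vgg.input, out)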

Underwater sonar image dataset

Synthetic underwater sonar image generation

To train the image translation NN to generate synthetic USIs, we need a training dataset in which the source images are paired with the target images to be generated. To create this dataset, real USIs are collected first, and an annotation tool is then used to create binary masks of the objects to be segmented, as shown in Fig 4.

Fig 4. Example of a training dataset.


The dataset contains the pairs of binary masks and real USIs used to train the image translation NN for USI generation.
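For illustration, the (binary mask, real USI) pairs can be assembled into a training set as in the following sketch; the masks/ and sonar/ directories, matching file names, and grayscale PNG format are assumptions of the example.

import glob
import tensorflow as tf

def load_pair(mask_path, sonar_path, size=(256, 512)):
    # Read one (binary mask, real USI) pair and resize to the 512 x 256 training resolution.
    mask = tf.io.decode_png(tf.io.read_file(mask_path), channels=1)
    sonar = tf.io.decode_png(tf.io.read_file(sonar_path), channels=1)
    mask = tf.image.resize(mask, size) / 255.0          # binary mask in [0, 1]
    sonar = tf.image.resize(sonar, size) / 127.5 - 1.0  # sonar image scaled to [-1, 1]
    return mask, sonar

# Pairs are matched by sorted file name (an assumption of this example).
mask_paths = sorted(glob.glob("masks/*.png"))
sonar_paths = sorted(glob.glob("sonar/*.png"))
dataset = (tf.data.Dataset.from_tensor_slices((mask_paths, sonar_paths))
           .map(load_pair)
           .shuffle(200)
           .batch(1))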

After training the image translation NN, its generator G is used to generate USIs, as shown in Fig 5. G can synthesize USIs of the object with various poses and lighting conditions that do not exist in the training dataset.

Fig 5. Real USI and USIs generated by the generator after training the image translation NN.


The generator of the image translation NN can create USIs of the object with various poses and lighting conditions that do not exist in the training dataset.

The dataset and the hyperparameters used to train the image translation NN will be described in detail in “Experimental Results”.

Training using synthetic underwater sonar image

The NN for semantic segmentation has more parameters than NNs for object classification or detection because it performs dense prediction. Therefore, a considerable amount of data is required to train it. Moreover, annotating the ground truth for semantic segmentation requires more effort than annotating the ground truth for image classification or object detection, because pixel-level labeling is required.

The real USIs and the USIs synthesized using the image translation NN are used together to train the NN for USI segmentation. Using synthetic USIs has two advantages: (1) the effort required for additional experiments to obtain real USIs is reduced; (2) synthetic USIs do not require ground-truth annotation, because each is generated from its binary mask.

Environmental conditions of underwater sonar image dataset

We constructed datasets in two actual underwater environments using TELEDYNE BlueView M900–90 sonar to train the image translation NN and the semantic segmentation NN. The sonar had a frequency of 900 kHz, a beam width of 90° in the horizontal direction and 20° in the vertical direction, and a detection range of 100 m.

The first underwater environment was a reservoir (Fig 6A). In this environment, we sank a mannequin to the bottom of the reservoir and attached the sonar to a boat to obtain the data. The second environment was a water pool (Fig 6B) with a width of 10 m, a length of 12 m, and a depth of 1.5 m. After sinking a mannequin along with other artificial objects, such as a box and a tire, to the bottom of the pool, we attached the sonar to SPARUS [23], an unmanned underwater vehicle, to obtain the data. When acquiring data in each environment, we adjusted the sensitivity of the sonar to create two conditions per environment (reservoir high sensitivity, RHS; reservoir low sensitivity, RLS; pool high sensitivity, PHS; pool low sensitivity, PLS), as shown in Fig 7.

Fig 6. Underwater environments and platforms.


(A) Reservoir and boat, (B) Pool and underwater vehicle.

Fig 7. Four datasets created in two environments.


(A) Reservoir low sensitivity, (B) Reservoir high sensitivity, (C) Pool low sensitivity, (D) Pool high sensitivity.

An NVIDIA T4 GPU and the Keras package were used to train the image translation NN and the semantic segmentation NN. We trained four image translation NNs to generate synthetic USIs for the datasets of the four conditions. In the training dataset for each condition, a binary mask and a USI are paired, as shown in Fig 4, and there are 200 paired images with a size of 512 × 256, so a total of 800 real USIs were used. The generated synthetic USIs comprise 200 images per condition, for a total of 800 images with a size of 640 × 480. The image translation NN was trained on this dataset with the Adam optimizer for 150 epochs, and the hyperparameters of the sub-NNs G and D are shown in Table 1.

Table 1. Hyperparameters of the image translation neural network.

                G        D
Learning rate   0.0004   0.0004
Batch size      1        1
β1              0.5      0.5
β2              0.999    0.999

Learning rate, batch size, β1 and β2 are the hyperparameters of the sub-NNs, G and D.
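For reference, the Adam settings in Table 1 correspond to the following Keras configuration; generator and discriminator here stand for the sub-NNs G and D described earlier.

import tensorflow as tf

# Adam with the Table 1 settings: learning rate 0.0004, beta_1 = 0.5, beta_2 = 0.999.
gen_optimizer = tf.keras.optimizers.Adam(learning_rate=4e-4, beta_1=0.5, beta_2=0.999)
disc_optimizer = tf.keras.optimizers.Adam(learning_rate=4e-4, beta_1=0.5, beta_2=0.999)

EPOCHS = 150    # training length reported for the image translation NN
BATCH_SIZE = 1  # batch size from Table 1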

The semantic segmentation NN was trained to classify the pixels occupied by the object against the background in the USI, based on the real datasets of the four conditions and the synthetic datasets of the four conditions generated through the image translation NN. The training dataset for each condition consisted of paired binary masks and USIs, as shown in Fig 4. There were 200 paired images, each with a size of 640 × 480. Table 3 describes the eight datasets used to train the semantic segmentation NN.

The semantic segmentation NN was trained using the eight training datasets with the Adam optimizer for 200 epochs, and the hyperparameters are shown in Table 2.

Table 2. Hyperparameters of the semantic segmentation neural network.

                FCN
Learning rate   0.001
Batch size      5
β1              0.9
β2              0.999

Learning rate, batch size, β1 and β2 are the hyperparameters of FCN.

Based on the eight datasets (Table 3), we constructed three training datasets (Table 4) to validate the effectiveness of data augmentation using the synthetic USIs created by the image translation NN.

Table 3. Real and synthetic underwater sonar image datasets.

Name Environment Sensitivity Real/Synthetic
RHS_Real Reservoir High Real
RHS_Synth Reservoir High Synthetic
RLS_Real Reservoir Low Real
RLS_Synth Reservoir Low Synthetic
PHS_Real Pool High Real
PHS_Synth Pool High Synthetic
PLS_Real Pool Low Real
PLS_Synth Pool Low Synthetic

RHS, reservoir high sensitivity; RLS, reservoir low sensitivity; PHS, pool high sensitivity; PLS, pool low sensitivity; _Real, real underwater sonar image; _Synth, synthetic underwater sonar image.

Table 4. Real, synthetic and augmented underwater sonar image datasets.

Combinations
T_Real RLS_Real + RHS_Real + PLS_Real + PHS_Real
T_Synth RLS_Synth + RHS_Synth + PLS_Synth + PHS_Synth
T_Aug T_Real + T_Synth

T_Real, dataset is a combination of the real datasets of the four conditions; T_Synth, dataset is a combination of the synthetic datasets of the four conditions; T_Aug, dataset is a combination of the real datasets and synthetic datasets.

The T_Real dataset is a combination of the real datasets of the four conditions, and T_Synth is a combination of the synthetic datasets of the four conditions. T_Aug is a combination of the real and synthetic datasets. When the semantic segmentation NN was trained using these datasets, the same number of paired images was used in each case through uniform sampling. The hyperparameters were the same as those used when training with the eight datasets shown in Table 3.
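A minimal sketch of this combination step is given below, assuming each condition's pairs are held in Python lists; the list names and the total of 800 pairs are illustrative.

import random

def combine(datasets, total):
    # Sample the same number of (mask, USI) pairs from each source dataset
    # so that every combined training set contains `total` pairs.
    per_source = total // len(datasets)
    combined = []
    for pairs in datasets:
        combined.extend(random.sample(pairs, per_source))
    random.shuffle(combined)
    return combined

# T_Real mixes the four real conditions; T_Aug mixes the real and synthetic conditions.
t_real = combine([rls_real, rhs_real, pls_real, phs_real], total=800)
t_aug = combine([rls_real, rhs_real, pls_real, phs_real,
                 rls_synth, rhs_synth, pls_synth, phs_synth], total=800)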

Experimental results

Synthetic underwater sonar image generation

Fig 8 shows samples of synthetic USIs generated by the image translation NN to resemble the real USIs of the four conditions.

Fig 8. Results of synthetic underwater sonar image generation (left: Real underwater sonar image, right: Synthetic underwater sonar image).


(A) Reservoir low sensitivity, (B) Reservoir high sensitivity, (C) Pool low sensitivity, (D) Pool high sensitivity.

Fig 8 confirms that the image translation NN can generate synthetic USIs that reflect the characteristics of the real USIs. The bottoms of the reservoir and the pool are darker than the underwater objects, and the objects show brightness differences and shadows depending on the location of the sonar. Furthermore, comparing Fig 8A and 8C with Fig 8B and 8D shows that the effect of the sonar sensitivity on the USI is also reproduced by the image translation NN.

Underwater sonar image segmentation

Fig 9 shows samples of the results of segmenting a real USI after training the semantic segmentation NN using the synthetic USI created by the underwater image translation (UIT) NN. Fig 9A–9C show that the pixels corresponding to the positions of the objects were segmented properly. However, in the case of Fig 9D, which is a result of segmenting a USI collected by increasing the sonar’s sensitivity in the pool environment, it is observed that the pixels corresponding to the positions of the objects were not segmented properly, and the pixels corresponding to some parts of the pool boundary were segmented incorrectly.

Fig 9. Results of segmenting a real USI with the USI NN trained using the synthetic USI created by the PLS_Synth model.


(A) Reservoir low sensitivity, (B) Reservoir high sensitivity, (C) Pool low sensitivity, (D) Pool high sensitivity.

We used two indicators, mean accuracy and mean intersection over union (IoU), to quantitatively analyze the results of training the semantic segmentation NN using the datasets created with the synthetic USIs. Table 5 shows the results of segmenting the real USIs with the semantic segmentation NN trained using the real USI datasets and the synthetic USI datasets.

Table 5. Semantic segmentation results using only one type of the real underwater sonar image datasets and synthetic underwater sonar image datasets.

Train dataset   Test dataset   Mean accuracy   Mean IoU
RLS_Real        RLS_Real       0.77            0.75
RLS_Synth       RLS_Real       0.76            0.63
RHS_Real        RHS_Real       0.96            0.88
RHS_Synth       RHS_Real       0.81            0.71
PLS_Real        PLS_Real       0.87            0.76
PLS_Synth       PLS_Real       0.59            0.57
PHS_Real        PHS_Real       0.82            0.73
PHS_Synth       PHS_Real       0.81            0.71

Training dataset: real underwater sonar images and synthetic underwater sonar images.

Results of segmenting a real underwater sonar image with the semantic segmentation neural network trained using only one type of the real underwater sonar image datasets and synthetic underwater sonar image datasets.

The performance values of the semantic segmentation NN trained using the real USI datasets and the synthetic USI datasets are shown in Table 5. This shows that there is no significant difference in performance between the semantic segmentation NNs trained using the synthetic USI datasets and those trained using the real USI datasets.
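For reference, the mean accuracy and mean IoU reported in Tables 5 and 6 can be computed from a pixel-level confusion matrix roughly as in the following NumPy sketch; this follows the common definitions of the two metrics rather than the authors' exact evaluation code.

import numpy as np

def mean_accuracy_and_iou(pred, target, n_classes=2):
    # pred and target are integer class maps of the same shape (e.g. 0 = background, 1 = object).
    idx = target.astype(np.int64).ravel() * n_classes + pred.astype(np.int64).ravel()
    cm = np.bincount(idx, minlength=n_classes ** 2).reshape(n_classes, n_classes)

    tp = np.diag(cm).astype(float)
    gt = cm.sum(axis=1)   # pixels of each ground-truth class
    pr = cm.sum(axis=0)   # pixels predicted as each class
    per_class_acc = tp / np.maximum(gt, 1)
    per_class_iou = tp / np.maximum(gt + pr - tp, 1)
    return per_class_acc.mean(), per_class_iou.mean()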

Data augmentation results

We conducted an experiment to investigate whether the performance of the semantic segmentation NN can be improved when the synthetic USI generated by the proposed image translation NN is used for data augmentation. As shown in Fig 10, we used the dataset that contained both real USI and synthetic USI (T_Aug) and the dataset that contained only the real USI (T_Real) to train the semantic segmentation NN, and then compared the results of segmenting real USIs. Table 6 shows the results of a quantitative analysis performed using the mean accuracy and mean IoU.

Fig 10. USI segmentation results (left: Semantic segmentation NN trained using real USI, right: Semantic segmentation NN trained using real USI and synthetic USI).


(A) Reservoir low sensitivity, (B) Reservoir high sensitivity, (C) Pool low sensitivity, (D) Pool high sensitivity.

Table 6. Semantic segmentation results using a training dataset of real underwater sonar images and synthetic underwater sonar images.

Train dataset   Test dataset   Mean accuracy   Mean IoU
T_Real          RLS_Real       0.75            0.73
T_Aug           RLS_Real       0.77            0.74
T_Real          RHS_Real       0.77            0.75
T_Aug           RHS_Real       0.84            0.76
T_Real          PLS_Real       0.74            0.71
T_Aug           PLS_Real       0.80            0.70
T_Real          PHS_Real       0.75            0.72
T_Aug           PHS_Real       0.83            0.69

Results of segmenting a real underwater sonar image with the semantic segmentation neural network trained using the real underwater sonar image datasets and synthetic underwater sonar image datasets.

As shown in Table 6, both the semantic segmentation NN trained using only the real USI datasets and the one trained using the real and synthetic datasets together can segment the real USIs properly.

The semantic segmentation NN trained using both the real USI and synthetic USI showed improved performance over that trained using only the real USI. However, when the real USIs collected in the pool environment (PLS_Real and PHS_Real) were segmented, the mean IoUs decreased slightly although the mean accuracies increased. Considering the characteristics of the mean IoU calculation process, we concluded that this occurred because some pixels of the USI were incorrectly segmented.

During the experiment, we used the same number of USI and binary mask pairs in T_Real and T_Aug for the performance comparison. Nonetheless, when the generation of synthetic USIs using the proposed UIT is applied in practice, the performance of the semantic segmentation NN will improve further, because the proposed method can generate a large number of synthesized USIs for data augmentation.

Conclusions and future works

In this paper, we proposed a data augmentation method using a UIT NN that can improve semantic segmentation performance for object recognition in underwater environments, where data collection is limited. The UIT NN can generate synthesized USIs with various poses and lighting conditions. With the proposed data augmentation method, a large number of synthetic USIs similar to real USIs can be created to train the semantic segmentation NN, even when the amount of real USI data is small. S1 Fig shows the qualitative semantic segmentation results corresponding to Tables 5 and 6, and S1 and S2 Tables are quantitative results that extend Tables 5 and 6. S1 Fig shows the results of training the semantic segmentation NN using single types of training datasets separately. When the training and test data share the same environment and sensitivity, the semantic segmentation performance is good. When the data types differ, the object of interest is often not recognized, or regions that are not the object of interest are recognized as the object. These results are obtained when training with a single type of data; when the augmented data in Table 4 are used for training, the image segmentation performance improves, as shown in S1 Fig.

Additionally, training the semantic segmentation NN with both real and synthetic USIs improves segmentation performance compared with training with only real USIs. Furthermore, as shown in S1 Fig and S1 and S2 Tables, segmentation results are good even when only synthetic USIs are used for training, which suggests that synthetic USIs alone could serve as training data.

The UIT NN used in this paper has the limitation that it needs explicit pairs of binary masks and USIs. To mitigate this limitation, in the future, we can use a category-level UIT NN with a cycle consistency loss [24]. Furthermore, as the UIT NN used in this paper can only generate synthetic USIs for one environment or condition, its parameters have to be retrained to generate synthetic USIs for another environment or condition. To mitigate this, we can use a UIT NN that handles multiple domains at once [25].

Supporting information

S1 Fig. Results of image segmentation by type of datasets (horizontal axis: Type of test datasets, vertical axis: Type of training datasets).

(A) Reservoir low sensitivity_Real, (B) Reservoir high sensitivity_Real, (C) Pool low sensitivity_Real, (D) Pool high sensitivity_Real, (E) Reservoir low sensitivity_Synth, (F) Reservoir high sensitivity_Synth, (G) Pool low sensitivity_Synth, (H) Pool high sensitivity_Synth, (I) T_Real, (J) T_Synth, (K) T_Aug.

(TIF)

S1 Table. Mean accuracy results of image segmentation by type of datasets (horizontal axis: Type of test datasets, vertical axis: Type of training datasets).

(A) Reservoir low sensitivity_Real, (B) Reservoir high sensitivity_Real, (C) Pool low sensitivity_Real, (D) Pool high sensitivity_Real, (E) Reservoir low sensitivity_Synth, (F) Reservoir high sensitivity_Synth, (G) Pool low sensitivity_Synth, (H) Pool high sensitivity_Synth, (I) T_Real, (J) T_Synth, (K) T_Aug.

(TXT)

S2 Table. Mean IoU results of image segmentation by type of datasets (horizontal axis: Type of test datasets, vertical axis: Type of training datasets).

(A) Reservoir low sensitivity_Real, (B) Reservoir high sensitivity_Real, (C) Pool low sensitivity_Real, (D) Pool high sensitivity_Real, (E) Reservoir low sensitivity_Synth, (F) Reservoir high sensitivity_Synth, (G) Pool low sensitivity_Synth, (H) Pool high sensitivity_Synth, (I) T_Real, (J) T_Synth, (K) T_Aug.

(TXT)

S3 Table. Abbreviation table.

(TXT)

S1 Dataset

(Z01)

S2 Dataset

(ZIP)

Data Availability

All relevant data are within the paper and its Supporting information files.

Funding Statement

This research was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. 2019R1F1A1053708, 2021R1F1A1057949) and by the project "Development of standard manufacturing technology for marine leisure vessels and safety support robots for underwater leisure activities" of the Korea Institute of Marine Science & Technology Promotion (KIMST), funded by the Ministry of Oceans and Fisheries (KIMST-20220567).

References

  • 1. Cho H.; Gu J.; Joe H.; Asada A.; Yu S.-C. Acoustic beam profile-based rapid underwater object detection for an imaging sonar. Journal of Marine Science and Technology. 2015. Mar;20:180–197. doi: 10.1007/s00773-014-0294-x [DOI] [Google Scholar]
  • 2.Purcell, M.; Gallo, D.; Packard, G.; Dennett, M.; Rothenbeck, M.; Sherrell, A.; et al. Use of REMUS 6000 AUVs in the search for the Air France flight 447. Proceedings of the IEEE/MTS OCEANS Conference and Exhibition, 2011 Sept;1–7.
  • 3.Shin, Y.-S.; Lee, Y.; Choi, H.-T.; Kim, A. Bundle adjustment from sonar images and SLAM application for seafloor mapping. Proceedings of the IEEE/MTS OCEANS Conference and Exhibition, 2015 Oct;1–6.
  • 4.Fallon, M. F.; Kaess, M.; Johannsson, H.; Leonard, J. J. Efficient AUV navigation fusing acoustic ranging and side-scan sonar. Proceedings of the IEEE International Conference on Robotics and Automation, 2011 May;2398–2405.
  • 5.S. M. T. Inc., “Navigator,” 1984 [cited 7 April 2022]. In: Web sites [internet]. Canada Available from: http://www.sharkmarine.com/.
  • 6.Galceran, E.; Djapic, V.; Carreras, M.; Williams, D. P. A real-time underwater object detection algorithm for multi-beam forward looking sonar. IFAC Proceedings Volumes, 2012;45(5):306–311.
  • 7. Zhou X.; Chen Y. Seafloor sediment classification based on multibeam sonar data. Geo-spatial Information Science, 2004;7(4):290–296. doi: 10.1007/BF02828555 [DOI] [Google Scholar]
  • 8. Williams D. P. Fast unsupervised seafloor characterization in sonar imagery using lacunarity. IEEE Transactions on Geoscience and Remote Sensing, 2015. Nov;53(11):6022–6034. doi: 10.1109/TGRS.2015.2431322 [DOI] [Google Scholar]
  • 9.Isola, P.; Zhu, J. -Y.; Zhou, T.; Efros, A. A. Image-to-Image Translation with Conditional Adversarial Networks. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2017;1125–1134.
  • 10.Zhu, P.; Isaacs, J.; Fu, B.; Ferrari, S. Deep learning feature extraction for target recognition and classification in underwater sonar images. Proceedings of the IEEE Conference on Decision and Control, 2017;2724–2731.
  • 11.Williams, D. P. Underwater target classification in synthetic aperture sonar imagery using deep convolutional neural networks. Proceedings of the International Conference Pattern Recognition, 2016 Dec;2497–2502.
  • 12.Kim, J.; Cho, H.; Pyo, J.; Kim, B.; Yu, S.-C. The convolution neural network-based agent vehicle detection using forward-looking sonar image. Proceedings of the IEEE/MTS OCEANS Conference and Exhibition, 2016;1–5.
  • 13.Jin, L.; Liang, H.; Yang, C. Accurate Underwater ATR in Forward-Looking Sonar Imagery Using Deep Convolutional Neural Networks. IEEE Access, 2019;7:125522–125531.
  • 14.McKay, J.; Gerg, I.; Monga, V.; Raj, R. G. What’s mine is yours: Pretrained CNNs for limited training sonar ATR. Proceedings of the IEEE/MTS OCEANS Conference and Exhibition, 2017;1–7.
  • 15.Yun, S.; Han, D.; Oh, S. J.; Chun, S.; Choe, J.; Yoo, Y. CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features. Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019;6023–6032.
  • 16.Denos, K.; Ravaut, M.; Fagette, A.; Lim, H. Deep learning applied to underwater mine warfare. Proceedings of the IEEE/MTS OCEANS Conference and Exhibition, 2017;1–7.
  • 17. Huo G.; Wu Z.; Li J. Underwater Object Classification in Sidescan Sonar Images Using Deep Transfer Learning and Semisynthetic Training Data. IEEE Access, 2020;8:47407–47418. doi: 10.1109/ACCESS.2020.2978880 [DOI] [Google Scholar]
  • 18.Lee, S.; Park, B.; Kim, A. Deep Learning from Shallow Dives: Sonar Image Generation and Training for Underwater Object Detection. IEEE ICRA Workshop on Underwater Robotics Perception, Montreal, 2019 May;abs/1810.07990.
  • 19.Mirza, M.; Osindero, S. Conditional Generative Adversarial Nets. arXiv:1411.1784, 2014.
  • 20.Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, 2015;9351:234–241.
  • 21.Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. Proceedings of the IEEE/CVF International Conference on Computer Vision, 2015;3431–3440.
  • 22.Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv:1409.1556, 2015.
  • 23. Carreras M. et al. Sparus II AUV—A Hovering Vehicle for Seabed Inspection, IEEE Journal of Oceanic Engineering, 2018. Apr;43(2):344–355. doi: 10.1109/JOE.2018.2792278 [DOI] [Google Scholar]
  • 24.Zhu, J.-Y.; Park, T.; Isola, P.; Efros, A. A. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. Proceedings of IEEE International Conference on Computer Vision, 2017 Oct; 2223–2232.
  • 25.Choi, Y.; Uh, Y.; Yoo, J.; Ha, J.-W. StarGAN v2: Diverse Image Synthesis for Multiple Domains. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020;8188–8197.

Decision Letter 0

Mahdi Abbasi

26 May 2022

PONE-D-22-11679
Data Augmentation Using Image Translation for Underwater Sonar Image Segmentation
PLOS ONE

Dear Dr. LEE,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Jul 10 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Mahdi Abbasi, PhD.

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at 

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and 

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, all author-generated code must be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.

3. We note that the grant information you provided in the ‘Funding Information’ and ‘Financial Disclosure’ sections do not match. 

When you resubmit, please ensure that you provide the correct grant numbers for the awards you received for your study in the ‘Funding Information’ section.

4. Thank you for stating the following in the Acknowledgments Section of your manuscript: 

"This research was supported by the National Research Foundation of Korea (NRF)

grant funded by the Korean government (MSIT) (No. 2019R1F1A1053708,

2021R1F1A1057949) and ”Regional Innovation Strategy (RIS)” through the National

Research Foundation of Korea(NRF) funded by the Ministry of Education(MOE)

(2021RIS-004)."

We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form. 

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows: 

"This research was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. 2019R1F1A1053708, 2021R1F1A1057949) and "Regional Innovation Strategy (RIS)" through the National Research Foundation of Korea(NRF) funded by the Ministry of Education(MOE) (2021RIS-004)."

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: In this manuscript, the authors used the Pix2Pix Generative Adversarial Network for image translation and then leverage it for data augmentation. Then a Fully Convolutional Network is used for image segmentation. The main aim of this research is to avoid over-fitting the model. Due to the extensive research conducted recently, it is absolutely necessary to pay attention to the following.

1- I suggest you have a brief overview of the related work on data augmentation and image segmentation. You can transfer a subsection of the "Introduction" to the "Related work" section.

2- The whole manuscript is full of abbreviations without defining them. Please define them before use.

3- Please, describe the details of the neural network architecture used. Includes the number of layers, neurons and activation functions and so on.

4- Please, describe in detail the data used. Includes the number of real data, synthetic data, dimensions of images and so on.

5- Please describe how to combine synthetic and real data to make augmented data.

6- What is the advantage of using FCN? That convinces us to augment data to learn the model.

7- Please provide a comparison without the use of incremental data to show that your method prevents over-fitting

Reviewer #2: This paper has been to improve Underwater Sonar Image Segmentation by data augmentation and reduce the limitations of previous methods. However, before publishing the paper, I suggest some minor revisions, as follows:

1. In the abstract, please mention the exact amount of improvement and the metric that has been improved.

2. Organize the article in a standard format. Some sections, including sections 3 and 4, can be merged into one section.

3. It is necessary to provide an architecture of the proposed method in your work.

4. In the result section, various tables have been provided. However, it is not discussed comprehensively.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Amin Nazari

Reviewer #2: Yes: Fazeleh Tavassolian

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: PLOS ONE.docx

PLoS One. 2022 Aug 12;17(8):e0272602. doi: 10.1371/journal.pone.0272602.r002

Author response to Decision Letter 0


8 Jul 2022

We would like to thank you and the reviewers for thorough reading of our manuscript and the comments which helped us to enhance the quality of the work. Their constructive comments are well-received and highly appreciated. We have carefully considered the reviewers' recommendation to further improve the clarity and quality of our manuscript. We hope that the revision made have improved the manuscript at all levels, and that the changes made in an attempt to address the comments of the reviewers are satisfactory. Detailed information is attached as a rebuttal letter file.

Attachment

Submitted filename: REBUTTAL LETTER_PLOS ONE_To_Reviewer2.docx

Decision Letter 1

Mahdi Abbasi

25 Jul 2022

Data Augmentation Using Image Translation for Underwater Sonar Image Segmentation

PONE-D-22-11679R1

Dear Dr. LEE,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Mahdi Abbasi, PhD.

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: (No Response)

Reviewer #2: The authors have clarified several of the questions I raised in my previous review. The title and abstract are appropriate for the content of the text. Furthermore, the article is well constructed, the experiments were well conducted, and analysis was well performed. The conclusion presented by this manuscript seems correct.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Fazeleh Tavassolian

**********

Acceptance letter

Mahdi Abbasi

4 Aug 2022

PONE-D-22-11679R1

Data Augmentation Using Image Translation for Underwater Sonar Image Segmentation

Dear Dr. LEE:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Mahdi Abbasi

Academic Editor

PLOS ONE
