Abstract
Adversarial training has become a primary method for enhancing the robustness of deep learning models. In recent years, fast adversarial training methods have gained widespread attention due to their lower computational cost. However, because fast adversarial training uses single-step adversarial attacks instead of multi-step attacks, the generated adversarial examples lack diversity, making models prone to catastrophic overfitting and loss of robustness. Existing methods to prevent catastrophic overfitting have certain shortcomings, such as poor robustness due to insufficient strength of the generated adversarial examples, and low accuracy caused by excessive total perturbation. To address these issues, this paper proposes fast adversarial training with adaptive similarity step size (ATSS). In this method, random noise is first added to the input clean samples, and the model then calculates the gradient for each input sample. The perturbation step size for each sample is determined based on the similarity between the input noise and the gradient direction. Finally, adversarial examples are generated based on the step size and gradient for adversarial training. We conduct various adversarial attack tests on ResNet18 and VGG19 models using the CIFAR-10, CIFAR-100, and Tiny ImageNet datasets. The experimental results demonstrate that our method effectively avoids catastrophic overfitting. Compared to other fast adversarial training methods, ATSS achieves higher robust accuracy and clean accuracy with almost no additional training cost.
Introduction
Deep learning has become a significant focus of artificial intelligence research in recent years, achieving remarkable results in areas such as autonomous driving [1], intelligent security [2], and smart healthcare [3]. However, these high-performance deep models, particularly deep convolutional neural networks (DCNNs), often exhibit surprising vulnerability when confronted with adversarial attacks. Szegedy et al. discovered that minor input perturbations, nearly imperceptible to the human eye, can cause severe classification errors in DCNNs [4]. This phenomenon is not limited to image recognition [5] but also affects other tasks such as speech recognition [6] and natural language processing [7]. The vulnerability of deep learning models to adversarial examples has sparked considerable interest and concern in the field of AI security, leading to the development of various methods aimed at improving model robustness against adversarial attacks.
Adversarial training methods are currently among the most widely used techniques to enhance model robustness [8]. Common adversarial training approaches, such as the Projected Gradient Descent (PGD) adversarial training proposed by Madry et al. [9] and the TRadeoff-inspired Adversarial DEfense via Surrogate-loss minimization (TRADES) proposed by Zhang et al. [10], employ multi-step iterative processes to generate adversarial examples, which are then used to train the model. While these methods significantly enhance model robustness, they also have notable limitations. In particular, adversarial training for deep neural networks requires multiple forward and backward propagations, leading to substantial additional computational costs.
To address the high computational cost of adversarial training, Wong et al. [11] proposed a fast adversarial training method. The main idea is to use the single-step adversarial attack method, Fast Gradient Sign Method (FGSM) [12], instead of the commonly used multi-step attack methods, to generate adversarial examples during training. However, directly using FGSM-generated adversarial examples for training can lead to a phenomenon known as “catastrophic overfitting”. Catastrophic overfitting refers to the situation where, during adversarial training with single-step attack methods, the model’s performance suddenly becomes very good against single-step adversarial attacks but significantly deteriorates against multi-step attacks [13]. As shown in Fig 1, during traditional FGSM adversarial training, the model’s accuracy against multi-step adversarial attacks vanishes as catastrophic overfitting occurs. This issue primarily arises because the model overfits to specific adversarial examples, resulting in high robustness against these particular examples, but nearly completely losing robustness when encountering new adversarial examples.
Fig 1. Adversarial training process where catastrophic overfitting occurs.
To prevent catastrophic overfitting, researchers have proposed various improved fast adversarial training methods. Among them, Wong et al. introduced the FGSM with Random Start (FGSM-RS) method [11], which adds random noise to clean samples before generating adversarial examples with FGSM. Huang et al. introduced Adversarial Training with Adaptive Step Size (ATAS) [14], which adapts the perturbation step size based on the gradient norm during training. De Jorge et al. proposed N-FGSM [15], which adds stronger noise without limiting the total perturbation magnitude to avoid catastrophic overfitting. However, when random noise is added and adversarial examples are generated with single-step attacks, methods such as FGSM-RS [11] and ATAS [14] often suffer from cancellation between the random noise and the adversarial perturbation, resulting in adversarial examples that are not strong enough and, consequently, insufficient model robustness. N-FGSM [15], in contrast, uses an excessively large total perturbation, which hinders model convergence and lowers accuracy.
In this paper, we propose a fast adversarial training method based on adaptive similarity step size. After adding random noise to the original image samples for data augmentation, the samples are input into the model to compute their gradient information. The perturbation step size of the adversarial examples is then adaptively adjusted based on the similarity between the noise and the gradient direction. We design a similarity algorithm that combines Euclidean distance similarity and cosine similarity to evaluate the similarity between the noise and the gradient direction, thereby generating adversarial examples that are more conducive to effective adversarial training. The experimental results demonstrate that our method successfully avoids catastrophic overfitting while achieving high robust accuracy and clean accuracy.
The major contributions of this paper are twofold:
We propose an effective fast adversarial training method, fast adversarial training with adaptive similarity step size (ATSS for short). The core of this method lies in an adaptive step size algorithm that first calculates the similarity between the random noise and the gradient direction using Euclidean distance and cosine similarity, and then computes the required perturbation step size from this similarity, so that the step size decreases as the similarity increases. Finally, adversarial examples are generated based on the calculated step size and used for adversarial training, thereby enhancing the robustness of the target model.
We conduct multiple adversarial attack tests on the ResNet18 and VGG19 models using the CIFAR-10, CIFAR-100, and Tiny ImageNet datasets, employing attack methods including FGSM, PGD, C&W, and AA. The experimental results demonstrate that ATSS effectively avoids catastrophic overfitting. Compared to other fast adversarial training methods, ATSS achieves higher robust accuracy and clean accuracy with almost no additional training cost.
Related work
Adversarial examples and adversarial attack methods
The adversarial example is a key concept in the field of deep learning robustness, first introduced by Szegedy et al. [4]. Adversarial examples are specially crafted samples designed to deceive deep learning models by introducing subtle, carefully calculated perturbations to the original inputs. These perturbations can cause the model to produce incorrect classification results despite being nearly imperceptible to the human eye.
Gradient-based adversarial attack methods are the most commonly used techniques for generating adversarial examples. Among them, Goodfellow et al. [12] introduced FGSM. This method begins by calculating the gradient of the model’s loss function with respect to the input sample x. The sign of this gradient is then multiplied by a perturbation budget ε. The resulting value is added to the original image to generate the adversarial example x′, represented as:
$$x' = x + \varepsilon \cdot \operatorname{sign}\left(\nabla_x L(f_\theta(x), y)\right) \qquad (1)$$
where ∇x is the gradient with respect to x. Madry et al. proposed PGD [9], which is an iterative adversarial attack technique. Due to its strong attack capability and broad applicability, PGD is regarded as the benchmark method for generating adversarial examples. PGD can be understood as an iterative version of FGSM, where a smaller perturbation step size is used in each iteration. Additionally, a projection operation is included at each step to ensure that the adversarial examples remain within the predefined perturbation budget [-ε, ε], represented as:
$$x'_{i+1} = \operatorname{clip}\!\left(x'_i + \alpha \cdot \operatorname{sign}\left(\nabla_x L(f_\theta(x'_i), y)\right)\right) \qquad (2)$$
where i is the step number in PGD, and clip(⋅) is the clipping function that ensures the generated adversarial examples satisfy both the perturbation budget ε constraint and the image’s pixel value range. Building on PGD, researchers have developed various iterative attack methods [16–18]. Additionally, there are optimization-based adversarial attack methods, such as Carlini & Wagner attack (C&W) proposed by Carlini et al. [19]. This method generates adversarial examples by solving a constrained optimization problem, aiming to find the minimal perturbation that leads to the misclassification of the sample.
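For concreteness, below is a minimal PyTorch sketch of the FGSM and PGD attacks in Eqs (1) and (2). It assumes a classifier `model`, cross-entropy loss, and inputs in [0, 1]; all names and default values are illustrative, not taken verbatim from the cited papers.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps=8/255):
    # Eq (1): single step along the sign of the input gradient.
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    x_adv = x + eps * grad.sign()
    return x_adv.clamp(0, 1).detach()

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    # Eq (2): iterative FGSM with projection back into the eps-ball and [0, 1].
    x = x.clone().detach()
    x_adv = x.clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project to [x - eps, x + eps]
        x_adv = x_adv.clamp(0, 1)                              # keep valid pixel range
    return x_adv
```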
Croce et al. [20] introduced the AutoAttack (AA) method, a comprehensive adversarial robustness evaluation framework. AutoAttack combines multiple advanced attack methods, including Auto-PGD with the cross-entropy loss (APGD-CE) [20], Auto-PGD with the Difference of Logits Ratio loss (APGD-DLR) [20], the Fast Adaptive Boundary attack (FAB) [21], and Square Attack [22], with the goal of systematically evaluating the robustness of neural network models.
Adversarial defense methods
To counter the threat posed by adversarial examples, researchers have proposed various defense methods against adversarial attacks. Common approaches include adversarial training [10], input detection [23], preprocessing [24], defensive distillation [25], and gradient masking [26].
Among these methods, adversarial training (AT) has garnered the most attention. The fundamental idea of AT is to train the model using adversarial examples generated through adversarial attacks during the training process. This allows the model to gradually adapt to these complex and adversarial examples, thereby improving its robustness. Mathematically, this approach involves solving a min-max optimization problem, which can be expressed as follows:
$$\min_{\theta} \; \mathbb{E}_{(x,y) \sim D} \left[ \max_{\|\Delta x\|_{\infty} \le \varepsilon} L\left(f_\theta(x + \Delta x), y\right) \right] \qquad (3)$$
where D represents the training dataset, x is the input sample, y is the label, Δx is the adversarial perturbation, ε is the perturbation budget, L(⋅) is the loss function, and fθ represents the deep learning model with parameters θ. The ultimate goal is to optimize the model parameters θ to enhance the adversarial robustness of the DCNNs.
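As a rough sketch of how the min-max objective in Eq (3) is approximated in practice, the loop below alternates an approximate inner maximization with an outer SGD step. It reuses the `pgd_attack` sketch above and assumes a standard PyTorch `train_loader` and `optimizer`; this is an illustrative outline, not any specific paper's implementation.

```python
import torch
import torch.nn.functional as F

def adversarial_training_epoch(model, train_loader, optimizer, device, eps=8/255):
    # Outer minimization over theta; inner maximization approximated by a PGD attack (Eq (3)).
    model.train()
    for x, y in train_loader:
        x, y = x.to(device), y.to(device)
        x_adv = pgd_attack(model, x, y, eps=eps)   # approximate inner max
        loss = F.cross_entropy(model(x_adv), y)
        optimizer.zero_grad()
        loss.backward()                            # outer min step on theta
        optimizer.step()
```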
Madry et al. proposed the PGD-AT method [9], which is one of the most successful adversarial training methods. This approach uses the PGD method to iteratively generate adversarial examples for training, thereby improving the robustness of deep learning models. Building on this, Zhang et al. introduced the TRADES adversarial training method [10], which redesigned the loss function with a particular emphasis on enhancing the model’s adversarial robustness while maintaining classification accuracy.
Due to the high computational cost required by multi-step adversarial training methods, Shafahi et al. proposed the Free Adversarial Training (Free-AT) [27] method. This approach involves multiple iterations of training on the same mini-batch of samples, where the gradient information obtained from the previous iteration is reused to update both the model parameters and the adversarial examples. This eliminates the additional computational cost of generating adversarial examples. However, since Free-AT requires multiple iterations on each mini-batch, the training time remains long, and the resulting model’s robustness is still insufficient. Consequently, researchers began exploring the use of single-step adversarial attack methods in adversarial training to replace multi-step methods for generating adversarial examples, leading to the development of fast adversarial training methods [11].
However, simply using single-step adversarial attack methods for adversarial training can result in catastrophic overfitting, where the model loses robustness. To address this issue, Wong et al. proposed the FGSM-RS [11] method, which involves adding random noise to the input sample before generating adversarial perturbations, represented as:
$$x_{\mathrm{noise}} = x + \operatorname{Uniform}(-k\varepsilon, k\varepsilon), \qquad x' = \operatorname{clip}\!\left(x_{\mathrm{noise}} + \alpha \cdot \operatorname{sign}\left(\nabla_{x_{\mathrm{noise}}} L(f_\theta(x_{\mathrm{noise}}), y)\right)\right) \qquad (4)$$
where k is the noise coefficient (the random noise lies in the range [-kε, kε]), α is the step size, and xnoise is the sample after adding random noise.
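A minimal sketch of the FGSM-RS attack in Eq (4) is shown below. The clamping of the noisy input to [0, 1] and the default values k = 1 and α = 10/255 are assumptions made for illustration, not necessarily the exact settings of [11].

```python
import torch
import torch.nn.functional as F

def fgsm_rs_attack(model, x, y, eps=8/255, alpha=10/255, k=1.0):
    # Eq (4): random start in [-k*eps, k*eps], one FGSM step of size alpha,
    # then clip the total perturbation back into [-eps, eps].
    delta = torch.empty_like(x).uniform_(-k * eps, k * eps)
    x_noise = (x + delta).clamp(0, 1).detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_noise), y)
    grad = torch.autograd.grad(loss, x_noise)[0]
    x_adv = x_noise + alpha * grad.sign()
    x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # limit the total perturbation
    return x_adv.clamp(0, 1).detach()
```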
Similarly, Jia et al. proposed the Fast Gradient Sign Method with prior from the Momentum of all Previous Epochs (FGSM-MEP) [28], which introduces a momentum mechanism that accumulates the adversarial perturbations generated in previous iterations and uses them as the initial perturbation for the next iteration, with periodic resets after a certain number of iterations. Although these methods can partially mitigate catastrophic overfitting, they still lag behind multi-step adversarial training methods in terms of robustness and can still experience catastrophic overfitting when the perturbation budget is large. Andriushchenko et al. introduced a regularizer called GradAlign [29], which aligns the gradient directions of the model for clean samples and their corresponding adversarial examples during adversarial training. This alignment helps prevent the neural network from exhibiting locally highly nonlinear behavior, thereby avoiding catastrophic overfitting. However, since GradAlign requires multiple gradient computations, its training efficiency is lower than that of FGSM-RS. Huang et al. proposed a fast adversarial training method with adaptive step size adjustment, known as ATAS [14], whose core idea is to adjust the step size based on the l2 norm of the gradient of each training sample. De Jorge et al. proposed N-FGSM [15], which effectively avoids catastrophic overfitting and further enhances model robustness by adding stronger noise to the original clean samples and not restricting the total perturbation during training. However, the drawback of this approach is that the excessive perturbation leads to lower model accuracy.
Existing fast adversarial training methods generally suffer from issues such as weak robustness, low accuracy, and a tendency toward catastrophic overfitting. To address these problems, we improve upon existing methods and propose the ATSS method. We design a similarity evaluation approach and calculate the required perturbation step size based on the similarity between the random noise and the gradient direction. Experimental results show that this method can effectively prevent catastrophic overfitting and achieve high accuracy on both adversarial examples and clean samples.
Fast adversarial training with adaptive similarity step size
Currently, fast adversarial training methods that generate adversarial examples using the single-step adversarial method FGSM have garnered widespread attention. Compared to traditional multi-step adversarial training methods, this approach significantly reduces the required computational cost. However, it also has some drawbacks, with catastrophic overfitting being the most prominent issue, leading to a loss of model robustness. As shown in Fig 2, when using the traditional single-step attack method FGSM to perform adversarial training on a ResNet18 model, catastrophic overfitting occurred after 20 epochs of training, resulting in an almost complete loss of defense against multi-step attacks. Although several methods have been developed to avoid this phenomenon, they generally suffer from either insufficient robustness or inadequate classification accuracy on clean samples.
Fig 2. Adversarial training processes in traditional FGSM adversarial training.
To address these issues, this paper proposes a fast adversarial training method based on adaptive similarity step size, which aims to overcome the problem of catastrophic overfitting while simultaneously improving robust accuracy and clean accuracy.
Motivation
One of the main reasons for the occurrence of catastrophic overfitting is that the magnitude of the perturbations in the generated adversarial examples is overly uniform [30]. To enhance the diversity of the training data and avoid catastrophic overfitting, existing fast adversarial training methods generally use random noise initialization. This involves adding random noise perturbations to the input samples before generating adversarial perturbations. The samples with added random noise are then input into the model to compute gradients and obtain adversarial perturbations. The sum of the random perturbations and the adversarial perturbations constitutes the total perturbation.
Adding random noise for initialization changes the magnitude of the total perturbation, which helps avoid catastrophic overfitting. FGSM-RS then applies a clipping function to keep the total perturbation within the preset perturbation budget, preventing it from becoming excessively large.
We find that attack methods that use random noise initialization and limit the total perturbation exhibit a significant decrease in attack strength compared to the original FGSM attack. Table 1 compares the attack success rates of the attack method used by FGSM-RS and the original FGSM attack against a ResNet18 model trained with TRADES adversarial training, with the maximum perturbation limit set to 8/255. In FGSM-RS, the step size is set to 8/255 and 10/255, respectively. The table shows that the attack success rate of FGSM-RS is significantly lower than that of the original FGSM attack under both step-size settings. This also explains the reduced robustness of fast adversarial training methods such as FGSM-RS [11] and FGSM-MEP [28], which generate adversarial examples using attacks that employ random noise initialization and limit the total perturbation.
Table 1. Attack success rates of different single-step attack methods.
| Attack methods | Attack success rate (%) |
|---|---|
| FGSM | 48.48 |
| FGSM-RS(α = 8/255) | 42.36 |
| FGSM-RS(α = 10/255) | 45.10 |
The main reason for the reduced attack strength of FGSM after random initialization is that the direction of the random perturbation partially opposes the gradient direction, causing the random perturbation and the adversarial perturbation to cancel each other out. If the maximum perturbation is then clipped, the average total perturbation decreases. Table 2 compares the average total perturbation magnitudes of adversarial examples generated by the attack used in FGSM-RS and by the original FGSM attack, with the maximum perturbation limit set to 8/255. In FGSM-RS, the step size is set to 8/255 and 10/255, respectively. The table shows that, under both step-size settings, the average total perturbation of the adversarial examples generated by FGSM-RS is less than the perturbation limit, leading to insufficient strength of the generated adversarial examples; a small numerical illustration of this effect follows Table 2.
Table 2. Average total perturbation magnitudes generated by different single-step attack methods.
| Attack methods | Average perturbation |
|---|---|
| FGSM | 8.000/255 |
| FGSM-RS(α = 8/255) | 6.000/255 |
| FGSM-RS(α = 10/255) | 6.875/255 |
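The cancellation-and-clipping effect described above can be checked with a small numerical sketch. The snippet below models a single pixel dimension, assuming the gradient sign is +1, noise drawn uniformly from [-ε, ε], a step of α = 8/255, and clipping to [-ε, ε]; under these assumptions the average total perturbation comes out to roughly 6/255, consistent with the FGSM-RS(α = 8/255) row of Table 2.

```python
import torch

# Illustrative one-dimensional model of a single pixel under FGSM-RS-style clipping.
eps, alpha, n = 8/255, 8/255, 1_000_000
noise = torch.empty(n).uniform_(-eps, eps)       # random start in [-eps, eps]
total = (noise + alpha * 1.0).clamp(-eps, eps)   # one FGSM step (gradient sign +1), then clip
print(f"mean |total perturbation| = {total.abs().mean().item() * 255:.3f}/255")  # ~6/255 < eps
# Plain FGSM always reaches the full budget: |eps * sign| = 8/255.
```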
In N-FGSM [15], larger noise is used to initialize the samples when generating adversarial examples, and the maximum perturbation is not limited, ensuring the attack strength of the adversarial examples. However, the total perturbation magnitude of adversarial examples generated by N-FGSM is excessively large, which leads to lower classification accuracy of the model. Table 3 compares the maximum total perturbation magnitudes of the attack methods used in N-FGSM with those generated by FGSM-RS and FGSM, all under a perturbation budget of 8/255. From the table, it is evident that the maximum total perturbation magnitude of the adversarial examples generated by N-FGSM significantly exceeds the perturbation budget, resulting in insufficient clean accuracy for the model.
Table 3. Maximum total perturbation magnitudes generated by different single-step attack methods.
| Attack methods | Maximum perturbation |
|---|---|
| FGSM | 8.000/255 |
| FGSM-RS | 8.000/255 |
| N-FGSM | 24.000/255 |
Based on the above analysis, we propose a new fast adversarial training method called ATSS. Unlike traditional methods that use simple clipping to limit the perturbation magnitude, ATSS adaptively adjusts the perturbation step size based on the similarity between the random noise and the gradient direction, so that the step size decreases as the similarity increases. When the similarity is low, the perturbation step size is increased to ensure the attack strength of the adversarial examples; when the similarity is high, the step size is decreased to prevent the total perturbation from becoming too large. A comparison of adversarial examples generated by different methods is shown in Fig 3.
Fig 3. Comparison of adversarial samples generated by different single-step attack methods.
ATSS
To ensure that the adversarial examples generated during fast adversarial training maintain both attack strength and diversity while limiting excessive perturbation, this paper proposes a fast adversarial training method based on adaptive similarity step size. The flowchart of this method is illustrated in Fig 4.
Fig 4. Flowchart of fast adversarial training method based on adaptive similarity step size.
After extracting an initial batch of samples x from the dataset, a random noise η with the same shape as x is generated, where the values of η are within the range [-1,1]. This random noise is then scaled and added to the initial samples. The augmented samples are subsequently input into the model to compute the gradient direction v, represented as:
$$\eta \sim \operatorname{Uniform}(-1, 1) \qquad (5)$$

$$x_{\mathrm{noise}} = x + \varepsilon \cdot \eta \qquad (6)$$

$$v = \operatorname{sign}\left(\nabla_{x_{\mathrm{noise}}} L(f(x_{\mathrm{noise}}), y)\right) \qquad (7)$$
where ε is the perturbation budget and f is the target model.
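A minimal PyTorch sketch of Eqs (5)-(7) is given below. Taking the gradient direction v as the sign of the input gradient and using cross-entropy loss are assumptions consistent with the FGSM-style formulation above; the function and variable names are illustrative.

```python
import torch
import torch.nn.functional as F

def noise_and_gradient_direction(model, x, y, eps=8/255):
    # Eq (5): eta ~ Uniform(-1, 1), same shape as x.
    eta = torch.empty_like(x).uniform_(-1, 1)
    # Eq (6): scale the noise by the budget and add it to the clean samples.
    x_noise = (x + eps * eta).detach().requires_grad_(True)
    # Eq (7): gradient direction of the loss w.r.t. the noisy input (sign assumed).
    loss = F.cross_entropy(model(x_noise), y)
    v = torch.autograd.grad(loss, x_noise)[0].sign()
    return eta, x_noise.detach(), v
```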
Next, the similarity between the added noise η and the gradient direction v is calculated. To evaluate the similarity between two vectors, both their magnitude and their direction should be considered. As shown in Fig 5(a), using only Euclidean distance similarity neglects differences in direction: the Euclidean distance between v and η1 is the same as that between v and η2. Similarly, in Fig 5(b), using only cosine similarity neglects differences in magnitude: the cosine similarity between v and η1 is the same as that between v and η2. To better measure the similarity between the two vectors while taking both magnitude and direction into account, this paper combines Euclidean distance similarity and cosine similarity to evaluate the overall similarity.
Fig 5. Illustrative examples for Euclidean distance similarity and cosine similarity.
First, the Euclidean distance D and the cosine value C between η and v are calculated, represented as:
$$D = \sqrt{\sum_{i=1}^{n} (\eta_i - v_i)^2} \qquad (8)$$

$$C = \frac{\sum_{i=1}^{n} \eta_i v_i}{\sqrt{\sum_{i=1}^{n} \eta_i^2}\,\sqrt{\sum_{i=1}^{n} v_i^2}} \qquad (9)$$
where n is the dimension of the vectors, and ηi and vi are the i-th elements of η and v, respectively.
Then, normalize them to obtain the Euclidean distance similarity Sed and cosine similarity Scos, represented as:
$$S_{ed} = \frac{\operatorname{mean}(D) - D}{\operatorname{std}(D)} \qquad (10)$$

$$S_{cos} = \frac{C - \operatorname{mean}(C)}{\operatorname{std}(C)} \qquad (11)$$
where mean(D) is the mean of the Euclidean distances, std(D) is the standard deviation of the Euclidean distances, mean(C) is the mean of the cosine values, and std(C) is the standard deviation of the cosine values.
The similarity s is obtained by adding the Euclidean distance similarity and the cosine similarity. The perturbation step size α, which decreases as s increases, is then generated, represented as:
$$s = S_{ed} + S_{cos} \qquad (12)$$

$$\alpha = (1 - \beta \cdot s) \cdot \alpha_0 \qquad (13)$$
where β is the influence coefficient and α0 is the standard step size.
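The similarity and step-size computation of Eqs (8)-(13) can be sketched as follows. The batch-wise standardization and the sign convention that larger Euclidean distances yield lower similarity reflect our reading of the normalization step and are assumptions, as are the tensor shapes (a batch of images).

```python
import torch
import torch.nn.functional as F

def similarity_step_size(eta, v, beta=0.04, alpha0=10/255):
    # Flatten each sample to a vector: shape (batch, n).
    e = eta.flatten(1)
    g = v.flatten(1)
    # Eq (8): Euclidean distance between noise and gradient direction.
    d = (e - g).norm(p=2, dim=1)
    # Eq (9): cosine value between noise and gradient direction.
    c = F.cosine_similarity(e, g, dim=1)
    # Eqs (10)-(11): standardize over the batch; the distance term is negated so that
    # a larger distance gives a lower similarity (assumed sign convention).
    s_ed = (d.mean() - d) / (d.std() + 1e-12)
    s_cos = (c - c.mean()) / (c.std() + 1e-12)
    # Eq (12): overall similarity.
    s = s_ed + s_cos
    # Eq (13): per-sample step size, decreasing in the similarity.
    alpha = (1 - beta * s) * alpha0
    return alpha.view(-1, 1, 1, 1)   # broadcast over the image dimensions
```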
The adversarial perturbations are generated based on the step size α, and they are added to xnoise to obtain the adversarial examples. The calculation formula for the adversarial example is as follows:
$$x' = x_{\mathrm{noise}} + \alpha \cdot v \qquad (14)$$
Finally, the generated adversarial example x′ is fed into the model to update the model parameters. The pseudocode for the ATSS algorithm is presented in Algorithm 1.
Algorithm 1 ATSS
Input: Number of training epochs N, perturbation budget ε, dataset D, samples x, labels y, size of each training batch M, influence coefficient β, standard step size α0, loss function L, target model f and its parameters θ, learning rate γ
Output: Optimized model fθ
1: for t = 1 to N do
2: for i = 1 to M do
3: ηi ← Uniform(−1, 1)
4: xnoise,i ← xi + ε ⋅ ηi;
5: vi ← sign(∇xnoise,i L(fθ(xnoise,i), yi))
6: s ← Sed(ηi, vi) + Scos(ηi, vi)
7: α ← (1 − β ⋅ s) ⋅ α0
8: x′i ← xnoise,i + α ⋅ vi
9: θ ← θ − γ ⋅ ∇θ L(fθ(x′i), yi)
10: end for
11: end for
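Putting the pieces together, a sketch of one ATSS training epoch (Algorithm 1) might look as follows. It reuses the `noise_and_gradient_direction` and `similarity_step_size` helpers sketched earlier, and clamping the adversarial example to [0, 1] in step 8 is an assumption.

```python
import torch
import torch.nn.functional as F

def atss_epoch(model, loader, optimizer, device, eps=8/255, beta=0.04, alpha0=10/255):
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        # Steps 3-5: random noise, noisy sample, gradient direction (Eqs (5)-(7)).
        eta, x_noise, v = noise_and_gradient_direction(model, x, y, eps=eps)
        # Steps 6-7: similarity-based per-sample step size (Eqs (8)-(13)).
        alpha = similarity_step_size(eta, v, beta=beta, alpha0=alpha0)
        # Step 8: adversarial example (Eq (14)); clamping to [0, 1] is assumed.
        x_adv = (x_noise + alpha * v).clamp(0, 1)
        # Step 9: update the model parameters on the adversarial examples.
        loss = F.cross_entropy(model(x_adv), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```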
Experiments
To validate the practicality of the proposed method, we conduct experiments to assess whether ATSS can achieve a model with both strong robustness and high classification accuracy while maintaining a relatively low computational cost. We compare ATSS with several adversarial training methods. All experiments are performed on a platform equipped with a 3.0 GHz i9-13900K CPU, 128 GB of RAM, and an RTX 4080 GPU, using the PyTorch framework.
Datasets
In the experimental section, we use three publicly available benchmark datasets: CIFAR-10 [31], CIFAR-100 [31], and Tiny ImageNet [32]. CIFAR-10 and CIFAR-100 were released by the Canadian Institute for Advanced Research (CIFAR). The CIFAR-10 dataset contains a total of 60,000 images across 10 categories, such as “airplane” and “car”, with 50,000 images used for training and 10,000 for testing. The CIFAR-100 dataset is similar to CIFAR-10 but divides its 60,000 images into 100 categories, with the same split of 50,000 training and 10,000 test images. Both datasets consist of RGB images of size 32×32×3. The Tiny ImageNet dataset is a smaller version of the ImageNet dataset [33], divided into 200 categories and containing 100,000 training images and 10,000 test images. It consists of RGB images of size 64×64×3.
Model parameters
For the experimental setup, we select ResNet18 [34] and VGG19 [35] as the target models, as these models are widely used in image recognition tasks. The models do not use any pre-trained weights. The optimizer is SGD with a momentum of 0.9 and a weight decay of 5e-4. The initial learning rate is set to 0.01, and the training batch size is set to 128. The influence coefficient β is set to 0.04, and the standard step size α0 is set to 10/255 (see the Hyperparameter experiments section).
Attack methods
To verify the generalizability of the proposed method, we employ several commonly used adversarial attack methods to attack the resulting models, including FGSM [12], PGD [9], C&W [19], and AA [20]. In the experiments, the perturbation budget ε for all adversarial attacks is set to 8/255. The step size α for PGD attacks is set to 2/255, with the number following PGD indicating the number of iterations (e.g., PGD-10 denotes 10 PGD steps). The C&W attack is implemented as a PGD attack that optimizes the C&W margin loss, run for 20 iterations.
Baselines
To verify the effectiveness of the proposed method, we compare it with other fast adversarial training methods: FGSM-RS [11], N-FGSM [15], FGSM-MEP [28], ATAS [14], and FGSM-GA [29]. The experimental parameters for the other fast adversarial training methods are set according to the configurations provided in their respective papers and source codes.
Metrics
The metric we use in the experiments is classification accuracy, calculated as follows:
$$r = \frac{p}{t} \times 100\% \qquad (15)$$
where r represents the classification accuracy, t represents the total number of test examples, and p represents the total number of examples correctly predicted by the model.
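For completeness, a small helper implementing Eq (15) over a test loader is sketched below; the names and the loader interface are illustrative assumptions.

```python
import torch

@torch.no_grad()
def classification_accuracy(model, loader, device):
    # Eq (15): r = p / t, reported as a percentage.
    model.eval()
    correct, total = 0, 0
    for x, y in loader:
        pred = model(x.to(device)).argmax(dim=1)
        correct += (pred == y.to(device)).sum().item()
        total += y.size(0)
    return 100.0 * correct / total
```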
Experiments on preventing catastrophic overfitting
To verify that the proposed method can effectively prevent catastrophic overfitting, we conduct long-term adversarial training over 100 epochs on the CIFAR-10 dataset using ResNet18 and VGG19 as the target models. After each training epoch, we test the models’ classification accuracy on clean samples, under FGSM attacks, and under PGD-10 attacks. The experimental results are shown in Figs 6 and 7.
Fig 6. Adversarial training process of ATSS on the ResNet18 model.
Fig 7. Adversarial training process of ATSS on the VGG19 model.
The results indicate that the proposed fast adversarial training method based on adaptive similarity step size effectively prevents catastrophic overfitting. Throughout the 100 epochs of adversarial training, both the ResNet18 and VGG19 models maintain good classification accuracy when facing the single-step attack method FGSM and the multi-step attack method PGD.
We also perform the same tests on the CIFAR-100 and Tiny ImageNet datasets, using ResNet18 as the model. On both datasets, ATSS effectively avoids catastrophic overfitting, as shown in Figs 8 and 9.
Fig 8. Adversarial training process of ATSS on the CIFAR-100 dataset.
Fig 9. Adversarial training process of ATSS on the Tiny ImageNet dataset.
Additionally, we compare the clean accuracy and robust accuracy during training for ATSS, FGSM-RS [11], and N-FGSM [15] on the ResNet18 model with the CIFAR-10 dataset. As shown in Fig 10, compared to the other two fast adversarial training methods, ATSS performs better throughout the training process.
Fig 10. Performance comparison during training among ATSS, FGSM-RS, and N-FGSM.
Experiments on adversarial robustness and training cost
We evaluate the adversarial robustness and the training time required for the models trained using the proposed method. To validate the effectiveness of our method, we compare it with five other fast adversarial training methods: FGSM-RS [11], N-FGSM [15], FGSM-MEP [28], ATAS [14], and FGSM-GA [29]. For reference, we also test two traditional multi-step adversarial training methods, PGD-10-AT [9] and TRADES [10]. The experimental parameters for the other adversarial training methods are set according to the configurations provided in their respective papers and source codes. In FGSM-RS, the step size is set to 10/255.
All methods are trained for 60 epochs, and the total training time is recorded. For robustness testing, we use five attack methods, i.e., FGSM, PGD-10, PGD-50, C&W, and AA, to attack the trained models and record the classification accuracy under these attacks. The detailed experimental results are shown in Table 4.
Table 4. Classification accuracy and training time under adversarial attacks for different adversarial training methods.
| Models | Method type | Training method | Clean | FGSM | PGD-10 | PGD-50 | C&W | AA | Training time (min) |
|---|---|---|---|---|---|---|---|---|---|
| ResNet18 | Multi-step | PGD-10-AT [9] | 80.22 | 52.23 | 49.12 | 48.55 | 47.81 | 45.61 | 103 |
| ResNet18 | Multi-step | TRADES [10] | 80.71 | 52.44 | 49.67 | 48.98 | 48.13 | 45.94 | 111 |
| ResNet18 | Single-step | FGSM-RS [11] | 82.33 | 51.52 | 45.96 | 45.22 | 44.89 | 42.60 | 21 |
| ResNet18 | Single-step | N-FGSM [15] | 80.27 | 51.31 | 47.13 | 46.50 | 45.83 | 43.33 | 21 |
| ResNet18 | Single-step | FGSM-MEP [28] | 80.67 | 50.99 | 46.61 | 46.11 | 45.45 | 42.95 | 22 |
| ResNet18 | Single-step | ATAS [14] | 83.81 | 49.79 | 43.85 | 43.62 | 43.29 | 40.98 | 23 |
| ResNet18 | Single-step | FGSM-GA [29] | 81.46 | 50.70 | 46.67 | 46.31 | 45.71 | 43.07 | 51 |
| ResNet18 | Single-step | ATSS (ours) | 81.78 | 51.65 | 48.53 | 47.91 | 47.09 | 44.65 | 22 |
| VGG19 | Multi-step | PGD-10-AT [9] | 77.17 | 50.57 | 47.21 | 46.63 | 45.87 | 42.41 | 137 |
| VGG19 | Multi-step | TRADES [10] | 77.68 | 50.81 | 47.80 | 46.95 | 46.17 | 42.67 | 149 |
| VGG19 | Single-step | FGSM-RS [11] | 79.27 | 50.03 | 44.11 | 43.21 | 42.94 | 39.23 | 31 |
| VGG19 | Single-step | N-FGSM [15] | 77.21 | 49.72 | 46.21 | 45.46 | 44.58 | 40.67 | 31 |
| VGG19 | Single-step | FGSM-MEP [28] | 77.61 | 49.30 | 46.07 | 45.20 | 44.22 | 40.30 | 32 |
| VGG19 | Single-step | ATAS [14] | 80.75 | 48.20 | 42.44 | 41.72 | 41.24 | 37.52 | 34 |
| VGG19 | Single-step | FGSM-GA [29] | 78.48 | 49.12 | 45.42 | 44.41 | 43.80 | 39.77 | 76 |
| VGG19 | Single-step | ATSS (ours) | 78.92 | 49.96 | 46.60 | 45.87 | 45.02 | 41.19 | 32 |

Columns Clean through AA report classification accuracy (%) on clean samples and under the corresponding attacks.
As shown in Table 4, compared to other fast adversarial training methods, our method consistently achieves higher classification accuracy under all multi-step attack methods, while the classification accuracy on clean samples remains at a high level. In terms of training time, since our method does not require additional backpropagation passes compared to methods such as FGSM-RS, its training time is similar to that of the other fast adversarial training methods.
When comparing ATSS with the traditional multi-step adversarial training method TRADES, the proposed method achieves comparable robust accuracy with a much faster training process, saving more than 80% of the training time, as shown in Fig 11.
Fig 11. Comparison of ATSS and multi-step adversarial training method in robust accuracy and training time.
To further validate the effectiveness of our method across different datasets, we conduct additional tests on the CIFAR-100 and Tiny ImageNet datasets, using the PGD-10 attack to evaluate classification accuracy. The target models are compared against the five fast adversarial training methods. The experimental results are shown in Table 5. As can be seen from Table 5, our method also maintains better robustness than the other fast adversarial training methods on both the CIFAR-100 and Tiny ImageNet datasets.
Table 5. Classification accuracy against PGD-10 attacks on different datasets.
Ablation study
To better assess the similarity between two vectors while considering both magnitude and direction information, we combine Euclidean distance similarity and cosine similarity to evaluate the overall similarity and generate the perturbation step size. To demonstrate the effectiveness of this design, we compare it with strategies that use only Euclidean distance similarity or only cosine similarity for generating the step size, conducting an ablation study on CIFAR-10 with ResNet18. The experimental results, shown in Table 6, indicate that combining Euclidean distance similarity and cosine similarity for generating the perturbation step size leads to better robustness in adversarial training.
Table 6. Test results under different similarity calculation strategies.
| Clean acc.(%) | PGD-10 acc.(%) | |
|---|---|---|
| Only Scos | 81.70 | 47.39 |
| Only Sed | 81.57 | 47.51 |
| Sed + Scos | 81.78 | 48.53 |
Hyperparameter experiments
To illustrate the impact of the various hyperparameters in the proposed method and to identify the most suitable parameter values, we conduct the experiments on CIFAR-10, using ResNet18.
Influence coefficient
As shown in Eq (13), the influence coefficient β controls the extent to which the similarity affects the perturbation step size. If β is set to 0, the generated perturbation step size equals the standard step size. For example, with β = 0.04 and a similarity of s = 2, the step size becomes (1 − 0.08)·α0 = 0.92α0.
Here, we test the classification accuracy of models trained with different influence coefficients under adversarial attacks. The experimental results are shown in Table 7.
Table 7. Test results under different influence coefficient.
| Influence coefficient | Clean acc.(%) | PGD-10 acc.(%) |
|---|---|---|
| β = 0.00 | 81.54 | 46.56 |
| β = 0.02 | 81.65 | 47.77 |
| β = 0.04 | 81.78 | 48.53 |
| β = 0.06 | 81.81 | 47.58 |
| β = 0.20 | 82.61 | 0.70 |
The results indicate that setting the influence coefficient too high or too low leads to decreased robustness. Therefore, in our experiments, the influence coefficient β is set to 0.04.
Standard step size
As shown in Eq (13), the standard step size α0 controls the magnitude of the average generated adversarial perturbation. Here, we test the classification accuracy of models trained with different standard step sizes under adversarial attacks. The experimental results are shown in Table 8.
Table 8. Test results under different standard step size.
| Standard step size | Clean acc.(%) | PGD-10 acc.(%) |
|---|---|---|
| α0 = 7/255 | 82.17 | 45.90 |
| α0 = 8/255 | 82.12 | 46.54 |
| α0 = 9/255 | 81.94 | 47.13 |
| α0 = 10/255 | 81.78 | 48.53 |
| α0 = 12/255 | 82.35 | 0.01 |
The results indicate that, as the standard step size increases, the robustness of the model gradually improves. However, an excessively large standard step size leads to instability during training, as seen at α0 = 12/255. Therefore, in our experiments, the standard step size α0 is set to 10/255.
Discussions
The proposed ATSS fast adversarial training method effectively enhances the ability of deep convolutional neural networks to resist adversarial image examples. It is noteworthy that this technique can be readily extended to adversarial defense for speech [6] and video [36] data. In speech and video classification, recurrent neural networks are a commonly used architecture [37]. As with convolutional neural networks, adversarial training can be applied to recurrent neural networks to enhance their robustness against adversarial examples [38]. ATSS, as a technique for accelerating adversarial training and avoiding catastrophic overfitting, can be seamlessly applied to recurrent neural networks to enhance their robustness to adversarial speech and video examples.
Conclusion
To improve the efficiency of adversarial training and enhance the robustness of models while avoiding catastrophic overfitting during training, we propose a fast adversarial training method with adaptive similarity step size (ATSS for short). This method involves first initializing the samples with random noise and then feeding them into the model to obtain gradient information. The perturbation step size for each sample is calculated based on the similarity between the added random noise and the gradient direction. This approach ensures the diversity and attack strength of the adversarial examples while preventing excessive perturbations, thereby generating adversarial examples that are more conducive to effective model training.
We conduct extensive comparative experiments on various datasets, including CIFAR-10, CIFAR-100, and Tiny ImageNet, using attack methods such as FGSM, PGD, C&W, and AA to compare the proposed method with other adversarial training methods. The experimental results demonstrate that ATSS successfully addresses the catastrophic overfitting issue in fast adversarial training without significantly increasing training costs. Compared to other fast adversarial training methods, ATSS achieves better adversarial robustness while maintaining high accuracy on clean samples.
The robustness of the models trained by the proposed method still lags behind that of the current state-of-the-art multi-step adversarial training methods. In future works, more complex and sophisticated adaptive step size strategies could be explored to further enhance the model’s robustness.
Acknowledgments
The authors want to thank Dr. Zhen Xu, Dr. Yong-Bo Wu, and Dr. Huo-Ping Yi for their insights on adversarial training.
Data Availability
All relevant data are available at https://doi.org/10.6084/m9.figshare.27991265.v1.
Funding Statement
This work was supported by the National Natural Science Foundation of China (No. 52293424) and the Open Foundation of the State Key Laboratory of Fluid Power and Mechatronic Systems (GZKF-202329). The funders played no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
- 1. Liu Y, Diao S. An automatic driving trajectory planning approach in complex traffic scenarios based on integrated driver style inference and deep reinforcement learning. PLoS One. 2024;19(1):e0297192. doi: 10.1371/journal.pone.0297192
- 2. Abid MM, Mahmood T, Ashraf R, Faisal CMN, Ahmad H, Niaz AA. Computationally intelligent real-time security surveillance system in the education sector using deep learning. PLoS One. 2024;19(7):e0301908. doi: 10.1371/journal.pone.0301908
- 3. Singh R, Kalra MK, Nitiwarangkul C, Patti JA, Homayounieh F, Padole A, et al. Deep learning in chest radiography: detection of findings and presence of change. PLoS One. 2018;13(10):e0204155. doi: 10.1371/journal.pone.0204155
- 4. Szegedy C, Zaremba W, Sutskever I, Bruna J, Erhan D, Goodfellow I, et al. Intriguing properties of neural networks. ArXiv. 2013; abs/1312.6199.
- 5. Kurakin A, Goodfellow IJ, Bengio S. Adversarial examples in the physical world. In: Artificial Intelligence Safety and Security. Chapman and Hall/CRC; 2018; 99-112.
- 6. Carlini N, Wagner D. Audio adversarial examples: targeted attacks on speech-to-text. 2018 IEEE Security and Privacy Workshops (SPW). 2018; 1-7. doi: 10.1109/SPW.2018.00009
- 7. Zhang WE, Sheng QZ, Alhazmi A, Li C. Adversarial attacks on deep-learning models in natural language processing: a survey. ACM Trans Intell Syst Technol (TIST). 2020;11(3):1-41. doi: 10.1145/3377553
- 8. Bai T, Luo J, Zhao J, Wen B, Wang Q. Recent advances in adversarial training for adversarial robustness. ArXiv. 2021; abs/2102.01356.
- 9. Madry A, Makelov A, Schmidt L, Tsipras D, Vladu A. Towards deep learning models resistant to adversarial attacks. ArXiv. 2017; abs/1706.06083.
- 10. Zhang H, Yu Y, Jiao J, Xing E, El Ghaoui L, Jordan M. Theoretically principled trade-off between robustness and accuracy. International Conference on Machine Learning. 2019; 7472-7482.
- 11. Wong E, Rice L, Kolter JZ. Fast is better than free: revisiting adversarial training. ArXiv. 2020; abs/2001.03994.
- 12. Goodfellow IJ, Shlens J, Szegedy C. Explaining and harnessing adversarial examples. ArXiv. 2014; abs/1412.6572.
- 13. Zhao M, Zhang L, Kong Y, Yin B. Fast adversarial training with smooth convergence. Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023; 4720-4729.
- 14. Huang Z, Fan Y, Liu C, Zhang W, Zhang Y, Salzmann M, et al. Fast adversarial training with adaptive step size. IEEE Trans Image Process. 2023;32:6102-6114. doi: 10.1109/TIP.2023.3326398
- 15. De Jorge Aranda P, Bibi A, Volpi R, Sanyal A, Torr P, Rogez G, et al. Make some noise: reliable and efficient single-step adversarial training. Proceedings of the 36th Conference on Neural Information Processing Systems. 2022; 12881-12893.
- 16. Moosavi-Dezfooli SM, Fawzi A, Frossard P. DeepFool: a simple and accurate method to fool deep neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016; 2574-2582.
- 17. Dong Y, Liao F, Pang T, Su H, Zhu J, Hu X, et al. Boosting adversarial attacks with momentum. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018; 9185-9193.
- 18. Yao Z, Gholami A, Xu P, Keutzer K, Mahoney MW. Trust region based adversarial attack on neural networks. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019; 11350-11359.
- 19. Carlini N, Wagner D. Towards evaluating the robustness of neural networks. 2017 IEEE Symposium on Security and Privacy (S&P). 2017; 39-57.
- 20. Croce F, Hein M. Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. International Conference on Machine Learning. 2020; 2206-2216.
- 21. Croce F, Hein M. Minimally distorted adversarial examples with a fast adaptive boundary attack. International Conference on Machine Learning. 2020; 2196-2205.
- 22. Andriushchenko M, Croce F, Flammarion N, Hein M. Square attack: a query-efficient black-box adversarial attack via random search. European Conference on Computer Vision. 2020; 484-501.
- 23. Xu W, Evans D, Qi Y. Feature squeezing: detecting adversarial examples in deep neural networks. ArXiv. 2017; abs/1704.01155.
- 24. Xie C, Wu Y, van der Maaten L, Yuille AL, He K. Feature denoising for improving adversarial robustness. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019; 501-509.
- 25. Papernot N, McDaniel P. On the effectiveness of defensive distillation. ArXiv. 2016; abs/1607.05113.
- 26. Athalye A, Carlini N, Wagner D. Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples. International Conference on Machine Learning. 2018; 274-283.
- 27. Shafahi A, Najibi M, Ghiasi MA, Xu Z, Dickerson J, Studer C, et al. Adversarial training for free! Proceedings of the 33rd Conference on Neural Information Processing Systems. 2019; 3358-3369.
- 28. Jia X, Zhang Y, Wei X, Wu B, Ma K, Wang J, et al. Prior-guided adversarial initialization for fast adversarial training. European Conference on Computer Vision. 2022; 567-584.
- 29. Andriushchenko M, Flammarion N. Understanding and improving fast adversarial training. Proceedings of the 34th Conference on Neural Information Processing Systems. 2020; 16048-16059.
- 30. Kim H, Lee W, Lee J. Understanding catastrophic overfitting in single-step adversarial training. Proceedings of the AAAI Conference on Artificial Intelligence. 2021;35(9):8119-8127.
- 31. Krizhevsky A, Hinton G. Learning multiple layers of features from tiny images. Technical Report, University of Toronto. 2009.
- 32. Le Y, Yang X. Tiny ImageNet visual recognition challenge. CS 231N. 2015;7(7):3.
- 33. Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L. ImageNet: a large-scale hierarchical image database. 2009 IEEE Conference on Computer Vision and Pattern Recognition. 2009; 248-255.
- 34. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016; 770-778.
- 35. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. ArXiv. 2014; abs/1409.1556.
- 36. Wei X, Zhu J, Yuan S, Su H. Sparse adversarial perturbations for videos. Proceedings of the AAAI Conference on Artificial Intelligence. 2019;33(01):8973-8980.
- 37. Graves A, Mohamed A, Hinton G. Speech recognition with deep recurrent neural networks. 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. 2013; 6645-6649.
- 38. Papernot N, McDaniel P, Swami A, Harang R. Crafting adversarial input sequences for recurrent neural networks. MILCOM 2016-2016 IEEE Military Communications Conference. 2016; 49-54.