Improving Adversarial Robustness of Deep Neural Networks via Adaptive Margin Evolution

Linhai Ma; Liang Liang

doi:10.1016/j.neucom.2023.126524

. Author manuscript; available in PMC: 2024 Sep 28.

Published in final edited form as: Neurocomputing (Amst). 2023 Jul 7;551:126524. doi: 10.1016/j.neucom.2023.126524

Improving Adversarial Robustness of Deep Neural Networks via Adaptive Margin Evolution

Linhai Ma ^a, Liang Liang ^a,^*

PMCID: PMC10426748 NIHMSID: NIHMS1918603 PMID: 37587916

Abstract

Adversarial training is the most popular and general strategy to improve Deep Neural Network (DNN) robustness against adversarial noises. Many adversarial training methods have been proposed in the past few years. However, most of these methods are highly susceptible to hyperparameters, especially the training noise upper bound. Tuning these hyperparameters is expensive and difficult for people not in the adversarial robustness research domain, which prevents adversarial training techniques from being used in many application fields. In this study, we propose a new adversarial training method, named Adaptive Margin Evolution (AME). Besides being hyperparameter-free for the user, our AME method places adversarial training samples into the optimal locations in the input space by gradually expanding the exploration range with self-adaptive and gradient-aware step sizes. We evaluate AME and the other seven well-known adversarial training methods on three common benchmark datasets (CIFAR10, SVHN, and Tiny ImageNet) under the most challenging adversarial attack: AutoAttack. The results show that: (1) On the three datasets, AME has the best overall performance; (2) On the Tiny ImageNet dataset, which is much more challenging, AME has the best performance at every noise level. Our work may pave the way for adopting adversarial training techniques in application domains where hyperparameter-free methods are preferred.

Keywords: Deep neural networks, adversarial robustness, adversarial training, optimal adversarial training sample, hyperparameter-free

1. Introduction

Deep neural networks (DNNs), especially convolutional neural networks (CNNs), have achieved remarkable state-of-the-art performances in various applications. However, DNNs have a vital weakness: a small amount of specially designed perturbations may fool a DNN model, and these perturbations are often invisible or inconspicuous [1, 2, 3]. These perturbations, which are called adversarial noises, were first studied in [4, 5]. Adversarial noises can affect a broad range of DNN-based applications, such as visual tracking [6], EEG signal analysis [7], ECG signal analysis [8], infrared object detection [9], road sign classification [10], speech recognition [11], text understanding [12], and malicious software detection [13].

Adversarial noises can be generated by three kinds of adversarial attacks: white-box attack, black-box attack, and gray-box attack [1]. To perform a white-box attack, the attacker knows everything about the target model, including its architecture, parameters, gradient information, etc. For a black-box attack, the attacker knows nothing about the inner structure of the target model, and therefore the attacker has to query the target model as a black box: obtain an output for an input. For a gray-box attack, the attacker only knows partial information about the target model, e.g., knowing the architecture but having no access to the target model’s parameters. Because white-box attackers know the most about the target model, white-box attacks are often the most powerful among the three classes of attacks. Thus, white-box attacks (e.g., PGD attack [14]) are the most popular for evaluating a model’s adversarial robustness. Sometimes, white-box attacks may be blocked by gradient obfuscation [15], which gives a false sense of security. Therefore, it is still necessary to use black-box attacks for assessing a model’s adversarial robustness. Currently, AutoAttack [16] is considered a comprehensive approach for adversarial robustness evaluation, which consists of three white-box attacks and one black-box attack.

Many researchers have studied the cause of adversarial vulnerability from different perspectives. It is shown that the adversarial vulnerability of DNN models stems from two factors [17]: the first factor is data sparsity which means in the high dimensional input data space, there are large regions outside the support of the data distribution; and the second factor is the existence of many redundant parameters in DNN models. It is also shown that a theoretical trade-off could exist between adversarial robustness and accuracy on clean data [18]. However, empirical evidence shows that the conflict between robustness and accuracy may not exist in some scenarios [8].

Efforts have been made to improve the adversarial robustness of DNNs [1, 2, 3, 14, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37], and among those, adversarial training is the most general strategy. By generating adversarial samples to train a DNN model, the model may gain robustness against adversarial noises. The vanilla adversarial training (AT) [14, 23] generates adversarial training samples with a fixed adversarial noise upper bound. Many advanced adversarial training methods have been proposed to improve AT by generating “higher quality” adversarial samples. The most well-known method could be TRADES [18] that uses loss regularization terms to make a trade-off between adversarial robustness and accuracy on clean data. TE [35] improves AT by mitigating a memorization issue. IAAT [28], FAT [29], and MMA [30] adapt the noise upper bound for adversarial training samples in the training process. According to the vulnerability of each training sample, GAIRAT [34] adds sample-wise weights to the loss to improve adversarial training. LAS [38] uses a learnable attack strategy to generate adversarial training samples. Recently, an edge enhancement-based method [36] is designed to preprocess the input image to assist adversarial training. By constraining adversarial training in a carefully extracted subspace, Sub-AT [39] controls the growth of the gradient to resolve the robust overfitting problem. Besides adversarial training, there are other methods for improving adversarial robustness. NADAR [40] improves the adversarial robustness of neural networks by dilating the neural architecture. RNA [41] replaces the batch normalization layers in the neural networks with different selected types of normalization to reduce the adversarial transferability, which improves the adversarial robustness of DNNs. For a complete review of adversarial attack and defense methods, we refer the reader to the recent excellent surveys [1, 2, 3].

Most adversarial training methods have hyperparameters, among which, the training noise upper bound is a very common hyperparameter. Often, a very large training noise upper bound leads to a substantial accuracy degradation on clean data, while a very small one may not be enough for improving adversarial robustness. Tuning these hyperparameters incurs high computation costs and is time-consuming (e.g., weeks to months if a fine grid search is used). Tuning these hyperparameters may also require an understanding of algorithm details, and it will cause an extra burden for application developers who are not in the adversarial robustness research domain. This issue has prevented adversarial training techniques from being adopted in many application domains.

We present a new adversarial training method, named Adaptive Margin Evolution (AME). Our contributions are as follows.

(I) We prove the existence of an optimal state for adversarial training under certain conditions.

(II) Based on the theorem of the optimal state, we develop AME, a new adversarial training method that is hyperparameter-free for the user. By gradually expanding the exploration range with self-adaptive and gradient-aware step sizes, adversarial training samples can be placed into the optimal locations in the input data space. We evaluate AME and the other seven well-known adversarial training methods on three commonly used benchmark datasets (CIFAR10, SVHN, and Tiny ImageNet) under the most challenging adversarial attack: AutoAttack [16], and AME has shown the following advantages: (1) AME has the best overall performance; (2) On the Tiny ImageNet dataset that is much more challenging, AME has the best performance at every noise level; and (3) unlike the other methods that need to tune hyperparameters (e.g., the training noise upper bound), our method AME has no hyperparameters to tune, and therefore it may pave the way for adopting adversarial training techniques in application domains where hyperparameter-free methods are preferred.

The source code is available at https://github.com/SarielMa/Adaptive-Margin-Evolution.

2. Related Work

2.1. Glossary

The glossary is shown in Table 1.

Table 1:

Glossary

Terms	Explanations
Noise level	The upper bound $ϵ$ of the adversarial noises for training or testing.
Clean accuracy	The model’s accuracy on clean data, which is also called standard accuracy in some literature.
Noisy accuracy	The model’s accuracy on noisy (adversarial) data, and it measures the model’s adversarial robustness, which is also called robust accuracy or adversarial accuracy in some literature.
Margin	The shortest distance between a sample and the decision boundary of a classifier in the input data space.
Clean sample	A sample without adversarial noises.
Noisy sample	A noisy (adversarial) sample.
Clean model	A naturally trained model, without using any adversarial training.
Robust model	An adversarially-trained model.

Open in a new tab

2.2. Vanilla Adversarial Training

Without loss of generality, we use the most widely studied untargeted adversarial attack as an example. Let ( $x, y$ ) be a pair of a clean sample $x$ and its true label $y$ . The objective for generating an adversarial sample is:

\max_{\tilde{x}} ™ (f (\tilde{x}), y) s . t . \tilde{x} \in {x^{'} ∣ ‖ x - x^{'} ‖_{p} \leq ϵ}

(1)

where $™ (.)$ is an objective function, e.g., the cross-entropy loss function, $\tilde{x}$ is the adversarial sample to be generated, $f (.)$ is the DNN model under attack, $ϵ$ is the attacking noise level, and $‖ . ‖_{p}$ is $L_{p}$ vector norm.

Given a clean sample ( $x, y$ ), there are many ways to solve Eq. (1) to obtain an adversarial sample $\tilde{x}$ . One popular method is Projected Gradient Descent (PGD) [14, 23], which approximates the optimal solution through many iterations:

x^{(k + 1)} \leftarrow Π_{ϵ} (α \cdot h (\frac{\partial}{\partial x^{(k)}} ™ (f (x^{(k)}), y)) + x^{(k)})

(2)

where $h (.)$ is the normalization function, $α$ is the step size, $x^{(k)}$ is the adversarial sample at the iteration $k$ , and $Π_{ϵ} (.)$ is a clip operation to ensure the generated perturbation to be within an $ϵ$ -ball, i.e., $‖ x^{(k)} - x ‖_{p} \leq ϵ$ . After K iterations (called K-PGD), it outputs the adversarial sample $\tilde{x} = x^{(K)}$ . This process is intuitively explained in Fig. 1.

Figure 1: — Create an adversarial sample $\tilde{x}$ by PGD (Eq. (2)): a solid blue arrow shows one PGD iteration; a dashed blue arrow shows that when $\tilde{x}$ moves out of the $ϵ$ -ball, it will be projected back onto the $ϵ$ -ball; the solid red arrow denotes the generated adversarial noise ( $\tilde{x} - x$ ) from the final iteration.

The objective of the vanilla Adversarial Training (AT) [14, 23] is:

min_{θ} \max_{\tilde{x}} ™ (f_{θ} (\tilde{x}), y) s . t . \tilde{x} \in {x^{'} ∣ ‖ x - x^{'} ‖_{p} \leq ϵ_{t r a i n}}

(3)

where $f_{θ} (.)$ is a DNN model with the parameter $θ$ , $ϵ_{t r a i n}$ is the noise level for adversarial training. The AT method uses a large number of noisy samples { $\tilde{x}, y$ } to train the DNN model $f_{θ}$ . In this way, the DNN model gains resistance against adversarial attacks. For AT, adversarial training samples are generated by using the PGD method (Eq. (2)) with a fixed noise level $ϵ_{t r a i n}$ for every clean sample { $x, y$ }, and this training noise level $ϵ_{t r a i n}$ affects the performance of AT, as shown in our experiments.

2.3. The Optimality of Adversarial Training Samples

Here, we provide an intuitive explanation of the optimality of adversarial training samples, and a more rigorous analysis can be found in the method section. From Fig. 2, the optimal location of an adversarial training sample $\tilde{x}$ should be very close to the true/optimal decision boundary, and the optimal noise level $ϵ_{t r a i n}$ should be the distance between $x$ and the optimal $\tilde{x}$ . This distance is similar to the “margin” in Support Vector Machine (SVM) [42]. So, we will call this optimal noise level as “margin” in this paper. The margin equals the shortest distance between $x$ and the corresponding decision boundary. However, finding the margin of a sample is challenging because the decision boundaries of a DNN are usually highly nonlinear in the input data space. Since samples may have different margins, the training noises should be tailored for the individual samples.

Figure 2: — (a) Adding adversarial noise to a sample/data point $x$ is to push it towards a direction in the input space. Once the data point crosses the decision boundary, the predicted class label from a DNN model is changed. The closer $x$ is to a decision boundary, the more likely $\tilde{x}$ is pushed across the decision boundary. (b) Given a training set ( $X, Y$ ), where $X$ denotes the set of samples and $Y$ denotes the set of true labels, ideally, adversarial training samples $\tilde{X}$ should be placed closer to the true/optimal decision boundary. By training the model with both ( $X, Y$ ) and ( $\tilde{X}, Y$ ), the model decision boundary is pushed far away from $X$ , and therefore adversarial robustness is improved. However, the distribution of training samples is not always ideal. The margin of each training sample is different. A uniform and fixed adversarial training perturbation can make adversarial training samples go across the true decision boundary, and a model trained on these adversarial samples may have low performance. (c) The optimal adversarial training samples $\tilde{X}$ should be about to cross the decision boundary. Training with too small noise $‖ \tilde{x} - x ‖_{p}$ is not effective enough. Training with too large noise $‖ \tilde{x} - x ‖_{p}$ leads to large accuracy degradation: the clean accuracy of the robust model is much lower than that of the clean model. This is an intuitive explanation, and our method does not assume a linear decision boundary.

2.4. The Margin Overestimation Problem

Several methods have tried to use different training noises for different samples, including IAAT [28], FAT [29], and MMA [30]. From the margin perspective (Section 2.3), the weakness of these methods can be explained. IAAT uses intuitive heuristics to adjust training noises, which do not prevent the generated samples from going across the decision boundary (i.e., margin overestimation). FAT uses PGD with an early-stop strategy to generate adversarial samples for training, and the generated samples are guaranteed to cross the decision boundary when the number of training epochs exceeds a pre-defined number (i.e., margin overestimation). MMA uses PGD with a large noise level to generate adversarial samples for training, and therefore it may overestimate sample margins (see Fig. 4(b) and the appendix). Margin overestimation will lead to lower performance, as shown in our experiments.

Figure 4: — Left: in AME, the estimated margin (shown by the dark dash line) is closer to the real margin (shown by the green line); Right: using the standard PGD, searching for the margin in a large range may result in margin overestimation. This is an intuitive explanation, and our method does not assume a linear decision boundary.

3. Method

3.1. Adaptive Margin Evolution (AME) and An Optimal State

Motivated by 2.2, 2.3, and 2.4, instead of using a fixed large noise level $ϵ_{t r a i n}$ for generating adversarial samples, gradually expanding the training noise level for an individual sample is a better strategy (see Fig. 3). Thus, in our AME method, the adversarial training samples will be gradually pushed onto the decision boundaries. AME not only can largely avoid the margin overestimation problem (Section 2.4) but also has one crucial property: an optimal state for adversarial training may be reached, which means that the noise level $ϵ_{t r a i n}$ no longer needs to be tuned manually. The overall cross-entropy loss is minimized at the optimal state, and the adversarial training samples are placed near the decision boundaries.

Figure 3: — Illustration of the adaptive margin evolution (AME) strategy: in the training process of our method, the sample-wise noise levels $ℰ (x)$ expand gradually, until an optimal state of adversarial training is reached, where for each $\tilde{x} \in c_{i}$ , $B_{i j}$ , in its softmax output, $p (i) = p (j)$ . (Section 3.1). This is an intuitive explanation using a simple case, and our method does not assume linear separability between classes.

Theorem 3.1 If the spatial distributions of the adversarial training samples in different classes (denoted by $q_{i} (\tilde{x})$ , $i$ is the index of class- $i$ $(c_{i})$ ) are the same on each of the decision boundaries (i.e., $q_{i} (\tilde{x}) = q_{j} (\tilde{x})$ for every $\tilde{x} \in B_{i j}$ , where $B_{i j}$ is the decision boundary between the two classes $c_{i}$ and $c_{j}$ ) and the DNN has sufficient capacity, the cross-entropy loss of the adversarial training samples (i.e., $\sum_{i} E_{\tilde{x} \in c_{i}} (- l o g (p_{i} (\tilde{x})))$ ) can reach to a minimum.

Without loss of generality, we assume there are $N$ classes. The softmax output of the DNN model $f (.)$ has $N$ components: $p_{1}, p_{2}, \dots, p_{N}$ , corresponding to the $N$ classes. If a data point $\tilde{x}$ is about to cross the decision boundary $B_{i j}$ between class- $i$ ( $c_{i}$ ) and class- $j$ ( $c_{j}$ ) (i.e., $\tilde{x}$ is on the class- $y$ side of the decision boundary), then $p_{i} (\tilde{x}) = p_{j} (\tilde{x})$ . The mathematical expectation of the cross-entropy loss of the generated adversarial training samples is:

l o s s = \sum_{i = 1}^{N} E_{\tilde{x} \in c_{i}} (- l o g (p_{i} (\tilde{x})))

(4)

Our AME method pushes the adversarial training samples towards/onto the decision boundaries. Then, the adversarial samples $\tilde{x} \in c_{i}$ are split into $N - 1$ sets: those pushed to decision boundary $B_{i k}$ are denoted by $\tilde{x} \in c_{i}$ , $B_{i k} (k \neq i)$ . So, we can get:

E_{\tilde{x} \in c_{i}} (- l o g (p_{i} (\tilde{x}))) = \sum_{k \neq i} E_{\tilde{x} \in c_{i}, B_{i, k}} (- l o g (p_{i} (\tilde{x})))

(5)

If the generated adversarial training samples (random variables) $\tilde{x} \in c_{i}$ and $\tilde{x} \in c_{j}$ have the same spatial distribution on the decision boundary $B_{i j}$ between the two classes, then:

l o s s_{i j} = E_{\tilde{x} \in c_{i}, B_{i j}} (- l o g (p_{i} (\tilde{x}))) + E_{\tilde{x} \in c_{j}, B_{i j}} (- l o g (p_{j} (\tilde{x}))) = E_{\tilde{x} \in B_{i j}} (- l o g (p_{i} (\tilde{x})) - l o g (p_{j} (\tilde{x}))) = E_{\tilde{x} \in B_{i j}} (- l o g (p_{i} (\tilde{x}) p_{j} (\tilde{x}))) \geq E_{\tilde{x} \in B_{i j}} (- l o g (\frac{a}{2} \times \frac{a}{2}))

(6)

The above inequality is based on the fact that $p_{i} (\tilde{x}) + p_{j} (\tilde{x}) = a \leq 1$ , $0 \leq p_{i} (\tilde{x}) \leq 1$ , and $0 \leq p_{j} (\tilde{x}) \leq 1$ , and therefore the product $p_{i} (\tilde{x}) p_{j} (\tilde{x})$ reaches the maximum when $p_{i} (\tilde{x}) = p_{j} (\tilde{x}) = a ∕ 2$ . The scalar $a$ can vary between 0 and 1. When $a = 1$ , the inequality shows the lower bound.

The total loss defined by Eq. 4 can be represented by $l o s s = \sum_{i \neq j} l o s s_{i j}$ . By Eq. 6, $l o s s_{i j}$ can reach its minimum. In Fig. 3, $l o s s = l o s s_{12} + l o s s_{23} + l o s s_{13}$ , and the adversarial samples for calculating $l o s s_{12}$ are different from those for calculating $l o s s_{23}$ and $l o s s_{13}$ . With enough capacity, the DNN can be trained such that all the three loss terms reach to minimum, which is ensured by the universal approximation theorem [43]. The analysis concludes the theorem (QED).

We note that each clean training sample is inside an $ϵ$ -ball during training and therefore away from the decision boundaries. Thus, these clean samples have little contribution to robustness. We also note that the theorem only shows the existence of an optimal state and it does not provide a method to check if a DNN has enough capacity.

We have shown that an optimal state exists when the generated adversarial training samples have the same spatial distribution on the (final) decision boundary. Here, we outline what will happen if the initial spatial distributions of the generated adversarial training samples in different classes are not the same on the current decision boundary. We note that our method will actively generate and push the adversarial training samples towards (“close to” due to numerical precision) the current decision boundary of the DNN model, and the decision boundary of the model will be adjusted dynamically during training. Let’s consider the following two terms.

F_{i} ≜ E_{\tilde{x} \in c_{i}, B_{i j}} = - \int q_{i} (\tilde{x}) l o g (p_{i} (\tilde{x})) d \tilde{x}

(7)

F_{j} ≜ E_{\tilde{x} \in c_{j}, B_{i j}} = - \int q_{j} (\tilde{x}) l o g (p_{j} (\tilde{x})) d \tilde{x}

(8)

where $q_{i} (\tilde{x})$ and $q_{j} (\tilde{x})$ are the distributions (i.e., densities) of the adversarial/noisy training samples on the current decision boundary between the two classes, and $q_{i} (\tilde{x})$ and $q_{j} (\tilde{x})$ may not be equal to each other. $F_{i}$ and $F_{j}$ can be interpreted as two forces that try to expand the margins of the samples in the two classes against each other. By dividing the decision boundary into small regions (i.e., linear segments), the two integrals can be evaluated in the individual regions. In a region, if $q_{i} (\tilde{x}) \leq q_{j} (\tilde{x})$ (i.e., more noisy training samples are generated in class- $i$ ), then the current state is not optimal: after training the model using the noisy samples, most of the noisy samples in class- $i$ will be correctly classified and most of the noisy samples in class- $j$ will be incorrectly classified, which is a simple result of classification with imbalanced data in the region. As a result, the decision boundary will move away from class- $i$ (then $q_{i} (\tilde{x})$ may decrease) and shift towards class- $j$ (then $q_{j} (\tilde{x})$ may increase), and therefore the margins of the corresponding samples in class- $i$ will expand and the margins of the corresponding samples in class- $j$ will shrink. The decision boundary will stop changing when the local densities of noisy samples in different classes are the same along the decision boundary, i.e., $q_{i} (\tilde{x})$ becomes equal to $q_{j} (\tilde{x})$ , which means an optimal state is reached. Since there are no mathematical tools available for quantitative analysis of DNN dynamics during training, the analysis in this paragraph has to be qualitative.

Thus, our AME method pushes the noisy samples close to the decision boundaries during adversarial training and may eventually reach the optimal state, and therefore the noise level $ϵ_{t r a i n}$ for generating adversarial training samples no longer needs to be tuned manually.

3.2. The Algorithms of Our AME Method

3.2.1. Self-adaptive Step Size for Margin Expansion

In AME, the training noise level, which is equal to the estimated sample-wise margin $ℰ (x)$ , is individualized for each sample $x$ . The training noise level for a sample will gradually expand (i.e., margin expansion) with a step size $Δ_{ϵ}$ , which poses a challenge to the optimizer: it needs to catch up with a loss function that will change its value due to the update of the adversarial samples for training, which is an analogy to the challenge of tracking a fast-moving target. While pushing an adversarial training sample $\tilde{x}$ away from the clean sample $x$ during training, the terrain of the loss function may vary rapidly. When $\tilde{x}$ is on a flat terrain, it may take a few iterations for the optimizer to adjust the model parameters. When $\tilde{x}$ is on a steep terrain, the optimizer may need much more iterations to adjust the model parameters. Since the number of iterations is usually fixed during training (in fact, it is one iteration per minibatch), the step size for margin expansion must not be constant and should be adjusted adaptively according to each adversarial sample and the local terrain.

Initial step size:

Let $x$ be a sample in the input data space. Then the initial step size $Δ_{ϵ}$ is:

Δ_{ϵ} = c e i l ((h (x) ∕ N) ∕ h (x ∕ 255)) \cdot h (x ∕ 255) = c e i l (255 ∕ N) \cdot h (x ∕ 255)

(9)

where $h (.)$ is the vector Lp-norm function, $c e i l (.)$ gives the smallest nearest integer no smaller than the input, and $N$ is the number of training epochs. The idea is that in the training process with the step size $Δ_{ϵ}$ , most of the input space should be covered. For example, when L-inf norm is used, the number of epochs is 150, and the pixel value ranges from 0 to 1, then $h (x) = 1$ , and $Δ_{ϵ} = c e i l ((1 ∕ 150) ∕ (1 ∕ 255)) \times (1 ∕ 255) = 2 ∕ 255$ . We chose the constant of 255 by using the fact that natural image usually is stored by 8-bit (uint8) precision.

To handle the challenge of tracking a moving target/loss by the optimizer, we use two types of information to adjust the step size: sample margin (i.e., training noise level) and gradient of the loss.

Step size adjustment using sample margin:

The step size $Δ_{ϵ}$ should be decayed such that a smaller step should be used for a larger sample margin. $ℰ (x)$ is the current margin (i.e., training noise level) for a clean sample $x$ . Then the decay rate corresponding to the clean sample $x$ is:

γ_{1} (x) = ρ^{f l o o r (ℰ (x) ∕ h (x ∕ 255))}

(10)

The larger $ℰ (x)$ is, the smaller $γ_{1} (x)$ is. $ρ$ is derived from a coarse estimation of the margin expansion process (see Appendix B).

Step size adjustment using gradient information:

We also let the step size $Δ_{ϵ}$ be gradient-aware. Let $\tilde{x}$ be the generated adversarial sample from the clean sample $x$ , $f (.)$ be the DNN model, $y$ be the ground truth, and $™ (.)$ be the loss function for generating adversarial samples. Then the gradient-aware decay rate for the clean sample $x$ is:

γ_{2} (x) = 1 ∕ (1 + h (\frac{\partial ™ (f (\tilde{x}), y)}{\partial \tilde{x}}))

(11)

The idea is shown in Fig. 5. $γ_{2} (x)$ ranges between 0 to 1. When the gradient at $\tilde{x}$ is 0, $γ_{2} (x)$ is 1, which means no need for decay on a flat terrain; when the gradient at $\tilde{x}$ is close to $\infty$ , $γ_{2} (x)$ is close to 0, which means the smallest step is needed on an extremely steep terrain.

Figure 5: — $\tilde{x}$ is an adversarial sample in the current iteration. After expansion with one step $Δ_{ϵ}$ , ${\tilde{x}}_{1}$ will be “pushed” to ${\tilde{x}}_{2}$ . If the gradient at ${\tilde{x}}_{1}$ is large (right), the loss is significantly changed, which will create a heavier load for the optimizer to adjust model parameters to fit the new data. In this case, the step size $Δ_{ϵ}$ should be reduced to let the optimizer keep up with the loss change.

3.2.2. Algorithms

One epoch of the AME training process is shown in Algorithm 1 (Python-like pseudo-code). It computes the loss and updates the DNN model (Line 2 to 10), and then it updates the sample-wise margin estimation (Line 11 to 12). It uses Algorithm 2 to generate the adversarial/noisy samples for training, in which: Line 3 to 7 explore and record the indexes of noisy samples that cross the decision boundaries; and Line 8 to 15 refine the margin estimation with a binary search. The flow chart of Algorithm 1 is shown in Fig. 6. In Algorithm 2, the generated adversarial training samples, $X_{o u t}$ , are guaranteed to be close to but not to cross the decision boundary.

Figure 6: — The flow chart of Algorithm 1 from the perspective of an individual sample.

Algorithm 1 AME Training in One Epoch

\begin{matrix} Input : the training set S; the DNN model f (.); g (.) is the function that transforms the output of f (.) to \\ a predicted class label, e.g., a r g m a x; L o s s is the loss function for training the model f; ℰ is the array of \\ the estimated sample margins: ℰ (i) is the margin of the sample indexed by the unique ID i, every ℰ (i) is \\ initialized to be Δ_{ϵ} . \\ Parameters : Δ_{ϵ} is the expansion step size given by Eq . (9) and adjusted by Eq . (10) and Eq . (11) \\ Output : Updated model f after this training epoch \\ Process : \\ 1 : for each batch of training samples (X, Y) with ID I d s in S do \\ 2 : Run the model f on clean samples: Z \leftarrow f (X) \\ 3 : L_{0} \leftarrow L o s s (Z, Y) \\ 4 : F l a g 1 \leftarrow [g (Z) = = Y] F l a g 1 records the indexes of correctly classified clean samples \\ 5 : Generate noisy samples using the Algorithm 2 : \\ \tilde{X}, F l a g 2 \leftarrow E P G D (X [F l a g 1], Y [F l a g 1], ℰ [F l a g 1], f, L o s s) \\ 6 : Use \tilde{X}, f (.), ℰ and Y to cmpute γ_{1} and γ_{2} (Eq . (10) and Eq . (11)) \\ 7 : Run the model f on noisy samples: \tilde{Z} \leftarrow f (\tilde{X}) \\ 8 : L_{1} \leftarrow L o s s (\tilde{Z}, Y [F l a g 1]) \\ 9 : L \leftarrow (L_{0} + L_{1}) ∕ 2 \\ 10 : Back-propagate from L and update the model f (.) \\ 11 : ℰ [F l a g 1 & F l a g 2] \leftarrow ℰ [F l a n g 1 & F l a g 2] + Δ_{ϵ} \cdot (γ_{1} \cdot γ_{2}) [F l a g 1 & F l a g 2] \\ 12 : ℰ [\sim F l a g 1 or \sim F l a g 2] \leftarrow ‖ X [\sim F l a g 1 or \sim F l a g 2] - \tilde{X} [\sim F l a g 2] ‖_{p} \\ 13 : end for \end{matrix}

Open in a new tab

Here is a detailed description of Algorithm 1. $X$ is a minibatch of clean training samples with true labels in $Y$ and unique IDs in Ids (Line 1). After the clean samples are processed by the DNN model $f (.)$ , the loss $L_{0}$ on these samples is obtained (Line 2-3). For the correctly classified samples, their indexes in $X$ are “recorded” by the binary array Flag1 (Line 4), and Algorithm 2 will be used to generate adversarial/noisy samples $\tilde{X}$ (Line 5). $γ_{1}$ and $γ_{2}$ are computed via Eq. (10) and Eq. (11). After the model processes the noisy samples, the loss $L_{1}$ on the noisy samples (Line 7-8) is obtained. Then, the model $f (.)$ is updated by backpropagation from the combined loss (Line 10). If both the clean samples and the corresponding noisy samples are correctly classified, which means the samples’ margins are not large enough to reach the decision boundary, the margins $ℰ [f l a g 1 & f l a g 2]$ of these training samples will be expanded (Line 11). Otherwise, the samples’ margins $ℰ [\sim f l a g 1 OR \sim f l a g 2]$ are too large, and the adversarial training noises with these magnitudes have already pushed $\tilde{X}$ across the decision boundary, and therefore, these sample margins should be refined smaller (Line 12).

Here is a detailed description of Algorithm 2 (EPGD). In each iteration (Line 2), $X$ is modified to get $\tilde{X}$ (Line 3-4, see Eq. (3)). The indexes of those noisy samples misclassified by the current model $f (.)$ are “recorded” by the array Counter (Line 5). Then, a binary search (Line 8-15) is applied to find the new $\tilde{x}$ , which is just about to cross the decision boundary. $X_{o u t}$ stores the noisy samples closer to the decision boundary but have not crossed it. This binary search runs for a fixed number of 10 iterations, refining margin estimation by the resolution of 2¹⁰ = 1024.

Algorithm 2 (Exploration-PGD): generate noisy samples

\begin{matrix} Input : training sample batch (X, Y); the estimated margins ℰ, currently; the DNN model f (.); g (.) is \\ the function that transforms the output of f (.) to a predicted class label, e.g., a r g m a x; the Loss function \\ L (.) . \\ Parameters : maximum PGD iteration number K \leftarrow 20; PGD step size α \leftarrow (4 \times ϵ) ∕ K for each ϵ in ℰ \\ Output : the generated noisy sample batch \tilde{X}, the F l a g 2 that records the indexes of correctly classified \\ noisy ∕ adversarial samples; \\ Function EPGD (X, Y, ℰ, f, L (.)) : \\ 1 : X_{o u t} \leftarrow X, C o u n t e r \leftarrow [0, \dots, 0] (length is the batch size) \\ 2 : while K > 0 do \\ 3 : \tilde{X} \leftarrow Π_{ℰ} (α \cdot h (\nabla_{X} L (f (X), Y)) + X) \\ 4 : X \leftarrow \tilde{X} \\ 5 : C o u n t e r [g (f (X)) \neq Y] \leftarrow 1 \\ 6 : K \leftarrow K - 1 \\ 7 : end while \\ 8 : N \leftarrow 10 \\ 9 : \tilde{X} \leftarrow (\tilde{X} + X_{o u t}) ∕ 2 \\ 10 : while N > 0 do \\ 11 : \tilde{Y} = g (f (\tilde{X})) \\ 12 : X_{o u t} [\tilde{Y} = = Y] \leftarrow \tilde{X} [\tilde{Y} = = Y] \\ 13 : \tilde{X} \leftarrow (\tilde{X} + X_{o u t}) ∕ 2 \\ 14 : N \leftarrow N - 1 \\ 15 : end while \\ 16 : F l a g 2 = (C o u n t e r = = 0) \\ 17 : return X_{o u t}, F l a g 2 \end{matrix}

Open in a new tab

4. Experiments

4.1. Environment Settings

Pytorch 1.9.0 [44] is used for model implementation and evaluation. Nvidia V100 GPUs are used for model training and testing. L-inf norm is the most widely used metric to measure adversarial noises, and therefore it is used in our experiments.

4.2. Comparing with Related Work

We compare our AME method with the other seven adversarial training methods [14, 18, 29, 30, 34, 35, 36], which have open-source code in Pytorch.

For the reader’s convenience, we highlight some differences between our AME and the other seven methods. The comparison of loss functions is shown in Table 2. AT [14] uses a fixed training noise level for every sample, whereas AME uses adaptive training noise levels for individual samples. TE [35] improves AT by mitigating a memorization issue. EE [36] preprocesses the input image edges to improve adversarial training, whereas AME does not preprocess the input. TRADES [18] uses a KL divergence-based regularization term in the loss, whereas AME uses the standard cross-entropy loss for both clean samples and noisy samples. GAIRAT [34] applies sample-wise weights in the loss, but the weights cannot prevent adversarial training samples from crossing the decision boundary, and large noise levels for training will lead to lower performance, as shown by the experimental results. FAT [29] and MMA [30] have been discussed in Section 2.4. Besides the margin overestimation problem, MMA [30] heavily relies on the soft-logit margin loss that may not lead to the optimal state in our Theorem 3.1. The most significant difference is that AME is based on our theorem 3.1, and therefore it is hyperparameter-free for the user.

Table 2:

Loss function comparison: $L (.)$ is the loss function of the clean model.

Methods	Loss Functions (sample-wise)
AT	$L (f (\tilde{x}), y)$ , where $\tilde{x}$ is an adversarial sample
FAT	$L (f (\tilde{x}), y)$ , where $\tilde{x}$ is from early-stop iterative PGD.
TRADES	$L (f (\tilde{x}), y) + β \cdot K L (p (y ∣ x) ‖ p (y ∣ \tilde{x}))$ , where $K L$ is the KL-Divergence
MMA	$L (f (x), y) . I (g (f (x)) \neq y) + L (f (\tilde{x}), y) . I (g (f (x)) = y)$ , where $I (.)$ is the indicator function
GAIRAT	$ω (x, y) \cdot L (f (\tilde{x}), y)$ , where $ω (x, y)$ is the sample-wise geometry-aware weight.
TE	$L (f (\tilde{x}), y) + ω \cdot ‖ f (\tilde{x} - \hat{p}) ‖_{2}^{2}$ , where $\hat{p}$ is the normalization of $p$ , the ensemble prediction, which is updated in each training epoch $p \leftarrow β p + (1 - β) f (x)$ .
EE	$L (f (\tilde{x}), y)$ , where $\tilde{x}$ is preprocessed with edge information.
AME (Ours)	$L (f (x), y) + L (f (\tilde{x}), y) . I (g (f (x)) = y)$ , where $\tilde{x}$ generated on the decision boundary.

Open in a new tab

Many adversarial training methods have more than one hyperparameter. In the experiments, we only study the sensitivity to $ϵ_{t r a i n}$ for these methods, and we use the default values for the other hyperparameters if a method has more than one hyperparameter. IAAT [28] has too many hyperparameters, and the original paper has shown that it is very sensitive to the hyperparameter values, and therefore it is excluded from our study. Also, optimizer settings are not considered as hyperparameters of a method in this study.

Settings for our AME:

AME has no hyperparameters to tune. The training settings (e.g., optimizer settings, the number of training epochs) are the same as those for training models on clean data only.

Settings for AT [14]:

Two values of $ϵ_{t r a i n}$ are used to show its sensitivity. The max number of PGD iterations is 20 (endorsed in [14]). The training settings are the same as those in [14].

Settings for MMA [30]:

For CIFAR10, we use the pretrained models from [30]. For Tiny ImageNet and SVHN, all settings are the same as those in [30]. These models have different $ϵ_{t r a i n}$ to show their sensitivity.

Settings for FAT [29], TE [35], TRADES [18], GAIRAT [34] and EE [36]:

we use their official code to train the models, and all settings are the same as those in the corresponding papers, except for the values of $ϵ_{t r a i n}$ .

4.3. CIFAR10 Experiment Settings

CIFAR10 [45] contains 32×32×3 color images in 10 classes. There are 6000 images per class, with 5000 for training and 1000 for testing. We apply all the methods to a widely used network: WideResNet-28-4 (WRN-28-4) [46]. When it is trained on clean data only, this model is denoted as “STD”, and the training settings are the same as those in [30].

4.4. SVHN Experiment Settings

The SVHN dataset [47] contains 32 × 32 × 3 color images in 10 classes, including 73257 images for training and 26032 images for testing. The network is ResNet-18 [48]. When it is trained on clean data only, this model is denoted as “STD”, and the training settings are the same as those in [35].

4.5. Tiny ImageNet Experiment Settings

Tiny ImageNet [49] is a miniature version of ImageNet. The training set contains 100000 color images in 200 classes, and the testing set has 10000 color images. Each image is downsized to 64×64. The network is ResNet-18 [48]. When it is trained on clean data only, this model is denoted as “STD”, and the training settings are the same as those in [36].

4.6. Adversarial Attacks for Method Evaluation

AutoAttack (AA) [16] is used to evaluate the adversarial robustness of DNNs in this paper. AutoAttack is a framework that contains different adversarial attack methods. While applying AutoAttack to the target network, the success of this “attack” is determined by whether any of these selected attack methods can successfully break the target network. In this work, the AA consists of four adversarial attacks: AutoPGD (a white-box untargeted attack, stronger than the standard PGD), APGD-t (a white-box targeted attack), FAB-t (a white-box targeted attack), and Square (a black-box attack). This combination is also defined as the “standard” version of AA in the source code provided by [16], which is currently the most reliable and challenging evaluation method. AA is parameter-free, so there is no need to configure it manually. This AA framework enables a thorough evaluation of the adversarial robustness of the target network by using a combination of attacks, having more scientific rigor than using a single attack.

Besides AutoAttack, we also evaluated model robustness using other well-known adversarial attacks available in the advertorch library [50], and the results can be found in Appendix C.

4.7. Results

We performed experiments with AutoAttack. The results are shown in Table 3, 4 and 5. In Table 6 and 7, we show the sensitivity study of the parameters, $K$ and $N$ , in Algorithm 2. The ablation study is shown in Table 8.

Table 3:

Results on SVHN under AutoAttack: (1) N. L. denotes “Noise Level” for adversarial attacks. (2) ${AT}_{ϵ}$ denotes AT with $ϵ_{t r a i n} = ϵ$ . Similar notations are used for others. (3) The value of $ϵ_{t r a i n}$ in Method with “*” is endorsed in the corresponding paper; (4) The largest value in each column is bolded.

N. L.	0	2/255	4/255	6/255	Avg.
STD	95.82	50.06	15.20	0	40.27
AT8*[14]	89.00	72.47	64.49	54.50	70.12
AT16	19.59	19.59	19.59	19.59	19.59
FAT8*[29]	90.50	81.53	69.98	57.39	74.85
FAT16	88.44	78.98	67.54	54.87	72.46
GAIRAT8*[34]	91.05	81.34	69.13	55.36	74.22
GAIRAT16	89.37	79.19	64.63	49.93	70.78
TRADES8*[18]	88.60	80.20	70.98	59.92	74.92
TRADES16	82.27	71.32	62.05	53.92	67.39
EE16*[36]	19.59	19.59	19.59	19.59	19.59
EE8	88.45	70.55	66.37	53.02	69.59
TE8*[35]	90.09	82.39	70.04	59.26	75.45
TE16	83.29	76.07	67.61	58.52	71.37
MMA12*[30]	90.26	80.54	67.99	54.82	73.40
MMA20*	88.69	78.57	66.19	53.68	71.78
AME	90.74	82.84	72.25	60.76	76.65

Open in a new tab

Table 4:

Results on CIFAR10 under AutoAttack: (1) N. L. denotes “Noise Level” for adversarial attacks. (2) ${AT}_{ϵ}$ denotes AT with $ϵ_{t r a i n} = ϵ$ . Similar notations are used for others. (3) The value of $ϵ_{t r a i n}$ in Method with “*” is endorsed in the corresponding paper; (4) The largest value in each column is bolded.

N. L.	0	2/255	4/255	6/255	8/255	Avg.
STD	94.94	1.57	0	0	0	19.31
AT8*[14]	85.31	76.62	66.34	54.84	43.42	65.31
AT12	77.83	51.39	63.33	55.24	45.18	58.59
FAT8*[29]	88.15	78.33	65.63	51.88	38.50	64.49
FAT16	86.89	76.59	64.53	51.42	38.11	63.51
GAIRAT8*[34]	80.96	68.36	53.40	38.46	25.97	53.43
GAIRAT16	77.96	65.53	51.48	38.63	27.26	52.17
TRADES8*[18]	88.25	79.32	67.38	54.46	41.17	66.11
TRADES16	81.75	68.08	53.00	39.07	27.56	53.89
EE16*[36]	77.18	68.64	59.16	49.47	39.55	58.80
EE8	85.64	73.83	58.77	42.43	28.51	57.83
TE8*[35]	84.69	76.75	67.66	55.77	45.26	66.03
TE12	78.23	70.93	63.12	54.54	44.72	62.31
MMA12*[30]	88.59	79.30	67.47	54.27	41.13	65.95
MMA20*	86.56	77.14	65.70	53.82	42.58	65.16
AME	87.94	79.35	68.39	56.25	45.23	67.43

Open in a new tab

Table 5:

Results on Tiny ImageNet under AutoAttack: (1) N. L. denotes “Noise Level” for adversarial attacks. (2) ${AT}_{ϵ}$ denotes AT with $ϵ_{t r a i n} = ϵ$ . Similar notations are used for others. (3) The value of $ϵ_{t r a i n}$ in Method with “*” is endorsed in the corresponding paper; (4) The largest value in each column is bolded.

N. L.	0	2/255	4/255	6/255	Avg.
STD	44.90	3.04	0.09	0	11.99
AT8*[14]	27.25	18.12	11.79	7.34	16.13
AT4	34.37	21.33	12.60	7.27	18.89
FAT8*[29]	33.74	20.56	12.39	7.09	18.45
FAT12	31.26	19.89	12.52	7.51	17.80
GAIRAT8*[34]	28.82	18.80	11.95	3.98	15.89
GAIRAT4	34.51	21.46	12.55	3.75	18.07
TRADES8*[18]	32.40	19.34	13.30	8.86	18.48
TRADES4	36.43	19.99	11.86	6.95	18.81
MMA12*[30]	30.83	19.20	11.76	6.84	17.16
MMA20*	25.42	17.91	10.95	7.07	15.34
EE16*[36]	27.57	19.76	15.10	10.21	18.16
EE8	30.09	21.00	15.37	10.17	19.16
TE8*[35]	28.31	18.34	11.01	6.93	16.15
TE4	34.41	20.44	11.56	6.27	18.17
AME	40.84	25.95	16.22	10.22	23.30

Open in a new tab

Table 6:

Sensitivity study of $K$ in Algorithm 2 at noise level of 6/255. The dataset and settings are the same as those in the Tiny ImageNet experiment, except for $K$ .

K	16	17	18	19	20
Clean Acc.	40.11	39.70	40.14	39.95	40.84
Noisy Acc.	10.16	10.60	10.07	10.01	10.22
K	21	22	23	24	25
Clean Acc.	39.75	39.82	40.63	40.59	39.66
Noisy Acc.	10.01	10.32	10.24	10.37	10.04

Open in a new tab

Table 7:

Sensitivity study of $N$ in Algorithm 2 at noise level of 6/255. The dataset and settings are the same as those in the Tiny ImageNet experiment, except for $N$ .

N	6	7	8	9	10
Clean Acc.	40.93	40.08	40.28	39.24	40.84
Noisy Acc.	10.01	10.23	10.13	10.04	10.22
N	11	12	13	14	15
Clean Acc.	39.73	40.95	40.25	40.12	40.10
Noisy Acc.	10.32	10.07	10.44	10.07	10.50

Open in a new tab

Table 8:

Ablation study. The noise level is 6/255. The dataset and network settings are the same as those in the Tiny ImageNet experiment. All symbols in this table can be found in Algorithm 1.“AME-TRADES” means to replace $L_{0}$ by the KL divergence [18]. The $γ_{2}$ variant is $γ_{2} (x) = 1 ∕ (1 + e^{- h (\frac{\partial ™ (f (\tilde{x}), y)}{\partial \tilde{x}})})$

Methods	Clean Acc.	Noisy Acc.	Avg.
Original	40.84	10.22	25.53
Train without $L_{0^{}}$ ^	37.76	10.78	24.27
Train without $γ_{1}$	38.52	10.75	24.64
Train without $γ_{2}$	39.60	10.04	24.82
AME-TRADES	40.71	8.71	24.71
Train with $γ_{2}$ variant	39.60	10.03	24.82

Open in a new tab

Note: Because at the beginning of training, there are very few correctly classified samples, we removed the $f l a g 1$ while training with only $L 1$ .

4.8. Discussion

From the results in Tables 3, 4, and 5, we have the following observation. AME has the best overall performance on the three datasets. The other methods are sensitive to the hyperparameter $ϵ_{t r a i n}$ . On the Tiny ImageNet dataset (Table 5), AME outperforms the other methods at every noise level.

On the SVHN dataset (Table 3), the second best method is TE with $ϵ_{t r a i n} = 8 ∕ 255$ . However, when $ϵ_{t r a i n}$ is changed from 8/255 to 16/255, the average accuracy of TE decreases by >4%. For AT, when $ϵ_{t r a i n}$ is changed from 8/255 to 16/255, the average accuracy decreases by >50%. Other methods also show sensitivity to $ϵ_{t r a i n}$ .

On the CIFAR10 dataset (Table 4), the second best method is TRADES with $ϵ_{t r a i n} = 8 ∕ 255$ , and when $ϵ_{t r a i n}$ is changed from 8/255 to 16/255, the average accuracy of TRADES decreases by >10%. For AT, when $ϵ_{t r a i n}$ is changed from 8/255 to 12/255, the average accuracy decreases by >8%. Other methods also show sensitivity to $ϵ_{t r a i n}$ .

AME may not guarantee the best result at every noise level on every dataset, but it always has the best overall performance (average accuracy). It is because AME generates appropriate adversarial training samples that may lead to an optimal state. We refer the reader to the appendix for additional supportive results. A limitation of this study is that we only compared AME with the other seven well-known adversarial training methods, as it is computationally infeasible to evaluate all adversarial training methods in the literature. With no tunable hyperparameters, AME performs consistently well on the three datasets, which has demonstrated its major benefit.

We conducted a study of $K$ and $N$ in Algorithm 2, as shown in Table 6 and Table 7. The results indicate that our method is not very sensitive to $K$ and $N$ . $K$ is the number of iterations in our Exploration-PGD that generates adversarial training samples, and a large enough $K$ ensures the chance of finding an adversarial sample. $N$ is the number of iterations for binary-search to locate an adversarial training sample on (close to) the decision boundary, and a large enough $N$ ensures that the generated adversarial training sample $\tilde{x}$ is close to the decision boundary. Therefore, we can simply fix the values of $K$ and $N$ , and our method becomes hyperparameter-free for the user.

We also conducted ablation studies of Algorithm 1, and the results are shown in Table 8. The clean accuracy drops when the DNN is trained without using the loss on clean samples. The clean accuracy also drops if $γ_{1}$ is not used, and both clean accuracy and noisy accuracy drop if not using $γ_{2}$ . When using the KL divergence loss, the noisy accuracy drops. We also tried another version of $γ_{2} (x)$ (see Table 8), and the results show that the original formula works better.

5. Conclusion

In this study, based on the theorem of the optimal state, we designed Adaptive Margin Evolution (AME), a new adversarial training method that adaptively adjusts the training noise levels of individual samples and does not need the user to tune hyperparameters. AME performs consistently well on the three commonly used benchmark datasets.

We hope our work may pave the way for adopting adversarial training techniques in application domains, such as medical field that prefers hyperparameter-free methods. We note that biomedical signals (e.g., EEG [51], ECG [52]) are different from the images in those benchmark datasets, attack and defense for these signals could become very complex. For example, smoothed adversarial examples can be constructed to fool ECG signal classifiers [53], and the noise patterns are quite different from those generated by PGD. Therefore, adversarial training methods may need to generate noises with diverse patterns to improve DNN robustness. Compared to images, an EEG signal could have hundreds of channels, and the attackers may choose to attack a few channels. In such scenarios, adding training noises to all the channels may not be an efficient approach. Therefore, to design an appropriate defense method, one may need to take into account the differences between natural images and bio-signals.

Acknowledgement

This work was supported in part by the NIH grant R01HL158829.

Appendix A. Additional Supportive Results

Appendix A.1. Margin Overestimation

Fig. A.7 shows the margin overestimation problem of MMA.

Figure A.7: — MMA uses the PGD with a large noise level to estimate sample-wise margins. Here, we use the PGD to estimate sample margins of a clean model (WRN-28-4) trained for 199 epochs on CIFAR10. The x-axis is the value of each margin. The y-axis is the number of samples in each bin of margins. The left plot shows estimated margins from the PGD with the noise level of 16/255. The right plot shows estimated margins from the PGD with the noise level of 8/255. Estimated margins using the noise level of 16/255 are much larger than those using the noise level of 8/255. Thus, the estimated margins from MMA are very sensitive to the noise level used in the PGD.

Appendix A.2. Margin Distributions of Adversarially-trained Models

Fig. A.8 shows the distribution of the estimated margins from our AME method. Fig. A.9 shows the distribution of the estimated margins from the MMA method. The distributions are from the CIFAR10 dataset. The x-axis is the value of each margin. The y-axis is the number of samples in each bin.

(1) Fig. A.8 shows a Gaussian-like distribution, which supports the existence of the optimal state that can be reached by AME using the adaptive margin evolution strategy.

(2) Fig. A.9 shows the significant margin overestimation problem of the MMA method, which has been discussed in Section 2.4. The MMA-estimated margins are much larger than the AME-estimated margins. Since AME outperforms MMA at every noise level on CIFAR10 (Table 4), we infer that MMA has the margin overestimation problem, leading to lower performance.

Figure A.8: — Estimated margin distribution of the model trained by AME

Figure A.9: — Estimated margin distribution of the model trained by MMA

Appendix B. The Value of $ρ$

We consider an extreme case that the margin of a sample keeps expanding in each epoch

ℰ_{n} = ℰ_{n - 1} + Δ_{ϵ} \cdot ρ^{ℰ_{n - 1} ∕ h (x ∕ 255)}

(B.1)

where n is from 1 to $N$ , and $ℰ_{0} = 0$ .

ℰ_{N} < ℰ_{m a x} = 32 ∕ 255

(B.2)

Here, $ℰ_{m a x} = 32 ∕ 255$ is the “impossible-to-be-robust” noise level [30]. Even with adversarial training, the accuracy of a model at this noise level will be very low on most datasets.

Using this noise level, we can obtain the upper bound of $ρ$ , which will be used in AME. For example, if $Δ_{ϵ} = 2 ∕ 255$ and $N = 150$ , then $ρ \approx 0.9$ .

Appendix C. Evaluation with More Adversarial Attacks

In addition to AutoAttack, we performed experiments with two other well-known adversarial attacks, PGD and C&W. The results are shown in Table C.9, C.10 and C.11. The adversarial attacks are available in the advertorch library [50]. The testing noise level is 6/255, which is measured by the L-inf norm. For both of the attacks, the number of iterations is 1000. Other parameters are set as default.

Table C.9:

Results on SVHN dataset under PGD and C&W attacks: (1)“natural” denotes “Clean Accuracy”. (2) ${AT}_{ϵ}$ denotes AT with $ϵ_{t r a i n} = ϵ$ . Similar notations are used for others. (3) $ϵ_{t r a i n}$ is 6/255 measured by L-inf norm. (4) PGD and C&W run for 1000 iterations. (5) The largest value in each column is bolded.

	Accuracy			Avg.
Method	natural	PGD	C&W	Avg.
AME	90.74	63.06	63.91	72.57
AT8	89.00	52.77	53.05	64.94
AT16	19.59	19.59	19.59	19.59
FAT8	90.50	59.96	60.23	70.23
FAT16	88.44	58.67	57.36	68.15
GAI8	91.05	56.47	53.79	67.10
GAI16	89.37	62.91	52.24	68.17
TRADES8	88.60	62.93	62.39	71.31
TRADES16	82.27	57.58	55.50	65.12
EE8	88.45	56.94	57.80	67.73
EE16	19.59	19.59	19.59	19.59
TE16	83.29	62.52	59.68	68.50
TE8	90.09	63.45	60.79	71.44
MMA12	90.26	54.80	56.07	67.05
MMA20	88.69	57.34	58.73	68.25

Open in a new tab

Table C.10:

Results on CIFAR10 dataset under PGD and C&W attacks: (1)“natural” denotes “Clean Accuracy”. (2) ${AT}_{ϵ}$ denotes AT with $ϵ_{t r a i n} = ϵ$ . Similar notations are used for others. (3) $ϵ_{t r a i n}$ is 6/255 measured by L-inf norm. (4) PGD and C&W run for 1000 iterations. (5) The largest value in each column is bolded.

	Accuracy			Avg.
Method	natural	PGD	C&W	Avg.
AME	87.94	57.09	57.96	67.66
AT12	77.83	57.01	56.41	63.75
AT8	85.31	56.81	56.69	66.27
FAT8	88.15	53.51	53.71	65.12
FAT16	86.89	52.79	53.23	64.30
GAI8	83.96	57.06	40.51	60.51
GAI16	77.96	56.71	40.43	58.37
TRADES8	88.25	56.12	55.66	66.68
TRADES16	81.75	42.66	41.04	55.15
EE16	77.18	53.22	51.12	60.51
EE8	85.64	43.77	44.65	58.02
TE8	84.69	57.48	57.90	66.69
TE12	78.23	56.17	56.25	63.55
MMA12	88.59	56.03	57.04	67.22
MMA20	86.56	56.91	57.02	66.83

Open in a new tab

Table C.11:

Results on Tiny ImageNet dataset under PGD and C&W attacks: (1)“natural” denotes “Clean Accuracy”. (2) ${AT}_{ϵ}$ denotes AT with $ϵ_{t r a i n} = ϵ$ . Similar notations are used for others. (3) $ϵ_{t r a i n}$ is 6/255 measured by L-inf norm. (4) PGD and C&W run for 1000 iterations. (5) The largest value in each column is bolded.

	Accuracy			Avg.
Method	natural	PGD	C&W	Avg.
AME	40.84	10.83	11.05	20.91
AT4	34.37	7.90	8.01	16.76
AT8	27.25	7.93	7.76	14.31
FAT8	33.74	7.42	7.73	16.30
FAT12	31.26	8.11	8.32	15.90
GAI8	28.82	9.03	7.95	15.27
GAI4	34.51	8.02	7.82	16.78
TRADES4	36.43	7.39	7.57	17.13
TRADES8	32.40	10.79	9.45	17.55
EE16	27.57	12.28	10.84	16.90
EE8	30.09	11.25	10.99	17.44
TE8	28.31	7.56	7.43	14.43
TE4	34.41	7.06	7.04	16.17
MMA12	30.83	7.75	7.52	15.37
MMA20	25.42	7.65	7.54	13.54

Open in a new tab

Footnotes

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References

[1].Wang J, Wang C, Lin Q, Luo C, Wu C, Li J, Adversarial attacks and defenses in deep learning for image recognition: A survey, Neurocomputing (2022). [Google Scholar]
[2].Mi J-X, Wang X-D, Zhou L-F, Cheng K, Adversarial examples based on object detection tasks: A survey, Neurocomputing (2022). [Google Scholar]
[3].Qiu S, Liu Q, Zhou S, Huang W, Adversarial attack and defense technologies in natural language processing: A survey, Neurocomputing 492 (2022) 278–307. [Google Scholar]
[4].Szegedy C, Zaremba W, et al. , Intriguing properties of neural networks, in: The International Conference on Learning Representations, 2014. [Google Scholar]
[5].Goodfellow I, Shlens J, et al. , Explaining and harnessing adversarial examples, in: The International Conference on Learning Representations, 2015. [Google Scholar]
[6].Suttapak W, Zhang J, Zhang L, Diminishing-feature attack: The adversarial infiltration on visual tracking, Neurocomputing 509 (2022) 21–33. [Google Scholar]
[7].Kwon H, Lee S, Friend-guard adversarial noise designed for electroencephalogram-based brain–computer interface spellers, Neurocomputing 506 (2022) 184–195. [Google Scholar]
[8].Ma L, Liang L, A regularization method to improve adversarial robustness of neural networks for ecg signal classification, Computers in Biology and Medicine 144 (2022) 105345. [DOI] [PubMed] [Google Scholar]
[9].Kim H, Lee C, Upcycling adversarial attacks for infrared object detection, Neurocomputing 482 (2022) 1–13. [Google Scholar]
[10].Eykholt K, Evtimov I, Fernandes E, Li B, Rahmati A, Xiao C, Prakash A, Kohno T, Song D, Robust physical-world attacks on deep learning visual classification, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 1625–1634. [Google Scholar]
[11].Carlini N, Wagner D, Audio adversarial examples: Targeted attacks on speech-to-text, in: 2018 IEEE security and privacy workshops (SPW), IEEE, 2018, pp. 1–7. [Google Scholar]
[12].Li J, Ji S, Du T, Li B, Wang T, Textbugger: Generating adversarial text against real-world applications, in: 26th Annual Network and Distributed System Security Symposium, 2019. [Google Scholar]
[13].Liu X, Zhang J, Lin Y, Li H, Atmpa: attacking machine learning-based malware visualization detection methods via adversarial examples, in: 2019 IEEE/ACM 27th International Symposium on Quality of Service (IWQoS), IEEE, 2019, pp. 1–10. [Google Scholar]
[14].Madry A, Makelov A, et al. , Towards deep learning models resistant to adversarial attacks, in: The International Conference on Learning Representations, 2018. [Google Scholar]
[15].Athalye A, Carlini N, Wagner D, Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples, in: International conference on machine learning, PMLR, 2018, pp. 274–283. [Google Scholar]
[16].Croce F, Hein M, Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks, in: The International Conference on Machine Learning, PMLR, 2020, pp. 2206–2216. [Google Scholar]
[17].Paknezhad M, Ngo CP, Winarto AA, Cheong A, Beh CY, Wu J, Lee HK, Explaining adversarial vulnerability with a data sparsity hypothesis, Neurocomputing (2022). [Google Scholar]
[18].Zhang H, Yu Y, othersl, Theoretically principled trade-off between robustness and accuracy, in: The International Conference on Machine Learning, 2019. [Google Scholar]
[19].Crecchi F, Melis M, Sotgiu A, Bacciu D, Biggio B, Fader: Fast adversarial example rejection, Neurocomputing 470 (2022) 257–268. [Google Scholar]
[20].Yin S.-l., Zhang X.-l., Zuo L.-y., Defending against adversarial attacks using spherical sampling-based variational auto-encoder, Neurocomputing 478 (2022) 1–10. [Google Scholar]
[21].Oneto L, Ridella S, Anguita D, The benefits of adversarial defense in generalization, Neurocomputing 505 (2022) 125–141. [Google Scholar]
[22].Lust J, Condurache AP, Efficient detection of adversarial, out-of-distribution and other misclassified samples, Neurocomputing 470 (2022) 335–343. [Google Scholar]
[23].Kurakin A, Goodfellow I, et al. , Adversarial examples in the physical world, in: Artificial intelligence safety and security, 2018. [Google Scholar]
[24].Wang Y, Zou D, et al. , Improving adversarial robustness requires revisiting misclassified examples, in: The International Conference on Learning Representations, 2019. [Google Scholar]
[25].Wang Y, Ma X, et al. , On the convergence and robustness of adversarial training, in: The International Conference on Machine Learning, 2019. [Google Scholar]
[26].Sitawarin C, Chakraborty S, et al. , Sat: Improving adversarial training via curriculum-based loss smoothing, in: The 14th ACM Workshop on Artificial Intelligence and Security, 2020. [Google Scholar]
[27].Cai Q-Z, Liu C, et al. , Curriculum adversarial training, in: International Joint Conferences on Artificial Intelligence, 2018. [Google Scholar]
[28].Balaji Y, Goldstein T, et al. , Instance adaptive adversarial training: Improved accuracy tradeoffs in neural nets, preprint arXiv:1910.08051 (2019). [Google Scholar]
[29].Zhang J, Xu X, et al. , Attacks which do not kill training make adversarial learning stronger, in: The International Conference on Machine Learning, 2020. [Google Scholar]
[30].Ding GW, Sharma Y, et al. , Mma training: Direct input space margin maximization through adversarial training, in: The International Conference on Learning Representations, 2019. [Google Scholar]
[31].Baytaş İnci M., Deb D, Robustness-via-synthesis: Robust training with generative adversarial perturbations, Neurocomputing 516 (2023) 49–60. [Google Scholar]
[32].Wang Y, Zhang W, Shen T, Yu H, Wang F-Y, Binary thresholding defense against adversarial attacks, Neurocomputing 445 (2021) 61–71. [Google Scholar]
[33].Cui J, Liu S, Wang L, Jia J, Learnable boundary guided adversarial training, in: The IEEE/CVF International Conference on Computer Vision, 2021, pp. 15721–15730. [Google Scholar]
[34].Zhang J, Zhu J, et al. , Geometry-aware instance-reweighted adversarial training, in: The International Conference on Learning Representations, 2020. [Google Scholar]
[35].Dong Y, Xu K, Yang X, Pang T, Deng Z, Su H, Zhu J, Exploring memorization in adversarial training, in: The International Conference on Learning Representations, 2022. [Google Scholar]
[36].He L, Ai Q, Lei Y, Pan L, Ren Y, Xu Z, Edge enhancement improves adversarial robustness in image classification, Neurocomputing (2022). [Google Scholar]
[37].Yu X, Smedemark-Margulies N, Aeron S, Koike-Akino T, Moulin P, Brand M, Parsons K, Wang Y, Improving adversarial robustness by learning shared information, Pattern Recognition 134 (2023) 109054. doi: 10.1016/j.patcog.2022.109054. [DOI] [Google Scholar]
[38].Jia X, Zhang Y, Wu B, Ma K, Wang J, Cao X, Las-at: Adversarial training with learnable attack strategy, in: The IEEE / CVF Computer Vision and Pattern Recognition Conference, 2022, pp. 13398–13408. [Google Scholar]
[39].Li T, Wu Y, Chen S, Fang K, Huang X, Subspace adversarial training, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 13409–13418. [Google Scholar]
[40].Li Y, Yang Z, Wang Y, Xu C, Neural architecture dilation for adversarial robustness, in: Ranzato M, Beygelzimer A, Dauphin Y, Liang P, Vaughan JW (Eds.), Advances in Neural Information Processing Systems, Vol. 34, Curran Associates, Inc., 2021, pp. 29578–29589. [Google Scholar]
[41].Dong M, Chen X, Wang Y, Xu C, Random normalization aggregation for adversarial defense, in: Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A (Eds.), Advances in Neural Information Processing Systems, Vol. 35, Curran Associates, Inc., 2022, pp. 33676–33688. [Google Scholar]
[42].Cortes C, Vapnik V, Support-vector networks, Machine learning (1995). [Google Scholar]
[43].Lu Y, Lu J, A universal approximation theorem of deep neural networks for expressing probability distributions, in: Larochelle H, Ranzato M, Hadsell R, Balcan M, Lin H (Eds.), Advances in Neural Information Processing Systems, Vol. 33, Curran Associates, Inc., 2020, pp. 3094–3105. [Google Scholar]
[44].Paszke A, et al. , Pytorch: An imperative style, high-performance deep learning library, in: Advances in Neural Information Processing Systems, 2019. [Google Scholar]
[45].Krizhevsky A, Hinton G, Learning multiple layers of features from tiny images, in: Technical report, University of Toronto, Toronto, Ontario, 2009. [Google Scholar]
[46].Zagoruyko S, Komodakis N, Wide residual networks, arXiv preprint arXiv:1605.07146 (2016). [Google Scholar]
[47].Netzer Y, Wang T, et al. , Reading digits in natural images with unsupervised feature learning, in: The Conference and Workshop on Neural Information Processing Systems Workshop on Deep Learning and Unsupervised Feature Learning, 2011. [Google Scholar]
[48].He K, Zhang X, et al. , Deep residual learning for image recognition, in: The IEEE / CVF Computer Vision and Pattern Recognition Conference, 2016. [Google Scholar]
[49].Chrabaszcz P, Loshchilov I, Hutter F, A downsampled variant of imagenet as an alternative to the cifar datasets, arXiv preprint arXiv:1707.08819 (2017). [Google Scholar]
[50].Ding GW, Wang L, et al. , Advertorch v0. 1: An adversarial robustness toolbox based on pytorch, preprint arXiv:1902.07623 (2019). [Google Scholar]
[51].Praveena DM, Sarah DA, George ST, Deep learning techniques for eeg signal applications – a review, IETE Journal of Research 68 (4) (2022) 3030–3037. doi: 10.1080/03772063.2020.1749143. [DOI] [Google Scholar]
[52].Hong S, Zhou Y, Shang J, Xiao C, Sun J, Opportunities and challenges of deep learning methods for electrocardiogram data: A systematic review, Computers in Biology and Medicine 122 (2020) 103801. doi: 10.1016/j.compbiomed.2020.103801. [DOI] [PubMed] [Google Scholar]
[53].Han X, Hu Y, Foschini L, Chinitz L, Jankelson L, Ranganath R, Deep learning models for electrocardiograms are susceptible to adversarial attack, Nature medicine 26 (3) (2020) 360–363. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R1] [1].Wang J, Wang C, Lin Q, Luo C, Wu C, Li J, Adversarial attacks and defenses in deep learning for image recognition: A survey, Neurocomputing (2022). [Google Scholar]

[R2] [2].Mi J-X, Wang X-D, Zhou L-F, Cheng K, Adversarial examples based on object detection tasks: A survey, Neurocomputing (2022). [Google Scholar]

[R3] [3].Qiu S, Liu Q, Zhou S, Huang W, Adversarial attack and defense technologies in natural language processing: A survey, Neurocomputing 492 (2022) 278–307. [Google Scholar]

[R4] [4].Szegedy C, Zaremba W, et al. , Intriguing properties of neural networks, in: The International Conference on Learning Representations, 2014. [Google Scholar]

[R5] [5].Goodfellow I, Shlens J, et al. , Explaining and harnessing adversarial examples, in: The International Conference on Learning Representations, 2015. [Google Scholar]

[R6] [6].Suttapak W, Zhang J, Zhang L, Diminishing-feature attack: The adversarial infiltration on visual tracking, Neurocomputing 509 (2022) 21–33. [Google Scholar]

[R7] [7].Kwon H, Lee S, Friend-guard adversarial noise designed for electroencephalogram-based brain–computer interface spellers, Neurocomputing 506 (2022) 184–195. [Google Scholar]

[R8] [8].Ma L, Liang L, A regularization method to improve adversarial robustness of neural networks for ecg signal classification, Computers in Biology and Medicine 144 (2022) 105345. [DOI] [PubMed] [Google Scholar]

[R9] [9].Kim H, Lee C, Upcycling adversarial attacks for infrared object detection, Neurocomputing 482 (2022) 1–13. [Google Scholar]

[R10] [10].Eykholt K, Evtimov I, Fernandes E, Li B, Rahmati A, Xiao C, Prakash A, Kohno T, Song D, Robust physical-world attacks on deep learning visual classification, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 1625–1634. [Google Scholar]

[R11] [11].Carlini N, Wagner D, Audio adversarial examples: Targeted attacks on speech-to-text, in: 2018 IEEE security and privacy workshops (SPW), IEEE, 2018, pp. 1–7. [Google Scholar]

[R12] [12].Li J, Ji S, Du T, Li B, Wang T, Textbugger: Generating adversarial text against real-world applications, in: 26th Annual Network and Distributed System Security Symposium, 2019. [Google Scholar]

[R13] [13].Liu X, Zhang J, Lin Y, Li H, Atmpa: attacking machine learning-based malware visualization detection methods via adversarial examples, in: 2019 IEEE/ACM 27th International Symposium on Quality of Service (IWQoS), IEEE, 2019, pp. 1–10. [Google Scholar]

[R14] [14].Madry A, Makelov A, et al. , Towards deep learning models resistant to adversarial attacks, in: The International Conference on Learning Representations, 2018. [Google Scholar]

[R15] [15].Athalye A, Carlini N, Wagner D, Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples, in: International conference on machine learning, PMLR, 2018, pp. 274–283. [Google Scholar]

[R16] [16].Croce F, Hein M, Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks, in: The International Conference on Machine Learning, PMLR, 2020, pp. 2206–2216. [Google Scholar]

[R17] [17].Paknezhad M, Ngo CP, Winarto AA, Cheong A, Beh CY, Wu J, Lee HK, Explaining adversarial vulnerability with a data sparsity hypothesis, Neurocomputing (2022). [Google Scholar]

[R18] [18].Zhang H, Yu Y, othersl, Theoretically principled trade-off between robustness and accuracy, in: The International Conference on Machine Learning, 2019. [Google Scholar]

[R19] [19].Crecchi F, Melis M, Sotgiu A, Bacciu D, Biggio B, Fader: Fast adversarial example rejection, Neurocomputing 470 (2022) 257–268. [Google Scholar]

[R20] [20].Yin S.-l., Zhang X.-l., Zuo L.-y., Defending against adversarial attacks using spherical sampling-based variational auto-encoder, Neurocomputing 478 (2022) 1–10. [Google Scholar]

[R21] [21].Oneto L, Ridella S, Anguita D, The benefits of adversarial defense in generalization, Neurocomputing 505 (2022) 125–141. [Google Scholar]

[R22] [22].Lust J, Condurache AP, Efficient detection of adversarial, out-of-distribution and other misclassified samples, Neurocomputing 470 (2022) 335–343. [Google Scholar]

[R23] [23].Kurakin A, Goodfellow I, et al. , Adversarial examples in the physical world, in: Artificial intelligence safety and security, 2018. [Google Scholar]

[R24] [24].Wang Y, Zou D, et al. , Improving adversarial robustness requires revisiting misclassified examples, in: The International Conference on Learning Representations, 2019. [Google Scholar]

[R25] [25].Wang Y, Ma X, et al. , On the convergence and robustness of adversarial training, in: The International Conference on Machine Learning, 2019. [Google Scholar]

[R26] [26].Sitawarin C, Chakraborty S, et al. , Sat: Improving adversarial training via curriculum-based loss smoothing, in: The 14th ACM Workshop on Artificial Intelligence and Security, 2020. [Google Scholar]

[R27] [27].Cai Q-Z, Liu C, et al. , Curriculum adversarial training, in: International Joint Conferences on Artificial Intelligence, 2018. [Google Scholar]

[R28] [28].Balaji Y, Goldstein T, et al. , Instance adaptive adversarial training: Improved accuracy tradeoffs in neural nets, preprint arXiv:1910.08051 (2019). [Google Scholar]

[R29] [29].Zhang J, Xu X, et al. , Attacks which do not kill training make adversarial learning stronger, in: The International Conference on Machine Learning, 2020. [Google Scholar]

[R30] [30].Ding GW, Sharma Y, et al. , Mma training: Direct input space margin maximization through adversarial training, in: The International Conference on Learning Representations, 2019. [Google Scholar]

[R31] [31].Baytaş İnci M., Deb D, Robustness-via-synthesis: Robust training with generative adversarial perturbations, Neurocomputing 516 (2023) 49–60. [Google Scholar]

[R32] [32].Wang Y, Zhang W, Shen T, Yu H, Wang F-Y, Binary thresholding defense against adversarial attacks, Neurocomputing 445 (2021) 61–71. [Google Scholar]

[R33] [33].Cui J, Liu S, Wang L, Jia J, Learnable boundary guided adversarial training, in: The IEEE/CVF International Conference on Computer Vision, 2021, pp. 15721–15730. [Google Scholar]

[R34] [34].Zhang J, Zhu J, et al. , Geometry-aware instance-reweighted adversarial training, in: The International Conference on Learning Representations, 2020. [Google Scholar]

[R35] [35].Dong Y, Xu K, Yang X, Pang T, Deng Z, Su H, Zhu J, Exploring memorization in adversarial training, in: The International Conference on Learning Representations, 2022. [Google Scholar]

[R36] [36].He L, Ai Q, Lei Y, Pan L, Ren Y, Xu Z, Edge enhancement improves adversarial robustness in image classification, Neurocomputing (2022). [Google Scholar]

[R37] [37].Yu X, Smedemark-Margulies N, Aeron S, Koike-Akino T, Moulin P, Brand M, Parsons K, Wang Y, Improving adversarial robustness by learning shared information, Pattern Recognition 134 (2023) 109054. doi: 10.1016/j.patcog.2022.109054. [DOI] [Google Scholar]

[R38] [38].Jia X, Zhang Y, Wu B, Ma K, Wang J, Cao X, Las-at: Adversarial training with learnable attack strategy, in: The IEEE / CVF Computer Vision and Pattern Recognition Conference, 2022, pp. 13398–13408. [Google Scholar]

[R39] [39].Li T, Wu Y, Chen S, Fang K, Huang X, Subspace adversarial training, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 13409–13418. [Google Scholar]

[R40] [40].Li Y, Yang Z, Wang Y, Xu C, Neural architecture dilation for adversarial robustness, in: Ranzato M, Beygelzimer A, Dauphin Y, Liang P, Vaughan JW (Eds.), Advances in Neural Information Processing Systems, Vol. 34, Curran Associates, Inc., 2021, pp. 29578–29589. [Google Scholar]

[R41] [41].Dong M, Chen X, Wang Y, Xu C, Random normalization aggregation for adversarial defense, in: Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A (Eds.), Advances in Neural Information Processing Systems, Vol. 35, Curran Associates, Inc., 2022, pp. 33676–33688. [Google Scholar]

[R42] [42].Cortes C, Vapnik V, Support-vector networks, Machine learning (1995). [Google Scholar]

[R43] [43].Lu Y, Lu J, A universal approximation theorem of deep neural networks for expressing probability distributions, in: Larochelle H, Ranzato M, Hadsell R, Balcan M, Lin H (Eds.), Advances in Neural Information Processing Systems, Vol. 33, Curran Associates, Inc., 2020, pp. 3094–3105. [Google Scholar]

[R44] [44].Paszke A, et al. , Pytorch: An imperative style, high-performance deep learning library, in: Advances in Neural Information Processing Systems, 2019. [Google Scholar]

[R45] [45].Krizhevsky A, Hinton G, Learning multiple layers of features from tiny images, in: Technical report, University of Toronto, Toronto, Ontario, 2009. [Google Scholar]

[R46] [46].Zagoruyko S, Komodakis N, Wide residual networks, arXiv preprint arXiv:1605.07146 (2016). [Google Scholar]

[R47] [47].Netzer Y, Wang T, et al. , Reading digits in natural images with unsupervised feature learning, in: The Conference and Workshop on Neural Information Processing Systems Workshop on Deep Learning and Unsupervised Feature Learning, 2011. [Google Scholar]

[R48] [48].He K, Zhang X, et al. , Deep residual learning for image recognition, in: The IEEE / CVF Computer Vision and Pattern Recognition Conference, 2016. [Google Scholar]

[R49] [49].Chrabaszcz P, Loshchilov I, Hutter F, A downsampled variant of imagenet as an alternative to the cifar datasets, arXiv preprint arXiv:1707.08819 (2017). [Google Scholar]

[R50] [50].Ding GW, Wang L, et al. , Advertorch v0. 1: An adversarial robustness toolbox based on pytorch, preprint arXiv:1902.07623 (2019). [Google Scholar]

[R51] [51].Praveena DM, Sarah DA, George ST, Deep learning techniques for eeg signal applications – a review, IETE Journal of Research 68 (4) (2022) 3030–3037. doi: 10.1080/03772063.2020.1749143. [DOI] [Google Scholar]

[R52] [52].Hong S, Zhou Y, Shang J, Xiao C, Sun J, Opportunities and challenges of deep learning methods for electrocardiogram data: A systematic review, Computers in Biology and Medicine 122 (2020) 103801. doi: 10.1016/j.compbiomed.2020.103801. [DOI] [PubMed] [Google Scholar]

[R53] [53].Han X, Hu Y, Foschini L, Chinitz L, Jankelson L, Ranganath R, Deep learning models for electrocardiograms are susceptible to adversarial attack, Nature medicine 26 (3) (2020) 360–363. [DOI] [PMC free article] [PubMed] [Google Scholar]

PERMALINK

Improving Adversarial Robustness of Deep Neural Networks via Adaptive Margin Evolution

Linhai Ma

Liang Liang

Abstract

1. Introduction

2. Related Work

2.1. Glossary

Table 1:

2.2. Vanilla Adversarial Training

Figure 1:

2.3. The Optimality of Adversarial Training Samples

Figure 2:

2.4. The Margin Overestimation Problem

Figure 4:

3. Method

3.1. Adaptive Margin Evolution (AME) and An Optimal State

Figure 3:

3.2. The Algorithms of Our AME Method

3.2.1. Self-adaptive Step Size for Margin Expansion

Initial step size:

Step size adjustment using sample margin:

Step size adjustment using gradient information:

Figure 5:

3.2.2. Algorithms

Figure 6:

4. Experiments

4.1. Environment Settings

4.2. Comparing with Related Work

Table 2:

Settings for our AME:

Settings for AT [14]:

Settings for MMA [30]:

Settings for FAT [29], TE [35], TRADES [18], GAIRAT [34] and EE [36]:

4.3. CIFAR10 Experiment Settings

4.4. SVHN Experiment Settings

4.5. Tiny ImageNet Experiment Settings

4.6. Adversarial Attacks for Method Evaluation

4.7. Results

Table 3:

Table 4:

Table 5:

Table 6:

Table 7:

Table 8:

4.8. Discussion

5. Conclusion

Acknowledgement

Appendix A. Additional Supportive Results

Appendix A.1. Margin Overestimation

Figure A.7:

Appendix A.2. Margin Distributions of Adversarially-trained Models

Figure A.8:

Figure A.9:

Appendix B. The Value of ρ

Appendix C. Evaluation with More Adversarial Attacks

Table C.9:

Table C.10:

Table C.11:

Footnotes

References

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases

Appendix B. The Value of $ρ$