. 2022 Dec 9:1–16. Online ahead of print. doi: 10.1007/s00521-022-08029-z

Threat Object-based anomaly detection in X-ray images using GAN-based ensembles

Shreyas Kolte 1, Neelanjan Bhowmik 2, Dhiraj 3
PMCID: PMC9734403  PMID: 36532881

Abstract

Detecting dangerous or prohibited objects in luggage is a critical step in security screening at airports, banks, government buildings, etc. At present, the most common approaches either apply intelligent data-analysis algorithms, such as deep learning, to X-ray imaging, or employ a human workforce to infer the presence of threat objects in the obtained X-ray images. A major challenge for deep-learning methods is the scarcity, in practical scenarios, of high-quality threat images containing the "dangerous" objects of interest relative to non-threat images. To tackle this data-scarcity problem, anomaly detection techniques trained only on normal data samples have shown great promise, and among the available deep-learning strategies for anomaly detection in computer vision, generative adversarial networks have achieved state-of-the-art results. Building on these insights, we adopted the recently proposed Skip-GANomaly architecture and devised a modified version using a UNet++-style generator, which outperformed Skip-GANomaly, achieving an AUC of 94.94% on Compass-XP, a public X-ray dataset. Finally, targeting better latent-space exploration, we combine the two architectures into an ensemble, which gives a further boost in performance, reaching an AUC of 96.8% on the same dataset. To further validate the effectiveness of the ensemble-based architecture, we tested it on patch-based training data derived from a subset of randomly chosen images of another large public X-ray dataset, SIXray, obtaining an AUC of 75.3% on this reduced dataset.
To demonstrate the prowess of the discriminator and to bring some explainability to the working of our ensemble, we used Uniform Manifold Approximation and Projection to plot the latent-space vectors of the dangerous and non-dangerous objects in the test set; this analysis indicates that the ensemble learns better features for separating the anomalous class from the non-anomalous class than the individual architectures do. Thus, our proposed architecture provides state-of-the-art results for threat object detection. Most importantly, our models are able to detect threat objects without ever being trained on images containing them.

Keywords: Generative adversarial networks (GANs), Anomaly detection, GANomaly, Skip-GANomaly, Ensemble of GANs, X-ray images, Threat-object detection, Compass-XP, SixRay

Introduction

Anomaly detection poses the following problem: given a set of data points/observations in a vector space, the task is to detect the outlier points, i.e., those points which, under some metric, deviate from "normal" behavior. Several traditional techniques have been employed for anomaly detection, ranging from simple statistical approaches such as the median of the data points or trimmed-mean methodologies, to density-based techniques, one-class Support Vector Machines, etc. However, most of these are either purely unsupervised or classical machine-learning techniques that do not involve deep learning or other sophisticated methodologies with inherent feature-extraction capabilities.

The two major areas where anomaly detection in computer vision is required are security applications and industrial applications. In security-related applications, the task is to detect threat objects in luggage carried through airports and other offices, firms, and institutions, whereas in the industrial domain, the task is to detect damaged pieces or incorrectly manufactured goods in factories.

In the field of security applications, numerous methods have been developed involving Convolutional Neural Networks, which mainly treat the problem as image classification with a labeled training set consisting of images of both threat and non-threat objects. A major drawback of these models is the scarcity of threat-object instances in the available datasets: bags/luggage containing such threat objects are extremely rare compared to those which do not contain them. Moreover, supervised deep-learning methods need a large amount of class-balanced labeled data in order to train effectively.

To handle this non-availability of threat data, deep neural networks such as Generative Adversarial Network (GAN) [1]-based methodologies have been developed. These make it possible to define a training set using only healthy samples, i.e., benign (non-threat) object instances, so that the model learns the feature space of normal data; any threat object encountered in the test data is then treated by the model as an anomaly. We discuss GANs and the anomaly detection methods based on them in the following section.

The major contributions of this paper are the following:-

Development of a modified architecture based on existing GAN-based anomaly detection architecture for detecting X-Ray images containing threat objects in airport security.

Development of an Ensemble architecture for detecting X-Ray images containing threat objects in airport security. The developed ensemble-based architecture achieves a state-of-the-art result on the Compass-XP dataset, nearly equaling the performance of human annotators, and also proves effective in detecting threat instances on the much more complex SIXray dataset.

The paper is organized into the following sections:-

Section 2 describes the related work literature on threat detection for detecting anomalies in X-Ray Images.

Section 3 describes the 2 datasets used for testing some of the existing methods and our proposed methods.

Section 4 describes existing methodologies and how we developed new architectures based on these methodologies.

Section 5 describes the results of the developed models on two security datasets and their discussions.

Section 6 consists of the conclusion of the paper along with ideas for future work.

Related work

Generative Adversarial Networks (GANs) [1], introduced in 2014, follow this basic paradigm: there are two networks, a Generator and a Discriminator. The Generator is trained to generate data that mimics real data (e.g., images, text, etc.), while the Discriminator is trained to distinguish between data generated by the Generator and real data. The two networks compete against each other during training (hence the term "Adversarial Networks"). The Generator tries to maximize a certain metric of similarity between the data it generates and the real data, while the Discriminator tries to minimize a loss function based on how well it distinguishes real data from data produced by the Generator. Also worth noting is that the Generator receives as input a vector z drawn from a vector space known as the Latent Space and generates data from this vector, while the Discriminator receives the particular instances of real and Generator-generated data. Hence, when the Generator's training is complete, the latent space encodes a distribution of data close to the distribution of the real data that was fed to the Discriminator. All of these features of GANs can be exploited in several ways to suit a particular application. Equation (1) shows the objective of the GAN during training.

\min_G \max_D \big( \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))] \big) \qquad (1)
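The two expectation terms in Eq. (1) can be illustrated with a minimal NumPy sketch: a Monte-Carlo estimate of the value function over a toy batch of discriminator outputs (`gan_value` is a hypothetical helper, not from the paper).

```python
import numpy as np

def gan_value(d_real, d_fake, eps=1e-12):
    """Monte-Carlo estimate of the GAN value function in Eq. (1).

    d_real: discriminator outputs D(x) on real samples, each in (0, 1)
    d_fake: discriminator outputs D(G(z)) on generated samples, each in (0, 1)
    The discriminator ascends this value; the generator descends it.
    """
    d_real = np.clip(d_real, eps, 1 - eps)  # guard against log(0)
    d_fake = np.clip(d_fake, eps, 1 - eps)
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# A confident discriminator (D(x) near 1, D(G(z)) near 0) drives the value
# toward 0, while a fully fooled one (both outputs 0.5) gives 2*log(0.5).
```

At the theoretical equilibrium, where the Generator matches the data distribution, the Discriminator outputs 0.5 everywhere and the value settles at 2·log(0.5) ≈ −1.386.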

In 2017, the model AnoGAN [2] was introduced as one of the first attempts to use GANs for anomaly detection in image data. The basic methodology is the following: train the GAN on "healthy," i.e., non-anomalous images. When an image is fed to the network, the first step is to find the best possible encoding for the image in the latent space using an iterative algorithm. Once the encoding is found, the Generator reconstructs an image from it. Finally, the generated image and the original input image are passed to the Discriminator, which tries to distinguish between the two; note also that the Discriminator is trained to maximize a certain "difference score" between the original and the generated image. The training aims to make the Generator produce images that are indistinguishable from actual healthy images (i.e., the Discriminator should fail to distinguish the produced images from actual ones). Thus, the Generator learns the distribution of "healthy" images, and this is encoded in the latent space. Now, when an unseen image is fed to the model, it is encoded into the latent space and reconstructed as if it belonged to the distribution of "healthy" (non-anomalous) images. The Discriminator then receives both images and computes the "difference score" between them. If the score exceeds a certain threshold, the image is classified as anomalous, on the grounds that it deviates significantly from the distribution of "healthy" images learned by the Generator. The Generator is a DC-GAN decoder and the Discriminator is a convolutional classifier.
The technical goal for the model is to learn the distribution X of the "healthy" images: given a query image x, the task is to find the latent vector z in the latent space Z that, when fed to the Generator, produces the image most similar to x. The algorithm optimizes (using gradient descent) over K steps, iterating over k = 1, 2, …, K, to find the best-generated image (by finding the best latent-space vector as described above). The best generated image minimizes the residual loss given in Eq. (2):

\mathcal{L}_R(z_k) = \sum \left| x - G(z_k) \right| \qquad (2)
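The per-image latent search AnoGAN performs can be sketched as follows, using a toy linear map as a stand-in for a trained generator (the helper names and the L1 subgradient step are illustrative assumptions, not the paper's or AnoGAN's implementation).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a trained generator: a fixed linear map W,
# so G(z) = W @ z. A real AnoGAN generator is a deep network, but the
# iterative search over the latent space works the same way.
W = rng.normal(size=(64, 8))

def G(z):
    return W @ z

def residual_loss(x, z):
    # Eq. (2): pixel-wise L1 difference between the query image and G(z)
    return np.sum(np.abs(x - G(z)))

def latent_search(x, steps=200, lr=0.01):
    """Gradient-descent search for the latent vector whose generation best
    matches x -- the slow, per-image loop that AnoGAN relies on."""
    z = rng.normal(size=W.shape[1])
    for _ in range(steps):
        grad = -W.T @ np.sign(x - G(z))   # subgradient of the L1 residual
        z -= lr * grad
    return z

x = G(rng.normal(size=8))   # an in-distribution "healthy image"
z_best = latent_search(x)   # residual_loss(x, z_best) is now small
```

Because this loop runs afresh at inference time for every query image, it dominates AnoGAN's runtime, which is exactly the bottleneck addressed next.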

An important point worth noting here is that the method used for finding the encoding, the iterative method, was slow. This greatly increased the inference time for determining whether an image is anomalous, and it led to the introduction of f-AnoGAN, a more efficient architecture that reduced inference time by upgrading the DC-GAN in AnoGAN to a Wasserstein-GAN and adding a convolutional auto-encoder to map the image to the latent space (i.e., the iterative process is replaced by a learned mapping).

The next methodology introduced was EGBAD (Efficient GAN-Based Anomaly Detection) [3], a BiGAN [4]-based architecture that eliminated the iterative process and replaced it with an encoder, which maps images to vectors in a latent space, and a decoder, which reconstructs the image. The Discriminator receives as input two vectors Z1 and Z2 from the latent space: Z1 corresponds to the encoding of the real image and Z2 to that of the Generator-generated image. The Discriminator does not have this information and is thus expected to determine which vector corresponds to which image. The most important advantage of EGBAD over AnoGAN was the huge improvement in inference time.

Following this, GANomaly [5] was introduced. This method consists of a GAN along with an adversarial auto-encoder and has a pipeline similar to that of EGBAD. An image passes through the encoder-decoder architecture, producing its latent-space vector Z along with the reconstructed image. Then, an adversarial auto-encoder produces the latent-space vector of the reconstructed image. These outputs are used in training the Generator (the encoder-decoder network) as well as the Discriminator (which is the same as that of DCGAN [6]). When tested on datasets like MNIST and CIFAR-10, the model produced better results than both AnoGAN and EGBAD, while also having a lower inference time than EGBAD and a much lower one than AnoGAN. Later, Skip-GANomaly [7] was introduced. This model removed the extra adversarial encoder present in GANomaly and instead used a U-Net [8]-style Generator with skip-connections between its encoder and decoder, again with the same DCGAN-based Discriminator. The anomaly score introduced by Skip-GANomaly is shown in Eq. (3):

A = \lambda \cdot \mathrm{Rec} + (1 - \lambda) \cdot \mathrm{Lat} \qquad (3)

Here, Rec stands for the reconstruction score, which is simply a pixel-by-pixel difference between the original (input) image of the generator and the reconstructed image, while Lat is a value derived from the difference between the latent-space vectors of the two images. The parameter λ was determined experimentally; the best value found was λ = 0.9. The final prediction of whether an object is anomalous is made by finding the threshold that gives the largest area under the ROC curve.
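Under the simplifying assumption that Rec is a mean absolute pixel difference and Lat a mean squared latent difference (the exact norms are not spelled out above, so these are illustrative choices), Eq. (3) can be computed as:

```python
import numpy as np

def anomaly_score(x, x_rec, z, z_rec, lam=0.9):
    """Skip-GANomaly-style anomaly score from Eq. (3).

    Rec: mean absolute pixel difference between input and reconstruction.
    Lat: mean squared difference between their latent-space vectors.
    lam=0.9 is the value reported above as best.
    """
    rec = np.mean(np.abs(x - x_rec))
    lat = np.mean((z - z_rec) ** 2)
    return lam * rec + (1.0 - lam) * lat
```

An image is then flagged as anomalous when its score exceeds the threshold swept from the ROC curve on a validation set.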

Other methods that tackled related problems include ComboGan_Xray [9] and Ensemble GAN models [10] for anomaly detection. ComboGan_Xray [9] worked on different combinations of generators and discriminators from different GAN architectures, e.g., an auto-encoder acting as the Generator plus the Discriminator of DCGAN (dubbed the "AE+DCGAN" network), and a network with the Generator of BiGAN and the Discriminator of α-GAN (dubbed the "BiGAN + α-GAN" network). These networks were tried on X-ray images of human hands for anomaly detection and yielded better results than vanilla GANs (i.e., GANs of a pure architecture rather than combinations of different architectures). Subsequently, ensemble_GAN [10] introduced the methodology of training and combining multiple GANs of the same architecture for anomaly detection.

MRI-GAN [11] and 3D-MRIGAN [12] focused on anomaly detection in MRI and 3D MRI images and were based on AnoGAN and f-AnoGAN, respectively. More recently, Jensen et al. [13] developed transfer-learning strategies using pre-trained convolutional neural networks for feature extraction from a custom-made dataset of 16-bit and 8-bit fuel-cell X-ray images, for the detection of 11 classes of anomalies. They used balanced accuracy as the model-evaluation metric.

Among the most recent methods involving GANs or other forms of adversarial networks for anomaly detection are RANDGAN [14], WeaklyAD [15], and AnoSeg [16]. RANDGAN was implemented for the detection of COVID-19 from chest X-ray images, with the unknown (anomalous) class being the COVID-19 images, and used transfer-learning-based image segmentation as a pre-processing step. WeaklyAD is a spectral-constrained GAN for hyperspectral anomaly detection, i.e., the model is trained to detect anomalies in an image by generating images with a homogenized background (non-anomalous class) and salient anomalies. AnoSeg focused on the task of anomaly segmentation for detecting defects in large-scale industrial manufacturing processes by combining three novel techniques: hard augmentation, self-supervised learning, and pixel-wise adversarial losses.

Datasets

The primary aim of adopting threat detection algorithms is to raise an alarm on every instance of a threat item in the scanned image data, and the performance of such algorithms depends largely on the quality of the data used for their training. However, the availability of threat-item instances is very limited, since they occur far less often in normal scenes than benign (non-dangerous) items do. Training a threat detection algorithm on such a skewed dataset, with few threat instances and many normal ones, may therefore lead to poor detection performance or suffer from the biasing effect of the abundant class. To deal with this skewness, threat-item detection algorithms can instead be trained using only normal data instances, so that any threat-item instance is thereafter treated as an anomaly by the trained model. We explore two public datasets that contain sufficient normal object instances, allowing the model to be trained on normal samples only and evaluated afterwards on instances of the threat-item class.

Compass-XP

We used the COMPASS-XP dataset [17], which consists of X-ray images of luggage bags with one item per bag. There are 6 image types in the dataset: original photograph, gray-scale, false duo-color, and three other types based on different densities. In total, there are 1901 unique images for each of the 6 types (11,406 images overall). Each image belongs to a particular class and is labeled Dangerous or Non-dangerous based on whether that class of objects is prohibited from being carried on airplanes, etc. There are 334 non-dangerous classes (e.g., cardigans and other clothes, torches) and 35 dangerous classes (e.g., lighters, knives). There are 258 "Dangerous" images in total and 1643 "Non-Dangerous" images. The paper [17] that introduced this dataset also tried several methods for detecting and classifying dangerous objects in it, including several image-processing and machine-learning methods. Their best result was a median AUC of 81–83%, achieved using deep-learning methods (Convolutional Neural Networks for binary classification).

It is also worth noting that [17] divided the Compass-XP dataset into a 4:1 training/testing ratio, with both sets containing dangerous and non-dangerous objects, when training the deep-learning models that achieved their best results. Human experts were also tested on the dataset, and their predictions achieved an average AUC of 97%. Some sample images belonging to the normal (i.e., non-dangerous) classes from this dataset are shown in Fig. 1.

Fig. 1.

Fig. 1

Nonthreat category Sample Images from the CompassXP dataset

SIXray

The SIXray [18] dataset contains 1,059,231 X-ray images covering six common categories of prohibited items, namely gun, knife, wrench, pliers, scissors, and hammer. Unlike the Compass-XP dataset, the SIXray dataset has large images with multiple objects present in a single bag.

Due to computational constraints in terms of available GPU memory, we selected a random subset of non-threat images and images containing threat objects from the dataset. We then generated a dataset by extracting random patches of size 256 × 256 from these images, producing a total of 90k non-threat patches and 10k patches containing threat objects. The 90k non-threat images were divided into subsets of 80k and 10k for the training and test sets, respectively. Thus, our training set consisted of 80k non-threat images, and the test set consisted of 10k threat and 10k non-threat images. We used this dataset for training all the methodologies, i.e., Skip-GANomaly, modified Skip-GANomaly, and their ensemble-based architecture. Some sample images belonging to the normal (i.e., non-dangerous) classes from this dataset are shown in Fig. 2.
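The patch-generation step described above can be sketched roughly as follows (`random_patches` is a hypothetical helper; the exact sampling procedure used for the paper may differ).

```python
import numpy as np

def random_patches(image, n, size=256, rng=None):
    """Extract n random size x size patches from an H x W x C X-ray image.
    Images smaller than `size` in either dimension would need padding first;
    SIXray images are assumed larger than 256 px here."""
    rng = rng if rng is not None else np.random.default_rng()
    h, w = image.shape[:2]
    patches = []
    for _ in range(n):
        top = rng.integers(0, h - size + 1)    # random top-left corner
        left = rng.integers(0, w - size + 1)
        patches.append(image[top:top + size, left:left + size])
    return np.stack(patches)

img = np.zeros((600, 900, 3), dtype=np.uint8)  # placeholder X-ray image
batch = random_patches(img, n=4)               # shape (4, 256, 256, 3)
```

Sampling patches rather than resizing whole bags keeps object details at native resolution, at the cost of some patches containing only background.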

Fig. 2.

Fig. 2

Nonthreat Patch Image Samples from the SIXray dataset

Methodologies

Among the state-of-the-art implementations of GANs for anomaly detection, we narrowed our initial studies down to two architectures: GANomaly [5] and Skip-GANomaly [7]. After some initial comparative experimentation with these methods (described in Sect. 4.2), we decided to move forward with Skip-GANomaly as the base model for developing a modified version with improved performance. Sections 4.3, 4.4, and 4.5 describe the Skip-GANomaly method, the modified Skip-GANomaly (proposed), and an ensemble network of these two methodologies, respectively. Before that, we describe our training conditions on the two datasets (Compass-XP and SIXray) in Sect. 4.1.

Approaches and Dataset Details

Owing to the adopted training methodology, the models needed no "dangerous" (positive) class images for training. This has two benefits: first, it removes the requirement of obtaining training samples for all classes, which is very difficult in this case; second, the model training benefits fully from the intraclass variance in the samples of the majority class.

Compass-XP dataset

Our training conditions for the Compass-XP dataset were as follows. The test set consisted of all 258 "Dangerous" images along with 258 "Non-Dangerous" images, one from each of 258 randomly chosen "Non-Dangerous" classes. All remaining Non-Dangerous images were put into the training set. Since the training set was not significantly larger than the test set, we augmented it with transformations such as horizontal and vertical flips and rotation; this not only enlarged the training set but also removed the bias created by the higher representation of some classes relative to others. We thereby increased the training set from 1643 unique images to about 10,500 images.
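One simple way to realise the flip/rotation augmentation described above, and roughly the eight-fold enlargement it produces, is the dihedral set of four rotations with and without a horizontal flip. This is an illustrative sketch, not necessarily the exact transform set used:

```python
import numpy as np

def eightfold_augment(image):
    """Return the eight dihedral variants of an image: 4 right-angle
    rotations of both the original and its horizontal flip. For an
    asymmetric image, all eight variants are distinct."""
    variants = []
    for base in (image, np.fliplr(image)):
        for k in range(4):                  # 0, 90, 180, 270 degrees
            variants.append(np.rot90(base, k))
    return variants
```

Because the eight variants cover every axis-aligned symmetry, this also evens out orientation bias: no single orientation of an object dominates the training set.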

SIXRay dataset

Since the SIXray dataset was already large compared to Compass-XP, no augmentation was needed for the training set. It consists of 100k images, of which 10k belong to the dangerous categories (i.e., they contain an object from one of the six dangerous categories mentioned in Sect. 3.2) and the remaining 90k are non-threat images. A random subset of 10k images, matching the number of threat images, was chosen from these 90k to form part of the test set, and the remaining 80k images served as the training set.

Initial experiments

We experimented with the two most recently introduced models, GANomaly and Skip-GANomaly, for the detection of dangerous objects in X-ray images. To compare the two, we performed initial experiments in which GANomaly and Skip-GANomaly were trained and tested with different batch sizes (batch normalization was not used; the batch sizes were varied simply to match computing-environment constraints) and different image sizes (all input images were resized to the given size); the results are summarized in Table 1. To find the best-suited batch size, we trained the models for a fixed number of epochs in all experiments. The values in Table 1 represent the AUC of the ROC.

Table 1.

Initial experiments for performance comparison of GANomaly and Skip-GANomaly

Image size, batch size GANomaly AUC Skip-GANomaly AUC
32 × 32, 256 0.478 0.487
64 × 64, 64 0.493 0.512
128 × 128, 16 0.538 0.589
256 × 256, 4 0.594 0.710
512 × 512, 1 0.634 0.789

From this, we concluded that Skip-GANomaly performs significantly better than GANomaly on this dataset. We thus dedicated further efforts to training Skip-GANomaly to obtain the best possible results.

Skip-GANomaly (U-Net-based generator)

As stated earlier, Skip-GANomaly has a U-Net-based generator. We trained the model on the dataset for a larger number of epochs (35–40). We again tried various batch sizes and image sizes; the results are summarized in Table 2.

Table 2.

Initial experiments for determination of correct batch-size and image-size combination

Image size, batch size AUC
256 × 256, 8 0.742
256 × 256, 16 0.948
256 × 256, 32 0.715
512 × 512, 8 0.804
512 × 512, 16 0.795

Bold values indicate the highest AUC achieved by a model

As is evident from Table 2, we used two image sizes. These were chosen based on the fact that most of the images (greater than 95%) were in the range of 200–600 pixels in one dimension and 400–900 in the other. Also, since each image contains only a single object, slight reductions in size lead to little or no loss of features. We used different batch sizes (batch normalization was used here) and obtained different results; the best were obtained with an image size of 256 × 256 and a batch size of 16. Whenever an increase in batch size improved the results, we tried increasing it further. We chose an initial batch size of 8 because, when constructing the training set, a large fraction of the images had been augmented 8 times.

Skip-GANomaly with a modified generator

Skip-GANomaly [7], as shown in Fig. 3, uses a U-Net-based generator. The UNet++ [19] architecture is a modified version of U-Net, consisting primarily of multiple nested U-Net models of several sizes created by modifying the skip-pathways. By adopting a UNet++-based generator, we designed a modified Skip-GANomaly architecture, shown in Fig. 4. UNet++ offers two advantages: its foremost aim is to reduce the semantic gap between the encoder and decoder sub-networks of U-Net, and a secondary aim is to allow multiple levels of features to be used for the final prediction, permitting fast inference from the obtained features. Of these, our main motivation for selecting UNet++ to build the modified generator was the first, i.e., to reduce the semantic gap between the features learned by the encoder and decoder sub-networks.
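The nested skip-pathways that distinguish UNet++ from U-Net can be summarised by their wiring alone. The sketch below (plain Python; convolutions and upsampling are omitted) lists which feature maps feed each nested node X(i, j), following the published UNet++ formulation: node X(i, j) fuses the dense same-level skips X(i, 0)…X(i, j−1) with the upsampled output of the deeper node X(i+1, j−1).

```python
def unetpp_inputs(depth):
    """Connectivity of the UNet++ nested skip pathways for a backbone of
    `depth` encoder levels. Node (i, j) is the j-th nested decoder node at
    level i; (i, 0) are the plain encoder/backbone nodes. Returns a dict
    mapping each nested node to the list of nodes whose outputs it fuses.
    A wiring sketch only -- convolution blocks are omitted."""
    inputs = {}
    for i in range(depth):
        for j in range(1, depth - i):
            skips = [(i, k) for k in range(j)]   # same-level dense skips
            up = (i + 1, j - 1)                  # upsampled deeper node
            inputs[(i, j)] = skips + [up]
    return inputs

wiring = unetpp_inputs(depth=4)
# e.g. node (0, 1) fuses the encoder node (0, 0) with upsampled (1, 0),
# exactly the ordinary U-Net skip; nodes with j > 1 add the dense skips.
```

With j = 1 this reduces to the plain U-Net skip connection, which is why U-Net is a special case of the nested family.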

Fig. 3.

Fig. 3

Basic SkipGANomaly architecture [7]

Fig. 4.

Fig. 4

Modified SkipGANomaly architecture

The method achieved a slight improvement over the original Skip-GANomaly. Figure 4 illustrates the modified architecture; in this figure, "x" represents the input image for training/inference, and the italicized "x" is the image reconstructed by the generator.

Ensemble network

The modified generator performed slightly better than the original Skip-GANomaly architecture. To produce even better results, however, we constructed an ensemble of the two architectures for detecting anomalous objects. The ensemble consists of the two generators (that of the original Skip-GANomaly and that of the modified architecture, i.e., the UNet++ generator) and the two discriminators. Each generator can be connected to each discriminator, as can be seen in Fig. 5.

Fig. 5.

Fig. 5

High-level view of the ensemble architecture

We chose to form an ensemble to improve the AUC because of the successful results of [10], where ensembles of a uniform individual architecture were trained and improved results were obtained. Our ensemble, however, consists of two different types of generators, and we first load each component of the ensemble with the pre-trained weights of the individual architectures (hence the need for two discriminators instead of one). Another reason is that when the discriminators receive inputs from both generators, their ability to map images to the latent space is boosted; this is demonstrated by our Uniform Manifold Approximation and Projection (U-MAP) analysis in Sect. 5.

As we initialized the ensemble with the pre-trained weights of the trained individual architectures, we used a smaller training set for Compass-XP (removing all augmented images at 45° rotation) of about 5000 images. During training, the training set is first divided into batches of images. Then, for each batch, a generator and a discriminator are randomly chosen from the two choices available for each, and these chosen networks are trained on the batch. The algorithm for testing is similar. Algorithm 1 describes the training of the ensemble.
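The per-batch random pairing described above can be sketched as follows (`train_step` is a placeholder for the usual Skip-GANomaly adversarial update; all names are illustrative, not the paper's code).

```python
import random

def train_ensemble(generators, discriminators, batches, train_step, seed=0):
    """For every batch, pick one generator and one discriminator uniformly
    at random and run a normal adversarial update on that pair, as in the
    ensemble training procedure. Returns the sequence of chosen pairs so
    the pairing behaviour can be inspected."""
    rng = random.Random(seed)
    history = []
    for batch in batches:
        g = rng.choice(generators)
        d = rng.choice(discriminators)
        train_step(g, d, batch)       # one adversarial update on this pair
        history.append((g, d))
    return history

# Toy usage with string stand-ins for the four networks and a no-op update:
pairs = train_ensemble(["G_unet", "G_unet++"], ["D_unet", "D_unet++"],
                       batches=range(100), train_step=lambda g, d, b: None)
```

Over many batches, every generator-discriminator pairing gets sampled, so each discriminator sees reconstructions from both generator styles.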

For testing, the test set remained the same as before. We also reduced the learning rate to 10^-6, so that training would be stable although slow. The model achieved stable performance after 5 epochs of training. The final AUC achieved was 96.8%, significantly better than the modified architecture alone. Algorithm 2 describes the testing method.

The block level representation of the proposed ensemble-based architecture is shown in Fig. 5.

Results and analysis

Following [5, 7, 10], we have also chosen the Area Under the ROC Curve (AUC) as the main evaluation metric for all our methodologies. In addition, to demonstrate the superiority of the features learned by the ensemble, we selected the U-MAP methodology to visually analyze the latent-space vectors computed by the discriminator. Briefly, UMAP is an algorithm for dimension reduction based on manifold-learning techniques and ideas from topological data analysis; it provides a very general framework for approaching manifold learning and dimension reduction and also yields specific concrete realizations. The points plotted in the scatterplot are the arithmetic means of the features of batches of images. The label "0" (blue) indicates anomalous images taken from the test set and the label "1" indicates their corresponding reconstructed images. We chose U-MAP over t-SNE and Principal Component Analysis (PCA) because PCA captures only linear features/components, whereas t-SNE is better suited to visualization than to finding genuinely useful components; U-MAP thus captures better components than t-SNE does.
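The AUC used throughout has a direct probabilistic reading: the probability that a randomly chosen anomalous sample receives a higher anomaly score than a randomly chosen normal one. A small NumPy sketch (an illustrative pairwise implementation, not the evaluation code used for the paper):

```python
import numpy as np

def roc_auc(scores, labels):
    """AUC of the ROC as the probability that a random anomalous sample
    (label 1) outscores a random normal sample (label 0); ties count 0.5.
    O(n_pos * n_neg) pairwise version, fine for evaluation-sized test sets."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

This formulation makes clear why AUC suits the anomaly-score setting: it is threshold-free and insensitive to the class imbalance between threat and non-threat images.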

Summary of results of models on Compass-XP and SIXRay

Table 3 shows the results achieved by the 3 architectures on the 2 datasets. As mentioned earlier, the whole SIXray dataset was not used for training owing to limited computational resources; instead, a subset of 80k randomly chosen non-threat images was taken from the full non-threat portion of SIXray, and patches extracted from these images form the final training data. All three models were then trained on this generated training set.

Table 3.

Final results on the 3 architectures and 2 datasets

Model Compass-XP AUC (%) SIXRay AUC (%)
SkipGANomaly 94.8 66.7
Modified SkipGANomaly 94.94 69.2
Ensemble 96.8 75.3

Bold values indicate the highest AUC achieved by a model

From the above results, we wish to point out the following: 1. The modified Skip-GANomaly architecture performs better than the original on both datasets. We attribute this to the richer features learned by the UNet++-based generator; the effect is especially visible on the more challenging SIXray dataset, where the improvement over the original Skip-GANomaly is more significant.

2. The ensemble performs significantly better than the individual networks. For both datasets, the ensemble brings a significant performance boost, especially on Compass-XP, where it nearly equals the AUC achieved by human annotators (96.8% for the ensemble vs. 97% for humans).

Training plots, UMAP and generated images on Compass-XP

In this section, we display some of the images generated by the 3 architectures that we tried, the training plots of the 3 architectures, and a U-MAP [20] analysis of the feature vectors (i.e., the latent space vectors) of the images taken in batches.

The original SkipGANomaly architecture

Figure 6 shows some of the images generated by the architecture. The images look promising in terms of object similarity to the real ones.

Fig. 6.

Fig. 6

Some sample images generated by the basic Skip-GANomaly architecture

Figure 7 shows the training curve for Skip-GANomaly. The "Best-AUC" curve indicates the best AUC recorded so far.

Fig. 7.

Fig. 7

Training plot for Skip-GANomaly with U-Net-based generator

Figure 8 depicts the UMAP plot for this architecture. Clearly, there is no pattern whatsoever between the anomalous objects’ features and their reconstructions’ features.

Fig. 8.

Fig. 8

UMAP plot on the Compass-XP dataset for the original Skip-GANomaly architecture. The red colored dots (class 0) indicate the nonanomalous (nonthreat) class and the blue colored dots (class 1) indicate the anomalous (threat object) class (colour figure online)

SkipGANomaly with modified generator

Figure 9 shows the training plot for the modified SkipGANomaly architecture. Note that the learning curve is smoother than that of the basic architecture.

Fig. 9.

Fig. 9

Training plot for Skip-GANomaly with modified (Nested-UNet-based) generator

Figure 10 shows some images generated by the modified architecture. The generated images closely resemble real-life objects.

Fig. 10.

Fig. 10

Sample images generated by the modified architecture

Figure 11 shows the UMAP plot for the modified architecture. Again, the points show no discernible pattern.

Fig. 11.

Fig. 11

U-MAP scatter-plot on the Compass-XP dataset for the modified SkipGANomaly architecture. The red colored dots (class 0) indicate the nonanomalous (nonthreat) class and the blue colored dots (class 1) indicate the anomalous (threat object) class

Ensemble network

Figure 12 shows the training plot for the ensemble network. The model reached its peak AUC of 96.8% within 5 epochs, after which performance dropped (the learning algorithm could not find a better solution), so we retained the network weights from this point. (Note that the Y-axis of this plot ranges from 0.86 to 1.)
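The retain-the-best-weights policy described above can be sketched as a small helper; the class name `BestCheckpoint` is ours, and the "state" here stands in for whatever weight snapshot the training framework provides.

```python
class BestCheckpoint:
    """Track the best validation AUC seen so far and snapshot the
    corresponding model state, so training can be rolled back to it."""

    def __init__(self):
        self.best_auc = -1.0
        self.best_state = None

    def update(self, auc, state):
        """Return True (and keep a copy of `state`) iff `auc` improves."""
        if auc > self.best_auc:
            self.best_auc = auc
            self.best_state = dict(state)  # shallow snapshot of weights
            return True
        return False
```

After each validation pass the trainer calls `update(auc, model_state)`; once the AUC stops improving for long enough, `best_state` holds the weights to restore.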

Fig. 12.

Fig. 12

Training plot for the ensemble of the 2 architectures

Figure 13 shows some images generated by the ensemble architecture. The model clearly generates sharper objects, which much more closely resemble their real-life X-ray scans.

Fig. 13.

Fig. 13

Images generated by the ensemble network

Figure 14 shows the UMAP plot of the features extracted per batch by the ensemble architecture. Compared with the plots of the individual architectures, this plot shows very good separation between the reconstructed and original image features. Apart from two misplaced batches each of original and reconstructed features, a clear linear separation exists between the two.

Fig. 14.

Fig. 14

U-MAP Scatter-Plot on the Compass-XP dataset for the ensemble of the 2 architectures. The red colored dots (class 0) indicate the nonanomalous (nonthreat) class and the blue colored dots (class 1) indicate the anomalous (threat object) class (colour figure online)

Training plots and sample generated images for SIXRay dataset

The SIXray dataset is far more challenging than the Compass-XP dataset, for three major reasons:

1. Data size: whereas Compass-XP has close to 2000 unique images, the SIXray subset we considered comprises 80k patch-based training images. These images are also large, so we trained the models on 256 × 256 patches.

2. Multiple objects per bag, as opposed to Compass-XP.

3. Occlusion of objects: it is difficult to extract features of threat objects when they are occluded by translucent or opaque non-threat objects.
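The 256 × 256 patch extraction mentioned in point 1 can be sketched as follows; the paper does not state the tiling stride, so non-overlapping tiles with incomplete border patches discarded are our assumption, as is the function name `extract_patches`.

```python
import numpy as np

def extract_patches(image, size=256, stride=256):
    """Tile an H x W (x C) image into size x size patches.

    With stride == size the tiles are non-overlapping; any incomplete
    patch at the right/bottom border is discarded.
    """
    h, w = image.shape[:2]
    patches = []
    for y in range(0, h - size + 1, stride):
        for x in range(0, w - size + 1, stride):
            patches.append(image[y:y + size, x:x + size])
    return patches
```

Applied to the 80k sampled non-threat images, this tiling yields the final patch-based training set on which all three models were trained.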

Despite these challenges, the models could achieve the following results.

The original SkipGANomaly architecture

Figure 15 shows the training plot for SkipGANomaly on the SIXray dataset.

Fig. 15.

Fig. 15

Training plot for Skip-GANomaly on the SIXray dataset

Figure 16 shows some sample images generated by the SkipGANomaly architecture on the SIXray dataset.

Fig. 16.

Fig. 16

Sample images generated by SkipGANomaly on SIXray

SkipGANomaly with modified generator

Figure 17 shows the training plot for the modified SkipGANomaly architecture on the SIXray dataset.

Fig. 17.

Fig. 17

Training plot for Skip-GANomaly with modified (Nested-UNet-based) generator on the SIXRay dataset

Figure 18 shows some sample images generated by the modified SkipGANomaly architecture on the SIXray dataset.

Fig. 18.

Fig. 18

Sample images generated by modified SkipGANomaly on SIXray

Ensemble network

Figure 19 shows the ensemble’s training plot on the SIXray dataset. The network was trained for 250 epochs; the plot is condensed by taking the best AUC of every group of 20 consecutive epochs. The maximum of 75.3% was reached at epoch 37. A value close to the maximum (73.1%) occurred near epoch 227, but performance then deteriorated and training was stopped.
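The condensation described above (best AUC per group of 20 consecutive epochs) can be sketched in a few lines; the function name `condense_best` is ours.

```python
import numpy as np

def condense_best(values, group=20):
    """Best value in each group of `group` consecutive epochs.

    A trailing partial group is reduced as-is, so a 250-epoch run with
    group=20 yields 13 plotted points instead of 250.
    """
    values = np.asarray(values, dtype=float)
    return [values[i:i + group].max() for i in range(0, len(values), group)]
```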

Fig. 19.

Fig. 19

Training plot for the ensemble architecture on the SIXRay dataset

Figure 20 shows some sample images generated by the ensemble architecture on the SIXray dataset.

Fig. 20.

Fig. 20

Sample images generated by ensemble architecture on SIXray

Figure 21 shows the UMAP plot of the features extracted by the ensemble on the SIXray dataset. The feature vectors form two separate clusters: the blue points (anomalous class) scatter from top-left to bottom-right, and the red points (non-anomalous class) from top-right to bottom-left. The clusters intersect near the centre of the graph and are therefore not linearly separable there, but they are very well separated in the rest of the plot, indicating that our ensemble architecture learns features very well.

Fig. 21.

Fig. 21

U-MAP scatter-plot on SIXRay dataset for the ensemble of the 2 architectures. The red colored dots (class 0) indicate the nonanomalous (nonthreat) class and the blue colored dots (class 1) indicate the anomalous (threat object) class (colour figure online)

Conclusion

We have presented two new approaches to the problem of detecting prohibited items in luggage X-ray images. Experimenting with popular anomaly-detection architectures on X-ray images, we found that SkipGANomaly performs far better than the state-of-the-art results on the Compass-XP dataset, even though our chosen test set is far more challenging. Our first modification to the architecture, a generator modelled after a Nested-UNet-like architecture, achieved better results than the original SkipGANomaly. We then combined the two architectures into an ensemble, which surpassed the performance of the modified architecture, achieving an AUC of 96.8% on the dataset; as stated in the dataset information section, this is nearly as good as the human experts’ predictions (97% on average). The UMAP scatter plot for the ensemble further attests that it learns more useful features for separating non-anomalous objects from anomalous ones. The second experiment was performed on the far more challenging SIXray dataset, where the ensemble-based architecture achieved an AUC of 75.3%. In both case studies, the AUC obtained was greater than that of the state-of-the-art anomaly-detection model, SkipGANomaly, which further validates the improved performance of our ensemble model. Since computational constraints limited our demonstration to a subset of the SIXray dataset, a future objective is to demonstrate the applicability of the ensemble network on the entire SIXray dataset.

Data availability

We, the authors, declare that the datasets generated during and/or analyzed during the current study will be made available from the corresponding author on reasonable request.

Declarations

Conflict of interest

We, the authors, certify that we have NO affiliations with or involvement in any organization or entity with any financial interest (such as honoraria; educational grants; participation in speakers’ bureaus; membership, employment, consultancies, stock ownership, or other equity interest; and expert testimony or patent-licensing arrangements), or non-financial interest (such as personal or professional relationships, affiliations, knowledge, or beliefs) in the subject matter or materials discussed in this manuscript.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1.Goodfellow I et al (2014) Generative adversarial nets. In: NIPS’14: Proceedings of the 27th international conference on neural information processing systems, vol 2, pp 2672–2680. http://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf
  • 2.Schlegl T et al (2017) Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. In: Proceedings of Information Processing in Medical Imaging (IPMI) (2017), lecture notes in computer science, vol 10265, pp 146–157
  • 3.Zenati H et al (2018) Efficient GAN-based anomaly detection. In: ICDM. arXiv:1812.02288
  • 4.Donahue J et al (2017) Adversarial feature learning. In: ICLR. arXiv:1605.09782
  • 5.Akcay S et al (2018) GANomaly: semi-supervised anomaly detection via adversarial training. In: Asian Conference on Computer Vision (ACCV). Lecture notes in computer science, vol 11363, pp 622–637. arXiv:1805.06725v3
  • 6.Radford A et al (2016) Unsupervised representation learning with deep convolutional generative adversarial networks. In: ICLR. arXiv:1511.06434
  • 7.Akcay S et al (2019) Skip-GANomaly: skip connected and adversarially trained encoder-decoder anomaly detection. In: International Joint Conference on Neural Networks (IJCNN). IEEE
  • 8.Ronneberger O et al (2015) U-Net: convolutional networks for biomedical image segmentation. In: International conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), pp 234–241
  • 9.Davletshina D et al (2020) Unsupervised anomaly detection for X-ray images. arXiv:2001.10883
  • 10.Han X et al (2021) GAN ensemble for anomaly detection. In: AAAI. arXiv:2012.07988v1 [cs.LG]. 14 Dec 2020
  • 11.Han C, Rundo L, Murao K, et al. MADGAN: unsupervised medical anomaly detection GAN using multiple adjacent brain MRI slice reconstruction. BMC Bioinform. 2021;22:31. doi: 10.1186/s12859-020-03936-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Bengs M, Behrendt F, Laves M-H, Krüger J, Opfer R, Schlaefer A (2022) Unsupervised anomaly detection in 3D brain MRI using deep learning with multi-task brain age prediction. In: Proceedings of SPIE 12033, Medical Imaging 2022: Computer-Aided Diagnosis, 1203314
  • 13.Jensen S et al (2022) Deep learning-based anomaly detection on X-ray images of fuel cell electrodes. In: Proceedings of the 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (2022)
  • 14.Motamed S, Rogalla P, Randgan KF. Randomized generative adversarial network for detection of COVID-19 in chest X-ray. Sci Rep. 2021;11:8602. doi: 10.1038/s41598-021-87994-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Jiang T, Xie W, Li Y, Lei J, Du Q. Weakly supervised discriminative learning with spectral constrained generative adversarial network for hyperspectral anomaly detection. IEEE Trans Neural Netw Learn Syst. 2021 doi: 10.1109/TNNLS.2021.3082158. [DOI] [PubMed] [Google Scholar]
  • 16.Song J, Kong K, Park Y-I, Kim S-G, Kang S-J (2021) AnoSeg: anomaly segmentation network using self-supervised learning
  • 17.Matthew C, Griffin LD (2019) Limits on transfer learning from photographic image data to X-ray threat detection. J X-ray Sci Technol 27(6):1007–1020 [DOI] [PubMed]
  • 18.Miao C et al (2019) SIXray: a large-scale security inspection X-ray benchmark for prohibited item discovery in overlapping images. In: CVPR (2019)
  • 19.Zhou Z et al (2019) UNet++: a nested U-Net architecture for medical image segmentation. In: D. Stoyanov et al. (eds) DLMIA 2018/ML-CDS 2018, LNCS 11045, pp 3–11. 10.1007/978-3-030-00889-5_1 [DOI] [PMC free article] [PubMed]
  • 20.Leland M et al (2018) UMAP: uniform manifold approximation and projection for dimension reduction. arXiv:1802.03426



Articles from Neural Computing & Applications are provided here courtesy of Nature Publishing Group
