Trustworthy and Reliable Deep Learning-based Cyberattack Detection in Industrial IoT

Fazlullah Khan; Ryan Alturki; Md Arafatur Rehman; Spyridon Mastorakis; Imran Razzak; Syed Tauhidullah Shah

doi:10.1109/tii.2022.3190352

Author manuscript; available in PMC: 2024 Jan 1.

Published in final edited form as: IEEE Trans Industr Inform. 2022 Jul 13;19(1):1030–1038. doi: 10.1109/tii.2022.3190352


PMCID: PMC10353731 NIHMSID: NIHMS1849065 PMID: 37469712

Abstract

A fundamental expectation of the stakeholders from the Industrial Internet of Things (IIoT) is its trustworthiness and sustainability to avoid the loss of human lives in performing a critical task. A trustworthy IIoT-enabled network encompasses fundamental security characteristics such as trust, privacy, security, reliability, resilience and safety. The traditional security mechanisms and procedures are insufficient to protect these networks owing to protocol differences, limited update options, and older adaptations of the security mechanisms. As a result, these networks require novel approaches to increase trust-level and enhance security and privacy mechanisms. Therefore, in this paper, we propose a novel approach to improve the trustworthiness of IIoT-enabled networks. We propose an accurate and reliable supervisory control and data acquisition (SCADA) network-based cyberattack detection in these networks. The proposed scheme combines the deep learning-based Pyramidal Recurrent Units (PRU) and Decision Tree (DT) with SCADA-based IIoT networks. We also use an ensemble-learning method to detect cyberattacks in SCADA-based IIoT networks. The non-linear learning ability of PRU and the ensemble DT address the sensitivity of irrelevant features, allowing high detection rates. The proposed scheme is evaluated on fifteen datasets generated from SCADA-based networks. The experimental results show that the proposed scheme outperforms traditional methods and machine learning-based detection approaches. The proposed scheme improves the security and associated measure of trustworthiness in IIoT-enabled networks.

Keywords: Deep learning, Trustworthiness, Supervisory Control, Data Acquisition Networks, Cybersecurity, Industrial Internet of Things

1. Introduction

The Industrial Internet of Things (IIoT) is a pervasive network that connects a diverse set of smart appliances in the industrial environment to deliver various intelligent services. In IIoT networks, a significant number of industrial control systems based on supervisory control and data acquisition (SCADA) are linked to the corporate network through the Internet [1]. Typically, these SCADA-based IIoT networks consist of a large number of field devices [2], for instance, intelligent electronic devices, sensors and actuators, connected to an enterprise network via heterogeneous communications [3]. This integration provides the industrial networks and systems with supervision and considerable flexibility and agility [2–4], resulting in greater production and resource efficiency. On the other hand, this integration exposes SCADA-based IIoT networks to serious security threats and vulnerabilities, posing a significant danger to these networks and the trustworthiness of the systems [5]. The trustworthiness of an IIoT-enabled system ensures that it performs as expected while meeting a variety of security requirements, including trust, security, safety, reliability, resilience, and privacy [6–8]. Fig. 1 depicts the fundamental aspects of trustworthiness in an IIoT-enabled network. The basic goal of the IIoT-enabled system is to increase trustworthiness by safeguarding identities, data, and services, and therefore to secure SCADA-based IIoT networks from cybercriminals [8, 9].

Fig. 1: — Security and Trustworthiness goals and Confidentiality-Integrity-Availability triad.

Several protocol updates have been proposed to meet this purpose, including the distributed network protocol (DNP 3.0) [10]. However, it covers authentication and data integrity aspects only, leaving numerous holes that let attackers exploit known flaws such as hash collisions to carry out serious attacks [11]. Information technology and industrial operational technology bodies build a typical risk management plan using ISO 27005:2018 [10] to recognise, rank, and implement mitigation techniques in automated or semi-automated enterprises. Even a comprehensive risk management plan and adequate preventive measures may not ensure absolute security against growing risks and attacks. This consequently poses a difficult research challenge for industrial and cybersecurity control researchers: 1) to obtain the maximum degree of attack detection, 2) to report malicious behaviour as soon as it appears, and 3) to isolate the affected subsystems as soon as possible. In recent years, there has been a surge in the use of artificial intelligence (AI) methods in evolving cybersecurity approaches, including attack prediction [12], privacy preservation [13], forensic exploration [14] and malware detection [15]. Deep learning (DL) is an AI approach that incorporates better learning models with considerable success in various disciplines [16]. However, designing a reliable and trustworthy AI, particularly a DL-based cyberattack detection model for the IIoT platforms, remains a research problem.

By considering the limitations of previous techniques, we employ network attributes of industrial protocols and propose a Pyramidal Recurrent Unit (PRU) and Decision Tree (DT) based ensemble detection mechanism. The proposed mechanism has the potential to detect cyberattacks in any extensive industrial network. Its interoperability with other detection engines and its expandability to wider industrial networks with multiple areas distinguish the proposed mechanism from previous studies. The proposed detection method can be deployed across many IIoT domains. Furthermore, our model is straightforward to implement and deploy and can improve efficiency and accuracy while overcoming the shortcomings of previous efforts. The novelty and contribution of our work can be characterized as follows.

We propose a scalable and efficient DL and DT-based ensemble cyber-attack detection framework to resolve trustworthiness issues in the SCADA-based IIoT networks.
We present an efficient probing approach using SCADA-based network data to solve the protocol mismatch limitations of traditional security solutions for the IIoT platform.
We provide a statistical analytic approach for ensuring the trustworthiness and reliability of the proposed model for SCADA-based IIoT networks.

The rest of the paper is organized as follows. Section 2 discusses the basics of the problem formulation. Section 3 gives the details of our proposed work, followed by the results and discussion in Section 4. Finally, Section 5 concludes the paper.

2. Preliminaries and Methods

In this paper, we follow the real-world settings [17] of cyberattacks on an industrial control system (ICS). Through these settings, we leverage the datasets from the power control system [18] for detecting industrial cyberattacks. Fig. 2 illustrates the overall architecture of a SCADA-based industrial control network. It is made up of various layers, including a processing and central master control layer, a physical layer, and a corporate layer, all of which are formed in a hierarchical order.

2.1. Datasets

The physical layer, as indicated in Fig. 2, contains various equipment such as breakers (BR1−BR4), intelligent electronic devices, power generators (G1, G2) and programmable logic controllers. This lowest layer collects sensor-based data, which the local control logic uses to make control decisions before transmitting them to the devices. The physical-layer devices also receive instructions from the top or master control/process layers, which are responsible for managing and keeping track of the remote physical devices and local control layer devices. They are also equipped with intrusion detection systems (IDS). The corporate layer aids business operations and issues management directives to the master control layer. In this paper, we adopt the 15 benchmark datasets obtained from the SCADA power system¹ to identify and detect different kinds of attacks. The intrusion attacks on the SCADA system are detected under two separate classification settings. The binary classification setting comprises 37 events, divided into 28 attack events and nine normal events. The multiclass classification setting encompasses 37 different events, such as natural events, regular events, and attack events, each with its own class label.

Each of the 15 datasets has thousands of distinct attacks. The datasets are randomly sampled at 1% to decrease the influence of a small sample size. Accordingly, there are 3711 attack-event samples, 1221 natural-event samples and 294 no-event samples.

2.2. Problem Formulation

Assume a dataset $D = \{(x_{1}, y_{1}), \dots, (x_{n}, y_{n})\}$ with n training examples, where x_i indicates a vector of real or discrete values. These values represent the features of vector x_i, expressed as $\langle x_{i 1}, x_{i 2}, x_{i 3}, \dots, x_{i m} \rangle$ , where x_ij represents the jth feature of x_i. The values of y_i, in contrast, can be of two types. The first type is discrete: a binary label indicates binary classification, while a label from the classes $\{1, \dots, K\}$ represents multiclassification. The second type consists of real values, representing regression. In a nutshell, given a training dataset D with n examples, the goal is to train a learning algorithm that produces a classifier T. The classifier T is a hypothesis about the true function $f (x_{i}) = y_{i}$ and predicts a value of y_i for every given value of x_i.

3. The Proposed Model

3.1. Pyramidal Recurrent Unit models

Deep PRUs [19] are deep learning models used to process sequential data. Fig. 3 provides an overview of the structure of a PRU cell. The PRU comprises several cells, each with three major gates: the forget gate, input gate, and output gate. The PRU applies a pyramidal transformation to the input vector and a grouped linear transformation to the context vector. It then combines the two results and feeds them as input to the LSTM cell.

3.1.1. Pyramidal transformation (PR) for input

Instead of linearly transforming a given input vector x to an output vector y as $y = F_{L} (x) = W \cdot x$ , where $W \in ℝ^{N \times M}$ is the weight matrix (mapping $x \in ℝ^{N}$ to $y \in ℝ^{M}$ ), PR subsamples x into K pyramidal levels to obtain representations at different scales. The subsampling produces K vectors as

x^{k} \in ℝ^{\frac{N}{2^{k - 1}}}

(1)

where $2^{k - 1}$ denotes the sampling rate and $k = \{1, \dots, K\}$ . For each $k = \{1, \dots, K\}$ , the PR learns a scale-specific transformation as

W^{k} \in ℝ^{\frac{N}{2^{k - 1}} \times \frac{M}{K}}

(2)

Then, PR concatenates the transformed sub-samples to get the pyramidal output $\bar{y} \in ℝ^{M}$ as

\bar{y} = F_{P} (x) = [W^{1} \cdot x^{1}, \dots, W^{K} \cdot x^{K}]

(3)

where [·, ·] denotes the concatenation operation. Given an input vector x, PR sub-samples it using a kernel κ with 2e + 1 elements as

x^{k} = \sum_{i = 1}^{N / s} \sum_{j = - e}^{e} x^{k - 1} [s i] κ [j]

(4)

where s denotes the stride and $k = \{2, \dots, K\}$ .
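As an illustration only, the pyramidal transformation of Eqs. 1–3 can be sketched in plain Python. The sketch assumes a simple averaging kernel with stride 2 for the subsampling step and small random weights; the helper names (`subsample`, `matvec`, `init_pyramid_weights`, `pyramidal_transform`) are hypothetical and are not taken from the original implementation. Weight matrices are stored row-wise, one row per output element.

```python
import random


def subsample(x, stride=2):
    # Assumed kernel: average each non-overlapping window of `stride` elements.
    return [sum(x[i:i + stride]) / stride
            for i in range(0, len(x) - stride + 1, stride)]


def matvec(W, x):
    # W is a list of rows; each row produces one output element.
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def init_pyramid_weights(N, M, K, seed=0):
    # Level k (0-based) maps N / 2**k inputs to M / K outputs.
    rng = random.Random(seed)
    return [[[rng.uniform(-0.1, 0.1) for _ in range(N // 2 ** k)]
             for _ in range(M // K)]
            for k in range(K)]


def pyramidal_transform(x, weights):
    outputs, level = [], list(x)
    for k, W in enumerate(weights):
        if k > 0:
            level = subsample(level)      # x^k is derived from x^(k-1)
        outputs.extend(matvec(W, level))  # scale-specific W^k . x^k
    return outputs                        # concatenation of all levels


N, M, K = 8, 4, 2
weights = init_pyramid_weights(N, M, K)
y_bar = pyramidal_transform([float(i) for i in range(N)], weights)
n_params = sum(len(W) * len(W[0]) for W in weights)
print(len(y_bar), n_params)  # output length M and number of learned weights
```

With these toy sizes the pyramid uses 24 weights, whereas a full N × M linear map would use 32, which is the kind of parameter saving the transformation is designed for.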

3.1.2. Grouped linear transformation (GLT)

GLT breaks down the traditional linear transformation by factoring it into two parts. First, given the input vector $h \in ℝ^{N}$ , GLT splits it into g smaller groups as

h = \{h^{1}, \dots, h^{g}\}, \forall h^{i} \in ℝ^{\frac{N}{g}} .

(5)

Then, through a linear transformation $F_{L} : ℝ^{\frac{N}{g}} \to ℝ^{\frac{M}{g}}$ , GLT transforms $h^{i}$ into $z^{i} \in ℝ^{\frac{M}{g}}$ for each $i = \{1, \dots, g\}$ . The final output vector is then formed by concatenating the resulting g output vectors $z^{i}$ as

\bar{z} = F_{G} (h) = [W^{1} \cdot h^{1}, \dots, W^{g} \cdot h^{g}]

(6)
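The grouped linear transformation of Eqs. 5 and 6 can be sketched in the same spirit. This is a minimal illustration with hand-picked toy weights; the function names are hypothetical, and each group weight matrix is stored row-wise.

```python
def matvec(W, x):
    # W is a list of rows; each row produces one output element.
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def grouped_linear(h, group_weights):
    g = len(group_weights)
    if len(h) % g != 0:
        raise ValueError("len(h) must be divisible by the number of groups")
    size = len(h) // g
    out = []
    for i, W in enumerate(group_weights):
        chunk = h[i * size:(i + 1) * size]  # h^i, the i-th group of h
        out.extend(matvec(W, chunk))        # z^i = W^i . h^i
    return out                              # concatenation [z^1, ..., z^g]


# Two groups, each mapping 2 inputs to 2 outputs (toy values).
group_weights = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0, 0.0], [0.0, 2.0]],
]
z_bar = grouped_linear([1.0, 2.0, 3.0, 4.0], group_weights)
n_params = sum(len(W) * len(W[0]) for W in group_weights)
print(z_bar, n_params)
```

Here the two groups hold 8 weights in total, half of the 16 that an ungrouped 4 × 4 transformation would need; in general the count shrinks by a factor of g.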

3.1.3. Pyramidal Recurrent Unit (PRU)

Pyramidal Recurrent Unit is created by extending the vanilla LSTM architecture using the pyramidal and the grouped linear transformations described above. At a given time t, PRU combines both input and context vectors through a transformation function using Eq. 7.

{\hat{G}}_{v} (x_{t}, h_{t - 1}) = {\hat{F}}_{P} (x_{t}) + F_{G} (h_{t - 1})

(7)

where $v \in \{f, i, c, o\}$ indexes the forget, input, content and output gates of the vanilla LSTM, ${\hat{F}}_{P} (\cdot)$ denotes the pyramidal transformation, and $F_{G} (\cdot)$ denotes the grouped linear transformation. The resultant ${\hat{G}}_{v}$ is then fed to the vanilla LSTM architecture to model the PRU. Specifically, a PRU cell takes $x_{t} \in ℝ^{N}$ , $h_{t - 1} \in ℝ^{M}$ and $c_{t - 1} \in ℝ^{M}$ at a given time t as input and generates the forget gate signal as

f_{t} = σ ({\hat{G}}_{f} (x_{t}, h_{t - 1}))

(8)

The forget gate is in charge of removing each cell’s prior information. The input and content gates, which update the cell information, are then calculated as

i_{t} = σ ({\hat{G}}_{i} (x_{t}, h_{t - 1})), \qquad {\hat{c}}_{t} = \tanh ({\hat{G}}_{c} (x_{t}, h_{t - 1}))

(9)

Similarly, the output gate is calculated as

o_{t} = σ ({\hat{G}}_{o} (x_{t}, h_{t - 1}))

(10)

Context vector and cell state are then generated by combining the inputs with these gate signals as

c_{t} = f_{t} \otimes c_{t - 1} + i_{t} \otimes {\hat{c}}_{t}, \qquad h_{t} = o_{t} \otimes \tanh (c_{t})

(11)

where ⊗ is the element-wise multiplication, σ represents the sigmoid and tanh denotes the hyperbolic tangent activation function. In general, a PRU is composed of only one layer. However, increasing the network depth enhances its efficiency and effectiveness in learning and recognising complex sequential patterns [19]. Thus, we use stacks of PRUs with different configurations to better classify normal and attack events. The layer size and the number of layers are two of the most significant characteristics to consider while designing our PRUs. Table 1 lists the PRUs used in our method.

TABLE 1:

PRUs SETTINGS FOR THE PROPOSED METHOD

PRU Model	Layer Size	Number of Layers
1	(100)	1
2	(200)	1
3	(100,100)	2
4	(200,100)	2
5	(100,100,100)	3
6	(100,50,20)	3
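For readers who prefer code to equations, a single PRU time step (Eqs. 7–11) can be sketched as below. This is a hedged illustration rather than the implementation used in our experiments: the pyramidal and grouped transformations are passed in as plain callables, and the toy run replaces them with zero maps so that the gate values are easy to check by hand. All names are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def pru_step(x_t, h_prev, c_prev, F_P, F_G):
    # F_P and F_G map each gate name in "fico" to a callable:
    # F_P[v] transforms the input vector, F_G[v] the context vector.
    G = {v: [a + b for a, b in zip(F_P[v](x_t), F_G[v](h_prev))]
         for v in "fico"}                                   # Eq. 7
    f_t = [sigmoid(g) for g in G["f"]]                      # Eq. 8
    i_t = [sigmoid(g) for g in G["i"]]                      # Eq. 9
    c_hat = [math.tanh(g) for g in G["c"]]                  # Eq. 9
    o_t = [sigmoid(g) for g in G["o"]]                      # Eq. 10
    c_t = [f * c + i * ch
           for f, c, i, ch in zip(f_t, c_prev, i_t, c_hat)]  # Eq. 11
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]       # Eq. 11
    return h_t, c_t


# Toy run: zero transformations, so every gate pre-activation is 0.
zero = lambda v: [0.0, 0.0]
F_P = {v: zero for v in "fico"}
F_G = {v: zero for v in "fico"}
h_t, c_t = pru_step([1.0], [0.0, 0.0], [1.0, -2.0], F_P, F_G)
print(c_t)
```

With all pre-activations at zero, the forget, input and output gates equal 0.5 and the candidate content is 0, so the new cell state is simply half of the previous one.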


3.2. Ensemble of PRUs

To produce an aggregated outcome from the results of the PRUs, we employ a DT unit. Suppose the DT combines a set of different PRUs (denoted by L) over S feature subspaces $F_{i} \subseteq F$ , indicated as $\{F_{i} (\cdot)\}_{i = 1, \dots, S}$ , and let $\{y_{i}\}_{i = 1, \dots, S}$ denote the class labels acquired through the distinct PRUs L. Each L can independently classify any given example $x \in F_{i}$ through its feature subspace F_i. The DT considers a set of confidence rates for each class in the dataset before deciding on the result. The DT module receives the input from L as

\text{Input of DT} = \{ L_{i, c} \mid i \in \{1, \dots, \text{number of } L\},\ c \in \{1, \dots, \text{number of classes}\} \}

(12)

where L_i,c indicates the confidence rate of the ith trained model for class c. The DT takes these confidence rates as input and determines, in a hierarchical manner, the association between the true label of the network data and the confidence rates of L. Fig. 4 shows the schematic structure of the DT and its functions in our proposed scheme. Suppose a training set $D_{M}^{F_{i}}$ of M samples and F features for each i ∈ S. In the same fashion, $D_{N}^{F_{i}}$ represents the test set with N samples and F features. The DT determines the PRU output space manifold and provides a model for classifying the output class label y_i. The proposed approach is efficient in training and testing, requires little memory, and is appropriate and scalable for intrusion detection in SCADA IIoT because of its ability to eliminate irrelevant features.
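The construction of the DT input in Eq. 12 can be illustrated with a short sketch. It assumes that each learner emits one raw score per class and that a softmax turns those scores into confidence rates; both the softmax step and the function names are assumptions made for illustration, and the scores are made-up toy values.

```python
import math


def softmax(scores):
    # Numerically stable softmax: turns raw scores into confidences.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dt_input_row(per_learner_scores):
    # Flatten the confidence rates L[i][c] of all learners into one
    # feature row: (learner 1, class 1), (learner 1, class 2), ...
    row = []
    for scores in per_learner_scores:
        row.extend(softmax(scores))
    return row


# Three hypothetical learners, two classes (normal, attack), one sample.
scores_for_one_sample = [[2.0, 0.0], [0.5, 0.5], [-1.0, 1.0]]
row = dt_input_row(scores_for_one_sample)
print(len(row))  # number of learners times number of classes
```

Rows built this way, paired with the true labels, form the training table for the combining tree; any standard decision tree learner (for example, scikit-learn's `DecisionTreeClassifier`) could consume them.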

Fig. 4: — Flowchart of the proposed method.

Theorem 1: Our method is trusted to detect SCADA-based IIoT cyberattacks through the ensemble of PRUs and DT.

Proof: Suppose S represents a group of training instances and a deep-learning model D can build a learner L. The L can be considered a hypothesis around a true function f, which accepts an instance x and assigns a label y to it. The proposed model produces a collection of learners/hypotheses (L) and explores a space H for optimal hypotheses. The proposed learning process can discover various distinct hypotheses in H, where each provides identical or varying accuracy outcomes on training examples of distinct random feature sets. The proposed approach reduces the likelihood of selecting incorrect learners by generating a collection of accurate learners and combining them through a DT. Combining precise hypotheses can better statistically approximate the function f. Hence, the proposed model is trusted to identify intrusion attacks in SCADA-based IIoT networks.

4. Evaluations and Findings

We conducted a wide range of experiments with the benchmark datasets discussed in Section 2.1. We implemented our proposed model using Python 3.7 and the popular deep learning framework PyTorch². We ran all experiments for our proposed models and the alternative baselines on an NVIDIA GeForce GTX 1080 GPU. We trained six distinct PRUs, each with a different structure. We employ Adam [20], which delivers faster convergence than SGD and avoids the challenge of adjusting the learning rate [16]. We selected a batch size of 256, 200 epochs and a learning rate of 0.001, and determined these hyperparameters through experiments. We also employed a ten-fold cross-validation approach [21] for both training and testing, which breaks a dataset randomly into ten segments and takes one segment for testing and the remaining nine for training. In our setting, however, we divided the ten segments at random into three parts: eight segments for training, one segment for testing, and one segment for validation. We use the following metrics, together with detection time, to measure the effectiveness of our model.
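The eight/one/one segment split described above can be sketched as follows. The sketch is an assumption about how such a split might be done, not a record of our exact procedure: it shuffles sample indices with a fixed seed, cuts them into ten segments, and reserves one segment each for testing and validation. The function name and the sample count of 1000 are hypothetical.

```python
import random


def ten_segment_split(n_samples, test_fold=0, val_fold=1, seed=0):
    # Shuffle the sample indices, then deal them into ten segments.
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[k::10] for k in range(10)]
    test = folds[test_fold]
    val = folds[val_fold]
    # The remaining eight segments are used for training.
    train = [i for k, fold in enumerate(folds)
             if k not in (test_fold, val_fold) for i in fold]
    return train, val, test


train_idx, val_idx, test_idx = ten_segment_split(1000)
print(len(train_idx), len(val_idx), len(test_idx))
```

Rotating `test_fold` and `val_fold` over the ten segments would yield the repeated runs of a cross-validation protocol.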

\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}

(13)

\text{False Positive Rate} = \frac{FP}{FP + TN}

(14)

where FP, FN, TP and TN represent the false positives, false negatives, true positives and true negatives, respectively. Accuracy measures the number of samples correctly classified by a classifier divided by the total number of samples.
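Eqs. 13 and 14 translate directly into code. The sketch below uses made-up toy labels, with 1 marking an attack and 0 a normal event; the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive and t != positive:
            fp += 1
        elif p != positive and t != positive:
            tn += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)  # Eq. 13


def false_positive_rate(fp, tn):
    return fp / (fp + tn) if (fp + tn) else 0.0  # Eq. 14


y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(accuracy(tp, tn, fp, fn), false_positive_rate(fp, tn))  # 0.75 0.25
```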

4.1. Results

Fig. 5–8 show the experimental outcomes of the baselines and our proposed model. Fig. 5 and Fig. 7 show the accuracy and the false-positive rate, respectively, for detecting both normal and abnormal events (binary classification). In the same fashion, Fig. 6 and Fig. 8 show the accuracy and the false-positive rate, respectively, for classifying the normal and various attacks in traffic events (multiclassification).

Fig. 6: Performance Analysis of the proposed model for multiclassification in terms of Accuracy.

Fig. 7: Performance Analysis of the proposed model for binary classification in terms of False-positive Rates.

Fig. 8: Performance Analysis of the proposed model for multiclassification in terms of False-positive Rates.

4.2. Comparison With Benchmark methods

We compare our method with the RSKNN [10] and RSRT [14] models in terms of accuracy and computational time to illustrate its superior performance. We follow the same structure as reported in their work for a fair comparison. We compare the accuracy results on all 15 datasets; for computational time costs, we consider only dataset 9. In addition, we use a statistical test to assess the statistical differences in the accuracy results.

4.2.1. Comparison of Accuracy Results

We conducted experiments with each model on all 15 datasets. Fig. 5–6 and Table 2 illustrate the accuracy results. As can be seen in both Fig. 5 and Fig. 6, PRU model 4 is the best model. Thus, for clarity, we only showcase the results of PRU 4. Table 2 shows how well our model detects both normal and abnormal events compared with the baselines. Our model also outperforms the baseline models in the multiclassification attack settings. Overall, the proposed model outperforms both RSRT and RSKNN in both binary and multiclass classification.

TABLE 2:

COMPARISON RESULT OF OUR METHOD AND OTHER BASELINE METHODS IN TERMS OF ACCURACY FOR BINARY AND MULTICLASSIFICATION.

Dataset	RSKNN (binary)	RSRT (binary)	Proposed (binary)	RSKNN (multiclass)	RSRT (multiclass)	Proposed (multiclass)
1	95.87%	96.71%	98.89%	89.99%	91.01%	95.56%
2	95.12%	95.60%	98.21%	89.13%	91.10%	95.69%
3	95.91%	96.06%	98.49%	90.72%	91.67%	96.01%
4	95.17%	96.34%	98.64%	89.79%	91.63%	96.09%
5	96.55%	96.49%	98.75%	90.21%	90.77%	95.29%
6	95.73%	96.05%	98.46%	90.73%	91.10%	96.72%
7	95.26%	96.12%	98.53%	89.87%	91.02%	96.58%
8	95.59%	96.38%	98.61%	90.61%	91.19%	96.85%
9	95.16%	95.18%	98.57%	88.80%	90.69%	95.03%
10	96.13%	96.66%	98.82%	90.51%	92.61%	97.36%
11	95.77%	96.19%	98.59%	90.38%	91.35%	96.99%
12	95.94%	96.19%	98.62%	91.06%	91.84%	97.52%
13	96.73%	96.77%	99.03%	90.40%	91.84%	97.58%
14	95.48%	96.16%	98.52%	89.42%	90.53%	95.13%
15	95.10%	96.05%	98.45%	90.08%	91.24%	96.97%


4.2.2. Statistical Analysis of Accuracy Results

We used the non-parametric Mann-Whitney U test for the statistical analysis of the accuracy results of RSRT and our proposed method. The Mann-Whitney U test is considered resilient against outliers, suitable for small sample sizes, and independent of distributional assumptions [22]. The test compares the observations of two groups by ranking them jointly, and its statistic is computed as

T = R_{1} - \frac{n_{1} (n_{1} + 1)}{2} + R_{2} - \frac{n_{2} (n_{2} + 1)}{2}

(15)

where $R_{1}$ and $R_{2}$ denote the sums of ranks in groups 1 and 2, respectively, and $n_{1}$ and $n_{2}$ represent the sample sizes of groups 1 and 2, respectively. The groups are compared using the sum of ranks and the mean rank of each group: the best group is ranked first, while the second-best is ranked second. The testing question of the statistical analysis can be stated as follows: “Is there a statistically significant difference between the accuracy results obtained by RSRT and the proposed models?” We begin by stating the hypotheses as follows.

Alternate Hypothesis: There are statistical variations for classifying normal and abnormal events (Binary classification) or various kinds of attacks in traffic events (Multiclassification) in the accuracy outcomes of the two models.
Null Hypothesis: There are no statistical variations for classifying normal and abnormal events (Binary classification) or various kinds of attacks in traffic events (Multiclassification) in the accuracy outcomes of the two models.
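The rank sums that feed the test can be computed with a few lines of code. The sketch below follows the common textbook formulation, in which each group receives its own statistic $U_{i} = R_{i} - n_{i}(n_{i} + 1)/2$ and the smaller of the two is usually reported; it is not the SPSS procedure itself, and the two small samples are made-up values rather than results from our experiments. Tied observations receive the average of the ranks they span.

```python
def average_ranks(values):
    # 1-based ranks over the pooled sample; ties share their average rank.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[pos]]):
            end += 1
        avg = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def mann_whitney_u(a, b):
    ranks = average_ranks(list(a) + list(b))
    r1 = sum(ranks[:len(a)])              # sum of ranks, group 1
    r2 = sum(ranks[len(a):])              # sum of ranks, group 2
    u1 = r1 - len(a) * (len(a) + 1) / 2
    u2 = r2 - len(b) * (len(b) + 1) / 2
    return r1, r2, u1, u2


group_a = [95.0, 96.0, 96.5]   # toy accuracies, hypothetical
group_b = [96.5, 98.0, 99.0]   # toy accuracies, hypothetical
r1, r2, u1, u2 = mann_whitney_u(group_a, group_b)
print(r1, r2, u1, u2)  # 6.5 14.5 0.5 8.5
```

A useful sanity check is that the two per-group statistics always add up to the product of the sample sizes, here 3 × 3 = 9.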

Fig. 9 depicts the standard deviation of the accuracy for classifying normal and abnormal events, while Fig. 10 illustrates it in the multiclassification setting. We used the SPSS statistical tool to conduct the test. Tables 3–5 summarize the descriptive statistics, ranks and test statistics of the accuracy results. For binary classification, the two-tailed p-value, as indicated in Table 5, is below 0.05. Thus, at a confidence level of 95%, we reject the null hypothesis and accept the alternative hypothesis. Consequently, we infer that the accuracy outcomes of the two models differ statistically. From Table 4, we may further deduce, based on the sum of ranks and mean rank results, that these differences favor our proposed method, indicating its superiority over RSRT. Likewise, we test the same hypotheses for classifying normal and various other attacks in traffic events (multiclassification); the corresponding p-value in Table 5 is also below 0.05.

Fig. 9: — Standard Deviation of proposed method for Binary Classification in terms of Accuracy.

TABLE 3:

Descriptive Statistics of our Method for Binary and Multiclassification in terms of Accuracy results.

N	Mode	Std. Deviation	Mean	Max	Min
30	Binary Classification	0.492	96.68	97.31	96.77
30	Multiclassification	0.73	92.22	93.96	90.51


TABLE 5:

Test Statistics of our Method for Binary and Multiclassification in terms of Accuracy results.

Statistic	Binary Classification	Multiclassification
Mann-Whitney U	48.25	15.00
Wilcoxon W	180.00	145.00
Z	−2.98	−3.40
Asymp. Sig. (2-tailed p-value)	0.0048	0.002
Exact Sig. [2*(1-tailed Sig.)]	0.0046	0.002


TABLE 4:

Comparison between RSRT and our Proposed Method for Binary and Multiclassification in terms of Ranks.

Mode	N	Model	Sum of Ranks	Mean Rank
Binary Classification	15	RSRT	300.00	20.00
Binary Classification	15	PROPOSED	450.00	30.00
Multiclassification	15	RSRT	130.00	8.67
Multiclassification	15	PROPOSED	495.00	33.01


4.3. Comparison of Computational Time Costs

We used dataset 9, which comprises 5340 instances, to compare the time costs of our proposed method and the RSRT method. After preprocessing, the dataset contains 3738 instances for training and 1602 instances for testing. We examine both binary and multiclass configurations to determine the time cost. Table 6 shows the average training and testing time cost on this dataset. We can see that our proposed method requires somewhat more time to train than RSRT. This is because the RSRT model does not use a deep learning method. On the other hand, our proposed method takes substantially less time for testing than the RSRT model, making it more effective in real-world scenarios.

TABLE 6:

Comparative results of the proposed model and RSRT in terms of average training and testing time (seconds).

Model	Task	Training (seconds)	Testing (seconds)
RSRT	Binary class	0.22	0.03
RSRT	Multiclass	0.36	0.05
Proposed	Binary class	0.35	0.01
Proposed	Multiclass	0.47	0.02
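Average time costs of this kind can be collected with a small timing helper. The sketch below is only one plausible way to do it, using Python's `time.perf_counter`; the helper name, the number of repeats, and the stand-in workload are assumptions and do not reproduce the measurements in Table 6.

```python
import time


def average_seconds(fn, repeats=5):
    # Run `fn` several times and return the mean wall-clock time.
    total = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        total += time.perf_counter() - start
    return total / repeats


# Stand-in workload; in practice this would be a training or testing call.
avg = average_seconds(lambda: sum(i * i for i in range(10000)))
print(f"{avg:.6f} s")
```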


4.4. Reliability and Trustworthiness

To examine the reliability, assume that our method comprises 10 PRUs or learners. Because of the heterogeneous nature of ensemble learning, the errors that occur in these PRUs are uncorrelated. If some learners are inaccurate, the remaining learners may still be accurate, enabling our method to properly categorize intrusion attacks in SCADA-based IIoT networks. Fig. 11 shows a simulated probability of error for 10 different learners. Each learner has an error of at most 0.14, and 7 of them have an error of less than 0.09, which makes our method sufficiently accurate for detecting attacks in SCADA-based IIoT networks. We carry out experiments with dataset 1 to verify the trustworthiness of our proposed model by classifying attacks with various numbers of learners. Fig. 12 shows the accuracy results of the proposed model using an ensemble of 10 base learners compared with a single learner. We can observe that the accuracy of our proposed method increases as multiple learners are combined.

Fig. 11: — Error Probability for various learners.

Fig. 12: — Accuracy results of the proposed method for various learners.
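The intuition that combining learners with uncorrelated errors lowers the overall error can be made concrete with a small calculation. The sketch below is a simplification: it replaces the DT combiner with plain majority voting, assumes fully independent errors, and gives every learner the same error rate of 0.14 (the upper bound mentioned above). The function names are hypothetical.

```python
from math import factorial


def n_choose_k(n, k):
    return factorial(n) // (factorial(k) * factorial(n - k))


def majority_error(n_learners, p_error):
    # Probability that more than half of n independent learners,
    # each wrong with probability p_error, are wrong at the same time.
    need = n_learners // 2 + 1
    return sum(n_choose_k(n_learners, k)
               * p_error ** k * (1 - p_error) ** (n_learners - k)
               for k in range(need, n_learners + 1))


single = majority_error(1, 0.14)
ensemble = majority_error(10, 0.14)
print(single, ensemble)
```

Under these assumptions, a wrong majority among ten learners is far less likely than a single learner being wrong, which matches the trend of increasing accuracy with more learners.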

We can also demonstrate the trustworthiness of our method by mapping the trusted computing base (TCB) model to the defense-in-depth model. This mapping can help explain how confidentiality-integrity-availability (CIA) is sustained. Fig. 13 illustrates the TCB security paradigm, which is embedded in our proposed SCADA-based model. The elements of the trusted zone inside the security outline include security control, hardware and software, and policies, which are coupled to guarantee the maintenance of the CIA triad; the overall security of the system adds to its trustworthiness. The TCB/SCADA reference monitor/physical security control paradigm prevents and detects unauthorized actions on resources within the trusted zone’s boundary. This layer often includes automated physical access control systems (PACS), for instance, mantraps, CCTV cameras and motion detectors. On the other hand, SCADA systems and associated subsystems are typically positioned in remote locations, where PACS deployment is challenging. Hence, in this case, a defense-in-depth approach must be supplemented with extra measures, for example, establishing antimalware resources or IDSs in the logical control.

Fig. 13: TCB security paradigm employing a defense-in-depth method to ensure trustworthiness.

However, conventional antimalware tools and IDSs are incompatible with SCADA settings since they depend on application program interfaces (APIs) or protocols. As a result, these classical detective or preventative security controls fail to block unauthorized access. Hence, accurate and reliable security control must be established to ensure a defense-in-depth approach and improve the trustworthiness of the SCADA system. We addressed these shortcomings in our proposed model, formed a reliable cyberattack detection method, and verified it with massive SCADA network traffic containing various attacks targeting several vulnerabilities of SCADA components and the overall system.

5. Conclusion

The ability to protect SCADA-based IIoT networks against cyberattacks increases their trustworthiness. Existing security methods, along with machine learning algorithms, are inefficient and inaccurate for protecting IIoT networks. In this paper, we proposed a cyberattack detection mechanism using enhanced deep and ensemble learning in SCADA-based IIoT networks. The proposed mechanism is reliable and accurate because its ensemble detection model combines the Pyramidal Recurrent Unit (PRU) and the Decision Tree (DT). The proposed method was evaluated across 15 datasets generated from a SCADA-based network, and a considerable increase in classification accuracy was obtained. Compared to state-of-the-art techniques, the outcomes of our method exhibited a good balance between reliability, trustworthiness, classification accuracy and model complexity, resulting in improved performance.

In the future, we will employ more powerful deep learning models to further improve trustworthiness by detecting cyberattacks more accurately. In addition, we will assess the performance of the proposed method in real-world scenarios. We will also work on the selection of optimal features in scenarios where the available features are not sufficient.

Footnotes

1. https://sites.google.com/a/uah.edu/tommy-morris-uah/ics-data-sets

2. https://pytorch.org/

Contributor Information

Fazlullah Khan, Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan.

Ryan Alturki, Department of Information Science, College of Computer and Information Systems, Umm Al-Qura University, Makkah, Saudi Arabia.

Md Arafatur Rehman, School of Mathematics and Computer Science, University of Wolverhampton, UK.

Spyridon Mastorakis, Department of Computer Science, University of Nebraska, Omaha, United States.

Imran Razzak, School of Computer Science and Engineering, Faculty of Engineering, University of New South Wales Sydney, NSW, Australia.

Syed Tauhidullah Shah, Department of Software Engineering, University of Calgary, Canada.

References

[1] Luo Y, Duan Y, Li W, Pace P, and Fortino G, “A novel mobile and hierarchical data transmission architecture for smart factories,” IEEE Transactions on Industrial Informatics, vol. 14, no. 8, pp. 3534–3546, 2018.
[2] Gavriluta C, Boudinet C, Kupzog F, Gomez-Exposito A, and Caire R, “Cyber-physical framework for emulating distributed control systems in smart grids,” International Journal of Electrical Power & Energy Systems, vol. 114, p. 105375, 2020.
[3] Mahmoud MS, Hamdan MM, and Baroudi UA, “Modeling and control of cyber-physical systems subject to cyber attacks: A survey of recent advances and challenges,” Neurocomputing, vol. 338, pp. 101–115, 2019.
[4] Wang T, Zhang G, Bhuiyan MZA, Liu A, Jia W, and Xie M, “A novel trust mechanism based on fog computing in sensor–cloud system,” Future Generation Computer Systems, vol. 109, pp. 573–582, 2020.
[5] Guo K, Ren S, Bhuiyan MZA, Li T, Liu D, Liang Z, and Chen X, “Mdmaas: medical-assisted diagnosis model as a service with artificial intelligence and trust,” IEEE Transactions on Industrial Informatics, vol. 16, no. 3, pp. 2102–2114, 2019.
[6] Al-Hawawreh M and Sitnikova E, “Developing a security testbed for industrial internet of things,” IEEE Internet of Things Journal, vol. 8, no. 7, pp. 5558–5573, 2020.
[7] Shahriar MA, Bappy FH, Hossain AF, Saikat DD, Ferdous MS, Chowdhury MJM, and Bhuiyan MZA, “Modelling attacks in blockchain systems using petri nets,” in 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), pp. 1069–1078, IEEE, 2020.
[8] Abdel-Basset M, Chang V, Hawash H, Chakrabortty RK, and Ryan M, “Deep-ifs: Intrusion detection approach for iiot traffic in fog environment,” IEEE Transactions on Industrial Informatics, 2020.
[9] Huda S, Abawajy J, Al-Rubaie B, Pan L, and Hassan MM, “Automatic extraction and integration of behavioural indicators of malware for protection of cyber–physical networks,” Future Generation Computer Systems, vol. 101, pp. 1247–1258, 2019.
[10] “Information technology-security techniques-information security risk management, iso/iec 27005:2018.”
[11] Yan X, Xu Y, Xing X, Cui B, Guo Z, and Guo T, “Trustworthy network anomaly detection based on an adaptive learning rate and momentum in iiot,” IEEE Transactions on Industrial Informatics, vol. 16, no. 9, pp. 6182–6192, 2020.
[12] Wu D, Jiang Z, Xie X, Wei X, Yu W, and Li R, “Lstm learning with bayesian and gaussian processing for anomaly detection in industrial iot,” IEEE Transactions on Industrial Informatics, vol. 16, no. 8, pp. 5244–5253, 2019.
[13] Moustafa N and Slay J, “Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set),” in 2015 Military Communications and Information Systems Conference (MilCIS), pp. 1–6, IEEE, 2015.
[14] Hassan MM, Gumaei A, Huda S, and Almogren A, “Increasing the trustworthiness in the industrial iot networks through a reliable cyberattack detection model,” IEEE Transactions on Industrial Informatics, vol. 16, no. 9, pp. 6154–6162, 2020.
[15] Jahromi AN, Hashemi S, Dehghantanha A, Choo K-KR, Karimipour H, Newton DE, and Parizi RM, “An improved two-hidden-layer extreme learning machine for malware hunting,” Computers & Security, vol. 89, p. 101655, 2020.
[16] Shah STU, Li J, Guo Z, Li G, and Zhou Q, “Ddfl: a deep dual function learning-based model for recommender systems,” in International Conference on Database Systems for Advanced Applications, pp. 590–606, Springer, 2020.
[17] Hink RCB, Beaver JM, Buckner MA, Morris T, Adhikari U, and Pan S, “Machine learning for power system disturbance and cyber-attack discrimination,” in 2014 7th International Symposium on Resilient Control Systems (ISRCS), pp. 1–8, IEEE, 2014.
[18] Derhab A, Guerroumi M, Gumaei A, Maglaras L, Ferrag MA, Mukherjee M, and Khan FA, “Blockchain and random subspace learning-based ids for sdn-enabled industrial iot security,” Sensors, vol. 19, no. 14, p. 3119, 2019.
[19] Mehta S, Koncel-Kedziorski R, Rastegari M, and Hajishirzi H, “Pyramidal recurrent unit for language modeling,” in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 4620–4630, 2018.
[20] Kingma DP and Ba J, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
[21] Refaeilzadeh P, Tang L, and Liu H, “Cross-validation,” Encyclopedia of Database Systems, vol. 5, pp. 532–538, 2009.
[22] Zeoli GW and Fong TS, “Performance of a two-sample Mann-Whitney nonparametric detector in a radar application,” IEEE Transactions on Aerospace and Electronic Systems, no. 5, pp. 951–959, 1971.

