Routing Attack Detection Using Ensemble Deep Learning Model for IIoT
Abstract:
Smart cities, intelligent transportation systems (ITS), supply chains, and smart industries can all be developed with minimal human interaction thanks to the increasing prevalence of automation enabled by machine-type communication (MTC). However, MTC faces substantial security challenges because of heterogeneous data, public network access, and insufficient security mechanisms. In this study, we develop a novel IIoT attack detection framework that combines four main steps: (a) data collection, (b) pre-processing, (c) attack detection, and (d) optimisation for high classification accuracy. In the pre-processing stage, the collected raw input data are normalised. Attack detection requires an intelligent security architecture for IIoT networks. We present a learning model that can recognise previously unseen attacks on an IIoT network without the use of a labelled training set. The model generates a labelled dataset that is used to train an IoT network intrusion detection system. The study also introduces a hybrid optimisation algorithm for selecting the optimal LSTM weights for intrusion detection. When trained on the labelled dataset produced by the proposed method, the optimised LSTM outperforms the other models with a detection accuracy of about 95%.
1. Introduction
The IIoT is a subset of the IoT that connects devices intelligently to provide predictive services in an increasingly automated industrial sector [1]. Machine-type communication (MTC) is a form of pervasive communication that interconnects devices so that machines can collaborate on an IIoT task without human intervention. One machine gathers sensitive information from the industrial environment and sends it to others over a wireless or cellular network interface [2], [3]. The information is then analysed by a computational model, which makes precise decisions and may trigger automated processes. Because the Internet is used for machine-to-machine communication, MTC systems are exposed to a wide variety of security threats, including, but not limited to, network exploitation and injection attacks [4], [5]. Furthermore, the many devices in an MTC system collect an enormous variety of industrial-critical data that needs regular monitoring to prevent data breaches and tampering.
Detecting attacks on the routing scheme of IIoT devices can be challenging because of the devices' limited resources, the network's dynamic topology, and the variety of attack vectors [6]. Recent years have seen the development of a number of methods for dealing with this problem, including machine learning-based approaches. These techniques use different aspects of network traffic, such as traffic patterns, to detect and categorise routing attacks [7], [8]. This paper pays particular attention to the RPL protocol, which is vulnerable to routing attacks. Attacks against RPL fall into two groups: those inherited from WSNs and those that are specific to RPL and exploit its particular weaknesses [9]. This paper considers a variety of RPL attacks, such as flood attacks and Data-DoS/DDoS attacks [10], which mostly target layer 3 of the OSI model.
The application layer is the highest level of the IIoT architecture [11], and it enables a wide variety of industrial processes and applications, including smart healthcare and smart vehicles. The IIoT is an all-encompassing network that serves a wide range of industries and individual users. However, it raises a wide range of new issues relating to safety, security, the economy, and society. Addressing these issues requires solutions that scale. Because of their limited resources, IoT sensor nodes need security solutions that use as little memory, power, and money as possible. These solutions should also work with the industry's standard communication protocols [12]. As IoT devices generate vast volumes of data across industrial applications, an IIoT system is an attractive target for cybercriminals [13]. The sheer volume of data means that traditional methods of data processing are inadequate for IoT and IIoT use cases. Machine learning (ML) is therefore one of the most suitable computational paradigms for adding intelligence to IoT devices.
Maintaining proper control of the IIoT's massive industrial systems is a challenging endeavour. The ability to understand and analyse vast volumes of data swiftly and safely is crucial for modern computing systems [14]. In addition, the latency and reliability requirements of data transmission demand high system capability and throughput. Deep learning (DL) algorithms and models have substantially improved the dependability and reliability of the industrial sector. These algorithms show considerable promise for addressing security issues in the IIoT [15]. However, they can lack the necessary accuracy and have a high computational cost. Optimisation methods can therefore be applied to the deep learning model to provide a more effective approach to attack detection.
The contributions of this paper are summarised as follows:
· The model must be able to uncover hidden patterns in order to classify network traffic as either malicious or benign and thereby detect novel or previously undisclosed threats. We employ a suite of clustering methods for this purpose. The results of several clustering algorithms are combined using a weighted voting approach to improve the accuracy with which the class label (malicious/non-malicious) is predicted for a given IIoT network data entry. The weight for each clustering method was determined after a thorough performance analysis. The result is an unsupervised weighted-voting mechanism that transforms an unlabelled dataset into a labelled dataset.
· A deep learning model for IoT network attack detection is trained using the labelled dataset produced by the proposed technique. The performance of several deep learning models (optimised LSTM, MLP, and DBN) is compared to determine the most effective model for detecting threats in an IoT network.
· A hybrid optimisation model is used to select the LSTM weights.
The remaining sections of the paper are organised as follows. Section 2 outlines current research on detecting attacks in IIoT networks. The proposed model is described in depth in Section 3. Section 4 discusses the implementation of the proposed model and the deep learning models that were employed. Section 5 concludes the paper and discusses directions for future research.
2. Related Works
Ryu and Kim [16] proposed a new MANET routing protocol based on reinforcement learning, named reputation opportunistic routing (RORQ). This protocol uses game theory to identify and blacklist rogue nodes in a network, allowing for more streamlined traffic flow. Their approach can therefore locate a routing path in a hostile network more efficiently. The simulation results demonstrated that the proposed technique outperformed other state-of-the-art routing protocols. Compared with other algorithms, the proposed method showed gains of 82% in average end-to-end delay and up to 28% in energy efficiency in the blackhole attack scenario, and up to 12% in energy efficiency in the grayhole attack scenario.
To aid in the detection of jamming attacks, Obeidat et al. [17] developed a model to analyse the operation of VANETs under jamming attacks and proposed the Enhancement Voting Algorithm (EVA), which is based on exchanged global trust values. Route Error (RERR) and HELLO packets are used during route maintenance. Because of their crucial role in routing, these packets are also evaluated as part of the trust score. Although misbehaving nodes are technically capable of processing these packets, they are less likely to be selected than well-behaving nodes. The calculated global trust value is used to define three trust levels, which determine the optimal routing decision in the NS3 simulation. BonnMotion is used to design and analyse mobility scenarios, which are then used to probe the properties of mobile multi-hop networks. The scenarios were exported to the NS3 network simulator in order to develop the mechanism. To determine how well a network performs when subjected to jamming attacks, it is analysed using a variety of quality-of-service (QoS) metrics, including throughput and packet delivery ratio (PDR).
WSNs are a cornerstone of the IoT, and Rabhi et al. [18] highlight their susceptibility to routing attacks in their presentation of the Routing Protocol for Low-Power and Lossy Networks (RPL). They also offer a method for identifying three distinct forms of attack against RPL and highlight some recent research proposals for doing so. They simulate four network scenarios using Contiki-Cooja, one benign and three malicious, each malicious scenario presenting a different attack, and employ WEKA to determine whether the behaviour is benign or malicious according to the collected data. At this stage, several distinct classification procedures are applied, which collectively achieve a precision greater than 96%.
To identify DDoS attacks in the IoT-CIDDS dataset, Malik et al. [19] propose a feature engineering and machine learning framework. The framework has two stages. The first stage creates algorithms for dataset enrichment, with a focus on feature engineering for statistical analysis of the dataset's probability distribution and feature correlations. In the second stage, IoT-CIDDS is used to generate training, validation, and testing datasets; the authors propose an ML model and conduct a complexity analysis of the feature-engineered dataset using five machine learning techniques. Performance metrics for training classifiers and evaluating ML models include false positive rate, accuracy, precision, recall, area under the curve, and computational time. Detecting DDoS attacks in standard IoT networks using the 6LoWPAN stack is a challenging problem, but the experimental results show that significant feature reduction optimises the IDS.
Alghamdi and Bellaiche [20] describe a DL-based cascaded wormhole detection method (DTF) for IoT networks. LSTM deep learning models were trained using a federated strategy that ensures data security and privacy at the node level. The DTF is based on two trust qualities. Owing to its lightweight and accurate cascaded and federated learning strategy, the proposed method achieved an accuracy of 96%.
Örs and Levi [21] propose a machine learning-based multi-class classifier that can distinguish between six different kinds of attacks and normal traffic. Rather than only indicating whether attacks are occurring on a network, their node-based feature extraction and detection approach models the traffic patterns of the attackers across a sliding time window, which makes it possible to pinpoint the attackers' exact IP addresses. They also present an intrusion detection dataset built from traffic data obtained from real-world IoT devices running the 6LoWPAN and RPL protocols, which can be used for training and testing the algorithms. In addition to RPL routing attacks, a common method of attack against IoT devices, they also make use of the Mirai botnet. The results show that the proposed intrusion detection system has a recall between 79% and 100% for detecting the six attack types. The generated model is also deployed on a testbed to demonstrate its viability.
3. Proposed System
The LSTM-based detector is trained on X-IIoTID, a connectivity- and device-agnostic intrusion dataset for the IIoT [22]. The final version of the dataset has 68 features and contains 820,834 training examples. Three levels of attack labelling are available: normal versus attack, normal versus attack sub-category, and normal versus attack sub-sub-category. The processing pipeline for the intelligence layer is given in Algorithm 1.
Algorithm 1. Data pre-processing flow
Input: Raw machine requests $D$, target values $y_{m,1}$
Output: $\Phi_{m \times o \times n}$, $y_{m,C}$
$y_{m,C} \leftarrow \operatorname{OHE}(y_{m,1})$
$D \leftarrow \operatorname{drop}(D, \text{columns} = [\text{IP}, \text{date}, \text{timestamps}, \text{ids}])$
if $D_{\text{columns}}$.isNull() and $D_{\text{columns}}$.dataType in [string, int] then
    $D_{\text{columns}} \leftarrow D_{\text{columns}}$.fillNull($D_{\text{columns}}$.mode())
else
    $D_{\text{columns}} \leftarrow D_{\text{columns}}$.fillNull($D_{\text{columns}}$.median())
end if
if $D_{\text{columns}}$.dataType is string then
    $D_{\text{columns}} \leftarrow$ labelEncoder($D_{\text{columns}}$)
end if
$\varphi_{m \times n} \leftarrow D$
$\varphi_l \leftarrow \operatorname{groupBy}(l)$
$\Phi_{m \times o \times n} \leftarrow \bigcup_C \varphi_l$
return $\Phi_{m \times o \times n}$, $y_{m,C}$
The collected dataset must be pre-processed before it can be used for training and prediction. Columns that are not strictly necessary, such as IP addresses, dates, and ids, are removed from the dataset. Null and NaN values are replaced with the mode of the column for string and integer columns and with the median of the column otherwise [23]. Columns containing strings are converted to numbers using label encoding. Let $\varphi_{m \times n}$ denote the dataset after columns have been removed and null values substituted. The data are grouped by each class label $l$ using the group-by operation $G$ (groupBy in Algorithm 1):
$\varphi_l \leftarrow G(l)$
where $C$ represents the set of unique target classes. The grouped data can then be partitioned into sequences of $o$ time steps each. The final training dataset $\Phi_{m \times o \times n}$ may be expressed as
$\Phi_{m \times o \times n} \leftarrow \bigcup_C \varphi_l$
The target $y$ initially holds the class of each sample, drawn from the $C$ distinct classes in the dataset. It is of string data type with shape $(m, 1)$ and must be converted into a one-hot encoded (OHE) vector. Applying OHE to $y$ gives
$y_{m,C} \leftarrow \operatorname{OHE}(y_{m,1})$
where $\leftarrow$ denotes the assignment operator. In the one-hot encoded vector $y_{m,C}$, each row corresponds to one sample and contains a single 1 in the column of that sample's class and 0 elsewhere.
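The pre-processing steps of Algorithm 1 can be illustrated with a minimal Python sketch that uses only the standard library. It is an illustration under stated assumptions, not the implementation used in this study: the function name `preprocess`, the column names in `DROP_COLUMNS`, and the representation of the dataset as a list of dictionaries with `None` for missing values are all hypothetical.

```python
from statistics import median, mode

# Hypothetical identifier columns; the text only says that IP, date,
# timestamp, and id columns are dropped.
DROP_COLUMNS = {"ip", "date", "timestamp", "id"}


def preprocess(rows, target_key):
    """Drop identifier columns, impute missing values, label-encode
    string columns, and one-hot encode the target."""
    targets = [row[target_key] for row in rows]
    classes = sorted(set(targets))
    # One-hot encoding: one column per class, a single 1 per sample.
    y = [[1 if t == c else 0 for c in classes] for t in targets]

    columns = [k for k in rows[0] if k not in DROP_COLUMNS and k != target_key]
    features = []
    for col in columns:
        values = [row[col] for row in rows]
        present = [v for v in values if v is not None]
        # Mode for string and integer columns, median otherwise.
        if all(isinstance(v, (str, int)) for v in present):
            fill = mode(present)
        else:
            fill = median(present)
        values = [fill if v is None else v for v in values]
        # Label encoding: map each distinct string to an integer code.
        if all(isinstance(v, str) for v in values):
            codes = {s: i for i, s in enumerate(sorted(set(values)))}
            values = [codes[v] for v in values]
        features.append(values)

    # Transpose the column lists back into one feature vector per sample.
    x = [list(sample) for sample in zip(*features)]
    return x, y, classes
```

The sketch stops before the grouping by class label and the partition into time steps, which are separate steps of the pipeline.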
Feature engineering is necessary to ensure that the data have the appropriate shape before the ensemble model can be applied to machine requests. Consider a stream of machine-generated requests $D$.
$\forall 1 \leq i \leq m, \forall 1 \leq j \leq n$
where $DM_i$ is the data communicated by the machine, $a_j$ is a single feature of that data, and $n$ is the total number of features. In each unit time period, active machines can create requests to send to the destination node. The sending devices provide these requests in the form
$\forall 1 \leq k \leq o, \quad \tau_t \subseteq D$
The feature space of the associated machine data is shaped as follows: the set $\tau_t$ has size $o \times n$. To obtain the final dataset suitable for model prediction, $\tau_t$ is reshaped to $1 \times o \times n$. The second dimension of the reshaped $\tau_t$ represents the time steps, while the third dimension represents the features. The input shape required by the model determines how the data are prepared. Note that the amount of data in $\tau_t$ does not change under this reshaping.
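The arrangement of feature vectors into time steps can be sketched as below. This is a minimal illustration using nested Python lists; the helper names `to_time_steps` and `shape` are hypothetical, and dropping an incomplete trailing window is an assumption, since the text does not say how a partial window is handled.

```python
def to_time_steps(samples, steps):
    """Group consecutive feature vectors into windows of `steps` rows.

    Returns a nested list of shape (m, o, n): m windows, o time steps per
    window, and n features per step. Trailing rows that do not fill a whole
    window are dropped (an assumption, not stated in the text).
    """
    if steps <= 0:
        raise ValueError("steps must be positive")
    usable = len(samples) - len(samples) % steps
    return [samples[i:i + steps] for i in range(0, usable, steps)]


def shape(windows):
    """Return (m, o, n) for a non-empty nested list of windows."""
    return len(windows), len(windows[0]), len(windows[0][0])
```

A single request set with $o$ rows and $n$ features then becomes one window, that is, a structure of shape $(1, o, n)$ holding the same values as before.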
The idea behind ensemble learning is that better performance may be achieved by combining the outcomes of many learning models. Independent ensemble construction and coordinated ensemble construction are two implementations of the ensemble learning paradigm that can provide multiple predicted outputs. The independent ensemble construction approach generates multiple results, which can be combined using the ensemble technique, either by independently executing a learning algorithm multiple times on different subsets of the training data or by independently executing different learning models on the same dataset. In contrast, in coordinated ensemble construction the learning models are not executed independently of one another.
The proposed model uses weighted voting to combine the results of several base learning models in an independent ensemble construction approach. The primary goal of the proposed ensemble learning model is to predict a class label for each data vector in the given unlabelled dataset. We therefore rely on clustering methods to predict the labels assigned to data vectors. Mini-Batch K-Means, Fuzzy C-Means, and OPTICS clustering are employed as the base learning models for the proposed model. Each clustering method produces a 0 or 1 as its predicted output, with 0 representing benign traffic and 1 representing malicious traffic. For each data entry, the predicted outputs of the clustering methods are combined by weighted voting according to Equation (3), which yields two groups, one containing benign data and the other containing malicious data.
Using a weighted voting system, the predictions of the clustering algorithms are combined to create a single, more accurate prediction for each data entry. This is called an independent ensemble construction approach.
$V = \sum_{i=1}^{3} W_i P_i \quad (3)$
where $W_i$ is the weight associated with the base prediction $P_i$ of the $i$-th clustering method. The predicted class label $\hat{V}$ is then obtained as
$\hat{V} = \begin{cases} 1, & V > 0.5 \\ 0, & \text{otherwise} \end{cases}$
After extensive performance analysis, the weights associated with the predicted values of the clustering methods were fixed; the weight was set to 0.25 for both OPTICS and Fuzzy C-Means. Algorithm 2 depicts the whole procedure for applying the ensemble learning model to transform an unlabelled dataset into a labelled one.
Algorithm 2. Working of the ensemble learning model
1: Input:
2: $D_{UL}$: Unlabelled dataset
3: $FS$: Feature set from Algorithm 1
4: Begin
5: Create an empty list $D_L$
6: Set $W_1 = W_2 = 0.25$ and $W_3 = 0.50$
7: for each data entry $d_{UL}$ in $D_{UL}$ do
8:     $P_1 = \operatorname{MBKmeans}(FS(d_{UL}))$
9:     $P_2 = \operatorname{OPTICS}(FS(d_{UL}))$
10:     $P_3 = \operatorname{FCmeans}(FS(d_{UL}))$
11:     Calculate $V$ using Equation (3)
12:     if $V > 0.5$ then
13:         Set $\hat{V} = 1$
14:     else
15:         Set $\hat{V} = 0$
16:     end if
17:     Append $(d_{UL}, \hat{V})$ to $D_L$
18: end for
19: return $D_L$: Labelled dataset
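The voting step of Algorithm 2 can be sketched in Python as follows. The sketch assumes that the three clustering models have already been run and that their binary outputs are available per data entry; the function names `weighted_vote` and `label_dataset` are hypothetical, and the default weights follow the values listed in Algorithm 2.

```python
def weighted_vote(predictions, weights, threshold=0.5):
    """Combine binary base predictions (0 = benign, 1 = malicious).

    Computes V as the weighted sum of the predictions and returns 1 when V
    exceeds the threshold, as in steps 11 to 16 of Algorithm 2.
    """
    if len(predictions) != len(weights):
        raise ValueError("one weight is required per prediction")
    score = sum(w * p for w, p in zip(weights, predictions))
    return 1 if score > threshold else 0


def label_dataset(entries, base_predictions, weights=(0.25, 0.25, 0.50)):
    """Attach a voted label to every data entry.

    base_predictions[k] holds the (P1, P2, P3) outputs for entries[k]; the
    clustering itself is assumed to have been run beforehand.
    """
    return [(entry, weighted_vote(preds, weights))
            for entry, preds in zip(entries, base_predictions)]
```

With these weights and a strict threshold of 0.5, an entry is labelled malicious only when the third model and at least one of the other two predict 1; a tie at exactly 0.5 is labelled benign.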
The proposed ensemble model produces labelled data, and the resulting labelled dataset is used to train several deep learning models. Through performance analysis of LSTM networks [24], MLPs [25], and DBNs [26], we select the model that is most effective at detecting malicious attacks in an IIoT network. Figure 1 depicts the underlying architectures from which the deep neural network models were constructed. To detect unknown network attacks at the edge layer, the trained DL model can be deployed at the fog layer. This study uses a hybrid optimisation model to determine the LSTM weights, as described below.
In this paper, we propose a new hybrid algorithm called the Hybrid Cat-Particle Swarm Optimization (HCPSO) algorithm. It integrates Cat Swarm Optimization (CSO) and Particle Swarm Optimization (PSO), both of which are recognised as effective metaheuristic algorithms. HCPSO follows the overall CSO procedure with a few modifications. As in PSO, the algorithm stores both the global and the local best positions. In seeking mode, the selected dimension is then modified, and the best new candidate is chosen to replace the current position. This hybridisation aims to achieve faster convergence without significantly increasing execution time. The steps of the HCPSO algorithm are described as follows.
First, the initial positions ($X$) and velocities ($V$) of $N$ search agents are generated in the range [0,1], each of dimension $D$, where $D$ is the number of item types.
1. Convert the position ($X$) into the MBKP-MC solution term ($Y$) using Equation (14).
2. Verify all constraints. Make sure that all solutions lie in the feasible region, that is, all solutions must satisfy the MBKP-MC constraints. Evaluate the fitness value (total profit) of each solution and rank the solutions accordingly. Divide the individuals into seeking and tracing modes.
3. For individuals in seeking mode, create copies based on their own best position $C_k$ (Equation (15)) and modify the selected dimension based on the best global solution $C_g$ (Equation (16)).
4. For individuals in tracing mode, update the velocity and position using the PSO movement formulated in Equations (17) and (18).
Combine the cats from the seeking and tracing modes, making sure that no position lies outside the range [0,1]. If a position falls outside the search space, correct it using Equation (19).
5. Convert the new position ($X$) into the MBKP-MC solution term ($Y$). Check all the constraints and then evaluate the fitness value.
6. Update the best individual position $C_k$ and the best global position $C_g$.
7. Check the termination criterion. If the criterion is met, the algorithm stops and the final solution is $C_g$. Otherwise, go back to step 6.
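The overall structure of a cat/particle hybrid of this kind can be sketched in Python as below. This is a simplified illustration under several assumptions and should not be read as the HCPSO algorithm itself: Equations (14) to (19) are not reproduced, the conversion to the MBKP-MC solution term is omitted, agents are assigned to seeking or tracing mode at random in every iteration with a hypothetical `mixture` ratio, the seeking-mode mutation moves one random dimension toward the global best, and all parameter values are illustrative defaults. The function name `hcpso` and its signature are likewise hypothetical.

```python
import random


def hcpso(fitness, dim, n_agents=20, iters=50, mixture=0.3, copies=5,
          w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximise `fitness` over [0, 1]^dim with a simplified cat/particle hybrid.

    The update rules and parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    clip = lambda v: min(1.0, max(0.0, v))  # keep positions inside [0, 1]
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_agents)]
    vel = [[rng.random() for _ in range(dim)] for _ in range(n_agents)]
    best_pos = [p[:] for p in pos]              # C_k: best position per agent
    best_fit = [fitness(p) for p in pos]
    g = max(range(n_agents), key=best_fit.__getitem__)
    g_pos, g_fit = best_pos[g][:], best_fit[g]  # C_g: best global position

    for _ in range(iters):
        for k in range(n_agents):
            if rng.random() < mixture:
                # Seeking mode: copy the agent's best position, move one
                # randomly selected dimension toward the global best, and
                # keep the best copy.
                candidates = []
                for _ in range(copies):
                    c = best_pos[k][:]
                    d = rng.randrange(dim)
                    c[d] = clip(c[d] + rng.random() * (g_pos[d] - c[d]))
                    candidates.append(c)
                pos[k] = max(candidates, key=fitness)
            else:
                # Tracing mode: standard PSO velocity and position update.
                for d in range(dim):
                    vel[k][d] = (w * vel[k][d]
                                 + c1 * rng.random() * (best_pos[k][d] - pos[k][d])
                                 + c2 * rng.random() * (g_pos[d] - pos[k][d]))
                    pos[k][d] = clip(pos[k][d] + vel[k][d])
            f = fitness(pos[k])
            if f > best_fit[k]:
                best_pos[k], best_fit[k] = pos[k][:], f
            if f > g_fit:
                g_pos, g_fit = pos[k][:], f
    return g_pos, g_fit
```

For LSTM weight selection, `fitness` would score a candidate weight vector, for example by validation accuracy; that mapping is an assumption here and is not specified by the steps above.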
4. Results and Discussion
The model's performance is evaluated using the following commonly used metrics: precision, recall, F-score, and accuracy.
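These metrics have standard definitions in terms of the counts of true positives, false positives, true negatives, and false negatives. The sketch below computes them for binary labels, assuming that 1 denotes malicious traffic and 0 benign traffic; the function name `classification_metrics` is hypothetical, and the text does not state whether the reported values use binary or averaged multi-class definitions.

```python
def classification_metrics(y_true, y_pred):
    """Return precision, recall, F-score, and accuracy for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    # Guard against empty denominators when a class is never predicted.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f_score": f_score, "accuracy": accuracy}
```

Here the F-score is the harmonic mean of precision and recall, and accuracy is the fraction of all entries that are labelled correctly.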
Table 1 presents the results of the proposed model and the baseline models. Figures 2 to 5 provide a graphical analysis of the various metrics.
Algorithm | Precision (%) | Recall (%) | F-score (%) | Accuracy (%) |
KNN | 87.21 | 80.15 | 80.43 | 80.10 |
SVM | 84.32 | 85.93 | 83.45 | 85.71 |
MLP | 92.43 | 92.15 | 91.68 | 92.10 |
DBN | 93.48 | 92.44 | 91.81 | 92.46 |
LSTM | 90.21 | 89.54 | 89.03 | 89.52 |
Optimized Ensemble | 96.61 | 94.52 | 93.24 | 94.53 |
In terms of accuracy, KNN achieved 80.10%, SVM 85.71%, MLP 92.10%, DBN 92.46%, LSTM 89.52%, and the proposed model 94.53%. The improvement is attributed to the optimisation of the LSTM weights by HCPSO. SVM had the lowest precision (84.32%), whereas KNN reached 87.21%, the MLP, DBN, and LSTM models reached between 90% and 94%, and the proposed model achieved 96.61%. For recall and F-score, KNN achieved about 80%, SVM 83% to 86%, MLP about 92%, DBN about 92%, and LSTM about 89%, while the proposed ensemble model achieved an F-score of 93.24% and a recall of 94.52%.
5. Conclusion
We provide a model that transforms an unlabelled network dataset into a labelled one, allowing for the prediction of previously unseen attacks. The ensemble model that supports the intelligence layer in predicting the output label is evaluated using metrics such as accuracy and F1-score. The proposed ensemble learning model converts the dataset into a labelled dataset so that it can be used to train a DL model. LSTM, MLP, and DBN deep learning models were trained on this dataset, and the study introduces a hybrid optimisation algorithm for selecting the optimal LSTM weights for intrusion detection. With an attack detection accuracy of about 95% on the analysed dataset, the results show that the optimised LSTM performs better than the other two DL models in identifying malicious attacks in an IIoT network. To identify new threats, the proposed unsupervised ensemble-based learning algorithm analyses unlabelled IIoT network data. In a fog computing setup, this approach can be deployed in the cloud. The proposed method labels network traffic so that deep learning models can be used for network intrusion detection. The trained network can be installed at the fog layer to analyse the network traffic of edge devices and identify attacks, with frequent updates from the cloud to account for new attacks. The load on fog nodes and on power-constrained devices may be reduced by employing a fog computing architecture. Implementing the proposed model on a real-world IoT network using a fog computing architecture would allow its efficacy and complexity to be investigated further.
The data used to support the findings of this study are available from the corresponding author upon request.
The authors declare that they have no conflicts of interest.