Scientific Reports. 2024 Oct 26;14:25487. doi: 10.1038/s41598-024-74350-3

Detecting command injection attacks in web applications based on novel deep learning methods

Xinyu Wang 1, Jiqiang Zhai 1, Hailu Yang 1
PMCID: PMC11513136  PMID: 39461962

Abstract

Web command injection attacks pose significant security threats to web applications, leading to potential server information leakage or severe server disruption. Traditional detection methods struggle with the increasing complexity and obfuscation of these attacks, resulting in poor identification of malicious code, complicated feature extraction processes, and low detection efficiency. To address these challenges, a novel detection model, the Convolutional Channel-BiLSTM Attention (CCBA) model, is proposed, leveraging deep learning techniques to enhance the identification of web command injection attacks. The model utilizes dual CNN convolutional channels for comprehensive feature extraction and employs a BiLSTM network for bidirectional recognition of temporal features. An attention mechanism is also incorporated to assign weights to critical features, improving the model’s detection performance. Experimental results demonstrate that the CCBA model achieves 99.3% accuracy and 98.2% recall on a real-world dataset. To validate the robustness and generalization of the model, tests were conducted on two widely recognized public cybersecurity datasets, consistently achieving over 98% accuracy. Compared to existing methods, the proposed model offers a more effective solution for identifying web command injection attacks.

Keywords: Web command injection, Deep learning, Attack detection

Subject terms: Computer science, Information technology, Software

Introduction

The rapid advancement of information science and technology has significantly elevated the importance of digitalization in society, creating numerous opportunities for progress. Among these emerging technologies, artificial intelligence has emerged as a transformative force, reshaping both daily life and production processes. However, these advancements bring with them a critical challenge: network security. As the adoption of Internet technologies expands, the urgency to address network security issues intensifies, with the consequences of security breaches becoming increasingly severe. Consequently, ensuring the security of web applications is now more crucial than ever.

In web applications, varying levels of security awareness among programmers, both across different companies and within development teams, often result in numerous vulnerabilities or security risks when handling user HTTP requests. Among these risks, malicious code injection is a prevalent attack method. Malicious code or commands are inserted into an inadequately secured web application, allowing the execution of unauthorized actions. Consequently, such attacks are recognized as a significant security threat. Notably, in the 2021 OWASP (Open Web Application Security Project) ranking1, code injection attacks are listed among the top ten network security risks. These vulnerabilities typically arise from improper handling of untrusted data, enabling attackers to introduce arbitrary code or commands into the application, leading to unintended execution. Code injection attacks encompass various forms, including command injection2, SQL injection3, cross-site scripting (XSS), XPath injection, and LDAP injection4.

As of July 2024, multiple web command injection vulnerabilities have drawn significant attention in the cybersecurity community. For example, on June 12, 2024, Montalbano5 reported that threat actors exploited a critical PHP command injection vulnerability, allowing remote code execution to target companies and individuals using Windows and Linux systems. Earlier, on May 14, 2024, Lakshmanan6 reported that the CACTI Maintenance Center had disclosed several web command injection vulnerabilities. These flaws permit authenticated users with “import template” permissions to execute arbitrary PHP code on the web server, enabling the injection of malicious commands for unauthorized operations. In another instance, on April 19, 2024, Cisco7 issued a security advisory warning of multiple vulnerabilities in the web-based management interface of its Integrated Management Controller (IMC). These vulnerabilities allow authenticated remote attackers with administrator-level privileges to perform command injection attacks, leading to potential privilege escalation to root level. This issue affects nearly 50 Cisco products, posing significant risks to enterprise networks and causing substantial harm. Additionally, on March 1, 2024, Lakshmanan8 reported that Ivanti had disclosed multiple security vulnerabilities affecting its products, four of which had been exploited by threat actors, including a command injection vulnerability in a web-based application.

This series of incidents demonstrates that web command injection attacks have become a prevalent method employed by malicious actors in the cybersecurity landscape. Many network devices are vulnerable to these attacks, posing significant threats across industries. These attacks not only compromise data security and integrity but also allow unauthorized remote access to inadequately secured web application systems. Moreover, attackers can execute various malicious actions on vulnerable systems, such as deleting data or disabling network security devices, ultimately gaining remote access and maintaining persistent control over enterprise networks. Despite the severity of these risks, limited research has been conducted on detecting web command injection attacks, and no studies have specifically focused on these attacks within web applications. Considering the potentially unpredictable consequences of successful web command injection attacks, this article aims to address this gap by focusing on their detection.

This paper proposes an end-to-end method for detecting web command injection attacks. This approach leverages the characteristics of web command injection traffic by utilizing dual convolutional channels for feature extraction and employing feature fusion for model learning.

The key contributions of this paper are as follows:

  • A comprehensive analysis of the strengths and weaknesses of existing research in web attack detection is provided. Based on a comparative study, a novel deep learning detection model is introduced, which discards traditional rule-based and semantic analysis methods. Instead, this model extracts malicious fields from attack packets and uses them as features for detection, enabling end-to-end detection without the need for manual feature extraction.

  • Web command injection traffic is analyzed, and the effectiveness of the attention mechanism in detecting these attacks is evaluated through ablation experiments. This feature weighting strategy integrates both global and local contextual relationships, enhancing the CNN’s ability to extract and integrate overall feature structure information.

  • Current state-of-the-art detection models are evaluated and tested across multiple well-known network security datasets. Through extensive experimental validation, the proposed method demonstrates superior performance compared to other approaches in the field and proves effective in detecting web command injection attacks in real-world Internet applications.

The remainder of this paper is organized as follows. Sect. “Related work” reviews the current state of research in this area. Sect. “Web command injection attack” analyzes web command injection attacks. Sect. “Our method” explains our approach in detail. Sect. “Model evaluation and testing” evaluates the approach and discusses its performance. Sect. “Conclusion and discussion” draws conclusions and discusses next steps.

Related work

In the field of web command injection attack detection, the body of literature remains relatively sparse. The research status and limitations of command injection attack detection across various studies are summarized in Table 1, highlighting the progression and gaps in this area of research.

Table 1. Summary of related research.

Year and author | Experiment | Result | Disadvantages
2019, Stasinopoulos et al.9 | Conducted verification on a virtual platform, compared the effectiveness with other tools, and validated results in a real-world environment. | Demonstrated strong performance across three comparative experiments. | Manual detection required, lacking automation capabilities.
2019, Zolanvari et al.10 | Manually extracted 23 features, which were integrated into an Intrusion Detection System (IDS) for detection using machine learning techniques. | Achieved over 99% accuracy. | The feature extraction process is cumbersome and relies on manual intervention.
2022, Gaber et al.11 | Applied two feature selection techniques (constant removal and recursive feature elimination) and evaluated the performance using various machine learning classifiers, including Support Vector Machine (SVM), Random Forest, and Decision Tree. | The decision tree classifier with 8 features achieved 99% accuracy, 95% precision, and a 90% F1 score. | No specific focus on command injection attacks; primarily serves as a reference method.
2023, Yi et al.12 | Reviewed the application of deep learning methodologies for network attack detection, detailing the fundamental processes involved. | Effectiveness of deep learning in detecting attacks, including CNN and DNN, was compared. | Command injection attacks were not addressed within the scope of the deep learning analysis.
2020, Ferrag et al.13 | Analyzed and compared the detection performance of various deep learning models using a divided network intrusion detection dataset. | Utilized CIC-IDS2018 and BOT-IOT datasets for comparative analysis. | Focused on deep learning, but lacked specific comparison of models for attack detection.
2021, Odumuyiwa and Chibueze14 | Implemented character embedding during feature extraction to enable the model to learn higher-level abstract features and capture relationships between request parameters. | The average detection accuracy exceeded 90%. | Detection accuracy for command injection attacks was only 89%; further improvement is necessary.
2022, Seyyar et al.15 | Integrated Natural Language Processing (NLP) techniques, including the Bidirectional Encoder Representations from Transformers (BERT) model, with deep learning for enhanced detection capabilities. | Achieved up to 99% accuracy. | The pre-trained model demands substantial computational resources and requires extensive training time.
2022, Zhang et al.16 | Developed a specialized feature extraction method to aggregate features into a sparse matrix, subsequently constructing a deep neural network model. | Achieved an accuracy rate of 96%. | Accuracy still requires further improvement.
2022, Zhao et al.17 | Proposed a novel sample generation method to address imbalanced training data, alongside introducing a more expressive feature fusion model. | Detection accuracy reached approximately 99%. | Manual extraction of discrete features required; lacks specific feature extraction for command injection attacks.
2023, Stiawan et al.18 | Utilized a composite neural network model that combines Long Short-Term Memory (LSTM) networks and Principal Component Analysis (PCA) to detect malicious attacks. | Achieved over 96% accuracy. | Accuracy requires further enhancement.
2024, Liu and Dai19 | Integrated BERT with LSTM networks to enhance the detection of SQL injection attacks. | Achieved over 97% accuracy. | High computational power required; accuracy still needs to be improved.
2024, Jimoh et al.20 | Compared and summarized the efficacy of machine learning and deep learning models in attack detection, proposing a Convolutional Neural Network (CNN) for detecting malicious attacks. | The highest accuracy reached 96%. | Reliance on a single model for detection; overall detection effectiveness requires improvement.
2024, Babayigit et al.21 | Leveraged multi-domain datasets to overcome the limitations associated with single dataset-based deep learning intrusion detection models. | Detection accuracy reached approximately 97%. | Did not include detection of web command injection attacks.

In 2019, Stasinopoulos et al.9 identified the significant impact of command injection attacks on websites, emphasizing that these threats had previously been underexplored by researchers. To address this gap, they introduced Commix, an open-source tool designed to automatically detect and exploit command injection vulnerabilities in web applications. Although Commix proved useful for penetration testers and security researchers, it lacked the capability for dynamic detection of web command injection attacks.

Traditional detection methods like Commix revealed several limitations, particularly due to their reliance on manual processes. To mitigate these challenges, researchers integrated artificial intelligence into detection strategies. For instance, in 2019, Zolanvari et al.10 combined machine learning with an Intrusion Detection System (IDS) for IoT security, manually extracting 23 features to train a model capable of detecting command injection attacks. Similarly, in 2022, Gaber et al.11 employed an intrusion detection method in IoT applications, using constant removal and recursive feature elimination as feature selection techniques; the selected features were then tested with SVM, random forest, and decision tree classifiers, yielding promising results.

Despite the effectiveness of these methods, they required substantial manual effort for feature extraction. To address this, deep learning was employed to automate feature extraction for injection attack detection. In 2023, Yi et al.12 reviewed the application of deep learning methods for network attack detection, outlining basic processes and evaluating models such as CNN and DNN. In 2020, Ferrag et al.13 explored the preprocessing stages of deep learning in network attack detection, comparing the performance of various deep learning models in the context of injection attacks. Further advancements were made in 2021 by Odumuyiwa and Chibueze14, who developed a model combining convolutional and deep neural networks, utilizing character embedding techniques to detect HTTP injection attacks with approximately 89% accuracy. In 2022, Seyyar et al.15 integrated the pre-trained BERT model with machine learning methods, achieving positive results in injection attack detection. Similarly, in 2022, Zhang et al.16 employed a word embedding method to convert data into word vectors, which were processed through a sparse matrix for model training. Additionally, in 2022, Zhao et al.17 proposed a feature fusion model that combined deep learning-extracted features with manually extracted discrete features to identify command injection attacks. However, this approach still required significant manual effort and did not specifically target web command injection attacks. In 2023, Stiawan et al.18 applied an LSTM+PCA composite model to detect SQL injection and XSS attacks, achieving an accuracy of 94%. In 2024, Liu and Dai19 used the pre-trained BERT model combined with an LSTM network to achieve 97% accuracy in detecting SQL injection traffic. Also in 2024, Jimoh et al.20 summarized the application of machine learning and deep learning in attack detection, proposing a CNN-based feature transformation method, though further improvement in accuracy was needed. Finally, in 2024, Babayigit et al.21 utilized multi-domain datasets in IoT to overcome the limitations of single dataset-based deep learning models, but their research did not address web-based command injection attacks.

Web command injection attack

Web command injection attacks occur when an attacker injects malicious commands into the input data of a web application, thereby allowing the execution of arbitrary operating system commands on the application server. Applications that do not implement proper verification and filtering mechanisms are particularly vulnerable to these attacks. Attackers exploit various channels, including form submissions, cookies, and HTTP headers, to deliver malicious commands, which are subsequently executed with the application’s permission level. The root cause of web command injection vulnerabilities is therefore inadequate input validation and filtering. The topology of web command injection attacks is depicted in Fig. 1.

Fig. 1. Web command injection attack topology.

In command injection attacks, the parameter section of an HTTP request is exploited by malicious actors to execute commands on the server. The location of the injected command within the attack is illustrated in Fig. 2, where the three parameters are concatenated using the “&” symbol. Following concatenation, the parameters are appended to “Index.php” using the “?” symbol.

Fig. 2. Where web command injection attacks occur.

Web command injection attack categories

Web command injection attack with echo results

A command injection attack resulting in echoed output occurs when an attacker leverages common operators to execute malicious commands. By combining the original command with the injected command, the attacker can obtain the execution results.

For instance, consider a scenario in which a web application employs system call-related functions, such as exec, and accepts user input for an address parameter to execute the ping command. An attacker could exploit this by crafting a malicious command injection, thereby executing the attack.

http://localhost:8080/Index.php?address=127.0.0.1&&ls

In the above example, the attacker uses the logical operator “&&” in the address parameter to chain the ls command onto the address value. When the web application executes the ping command on the server side, the ls command is executed as well. The resulting directory listing is then returned to the attacker, as shown on the right side of Fig. 1.
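The effect of the chaining operator can be illustrated with a minimal Python sketch that only builds and tokenizes command strings and never executes them. The helper names and the `ping -c 1` invocation are assumptions made for illustration, not the application code analyzed in this paper, and `shlex.quote` is shown as one common mitigation rather than a complete defense.

```python
import shlex

def build_ping_unsafe(address: str) -> str:
    # Hypothetical vulnerable pattern: user input is concatenated verbatim.
    return "ping -c 1 " + address

def build_ping_quoted(address: str) -> str:
    # shlex.quote wraps the value so a POSIX shell treats it as one argument.
    return "ping -c 1 " + shlex.quote(address)

payload = "127.0.0.1 && ls"

# Tokenize both command lines to see how a shell would split them into words.
unsafe_tokens = shlex.split(build_ping_unsafe(payload))
quoted_tokens = shlex.split(build_ping_quoted(payload))

print(unsafe_tokens)  # "&&" and "ls" appear as separate words
print(quoted_tokens)  # the whole payload stays a single argument
```

In the unquoted case the operator and the injected command reach the shell as independent words, which is what allows the second command to run.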

Web command injection attack without echo results

In a command injection attack where the results are disclosed, the attacker can assess the success of the injection by analyzing the application’s response. In contrast, in attacks where the results are concealed, the attacker cannot directly determine whether the injection was successful, as the application does not provide feedback on the injected command.

Nevertheless, even in the absence of direct feedback, attackers can employ indirect methods to gauge the success of their attack. For instance, the delay introduced by the sleep function can serve as an indicator, as the sleep function causes the system to pause operation temporarily, thereby signaling a potential successful injection.

http://localhost:8080/Index.php?address=127.0.0.1;str=$(echo WXYZ); str1=${#str}; if [ "4" != ${str1} ]; then sleep 0; else sleep 4; fi

In this scenario, the attacker first creates a variable, str, containing a random string (e.g., WXYZ), and then assigns the length of this string to another variable, str1. A conditional statement is used to control the server’s response: if the value of str1 is not equal to 4, the command sleep 0 is executed, causing the server to respond immediately. If str1 equals 4, the command sleep 4 is executed, resulting in a 4-second delay before the server responds. By introducing these differing delay times, the attacker can observe the application’s response to infer the success of the injection.

If the injection succeeds, the server’s response time increases by 4 seconds due to the execution of sleep 4. If the injection fails, no additional delay is introduced, and the response time remains minimal.
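The timing inference described above reduces to a comparison between a baseline response time and the observed one. The following sketch shows only that arithmetic; the function name, the tolerance value, and the sample timings are hypothetical and do not come from the paper.

```python
def delay_indicates_execution(baseline_s: float, observed_s: float,
                              injected_delay_s: float = 4.0,
                              tolerance_s: float = 1.0) -> bool:
    """Return True when the extra latency is close to the injected sleep time."""
    extra = observed_s - baseline_s
    # Allow some slack for network jitter (tolerance is an assumed value).
    return extra >= injected_delay_s - tolerance_s

# Hypothetical measurements in seconds.
print(delay_indicates_execution(0.20, 4.31))  # delayed response
print(delay_indicates_execution(0.20, 0.25))  # normal response
```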

Whether the command injection attack yields an echoed response or not, a malicious operator is required to concatenate the injected command with a system command. Table 2 provides commonly used operators and their corresponding symbols in command injection attacks. In this table, the strings cmd1 and cmd2 represent the command fragments to be executed.

Table 2. Operators used in command injection attacks.

Symbol type | Symbol | Example usage
Redirector | “<”, “>>”, “>” | cmd1>file
Pipe | “|” | cmd1|cmd2
Linker | “;” | cmd1;cmd2
Logical operator | “&&”, “||” | cmd1&&cmd2
Replacement | “``”, “$()” | $(cmd)
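As a rough illustration of how the operator classes in Table 2 could be recognized in a parameter value, the sketch below uses regular expressions. The category names and patterns are assumptions chosen for this example; they are not the detection logic of the proposed model, which learns features instead of matching rules.

```python
import re

# One pattern per operator class from Table 2 (patterns are illustrative).
OPERATOR_PATTERNS = {
    "redirector": re.compile(r">>|<|>"),
    "pipe": re.compile(r"(?<!\|)\|(?!\|)"),   # a single "|" that is not part of "||"
    "linker": re.compile(r";"),
    "logical": re.compile(r"&&|\|\|"),
    "replacement": re.compile(r"`|\$\("),
}

def find_operators(value: str) -> set[str]:
    """Return the operator classes that occur in a parameter value."""
    return {name for name, pattern in OPERATOR_PATTERNS.items()
            if pattern.search(value)}

print(find_operators("127.0.0.1&&ls"))
print(find_operators("127.0.0.1;cat /etc/passwd|wc -l"))
```

A rule list of this kind is easy to evade through encoding or obfuscation, which is one motivation for the learned approach described in the following sections.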

Table 3 lists the commonly used system commands.

Table 3. Web command injection system commands.

Keywords | Example | Functions
ping | ping 127.0.0.1 | Send packets to the target IP
exec | exec(code) | Execute code segment
system | system(command) | Execute command
wget | wget URL | Download the file at the target URL
sleep | sleep 5 | Sleep for 5 seconds
cat | cat {file} | View file
whoami | whoami | View the currently logged-in user
ipconfig | ipconfig | View network information

Our method

The overall workflow is divided into two primary phases: preprocessing and model recognition. During the preprocessing phase, the dataset is prepared and split into training and test sets. In the model recognition phase, the constructed model is trained and evaluated using these datasets. The overall structure of the model recognition process is illustrated in Fig. 3.

Fig. 3. General structure diagram.

Data preprocessing

Preprocessing involves the necessary operations and preparation of raw data for subsequent analysis, modeling, or other related tasks. Once the dataset for training is determined, preprocessing is essential to adapt the data to the input requirements of the model, ensuring that the model can learn useful patterns and information. This study preprocesses the received command injection data in three steps: data extraction, data cleaning, and data segmentation, as depicted in Fig. 4.

Fig. 4. Data preprocessing.

Data extraction

Command injection data is extracted by analyzing the parameters within HTTP request packets. The detection process targets the parameter sections of each field in the HTTP data packets, which are subsequently reassembled into malicious traffic for further analysis. Python scripts are employed to extract and process the relevant information from the received data packets.
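The paper does not list its extraction scripts, so the following is only a minimal sketch of how request parameters might be pulled from a URL with the Python standard library. The function name and the sample URL are assumptions. Note that `parse_qsl` splits on “&”, so a payload containing an unencoded “&&” would be split into separate fields and would need dedicated handling.

```python
from urllib.parse import parse_qsl, urlsplit

def extract_parameters(url: str) -> list[tuple[str, str]]:
    """Return (name, value) pairs from the query string, percent-decoded once."""
    query = urlsplit(url).query
    return parse_qsl(query, keep_blank_values=True)

# Hypothetical request with a percent-encoded ";" in the address parameter.
params = extract_parameters(
    "http://localhost:8080/Index.php?address=127.0.0.1%3Bls&page=2"
)
print(params)
```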

Data cleaning

  • Data decoding: Command injection attack code is often obfuscated using encoding techniques to evade detection. Before training, it is necessary to decode the input until it is in a form accepted by the target application, thereby deobfuscating the command injection code.

  • Data stream processing: Communication processes may include elements such as image transmission, file uploads, and video streaming. The body of a POST request often contains redundant stream data, which is irrelevant for model training and unnecessarily increases data complexity. Therefore, we first identify the data streams and replace them with strings according to their types, as detailed in Table 4.

  • Data normalization: To optimize model training, parameter names in the dataset are replaced with sequential alphabetic characters (e.g., a, b, c), as parameter names do not contribute to target identification. Additionally, the protocol component is removed, and the primary domain name is replaced with the placeholder string sample. These preprocessing steps ensure that the data is cleaned and streamlined, providing a more effective foundation for model training.
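The decoding and normalization steps above can be sketched as follows. This is an illustrative reading of the description, not the authors' script: the function names, the limit of five decoding rounds, and the restriction to at most 26 parameters are assumptions. `parse_qsl` also converts “+” to a space, which follows standard form decoding.

```python
import string
from urllib.parse import parse_qsl, unquote, urlsplit

def decode_until_stable(text: str, max_rounds: int = 5) -> str:
    """Percent-decode repeatedly until the text stops changing."""
    for _ in range(max_rounds):
        decoded = unquote(text)
        if decoded == text:
            break
        text = decoded
    return text

def normalize_request(url: str) -> str:
    """Drop the protocol, replace the domain with 'sample', rename parameters."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    # Parameter names become a, b, c, ... (assumes at most 26 parameters).
    renamed = [f"{string.ascii_lowercase[i]}={decode_until_stable(value)}"
               for i, (_, value) in enumerate(pairs)]
    return "sample" + parts.path + "?" + "&".join(renamed)

# "%253B" is a double-encoded ";" used here as a hypothetical obfuscation.
cleaned = normalize_request(
    "http://example.com/Index.php?address=127.0.0.1%253Bls&page=2"
)
print(cleaned)  # sample/Index.php?a=127.0.0.1;ls&b=2
```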

Table 4. Data stream processing.

Data flow type | Replace string
MD5 | MD5-Str-Hash
SHA | SHA-Str-Hash
BASE64 | BASE64-Str-Hash
Encryption | Encryption-Str
Binary data | StreamBinary-Str
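The paper does not state how each stream type is recognized. As a hedged sketch, hash digests can be approximated by their hexadecimal length (32 characters for MD5, 40 or 64 for SHA-1 and SHA-256); these length heuristics are assumptions, and the BASE64, encryption, and binary cases from Table 4 are left out because they need more context to identify reliably.

```python
import hashlib
import re

# (pattern, placeholder) pairs; the length-based rules are assumptions.
STREAM_PATTERNS = [
    (re.compile(r"\b[0-9a-fA-F]{64}\b"), "SHA-Str-Hash"),  # e.g. SHA-256
    (re.compile(r"\b[0-9a-fA-F]{40}\b"), "SHA-Str-Hash"),  # e.g. SHA-1
    (re.compile(r"\b[0-9a-fA-F]{32}\b"), "MD5-Str-Hash"),  # MD5
]

def replace_streams(text: str) -> str:
    """Replace digest-like substrings with the placeholder strings of Table 4."""
    for pattern, placeholder in STREAM_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text

md5_sample = "token=" + hashlib.md5(b"demo").hexdigest()
sha_sample = "sig=" + hashlib.sha256(b"demo").hexdigest()
print(replace_streams(md5_sample))
print(replace_streams(sha_sample))
```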

Data segmentation

Given the critical role of special symbols in command injection attacks, a statistical analysis was conducted to identify the symbols most frequently associated with these attacks. This analysis identified 23 high-frequency special symbols: [’-’, ’_’, ’.’, ’=’, ’/’, ’?’, ’&’, ’:’, ’#’, ’+’, ’<’, ’>’, ’(’, ’)’, ’%’, ’=’, ””, ’|’, ’$’, ’{’, ’}’, ’;’, ’,’]. These symbols were selected as delimiters for the data segmentation process.

Utilizing these special symbols as segmentation criteria during data preprocessing enhances the accuracy and thoroughness of the segmentation process. This method enables the extraction of more refined data, which in turn supports the generation of comprehensive vector representations, thereby optimizing the effectiveness of model training and subsequent analysis.
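A minimal sketch of symbol-based segmentation is given below. It splits on the delimiters while keeping them as tokens, since the symbols themselves carry information. The symbol string contains only the 21 distinct symbols that are legible in the list above, and the function name is an assumption.

```python
import re

# Distinct, legible delimiters from the list above (an assumed subset).
SYMBOLS = "-_.=/?&:#+<>()%|${};,"

# A capturing group makes re.split keep the delimiters in the result.
SPLIT_PATTERN = re.compile("([" + re.escape(SYMBOLS) + "])")

def segment(text: str) -> list[str]:
    """Split text into words and symbols, dropping empty fragments."""
    return [token for token in SPLIT_PATTERN.split(text) if token.strip()]

print(segment("address=127.0.0.1&&ls"))
```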

Model

Given the critical role of special symbols in command injection attacks, relying solely on Word2Vec embedding is insufficient to capture the essential information these symbols convey22. To address this limitation, we employ character embedding to preserve the effective information inherent in these special symbols.

Following data preprocessing, the segmented data are divided into two categories: words and symbols. For the word component, we use the Word2Vec algorithm to train an embedding for each word in the vocabulary. For the symbol component, we implement a character embedding-based feature representation: each symbol is first replaced with an integer corresponding to its sequence number and then mapped into a feature vector. These vectors are subsequently stacked to form a matrix, preserving the critical information necessary for detecting command injection attacks.
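The separation into a word channel and a symbol channel, together with the integer mapping of symbols, can be sketched as follows. The symbol ordering, the reservation of index 0 for padding, and the function name are assumptions; the word channel would subsequently be embedded with a trained Word2Vec model, which is not reproduced here.

```python
# Distinct, legible symbols from the segmentation step (assumed ordering).
SYMBOLS = list("-_.=/?&:#+<>()%|${};,")

# Map each symbol to a positive integer; 0 is reserved for padding (assumption).
SYMBOL_TO_ID = {symbol: index + 1 for index, symbol in enumerate(SYMBOLS)}

def split_channels(tokens: list[str]) -> tuple[list[str], list[int]]:
    """Return the word tokens and the integer-encoded symbol tokens."""
    words = [t for t in tokens if t not in SYMBOL_TO_ID]
    symbol_ids = [SYMBOL_TO_ID[t] for t in tokens if t in SYMBOL_TO_ID]
    return words, symbol_ids

tokens = ["address", "=", "127", ".", "0", ".", "0", ".", "1", "&", "&", "ls"]
words, symbol_ids = split_channels(tokens)
print(words)
print(symbol_ids)
```

The integer sequence would then index a trainable embedding table so that each symbol receives its own feature vector.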

Feature extraction design

The proposed model structure for extracting command injection attack features consists of two main components: the input layer and the convolutional layer. The input layer is responsible for text embedding, while the convolutional layer is dedicated to feature extraction.

  • Input layer: We employ dual-channel embedding for text processing. The preprocessed input comprises $n$ words and symbols, denoted as $D = \{a_1, a_2, \ldots, a_n\}$, where each $a_i$ ($i = 1, 2, \ldots, n$) represents a segmented word or symbol. Initially, we separate the words and symbols within $D$, dividing it into a word set $W = \{w_1, w_2, \ldots, w_p\}$ containing $p$ words and a symbol set $C = \{c_1, c_2, \ldots, c_q\}$ containing $q$ symbols. For embedding, we apply two distinct methods: the word vector model from Jang et al.23 for the word set $W$, and a character mapping-based symbol vector for the symbol set $C$. These processes generate two embedding matrices: a word vector matrix and a symbol vector matrix. The resulting vector matrices are subsequently used as inputs to the convolutional layer.

  • Convolutional layer: The word and symbol vector matrices generated by the input layer are processed through two distinct CNN channels for feature extraction.
    CNN channel 1 based on word matrix: After the word set $W = \{w_1, w_2, \ldots, w_p\}$ is converted into a feature matrix by the input layer, convolution kernels are used to extract features from $W$. The kernel sizes are 3, 4, and 5, so the kernels capture n-gram features over windows of 3, 4, and 5 consecutive tokens in $W$. For the $j$th convolution kernel, $j \in \{11, 12, 13\}$, we represent the weight matrix as $U_j \in \mathbb{R}^{C_j \times m_j}$ and the bias vector as $b_j \in \mathbb{R}^{C_j}$, where $C_j$ is the dimensionality of the output and $m_j = h_j \times m$. Here $m$ is the dimensionality of the word embedding and $h_j$ is the number of word embeddings combined by the kernel. Applying the $j$th kernel to $W$ generates the output vector $s_j = [s_1^j, s_2^j, \ldots, s_{C_j}^j]$, the feature vector extracted by that kernel. The output vectors of the three convolution kernels (corresponding to $j = 11, 12, 13$) are then concatenated, as expressed in Eq. 1:
    $$s_1 = [s_{11}, s_{12}, s_{13}], \quad s_1 \in \mathbb{R}^{C_{11} + C_{12} + C_{13}} \qquad (1)$$
    The generated $s_1$ is used as the output vector of the word matrix channel.
    CNN channel 2 based on symbol matrix: After the symbol set $C = \{c_1, c_2, \ldots, c_q\}$ is converted into a feature matrix by the input layer, feature vectors are extracted in the same way as in CNN channel 1. Convolution kernels of the same form are applied to $C$, generating the output vectors $s_j = [s_1^j, s_2^j, \ldots, s_{C_j}^j]$. Finally, we concatenate the output vectors of the three convolution kernels (corresponding to $j = 21, 22, 23$), as expressed in Eq. 2:
    $$s_2 = [s_{21}, s_{22}, s_{23}], \quad s_2 \in \mathbb{R}^{C_{21} + C_{22} + C_{23}} \qquad (2)$$

Feature extraction structure

The complete architecture of our proposed dual CNN model is illustrated in Fig. 5. For preprocessed inputs, the output from the input layer is directed to the corresponding CNN channel. The output of the convolutional layer is computed using the formula $p_i = g(x_i W + b)$, where $b$ represents the bias and $W$ denotes the weight matrix. The resulting data from the convolutional layer is subsequently passed through the max-pooling layer, which selects the maximum feature from the extracted convolutional outputs. Following the convolutional and fully connected layers, a merging layer concatenates the extracted word and symbol features. The output of this merging layer is a concatenated sequence derived from both CNN channels, integrating the convolutional and pooling features. This concatenated sequence represents the global vector, concluding the feature extraction process.

Fig. 5. Double convolution channel structure diagram.
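To make the data flow of the two channels concrete, the following pure-Python sketch applies kernels of heights 3, 4, and 5 to an embedding matrix, performs max pooling per kernel, and concatenates the results of both channels into a global vector. It is a toy illustration under several assumptions: random weights stand in for trained parameters, ReLU is assumed for the activation $g$, the number of filters and the embedding size are arbitrary, and the fully connected layer is omitted. A practical implementation would rely on a deep learning framework.

```python
import random

def conv_max_pool(matrix, kernel_height, num_filters, rng):
    """Apply num_filters kernels of one height, then max-over-time pooling."""
    dim = len(matrix[0])
    pooled = []
    for _ in range(num_filters):
        # Random weights stand in for trained parameters (assumption).
        weights = [rng.uniform(-0.1, 0.1) for _ in range(kernel_height * dim)]
        bias = rng.uniform(-0.1, 0.1)
        best = 0.0  # ReLU outputs are never negative
        for start in range(len(matrix) - kernel_height + 1):
            # Flatten the window of kernel_height consecutive embeddings.
            window = [x for row in matrix[start:start + kernel_height] for x in row]
            value = sum(w * x for w, x in zip(weights, window)) + bias
            best = max(best, max(0.0, value))
        pooled.append(best)
    return pooled

def cnn_channel(matrix, kernel_heights=(3, 4, 5), num_filters=4, seed=0):
    """Concatenate the pooled features of all kernel heights (Eqs. 1 and 2)."""
    rng = random.Random(seed)
    features = []
    for height in kernel_heights:
        features.extend(conv_max_pool(matrix, height, num_filters, rng))
    return features

rng = random.Random(42)
word_matrix = [[rng.random() for _ in range(8)] for _ in range(6)]    # 6 words
symbol_matrix = [[rng.random() for _ in range(8)] for _ in range(7)]  # 7 symbols

# Merging layer: concatenate the outputs of both channels.
global_vector = cnn_channel(word_matrix, seed=1) + cnn_channel(symbol_matrix, seed=2)
print(len(global_vector))  # 3 kernel heights x 4 filters x 2 channels = 24
```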

Identify classification

The proposed architecture is designed to enhance classification accuracy by aggregating multiple features. The data is first preprocessed and passed through the input layer, where it is converted into an embedding vector matrix. The feature extraction layer then captures command injection attack features from this vector matrix for modeling. The BiLSTM model captures contextual features by integrating both forward and backward hidden layers. An attention mechanism layer processes these contextual features, enabling the model to focus on words and symbols associated with command injection keywords and thus improving its understanding of sentence semantics. After processing through the attention layer, the data is fed into the softmax classifier.

Because keywords are critical to sentence semantics, assigning differential weights to them is essential for precise sentence interpretation. The attention mechanism assigns varying weights to each keyword, enhancing the model’s ability to recognize and interpret the sentence’s meaning.

The BiLSTM network is employed as the recognition mechanism, overcoming the limitations of traditional LSTM models and enhancing text classification performance by efficiently extracting local contextual information. BiLSTM processes both forward LSTM (denoted as $\overrightarrow{\mathrm{LSTM}}$) information, from $Bc_1$ to $Bc_m$ (where $m$ is the number of words in the text), and backward LSTM (denoted as $\overleftarrow{\mathrm{LSTM}}$) information, from $Bc_m$ to $Bc_1$. By integrating information from both directions, the model constructs a comprehensive context for the sentence. The outputs of the BiLSTM network are expressed by Eqs. 3 and 4.

$$h_f = \overrightarrow{\mathrm{LSTM}}(Bc_n), \quad n \in [1, m] \qquad (3)$$
$$h_b = \overleftarrow{\mathrm{LSTM}}(Bc_n), \quad n \in [m, 1] \qquad (4)$$

The annotation of a given feature sequence $Bc_n$ is obtained through the forward hidden state $h_f$ and the backward hidden state $h_b$. The model derives the total hidden state $h_c$ by concatenating $h_f$ and $h_b$, as shown in Eq. 5.

$$h_c = [h_f, h_b] \qquad (5)$$

The attention mechanism was specifically designed to focus on keyword features, thereby reducing the influence of non-keyword elements on the overall text semantics. It functions as a fully connected layer combined with a softmax function. The attention mechanism’s operation is described by the following equations:

$$u_f = \tanh(w h_c + b) \tag{6}$$

$$a_f = \frac{\exp(u_f v_f)}{\sum_{i=1}^{M} \exp(u_f v_f)} \tag{7}$$

$$A_c = \sum a_f h_c \tag{8}$$

As indicated by Eq. 6, the word annotation $h_c$ is first processed through a perceptron layer, resulting in $u_f$, which represents the hidden state of $h_c$. Here, $w$ and $b$ represent the weights and biases of the neuron, and $\tanh(\cdot)$ is the hyperbolic tangent function. The model uses $u_f$ and the word-level context vector $v_f$ to assess the importance of each word.

Equation 7 employs the softmax function to obtain the normalized weight $a_f$, where $M$ is the number of words in the text and $\exp(\cdot)$ is the exponential function. The word-level context vector $v_f$ serves as a high-level representation of word information; it is randomly initialized and jointly learned during training. The weighted sum of the forward reading word annotations is calculated based on the weight $a_f$ to produce the forward context representation $A_c$.

$A_c$, given by Eq. 8, is the output of the attention layer.
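As a rough illustration of Eqs. 6 to 8, the sketch below computes the attention weights and the weighted sum in plain Python. The function name `attention_pool`, the toy values, and the reading of the score as a dot product between $u_f$ and $v_f$ are assumptions made for this example; in the actual model $w$, $b$, and $v_f$ are learned during training.

```python
import math

def attention_pool(hidden_states, w, b, v):
    """Apply Eqs. 6-8 to a list of hidden vectors h_c; return (weights, A_c)."""
    scores = []
    for h in hidden_states:
        # Eq. 6: u_f = tanh(w h_c + b), with w a matrix and b a bias vector
        u = [math.tanh(sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i)
             for row, b_i in zip(w, b)]
        # Score u_f against the context vector v_f (assumed to be a dot product)
        scores.append(sum(u_i * v_i for u_i, v_i in zip(u, v)))
    # Eq. 7: softmax over the M positions (max subtracted for numerical stability)
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Eq. 8: weighted sum of the hidden states
    dim = len(hidden_states[0])
    context = [sum(a * h[k] for a, h in zip(weights, hidden_states))
               for k in range(dim)]
    return weights, context

# Hypothetical toy values: three positions, hidden size 2
h_c = [[0.1, 0.3], [0.9, -0.2], [0.4, 0.4]]
w = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
v = [1.0, 0.5]
weights, a_c = attention_pool(h_c, w, b, v)
print([round(a, 3) for a in weights], [round(x, 3) for x in a_c])
```

The weights always sum to one, so positions whose transformed state aligns with the context vector contribute more to the pooled representation while the rest are damped rather than discarded.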

In our method, the Adam optimizer24 was employed to optimize the network’s loss function. The Adam optimizer fine-tunes the model parameters, and the loss function is shown in Eq. 9.

$$L_{total} = -\frac{1}{num} \sum_{i=1}^{num} \sum_{j=1}^{classes} y_{ij} \ln o_{ij} \tag{9}$$

Here, $num$ is the number of samples, $classes$ is the number of categories, $y_{ij}$ denotes the label of sample $i$ for category $j$, and $o_{ij}$ represents the model’s predicted probability that sample $i$ belongs to category $j$.
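For concreteness, Eq. 9 can be evaluated by hand on a tiny batch. The following sketch assumes one-hot labels and rows of predicted probabilities; the `eps` guard against `log(0)` and the sample values are additions for this example and are not taken from the paper.

```python
import math

def cross_entropy(labels, probs, eps=1e-12):
    """Eq. 9: mean over samples of -sum_j y_ij * ln(o_ij)."""
    num = len(labels)
    total = 0.0
    for y_i, o_i in zip(labels, probs):
        # eps (an assumption of this sketch) avoids log(0) for zero probabilities
        total += sum(y * math.log(max(o, eps)) for y, o in zip(y_i, o_i))
    return -total / num

labels = [[1, 0], [0, 1]]          # hypothetical one-hot labels
probs = [[0.9, 0.1], [0.2, 0.8]]   # hypothetical softmax outputs
print(round(cross_entropy(labels, probs), 4))
```

With one-hot labels only the probability assigned to the true class contributes, so the loss here is the mean of $-\ln 0.9$ and $-\ln 0.8$.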

The complete model process is detailed in Table 5, while the core pseudo-code is presented in Table 6.

Table 5.

Model process.

Table 6.

Model core pseudo code.

Model evaluation and testing

Dataset

The proposed method aims to detect malicious network activities within large-scale traffic data, with a particular focus on distinguishing web command injection attacks from other forms of malicious behavior. To validate the generalizability and robustness of our approach, we implemented cross-validation techniques across diverse datasets and performed evaluations on cross-domain datasets. The datasets employed in this research include:

  • Web command injection dataset: Positive samples within the Web Command Injection Dataset were collected from standard network traffic observed in an enterprise environment and supplemented by data from publicly accessible repositories. Negative samples were obtained from malicious traffic recorded during Capture The Flag (CTF) competitions, in addition to malicious attack data available on platforms such as Kaggle and GitHub.

  • HTTP CSIC 2010: The CSIC 2010 dataset25, developed by the Consejo Superior de Investigaciones Científicas (CSIC), is a comprehensive resource designed for research in network security intrusion detection. This dataset encapsulates real-world scenarios involving a variety of network attacks, including server-side exploits such as SQL injection, CRLF injection, and cross-site scripting (XSS). Each sample within the dataset is meticulously annotated with detailed request and response data, and malicious samples are clearly labeled with their corresponding attack types, thereby supporting supervised learning methodologies.

  • HttpParamsDataset: For cross-domain evaluation, we selected the open-source HttpParamsDataset26, a resource for studying the security of web applications. Through detailed analysis, this dataset enables researchers and developers to gain insights into attacker behavior patterns. It includes data on various malicious activities, such as SQL injection, XSS, and directory traversal attacks.

Following data preprocessing, we partitioned the dataset into training, validation, and test sets in a 3:1:1 ratio. The proportions of the three dataset categories are depicted in Fig. 6 (categories with too few samples in HttpParamsDataset are merged into Other injection). The distribution of the total number of samples is presented in Table 7.
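A 3:1:1 partition can be sketched as follows. The paper does not state whether the split was shuffled or stratified, so the shuffling, the fixed seed, and the function name `split_3_1_1` are assumptions of this example.

```python
import random

def split_3_1_1(samples, seed=0):
    """Shuffle and split samples into train/validation/test at a 3:1:1 ratio."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # seeded shuffle (assumed, for reproducibility)
    n_train = len(items) * 3 // 5
    n_val = len(items) // 5
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]      # remainder goes to the test set
    return train, val, test

train, val, test = split_3_1_1(range(1000))
print(len(train), len(val), len(test))  # 600 200 200
```

For imbalanced attack categories, a stratified split per class would usually be preferable so that rare categories appear in all three sets.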

Fig. 6.

Dataset ratio chart.

Table 7.

Sample distribution across different datasets and categories.

Dataset Categories Total
Web command injection dataset Benign; Command injection; Other injection 236170
HTTP CSIC 2010 Benign; Malware 61000
HttpParamsDataset Benign; SQL injection; Cross-site scripting; Command injection; Path traversal attacks 31067

Experimental configuration

  • Laboratory equipment: We separately evaluated the impact of the feature extraction method and the attention-based attack detection and recognition model on the dataset. The experiments were conducted on a machine equipped with 16 vCPUs (Intel® Xeon® Platinum 8352V), 90 GB of RAM, and an NVIDIA RTX 4090 GPU with 24 GB of VRAM. The tests were performed using PyTorch 2.0 and Python 3.8.5 on Ubuntu 20.04.

  • Parameter settings: The model’s training and testing phases were conducted using the parameter configurations detailed in Table 8. The configuration parameters are designed to optimize the performance of the neural network model during these stages.

Table 8.

Model parameters.

Model parameters Value
Embedded dimensions 100
Number of BiLSTM layers 1
Hidden layer size 64
Batch size 256
Feature splicing ratio 1:1
Optimizer Adam
Classifier Softmax
Training epochs 30

Analysis of results

This section presents a comprehensive series of experiments aimed at validating the effectiveness of the proposed model. The experimental framework is structured as follows:

  • Comparative analysis: We conducted a comparative analysis between our model and existing deep learning-based injection attack detection models across multiple datasets to demonstrate the advantages of our approach.

  • Ablation studies: We performed ablation studies to assess the contribution of our proposed feature extraction method and the integration of the attention mechanism. These studies evaluate the impact of each component on the overall performance of the model.

  • Cross-Domain evaluation: Cross-domain experiments were conducted to evaluate the generalization and robustness of the model on datasets containing non-command injection attacks.

Comparative analysis

In this section, we present a comparative analysis of our proposed method for detecting command injection attacks. The process begins with the extraction of keyword and symbol features from the command injection data, which are then fused in a merging layer. These fused features are subsequently input into a neural network model incorporating an attention mechanism for classification. We benchmarked our approach against recent leading research in the field. The comparative results are illustrated in Fig. 7, while the accuracy and loss dynamics of the training process are depicted in Fig. 8.

Fig. 7.

Comparative experimental evaluation.

Fig. 8.

Training process evaluation diagram.

The models used for comparison are divided into two categories: single models and hybrid models. Among the single models, the approach by Odumuyiwa 14, developed in 2020, which employs character embedding and CNN, achieved an accuracy of 93.20%. Similarly, the method by Zhang 16, proposed in 2022, which aggregates features into sparse matrices, attained an accuracy of 93.16%. These models, however, exhibit lower recognition rates compared to others and demonstrate limited feature generalization capabilities.

In the hybrid models category, Zhao’s17 approach in 2022, which integrated convolutional and memory networks through feature fusion, achieved an accuracy of 97.23%. Stiawan18, in 2023, proposed the LSTM+PCA model, which demonstrated good detection performance. Furthermore, Seyyar 15, in 2022, and Liu 19, in 2024, employed Bert-Transformer architectures combined with other networks, achieving accuracies of 98.01% and 96.2%, respectively. These findings indicate that composite neural network structures offer significant advantages in detecting web command injection attacks.

In 2024, Jimoh 20 conducted a comprehensive review of injection attack detection using deep learning methods and compared traditional machine learning techniques (e.g., decision tree, SVM, random forest, naive Bayes, and logistic regression) with deep learning approaches (e.g., LSTM, GRU, CNN). In our model, we utilized BiLSTM instead of LSTM for feature recognition, concatenating input features in both forward and reverse directions. Our model achieved an accuracy of 99.21%, outperforming other models in the comparison. Although our method is slightly outperformed by the pre-trained model proposed by Seyyar on the CSIC 2010 dataset, the pre-trained model demands extensive pre-training time and computational resources. Overall, our model demonstrates a clear advantage in terms of accuracy and efficiency.

Ablation studies

To validate the effectiveness of the proposed attention mechanism in our model, we conducted an ablation study. The variations in loss rate and accuracy during the training process are presented in Fig. 9. Model (1) represents a model without an attention mechanism, while Model (2) represents a model incorporating an attention mechanism.

Fig. 9.

Ablation experiment comparison chart.

As depicted in Fig. 9, the integration of the attention mechanism significantly improves model accuracy, accelerates convergence, and reduces training time. Additionally, we compared the confusion matrices of the models with and without the attention mechanism on the test set, as shown in Fig. 10.

Fig. 10.

Confusion matrix comparison chart.

The confusion matrix results indicate that the model incorporating the attention mechanism correctly predicts a higher number of samples and exhibits fewer false positives compared to the model without the attention mechanism.

These experimental results demonstrate that the attention mechanism effectively mitigates the influence of noise by assigning greater weight to key features, thereby enhancing the model’s overall performance. This confirms that the addition of the attention mechanism substantially improves the model’s recognition and classification capabilities.
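The accuracy, precision, recall, and F1 figures reported in this evaluation follow from confusion-matrix counts. A minimal sketch using the standard binary definitions is given below; the counts are hypothetical and are not read from Fig. 10, and a multi-class evaluation would compute these per class and then average.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

# Hypothetical counts, not taken from Fig. 10
acc, pre, rec, f1 = binary_metrics(tp=90, fp=5, fn=10, tn=95)
print(round(acc, 4), round(pre, 4), round(rec, 4), round(f1, 4))
```

Fewer false positives raise precision, while fewer false negatives raise recall, which is why the two are reported separately alongside accuracy.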

Cross-domain evaluation

To evaluate the generalization capability of our trained model, we conducted cross-domain evaluation experiments on cross-domain datasets and compared the results with those of other methods.

  • Cross-Dataset evaluation: We tested our trained model on the HttpParamsDataset and compared its performance with other methods. The experimental results are presented in Table 9. The results demonstrate that our method effectively detects SQL injection and XSS injection attacks, achieving an accuracy of 99.51% on the cross-domain dataset. It is notable that the methods proposed by Seyyar 15 and Liu 19, both of which incorporate transformer architectures, also perform well. The experimental data further confirm that composite network structures are highly effective in identifying malicious network attacks, enabling the classification of malicious traffic through comprehensive feature aggregation.

  • Cross-Domain method evaluation: To assess the performance advantages of our method, we conducted cross-domain method experiments, comparing it with state-of-the-art attack detection methods from other fields. Specifically, we compared advanced machine learning models and graph neural network (GNN) models with our method, and the results are shown in Table 10. In 2020, Tang27 employed machine learning techniques to detect SQL injection attacks, achieving an accuracy of 97.75%. In 2023, Crespo-Martínez28 introduced a lightweight protocol to examine effective features in SQL injection data streams, attaining an accuracy of 96.4%. Both methods, however, are slightly less accurate than our approach and require manual feature extraction, making them less suitable for real-time detection in practical scenarios. In recent years, GNNs have made significant progress in vulnerability detection, so we also reviewed graph-based attack detection. In 2016, Kar29 utilized feature graphs combined with SVM algorithms to detect SQL injection attacks, achieving an accuracy of 95.12%. In 2022, Liu30 proposed using graph convolutional neural networks (GCNs) to detect XSS attacks, achieving a detection rate of 99.2% on the web command injection dataset. In 2024, Wang31 introduced a novel GCN-based method for detecting XSS attacks, leveraging correlations within the graph neural network to enhance detection performance, resulting in a detection rate of 98.9% on the web command injection dataset. The experimental data indicate that GNNs are effective in detecting command injection attacks, with detection performance comparable to our method. However, given the higher computational power and time required by GNNs, our approach proves to be more practical for real-time command injection attack detection.

Table 9.

Cross-Dataset evaluation.

Year and author Acc (%) Pre (%) Rec (%) F1 (%)
2020 and Odumuyiwa14 97.36 99.01 97.02 98.04
2022 and Seyyar15 99.21 99.01 98.52 98.69
2022 and Zhang16 95.24 96.21 93.42 94.23
2022 and Zhao17 97.02 95.31 94.28 95.45
2023 and Stiawan18 96.05 94.96 95.12 94.46
2024 and Liu19 97.89 97.73 97.01 96.35
2024 and Jimoh20 97.32 97.73 96.95 96.82
Our model 99.51 99.23 99.45 99.02

Table 10.

Cross-Domain method evaluation.

Year and author Acc (%) Pre (%) Rec (%) F1 (%)
2020 and Tang27 97.75 96.27 96.88 95.42
2023 and Crespo-Martínez28 96.42 96.23 95.82 96.82
2016 and Kar29 95.12 94.31 93.46 94.64
2022 and Liu30 99.23 98.91 98.64 96.51
2024 and Wang31 98.91 98.22 98.11 95.32
Our model 99.21 99.13 99.24 98.91

Conclusion and discussion

In this study, we investigated web command injection attacks and applied various hybrid deep learning models to detect these attacks within web applications. We developed novel models specifically tailored for command injection detection and conducted experiments to assess the impact of attention mechanisms on model performance. Our models demonstrated strong results across multiple datasets, achieving accuracy rates exceeding 98%. In comparison with existing models for detecting web command injection attacks, our proposed model consistently outperformed other approaches.

Considering the increasingly complex landscape of network security, future work will focus on developing more comprehensive detection models to further enhance accuracy and robustness. Additionally, we plan to extend the applicability of our model to other security-related tasks, such as identifying malicious URL traffic, detecting phishing attempts, malware detection, and combating botnet attacks.

Artificial intelligence has played a significant role in advancing network security by offering robust tools for protecting web applications and accurately identifying malicious activities. By integrating diverse data types and leveraging state-of-the-art machine learning techniques, AI models have effectively detected and mitigated a wide range of network threats. Moving forward, we will aim to refine our models to reduce training time and improve recognition efficiency. Future research will also prioritize the detection of various attack types, including zero-day attacks, flood attacks, DDoS attacks, and malware attacks, to enhance the model’s versatility and effectiveness.

Acknowledgements

We would like to thank the editor and anonymous reviewers for taking the time to review this article and providing valuable comments. The authors would like to thank the researchers from Harbin University of Science and Technology for supporting this work with the National Natural Science Foundation of China (No. 61402126, 61602133), the Key Project of Research on Teaching Reform of Higher Education in Heilongjiang Province (No. SJGZ20220086), the Higher Education Research Project of Heilongjiang Higher Education Society (No. 23GJYBB080), and Harbin University of Science and Technology. The authors would like to thank the corresponding author from Harbin University of Science and Technology for their support.

Author contributions

X.W. proposed the method, designed the experimental steps, verified the experimental results, and wrote the main content of the manuscript. J.Z. provided the research direction and experimental equipment, summarized the experiments, and summarized and revised the manuscript. All authors participated in review of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (No. 61402126, 61602133), Harbin Institute of Technology University Researcher Support Project, Heilongjiang Provincial Key Research Project on Higher Education Teaching Reform (No. SJGZ20220086), the corresponding author, and all authors.

Data availability

The research content and research data are both displayed in the article. Due to the offensive nature of the network attack traffic dataset, the data set is not directly disclosed in the article. The datasets used and analyzed in the current study are available from the first and corresponding author on reasonable request.

Declarations

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1.Owasp top 10:2021 (2021). [Online]. https://owasp.org/Top10/.
  • 2.Command injection. [Online]. https://owasp.org/www-community/attacks/Command_Injection.
  • 3.Tadhani, J. R., Vekariya, V., Sorathiya, V., Alshathri, S. & El-Shafai, W. Securing web applications against xss and sqli attacks using a novel deep learning approach. Sci. Rep. 14, 1803 (2024).
  • 4.A03:2021 - injection. [Online]. https://owasp.org/Top10/A03_2021-Injection/.
  • 5.Montalbano, E. Tellyouthepass ransomware group exploits critical php flaw (2024). [Online]. https://www.darkreading.com/vulnerabilities-threats/tellyouthepass-ransomware-exploits-critical-php-flaw/.
  • 6.Lakshmanan, R. Critical flaws in cacti framework could let attackers execute malicious code (2024). [Online]. https://thehackernews.com/2024/05/critical-flaws-in-cacti-framework-could.html/.
  • 7.Advisory, C. S. Integris health says data breach impacts 2.4 million patients (2024). [Online]. https://www.bleepingcomputer.com/news/security/integris-health-says-data-breach-impacts-24-million-patients/.
  • 8.Lakshmanan, R. Five eyes agencies warn of active exploitation of ivanti gateway vulnerabilities (2024). [Online]. https://www.bleepingcomputer.com/news/security/integris-health-says-data-breach-impacts-24-million-patients/.
  • 9.Stasinopoulos, A., Ntantogian, C. & Xenakis, C. Commix: automating evaluation and exploitation of command injection vulnerabilities in web applications. Int. J. Inf. Secur. 18, 49–72 (2019).
  • 10.Zolanvari, M., Teixeira, M. A., Gupta, L., Khan, K. M. & Jain, R. Machine learning-based network vulnerability analysis of industrial internet of things. IEEE Internet Things J. 6, 6822–6834 (2019).
  • 11.Gaber, T., El-Ghamry, A. & Hassanien, A. E. Injection attack detection using machine learning for smart iot applications. Phys. Commun. 52, 101685 (2022).
  • 12.Yi, T., Chen, X., Zhu, Y., Ge, W. & Han, Z. Review on the application of deep learning in network attack detection. J. Netw. Comput. Appl. 212, 103580 (2023).
  • 13.Ferrag, M. A., Maglaras, L., Moschoyiannis, S. & Janicke, H. Deep learning for cyber security intrusion detection: Approaches, datasets, and comparative study. J. Inf. Secur. Appl. 50, 102419 (2020).
  • 14.Odumuyiwa, V. & Chibueze, A. Automatic detection of http injection attacks using convolutional neural network and deep neural network. J. Cyber Secur. Mobility 489–514 (2020).
  • 15.Seyyar, Y. E., Yavuz, A. G. & Ünver, H. M. An attack detection framework based on bert and deep learning. IEEE Access 10, 68633–68644 (2022).
  • 16.Zhang, W. et al. Deep neural network-based sql injection detection method. Secur. Commun. Netw. 2022, 4836289 (2022).
  • 17.Zhao, C., Si, S., Tu, T., Shi, Y. & Qin, S. Deep-learning based injection attacks detection method for http. Mathematics 10, 2914 (2022).
  • 18.Stiawan, D. et al. An improved lstm-pca ensemble classifier for sql injection and xss attack detection. Comput. Syst. Sci. Eng. 46 (2023).
  • 19.Liu, Y. & Dai, Y. Deep learning in cybersecurity: A hybrid bert-lstm network for sql injection attack detection. IET Inf. Secur. 2024, 5565950 (2024).
  • 20.Jimoh, A., Ahmed, M. K., Salihu, S., Mod, B. & Salihu, M. N. Enhancing web security through comprehensive evaluation of sql injection detection models. Development 23, 25 (2024).
  • 21.Babayigit, B. & Abubaker, M. Towards a generalized hybrid deep learning model with optimized hyperparameters for malicious traffic detection in the industrial internet of things. Eng. Appl. Artif. Intell. 128, 107515. 10.1016/j.engappai.2023.107515 (2024).
  • 22.Ji, S., Satish, N., Li, S. & Dubey, P. K. Parallelizing word2vec in shared and distributed memory. IEEE Trans. Parallel Distrib. Syst. 30, 2090–2100 (2019).
  • 23.Jang, B., Kim, M., Harerimana, G., Kang, S.-U. & Kim, J. W. Bi-lstm model to increase accuracy in text classification: Combining word2vec cnn and attention mechanism. Appl. Sci. 10, 5841 (2020).
  • 24.Kingma, D. P. & Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
  • 25.Council, S. R. N. Http dataset csic 2010 (2010). [Online]. https://www.tic.itefi.csic.es/dataset/.
  • 26.Morzeux. Httpparamsdataset (2020). [Online]. https://github.com/Morzeux/HttpParamsDataset/.
  • 27.Tang, P., Qiu, W., Huang, Z., Lian, H. & Liu, G. Detection of sql injection based on artificial neural network. Knowl.-Based Syst. 190, 105528 (2020).
  • 28.Crespo-Martínez, I. S. et al. Sql injection attack detection in network flow data. Comput. Secur. 127, 103093 (2023).
  • 29.Kar, D., Panigrahi, S. & Sundararajan, S. Sqligot: Detecting sql injection attacks using graph of tokens and svm. Comput. Secur. 60, 206–225 (2016).
  • 30.Liu, Z., Fang, Y., Huang, C. & Han, J. Graphxss: An efficient xss payload detection approach based on graph convolutional network. Comput. Secur. 114, 102597 (2022).
  • 31.Wang, Q. et al. Igxss: Xss payload detection model based on inductive gcn. Int. J. Network Manage. e2264 (2024).


Articles from Scientific Reports are provided here courtesy of Nature Publishing Group
