2025 Jan 2;25(1):211. doi: 10.3390/s25010211

Table 9.

Comparison of BERT QA RL + RS with state-of-the-art approaches in terms of data ingestion methods, supported environments, feature coverage, and scalability to complex scenarios.

Work	Data Ingestion Method	Environment Supported	Network Services	Web Vulnerabilities	Misconfiguration Scenarios	Scalability to Complex Scenarios
Yi, J. and Liu, X. [9]	Leverages MulVAL attack graphs and predefined vulnerabilities.	Simulated networks with hosts and subnets.	🗸			Capable of scaling to subnet-based configurations but limited by fixed graph structures.
Hamidi, M., et al. [10]	Connects with tools like Metasploit, SQLmap, and Weevely via APIs.	Controlled setups with predefined exploitation paths.	🗸	🗸		Limited adaptability due to predefined tools and static environments.
Ghanem, M. and Chen, T. [15]	Analyzes penetration testing expert behavior using logs from servers, databases, and routing devices.	Simulated environments with predefined vulnerability paths.	🗸			Limited due to static and predefined scenarios.
Ghanem, M. and Chen, T. [16]	Processes state and action spaces with probabilistic representations of devices and networks.	Networks with devices modeled probabilistically for vulnerabilities.	🗸		🗸	Constrained by reliance on probabilistic state-space representations.
Zennaro, F., et al. [17]	Uses Q-learning to train agents in Capture the Flag scenarios.	Simplified scenarios with predefined port vulnerabilities.	🗸			Restricted to predefined attack paths and ports.
Chaudhary, S., et al. [18]	Employs DT scripts and Python-based log analysis for vulnerability identification.	Focused on file exploitation in predefined Windows and Linux environments.			🗸	Restricted to static environments, without provisions for scalability or dynamic updates.
Nhu, N., et al. [19]	Employs Docker-based environments for training reinforcement learning agents.	Dockerized setups with a variety of CVEs.	🗸	🗸		Scales moderately well but lacks contextual processing for extrapolation.
Schwartz, J. and Kurniawati, H. [20]	Focuses on Metasploit-based testing for FTP vulnerabilities.	Single-port FTP exploitation scenarios.	🗸			Minimal scalability beyond basic vulnerability testing.
Tran, K., et al. [22]	Implements Cascaded Reinforcement Learning Agents for discrete action spaces.	Simulated networks with multiple subnets and hosts.	🗸		🗸	Highly scalable in subnet-based scenarios but less effective in dynamic configurations.
Nguyen, H., et al. [21]	Implements action spaces using Metasploit modules for scanning, exploitation, and PEsc.	Simulations with connected hosts and service vulnerabilities like CVE-2021-41773 and CVE-2015-3306.	🗸			Limited to predefined Metasploit actions and lacks dynamic adaptability to emerging or IoT environments.
Ying, W., et al. [23]	Analyzes and filters CVE data with NLP techniques for event extraction, covering vulnerabilities from 1999 to 2021.	Employs a database of 4638 vulnerabilities from CVE with detailed categorization of 16 CWE types.		🗸		Limited to textual analysis and lacks integration with reinforcement learning or adaptive exploration.
BERT QA RL + RS (This proposal)	Combines BERT’s contextual processing with reinforcement learning for adaptive exploration, integrating real-time data updates for dynamic environments.	Supports diverse configurations, including interconnected services, AB weaknesses, CFs, and real-world scenarios.	🗸	🗸	🗸	Highly scalable due to its modular design, contextual adaptability, and ability to generalize policies across complex environments like cloud and IoT systems.