2025 Jan 2;25(1):211. doi: 10.3390/s25010211

Table 12.

Comparison among RL, GAs, and BERT QA RL + RS in a penetration testing scenario aligned with NIST SP 800-115.

| Criterion | RL | GAs | BERT QA RL + RS |
| --- | --- | --- | --- |
| Nature of Environment | Dynamic, sequential, with rewards tied to actions | Nonsequential, evaluating solution populations without temporal feedback | Dynamic and sequential; integrates RL rewards and BERT’s semantic context |
| Continuous Adaptation | Adjusts its policy as the environment evolves (new ports, vulnerabilities) | Difficult; changes require new populations and generations without guaranteed rapid adaptation | Iterative adjustment: RL adapts to novel findings, BERT recalibrates responses, incorporating new Q, C, A |
| Contextual Information | Can leverage structured information (states, rewards) but limited semantic comprehension | No semantic understanding; only evaluates solution fitness without linguistic context | Incorporates BERT’s contextual comprehension, correlating vulnerability descriptions (CVE/CWE) with NIST methodology |
| Alignment with NIST SP 800-115 | RL can implement the cycle (reconnaissance, identification, exploitation) by maximizing rewards at each phase | No natural integration with these phases. GAs optimize a fitness function, lacking a sequential flow suited to recommended stages | Aligns with phases (planning, reconnaissance, vulnerability assessment, exploitation, reporting), leveraging RL and BERT’s semantics |
| Scalability | Scalable, though potentially requires more computation as complexity increases | Scalable in exploration, but lacks a mechanism guiding adaptive policy changes over time | Scalable; each RL insight is integrated by BERT, facilitating the reuse and expansion of the knowledge base |
| Final Outcome | An optimal (or near-optimal) policy guiding sequential pentesting actions | A set of candidate solutions without guaranteeing dynamic adaptation or contextual integration | A dynamic policy, informed by semantic context and aligned with NIST guidelines, optimizing tests and leveraging cumulative learning |
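The RL column's "rewards tied to actions" and "optimal (or near-optimal) policy guiding sequential pentesting actions" can be illustrated with a minimal tabular Q-learning sketch. This is not the paper's implementation: the three-phase chain, the `advance`/`stay` actions, the reward values, and the hyperparameters are all hypothetical choices made only to show how per-step temporal feedback shapes a sequential policy.

```python
import random

# Hypothetical toy environment: three sequential phases, then a terminal state.
PHASES = ["reconnaissance", "identification", "exploitation"]
ACTIONS = ["advance", "stay"]  # assumed action set, for illustration only


def step(state, action):
    """'advance' moves to the next phase with reward 1.0; 'stay' yields 0.0."""
    if action == 0:
        return state + 1, 1.0
    return state, 0.0


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    n = len(PHASES)
    # One row per phase plus a terminal row whose values stay at zero.
    q = [[0.0, 0.0] for _ in range(n + 1)]
    for _ in range(episodes):
        state = 0
        for _ in range(50):  # cap the number of steps per episode
            if state == n:
                break
            if rng.random() < epsilon:
                action = rng.randrange(len(ACTIONS))
            else:
                action = max(range(len(ACTIONS)), key=lambda a: q[state][a])
            nxt, reward = step(state, action)
            # Temporal-difference update: the reward is tied to this action.
            target = reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


def greedy_policy(q):
    """Return the highest-valued action name for each phase."""
    return [
        ACTIONS[max(range(len(ACTIONS)), key=lambda a: q[s][a])]
        for s in range(len(PHASES))
    ]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

In this toy setting the learned greedy policy selects `advance` in every phase, because each update propagates the reward of later phases back to earlier ones. A GA, by contrast, would score whole candidate action sequences with a single fitness value and receive no per-step feedback, which is the distinction the table draws.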