Analysis of Autonomous Penetration Testing Through Reinforcement Learning and Recommender Systems

2025 Jan 2;25(1):211. doi: 10.3390/s25010211

Algorithm 1 Q-Learning for $V_{R}$, $V_{I}$, and $V_{E}$ tests.

1: Initialization:
2:   Initialize the Q-table $Q(s, a)$ with zeros for all state-action pairs $(s, a)$
3:   Set the learning rate $α$, discount factor $γ$, and the parameter $ϵ$ for the policy $ϕ_{ϵ}$
4: for each episode do
5:   Initialize the state $s$ with the initial configuration of $D$. In the initial state $s$, $A$ performs a reconnaissance process of all available virtual machines
6:   while the state $s$ is not terminal do
7:     Select the action $a$ based on the policy $ϕ_{ϵ}$
8:     Execute action $a$, observe reward $r$ and the new state $s'$
9:     if $a$ pertains to $V_{R}$ then
10:      Perform reconnaissance using Nmap
11:    else if $a$ pertains to $V_{I}$ then
12:      Conduct vulnerability identification using Nmap Vulners
13:    else if $a$ pertains to $V_{E}$ then
14:      Conduct exploitation using Metasploit
15:    end if
16:    Select $a'$ as the action that maximizes $Q(s', a')$
17:    Update $Q(s, a)$ using the Bellman equation:
       $Q(s, a) \leftarrow Q(s, a) + α [r + γ Q(s', a') - Q(s, a)]$
18:    Update the state $s \leftarrow s'$
19:  end while
20: end for
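
The loop in Algorithm 1 can be illustrated with a minimal tabular Q-learning sketch in Python. This is not the authors' implementation. The environment is a hypothetical three-step toy in which the $V_{R}$, $V_{I}$, and $V_{E}$ actions must be taken in order, and it stands in for the real Nmap, Nmap Vulners, and Metasploit calls, which are not invoked here. The state names, the reward values, and the helper names (`step`, `select_action`, `q_update`, `train`, `greedy_plan`) are all assumptions made for illustration.

```python
import random
from collections import defaultdict

# Action families from Algorithm 1: reconnaissance, vulnerability
# identification, exploitation.
ACTIONS = ["V_R", "V_I", "V_E"]
TERMINAL = "exploited"


def step(state, action):
    """Hypothetical toy environment (assumption, not the paper's testbed).

    Stands in for running Nmap, Nmap Vulners, or Metasploit and returns
    (reward, next_state). Reward values are illustrative only.
    """
    transitions = {
        ("start", "V_R"): (1.0, "scanned"),
        ("scanned", "V_I"): (1.0, "vuln_found"),
        ("vuln_found", "V_E"): (10.0, TERMINAL),
    }
    # An out-of-order action is penalised and leaves the state unchanged.
    return transitions.get((state, action), (-1.0, state))


def select_action(Q, state, epsilon, rng):
    """Epsilon-greedy policy: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])


def q_update(Q, s, a, r, s_next, alpha, gamma):
    """Steps 16-17: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)  # step 2: every Q(s, a) starts at zero
    for _ in range(episodes):  # step 4
        s = "start"  # step 5
        while s != TERMINAL:  # step 6
            a = select_action(Q, s, epsilon, rng)  # step 7
            r, s_next = step(s, a)  # steps 8-15
            q_update(Q, s, a, r, s_next, alpha, gamma)  # steps 16-17
            s = s_next  # step 18
    return Q


def greedy_plan(Q, max_steps=10):
    """Follow the learned greedy policy from the initial state."""
    s, plan = "start", []
    while s != TERMINAL and len(plan) < max_steps:
        a = max(ACTIONS, key=lambda act: Q[(s, act)])
        plan.append(a)
        _, s = step(s, a)
    return plan


if __name__ == "__main__":
    print(greedy_plan(train()))
```

Because the terminal state is never updated, its Q-values stay at zero, so the update target at the final step reduces to the reward alone. In a real setting, `step` would be replaced by code that runs the chosen tool against the target and encodes its output as the next state, which is where most of the engineering effort would lie.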