ReinforSec: An Automatic Generator of Synthetic Malware Samples and Denial-of-Service Attacks through Reinforcement Learning

2023 Jan 20;23(3):1231. doi: 10.3390/s23031231

Algorithm 1 Learning process, out-of-policy

Require: $Q(s_{t}, A_{t})$ initialized arbitrarily $\forall s_{t} \in S, \forall A_{t} \in A$, and $Q(\text{terminal state}, \cdot) = 0$
for each $s_{t}$ do
    Initialize agent $a$ with states $s$ at time $t + 1$
    for each $s_{t + 1}$ do
        Choose $A$ from $S$ using $Λ$ derived from $Q$
        Take action $A_{t}$, observe $R$, $s_{t + 1}$
        $Q(s_{t}, A_{t}) \leftarrow Q(s_{t}, A_{t}) + α [R(s_{t + 1}, A_{t + 1}) + Φ \max_{A_{t + 1}} Q(s_{t + 1}, A_{t + 1}) - Q(s_{t}, A_{t})]$
        $s_{t} \leftarrow s_{t + 1}$
    end for
    until $s_{t}$ is terminal, hence the PE is fully mutated
end for
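
The update rule in Algorithm 1 can be illustrated with a minimal tabular sketch in Python. This is not the authors' implementation. The environment callback `step`, the helper `is_terminal`, the toy chain of four states, and the action names `"mutate"` and `"noop"` are all hypothetical stand-ins for PE mutation, and the policy Λ is assumed here to be epsilon-greedy, which the algorithm itself does not specify. The parameters `alpha` and `phi` correspond to α and Φ.

```python
import random
from collections import defaultdict


def q_learning(step, actions, start_state, is_terminal,
               episodes=200, alpha=0.5, phi=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning following the update rule of Algorithm 1.

    step(state, action) -> (reward, next_state) is a hypothetical
    environment callback; in the paper's setting it would apply one
    mutation to the PE file.
    """
    rng = random.Random(seed)
    # Q(s, A) defaults to 0; terminal states are never updated,
    # so Q(terminal state, .) = 0 holds throughout.
    q = defaultdict(float)
    for _ in range(episodes):
        state = start_state
        while not is_terminal(state):
            # Policy derived from Q (assumed epsilon-greedy).
            if rng.random() < epsilon:
                action = rng.choice(actions)
            else:
                action = max(actions, key=lambda a: q[(state, a)])
            reward, next_state = step(state, action)
            # Target uses the greedy value of the next state.
            best_next = max(q[(next_state, a)] for a in actions)
            q[(state, action)] += alpha * (
                reward + phi * best_next - q[(state, action)]
            )
            state = next_state
    return q


# Toy environment (assumption): states 0..3, state 3 is terminal
# ("fully mutated"). "mutate" advances one state, "noop" stays put.
ACTIONS = ["mutate", "noop"]
TERMINAL = 3


def toy_step(state, action):
    next_state = state + 1 if action == "mutate" else state
    reward = 1.0 if next_state == TERMINAL else 0.0
    return reward, next_state


q = q_learning(toy_step, ACTIONS, start_state=0,
               is_terminal=lambda s: s == TERMINAL)
```

In this toy setting the learned values favour `"mutate"` in every non-terminal state, because `"noop"` only delays the terminal reward and is therefore discounted by Φ.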