Algorithm 1: The proposed improved soft actor-critic algorithm.
Initialization: Randomly initialize the parameters of the policy network and the two
Q networks. Set up the experience replay buffer D with a size of 100,000.
Input: The current communication parameters of the communication parties and the
current jamming action of the jammer
1: for episode i = 1, 2, …, J do
2:     for step j = 1, 2, …, N do
3:          Input the current state, st, to the policy network and sample the proto-action
             from its output;
4:          Input the proto-action to the improved Wolpertinger architecture to obtain
             the actual executed action, at (see the sketch after the algorithm);
5:          Executing action at;
6:          Obtaining the next state, st+1, and the feedback, and calculating the actual
             reward, r;
7:          Storing (st, at, st+1, r) in the experience replay buffer D;
8:          Sampling a mini-batch of size NB from D for training (see the replay sketch
             after the algorithm);
9:          Updating the parameters of the two Q networks, Q1 and Q2;
10:        Updating the parameters of the policy network;
11:        Updating the parameters of the target Q1 and target Q2 networks;
12:        Setting st=st+1;
13:    end for
14: end for
Output: Jamming action for the communication parties at the next moment
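A minimal sketch of the Wolpertinger refinement in line 4, assuming a small discrete set of (channel, power) actions and a toy Q function standing in for the learned critic; the action set, state shape, and neighbourhood size k = 5 are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

# Illustrative discrete action set: each row is a hypothetical
# (channel index, transmit power) pair; not the paper's actual action space.
ACTIONS = np.array([[c, p] for c in range(8) for p in (0.1, 0.5, 1.0)])

def q_value(state, action):
    # Toy stand-in for the learned Q network.
    return -float(np.sum((action - state[:2]) ** 2))

def wolpertinger_select(state, proto_action, k=5):
    """Map a continuous proto-action to an executable discrete action:
    take the k nearest discrete actions, score each with Q, execute the best."""
    dists = np.linalg.norm(ACTIONS - proto_action, axis=1)
    candidates = ACTIONS[np.argsort(dists)[:k]]
    scores = [q_value(state, a) for a in candidates]
    return candidates[int(np.argmax(scores))]

state = np.array([3.0, 0.4, 0.0])   # toy state st
proto = np.array([2.7, 0.46])       # continuous policy output (proto-action)
print(wolpertinger_select(state, proto))
```

In the full algorithm the proto-action comes from the stochastic SAC policy and Q is the learned critic; the "improved" elements of the architecture described in the paper are not reproduced here.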
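Lines 7-8 follow the standard experience-replay pattern: each transition is appended to a bounded buffer and uniform mini-batches are drawn for the updates in lines 9-11. A minimal sketch, with the buffer size matching the initialization above; the batch size NB = 64 is an assumed placeholder, as the paper's value is not shown in this excerpt:

```python
import random
from collections import deque

D = deque(maxlen=100_000)  # experience replay buffer, size as in the initialization
NB = 64                    # mini-batch size; placeholder value

def store(s, a, s_next, r):
    # Line 7: store the transition (st, at, st+1, r).
    D.append((s, a, s_next, r))

def sample_minibatch():
    # Line 8: draw a uniform mini-batch of NB transitions for training.
    return random.sample(list(D), min(NB, len(D)))

# Toy usage with placeholder transitions.
for t in range(200):
    store(t, t % 3, t + 1, float(t % 2))
print(len(sample_minibatch()))  # -> 64
```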