UAV Detection Using Reinforcement Learning

2024 Mar 14;24(6):1870. doi: 10.3390/s24061870

Algorithm 2 Hierarchical Reinforcement Learning (HRL)

1: Load the DroneRF dataset
2: Define input features and target variables
3: Apply filtering and smoothing techniques
4: Encode the target with respect to the input features
5: Split the dataset into training (70%) and testing (30%) sets (trainSet, testSet)
6: Create an instance of the environment class (Env)
7: Create an instance of the Agent class (agent: Policy 1, Policy 2, Policy 3, Policy 4)
8: Train each policy on trainSet using the Hierarchical_RL procedure
9: Test each policy on testSet and evaluate the system
10: procedure Hierarchical_RL(dataset, agent):
11:     for episode in 1…N do
12:         Reset the Env variable to its original state
13:         for channel in 1…10 do
14:             Pass the sample through ‘Classifier 1’ to determine whether a UAV is present (2 classes: 0-No UAV, 1-UAV)
15:             Generate an action using Policy 1 of the agent
16:             if action == 0 then
17:                 End and save the predicted value: No UAV (predClass = C1)
18:             else
19:                 Pass the sample through ‘Classifier 2’ to determine the UAV model (3 classes: 0-Bebop, 1-AR, 2-Phantom3)
20:                 Generate an action using Policy 2 of the agent
21:                 if action == 2 then
22:                     End and save the predicted value: Phantom3 UAV (predClass = C10)
23:                 else if action == 0 then
24:                     Pass the sample through ‘Classifier 3’ to determine the mode of the Parrot Bebop (4 classes: 0-ON (C2), 1-Hovering (C3), 2-Flying (C4), 3-Recording (C5))
25:                     Generate an action using Policy 3 of the agent
26:                     if action == 0 then
27:                         End and save the predicted value: Bebop, ON mode (predClass = C2)
28:                     else if action == 1 then
29:                         End and save the predicted value: Bebop, Hovering mode (predClass = C3)
30:                     else if action == 2 then
31:                         End and save the predicted value: Bebop, Flying mode (predClass = C4)
32:                     else
33:                         End and save the predicted value: Bebop, Recording mode (predClass = C5)
34:                     end if
35:                 else if action == 1 then
36:                     Pass the sample through ‘Classifier 4’ to determine the mode of the Parrot AR (4 classes: 0-ON (C6), 1-Hovering (C7), 2-Flying (C8), 3-Recording (C9))
37:                     Generate an action using Policy 4 of the agent
38:                     if action == 0 then
39:                         End and save the predicted value: AR, ON mode (predClass = C6)
40:                     else if action == 1 then
41:                         End and save the predicted value: AR, Hovering mode (predClass = C7)
42:                     else if action == 2 then
43:                         End and save the predicted value: AR, Flying mode (predClass = C8)
44:                     else
45:                         End and save the predicted value: AR, Recording mode (predClass = C9)
46:                     end if
47:                 end if
48:             end if
49:             if action == true_label then
50:                 reward = 1
51:             end if
52:             Save the trajectory (state S, action A, reward R) in E
53:         end for
54:         Apply REINFORCE to update the policies of the agent using the |E| saved trajectories
55:     end for
56: end procedure
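
The routing and update in steps 10 to 56 can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the linear softmax policy, the names SoftmaxPolicy, predict, and train_episode, the reward of 0 on a misclassification, and the choice to give every policy that acted on a sample the same final reward are all hypothetical, since Algorithm 2 does not specify them. Classifiers 1 to 4 are folded into the four policies for brevity.

```python
import math
import random

BEBOP_MODES = ["C2", "C3", "C4", "C5"]  # ON, Hovering, Flying, Recording
AR_MODES = ["C6", "C7", "C8", "C9"]     # ON, Hovering, Flying, Recording


class SoftmaxPolicy:
    """Linear softmax policy (an assumed stand-in for the paper's model)."""

    def __init__(self, n_features, n_actions, lr=0.5):
        self.w = [[0.0] * n_features for _ in range(n_actions)]
        self.grad = [[0.0] * n_features for _ in range(n_actions)]
        self.lr = lr

    def probs(self, x):
        logits = [sum(wj * xj for wj, xj in zip(row, x)) for row in self.w]
        top = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(v - top) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def act(self, x, rng):
        p = self.probs(x)
        return rng.choices(range(len(p)), weights=p)[0]

    def accumulate(self, x, action, reward):
        # Add reward * grad_w log pi(action | x) to the pending gradient.
        p = self.probs(x)
        for k, row in enumerate(self.grad):
            coeff = reward * ((1.0 if k == action else 0.0) - p[k])
            for j, xj in enumerate(x):
                row[j] += coeff * xj

    def apply(self):
        # REINFORCE ascent step, then clear the pending gradient.
        for row, grad_row in zip(self.w, self.grad):
            for j in range(len(row)):
                row[j] += self.lr * grad_row[j]
                grad_row[j] = 0.0


def predict(x, policies, rng):
    """Route one sample through the hierarchy; return (pred_class, decisions)."""
    p1, p2, p3, p4 = policies
    a1 = p1.act(x, rng)
    decisions = [(p1, a1)]
    if a1 == 0:
        return "C1", decisions   # no UAV
    a2 = p2.act(x, rng)
    decisions.append((p2, a2))
    if a2 == 2:
        return "C10", decisions  # Phantom3
    mode_policy, modes = (p3, BEBOP_MODES) if a2 == 0 else (p4, AR_MODES)
    a_mode = mode_policy.act(x, rng)
    decisions.append((mode_policy, a_mode))
    return modes[a_mode], decisions


def train_episode(samples, policies, rng):
    """Collect one episode of (policy, state, action, reward), then update."""
    trajectory, correct = [], 0
    for x, true_class in samples:
        pred, decisions = predict(x, policies, rng)
        reward = 1.0 if pred == true_class else 0.0  # 0 on a miss is an assumption
        correct += int(reward)
        trajectory += [(pol, x, a, reward) for pol, a in decisions]
    # Weights stay fixed while gradients are accumulated, so every term is
    # evaluated under the policy that sampled the action.
    for pol, x, a, reward in trajectory:
        pol.accumulate(x, a, reward)
    for pol in policies:
        pol.apply()
    return correct
```

A caller would build the four policies with 2, 3, 4, and 4 actions respectively, for example `policies = (SoftmaxPolicy(d, 2), SoftmaxPolicy(d, 3), SoftmaxPolicy(d, 4), SoftmaxPolicy(d, 4))` for `d` input features, and call `train_episode(samples, policies, random.Random(0))` once per episode. Only the policies that were actually consulted for a sample receive a gradient from it, which mirrors the early exits at steps 17 and 22.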