1: Load the DroneRF dataset
2: Define input features and target variables
3: Apply filtering and smoothing techniques
4: Encode the target with respect to the input features
5: Split the dataset into training (70%) and testing (30%) sets (trainSet, testSet)
6: Create an instance of the environment class (Env)
7: Create an instance of the Agent class (agent: Policy 1, Policy 2, Policy 3, Policy 4)
8: Train each policy on trainSet using the Hierarchical_RL procedure
9: Test each policy on testSet and evaluate the system
10: procedure Hierarchical_RL(dataset, agent):
11:     for episode in 1…N do
12:         Reset Env to its initial state
13:         for channel in 1…10 do
14:             Pass the sample to Classifier 1 to determine the presence of a UAV (2 classes: 0-No UAV, 1-UAV)
15:             Generate action using Policy 1 of the agent
16:             if action == 0 then
17:                 End and save predicted value: No UAV (predClass = C1)
18:             else
19:                 Pass the sample to Classifier 2 to determine the UAV model (3 classes: 0-Bebop, 1-AR, 2-Phantom3)
20:                 Generate action using Policy 2 of the agent
21:                 if action == 2 then
22:                     End and save predicted value: Phantom3 UAV (predClass = C10)
23:                 else if action == 0 then
24:                     Pass the sample to Classifier 3 to determine the mode of the Parrot Bebop (4 classes: 0-ON (C2), 1-Hovering (C3), 2-Flying (C4), 3-Recording (C5))
25:                     Generate action using Policy 3 of the agent
26:                     if action == 0 then
27:                         End and save predicted value: Bebop, ON mode (predClass = C2)
28:                     else if action == 1 then
29:                         End and save predicted value: Bebop, Hovering mode (predClass = C3)
30:                     else if action == 2 then
31:                         End and save predicted value: Bebop, Flying mode (predClass = C4)
32:                     else
33:                         End and save predicted value: Bebop, Recording mode (predClass = C5)
34:                     end if
35:                 else if action == 1 then
36:                     Pass the sample to Classifier 4 to determine the mode of the Parrot AR (4 classes: 0-ON (C6), 1-Hovering (C7), 2-Flying (C8), 3-Recording (C9))
37:                     Generate action using Policy 4 of the agent
38:                     if action == 0 then
39:                         End and save predicted value: AR, ON mode (predClass = C6)
40:                     else if action == 1 then
41:                         End and save predicted value: AR, Hovering mode (predClass = C7)
42:                     else if action == 2 then
43:                         End and save predicted value: AR, Flying mode (predClass = C8)
44:                     else
45:                         End and save predicted value: AR, Recording mode (predClass = C9)
46:                     end if
47:                 end if
48:             end if
49:             if action == true_label then
50:                 reward = 1
51:             end if
52:             Append the trajectory (state S, action A, reward R) to E
53:         end for
54:         Apply REINFORCE to update the policies of the agent using the |E| saved trajectories
55:     end for
56: end procedure
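
The routing and update steps of the procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `route`, `SoftmaxPolicy`, `BEBOP_MODES`, and `AR_MODES` are hypothetical, each policy is assumed to be a linear softmax over a feature vector, and the REINFORCE step uses the immediate reward with no baseline or discounting.

```python
import math
import random

# Leaf labels follow the listing: C1 = No UAV, C10 = Phantom3.
BEBOP_MODES = ["C2", "C3", "C4", "C5"]  # ON, Hovering, Flying, Recording
AR_MODES = ["C6", "C7", "C8", "C9"]     # ON, Hovering, Flying, Recording


def route(sample, policies):
    """Walk the hierarchy; return (predClass, [(policy_id, action), ...])."""
    path = []

    def act(policy_id):
        action = policies[policy_id](sample)
        path.append((policy_id, action))
        return action

    if act(1) == 0:                  # Policy 1: 0-No UAV, 1-UAV
        return "C1", path
    model = act(2)                   # Policy 2: 0-Bebop, 1-AR, 2-Phantom3
    if model == 2:
        return "C10", path
    if model == 0:
        return BEBOP_MODES[act(3)], path   # Policy 3: Bebop mode
    return AR_MODES[act(4)], path          # Policy 4: AR mode


class SoftmaxPolicy:
    """Linear softmax policy (an assumed stand-in for each agent policy)."""

    def __init__(self, n_features, n_actions, lr=0.1):
        self.w = [[0.0] * n_features for _ in range(n_actions)]
        self.lr = lr

    def probs(self, x):
        logits = [sum(wi * xi for wi, xi in zip(row, x)) for row in self.w]
        m = max(logits)                      # subtract max for stability
        exps = [math.exp(l - m) for l in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def __call__(self, x):
        p = self.probs(x)
        return random.choices(range(len(p)), weights=p)[0]

    def reinforce(self, trajectories):
        """Batch update: w += lr * sum(R * grad log pi(a | x))."""
        grad = [[0.0] * len(row) for row in self.w]
        for x, action, reward in trajectories:
            p = self.probs(x)
            for k in range(len(self.w)):
                coeff = reward * ((1.0 if k == action else 0.0) - p[k])
                for j, xj in enumerate(x):
                    grad[k][j] += coeff * xj
        for row, g_row in zip(self.w, grad):
            for j, g in enumerate(g_row):
                row[j] += self.lr * g


# Usage: four policies, one per level of the hierarchy.
policies = {
    1: SoftmaxPolicy(n_features=2, n_actions=2),
    2: SoftmaxPolicy(n_features=2, n_actions=3),
    3: SoftmaxPolicy(n_features=2, n_actions=4),
    4: SoftmaxPolicy(n_features=2, n_actions=4),
}
pred_class, path = route([1.0, 0.5], policies)
```

Returning the visited `(policy_id, action)` pairs alongside the predicted class makes it straightforward to store one trajectory per policy that was actually consulted, so only those policies are updated after the episode.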