2024 Mar 14;24(6):1870. doi: 10.3390/s24061870
Algorithm 2 Hierarchical Reinforcement Learning (HRL)
 1: Load the DroneRF dataset
 2: Define input features and target variables
 3: Apply filtering and smoothing techniques
 4: Encode the target with respect to the input features
 5: Split the dataset into training (70%) and testing (30%) sets (trainSet, testSet)
 6: Create an instance of the environment class (Env)
 7: Create an instance of the Agent class (agent: Policy 1, Policy 2, Policy 3, Policy 4)
 8: Train each policy on trainSet using the Hierarchical_RL procedure
 9: Test each policy on testSet and evaluate the system
10: procedure Hierarchical_RL(dataset, agent):
11:   for episode in 1…N do
12:     Reset Env to its original state
13:     for channel in 1…10 do
14:       Pass the sample through ‘Classifier 1’ to determine the presence of a UAV (2 classes: 0-No UAV, 1-UAV)
15:       Generate action using Policy 1 of the agent
16:       if action == 0 then
17:         End and save predicted value: No UAV (predClass = C1)
18:       else
19:         Pass the sample through ‘Classifier 2’ to determine the UAV model (3 classes: 0-Bebop, 1-AR, 2-Phantom3)
20:         Generate action using Policy 2 of the agent
21:         if action == 2 then
22:           End and save predicted value: Phantom3 UAV (predClass = C10)
23:         else if action == 0 then
24:           Pass the sample through ‘Classifier 3’ to determine the mode of the Parrot Bebop (4 classes: 0-ON (C2), 1-Hovering (C3), 2-Flying (C4), 3-Recording (C5))
25:           Generate action using Policy 3 of the agent
26:           if action == 0 then
27:             End and save predicted value: Bebop, ON mode (predClass = C2)
28:           else if action == 1 then
29:             End and save predicted value: Bebop, Hovering mode (predClass = C3)
30:           else if action == 2 then
31:             End and save predicted value: Bebop, Flying mode (predClass = C4)
32:           else
33:             End and save predicted value: Bebop, Recording mode (predClass = C5)
34:           end if
35:         else if action == 1 then
36:           Pass the sample through ‘Classifier 4’ to determine the mode of the Parrot AR (4 classes: 0-ON (C6), 1-Hovering (C7), 2-Flying (C8), 3-Recording (C9))
37:           Generate action using Policy 4 of the agent
38:           if action == 0 then
39:             End and save predicted value: AR, ON mode (predClass = C6)
40:           else if action == 1 then
41:             End and save predicted value: AR, Hovering mode (predClass = C7)
42:           else if action == 2 then
43:             End and save predicted value: AR, Flying mode (predClass = C8)
44:           else
45:             End and save predicted value: AR, Recording mode (predClass = C9)
46:           end if
47:         end if
48:       end if
49:       if action == true_label then
50:         reward = 1
51:       end if
52:       Save the |E| trajectory: (state S, action A, reward R)
53:     end for
54:     Apply REINFORCE to update the policies of the agent using the |E| trajectories
55:   end for
56: end procedure
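
The routing performed inside the inner loop of the procedure (Policy 1 to Policy 4, ending in one of the classes C1 to C10) can be sketched in Python as follows. This is only an illustrative sketch, not the authors' implementation: the names `predict_class` and `collect_episode`, the representation of a policy as a plain callable returning an integer action, and the reward of 0 for a wrong prediction are all assumptions (the listing only states that a correct prediction earns a reward of 1). The sketch also compares the final predicted class with the true class, which is one possible reading of the reward check. The REINFORCE update itself is omitted.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: a policy maps a state (feature vector) to an integer action.
Policy = Callable[[Sequence[float]], int]

# Class labels as named in the listing; an action outside 0..2 falls into the final "else".
BEBOP_MODES = {0: "C2", 1: "C3", 2: "C4"}   # else -> C5 (Recording)
AR_MODES = {0: "C6", 1: "C7", 2: "C8"}      # else -> C9 (Recording)


def predict_class(state: Sequence[float],
                  policies: Tuple[Policy, Policy, Policy, Policy]) -> Tuple[str, List[int]]:
    """Route one sample through the hierarchy; return predClass and the actions taken."""
    p1, p2, p3, p4 = policies
    actions = [p1(state)]
    if actions[0] == 0:                      # Policy 1: no UAV present
        return "C1", actions
    actions.append(p2(state))                # Policy 2: UAV model
    if actions[1] == 2:                      # Phantom3 has no mode level
        return "C10", actions
    if actions[1] == 0:                      # Bebop: Policy 3 picks the mode
        actions.append(p3(state))
        return BEBOP_MODES.get(actions[2], "C5"), actions
    if actions[1] == 1:                      # AR: Policy 4 picks the mode
        actions.append(p4(state))
        return AR_MODES.get(actions[2], "C9"), actions
    raise ValueError(f"Policy 2 returned an unsupported action: {actions[1]}")


def collect_episode(states, true_classes, policies):
    """Build one episode's trajectory as (state, actions, reward) tuples."""
    trajectory = []
    for state, true_class in zip(states, true_classes):
        pred, actions = predict_class(state, policies)
        reward = 1 if pred == true_class else 0   # assumption: 0 when the prediction is wrong
        trajectory.append((state, actions, reward))
    return trajectory


if __name__ == "__main__":
    # Constant stand-in policies, used only to exercise the routing.
    demo = (lambda s: 1, lambda s: 0, lambda s: 2, lambda s: 0)
    print(predict_class([0.0], demo))
```

With the constant stand-in policies in the demo, the sample is routed UAV, then Bebop, then Flying, so the call returns the class C4 together with the three actions taken. In a trained system each callable would be replaced by sampling from the corresponding learned policy.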