Entropy. 2023 Oct 4;25(10):1415. doi: 10.3390/e25101415
Algorithm 2. Online inference of the most likely goal for the observations
Require: $T_{\pi}(G)$: state space $S$ and action space $A$ in the continuous domain, and policy evaluation networks $Q_{\pi_g}$
Require: $G$: a set of candidate goals
Require: $O$: an observation sequence $O = s_0, a_0, s_1, a_1, \ldots$
   1: Initialize the minimum distance $\delta^*$
   2: while the observation sequence $O$ is updated do
   3:    for $g \in G$ do
   4:       $\delta \leftarrow \mathrm{Distance}(Q_{\pi_g}, O)$
   5:       if $\delta \le \delta^*$ then
   6:          $g^* \leftarrow g$ and $\delta^* \leftarrow \delta$
   7:       end if
   8:    end for
   9: end while
  10: return $g^*$
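The loop in Algorithm 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Each policy evaluation network is stood in for by a plain callable `q(state, action) -> float`. The `distance` function (negative mean Q-value over the observed pairs) is a hypothetical placeholder, since the listing does not define Distance. Initializing the minimum distance to infinity is also an assumption. The names `infer_goal_online`, `distance`, and `make_toy_q` are illustrative only.

```python
import math
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Tuple

State = Tuple[float, ...]
Action = Tuple[float, ...]
QFunction = Callable[[State, Action], float]
Pair = Tuple[State, Action]


def distance(q: QFunction, observations: List[Pair]) -> float:
    """Hypothetical distance: negative mean Q-value over the observed pairs.

    A goal whose Q-function rates the observed actions highly gets a small
    distance. The listing does not specify the measure; this is a stand-in.
    """
    return -sum(q(s, a) for s, a in observations) / len(observations)


def infer_goal_online(
    q_by_goal: Dict[str, QFunction],
    stream: Iterable[Pair],
) -> Iterator[Optional[str]]:
    """Yield the current most likely goal after each new observation."""
    best_goal: Optional[str] = None
    best_dist = math.inf                       # step 1 (assumed: start at +inf)
    observations: List[Pair] = []
    for pair in stream:                        # step 2: O is updated
        observations.append(pair)
        for goal, q in q_by_goal.items():      # step 3: for each g in G
            d = distance(q, observations)      # step 4
            if d <= best_dist:                 # step 5
                best_goal, best_dist = goal, d # step 6
        yield best_goal                        # running estimate of g*


def make_toy_q(target: float) -> QFunction:
    """Toy 1-D Q-function: higher when the action moves toward `target`."""
    return lambda s, a: -abs(s[0] + a[0] - target)


q_by_goal = {"left": make_toy_q(-1.0), "right": make_toy_q(1.0)}
stream = [((0.0,), (0.5,)), ((0.5,), (0.5,))]  # agent moving to the right
print(list(infer_goal_online(q_by_goal, stream)))  # ['right', 'right']
```

As in the listing, the sketch initializes the minimum distance once, before the loop, so an early small distance persists across later updates. Whether the minimum should instead be reset on each update depends on the distance measure, which this section does not specify.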