Step 0. Initialization
Step 0a: For each action A ∈ 2^𝒜, choose a regression model Q̃(·, A; θ_A) and initialize the parameter estimates θ̃_A (e.g. θ̃_A = 0).
Step 0b: Choose a feature-extraction function f(·) (see §3.2.2).
Step 0c: Choose an exploration rule {E_n}, n ∈ {1, 2, …}, such that E_n ∈ [0, 1] and E_n → 0 (see §3.2.1).
Step 0d: Choose a learning rule {λ_i}, i ∈ {1, 2, …}, such that λ_i ∈ (0, 1] and λ_i → 1 (see §3.2.1).
Step 0e: Choose the number of optimization iterations N ≥ 100; set n ← 1 and i ← 1.
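The initialization in Step 0 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the linear form of Q̃, the feature map `features`, the schedules E_n = 1/n and λ_i = i/(i+1), and the three-element base set are all hypothetical choices that merely satisfy the constraints listed in Steps 0a to 0e.

```python
from itertools import chain, combinations

import numpy as np


def power_set(base):
    """All subsets of `base` as frozensets, i.e. the action space 2^A."""
    items = list(base)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]


def features(state):
    """Step 0b (hypothetical f): a bias term followed by the raw state."""
    return np.concatenate(([1.0], np.asarray(state, dtype=float)))


def exploration(n):
    """Step 0c (assumed schedule): E_n = 1/n is in [0, 1] and tends to 0."""
    return 1.0 / n


def learning(i):
    """Step 0d (assumed schedule): lambda_i = i/(i+1) is in (0, 1] and tends to 1."""
    return i / (i + 1.0)


def q_value(theta, state, action):
    """Assumed linear regression model: Q(s, A; theta_A) = theta_A . f(s)."""
    return float(theta[action] @ features(state))


base_actions = ["a1", "a2", "a3"]   # hypothetical base set
state_dim = 2                       # hypothetical state dimension

# Step 0a: one parameter vector per action A in 2^A, initialized to zero.
actions = power_set(base_actions)
theta = {A: np.zeros(state_dim + 1) for A in actions}

# Step 0e: number of optimization iterations and the two counters.
N, n, i = 100, 1, 1
```

Because the action space is a power set, the number of parameter vectors grows as 2^|𝒜|, so a dictionary keyed by `frozenset` is practical only for small base sets.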
Step 1. While n ≤ N: