| Algorithm 1: Predict the optimal route |
| Input: start state. Result: optimal route. Initialization: initialize Q(s, a); initialize state s; choose an action a using the epsilon-greedy approach. For each time step do: take action a(t); observe the reward r(t+1) and the state s(t+1); update Q(s(t), a(t)); s(t) ← s(t+1); a(t) ← a(t+1). End. |
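
The loop in Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The listing does not give the update rule for Q or say how a(t+1) is chosen. Because the next action is carried forward, the sketch assumes a SARSA-style update with a(t+1) drawn epsilon-greedily. The learning rate `alpha`, discount `gamma`, exploration rate `epsilon`, the episode structure, the helper names, and the five-cell corridor environment are all hypothetical choices made for illustration.

```python
import random


def epsilon_greedy(Q, state, actions, epsilon, rng):
    """With probability epsilon pick a random action, otherwise a greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])


def sarsa(step, start, goal, states, actions, episodes=2000,
          alpha=0.1, gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    """Tabular SARSA (assumed update rule; hyperparameters are illustrative)."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in states for a in actions}  # Initialize Q(s, a)
    for _ in range(episodes):
        s = start                                        # Initialize state s
        a = epsilon_greedy(Q, s, actions, epsilon, rng)  # Choose action a
        for _ in range(max_steps):                       # For each time step
            s_next, r = step(s, a)                       # Take a, observe r and s(t+1)
            a_next = epsilon_greedy(Q, s_next, actions, epsilon, rng)
            # Assumed SARSA target; no bootstrapping from the terminal state.
            target = r if s_next == goal else r + gamma * Q[(s_next, a_next)]
            Q[(s, a)] += alpha * (target - Q[(s, a)])    # Update Q(s(t), a(t))
            s, a = s_next, a_next                        # s(t) <- s(t+1); a(t) <- a(t+1)
            if s == goal:
                break
    return Q


def greedy_route(Q, step, start, goal, actions, max_steps=100):
    """Follow the greedy policy from start and return the visited states."""
    route, s = [start], start
    for _ in range(max_steps):
        if s == goal:
            break
        a = max(actions, key=lambda act: Q[(s, act)])
        s, _ = step(s, a)
        route.append(s)
    return route


# Hypothetical environment: a corridor of N cells; the goal is the last cell.
N = 5


def corridor_step(s, a):
    """Move left (-1) or right (+1); -1 reward per step, +10 on reaching the goal."""
    s_next = min(max(s + a, 0), N - 1)
    reward = 10.0 if s_next == N - 1 else -1.0
    return s_next, reward


Q = sarsa(corridor_step, start=0, goal=N - 1,
          states=list(range(N)), actions=[-1, 1])
route = greedy_route(Q, corridor_step, start=0, goal=N - 1, actions=[-1, 1])
print(route)
```

Reading the route off the learned table with a purely greedy policy corresponds to the "Result: optimal route" line of the listing. A real routing problem would replace `corridor_step` with its own transition and reward function.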