2023 Jan 9;25(1):135. doi: 10.3390/e25010135
Algorithm 2. Pseudocode of the learning algorithm for selecting state-action pairs, policies, and rewards (Code Listing 2: Deep Q-learning algorithm)
Initialize P(state, action) with arbitrary random values
while (P != terminal) {
    Initialize state
    while (state != terminal) {
        Choose an action from state using the policy derived from P
        Take the action; observe reward r and next state state’
        P(state, action) ← P(state, action) + α [r + γ max_action’ P(state’, action’) − P(state, action)]
        state ← state’
    }
}
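The update rule in the listing above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the five-state chain environment, the `step` and `greedy_action` helpers, the epsilon-greedy policy, and all hyperparameter values are hypothetical. The outer loop is assumed to run for a fixed number of episodes, and P is stored as a plain table, so the sketch covers the tabular update only; a deep variant would replace the table with a function approximator, which is not shown.

```python
import random

N_STATES = 5                 # hypothetical chain environment: states 0..4
TERMINAL = N_STATES - 1      # the episode ends when this state is reached
ACTIONS = (0, 1)             # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment (an assumption): reward 1 only on reaching TERMINAL."""
    next_state = max(state - 1, 0) if action == 0 else state + 1
    reward = 1.0 if next_state == TERMINAL else 0.0
    return reward, next_state


def greedy_action(P, state):
    """Return the action with the largest estimated value in `state`."""
    return max(ACTIONS, key=lambda a: P[state][a])


def q_learning(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # Initialize P(state, action) with small random values.
    # Assumption: the terminal state's values are fixed at zero.
    P = [[rng.random() * 0.01 for _ in ACTIONS] for _ in range(N_STATES)]
    P[TERMINAL] = [0.0 for _ in ACTIONS]

    for _ in range(episodes):            # assumed outer loop: one pass per episode
        state = 0                        # initialize state
        while state != TERMINAL:
            # Choose an action by an epsilon-greedy policy derived from P.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy_action(P, state)
            # Take the action; observe reward r and next state.
            reward, next_state = step(state, action)
            # P(s,a) <- P(s,a) + alpha * [r + gamma * max_a' P(s',a') - P(s,a)]
            target = reward + gamma * max(P[next_state])
            P[state][action] += alpha * (target - P[state][action])
            state = next_state
    return P


P = q_learning()
policy = [greedy_action(P, s) for s in range(TERMINAL)]
print(policy)
```

In this toy setting, moving right is the shortest route to the reward, so the learned greedy policy should prefer action 1 in every non-terminal state. The step size `alpha` and discount `gamma` play the roles of α and γ in the listing.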