Bioengineering. 2022 Jun 13;9(6):253. doi: 10.3390/bioengineering9060253
Algorithm 2. Train RL Agent in MDP Environment
Result: the agent finds the optimal path, which maximizes the cumulative reward
Initialization;
Create MDP environment;
1. Create an MDP model with the identified states and actions;
2. Specify the state-transition and reward matrices for the MDP;
3. Specify the terminal states of the MDP;
4. Create the RL MDP environment for this process model;
5. Specify the initial state of the agent by specifying a reset function;
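The environment-creation steps above can be sketched in Python. The paper does not give the actual states, actions, or rewards, so the 4-state chain MDP below (two actions, a single rewarded transition into a terminal goal state) is purely an illustrative assumption:

```python
import numpy as np

# Hypothetical 4-state chain MDP with two actions (0 = left, 1 = right).
# The real model's states, transitions, and rewards are not given in the
# paper, so these matrices are assumptions for illustration only.
n_states, n_actions = 4, 2
T = np.zeros((n_states, n_actions, n_states))  # step 2: T[s, a, s'] = P(s' | s, a)
R = np.zeros((n_states, n_actions, n_states))  # step 2: R[s, a, s'] = reward

for s in range(n_states):
    T[s, 0, max(s - 1, 0)] = 1.0             # move left (deterministic)
    T[s, 1, min(s + 1, n_states - 1)] = 1.0  # move right (deterministic)

R[n_states - 2, 1, n_states - 1] = 10.0  # reward for entering the goal state
terminal_states = {n_states - 1}         # step 3: last state is terminal

def reset():
    """Step 5: the reset function fixes the agent's initial state."""
    return 0
```

Each `T[s, a, :]` row is a probability distribution over successor states, which is the invariant the transition matrix of step 2 must satisfy.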
Create Q-learning agent;
1. Create a Q table using the observation and action specifications from the MDP environment;
2. Set the learning rate of the representation;
3. Create a Q-learning agent;
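A minimal sketch of the agent-creation stage, again assuming the hypothetical 4-state, 2-action MDP (the learning rate and exploration rate values are likewise assumptions, not the paper's settings):

```python
import numpy as np

# Step 1: the Q table has one row per state and one column per action,
# taken from the (assumed) observation and action specifications.
n_states, n_actions = 4, 2
Q = np.zeros((n_states, n_actions))

alpha = 0.1    # step 2: learning rate of the representation (assumed value)
epsilon = 0.2  # exploration rate of the epsilon-greedy policy (assumed value)

rng = np.random.default_rng(0)

def select_action(state):
    """Step 3: the epsilon-greedy action selection of a tabular Q-learning agent."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))  # explore: random action
    return int(np.argmax(Q[state]))          # exploit: greedy action
```

The Q table plus this policy is the whole "agent" in the tabular case; toolbox APIs wrap the same two ingredients in an agent object.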
Train Q-learning agent;
1. Specify the training options (number of episodes, stop-training criteria);
2. Train the agent using the ‘train’ function;
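The training stage can be sketched as an explicit episode loop with the standard Q-learning update; the quoted ‘train’ function hides exactly this loop. The MDP, hyperparameters, episode cap, and convergence-based stop criterion below are all assumptions for illustration:

```python
import numpy as np

# Assumed 4-state chain MDP (see the environment sketch above).
n_states, n_actions = 4, 2
T = np.zeros((n_states, n_actions, n_states))
R = np.zeros((n_states, n_actions, n_states))
for s in range(n_states):
    T[s, 0, max(s - 1, 0)] = 1.0
    T[s, 1, min(s + 1, n_states - 1)] = 1.0
R[n_states - 2, 1, n_states - 1] = 10.0
terminal = n_states - 1

Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.95, 0.2  # assumed hyperparameters
rng = np.random.default_rng(0)

# Step 1: training options -- episode cap and a stop-training criterion
# (here: stop once Q has effectively converged).
max_episodes, tol = 500, 1e-4

for episode in range(max_episodes):
    s, delta = 0, 0.0
    while s != terminal:
        # Epsilon-greedy action selection.
        a = int(rng.integers(n_actions)) if rng.random() < epsilon else int(np.argmax(Q[s]))
        s2 = rng.choice(n_states, p=T[s, a])  # sample next state from T
        r = R[s, a, s2]
        # Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        update = alpha * (r + gamma * np.max(Q[s2]) - Q[s, a])
        Q[s, a] += update
        delta = max(delta, abs(update))
        s = s2
    if delta < tol:  # stop-training criterion met
        break
```

After training, the greedy policy `argmax(Q[s])` chooses "move right" in every non-terminal state, which is the optimal path through this toy chain.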
Validate Q-learning results;
1. Simulate the trained agent in the training environment using the ‘sim’ function.
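The validation step amounts to a greedy rollout of the learned Q-table in the same environment, which is what the quoted ‘sim’ function performs. The chain MDP and the hand-filled stand-in for a trained Q-table below are assumptions for illustration:

```python
import numpy as np

# Assumed 4-state chain MDP (see the environment sketch above).
n_states, n_actions = 4, 2
T = np.zeros((n_states, n_actions, n_states))
R = np.zeros((n_states, n_actions, n_states))
for s in range(n_states):
    T[s, 0, max(s - 1, 0)] = 1.0
    T[s, 1, min(s + 1, n_states - 1)] = 1.0
R[n_states - 2, 1, n_states - 1] = 10.0
terminal = n_states - 1

# Stand-in for a trained Q-table: action 1 (move right) dominates everywhere.
Q = np.array([[0.0, 9.0], [0.0, 9.5], [0.0, 10.0], [0.0, 0.0]])

def simulate(Q, max_steps=20):
    """Greedy rollout of the agent; returns visited states and cumulative reward."""
    s, path, total = 0, [0], 0.0
    for _ in range(max_steps):
        if s == terminal:
            break
        a = int(np.argmax(Q[s]))        # greedy action (no exploration)
        s2 = int(np.argmax(T[s, a]))    # deterministic transitions
        total += R[s, a, s2]
        s = s2
        path.append(s)
    return path, total

path, total = simulate(Q)
print(path, total)  # -> [0, 1, 2, 3] 10.0
```

The rollout confirms the stated result: the greedy agent follows the optimal path to the terminal state and collects the full cumulative reward.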