Algorithm 2. Train an RL agent in an MDP environment
Result: The agent finds the optimal path, i.e., the one that maximizes the cumulative reward
Initialization;
Create the MDP environment;
1. Create an MDP model with the identified states and actions;
2. Specify the state-transition and reward matrices of the MDP;
3. Specify the terminal states of the MDP;
4. Create the RL MDP environment for this process model;
5. Specify the initial state of the agent by defining a reset function;
Create the Q-learning agent;
1. Create a Q table using the observation and action specifications from the MDP environment;
2. Set the learning rate of the representation;
3. Create a Q-learning agent;
Train the Q-learning agent;
1. Specify the training options (maximum number of episodes, stopping criteria);
2. Train the agent using the 'train' function;
Validate the Q-learning results;
1. Simulate the trained agent in the training environment using the 'sim' function.
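The steps above ('train' and 'sim' refer to MATLAB Reinforcement Learning Toolbox functions) can be sketched in a language-neutral way. The following Python snippet is a minimal illustration of the same workflow on a hypothetical four-state chain MDP; the transition matrix `T`, reward matrix `R`, and hyperparameters are assumptions for the sketch, not the model used in this work.

```python
import numpy as np

# Assumed toy MDP: 4 states in a chain, state 3 is terminal,
# actions are "left" (0) and "right" (1).
n_states, n_actions = 4, 2
terminal = 3

# State-transition matrix: T[s, a] gives the next state (deterministic here).
T = np.array([[0, 1],
              [0, 2],
              [1, 3],
              [3, 3]])
# Reward matrix R[s, a]: step cost of -1, +10 for entering the terminal state.
R = np.full((n_states, n_actions), -1.0)
R[2, 1] = 10.0

Q = np.zeros((n_states, n_actions))   # Q table from state/action specs
alpha, gamma, eps = 0.1, 0.9, 0.1     # learning rate, discount, exploration
rng = np.random.default_rng(0)

# Training: episode budget plays the role of the training options.
for episode in range(500):
    s = 0                             # reset function: always start in state 0
    while s != terminal:
        # Epsilon-greedy action selection.
        a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))
        s_next, r = T[s, a], R[s, a]
        # Q-learning update rule.
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next

# Validation: a greedy rollout (the 'sim' step) should follow the optimal path.
s, path = 0, [0]
while s != terminal:
    s = int(T[s, np.argmax(Q[s])])
    path.append(s)
print(path)  # -> [0, 1, 2, 3]
```

The greedy rollout at the end mirrors the validation step: after training, the agent is simulated in the same environment and should traverse the reward-maximizing path to the terminal state.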