| Algorithm 2. Train RL Agent in MDP Environment |
| Result: The Trained Agent Finds The Optimal Path, Which Maximizes The Cumulative Reward |
| Initialization; |
| Create MDP Environment; |
| 1. Create MDP Model With Identified States And Actions; |
| 2. Specify The State Transition And Reward Matrices For The MDP; |
| 3. Specify The Terminal States Of The MDP; |
| 4. Create The RL MDP Environment For This Process Model; |
| 5. Specify The Initial State Of The Agent By Specifying A Reset Function; |
| Create Q-Learning Agent; |
| 1. Create A Q Table Using The Observation And Action Specifications From The MDP Environment; |
| 2. Set The Learning Rate Of The Q Table Representation; |
| 3. Create A Q-Learning Agent; |
| Train Q-Learning Agent; |
| 1. Specify The Training Options (Maximum Number Of Episodes, Stop-Training Criteria); |
| 2. Train The Agent Using The ‘train’ Function; |
| Validate Q-Learning Results; |
| 1. Simulate The Agent In The Training Environment Using The ‘sim’ Function. |
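
The four stages of Algorithm 2 (create the MDP environment, create the Q-learning agent, train, validate) can be illustrated with a minimal tabular sketch. It does not use the toolbox whose ‘train’ and ‘sim’ functions the algorithm names. It is a plain-Python stand-in, and the five-state chain, the rewards (-1 per step, +10 on reaching the terminal state), the hyperparameters, and the helper names `reset`, `step`, `greedy`, `train`, and `simulate` are all assumptions chosen for illustration.

```python
import random

# --- Create MDP environment (all values below are illustrative assumptions) ---
N_STATES, N_ACTIONS = 5, 2      # hypothetical chain; actions: 0 = left, 1 = right
TERMINAL = {4}                  # terminal states of the MDP

# State transition matrix T[s][a][s2] and reward matrix R[s][a][s2]
T = [[[0.0] * N_STATES for _ in range(N_ACTIONS)] for _ in range(N_STATES)]
R = [[[0.0] * N_STATES for _ in range(N_ACTIONS)] for _ in range(N_STATES)]
for s in range(N_STATES - 1):   # terminal state has no outgoing transitions
    for a, s2 in ((0, max(s - 1, 0)), (1, s + 1)):
        T[s][a][s2] = 1.0
        R[s][a][s2] = 10.0 if s2 in TERMINAL else -1.0


def reset():
    """Reset function: the agent always starts in state 0."""
    return 0


def step(s, a, rng):
    """Sample the next state from T and look up the reward in R."""
    s2 = rng.choices(range(N_STATES), weights=T[s][a])[0]
    return s2, R[s][a][s2], s2 in TERMINAL


def greedy(Q, s):
    """Action with the highest Q-value in state s (first one on ties)."""
    return max(range(N_ACTIONS), key=lambda a: Q[s][a])


# --- Create and train the Q-learning agent ---
def train(episodes=500, max_steps=50, alpha=0.5, gamma=0.95,
          epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # Q table sized from the observation (state) and action specifications
    Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):                 # training option: episode count
        s = reset()
        for _ in range(max_steps):            # training option: step limit
            explore = rng.random() < epsilon  # epsilon-greedy exploration
            a = rng.randrange(N_ACTIONS) if explore else greedy(Q, s)
            s2, r, done = step(s, a, rng)
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])   # Q-learning update
            s = s2
            if done:
                break
    return Q


# --- Validate: simulate the greedy policy in the training environment ---
def simulate(Q, max_steps=50, seed=0):
    rng = random.Random(seed)
    s = reset()
    path, total = [s], 0.0
    for _ in range(max_steps):
        s, r, done = step(s, greedy(Q, s), rng)
        path.append(s)
        total += r
        if done:
            break
    return path, total


if __name__ == "__main__":
    path, total = simulate(train())
    print("path:", path, "cumulative reward:", total)
```

Calling `simulate(train())` rolls out the learned greedy policy from the initial state. In this deterministic chain the shortest route to the terminal state is also the one with the highest cumulative reward, which is the property the validation step is meant to confirm.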