| G | Generator network |
| C | Critic network |
| | Parameters of generator or discriminator |
| | Parameters of Adam optimizer |
| | State captured by the agent at time-slot t |
| | Possible actions taken by the agent at time-slot t |
| | Reward returned to the agent at time-slot t |
| | State transition probability matrix |
| | Discount factor, where |
| | Bellman operator |
| X | Original data |
| M | Replay memory data set |
| z | Random noise vector |
| | Penalty coefficient |