| Parameter | Definition |
| --- | --- |
| s | state of UAV |
| a | action of UAV |
| R | reward function |
| d | Euclidean distance |
| ρ | switch for R3 |
| k | discount factor |
| π | strategy of the agent |
| p | action choice probability |
| yᵢ | winning score list |
| zᵢ | winning agent list |
| V | state value function |
| Q | action value function |
| P | state transition matrix |
| e | revenue function of the searched area |
| z | time |
| ε | search capability of UAV |
| α | learning rate |
| bᵢ | bundle of agent |
| cᵢⱼ[bᵢ] | score function |
| Lₜ | maximum assigned task number |