Input: the quintuple, the error parameter, the balance parameter, and the discount factor.
Output: the optimal strategy pair and the optimal value function.
Step 1: Initialize the V-function, choosing its initial value at random.
Step 2: Use Equation (19) to greedily improve the strategy pair.
Step 3: Use the strategy pair updated in step 2 and Formula (18) to compute the V-function.
Step 4: Repeat steps 2 and 3.
General step: assuming the results of the preceding step have been obtained, carry out the following two calculations at this step:
(a) Evolutionary calculation of the system: compute the V-function from the current strategy pair and Formula (18).
(b) Greedy calculation: improve the strategy pair greedily using Equation (19).
If the termination condition is satisfied, stop the calculation.
Return the result.
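The loop above (evaluate the V-function for a fixed strategy pair, then improve the pair greedily, until a stopping test passes) can be sketched as follows. This is a minimal illustration only: Formula (18) and Equation (19) are not reproduced here, so the sketch substitutes a standard discounted Bellman backup for the evaluation step and a plain argmax over joint actions for the greedy step. The names `backup`, `evaluate`, `greedy`, `policy_iteration`, and `toy_game`, the two-state example, and the stopping rule (successive V-functions differ by less than `eps`) are all assumptions made for illustration, and the balance parameter is not modeled.

```python
import random


def backup(s, a, V, P, R, gamma):
    # One-step discounted return of joint action a in state s.
    # Assumed stand-in for the right-hand side of Formula (18).
    return R[s][a] + gamma * sum(p * V[t] for p, t in P[s][a])


def evaluate(policy, V, P, R, gamma, eps):
    # Step (a): iterate the backup for a fixed strategy pair until
    # the largest change in one sweep falls below eps.
    V = list(V)
    while True:
        delta = 0.0
        for s in range(len(V)):
            new_v = backup(s, policy[s], V, P, R, gamma)
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < eps:
            return V


def greedy(V, P, R, gamma):
    # Step (b): pick, in each state, the joint action with the largest backup.
    # Assumed stand-in for Equation (19).
    return [max(P[s], key=lambda a: backup(s, a, V, P, R, gamma))
            for s in range(len(V))]


def policy_iteration(P, R, gamma=0.9, eps=1e-10, max_steps=1000, seed=0):
    rng = random.Random(seed)
    V = [rng.uniform(0.0, 1.0) for _ in P]  # random initial V-function
    policy = greedy(V, P, R, gamma)
    for _ in range(max_steps):
        policy = greedy(V, P, R, gamma)
        V_new = evaluate(policy, V, P, R, gamma, eps)
        # Assumed stopping rule: successive V-functions agree to within eps.
        if max(abs(x - y) for x, y in zip(V_new, V)) < eps:
            return policy, V_new
        V = V_new
    return policy, V


def toy_game():
    # Hypothetical two-state example; each joint action is a pair (a, b).
    pairs = [(a, b) for a in (0, 1) for b in (0, 1)]
    P, R = [{}, {}], [{}, {}]
    for pair in pairs:
        # State 0: only (1, 1) earns reward 1 and moves to state 1.
        P[0][pair] = [(1.0, 1)] if pair == (1, 1) else [(1.0, 0)]
        R[0][pair] = 1.0 if pair == (1, 1) else 0.0
        # State 1: only (0, 0) earns reward 2 and stays in state 1.
        P[1][pair] = [(1.0, 1)] if pair == (0, 0) else [(1.0, 0)]
        R[1][pair] = 2.0 if pair == (0, 0) else 0.0
    return P, R


if __name__ == "__main__":
    P, R = toy_game()
    policy, V = policy_iteration(P, R, gamma=0.5)
    print(policy, [round(v, 6) for v in V])
```

In this toy setting, `P[s][pair]` is a list of `(probability, next_state)` entries and `R[s][pair]` is the immediate reward. Treating the strategy pair as a single joint action chosen by one objective is a simplification; a game-theoretic improvement step would replace the `max` in `greedy` with whatever rule Equation (19) prescribes.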