Reinforcement Learning for Bit-Flipping Decoding of Polar Codes

2021 Jan 30;23(2):171. doi: 10.3390/e23020171

Algorithm 2 Q-learning algorithm for the QLSCF algorithm

While episode < number:
    Initial observation s;
    While frame < 10000:
        Initial information of polar codes: y_{1}^{N}, L_{1}^{N};
        If episode < 0.1 \times number:
            a \leftarrow choose_action(s, ε_{l});
        Else:
            a \leftarrow choose_action(s, ε_{s});
        End if
        (s', r) \leftarrow env(a, y_{1}^{N}, L_{1}^{N}, s);
        Q(s, a) \leftarrow Q(s, a) + α [r + γ \max_{a'} Q(s', a') - Q(s, a)];
        s = s';
        frame += 1;
    End while
End while
// The env function
Function (s', r) \leftarrow env(a, y_{1}^{N}, L_{1}^{N}, s):
    s' \leftarrow getstate(s, a);
    flag \leftarrow polardecoder(a, y_{1}^{N}, L_{1}^{N}, s);
    If flag = 1:
        r = -|L_{a}| + 1;
    Else:
        r = -|L_{a}| - 1;
    End if
End function
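
The update rule and reward in Algorithm 2 can be illustrated with a minimal tabular Q-learning sketch in Python. It is not the authors' implementation. The function toy_env is a hypothetical stand-in for env: it replaces the polar decoder with a made-up rule (decoding "succeeds" only when the flipped position equals a fixed error position), uses a single state so that getstate is trivial, and uses invented LLR magnitudes. The hyperparameter values, and the choice to restart the frame loop in every episode, are likewise assumptions made for illustration.

```python
import random
from collections import defaultdict

def choose_action(Q, s, actions, eps, rng):
    # Epsilon-greedy: explore with probability eps, otherwise exploit.
    if rng.random() < eps:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])

def reward(llr_a, flag):
    # r = -|L_a| + 1 when decoding succeeds (flag == 1), else -|L_a| - 1.
    return -abs(llr_a) + (1.0 if flag == 1 else -1.0)

def toy_env(a, llr, s, error_pos):
    # Hypothetical stand-in for env(): "decoding" succeeds only if the
    # flipped position a equals the fixed, made-up error position.
    flag = 1 if a == error_pos else 0
    s_next = s  # assumption: a single-state toy problem
    return s_next, reward(llr[a], flag)

def train(number=20, frames=200, eps_l=0.5, eps_s=0.05,
          alpha=0.1, gamma=0.9, seed=0):
    rng = random.Random(seed)
    llr = [2.0, 1.5, 3.0, 0.5, 2.5, 1.0]  # made-up LLR magnitudes
    actions = list(range(len(llr)))
    error_pos = 3                         # made-up error position
    Q = defaultdict(float)                # Q(s, a), initialised to 0
    for episode in range(number):
        s = 0                             # initial observation
        # Larger exploration rate during the first 10% of episodes.
        eps = eps_l if episode < 0.1 * number else eps_s
        for _ in range(frames):
            a = choose_action(Q, s, actions, eps, rng)
            s_next, r = toy_env(a, llr, s, error_pos)
            best_next = max(Q[(s_next, b)] for b in actions)
            # Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q, actions

Q, actions = train()
best = max(actions, key=lambda a: Q[(0, a)])
print("greedy action:", best)
```

In this toy setting only the correct flip position earns a positive reward, so the greedy action settles on it. The -|L_a| term additionally penalises flipping positions with large LLR magnitude.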