Sensors (Basel). 2022 Nov 24;22(23):9114. doi: 10.3390/s22239114

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Algorithm 1. PPO learning algorithm
1: Initialize θ and ϕ with random numbers
2: for each iteration do
3:     Collect B trajectories by following the actor's policy π_θ
4:     Update θ as θ + α∇_θ L(θ) using Equation (10)
5:     Update ϕ as ϕ + α∇_ϕ L(ϕ) using Equation (11)
6: end for
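The loop in Algorithm 1 can be illustrated with a small, self-contained sketch. Equations (10) and (11) are not reproduced in this excerpt, so the code assumes the standard PPO clipped surrogate for L(θ) and defines L(ϕ) as the negated mean squared error of the critic, so that both steps 4 and 5 are gradient ascent exactly as written. Everything else is a hypothetical illustration rather than the authors' setup: the toy chain environment (`GOAL`, `MAX_STEPS`), the tabular softmax actor and tabular critic, Monte Carlo returns minus V_ϕ(s) as the advantage, separate step sizes `alpha_actor` and `alpha_critic` in place of the single α, and `epochs` ascent steps per batch (setting `epochs=1` gives the single update per iteration shown in the algorithm).

```python
import math
import random

# Hypothetical toy environment: states 0..GOAL on a chain. Action 1 moves right,
# action 0 moves left (clamped at 0). Each step costs -1; the episode ends at
# GOAL or after MAX_STEPS steps (truncated episodes are not bootstrapped).
GOAL, MAX_STEPS, N_ACTIONS = 3, 20, 2


def step(s, a):
    s_next = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    return s_next, -1.0, s_next == GOAL


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def collect(theta, B, rng, gamma):
    """Step 3: roll out B trajectories with the current (old) policy pi_theta."""
    samples = []  # tuples (state, action, pi_old(a|s), return-to-go G_t)
    for _ in range(B):
        s, traj = 0, []
        for _ in range(MAX_STEPS):
            probs = softmax(theta[s])
            a = rng.choices(range(N_ACTIONS), weights=probs)[0]
            s_next, r, done = step(s, a)
            traj.append((s, a, probs[a], r))
            s = s_next
            if done:
                break
        g = 0.0
        for s_t, a_t, p_t, r_t in reversed(traj):  # Monte Carlo return G_t
            g = r_t + gamma * g
            samples.append((s_t, a_t, p_t, g))
    return samples


def ppo(iterations=200, B=16, epochs=4, alpha_actor=0.1, alpha_critic=0.5,
        eps=0.2, gamma=1.0, seed=0):
    rng = random.Random(seed)
    # Step 1: small random actor logits theta[s][a] and critic values phi[s].
    theta = [[rng.gauss(0.0, 0.01) for _ in range(N_ACTIONS)] for _ in range(GOAL)]
    phi = [rng.gauss(0.0, 0.01) for _ in range(GOAL)]
    for _ in range(iterations):  # Step 2
        samples = collect(theta, B, rng, gamma)
        n = len(samples)
        # Advantage A_t = G_t - V_phi(s_t), held fixed for this batch.
        adv = [g - phi[s] for s, _, _, g in samples]
        # Step 4: ascent on L(theta) = mean min(r*A, clip(r, 1-eps, 1+eps)*A).
        for _ in range(epochs):
            grad = [[0.0] * N_ACTIONS for _ in range(GOAL)]
            for (s, a, p_old, _), A in zip(samples, adv):
                probs = softmax(theta[s])
                ratio = probs[a] / p_old
                # The min() selects the clipped (constant) term only when the
                # ratio has moved past 1 +/- eps in the direction A favours.
                if (A > 0 and ratio > 1 + eps) or (A < 0 and ratio < 1 - eps):
                    continue
                for b in range(N_ACTIONS):
                    dlogp = (1.0 if b == a else 0.0) - probs[b]  # d log pi / d theta
                    grad[s][b] += A * ratio * dlogp / n
            for s in range(GOAL):
                for b in range(N_ACTIONS):
                    theta[s][b] += alpha_actor * grad[s][b]
        # Step 5: ascent on L(phi) = -mean (V_phi(s_t) - G_t)^2.
        for _ in range(epochs):
            grad_phi = [0.0] * GOAL
            for s, _, _, g in samples:
                grad_phi[s] += -2.0 * (phi[s] - g) / n
            for s in range(GOAL):
                phi[s] += alpha_critic * grad_phi[s]
    return theta, phi
```

Calling `theta, phi = ppo()` trains the actor; `softmax(theta[s])[1]` is then the learned probability of moving right from state `s`, which should approach 1 because every extra step costs reward. Skipping samples whose ratio is already outside [1 − ε, 1 + ε] in the advantage's direction is what keeps each batch from moving the policy too far, which is the point of PPO's clipping.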