Sensors (Basel). 2021 Mar 11;21(6):1960. doi: 10.3390/s21061960

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Algorithm 1 DQN Algorithm

Create a replay buffer B
Randomly initialize the local network weights θ
Initialize the target network weights θt ← θ
for each episode e do
    Initialize a random state s
    for each transition t do
        Select action a according to the ϵ-greedy model
        Find the next state and reward
        Store the experience in buffer B
        Select batch-size random experiences from buffer B
        Calculate Q(s,a) using the local network and the batch of experiences
        Calculate Q′(s′,a) using the target network and the batch of next states
        Calculate the loss function using Q′(s′,a) and Q(s,a)
        if t % updating-interval == 0 then
            θt ← θ
        end if
    end for
end for
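The steps of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The environment (`step`, a five-state chain), every hyperparameter value, and the one-hot linear "network" (a weight table `theta[state][action]`) are hypothetical stand-ins chosen so that the example runs with the standard library only. Algorithm 1 stops at computing the loss, so the sketch also assumes that the local weights are updated by one gradient step on a squared temporal-difference loss, and that the transition counter t runs across episodes.

```python
import random
from collections import deque

# Hypothetical toy setup: a 5-state chain, action 0 = left, action 1 = right.
N_STATES, N_ACTIONS = 5, 2
GAMMA, LR, EPSILON = 0.9, 0.1, 0.2      # illustrative values, not from the paper
BATCH_SIZE, UPDATE_INTERVAL = 8, 20


def step(state, action):
    """Hypothetical environment: reward 1 for reaching the right end."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, max_steps=50, seed=0):
    rng = random.Random(seed)
    buffer = deque(maxlen=1000)                        # replay buffer B
    theta = [[rng.uniform(-0.01, 0.01) for _ in range(N_ACTIONS)]
             for _ in range(N_STATES)]                 # local weights
    theta_t = [row[:] for row in theta]                # target weights <- local
    t = 0                                              # transition counter (assumed global)
    for _ in range(episodes):
        s = rng.randrange(N_STATES - 1)                # random non-terminal start state
        for _ in range(max_steps):
            t += 1
            # epsilon-greedy action selection with the local network
            if rng.random() < EPSILON:
                a = rng.randrange(N_ACTIONS)
            else:
                a = max(range(N_ACTIONS), key=lambda k: theta[s][k])
            s2, r, done = step(s, a)                   # next state and reward
            buffer.append((s, a, r, s2, done))         # store the experience
            batch = rng.sample(list(buffer), min(BATCH_SIZE, len(buffer)))
            for bs, ba, br, bs2, bdone in batch:
                q = theta[bs][ba]                      # Q(s, a) from the local network
                q_next = 0.0 if bdone else max(theta_t[bs2])  # Q'(s', .) from the target
                target = br + GAMMA * q_next
                # Assumed update: gradient step on the loss 0.5 * (target - q)^2
                theta[bs][ba] += LR * (target - q)
            if t % UPDATE_INTERVAL == 0:
                theta_t = [row[:] for row in theta]    # periodic target sync
            if done:
                break
            s = s2
    return theta
```

Calling `theta = train()` returns the learned weight table, and a greedy policy can be read off with `max(range(N_ACTIONS), key=lambda k: theta[s][k])` for each non-terminal state `s`. Keeping `theta_t` fixed between syncs means the regression targets stay constant for `UPDATE_INTERVAL` transitions, which is the role the target network plays in Algorithm 1.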