Algorithm 1 Maximum entropy exploration with neural networks

Input: $\alpha$, $N$, $\theta_0$, $X_0$, $k$
for $i = 1, \dots, N$ do
    Receive context $s_i$ and choose $a_i \sim \pi_i$, where $\pi_i(a \mid s_i) = \dfrac{e^{\hat{r}_{\theta_{i-1}}(a, s_i)/\alpha}}{\int_{a'} e^{\hat{r}_{\theta_{i-1}}(a', s_i)/\alpha}\, da'}$
    Agent receives reward $r_i$
    Add the triplet $\{s_i, a_i, r_i\}$ to the dataset $X$
    Every $k$ steps, train the model $\hat{r}_\theta$: $\theta_i = \arg\min_\theta \sum_{\{s_j, a_j, r_j\} \in X} \big| r_j - \hat{r}_\theta(a_j, s_j) \big|$
end for
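The following is a minimal sketch of Algorithm 1 in PyTorch, not the authors' implementation. It assumes a discrete action set, so the normalizing integral over $a'$ reduces to a softmax over actions; the reward model $\hat{r}_\theta$ is taken to be a small MLP, the optimizer, number of gradient steps, and the environment object `env` (with hypothetical `context()` and `reward()` methods) are illustrative choices not specified in the paper.

```python
# Sketch of Algorithm 1: maximum entropy exploration with a neural reward model.
# Assumptions (not from the paper): discrete actions, MLP reward model, Adam,
# and a hypothetical `env` object supplying contexts and rewards.
import torch
import torch.nn as nn


class RewardModel(nn.Module):
    """Neural estimate r_hat_theta(a, s) for a context s and a discrete action a."""
    def __init__(self, context_dim, n_actions, hidden=64):
        super().__init__()
        self.n_actions = n_actions
        self.net = nn.Sequential(
            nn.Linear(context_dim + n_actions, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s, a):
        # s: (batch, context_dim), a: (batch,) integer action ids
        a_onehot = torch.nn.functional.one_hot(a, self.n_actions).float()
        return self.net(torch.cat([s, a_onehot], dim=-1)).squeeze(-1)


def max_entropy_exploration(env, context_dim, n_actions,
                            alpha=0.1, N=10_000, k=100, lr=1e-3):
    model = RewardModel(context_dim, n_actions)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    X = []  # dataset of (s_i, a_i, r_i) triplets

    for i in range(1, N + 1):
        s = env.context()                                   # receive context s_i
        with torch.no_grad():
            all_a = torch.arange(n_actions)
            r_hat = model(s.expand(n_actions, -1), all_a)   # r_hat_{theta_{i-1}}(a, s_i)
            pi = torch.softmax(r_hat / alpha, dim=0)        # pi_i(a | s_i), Boltzmann over actions
        a = torch.multinomial(pi, 1).item()                 # sample a_i ~ pi_i
        r = env.reward(s, a)                                # agent receives reward r_i
        X.append((s, a, r))                                 # add {s_i, a_i, r_i} to X

        if i % k == 0:                                      # every k steps, refit r_hat_theta
            states = torch.stack([s_j for s_j, _, _ in X])
            actions = torch.tensor([a_j for _, a_j, _ in X])
            rewards = torch.tensor([r_j for _, _, r_j in X], dtype=torch.float32)
            for _ in range(50):                             # a few gradient steps on the L1 objective
                opt.zero_grad()
                loss = (rewards - model(states, actions)).abs().sum()
                loss.backward()
                opt.step()
    return model
```

In this sketch the temperature $\alpha$ plays the role it has in the algorithm: larger values flatten $\pi_i$ toward uniform exploration, while smaller values concentrate probability on the actions with the highest estimated reward.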