[Preprint]. 2023 Nov 16:2023.01.12.523765. Originally published 2023 Jan 12. [Version 2] doi: 10.1101/2023.01.12.523765

Figure 3. Drug cycling policies learned by RL-genotype and RL-fit.


A: Heatmap of the learned policy for 100 replicates (x-axis) each of RL-genotype and RL-fit. The far-left column (enlarged) shows the optimal policy derived from the MDP condition. The y-axis lists the β-lactam antibiotics each RL agent could choose from, and color indicates the probability that the learned policy selected a given antibiotic. The bottom heatmap shows the median fitness benefit observed under the policy learned by each replicate. B: Heatmap of the average learned policy for RL-fit and RL-genotype. RL-genotype learns a more consistent mapping of state to action than RL-fit.
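The caption does not specify how the per-antibiotic probabilities in panel A or the average policy in panel B are computed. The following is a minimal sketch under stated assumptions, not the authors' method: each replicate's learned policy is encoded as a hypothetical list where `policy[s]` is the antibiotic index chosen in state `s`; the panel A value is taken as the fraction of states assigned to each antibiotic, with states weighted uniformly; and the panel B value is the per-state fraction of replicates choosing each antibiotic. The function names `drug_selection_probability` and `average_policy` are illustrative only.

```python
from collections import Counter


def drug_selection_probability(policy, n_drugs):
    """Fraction of states in which one replicate's policy selects each drug.

    `policy` is a hypothetical encoding: policy[s] is the drug index chosen
    in state s. States are weighted uniformly (an assumption; weighting by
    how often each state is visited would be an equally plausible choice).
    """
    counts = Counter(policy)
    return [counts[d] / len(policy) for d in range(n_drugs)]


def average_policy(policies, n_drugs):
    """Per-state fraction of replicates that select each drug.

    Returns a list of rows; avg[s][d] is the share of replicates choosing
    drug d in state s, so each row sums to 1.
    """
    n_states = len(policies[0])
    counts = [[0] * n_drugs for _ in range(n_states)]
    for policy in policies:
        for s, d in enumerate(policy):
            counts[s][d] += 1
    return [[c / len(policies) for c in row] for row in counts]


# Toy example: 3 replicates, 4 states, 3 drugs (illustrative values only).
policies = [[0, 1, 1, 2], [0, 1, 2, 2], [0, 0, 1, 2]]
print(drug_selection_probability(policies[0], n_drugs=3))  # [0.25, 0.5, 0.25]
print(average_policy(policies, n_drugs=3)[0])  # [1.0, 0.0, 0.0]
```

Under this encoding, a row of the averaged matrix concentrated on a single antibiotic means the replicates agree on the action for that state, which is one way to read the "more consistent mapping of state to action" described for panel B.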