Illustrative example for competing outcomes generically called ‘side-effects’ and ‘efficacy’ and no patient covariates. In both cases the set {−1, 1} should be recommended at the first stage. Subject A illustrates the impact of preference evolution. If Subject A is initially concerned only with efficacy then they will choose treatment 1 at the first stage. However, if at the time of the second decision Subject A is concerned with side-effects, having initially chosen treatment 1 they are only left with poor choices. Subject B illustrates a potential problem with applying Q-learning separately to each outcome and then checking for agreement. Both Q-learning applied to efficacy and side-effects recommend treatment 1 at the first stage for Subject B. However, the Q-learning algorithm applied to efficacy assumes treatment -1 will be chosen at the second stage, and the Q-learning algorithm applied to side-effects assumes treatment 1 will be applied at the second stage. Yet, for Subject B applying treatment 1 at the first stage leads to extreme and potentially undesirable trade-offs at the second stage.