Author manuscript; available in PMC: 2014 Mar 14.
Published in final edited form as: Biometrics. 2014 Jan 8;70(1):53–61. doi: 10.1111/biom.12132

Table 1.

Illustrative example with two competing outcomes, generically called ‘side-effects’ and ‘efficacy’, and no patient covariates. In both cases the set {−1, 1} should be recommended at the first stage. Subject A illustrates the impact of preference evolution. If Subject A is initially concerned only with efficacy, then they will choose treatment 1 at the first stage. However, if at the time of the second decision Subject A is concerned with side-effects, then, having chosen treatment 1 at the first stage, they are left with only poor choices. Subject B illustrates a potential problem with applying Q-learning separately to each outcome and then checking for agreement. Q-learning applied to efficacy and Q-learning applied to side-effects both recommend treatment 1 at the first stage for Subject B. However, the Q-learning algorithm applied to efficacy assumes that treatment −1 will be chosen at the second stage, whereas the Q-learning algorithm applied to side-effects assumes that treatment 1 will be chosen at the second stage. Yet for Subject B, applying treatment 1 at the first stage leads to extreme and potentially undesirable trade-offs at the second stage.

Stage 1    Stage 2    | Subject A               | Subject B
treatment  treatment  | Side-effects  Efficacy  | Side-effects  Efficacy
 1          1         | Very poor     Very good | Very good     Very poor
 1         −1         | Poor          Poor      | Poor          Very good
−1          1         | Good          Good      | Good          Good
−1         −1         | Very good     Poor      | Fair          Very poor
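The per-outcome reasoning in the caption can be reproduced with a minimal sketch. It assumes a hypothetical ordinal coding of the verbal ratings (Very poor = 1, ..., Very good = 5, higher is better for both outcomes). Because there are no covariates and the outcomes are read directly from the table, "Q-learning" here collapses to backward induction over table look-ups. This is an idealized illustration, not the estimation procedure used in the paper. The names `SCORE`, `TABLE`, and `backward_induction` are illustrative only.

```python
# Hypothetical ordinal coding of the verbal ratings in Table 1 (an assumption
# for illustration; higher is better for both outcomes).
SCORE = {"Very poor": 1, "Poor": 2, "Fair": 3, "Good": 4, "Very good": 5}

# Table 1, keyed by (stage-1 treatment, stage-2 treatment) -> (side-effects, efficacy).
TABLE = {
    "A": {
        (1, 1): ("Very poor", "Very good"),
        (1, -1): ("Poor", "Poor"),
        (-1, 1): ("Good", "Good"),
        (-1, -1): ("Very good", "Poor"),
    },
    "B": {
        (1, 1): ("Very good", "Very poor"),
        (1, -1): ("Poor", "Very good"),
        (-1, 1): ("Good", "Good"),
        (-1, -1): ("Fair", "Very poor"),
    },
}
TREATMENTS = (1, -1)
OUTCOME_INDEX = {"side-effects": 0, "efficacy": 1}


def backward_induction(subject, outcome):
    """Idealized single-outcome Q-learning with no covariates and known outcomes.

    Returns (recommended stage-1 treatment, stage-2 treatment it assumes will follow).
    """
    k = OUTCOME_INDEX[outcome]

    def q2(a1, a2):
        # Stage-2 "Q-function": the tabulated rating for this treatment sequence.
        return SCORE[TABLE[subject][(a1, a2)][k]]

    # Stage 2: for each stage-1 treatment, choose the best stage-2 treatment.
    best_a2 = {a1: max(TREATMENTS, key=lambda a2: q2(a1, a2)) for a1 in TREATMENTS}
    # Stage 1: value each stage-1 treatment assuming the optimal stage-2 choice follows.
    a1_star = max(TREATMENTS, key=lambda a1: q2(a1, best_a2[a1]))
    return a1_star, best_a2[a1_star]


if __name__ == "__main__":
    for outcome in ("efficacy", "side-effects"):
        print("Subject B,", outcome, "->", backward_induction("B", outcome))
    # Subject B, efficacy -> (1, -1)
    # Subject B, side-effects -> (1, 1)
```

Under this coding both single-outcome fits agree on treatment 1 at stage 1 for Subject B, so a simple agreement check would accept it. However, each fit assumes a different stage-2 treatment, and the two sequences that follow treatment 1, (Very good, Very poor) and (Poor, Very good), are the extreme trade-offs described in the caption.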