Author manuscript; available in PMC: 2014 Mar 14.

Published in final edited form as: Biometrics. 2014 Jan 8;70(1):53–61. doi: 10.1111/biom.12132

Table 1.

Illustrative example for competing outcomes generically called ‘side-effects’ and ‘efficacy’, with no patient covariates. In both cases the set {−1, 1} should be recommended at the first stage. Subject A illustrates the impact of preference evolution. If Subject A is initially concerned only with efficacy, then they will choose treatment 1 at the first stage. However, if at the time of the second decision Subject A is concerned with side-effects, then having initially chosen treatment 1 they are left with only poor choices. Subject B illustrates a potential problem with applying Q-learning separately to each outcome and then checking for agreement. Q-learning applied to efficacy and Q-learning applied to side-effects both recommend treatment 1 at the first stage for Subject B. However, the Q-learning algorithm applied to efficacy assumes treatment −1 will be chosen at the second stage, whereas the Q-learning algorithm applied to side-effects assumes treatment 1 will be chosen at the second stage. Yet, for Subject B, applying treatment 1 at the first stage leads to extreme and potentially undesirable trade-offs at the second stage.

Stage 1 Txt	Stage 2 Txt	Subject A: Side-effects	Subject A: Efficacy	Subject B: Side-effects	Subject B: Efficacy
1	1	Very poor	Very good	Very good	Very poor
1	-1	Poor	Poor	Poor	Very good
-1	1	Good	Good	Good	Good
-1	-1	Very good	Poor	Fair	Very poor
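
The reasoning in the caption can be traced with a minimal sketch that runs backward induction separately on each outcome of the table. This is not the regression-based Q-learning estimator itself, only the deterministic, covariate-free special case that the table describes. The numeric scores attached to the ordinal labels (Very poor = 0 through Very good = 4) and the names `SCORE`, `TABLE`, and `backward_induction` are assumptions introduced for illustration.

```python
# Assumed numeric coding of the ordinal labels (not from the source).
SCORE = {"Very poor": 0, "Poor": 1, "Fair": 2, "Good": 3, "Very good": 4}
OUTCOMES = ("side-effects", "efficacy")

# Table 1, keyed by (stage 1 treatment, stage 2 treatment);
# each value is (side-effects label, efficacy label).
TABLE = {
    "A": {
        (1, 1): ("Very poor", "Very good"),
        (1, -1): ("Poor", "Poor"),
        (-1, 1): ("Good", "Good"),
        (-1, -1): ("Very good", "Poor"),
    },
    "B": {
        (1, 1): ("Very good", "Very poor"),
        (1, -1): ("Poor", "Very good"),
        (-1, 1): ("Good", "Good"),
        (-1, -1): ("Fair", "Very poor"),
    },
}


def backward_induction(subject, outcome):
    """Return (best stage 1 treatment, stage 2 treatment it assumes)
    when optimizing a single outcome in isolation."""
    idx = OUTCOMES.index(outcome)
    rows = TABLE[subject]
    best = None
    for a1 in (1, -1):
        # Stage 2: pick the treatment that maximizes this outcome given a1.
        a2_star = max((1, -1), key=lambda a2: SCORE[rows[(a1, a2)][idx]])
        value = SCORE[rows[(a1, a2_star)][idx]]
        # Stage 1: keep the a1 whose optimal stage 2 value is largest.
        if best is None or value > best[2]:
            best = (a1, a2_star, value)
    return best[0], best[1]


for subject in ("A", "B"):
    for outcome in OUTCOMES:
        a1, a2 = backward_induction(subject, outcome)
        print(f"Subject {subject}, {outcome}: stage 1 = {a1}, assumed stage 2 = {a2}")
```

Under this coding, both single-outcome analyses for Subject B pick treatment 1 at the first stage but rely on different second-stage treatments (−1 for efficacy, 1 for side-effects), which is the disagreement that a simple check for first-stage agreement would miss.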