2024 Feb 14;80(1):ujad017. doi: 10.1093/biomtc/ujad017

TABLE 2.

Simulation results for Scenario 1b with 2 stages and 3 treatment options (sample size N = 1000, 100 replications).


Method		Stage 1			Stage 2			Overall
		% optimal	Mean R₁₁	Mean R₂₁	% optimal	Mean R₁₂	Mean R₂₂	% optimal	Mean Y₁	Mean Y₂

T-RL	Y₁ as outcome	77.75 (1.32)	4.98 (0.07)	3.35 (0.09)	65.51 (5.20)	4.31 (0.11)	2.74 (0.28)	43.89 (5.33)	9.29 (0.14)	6.09 (0.28)
	Y₂ as outcome	21.31 (1.19)	−0.36 (0.07)	4.94 (0.07)	77.36 (13.12)	3.89 (1.06)	4.20 (0.30)	18.51 (6.74)	3.53 (1.05)	9.14 (0.30)
MOT-RL: w = (0.3, 0.7)
	100% tolerant	71.07 (12.41)	3.11 (0.78)	4.39 (0.20)	96.02 (2.81)	4.76 (0.16)	4.58 (0.05)	69.08 (12.04)	7.87 (0.77)	8.97 (0.21)
	90% tolerant	71.59 (12.46)	3.12 (0.78)	4.38 (0.20)	96.16 (2.73)	4.76 (0.16)	4.58 (0.05)	69.66 (12.11)	7.88 (0.78)	8.96 (0.21)
	70% tolerant	72.64 (12.19)	3.10 (0.76)	4.39 (0.20)	96.38 (2.83)	4.70 (0.17)	4.58 (0.05)	70.82 (11.94)	7.80 (0.78)	8.96 (0.20)
	50% tolerant	73.94 (12.01)	2.94 (0.83)	4.40 (0.23)	96.62 (2.96)	4.26 (0.28)	4.51 (0.06)	71.23 (11.89)	7.19 (0.83)	8.91 (0.24)
MOT-RL: w = (0.4, 0.6)
	100% tolerant	82.49 (12.28)	3.78 (0.71)	4.31 (0.23)	96.08 (3.13)	4.76 (0.19)	4.58 (0.06)	79.18 (12.36)	8.54 (0.72)	8.90 (0.24)
	90% tolerant	82.74 (12.30)	3.79 (0.70)	4.31 (0.24)	96.08 (3.13)	4.76 (0.19)	4.58 (0.06)	79.39 (12.36)	8.55 (0.71)	8.89 (0.24)
	70% tolerant	83.02 (12.57)	3.82 (0.69)	4.30 (0.23)	96.18 (3.15)	4.74 (0.19)	4.58 (0.06)	80.70 (12.69)	8.55 (0.71)	8.88 (0.24)
	50% tolerant	84.13 (12.27)	3.76 (0.63)	4.33 (0.25)	96.52 (2.75)	4.61 (0.24)	4.55 (0.11)	82.98 (12.45)	8.37 (0.69)	8.88 (0.28)
MOT-RL: w = (0.5, 0.5)
	100% tolerant	91.19 (9.96)	4.35 (0.51)	4.28 (0.23)	95.99 (3.10)	4.76 (0.18)	4.58 (0.06)	89.53 (10.49)	9.10 (0.56)	8.86 (0.23)
	90% tolerant	91.54 (9.94)	4.36 (0.50)	4.28 (0.24)	95.99 (3.10)	4.76 (0.18)	4.58 (0.06)	89.86 (10.52)	9.12 (0.56)	8.86 (0.24)
	70% tolerant	92.27 (9.89)	4.37 (0.50)	4.27 (0.24)	95.99 (3.10)	4.75 (0.19)	4.58 (0.06)	90.55 (10.57)	9.12 (0.56)	8.85 (0.24)
	50% tolerant	93.07 (9.58)	4.27 (0.53)	4.25 (0.29)	96.30 (2.66)	4.68 (0.21)	4.56 (0.11)	91.56 (10.05)	8.96 (0.60)	8.81 (0.31)
MOT-RL: w = (0.6, 0.4)
	100% tolerant	90.36 (10.73)	4.72 (0.27)	4.04 (0.45)	95.68 (3.27)	4.74 (0.19)	4.58 (0.06)	88.29 (11.35)	9.46 (0.35)	8.62 (0.46)
	90% tolerant	90.46 (10.70)	4.73 (0.27)	4.03 (0.47)	95.68 (3.27)	4.74 (0.19)	4.58 (0.06)	88.36 (11.32)	9.47 (0.35)	8.61 (0.48)
	70% tolerant	91.07 (10.54)	4.74 (0.24)	4.04 (0.46)	95.84 (2.80)	4.73 (0.21)	4.57 (0.11)	89.03 (10.95)	9.47 (0.30)	8.61 (0.47)
	50% tolerant	92.23 (9.98)	4.68 (0.29)	4.06 (0.44)	96.17 (1.91)	4.70 (0.23)	4.55 (0.17)	90.50 (10.12)	9.38 (0.37)	8.61 (0.46)
MOT-RL: w = (0.7, 0.3)
	100% tolerant	84.50 (10.11)	4.88 (0.17)	3.67 (0.46)	95.04 (4.06)	4.71 (0.22)	4.56 (0.08)	78.72 (10.65)	9.59 (0.27)	8.23 (0.47)
	90% tolerant	84.57 (10.15)	4.89 (0.17)	3.67 (0.47)	95.08 (3.88)	4.71 (0.22)	4.56 (0.08)	78.83 (10.53)	9.59 (0.28)	8.23 (0.48)
	70% tolerant	84.97 (10.11)	4.88 (0.16)	3.66 (0.47)	95.37 (3.25)	4.70 (0.24)	4.55 (0.15)	79.50 (10.42)	9.58 (0.27)	8.21 (0.50)
	50% tolerant	85.25 (10.02)	4.87 (0.17)	3.64 (0.45)	95.83 (1.99)	4.69 (0.26)	4.52 (0.21)	80.20 (10.36)	9.55 (0.30)	8.16 (0.49)

Entries are empirical means with standard errors (SE) in parentheses. Under each of Stage 1, Stage 2, and Overall, the first column reports the percentage of subjects whose estimated treatment set includes the actual optimal treatment; for Overall, the set must include it at both stage 1 and stage 2. The second and third columns under Stage 1 and Stage 2 report the estimated counterfactual mean reward of the i-th objective at stage j, R_ij (i, j = 1, 2), obtained from the true rewards and the estimated optimal (or tolerant) regime. The second and third columns under Overall report the estimated counterfactual mean outcome of the i-th objective (i = 1, 2).
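
The percentage-type columns described in the note can be illustrated with a small sketch. This is not the authors' code: the function names (`hit_indicator`, `percent`, `mean_and_se`) and the toy arrays are hypothetical. It also assumes that the reported SE is the sample standard deviation across replications, which the note does not state explicitly.

```python
import numpy as np

def hit_indicator(treatment_sets, optimal):
    """Per-subject flag: does the estimated treatment set contain the true optimum?

    treatment_sets: boolean array, shape (n_subjects, n_treatments)
    optimal: integer array, shape (n_subjects,), index of the true optimal treatment
    """
    sets = np.asarray(treatment_sets, dtype=bool)
    opt = np.asarray(optimal, dtype=int)
    return sets[np.arange(opt.shape[0]), opt]

def percent(hit):
    """Percentage of subjects flagged True."""
    return 100.0 * float(np.mean(hit))

def mean_and_se(per_replication_values):
    """Empirical mean and spread across replications.

    Assumption: spread is the sample standard deviation (ddof=1).
    """
    v = np.asarray(per_replication_values, dtype=float)
    return float(v.mean()), float(v.std(ddof=1))

# Toy data for one replication: 4 subjects, 3 treatment options (hypothetical values).
stage1_sets = np.array([[1, 0, 0], [0, 1, 1], [1, 1, 0], [0, 0, 1]], dtype=bool)
stage1_opt = np.array([0, 2, 2, 2])
stage2_sets = np.array([[1, 1, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]], dtype=bool)
stage2_opt = np.array([1, 1, 2, 0])

hit1 = hit_indicator(stage1_sets, stage1_opt)
hit2 = hit_indicator(stage2_sets, stage2_opt)
overall = hit1 & hit2  # optimum must be covered at both stages

stage1_pct = percent(hit1)
stage2_pct = percent(hit2)
overall_pct = percent(overall)

# Summarizing a metric over replications (hypothetical per-replication percentages).
summary = mean_and_se([70.0, 80.0, 90.0])
```

Because the overall flag requires coverage at both stages, the overall percentage can never exceed either stage-specific percentage, which matches the pattern in every row of the table.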