Extended Data Table 2.
Parameter values as identified from data. The action bias was fit on the power supply output voltage. Measurement noise is additive Gaussian noise, sampled randomly at each simulation time step. We use a fixed action bias with an additive random offset to account for non-ideal behaviour of the power supply hardware. Current-diffusion parameter variations account for the uncontrolled operating conditions. Parameter variations are sampled at the beginning of each episode and kept constant during the episode. The samples are drawn from uniform (action bias) and log-uniform (current diffusion) distributions using the bounds in this table. For single-plasma training, Rp, βp and qA are varied, whereas in multiple-plasma training, we vary and IOH. In the latter case, we sample an overall geometric mean offset of the two from a log-uniform distribution, and we sample the log of the multiplicative difference between them from Bs(4,4), where Bs is a scaled beta distribution. We sample a single IOH value for both coils. Parameters are sampled as absolute values unless explicitly indicated as scaling factors.
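The sampling scheme described in this caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the dictionary keys, the numerical bounds and the width of the scaled beta distribution (`max_log_ratio`) are all hypothetical placeholders, and the real bounds are those listed in the table. The sketch assumes that Bs(4,4) is a Beta(4,4) variable rescaled to a symmetric interval around zero.

```python
import numpy as np


def sample_log_uniform(rng, low, high):
    """Draw one value whose logarithm is uniform on [log(low), log(high))."""
    return float(np.exp(rng.uniform(np.log(low), np.log(high))))


def sample_pair(rng, low, high, max_log_ratio):
    """Draw two positive values for the multiple-plasma case.

    The geometric mean of the pair is log-uniform on [low, high).
    The log of their ratio follows a Beta(4, 4) variable rescaled to
    [-max_log_ratio, max_log_ratio] (assumed form of the scaled beta).
    """
    geo_mean = sample_log_uniform(rng, low, high)
    log_ratio = (2.0 * rng.beta(4.0, 4.0) - 1.0) * max_log_ratio
    first = geo_mean * np.exp(0.5 * log_ratio)
    second = geo_mean * np.exp(-0.5 * log_ratio)
    return float(first), float(second)


def sample_episode_parameters(rng, bounds):
    """Sample once at the start of an episode; values then stay fixed."""
    return {
        # Uniform random offset added to the fixed action bias.
        "action_bias_offset": float(rng.uniform(*bounds["action_bias_offset"])),
        # Log-uniform variation of a current-diffusion parameter.
        "diffusion_parameter": sample_log_uniform(rng, *bounds["diffusion_parameter"]),
    }


def noisy_measurement(rng, value, sigma):
    """Add Gaussian noise, drawn afresh at every simulation time step."""
    return value + rng.normal(0.0, sigma)


# Hypothetical bounds, chosen only to make the sketch runnable.
EXAMPLE_BOUNDS = {
    "action_bias_offset": (-1.0, 1.0),
    "diffusion_parameter": (0.5, 2.0),
}

rng = np.random.default_rng(0)
episode_params = sample_episode_parameters(rng, EXAMPLE_BOUNDS)
pair = sample_pair(rng, low=1.0, high=10.0, max_log_ratio=0.5)
```

Parameterising the pair by its geometric mean and log ratio keeps both values positive and makes the spread between them symmetric on a logarithmic scale, which matches the description of a multiplicative difference.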