Skip to main content
. 2015 Aug 11;5:12874. doi: 10.1038/srep12874

Figure 3. Learning curves.

Figure 3

Left: The success probability ps as a function of number of measurement rounds on the qubit in state |φ〉 is shown for four angles as averaged over an ensemble of N = 1000 agents running in parallel (λ = 1, γ = 1/100). The time scale of learning can be rescaled by increasing both λ and γ. Dashed lines give analytical approximations of the asymptotic values (see text). The insets show the transition probabilities of the average agent after 1000 learning steps, Inline graphic, where Inline graphic is the ensemble average of the weight hα, together with the minimal and maximal probabilities obtained in the ensemble (as error bars), and the analytical curve of p(+1|φ, α). From the transition/measurement probabilities of a single agent j we infer30,31 its mean angle Inline graphic. The ‘‘vector sum’’ of the mean angles of the Inline graphic individual agents is the complex number Inline graphic, which determines the ensemble average of the mean angle Inline graphic and its circular standard deviation Inline graphic. Bottom middle: A higher reward scaling λ and lower damping rate γ give a faster initial learning and a higher asymptote, with a slower final convergence for the latter. Curves show the differences of all ps with the reference case λ = 1, γ = 1/100. Top right: Asymptotic success probability Inline graphic (analytical approximation) as a function of φ for 4 projectors. The curve is π/2-periodic. Bottom right: Learning time τ90 for the ensemble of agents to reach 90% of Inline graphic. Data is the average of the learning times of 1000 agents.