2022 Aug 21;12(8):1277. doi: 10.3390/life12081277
Algorithm 2 Upper confidence bound (UCB)
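  • 1: for $i = 1, 2, \ldots, K$ do
  • 2:     Apply treatment $i$            ▹ apply each treatment once
  • 3: for $i = K+1, K+2, \ldots$ do
  • 4:     for $u \in \{1, \ldots, K\}$ do
  • 5:         Estimate $\hat{\theta}_u = \frac{\alpha_u}{\alpha_u + \beta_u}$          ▹ estimate mean rewards
  • 6:         $n_{u,i} \leftarrow$ number of times treatment $u$ has been applied so far
  • 7:     $U_i = \arg\max_u \left[ \hat{\theta}_u + \sqrt{\frac{\ln i}{n_{u,i}}} \right]$        ▹ select and apply action
  • 8:     Apply $U_i$ and observe $C_i$
  • 9:     $(\alpha_{U_i}, \beta_{U_i}) \leftarrow (\alpha_{U_i}, \beta_{U_i}) + (C_i, 1 - C_i)$     ▹ update distribution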
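The loop above can be sketched in Python for binary (success/failure) outcomes $C_i \in \{0, 1\}$. This is a minimal illustration, not the authors' code. The following are all assumptions made for the example and are not stated in the algorithm:

- the Beta(1, 1) starting values $\alpha_u = \beta_u = 1$;
- updating $(\alpha, \beta)$ after each warm-up application in steps 1 and 2;
- the simulated Bernoulli outcomes;
- the hypothetical names `ucb_beta` and `true_probs`.

```python
import math
import random


def ucb_beta(true_probs, horizon, seed=0):
    """Illustrative sketch of the UCB loop with Beta counts.

    Assumptions (hypothetical, not from the source): each treatment starts at
    alpha = beta = 1, warm-up applications also update (alpha, beta), and the
    outcome of treatment u is a Bernoulli draw with probability true_probs[u].
    """
    K = len(true_probs)
    if horizon < K:
        raise ValueError("horizon must be at least the number of treatments")
    rng = random.Random(seed)
    alpha = [1.0] * K
    beta = [1.0] * K
    counts = [0] * K  # n_{u,i}: times treatment u has been applied so far
    history = []      # (treatment, outcome) for every round

    def apply(u):
        # Steps 8-9: observe C in {0, 1}, then add (C, 1 - C) to (alpha, beta).
        c = 1 if rng.random() < true_probs[u] else 0
        alpha[u] += c
        beta[u] += 1 - c
        counts[u] += 1
        history.append((u, c))

    # Steps 1-2: apply each treatment once.
    for u in range(K):
        apply(u)

    # Steps 3-9: rounds i = K+1, ..., horizon.
    for i in range(K + 1, horizon + 1):
        # Steps 5 and 7: posterior-mean estimate plus exploration bonus.
        scores = [
            alpha[u] / (alpha[u] + beta[u]) + math.sqrt(math.log(i) / counts[u])
            for u in range(K)
        ]
        best = max(range(K), key=scores.__getitem__)  # ties go to the lowest index
        apply(best)

    return counts, history


counts, history = ucb_beta([0.1, 0.9], horizon=2000, seed=1)
print(counts)
```

With two treatments whose success probabilities differ this much, the better treatment should receive most of the applications. The $\sqrt{\ln i / n_{u,i}}$ term still grows slowly for a neglected treatment, so it is occasionally re-tried.
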