2022 Dec 16;445:133613. doi: 10.1016/j.physd.2022.133613

COVID-19 vaccine incentive scheduling using an optimally controlled reinforcement learning model

K Stuckey, PK Newton
PMCID: PMC9754750  PMID: 36540277

Abstract

We model Covid-19 vaccine uptake as a reinforcement learning dynamic between two populations: the vaccine adopters and the vaccine hesitant. Using data available from the Centers for Disease Control and Prevention (CDC), we estimate the payoff matrix governing the interaction between these two groups over time and show that they are playing a Hawk–Dove evolutionary game with an internal evolutionarily stable Nash equilibrium (the asymptotic percentage of vaccinated individuals in the population). We then ask whether vaccine adoption can be improved by implementing dynamic incentive schedules that reward or punish the vaccine hesitant and, if so, which schedules are optimal and how effective they are likely to be. When is the optimal time to start an incentive program, how large should the incentives be, and is there a point of diminishing returns? Using a tailored replicator-dynamics reinforcement learning model together with optimal control theory, we show that well-designed and well-timed incentive programs can improve vaccine uptake by shifting the Nash equilibrium upward in large populations, but only to a limited extent, and that incentives above a certain size show diminishing returns.

Keywords: Vaccine uptake dynamics, Evolutionary game theory, Hawk–Dove games, Optimal control, Reinforcement learning dynamics, Dynamic incentives

1. Introduction

The voluntary uptake of vaccines for Covid-19 has proven to be a challenge around the world, but particularly in the United States, where multiple vaccine options have been available since early 2021, yet a sizable group remains unvaccinated. After an initial population of early adopters was vaccinated, uptake began to slow despite widespread availability and has now reached what looks to be a fairly stable resting point (see Fig. 1). While 100% voluntary compliance is rarely if ever achievable, vaccine hesitancy [1] has proven to be more widespread for Covid-19 than for other vaccines, such as seasonal flu vaccines [2] and the polio, smallpox and HPV vaccines, among others [3]. In any widespread, nationally coordinated vaccination effort, there will always be a population of people, whom we label vaccine adopters (e.g. the elderly, the immuno-compromised, healthcare workers), who get vaccinated as soon as they are eligible or shortly thereafter; others then follow. In these early stages, the uptake curve for this group is limited mostly by vaccine availability and logistics. There is also a vaccine hesitant population whose members delay their initial chances to get vaccinated; then, as they see others getting sick and weigh evidence and public opinion, some decide to vaccinate (adopters), while others delay further or forgo their chance altogether for various reasons (hesitant) [4]. We view the full population as a collection of two types of players in a time-evolving game who interact, learn, and receive payoffs (reward/punishment) according to the strategy (adopt/forgo) they choose. These interactions determine the fitness of the players using each of the two strategies, and the survival of a strategy is determined by its fitness and by the population frequencies of the players in each group. The two competing behaviors ultimately produce the vaccine uptake growth curve shown in Fig. 1, which starts out rapidly (exponentially), then slows down, passes through an inflection point, and approaches a fairly stable resting percentage of vaccinated individuals, which in the United States seems to have settled at just under 60% of the population (Fig. 1).

Fig. 1.


Covid-19 vaccine uptake data in the United States (1 dose for Johnson & Johnson or 2 doses for Pfizer/Moderna) starting in January 2021, as publicly available at https://covid.cdc.gov/covid-data-tracker/#datatracker-home. The red curve shows a Gompertzian fit to the data (first derivative curve also in red); the blue curve shows the results of the Hawk–Dove evolutionary game theory model (first derivative curve also in blue). Vertical lines mark the inflection point (maximum of the first derivative) where vaccine uptake begins to slow, resulting in an asymptote (Nash equilibrium) at roughly 58% of the population. The upper left inset shows the phase-plane diagram for the Hawk–Dove dynamical system with an internal evolutionarily stable state (ESS) at 58%. How much can well-designed incentive programs push this percentage up?

The way to think about the evolutionary dynamics unfolding in a vulnerable unvaccinated population of players is to imagine that each individual carries a complex and ever-changing set of beliefs about the efficacy of the vaccine being introduced. Individuals begin to learn about the vaccine well in advance of the rollout by talking with friends and relatives and by listening to and reading news reports. It is widely appreciated that the social dynamics that take place in this pre-rollout phase are crucial to the ultimate success of the program, and one can assume that each individual's perception of the vaccine is already forming before the actual rollout begins. Once vaccines become available, individual decisions are made over time, and from the initially uniformly unvaccinated population, a subpopulation of vaccine adopters begins to form and grows monotonically (a vaccinated person cannot become unvaccinated). The exact shape of this growth curve depends very much on the relative sizes of the two subpopulations, the unvaccinated and the vaccinated, as the two groups continue to play this evolving game. The frequencies of the two groups within the overall population influence the decision of each remaining unvaccinated person, since it is known that unvaccinated people surrounded by a sea of vaccinated people are much more likely to choose vaccination than those surrounded by vaccine skeptics.

The question we address in this paper is whether a well-designed (optimized) punishment/reward system can significantly alter this natural dynamic and, if so, how best to achieve a higher percentage of vaccinated people in a population. Vaccine incentive programs have been used with varying degrees of success for other vaccines, but for the Covid-19 vaccine they have largely been local (county-wide and state-by-state) and somewhat haphazard, ranging from small cash rewards handed out at vaccination clinics, to medium-sized vacation add-ons, to larger lottery-style rewards [5], [6]. Punishments for the unvaccinated have also been levied, ranging from the small extra hassle of weekly Covid testing, to more severe restrictions such as being barred from restaurants or public events, to broader mandates that require vaccination for employment or school enrollment [7], [8], [9]. Table 1 shows a compilation of the mostly ad hoc strategies that have been implemented in states across the country; larger-scale national programs have not been systematically designed or implemented.

Table 1.

Compiled information on the different forms of vaccine incentives that states have used. States are ranked from highest to lowest adoption percentage. States for which we were unable to obtain information on any incentive program are not listed.

Rank % Vaccinated State One Dose Fully Vaccinated
2 70.68 Connecticut Free event admission Free drinks
Concert tickets
Free food

4 70.56 Maine License or event pass $1 per person vaccinated

5 69.69 Massachusetts 5 $1 million prizes
5 $300,000 scholarships

6 66.72 New York Baseball tickets State Park pass

7 66.4 New Jersey Free beer State Park Vax Pass
Dinner with governor

8 66.18 Maryland $100 for state employees $2 million lottery

9 63.48 Washington Lottery tickets

11 62.88 Oregon $1 million prize
36 $10,000 prizes
5 $100,000 scholarships

14 62.14 New Mexico 5 weekly $250,000 prizes
$5 million prize
$100
10 prize wheels
Travel prize

15 61.64 Colorado Weekly lottery $500 for CDOC
$50,000 tuition

16 61.31 California 10 $1.5 million prizes
30 $50,000 prizes
$50 gift card
6 vacations

18 60.42 Illinois 50,000 Six Flags tickets
3 $1 million prizes
40 $100,000 prizes

19 59.9 Minnesota Free/discounted drinks

20 59.81 Hawaii Travel perks

22 59.73 Delaware Inmate incentives $302,000 prize
Scholarship raffle Free drinks
Vacation passes

27 53.54 Michigan $5 million in cash prizes

31 52.98 Nevada “Vax Nevada Days”

32 52.67 North Carolina 4 $1 million prizes
$25 cash cards

35 51.88 Ohio $1 million drawings

36 50.72 Kentucky Lottery Tickets
3 $1 million prizes
15 scholarships

40 49.87 Indiana Girl Scout cookies

43 48.04 Arkansas $20 lottery tickets $100 for state employees

44 47.71 Louisiana State park access Free drinks
$100,000 prizes

45 47.66 Tennessee Car sweepstakes

47 44.78 Alabama Talladega Sweepstakes
$250 gift cards

48 43.89 Wyoming “Shots for swag”

49 43.92 Idaho 4 hr paid leave

50 41.04 West Virginia $100 gift cards
$1.58 million prize

Specific questions we address in this paper include the following. Are there inherent limitations to well-designed (optimized) punishment/reward systems if they are implemented on a wide-scale basis? Can optimal schedules be designed that would work most effectively? What are the optimal starting and ending times for such dynamic incentive programs? Is there a point of diminishing returns beyond which larger incentives are no longer as effective? By modeling vaccine uptake as a reinforcement learning evolutionary game played between two subpopulations of players (the vaccine adopters and the vaccine hesitant), we address these questions within a mathematical model calibrated with vaccine uptake data from the Centers for Disease Control and Prevention at both the nationwide and the state-by-state level. With models tailored to individual states and to different age groups, we test various types of incentive schedules to produce upper and lower bounds (using the Pontryagin maximum/minimum principle from optimal control theory) on the inherent limitations of dynamic incentive programs, and by producing incentive/response curves (analogous to chemotherapeutic dose/response curves [10]), we hypothesize likely responses to different types and sizes of incentive schedules.

Aspects of vaccine policy and individual decision making surrounding these policies have been studied, for example, by Korn et al. [11], who argue that vaccine uptake can be viewed as a social contract in which individuals reward others who comply and punish those who do not. Bauch et al. [12] frame the uptake problem in terms of the complex trade-offs between group interest and self-interest, arguing that in the case of extreme events (such as bio-terrorist attacks) voluntary vaccination alone is unlikely to reach the group-optimal level necessary for herd immunity. Bauch et al. [13], [14] have also used game-theoretical models to help explain human decision making surrounding vaccine uptake and to study how vaccine scares unfold [15]. In [16], they use imitation dynamics models to understand the complex interplay between vaccine coverage, disease prevalence, and individual decision making. In a comprehensive recent book, Tanimoto [17] describes many popular models associated with the spread of epidemics: Chapter 9 addresses pre-emptive vs. late vaccination strategies, Chapter 10 discusses the flu vaccine uptake problem, and Chapter 11 discusses the optimal design of vaccination subsidy policies. More general modeling frameworks have used tools borrowed from statistical physics [18] to model vaccine dynamics.

Our approach uses the vaccine uptake data (country-wide as well as state-by-state) available at https://covid.cdc.gov/covid-data-tracker/#datatracker-home to fit three Gompertzian parameters (a,b,c), which we then use to estimate the entries of the 2 × 2 payoff matrix describing the evolutionary game played between the vaccine adopter and vaccine hesitant populations. The data show that the population is effectively playing a Hawk–Dove game with an evolutionarily stable internal fixed point (ESS) representing the percentage of vaccine adopters (Doves) in the population. We then apply optimal control theory to this dynamical system to design time-dependent incentive schedules that alter the baseline payoff matrix entries (altering the reward/punishment balance), in order to obtain upper (and lower) bounds on how much different incentive strategies can shift the asymptotic percentage of vaccine adopters in the population. This control technique was originally developed for the design of adaptive/optimal chemotherapy schedules for controlling resistance in tumors [19], [20], [21], [22], [23], [24], [25], [26]. Here, we exploit the observation that optimizing vaccine incentive schedules is analogous to optimizing chemotherapy schedules, which produces dose–response curves [10] for specific goals such as avoiding chemotherapeutic resistance [19], [20], [21], [22]. Adapting these techniques to vaccine incentive scheduling raises a different set of questions and challenges, but these can be addressed within a similar modeling framework. Other recent work that uses feedback control ideas to develop COVID-19 policies includes [27]. The merging of reinforcement learning models with optimal/adaptive control theory is a new and promising field with many potential applications; a nice introduction to the field, described mostly in the robotics framework, can be found in a recent monograph [28].

2. The vaccine uptake model

Calibrating the Gompertzian curves

The vaccine uptake curve shown in Fig. 1 is a three parameter (a,b,c) Gompertzian curve,

$$ f(t) = a\exp\left[-\exp(b - ct)\right] \tag{1} $$

which has a long history of use in actuarial science (laws of human mortality), economics (growth laws of wealth), biology (population growth and saturation), and cancer (tumor growth) [29], [30]. The key quantities for fitting such a curve to these data are: (1) $T$, the location of the inflection point (shown in Fig. 1); (2) $f'(T)$, the slope of the tangent line at the inflection point (the growth rate where the curve changes from concave up to concave down); and (3) $f'(0)$, the slope of the tangent line at the origin (the initial growth rate). In Eq. (1), $a$ is the asymptote ($\lim_{t\to\infty} f(t)$, also known as the carrying capacity in other contexts [31], [32]), $b$ is the displacement along the $t$-axis (time-shift parameter), and $c$ is a time-scaling factor. In terms of these parameters, the inflection point is located at $T$, where:

$$ T = b/c, \tag{2} $$

is the ratio of the time-shift to the time-scaling parameter, while the slopes of $f(t)$ at the origin and at the inflection point are given by:

$$ f'(0) = ac\,\exp\left(b - \exp(b)\right), \tag{3} $$
$$ f'(T) = ac/\exp(1). \tag{4} $$
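As a quick check of (2)–(4), one can differentiate (1) directly. The derivation below is our own verification rather than a step reproduced from the paper; it introduces the shorthand $s(t) = e^{\,b-ct}$, which satisfies $\dot{s} = -cs$.

```latex
\begin{aligned}
f(t) &= a\,e^{-s}, \qquad s(t) = e^{\,b-ct}, \qquad \dot{s} = -c\,s,\\
f'(t) &= a\,c\,s\,e^{-s},\\
f''(t) &= a\,c^{2}\,s\,(s-1)\,e^{-s} = 0 \iff s = 1 \iff b - cT = 0 \iff T = b/c,\\
f'(0) &= a\,c\,e^{b}\,e^{-e^{b}} = a\,c\,\exp\!\left(b - e^{b}\right),\\
f'(T) &= a\,c\,e^{-1} = ac/\exp(1), \qquad f(T) = a/\exp(1).
\end{aligned}
```

The last line also shows that, on a Gompertz curve, the inflection occurs when uptake reaches $a/e \approx 0.37\,a$, i.e. about 37% of its eventual asymptote.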

Inverting (2)–(4) for $(a,b,c)$ gives:

$$ a = \exp(1)\,f'(T)\,T/b, \tag{5} $$
$$ c = b/T, \tag{6} $$
$$ b - \exp(b) = \ln\!\left(\frac{f'(0)}{\exp(1)\,f'(T)}\right). \tag{7} $$
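Eq. (7) has no closed-form solution, but its left-hand side, $b - e^{b}$, is strictly decreasing for $b > 0$, so the positive root can be bracketed and found by bisection. The sketch below is one way to carry out the inversion (5)–(7), not necessarily the solver the authors used; the function name `fit_gompertz` and the bracket $[0, 10]$ are our own choices.

```python
import math


def fit_gompertz(T: float, slope0: float, slopeT: float) -> tuple[float, float, float]:
    """Return (a, b, c) for f(t) = a*exp(-exp(b - c*t)) given the inflection time T,
    the initial slope f'(0) and the inflection slope f'(T), using Eqs. (5)-(7)."""
    target = math.log(slope0 / (math.e * slopeT))  # right-hand side of Eq. (7)

    def residual(b: float) -> float:
        # b - exp(b) is strictly decreasing for b > 0, so this changes sign once.
        return b - math.exp(b) - target

    lo, hi = 0.0, 10.0
    if not residual(lo) > 0 > residual(hi):
        raise ValueError("no positive root of Eq. (7) in [0, 10]")
    for _ in range(100):  # bisection: halve the bracket until it is negligible
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0:
            lo = mid      # residual is decreasing, so the root lies right of mid
        else:
            hi = mid
    b = 0.5 * (lo + hi)
    c = b / T                       # Eq. (6)
    a = math.e * slopeT * T / b     # Eq. (5)
    return a, b, c


# US row of Table 2: T = 88 days, f'(0) = 0.028256504, f'(T) = 0.405499161.
a, b, c = fit_gompertz(88, 0.028256504, 0.405499161)
print(f"a = {a:.2f}, b = {b:.3f}, c = {c:.4f}")
```

For the US inputs this recovers $b \approx 1.675$ as in Table 2, with $a$ and $c$ close to, though not identical with, the tabulated values; small differences of this kind may reflect rounding of the reported inputs.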

The transcendental equation (7) can be solved numerically for $b$; the results are shown in Table 2, and the corresponding curve for the US population is shown in red in Fig. 1. For the US population, the asymptote is roughly 58% vaccinated, and the uptake inflection point is at $T \approx 88$ days after vaccines first became available. Error bars in Fig. 1 are produced using a stochastic process that governs evolutionary game dynamics in finite populations [33], [34], [35] and converges to the deterministic problem in the limit of large populations. At each time step of the stochastic simulation, the relative fitness of each population is used to calculate the probability of birth and the probability of death; these probabilities are sampled, and the proportion of individuals in each population is updated accordingly. This process is repeated 10,000 times, and the error bars show one standard deviation around the mean.
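The finite-population procedure described above can be sketched with a Moran-type birth–death process, one standard choice among the stochastic processes cited; the authors' exact update rule may differ. In this sketch, which is our own, the population size, the selection intensity `w`, the background fitness `1 - w`, the starting fraction, and the number of runs are all hypothetical parameters, and the payoffs are the US entries of Table 2.

```python
import random
import statistics

# US payoff entries from Table 2 with a22 = 0; rows are (adopter, hesitant).
A = ((0.02, 0.0191), (0.034, 0.0))


def moran_path(n_pop, n_adopt, steps, w, rng):
    """One birth-death trajectory; returns the adopter fraction after each step."""
    path = []
    for _ in range(steps):
        x = n_adopt / n_pop
        # Frequency-dependent payoffs as in Eqs. (10)-(11).
        pay_a = A[0][0] * x + A[0][1] * (1 - x)
        pay_h = A[1][0] * x + A[1][1] * (1 - x)
        # Assumed fitness map: background 1 - w plus w times payoff (keeps fitness > 0).
        fit_a, fit_h = 1 - w + w * pay_a, 1 - w + w * pay_h
        # Birth: an adopter reproduces with probability proportional to adopter fitness.
        p_birth = n_adopt * fit_a / (n_adopt * fit_a + (n_pop - n_adopt) * fit_h)
        # Death: a uniformly chosen individual is replaced.
        p_death = n_adopt / n_pop
        n_adopt += (rng.random() < p_birth) - (rng.random() < p_death)
        path.append(n_adopt / n_pop)
    return path


def ensemble(runs=30, n_pop=200, x0=0.2, steps=20_000, w=0.9, seed=1):
    """Mean and standard deviation of the final adopter fraction over repeated runs."""
    rng = random.Random(seed)
    finals = [moran_path(n_pop, round(x0 * n_pop), steps, w, rng)[-1] for _ in range(runs)]
    return statistics.mean(finals), statistics.stdev(finals)


mean, sd = ensemble()
print(f"final adopter fraction: {mean:.3f} +/- {sd:.3f}")
```

Larger populations shrink the spread, consistent with convergence to the deterministic model; the paper's 10,000 repetitions would simply correspond to a larger `runs`.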

Table 2.

Model parameters for different population groups.

Population T (days) f'(0) f'(T) a b c a21 a11 a12
US population 88 0.028256504 0.405499161 57.71 1.675 0.0191 0.034 0.02 0.0191
18–39 year olds 106 0.008343649 0.382741771 57.8 1.907 0.018 0.032 0.019 0.018
40–64 year olds 90 0.004432806 0.597362637 70.6 2.077 0.023 0.033 0.023 0.023
65+ year olds 58 0.015256548 1.024397092 81.9 1.971 0.034 0.043 0.035 0.034
Connecticut 86 0.012707894 0.563297 69.6 1.901 0.022 0.032 0.022 0.022
Vermont 89 0.002882054 0.612739997 69.4 2.14 0.024 0.035 0.024 0.024
Idaho 77 0.030170215 0.327559854 42.4 1.608 0.021 0.051 0.022 0.021
West Virginia 63 0.065387774 0.331017921 40.9 1.389 0.022 0.055 0.0231 0.0221

We want to emphasize the importance of the inflection point time $T$ in our approach, which we use to set the basic timescale over which we optimize. An optimization cycle of $T/4$ was chosen to be short enough to allow relatively frequent changes in the incentive schedule if necessary, but long enough for an optimized schedule to have a reasonable impact. Each $T/4$ cycle is optimized individually; longer periods $nT/4$ (where $n$ is an integer) are then optimized sequentially, using the final value of each cycle as the initial value for the next. Table 2 summarizes all of the model parameters we use for the different subpopulations. With these parameters, we develop the reinforcement learning dynamical system.

The reinforcement learning/replicator model

We use the replicator dynamics equations from evolutionary game theory as our reinforcement learning model for vaccine uptake dynamics between the two populations $x_A$ (vaccine adopters) and $x_H$ (vaccine hesitant), where each represents a proportion of the entire population: $\mathbf{x}(t) = (x_1, x_2)^T \equiv (x_A, x_H)^T$, with $x_A + x_H = 1$. The essential feature of replicator dynamics is that people (reinforcement learners) copy others, and successful strategies get replicated more frequently than unsuccessful ones [36], thereby spreading throughout the population. As discussed earlier, initially none of the players is committed to just one way of behaving; each retains several potential ways of behaving simultaneously. Which behavior predominates depends on experiences at the individual level. At the population level, the process operates analogously to biological evolution governed by the replicator dynamical system. This makes the model useful not only where biological evolution by natural selection (due to competition) is prominent [37], since cells and organisms with higher fitness (measured by their ability to replicate) more often pass along their genetic characteristics, but also in any reinforcement learning setting where learners copy successful strategies [38] more often than unsuccessful ones (success begets success and failure spirals downward), with success determined by fitness level. The attractiveness of this framework in the present context is that vaccine uptake has been widely documented to be more common in a positive uptake environment and less common in settings where fewer people choose to get vaccinated.
This dynamic is also the hallmark of a reinforcement learning process: people interact, learn strategies from others, and receive payoffs (in the form of advantage or disadvantage) based on the strategies they adopt; the payoffs determine the fitness of those strategies in the overall population, and the fitness controls each strategy's survival probability.

To formulate the dynamical system, we use:

$$ \dot{x}_A = x_A\,(f_A - \bar{f}), \tag{8} $$
$$ \dot{x}_H = x_H\,(f_H - \bar{f}). \tag{9} $$

Here, $f_A$ and $f_H$ denote the fitness of the vaccine adopter and vaccine hesitant populations, while $\bar{f}$ denotes the average fitness of the entire population under consideration. The system simply says that the growth rate of each subpopulation ($\dot{x}_A/x_A$, $\dot{x}_H/x_H$) is governed by the difference between the fitness of that population and the overall average fitness of both populations. The further a subpopulation's fitness lies above (below) the average, the more positive (negative) the instantaneous growth rate of that strategy in the population. This models gradual evolution (as contrasted, for example, with catastrophism), i.e. behavior changes occur gradually, as is the case with vaccine uptake dynamics. The fitnesses are defined via a 2 × 2 payoff matrix $A$ as:

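$$ f_A = (A\mathbf{x})_1 = a_{11}x_A + a_{12}x_H, \tag{10} $$
$$ f_H = (A\mathbf{x})_2 = a_{21}x_A + a_{22}x_H, \tag{11} $$
$$ \bar{f} = \mathbf{x}^T A\mathbf{x} = x_A f_A + x_H f_H, \tag{12} $$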

with:

$$ A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \tag{13} $$

which defines the evolutionary game being played, as determined by the CDC data. The four entries of this matrix encode the punishment-reward balance (i.e. the payoffs) associated with competition between the two groups and are the heart of the model. As described in [39], the payoffs are determined by many complex factors, including each person's perceived risk of infection (which can vary in time), the severity of the disease (which certainly varies in time, and might be measured by hospitalization rates), the financial costs of vaccination, and the perceived uptake of vaccination by others. Increasing (decreasing) either entry in the top row of $A$ increases (decreases) the fitness of the vaccine adopter population, whereas increasing (decreasing) either entry in the bottom row of $A$ increases (decreases) the fitness of the hesitant population. Without loss of generality, we can choose $a_{22} = 0$, while the remaining three entries (shown in Table 2) can be obtained as functions of $(a,b,c)$ (which were optimally fit to the data) via Eqs. (5)–(7). Thus, all of the complexities associated with the many decision processes that result in the Gompertzian uptake curves are neatly packaged into three of the four entries of the payoff matrix that determines the evolutionary game unfolding between the vaccine adopters and the vaccine hesitant.
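Equations (8)–(12) can be integrated directly. The sketch below is a minimal, dependency-free version, assuming a fixed-step fourth-order Runge–Kutta scheme (our choice, not necessarily the authors' integrator) and the US entries of Table 2 with $a_{22} = 0$; the 1% initial adopter fraction, the time horizon, and the step size are arbitrary illustrative values.

```python
def replicator_rhs(x, A):
    """Right-hand side of Eqs. (8)-(12) for the state x = [x_A, x_H]."""
    f = [A[i][0] * x[0] + A[i][1] * x[1] for i in range(2)]  # f_A, f_H = (Ax)_i
    f_bar = x[0] * f[0] + x[1] * f[1]                          # x^T A x
    return [x[i] * (f[i] - f_bar) for i in range(2)]


def integrate(x0, A, t_end, dt=0.5):
    """Fixed-step classical RK4; returns the state [x_A, x_H] at time t_end (days)."""
    x = list(x0)
    for _ in range(int(round(t_end / dt))):
        k1 = replicator_rhs(x, A)
        k2 = replicator_rhs([x[i] + 0.5 * dt * k1[i] for i in range(2)], A)
        k3 = replicator_rhs([x[i] + 0.5 * dt * k2[i] for i in range(2)], A)
        k4 = replicator_rhs([x[i] + dt * k3[i] for i in range(2)], A)
        x = [x[i] + dt / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) for i in range(2)]
    return x


# Baseline US payoff matrix from Table 2 (rows: adopters, hesitant; a22 = 0).
A_US = [[0.02, 0.0191], [0.034, 0.0]]
x_A, x_H = integrate([0.01, 0.99], A_US, t_end=3000)
print(f"x_A(3000) = {x_A:.4f}, x_A + x_H = {x_A + x_H:.6f}")
```

With these entries the adopter fraction settles near 0.577, close to the fitted asymptote of 57.71% in Table 2, and the conserved sum $x_A + x_H = 1$ serves as a quick sanity check on the integrator.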

It is a non-trivial result of this fitting process that the payoff matrix corresponds to a Hawk–Dove evolutionary game, in which the vaccine adopters are the Doves and the vaccine hesitant are the Hawks. This is based on the inequalities $a_{21} > a_{11} > a_{12} > a_{22} = 0$. A key feature of a Hawk–Dove evolutionary game is the existence of an internal ESS (Nash equilibrium) in $(0,1)$, which we denote by $a$ (the asymptote of $f(t)$). As shown in Fig. 1, for the US population as a whole, $a \approx 0.58$. Fig. 2 shows the data, curve fit, and replicator dynamics model for the two states with the highest vaccine uptake percentages (Vermont and Connecticut, ≈ 69%) and the two with the lowest (West Virginia and Idaho, ≈ 41%). Fig. 2(a) shows the vaccine uptake data together with both the Gompertzian curve fit and the replicator dynamics model for the four states and for the entire US population. In Fig. 2(b) we break the US data into three age groups (18–39, 40–64, 65+), mirroring the way the vaccine rollout prioritized these groups. This is reflected in the leftward shift of the curves for the older groups relative to the younger ones, with the oldest population showing the steepest uptake curve, consistent with the notion that this group was among the most eager to be vaccinated. The inflection point for the US population as a whole is at roughly $T \approx 88$ days, which we take as the benchmark for scaling time when we implement our control strategy on this group. Similarly, for every other subgroup we use the inflection point location associated with that subgroup (see Table 2).
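The link between the payoff entries and the asymptote can be made explicit. Substituting $x_H = 1 - x_A$ into (8)–(12) gives $\dot{x}_A = x_A(1-x_A)\left[(a_{12}-a_{22}) - (a_{12}-a_{22}+a_{21}-a_{11})\,x_A\right]$, so the interior rest point is $x^* = (a_{12}-a_{22})/(a_{12}-a_{22}+a_{21}-a_{11})$, which is stable whenever $a_{21} > a_{11}$ and $a_{12} > a_{22}$ (the defining Hawk–Dove conditions; the chain of inequalities in the text is a stronger ordering). The short check below, our own and not part of the paper, evaluates $x^*$ for each row of Table 2 and compares it with the fitted asymptote $a$.

```python
# Rows of Table 2: (a21, a11, a12, fitted asymptote a in %); a22 = 0 throughout.
TABLE_2 = {
    "US population": (0.034, 0.02, 0.0191, 57.71),
    "18-39 year olds": (0.032, 0.019, 0.018, 57.8),
    "40-64 year olds": (0.033, 0.023, 0.023, 70.6),
    "65+ year olds": (0.043, 0.035, 0.034, 81.9),
    "Connecticut": (0.032, 0.022, 0.022, 69.6),
    "Vermont": (0.035, 0.024, 0.024, 69.4),
    "Idaho": (0.051, 0.022, 0.021, 42.4),
    "West Virginia": (0.055, 0.0231, 0.0221, 40.9),
}


def is_hawk_dove(a21, a11, a12, a22=0.0):
    """Anti-coordination (Hawk-Dove) conditions that guarantee a stable interior ESS."""
    return a21 > a11 and a12 > a22


def interior_ess(a21, a11, a12, a22=0.0):
    """Interior rest point of x' = x(1-x)[(a12-a22) - (a12-a22+a21-a11) x]."""
    return (a12 - a22) / ((a12 - a22) + (a21 - a11))


for name, (a21, a11, a12, a_fit) in TABLE_2.items():
    x_star = interior_ess(a21, a11, a12)
    print(f"{name:16s} Hawk-Dove={is_hawk_dove(a21, a11, a12)}  "
          f"x* = {100 * x_star:5.1f}%  fitted a = {a_fit}%")
```

All eight rows satisfy the Hawk–Dove conditions, and $x^*$ agrees with the fitted asymptote to within about one percentage point; the residual differences plausibly reflect rounding of the tabulated payoff entries.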

Fig. 2.


Vaccine uptake curves for different subgroups. Dots: data; dashed: Gompertzian fit; solid: replicator model. Vertical lines mark inflection points at the maximizers of the derivative curves. (a) United States (dashed black); Connecticut (green); Vermont (purple); West Virginia (red); Idaho (blue). (b) Three age groups: 18–39 (red), 40–64 (blue) and 65+ (purple).

Shifting vaccine uptake curves with time-dependent payoffs

To implement an optimal vaccine incentive strategy, we now consider the time-dependent payoff matrix:

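$$ A(t) = A_0 + A_1(t) = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} + \begin{pmatrix} 0 & u_1(t) \\ u_2(t) & 0 \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} + u_1(t) \\ a_{21} + u_2(t) & a_{22} \end{pmatrix}, \tag{16} $$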

where $A_1(t)$ represents our control, with entries in the off-diagonal terms (without loss of generality), and $A_0$ is the baseline Hawk–Dove payoff matrix obtained from the vaccine uptake data. The time-dependent controllers $\mathbf{u}(t) = (u_1(t), u_2(t)) \in \mathbb{R}^2$ are bounded above and below (based on an incentive size parameter $p$):

$$ -\frac{a\,p}{a_{12}+a_{21}} \le u_1(t),\, u_2(t) \le \frac{a\,p}{a_{12}+a_{21}} \tag{17} $$

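($0 \le p \le 1$), and a global constraint on the incentive schedule is enforced through $\mathbf{U}(t) = \int_0^t \mathbf{u}(\tau)\,d\tau$, whose final value $\mathbf{U}(t_f)$ is held fixed (see (27)); all of these play a role in determining the detailed outcome of the optimization procedure.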
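For constant (rather than scheduled) controls, the effect of (16) on the Nash equilibrium can be written in closed form. With $a_{22} = 0$, the controlled game keeps a stable interior ESS at $x^*(\mathbf{u}) = (a_{12}+u_1)/\left[(a_{12}+u_1) + (a_{21}+u_2-a_{11})\right]$ as long as $a_{12}+u_1 > 0$ and $a_{21}+u_2 > a_{11}$. The sketch below is our own illustration, not the paper's optimization; the incentive size `delta = 0.002` is a hypothetical value chosen without reference to the bound in (17).

```python
def controlled_ess(a21, a11, a12, u1=0.0, u2=0.0):
    """Stable interior ESS of the controlled game (16) with a22 = 0 and constant u1, u2.

    Returns None when the controlled matrix is no longer Hawk-Dove, i.e. when one
    strategy dominates and the dynamics run to a boundary instead.
    """
    gain_a = a12 + u1          # adopters' advantage when they are rare (x_A -> 0)
    gain_h = a21 + u2 - a11    # hesitant's advantage when they are rare (x_A -> 1)
    if gain_a <= 0 or gain_h <= 0:
        return None
    return gain_a / (gain_a + gain_h)


# US entries from Table 2; delta is a hypothetical, constant incentive size.
a21, a11, a12 = 0.034, 0.02, 0.0191
delta = 0.002
baseline = controlled_ess(a21, a11, a12)
# Reward adopters (u1 = +delta) and penalize the hesitant (u2 = -delta) together.
shifted = controlled_ess(a21, a11, a12, u1=delta, u2=-delta)
print(f"baseline ESS {baseline:.3f} -> incentivized ESS {shifted:.3f}")
```

Because choosing $u_1 = -u_2$ leaves the denominator unchanged, in this constant-control picture the ESS rises linearly with the incentive size until $a_{21} + u_2$ falls to $a_{11}$, at which point the controlled game stops being Hawk–Dove and vaccination dominates.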

3. Methods

To implement the Pontryagin maximum (minimum) principle with boundary value constraints in order to compute upper (maximum principle) and lower (minimum principle) bounds, we follow standard methods [40] and denote:

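$$ X = [\mathbf{x}(t), \mathbf{U}(t)]^T, \quad X \in \mathbb{R}^4, \tag{18} $$
$$ \dot{X} = F(X) = [\dot{\mathbf{x}}, \dot{\mathbf{U}}(t)]^T, \quad F: \mathbb{R}^4 \to \mathbb{R}^4, \tag{19} $$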

where we would like to minimize or maximize a mathematical cost function:

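$$ J[\mathbf{x}(\cdot), \mathbf{u}(\cdot), t_0, t_f] = \int_{t_0}^{t_f} L(\mathbf{x}(t), \mathbf{u}(t), t)\,dt + \varphi[\mathbf{x}(t_0), t_0, \mathbf{x}(t_f), t_f] \tag{20} $$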

over times from $t_0$ to $t_f$. The first term on the right is called the running cost, and the second the endpoint cost. Since we optimize only the endpoint cost, $\varphi[\mathbf{x}(t_0), t_0, \mathbf{x}(t_f), t_f] = x_1(t_f) = x_A(t_f)$ (i.e. the asymptotic vaccine acceptance value), we take $L = 0$ (a Mayer problem [40], a formulation developed in the context of missile guidance problems, where the final distance from the target is minimized). We briefly describe the basic framework and refer readers to [40] for more details on implementing the approach. In particular, we construct the control theory Hamiltonian:

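$$ H(\mathbf{x}(t), \mathbf{U}(t), \boldsymbol{\lambda}, \mathbf{u}(t)) = \boldsymbol{\lambda}^T F(\mathbf{x}) + L(\mathbf{x}, \mathbf{u}(t), t) \tag{21} $$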

where $\boldsymbol{\lambda} = [\lambda_1, \lambda_2, \mu_1, \mu_2]^T$ are the co-state functions (i.e. momenta) associated with $\mathbf{x}$ and $\mathbf{U}$, respectively. Assuming that $\mathbf{u}^*(t)$ is the optimal control for this problem, with corresponding trajectory $\mathbf{x}^*(t)$, $\mathbf{U}^*(t)$, the canonical equations are:

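$$ \dot{x}_i^*(t) = \frac{\partial H}{\partial \lambda_i}, \tag{22} $$
$$ \dot{U}_i^*(t) = \frac{\partial H}{\partial \mu_i}, \tag{23} $$
$$ \dot{\lambda}_i(t) = -\frac{\partial H}{\partial x_i}, \tag{24} $$
$$ \dot{\mu}_i(t) = -\frac{\partial H}{\partial U_i}, \tag{25} $$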

where $i = 1, 2$. The corresponding boundary conditions are:

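$$ \mathbf{x}(t_0) = \mathbf{x}_0, \tag{26} $$
$$ \mathbf{U}(t_0) = 0, \quad \mathbf{U}(t_f) = \mathbf{U}_{t_f}, \tag{27} $$
$$ \lambda_i(t_f) = \frac{\partial \varphi(\mathbf{x}(t_f))}{\partial x_i(t_f)}. \tag{28} $$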

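Then, at any point in time, the optimal control $\mathbf{u}^*(t)$ minimizes the control theory Hamiltonian:

$$ \mathbf{u}^*(t) = \arg\min_{\mathbf{u}(t)} H(\mathbf{x}^*(t), \mathbf{U}^*(t), \boldsymbol{\lambda}^*(t), \mathbf{u}(t)). \tag{29} $$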

The optimization problem becomes a two-point boundary value problem (using (26)–(28)) with unknowns $(\lambda_2(t_0), x_2(t_f))$, whose solution gives the optimal trajectory $\mathbf{x}^*(t)$ (from (22)) and the corresponding control $\mathbf{u}^*(t)$ that produces it, as shown, for example, in Fig. 3. In practice, we solve the optimization problem numerically using the dynamic optimization software GEKKO, whose documentation can be found at https://gekko.readthedocs.io/.
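The bang–bang structure of the optimal schedules can be anticipated from the Hamiltonian. Below is our own sketch, not a derivation from the paper, for the reduced one-dimensional form of the dynamics obtained with $x \equiv x_A = 1 - x_H$ and $a_{22} = 0$. It assumes symmetric box bounds $|u_i| \le \bar{u}$ (the symbol $\bar{u}$ is ours, standing for the right-hand bound in (17)) and keeps the integral constraint through $\dot{U}_i = u_i$.

```latex
\begin{aligned}
\dot{x} &= x(1-x)\big[g(x) + u_1(1-x) - u_2\,x\big],
\qquad g(x) = a_{12} - (a_{12}+a_{21}-a_{11})\,x, \qquad \dot{U}_i = u_i,\\
H &= \lambda\,\dot{x} + \mu_1 u_1 + \mu_2 u_2
   = \lambda\,x(1-x)\,g(x) + \sigma_1(t)\,u_1 + \sigma_2(t)\,u_2,\\
\sigma_1 &= \lambda\,x(1-x)^2 + \mu_1, \qquad
\sigma_2 = -\lambda\,x^2(1-x) + \mu_2, \qquad
\dot{\mu}_i = -\frac{\partial H}{\partial U_i} = 0,\\
u_i^*(t) &= -\bar{u}\,\operatorname{sign}\sigma_i(t)
\quad \text{wherever } \sigma_i(t) \neq 0 .
\end{aligned}
```

Since $H$ is linear in $\mathbf{u}$, its minimum over the box is attained at a corner, so the controls sit at their bounds and switch where a switching function $\sigma_i$ changes sign; intervals on which some $\sigma_i$ vanishes identically (singular arcs) would need separate treatment. In this reduced sketch the costate obeys the linear equation $\dot{\lambda} = -\lambda\,\partial\dot{x}/\partial x$, so $\lambda$ never changes sign; if the integral constraint were dropped ($\mu_i = 0$), each $\sigma_i$ would keep a fixed sign and the optimal control would not switch at all.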

Fig. 3.


Optimal schedules (top horizontal bars indicating when incentives are on) and response curves (underneath) associated with the US population. The black curve is the uncontrolled subpopulation, the blue curve the maximized population, and the red curve the minimized population. In this case, the incentive schedules start at the inflection point of the uptake curve, roughly $T \approx 88$ days, when 18% of the population is vaccinated. The dependent variables in our system are optimized through one fundamental cycle of length $T/4 \approx 22$ days using only an endpoint cost (Mayer problem). Dark blue/red corresponds to the $u_1$ schedule and light blue/red to the $u_2$ schedule in Eq. (16).

4. Results

As a first important example, Fig. 3 shows an optimized result using the US population curve as a baseline, with the optimized schedules for the controllers $\mathbf{u}(t) = (u_1(t), u_2(t))$ shown as horizontal bars at the top, and the maximized (blue) and minimized (red) values of the % vaccinated at the end of one cycle $T/4$. The optimized incentive schedule, which begins at the inflection point of the Gompertzian uptake curve, pushes the vaccinated population up to 38%, compared with the 29% that would have occurred naturally in the absence of incentives. The control schedule is bang–bang (off/on), with the bars showing when each controller is on.
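The paper solves this problem with GEKKO. As a dependency-free stand-in (a different and much cruder technique), one can brute-force the bang–bang structure: split the $T/4 \approx 22$-day cycle into a few equal legs, set each control to $\pm\bar{u}$ on each leg, simulate, and keep the schedule with the largest (or smallest) final adopter fraction. Everything here is an illustrative assumption: the number of legs, the bound `u_bar = 0.002`, the starting fraction 0.18 taken from the Fig. 3 caption, and the optional `zero_net` flag, which imposes $\mathbf{U}(t_f) = 0$ as one hypothetical choice of the fixed total in (27).

```python
import itertools

# US payoff entries from Table 2 (a22 = 0).
A21, A11, A12 = 0.034, 0.02, 0.0191


def rate(x, u1, u2):
    """Reduced controlled dynamics (16): x' = x(1-x)(f_A - f_H) with x = x_A."""
    return x * (1 - x) * (A12 - (A12 + A21 - A11) * x + u1 * (1 - x) - u2 * x)


def simulate(x0, legs, leg_len, dt=0.1):
    """RK4 through consecutive legs, holding (u1, u2) constant on each leg."""
    x = x0
    for u1, u2 in legs:
        for _ in range(int(round(leg_len / dt))):
            k1 = rate(x, u1, u2)
            k2 = rate(x + 0.5 * dt * k1, u1, u2)
            k3 = rate(x + 0.5 * dt * k2, u1, u2)
            k4 = rate(x + dt * k3, u1, u2)
            x += dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return x


def best_bang_bang(x0, cycle, u_bar, n_legs=4, maximize=True, zero_net=False):
    """Exhaustive search over sign patterns; returns (x_A at end of cycle, pattern)."""
    corners = list(itertools.product((-1, 1), repeat=2))  # (sign of u1, sign of u2)
    best = None
    for pattern in itertools.product(corners, repeat=n_legs):
        if zero_net and any(sum(p[i] for p in pattern) != 0 for i in range(2)):
            continue  # hypothetical U(t_f) = 0: each control positive on half the legs
        legs = [(s1 * u_bar, s2 * u_bar) for s1, s2 in pattern]
        x_end = simulate(x0, legs, cycle / n_legs)
        if best is None or (x_end > best[0] if maximize else x_end < best[0]):
            best = (x_end, pattern)
    return best


x_max, up = best_bang_bang(0.18, 22.0, 0.002)
x_min, down = best_bang_bang(0.18, 22.0, 0.002, maximize=False)
x_free = simulate(0.18, [(0.0, 0.0)] * 4, 22.0 / 4)
print(f"min {x_min:.4f} < uncontrolled {x_free:.4f} < max {x_max:.4f}")
print("maximizing sign pattern:", up)
```

Without `zero_net`, the search returns the constant schedule (reward adopters and penalize the hesitant on every leg), because in this reduced model $\dot{x}_A$ increases with $u_1$ and decreases with $u_2$; switching patterns like those in Fig. 3 can only appear here once a constraint on $\mathbf{U}$ is imposed.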

Associated with the schedules and responses shown in Fig. 3, Fig. 4 gives a complementary and useful interpretation of our method in terms of the phase portraits ($\dot{x}_1$ vs. $x_1$) corresponding to the optimized trajectories through the four separate time regions defined by the off/on schedules in Fig. 3. Figs. 4(a)–(d) show the maximized dynamics in the four regions, along with the name of the evolutionary game defined by the controlled payoff matrix in each regime. On the first leg (Fig. 4(a)), the subpopulation $x_1$ actually decreases while a Prisoner's dilemma game governs the dynamics. In the next three legs (Fig. 4(b), (c), (d)), however, the games change (Leader, Deadlock, Game #10), and the endpoint (shown by the blue curve in Fig. 3) is pushed up to its maximally achievable value of 37%. In Figs. 4(e)–(h) we show the corresponding four legs of the minimization procedure from Fig. 3. Here, the evolutionary games cycle in the reverse order of the maximization procedure (Game #10, Deadlock, Leader, Prisoner's dilemma), pushing the vaccinated population down to its lowest value of 23% at the end of the optimization cycle. Note that in all cases the exact switching times from one game to another are determined by the optimization procedure and depend on the starting value $x_1(t_0)$. We also mention a recent paper by Tanimoto [41] that discusses how scaling the social dilemma strengths affects the resulting phase plane dynamics, which is related to this interpretation.

Fig. 4.


Phase portraits associated with the US population, with incentive schedules starting at the inflection point of the uptake curve as shown in Fig. 3. In each of the four legs of the cycle (see Fig. 3), a different evolutionary game is played, driving the system along an optimal path. The dynamics along the horizontal $x_1$ axis proceed from the initial open circle to the closed one. (a) Phase diagram of the vaccinated population through the first leg of the maximizing schedule (Prisoner's dilemma game); (b) second leg of the maximizing schedule (Leader game); (c) third leg of the maximizing schedule (Deadlock game); (d) fourth leg of the maximizing schedule (Game No. 10); (e) first leg of the minimizing schedule (Game No. 10); (f) second leg of the minimizing schedule (Deadlock game); (g) third leg of the minimizing schedule (Leader game); (h) fourth leg of the minimizing schedule (Prisoner's dilemma).

We now use our optimization method to answer several specific questions that give insight into how well an optimized incentive vaccine rollout program can perform.

State-by-state results

The first question we address using our optimized incentive model is whether it is possible to incentivize the states with low vaccine uptake (West Virginia and Idaho) enough to bring them up to the level of the states with high uptake (Vermont and Connecticut). Fig. 5(a) shows the results of our simulations for Idaho. With relatively large incentive sizes, roughly 15%–20% (normalized by the baseline value), we show that this is possible. However, we consider this range of incentive sizes so large that the cost of implementing them might be prohibitive. Fig. 5(b) shows our simulations for West Virginia, with the same general conclusions as for Idaho: incentives this large can have a considerable effect, but they would be expensive to implement. At this point, our cost function does not include a running penalty that would account for incentive size.

Fig. 5.

Maximizing and minimizing vaccination percentages for the states with the lowest vaccination rates, West Virginia and Idaho, in comparison with the states with the highest vaccination rates, Vermont (purple) and Connecticut (green). (a) Solid blue curve depicts Idaho’s natural vaccine uptake curve. Dashed curves show the optimized Idaho model with upper and lower bounds using 5%, 10%, 15%, and 20% incentive sizes (normalized by the baseline value); (b) Solid red curve depicts West Virginia’s natural vaccine uptake curve. Red dashed curves show the optimized West Virginia model with upper and lower bounds using 5%, 10%, 15%, and 20% incentive sizes (normalized by the baseline value).

Optimal timing

We next address whether the start time of our optimal incentive schedule has much impact on the end result. The short answer is no, as shown in Fig. 6 for the (a) Vermont, (b) Connecticut, (c) Idaho, and (d) West Virginia populations. In all cases, the incentivized curves (dashed) approach the same absolute maximum/minimum asymptotes no matter when the schedules begin. This indicates that we could begin the schedules at the inflection point of the uptake curves. That would allow us to collect data and develop the model in real time as the uptake dynamics unfold, and then to design the optimal incentive schedules to use going forward. The one caveat is that, although the curves all reach the same asymptote, if time is of the essence (say, because of high death rates in the unvaccinated population), there could well be advantages to starting the incentive schedules as early as possible. Designing optimal schedules in real time before the inflection point of the uptake curve is reached would require a separate, careful forecasting model based only on earlier data.
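
Start-time insensitivity can be illustrated with a minimal sketch. The sketch assumes a strongly simplified model rather than the paper's optimal schedules. It uses the standard two-strategy replicator equation dx/dt = x(1−x)[σ1 − (σ1+σ2)x + u(t)] with a constant reward u(t) = δ that is switched on at time t_on and left on. The payoff gaps σ1 and σ2, the reward δ, the cycle time T, the initial fraction, and the function names are all hypothetical. The shifted interior equilibrium attracts every trajectory in (0, 1). Every run should therefore end near (σ1+δ)/(σ1+σ2) = 0.68 regardless of t_on. The value reached by t = T can only be equal or lower for later starts.

```python
def replicator_rhs(x: float, sigma1: float, sigma2: float, u: float) -> float:
    """Two-strategy replicator equation with an additive reward u paid to adopters
    (hypothetical form): dx/dt = x (1 - x) (sigma1 - (sigma1 + sigma2) x + u)."""
    return x * (1.0 - x) * (sigma1 - (sigma1 + sigma2) * x + u)


def simulate(sigma1: float, sigma2: float, delta: float, t_on: float,
             t_end: float, x0: float = 0.05, dt: float = 0.01) -> float:
    """Forward-Euler integration from x(0) = x0 up to t_end. The constant reward
    delta is switched on at t_on and left on. Returns x(t_end)."""
    x = x0
    for k in range(int(round(t_end / dt))):
        u = delta if k * dt >= t_on else 0.0
        x += dt * replicator_rhs(x, sigma1, sigma2, u)
    return x


if __name__ == "__main__":
    SIGMA1, SIGMA2, DELTA = 0.58, 0.42, 0.10  # hypothetical payoff gaps and reward
    T = 100.0                                 # hypothetical cycle time
    for n in range(4):                        # staggered start times at multiples of T/4
        t_on = n * T / 4
        x_T = simulate(SIGMA1, SIGMA2, DELTA, t_on, t_end=T)
        x_final = simulate(SIGMA1, SIGMA2, DELTA, t_on, t_end=4 * T)
        print(f"t_on = {t_on:5.1f}   x(T) = {x_T:.4f}   x(4T) = {x_final:.4f}")
```

Forward Euler with a small step keeps the sketch dependency-free. An adaptive integrator such as `scipy.integrate.solve_ivp` would be the usual choice for real work.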

Fig. 6.

Maximizing and minimizing vaccination uptake with controllers turning on at different times. All plots show incentive schedules beginning at t0 = nT/4 for increasing n. In all cases, the incentivized schedules reach upper (blue) and lower (red) asymptotes, indicating relative insensitivity of the optimal outcome to start times. (a) Vermont unincentivized uptake curve (black), T = 88 days. Blue shows optimized upper bounds using incentive schedules, red shows lower bounds using incentive schedules; (b) Connecticut unincentivized uptake curve (black), T = 86 days. Blue shows optimized upper bounds using incentive schedules, red shows lower bounds using incentive schedules; (c) Idaho unincentivized uptake curve (black), T = 77 days. Blue shows optimized upper bounds using incentive schedules, red shows lower bounds using incentive schedules; (d) West Virginia unincentivized uptake curve (black), T = 63 days. Blue shows optimized upper bounds using incentive schedules, red shows lower bounds using incentive schedules.

In Fig. 7 we highlight both the incentive schedules and the optimized results for three different start times. Fig. 7(a) marks the three time windows with black rectangles: before, at, and after the inflection point. The corresponding optimized schedules and responses are shown in Figs. 7(b), (c), and (d). The plots show that incentive schedules are generally more effective at pushing the final value up in the later time windows, which is somewhat counter-intuitive. We interpret this to mean that incentives are most effective when rolled out to an unvaccinated group that makes up a smaller part of a larger population with a relatively high percentage of vaccinated individuals. In other words, the unvaccinated are more receptive to incentives when they are surrounded by people who have already been vaccinated. This is what the model predicts, but whether it holds in an actual vaccine rollout would need to be tested, as there may well be other factors at play not considered in our model.

Fig. 7.

Optimal schedules and optimal responses for early, medium, and late times. Blue schedules and curves maximize the % vaccinated at the end of one cycle time. Red schedules and curves minimize the % vaccinated at the end of one cycle time. Dark blue/red color corresponds to the u1 schedule, light blue/red color corresponds to the u2 schedule. Black curve is the uncontrolled response curve. (a) Three black rectangles show early, medium, and late time regions over which we introduce the first optimization cycle; (b) Max and min schedules and responses for early time window; (c) Max and min schedules and responses for medium time window at the Gompertzian inflection point; (d) Max and min schedules and responses for late time window.

Incentive-response curves

We now ask which incentive size leads to the best response. Fig. 8 shows the percent shift in the asymptote (US population) for different incentive sizes, for both upper (blue) and lower (red) bounds. With no incentive (0%), the asymptote remains at 58% of the population, as expected. In general, the larger the incentive, the larger the response.
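
A toy incentive-response table can be built under a strongly simplified assumption. The assumption is a constant reward δ (negative for a penalty) added to the adopters' payoff in the two-strategy replicator equation dx/dt = x(1−x)[σ1 − (σ1+σ2)x + δ]. Under that assumption, the asymptote is the interior equilibrium (σ1+δ)/(σ1+σ2), clipped to [0, 1]. The payoff gaps σ1 and σ2, the reward values, and the function names are hypothetical. δ is in payoff units, not the paper's "% of baseline". This toy response is linear until it saturates. It therefore reproduces "larger incentive, larger response" but not the Gompertz-shaped curve that the paper obtains from its optimally controlled model.

```python
def asymptote(sigma1: float, sigma2: float, delta: float) -> float:
    """Stable rest point of x' = x(1-x)(sigma1 - (sigma1+sigma2)x + delta), sigma1, sigma2 > 0.

    For -sigma1 < delta < sigma2 this is the interior equilibrium. Outside that range
    the interior point leaves (0, 1) and trajectories go to x = 1 (full adoption)
    or to x = 0, which the clipping reproduces.
    """
    x_star = (sigma1 + delta) / (sigma1 + sigma2)
    return min(max(x_star, 0.0), 1.0)


def incentive_response(sigma1: float, sigma2: float, rewards: list) -> list:
    """(reward, shift of the asymptote in percentage points) relative to no incentive."""
    base = asymptote(sigma1, sigma2, 0.0)
    return [(d, 100.0 * (asymptote(sigma1, sigma2, d) - base)) for d in rewards]


if __name__ == "__main__":
    # Hypothetical payoff gaps chosen so the unincentivized asymptote is 0.58.
    for delta, shift in incentive_response(0.58, 0.42, [-0.10, 0.0, 0.05, 0.10, 0.20, 0.50]):
        print(f"reward {delta:+.2f} -> asymptote shift {shift:+6.1f} points")
```

Negative rewards play the role of the minimizing (red) bounds. The largest reward in the sweep shows the saturation at full adoption.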

Fig. 8.

Absolute maximizers (blue) and minimizers (red) for a range of incentive intensities using the US population model. Cycle times T range from 44 days to 352 days.

Notice that for higher percentage incentives the curves rise much faster than for lower ones, which means the optimized results are achieved more quickly. It is also important to point out that even for lower incentives, the upper asymptote is eventually reached. The bottom line is that, as would be expected, the smaller the incentive, the longer it appears to take to achieve the desired result.

Diminishing returns

Is there a point of diminishing returns, beyond which larger incentives buy progressively less additional response? Fig. 9 shows an incentive-response curve for the US population. Our model produces a curve (data points fit to a three-parameter Gompertzian curve) relating the incentive strength (abscissa) to the change in asymptote (ordinate). For incentive strengths below 11%, the curve is concave up: each additional increment of incentive produces a larger gain in response than the previous one. Above 11%, the curve is concave down: the response still increases with incentive size, but each additional increment produces a smaller gain. We can therefore think of this threshold value (11%), the inflection point of the fitted curve, as a point of diminishing returns. This is in many ways analogous to dose–response curves in chemotherapy settings [10], where past a threshold, increasing the dose further yields a diminishing marginal response. Our model thus predicts an optimal incentive size of roughly 11%. We place more value on showing that such a threshold exists in our model than on the actual threshold value, which is difficult to pin down accurately without more detailed analysis.
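
The switch from concave up to concave down can be located analytically under one assumption. The assumption is the common three-parameter parametrization R(s) = A exp(−B e^{−Cs}), with incentive strength s and constants A, B, C > 0. The fitted values behind Fig. 9 are not given here.

```latex
\begin{aligned}
R(s) &= A\,e^{-B e^{-Cs}},\\
R'(s) &= B C\, e^{-Cs}\, R(s) \;>\; 0,\\
R''(s) &= B C^2\, e^{-Cs}\,\bigl(B e^{-Cs} - 1\bigr)\, R(s),\\
R''(s^*) = 0 \;&\Longleftrightarrow\; s^* = \frac{\ln B}{C}, \qquad R(s^*) = \frac{A}{e}, \qquad R'(s^*) = \frac{AC}{e}.
\end{aligned}
```

B e^{−Cs} − 1 is positive for s < s* and negative for s > s*, so R'' changes sign exactly once. The same point s* is where the first derivative plotted in Fig. 9 peaks. Under this parametrization, the roughly 11% threshold reported above corresponds to s* = ln B / C. A threshold at positive incentive strength requires B > 1.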

Fig. 9.

Incentive-response curve fit to data. Abscissa indicates the incentive size (measured as % of baseline). Ordinate shows the shift in the asymptote (measured as % of baseline). Also shown is the first derivative of the response curve. Response curve is concave up for incentive strengths below 11%, and concave down for larger incentives indicating diminishing returns in terms of response.

5. Discussion

Although it is probably unrealistic to assume that optimality can be achieved in practice, optimal control nonetheless gives clear upper and lower bounds on what is theoretically possible in an ideal setting. There are, however, several tangible ways the model could be improved. First, we make the simplifying assumption that responses to incentives are instantaneous. Building in finite-time responses (i.e. time delays) would make the model more realistic. Second, the hesitant population could be further subdivided into groups, such as hesitant but willing and hesitant and unwilling, with incentives influencing each group differently. This would lead to a higher-dimensional model with more complexity but perhaps higher fidelity. Third, the model assumes a well-mixed population (i.e. no spatial structure). A spatially dependent model would be significantly more complex but would allow incentives to be targeted geographically. Finally, our model does not consider the psychological aspects of how people, states, or groups respond to different incentives. Matching the size of our controllers to actual incentives/punishments would best be handled by experts in human psychology and is not addressed in our approach. Additionally, the size of the incentives (in terms of financial cost) is not part of the cost function in our optimization procedure. Adding a running penalty term that accounts for incentive size to our cost function (20) may well be a desirable future improvement.
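
One standard way to add the running penalty mentioned above is a Bolza-type cost. This sketch assumes that the existing objective is the terminal vaccinated fraction x1 at the end of a cycle of length T starting at t0, as in the maximizing schedules of Fig. 7. The weights λ1 and λ2 are hypothetical, and this is not the paper's equation (20).

```latex
\min_{u_1,\,u_2}\; J[u_1,u_2] \;=\; -\,x_1(t_0+T) \;+\; \int_{t_0}^{t_0+T} \Bigl(\lambda_1\,u_1(t)^2 + \lambda_2\,u_2(t)^2\Bigr)\,dt,
\qquad \lambda_1,\lambda_2 \ge 0 .
```

Setting λ1 = λ2 = 0 recovers a pure terminal objective. For the minimizing schedules, the sign of the terminal term flips. A quadratic penalty generally smooths the optimal controls. A linear penalty λi|ui| instead tends to preserve schedules that switch between fixed levels, so the choice of penalty is itself a modeling decision.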

We would like to emphasize two strengths of our model. First, only data up to the inflection point is needed. Second, starting the incentives after that point ultimately leads to the same shift in the asymptotic percentage of vaccinated people as starting them earlier. It is not clear a priori whether nationwide, statewide, or more localized data is most useful. Models that use more localized information (at least statewide) would probably be more useful, since different regions of the country are likely to respond differently to different kinds of incentives.

6. Conclusion

Every vaccine rollout associated with each new epidemic will have its own natural uptake curve. That curve depends essentially on the complex interactions between the vaccine adopter and vaccine hesitant populations, as well as the interactions within each group, all of which are encoded as elements of the payoff matrix determined by the data. It is not unreasonable to speculate, however, that all such curves follow the general form of a three-parameter Gompertzian. The parameters would differ from rollout to rollout and with geographic location (targeted population), but the universal form would be the same. This general form arises because there are, generally speaking, early adopters, followed by a population of players who decide to adopt as time proceeds, leaving only the most hesitant towards the later stages of a rollout. As a vaccine rollout unfolds, the key parameters to obtain from the vaccine uptake curve are: (i) the initial rate of uptake (which we write as % of the relevant population per day), (ii) the location of the inflection point on the uptake curve (i.e. when uptake begins to slow down), and (iii) the slope of the tangent line at the inflection point (the peak rate of uptake). As long as reliable data is available up to the inflection point, the reinforcement learning model described here can be developed and calibrated in real time. When the uptake rate begins to slow (i.e. at or near the inflection point), the controlled replicator dynamical system can be used to optimize vaccine incentive schedules going forward, and likely responses can be predicted from the incentive-response (dose–response) curves produced by the model. A recent review paper [39] has highlighted the importance of, and need for, game theory and mathematical models in designing vaccine policy. We enthusiastically endorse this view and feel that such models are an under-utilized tool for science-based decision making during an infectious outbreak.
The framework developed here, which allows the mathematical models to be tailored to specific settings, offers the possibility of testing different strategies in real time for many different scenarios and is flexible, generalizable, relatively simple, and potentially actionable.
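
The three rollout quantities (i)–(iii) listed above can be read off analytically from a Gompertz uptake curve. The sketch below assumes the common parametrization y(t) = a·exp(−b·e^{−ct}) with a, b, c > 0, y in % of the population, and t in days. The parameter values and function names are hypothetical and are not fitted to CDC data.

```python
import math


def gompertz(t: float, a: float, b: float, c: float) -> float:
    """Three-parameter Gompertz curve y(t) = a * exp(-b * exp(-c * t)), with a, b, c > 0."""
    return a * math.exp(-b * math.exp(-c * t))


def gompertz_rate(t: float, a: float, b: float, c: float) -> float:
    """Analytic uptake rate y'(t) = b * c * exp(-c * t) * y(t)."""
    return b * c * math.exp(-c * t) * gompertz(t, a, b, c)


def rollout_summary(a: float, b: float, c: float) -> dict:
    """Quantities (i)-(iii) from the text for a Gompertz uptake curve.

    Requires b > 1 so that the inflection point lies at t > 0.
    """
    if b <= 1.0:
        raise ValueError("inflection point is not at t > 0 when b <= 1")
    t_star = math.log(b) / c
    return {
        "initial_rate": gompertz_rate(0.0, a, b, c),           # (i)   = a*b*c*exp(-b)
        "t_inflection": t_star,                                # (ii)  = ln(b)/c
        "y_inflection": gompertz(t_star, a, b, c),             #       = a/e
        "slope_at_inflection": gompertz_rate(t_star, a, b, c), # (iii) = a*c/e, the peak rate
    }


if __name__ == "__main__":
    # Hypothetical parameters: asymptote 60% of the population, time in days.
    summary = rollout_summary(a=60.0, b=3.0, c=0.02)
    for key, value in summary.items():
        print(f"{key:>20s}: {value:8.3f}")
```

The analytic forms are used instead of a numerical fit so that the link between the three quantities and the Gompertz parameters stays explicit. In practice a, b, and c would first be estimated from uptake data up to the inflection point, for example with `scipy.optimize.curve_fit`.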

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

We gratefully acknowledge support from the Army Research Office, United States of America MURI Award # W911NF1910269 (2019–2024). All authors approved the version of the manuscript to be published.

Communicated by V.M. Perez-Garcia

Data availability

Data is publicly available and links are provided.

References

1. MacDonald N. Vaccine hesitancy: Definition, scope and determinants. Vaccine. 2015;33:4161–4164. doi: 10.1016/j.vaccine.2015.04.036.
2. Quinn S., Jamison A., An J., Hancock G., Freimuth V. Measuring vaccine hesitancy, confidence, trust and flu vaccine uptake: Results of a national survey of white and African American adults. Vaccine. 2019;37:1168–1173. doi: 10.1016/j.vaccine.2019.01.033.
3. Smith J., Lipsitch M., Almond J. Vaccine production, distribution, access, and uptake. Lancet. 2011;378:428–438. doi: 10.1016/S0140-6736(11)60478-9.
4. Eskola J., Duclos P., Schuster M., MacDonald N. How to deal with vaccine hesitancy? Vaccine. 2015;33:4215–4217. doi: 10.1016/j.vaccine.2015.04.043.
5. Higgins S., Klemperer E., Coleman S. Looking to the empirical literature on the potential for financial incentives to enhance adherence with COVID-19 vaccination. Preventive Med. 2021;145. doi: 10.1016/j.ypmed.2021.106421.
6. Acharya B., Dhakal C. Implementation of state vaccine incentive lottery programs and uptake of COVID-19 vaccinations in the United States. JAMA Network Open. 2021;4(12). doi: 10.1001/jamanetworkopen.2021.38238.
7. McArdle M. Harsher and harsher punishments are not the way to convince the unvaccinated. Washington Post. 2021 Dec 15.
8. Jena A., Worsham C. Facts alone aren’t going to win over the unvaccinated. This might. New York Times. 2021 Dec 21.
9. Thorp H. Colleges need vaccine mandates. Science. 2021;373(6553):369. doi: 10.1126/science.abl4884.
10. Martin J., Dimmitt S. The rationale of dose-response curves in selecting cancer drug dosing. Br. J. Clin. Pharmacol. 2019;85:2198–2204. doi: 10.1111/bcp.13979.
11. Korn L., Böhm R., Meier N., Betsch C. Vaccination as a social contract. Proc. Natl. Acad. Sci. 2020;117(26):14890–14899. doi: 10.1073/pnas.1919666117.
12. Bauch C., Galvani A., Earn D. Group interest versus self-interest in smallpox vaccination policy. Proc. Natl. Acad. Sci. 2003;100(18):10564–10567. doi: 10.1073/pnas.1731324100.
13. Bauch C., Earn D. Vaccination and the theory of games. Proc. Natl. Acad. Sci. 2004;101(36):13391–13394. doi: 10.1073/pnas.0403823101.
14. Bhattacharyya S., Bauch C. Wait and see vaccinating behavior during a pandemic: A game theoretic analysis. Vaccine. 2011;29:5519–5525. doi: 10.1016/j.vaccine.2011.05.028.
15. Bauch C., Bhattacharyya S. Evolutionary game theory and social learning can determine how vaccine scares unfold. PLOS Comp. Bio. 2012;8(4). doi: 10.1371/journal.pcbi.1002452.
16. Bauch C. Imitation dynamics predict vaccinating behavior. Proc. Roy. Soc. B. 2005;272:1669–1675. doi: 10.1098/rspb.2005.3153.
17. Tanimoto J. Springer; 2021. Sociophysics Approach to Epidemics.
18. Wang Z., Bauch C., Bhattacharyya S., Zhao D. Statistical physics of vaccination. Phys. Rep. 2016;664:1–113.
19. Ma Y., Newton P. Role of synergy and antagonism in designing multi drug adaptive chemotherapy schedules. Phys. Rev. E. 2021;103. doi: 10.1103/PhysRevE.103.032408.
20. West J., You L., Zhang J., Gatenby R., Brown J., Newton P., Anderson A. Towards multi-drug adaptive therapy. Cancer Res. 2020;80(7):1578–1589. doi: 10.1158/0008-5472.CAN-19-2669.
21. Newton P., Ma Y. Nonlinear adaptive control of competitive release and chemotherapeutic resistance. Phys. Rev. E. 2019;99. doi: 10.1103/PhysRevE.99.022404.
22. West J., Ma Y., Newton P. Capitalizing on competition: An evolutionary model of competitive release in metastatic castration resistant prostate cancer treatment. J. Theor. Bio. 2018;455:249–260. doi: 10.1016/j.jtbi.2018.07.028.
23. Newton P., Ma Y. Maximizing cooperation in the prisoner’s dilemma evolutionary game via optimal control. Phys. Rev. E. 2021;103(1). doi: 10.1103/PhysRevE.103.012304.
24. Stuckey K., Dua R., Ma Y., Parker J., Newton P. Optimal dynamic incentive scheduling for Hawk-Dove evolutionary games. Phys. Rev. E. 2022;105:014412. doi: 10.1103/PhysRevE.105.014412.
25. West J., Hasnain Z., Mason J., Newton P. The prisoner’s dilemma as a cancer model. Converg. Sci. Phys. Oncol. 2016;2(3). doi: 10.1088/2057-1739/2/3/035002.
26. West J., Newton P. Chemotherapeutic dose scheduling based on tumor growth rates provides a case for low-dose metronomic high-entropy therapies. Cancer Res. 2017. doi: 10.1158/0008-5472.CAN-17-1120.
27. Li G., Shivam S., Hochberg M., Wardi Y., Weitz J. Disease-dependent interaction policies to support health and economic outcomes during the COVID-19 epidemic. iScience. 2021;24(7). doi: 10.1016/j.isci.2021.102710.
28. Bertsekas D. Athena Scientific; Belmont MA: 2019. Reinforcement Learning and Optimal Control.
29. Winsor C. The Gompertz curve as a growth curve. Proc. Natl. Acad. Sci. 1932;18(1):1–8. doi: 10.1073/pnas.18.1.1.
30. Laird A. Dynamics of tumor growth: Comparison of growth rates and extrapolation of growth curve to one cell. Br. J. Cancer. 1965;19(2):278–291. doi: 10.1038/bjc.1965.32.
31. Gerlee P. The model muddle: In search of tumor growth laws. Cancer Res. 2013;73(8):2407–2411. doi: 10.1158/0008-5472.CAN-12-4355.
32. West J., Hasnain Z., Macklin P., Newton P. An evolutionary model of tumor cell kinetics and the emergence of molecular heterogeneity and Gompertzian growth. SIAM Rev. 2016;58(4):716–736. doi: 10.1137/15M1044825.
33. Traulsen A., Claussen J.C., Hauert C. Coevolutionary dynamics: From finite to infinite populations. Phys. Rev. Lett. 2005;95(23). doi: 10.1103/PhysRevLett.95.238701.
34. Dua R., Ma Y., Newton P. Are adaptive chemotherapy schedules robust? A three-strategy stochastic evolutionary game theory model. Cancers. 2021;13:2880. doi: 10.3390/cancers13122880.
35. Park J., Newton P. Stochastic competitive release and adaptive chemotherapy. BioRxiv. 2022. doi: 10.1101/2022.06.17.496594.
36. Nowak M.A. Harvard University Press; 2006. Evolutionary Dynamics.
37. Schuster P. Replicator dynamics. J. Theor. Bio. 1983;100:533–538.
38. Börgers T., Sarin R. Learning through reinforcement and replicator dynamics. J. of Econ. Theory. 1997;77:1–14.
39. Piraveenan M., Sawleshwarkar S., Walsh M., Zablotska I., Bhattacharyya S., Farooqui H., Bhatnagar T., Karan A., Murhekar M., Zodpey S., Rao K., Pattison P., Zomaya A., Perc M. Optimal governance and implementation of vaccination programmes to contain the COVID-19 pandemic. Roy. Soc. Open Sci. 2021;8:210429. doi: 10.1098/rsos.210429.
40. Hsu J., Meyer A. McGraw Hill Book Company; 1968. Modern Control Principles and Applications.
41. Ito H., Tanimoto J. Scaling the phase-planes of social dilemma strengths show game-class changes in the five rules governing the evolution of cooperation. Roy. Soc. Open Sci. 2018;5:181085. doi: 10.1098/rsos.181085.

Articles from Physica D. Nonlinear Phenomena are provided here courtesy of Elsevier
