Abstract
This paper presents a hierarchical approach for controlling the spread of an epidemic disease. The approach consists of a three-layer architecture in which a set of multiple two-layer social networks is governed by a (third) top layer consisting of an optimal control policy. Each of the two-layer social networks is modeled by a microscopic Markov chain. On top of all the two-layer networks is an optimal control policy that has been developed using an underlying Markov Decision Process (MDP) model. Mathematical models pertaining to the top-level MDP as well as the two-layer microscopic Markov chains are presented. A practical implementation methodology using the proposed models is also discussed along with a numerical example. The results in the numerical example illustrate the control of an epidemic using the optimal policy. Directions for further research and characterization of the optimal policy are also discussed with the help of the same numerical example.
Article Highlights
An optimal approach for controlling the spread of an epidemic infection.
The approach is able to model the uncertainties involved in the problem.
The approach is able to cater for the underlying social network.
Keywords: Markov decision process, Epidemics control, Two-layer networks, Markov chain approach
Introduction
The recent outbreak of coronavirus has reminded us to be more vigilant in developing effective mathematical models for epidemic control. Mathematical models for epidemics serve two main purposes: first, to predict the dynamics of a disease, and second, to devise appropriate control measures to prevent its spread. There has been a lot of research on developing various mathematical models for epidemics. Consequently, there are multiple existing models (mostly based on discrete-time difference equations or continuous-time differential equations) such as the SIR (susceptible-infected-recovered) model [1] and the SEIR (susceptible-exposed-infected-recovered) model [2]. Recently proposed models incorporate network topologies into the development of epidemic spread and control models [3]. Some stochastic models have also been proposed, but there are very limited attempts at applying stochastic control to epidemic control [3].
The latest research on epidemic control has a considerable focus on reducing the rate of contact among individuals in a social setup [4], but the downside of reduced social activities has also been highlighted [5]; hence, a balance is required between controlling the spread of an epidemic and avoiding new social issues in doing so. In principle, there are three main classes of epidemic models [6], i.e., differential equation based models [7], agent-based (networked) models [8], and Markov decision process (MDP) based models [9, 10]. Each type of model has its own advantages and drawbacks. For example, the differential equation based models are flexible and offer insights such as equilibrium points and stability analysis. On the other hand, MDP based models offer handling of uncertainty and calculation of an optimal decision-making policy. Finally, the graph-based, agent-based, or networked models offer a bottom-up approach where the agents can be heterogeneous, representing various population groups based on age, gender, social status, etc. One can also perform sensitivity analysis on such models for identification of influential parameters.
The concept of having multiple layers in an epidemic model has previously been introduced in [11], where a two-layer networked model is proposed with one layer comprising SIR-type agents and the other layer comprising aware-unaware agents. This model has recently been discussed further in [12], where the effectiveness of the model has been demonstrated and a mathematical result has been derived regarding the threshold of epidemic spread. Furthermore, in [8], it has been investigated how positive and negative preventive information affects the spread of the epidemic.
Although the two-layer model discussed in [11] and [12] does present an effective way of modeling epidemics, there still exist two problems. One issue is scalability, because the population of a town or city may be as much as hundreds of thousands. The second issue is the incorporation of uncertainty into the model, which arises from variations in the interaction level among various subgroups of the population. In fact, there is no clear partition among various social groups in the models existing in the literature. For example, schools, colleges, universities, shopping malls, parks, offices, railway stations, airports, etc., represent various platforms for social interaction, and incorporation of these platforms into epidemic models can provide better insight into predicting and controlling an epidemic. The focus of this paper is to reduce the drawbacks and increase the advantages of an epidemic model by combining different approaches into a single multi-layer epidemics model.
The main idea of this paper is to propose a three-layer model where the top layer manages the awareness level, treatment, and interaction among various subgroups in the bottom layer. The bottom layer consists of multiple two-layer networks where each network represents a certain social interaction platform, e.g., a school, an office, a household, a shopping mall, a cinema, etc. Each two-layer network is based on a microscopic Markov chain approach (MMCA) as discussed in [8] and [12], with some modifications needed to incorporate the differences among various social platforms. Figure 1 shows the framework of the proposed model. As indicated in the figure, the top layer uses an MDP-based model for generating an optimal control policy regarding the treatment, awareness, and social interaction guidelines for the bottom layer networks. The bottom layer networks convey the infection level (i.e., the fraction of the population within the network that has been infected) to the top layer along with the awareness level. The top layer uses information from each of the networks in the bottom layer to decide the next set of decisions according to a pre-calculated optimal policy. Some closely relevant research contributions include the dynamic programming based optimal policy for immunization [13, 14] and work on an awareness layer within the social networks [15, 16]. Another interesting dimension of related work involves modeling of human behavior and prediction of the risk of epidemic spread [17, 18]. For example, in [17], the authors have proposed two different models for human behavior, i.e., the information forgetting curve (IFC) model and the memory reception fading and cumulating (MRFC) model. It has been shown that MRFC is well suited for modeling epidemics that are more lethal whereas IFC is suitable for low-risk epidemics. Both of the models are based on two-layer networks.
Fig. 1.
Multi-layered model framework
The major reason for proposing the third layer in this paper is that the two-layer network-based approach (although it does provide great insights into the spread of the disease) does not provide any guidance regarding how to optimally control the spread of the disease. Controlling the spread is by no means a straightforward task because it requires information regarding the available resources, the cost associated with the consumption of the resources, the effectiveness of the resource utilization policy, etc. Therefore, a third layer is necessary for efficiently controlling the spread of the disease under various situations. The third layer in Fig. 1 is able to cater for the uncertainties involved in the problem and enables the calculation of a decision policy that is optimal with respect to the available resources and the known structure of the uncertainties involved in the problem.
The use of artificial intelligence in epidemic control is endorsed by recent work on epidemic spread prediction using machine learning [19]. In this regard, there is another relevant approach based on game theory [20]. Both of these approaches have close links with an MDP-based approach such as the one in this paper. However, in the existing approaches, the link to the social and contact models is insufficient. Also, MDP has the advantage over other artificial intelligence-based approaches that it can model uncertainties in the problem. In terms of social lockdowns and limitations on social interactions, there have been multiple studies, for example, approaches to lift the lockdown in London [21], the impact of human mobility in China [22], and the impact of travel restrictions on the spread of epidemics [23]. Further studies focus on the impact of the heterogeneous nature of the population on vaccination policies [24], the spread of the disease despite restrictions on social gatherings [25], and sideways contact tracing in large gatherings [26]. All of the recent work endorses the utilization of a complex decision-making approach that involves modeling of the social and contact layers and that utilizes multiple strategies, i.e., a combination of quarantine, vaccination, and treatment. Therefore, the focus of the approach of the current study is along the same lines.
The major contribution of the current work is to propose a multi-network model of the society with a centralized control layer to minimize the spread of an epidemic disease. Modeling the society as comprising multiple networks (as proposed) allows for the incorporation of real-life social interaction platforms in the society. The centralized control over all the networks represents the local governing bodies that are responsible for providing the treatment facilities, social awareness, and the standard operating procedures to minimize the spread of an epidemic disease. Practical implementation of the proposed model and the calculation of the associated control policy requires the identification of different social interaction platforms in the society and some initial statistical data regarding the spread of the disease. Furthermore, the costs of treatment and awareness campaigns are also needed for implementation of the proposed approach. Once the prerequisite information is provided, our proposed model can be used to calculate the optimal policy that provides the best decision (whether to launch an awareness campaign, vaccinate people, or launch a treatment campaign) based on the given situation (number of people infected, susceptible, aware, unaware, recovered, etc., belonging to various social and contact networks).
The paper is organized as follows. Section II presents the mathematical models, Section III includes the implementation strategy and the optimal policy calculation, Section IV shows how to apply the proposed model and understand the optimal policy, and concluding remarks are written in Section V.
Mathematical model
The proposed mathematical model for epidemics is discussed in this section. We begin with the discussion of the bottom layer MMCA-based model that has been adopted from [11, 12], followed by the top layer model that is indigenously developed in this paper.
A. Two-layer network model based on MMCA
Each two-layer network in our proposed model consists of a physical contact layer and a virtual communication layer as discussed in [12]. Figure 2 presents the framework of the two-layer network considered here. As shown in the figure, an individual in the communication layer can either be aware (A) or unaware (U). An individual in the network can become aware with probability λ and can forget about gained awareness with probability δ.
Fig. 2.
Framework of two-layer multiplex networks. Upper layer indicates communication among the individuals and lower layer indicates physical contact. The dotted line indicates the relationship between the two layers where an aware/unaware individual could be susceptible, infected, or recovered
In the physical contact layer, an individual can be susceptible (S), infected (I), or recovered (R). Let β be the probability with which a susceptible individual can become infected (for our purposes, β can be chosen to have a different value for different networks/social subgroups; consequently, each network has its own β). Similarly, consider the curing rate to be μ. The probability of getting an infection for the aware individuals may be reduced by a factor γ, i.e., β^A = γβ where 0 ≤ γ ≤ 1.
With the two layers discussed above, each node within each network has six possible states, i.e., aware and susceptible (AS), unaware and susceptible (US), aware and infected (AI), unaware and infected (UI), aware and recovered (AR), and unaware and recovered (UR). At any given time, t, an individual i can be in any one of the six possible states. Furthermore, an individual at time step t may transition from any (of the six) states to any (of the six) states in the next time step (t + 1). The allowable state transitions include remaining in the same state as well. The corresponding state probabilities are given by p_i^{AS}(t), p_i^{US}(t), p_i^{AI}(t), p_i^{UI}(t), p_i^{AR}(t), and p_i^{UR}(t). Note that the subscript i in the state probabilities indicates a node (an individual person) in the two-layer network. Hence p_i^{US}(t) is the probability for an individual i being unaware and susceptible at time t (similar definitions apply to the other five probabilities). Figure 3 shows the dynamics within a network in the form of possible transitions between the states. For example, an unaware and susceptible individual may become aware and infected, or unaware and infected, or aware and susceptible, or stay unaware and susceptible. Network-related probabilities are used in the next section for deriving the state transition probabilities for the top layer MDP model. The matrices indicating adjacency among the individuals in the contact layer and the communication layer are B = (b_ij) and A = (a_ij) respectively, where b_ij = 1 if two individuals i and j are adjacent in the contact layer and b_ij = 0 otherwise. Similarly, a_ij = 1 if two individuals i and j are adjacent in the communication layer and a_ij = 0 otherwise. Furthermore, in the communication layer, the probability for an individual i to remain unaware at time t is denoted as r_i(t). On the contact layer, the probabilities of susceptible individuals not getting infected by their neighbors are denoted as q_i^A(t) and q_i^U(t) for aware and unaware individuals respectively. The expressions for these probabilities are as follows
r_i(t) = Π_j [1 − λ a_ji p_j^A(t)], where p_j^A(t) = p_j^{AS}(t) + p_j^{AI}(t) + p_j^{AR}(t) | 1 |
q_i^A(t) = Π_j [1 − γβ b_ji (p_j^{AI}(t) + p_j^{UI}(t))] | 2 |
q_i^U(t) = Π_j [1 − β b_ji (p_j^{AI}(t) + p_j^{UI}(t))] | 3 |
Fig. 3.
State transitions in a network
Note that in (1), the probability of remaining unaware for an individual i increases if its neighbors (represented by the index j) are unaware. Similarly, the probability of not getting infected increases if the neighbors are not infected. The values of the probabilities may vary from network to network, but the equations/relationships remain the same.
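The neighborhood products in Eqs. (1)–(3) are straightforward to compute from the two adjacency matrices. The sketch below assumes the standard MMCA form adopted from [11, 12]; the function names and the variable names (lam for the awareness probability λ, beta for the infection probability β) are illustrative, not from the original.

```python
# Sketch of Eqs. (1)-(3): per-node probabilities in one two-layer network.
# Assumed conventions: A and B are 0/1 adjacency lists of lists for the
# communication and contact layers; p_A[j] and p_I[j] are node j's current
# probabilities of being aware and infected, respectively.

def remain_unaware(i, A, p_A, lam):
    """Prob. r_i that node i stays unaware: no aware neighbour informs it."""
    r = 1.0
    for j in range(len(A)):
        if A[j][i]:                      # j adjacent to i in communication layer
            r *= 1.0 - p_A[j] * lam
    return r

def not_infected(i, B, p_I, beta):
    """Prob. q_i that node i avoids infection from all contact-layer neighbours."""
    q = 1.0
    for j in range(len(B)):
        if B[j][i]:                      # j adjacent to i in contact layer
            q *= 1.0 - p_I[j] * beta
    return q

# Aware nodes use the attenuated infectivity beta_A = gamma * beta, so
# q_i^A is obtained as not_infected(i, B, p_I, gamma * beta).
```

For a two-node network with mutual adjacency, an awareness probability of 0.5 at the neighbor, and λ = 0.2, the probability of staying unaware is 1 − 0.5·0.2 = 0.9, matching the product form of Eq. (1).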
The dynamical evolution of the six states in MMCA model using (1), (2), and (3) is presented as follows
| 4 |
where t + 1 refers to the next time step (assuming a time step duration of one unit) and σ is the self-perception rate, which represents the transition probability from the state UI to the state AI, i.e., the self-perception rate determines how quickly awareness is created among the unaware infected individuals. Also, note that as t → ∞, the probabilities in (4) reach steady state values, i.e., p_i^{AS}(t + 1) = p_i^{AS}(t) = p_i^{AS*}, and similarly for the other five states. Furthermore, the following equality holds at all times.
p_i^{AS}(t) + p_i^{US}(t) + p_i^{AI}(t) + p_i^{UI}(t) + p_i^{AR}(t) + p_i^{UR}(t) = 1 | 5 |
It has been shown in [12] that under the assumptions that t → ∞ and p_i^{AI} ≈ p_i^{UI} ≈ 0 (when the initial value of infected nodes is small enough), the threshold of epidemic outbreak is given by
β_c = μ / Λ_max(H) | 6 |
where Λ_max(H) is the largest eigenvalue of the matrix H with elements h_ji = [1 − (1 − γ) p_i^{A*}] b_ji, and p_i^{A*} is the steady state awareness probability of node i.
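Given the steady state awareness probabilities, the threshold in (6) reduces to a single largest-eigenvalue computation. The sketch below builds H as described above and estimates Λ_max(H) by power iteration so that no external library is needed; all names are illustrative assumptions of this sketch.

```python
# Hedged sketch of the threshold in Eq. (6): beta_c = mu / Lambda_max(H),
# with h_ji = [1 - (1 - gamma) * pA_i] * b_ji as stated in the text.

def largest_eigenvalue(H, iters=200):
    """Estimate the dominant eigenvalue of a non-negative matrix by power iteration."""
    n = len(H)
    v = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(H[r][c] * v[c] for c in range(n)) for r in range(n)]
        lam = max(abs(x) for x in w) or 1.0   # infinity-norm estimate
        v = [x / lam for x in w]              # renormalize the iterate
    return lam

def epidemic_threshold(B, pA, gamma, mu):
    """beta_c = mu / Lambda_max(H) for contact adjacency B and awareness pA."""
    n = len(B)
    H = [[(1.0 - (1.0 - gamma) * pA[i]) * B[j][i] for i in range(n)]
         for j in range(n)]
    return mu / largest_eigenvalue(H)
```

For a fully connected 3-node contact layer with no awareness (pA = 0) and γ = 1, H equals the adjacency matrix, Λ_max = 2, and the threshold is μ/2.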
B. Top-layer MDP Model
While the bottom-level networks represent the dynamics of various social subgroups, the top layer MDP model is primarily used for control purposes. An MDP model consists of a set of discrete states (X), a set of decisions (D), a cost function (J), a transition probability function (F), and a discount factor α that determines whether to focus on long-term decision making (α → 1) or short-term decision making (α → 0).
States in an MDP model signify the information required (or available) in order to make a decision. In the problem at hand, we have a set of social networks with a population where each individual may be susceptible, infected, or recovered. Consequently, one may argue that we need the information regarding the status of each individual in a network for making a decision regarding the treatment, as done in [9] and [10]. But the problem with such an approach is that the computational complexity involved in the calculation of the optimal policy is prohibitive. Therefore, a more innovative approach is required in defining the states of the MDP model. Here we notice that the decisions to be made by the optimal policy (as discussed later) concern an awareness campaign and a treatment drive. An awareness campaign in real life could be a public service message on television and radio or in the form of billboards on highways or in the downtown area of a city. Similarly, treatment drives (in a real-life scenario) may be launched in local hospitals for treating the infected individuals. This involves the provision of necessary medical equipment and medicine. Based on the decisions to be made, information of two types is needed in a state, i.e., the infection ratio of each network (I_j for the jth network) and the awareness ratio of each network (A_j for the jth network). The infection ratio is defined as the ratio of infected individuals to the total population of the network. Consequently, the infection ratio ranges between 0 and 1, where the value 1 means that the whole population of the network is infected. Similarly, the awareness ratio is the ratio of aware individuals in a network to the total population of the network. The value of the awareness ratio also ranges between 0 and 1, where the value 1 means that all of the individuals in the network are aware of the epidemic spread. As a result, the state space is represented as
X = {(I_1, A_1, I_2, A_2, …, I_n, A_n)}, I_j, A_j ∈ {0, 1/m, 2/m, …, 1} | 7 |
In (7), the infection level and the awareness level information is required for calculating the desirable control input. Both types of information have been assigned discrete values between 0 and 1 with equal spacing between any two adjacent values (for a network with m individuals, the spacing is 1/m). Note that the proposed model is suitable for SIS (susceptible-infected-susceptible) types of situations where an individual may get re-infected after recovery from the disease. For situations where some of the population is immune (recovered) to the disease, the information of the infection ratio alone will still be useful, but we would be ignoring the fact that not all who are not infected are susceptible, and hence the decision regarding the awareness campaign may be suboptimal.
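The discretized state space of Eq. (7) can be enumerated directly: each network j contributes an infection ratio I_j and an awareness ratio A_j, both taking the values 0, 1/m, …, 1. The helper below is an illustrative sketch with assumed names.

```python
# Enumerate the MDP state space of Eq. (7): a state is the tuple
# (I_1, A_1, ..., I_n, A_n) with every ratio on the grid {0, 1/m, ..., 1}.
from itertools import product

def state_space(n_networks, m):
    levels = [k / m for k in range(m + 1)]      # m + 1 discretization levels
    return list(product(levels, repeat=2 * n_networks))
```

For n = 2 networks and m = 4 this yields (4 + 1)^(2·2) = 625 states, consistent with the complexity count discussed in the computational complexity analysis.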
As discussed earlier, the set of decisions includes launching an awareness campaign and launching a treatment drive. In our model, a hospital (or set of hospitals) can be regarded as a social subgroup (represented by a two-layer network as described in the previous subsection). A treatment drive executed on a network shall remove infected individuals from that network and add the same to the network representing hospitals. Consequently, the decision of executing a treatment drive can only be implemented on the non-hospital networks and not on the network representing hospitals. On the other hand, an awareness campaign can be executed in all networks including hospitals, since we allow for the possibility of unaware infected individuals. Mathematically, the set of decisions is written as
D = {d_0, d_1^a, d_2^a, …, d_n^a, d_1^t, d_2^t, …, d_{n−1}^t} | 8 |
All of the above decisions are Boolean variables with default value false. Executing a decision means setting the corresponding decision variable to true for a single decision epoch (a decision epoch is the time interval between two consecutive decisions and is specified by the user). For example, d_0 is a Boolean no-decision option that is included to enable the control policy to do nothing (switching d_0 to true means no decision regarding awareness or treatment for one decision epoch, by default). This decision is useful in situations where the treatment or awareness campaigns are not needed or are too expensive. Also, d_1^a refers to the decision of launching an awareness campaign in network 1 and so on. Launching an awareness campaign (d_j^a) increases the probability of a rise in the awareness ratio (A_j) of the corresponding (jth) network. Similarly, d_1^t refers to the decision of launching a treatment campaign in network 1 and so on (except the nth network, which is assumed to be the hospital where all of the individuals are already under treatment). Notice that although we have assumed only one hospital, one may assume more hospitals in the community. Moreover, launching a treatment campaign (d_j^t) increases the probability of a fall in the infection ratio (I_j) of the corresponding (jth) network.
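The decision set of Eq. (8) can be built mechanically: one "do nothing" decision, one awareness decision per network, and one treatment decision per network except the hospital. The string labels and the default of treating the last network as the hospital are assumptions of this sketch.

```python
# Illustrative construction of the decision set in Eq. (8).
def decision_set(n_networks, hospital=None):
    """One do-nothing decision, n awareness decisions, n-1 treatment decisions."""
    hospital = n_networks - 1 if hospital is None else hospital
    decisions = ["d0"]                                        # no-decision option
    decisions += [f"aware_{j}" for j in range(n_networks)]    # incl. hospital
    decisions += [f"treat_{j}" for j in range(n_networks)
                  if j != hospital]                           # hospital excluded
    return decisions
```

With three networks (the last one being the hospital) this gives 1 + 3 + 2 = 6 decisions, matching the structure described above.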
State transition probabilities are of two types. One is the probability of an increase or decrease in the awareness ratio (A_j) of the jth network. Recall that the probability for an individual i in a network to remain unaware is r_i and the probability of forgetting is δ (independent of the network topology). To simplify the mathematics, it has been assumed that the maximum number of individuals in each network (m) is the same and that the discretization step 1/m is the inverse of the number of individuals in a network. Finally, it has been assumed that during a single time step, A_j can either remain the same, increase by 1/m, or reduce by 1/m. This means that awareness and unawareness spread at the rate of one individual at a time (recall that an individual in the network can become aware with probability λ and can forget about gained awareness with probability δ). Increasing this rate would require the mathematical relation for the transition in A_j to accommodate more possibilities, which can be done but is avoided here. Before discussing the transition probabilities of the MDP layer, it is important to mention that the state probabilities for the lower layer networks have steady states [12], e.g., the steady state value of p_i^{AS}(t) is p_i^{AS*}. Also, since we assume multiple networks (each with m nodes), we define the individual node state probabilities with an additional network subscript, e.g., p_{i,j}^{AS*} for the steady state probability of an aware and susceptible node i of the jth network (note that since Eqs. (1–6) are for a single network, the additional subscript is not needed there). Consequently, the transition probability expression for A_j under the above assumptions is given as
P[A_j(t + Δt) = A_j(t)] = P_up^j P_down^j + (1 − P_up^j)(1 − P_down^j) | 9 |
P[A_j(t + Δt) = A_j(t) + 1/m] = P_up^j (1 − P_down^j) | 10 |
P[A_j(t + Δt) = A_j(t) − 1/m] = P_down^j (1 − P_up^j) | 11 |
where P_up^j and P_down^j for the jth network are given by,
P_up^j = (1/m) Σ_i (p_{i,j}^{US*} + p_{i,j}^{UI*} + p_{i,j}^{UR*}) | 12 |
P_down^j = (1/m) Σ_i (p_{i,j}^{AS*} + p_{i,j}^{AI*} + p_{i,j}^{AR*}) | 13 |
Here, (9) gives the probability that the fraction of awareness in each subnetwork j remains the same. Expression (10) provides the probability of an increase in awareness and (11) presents the probability of a decrease in awareness. Note that the sum of the right-hand sides in (9), (10), and (11) is 1. Furthermore, the sum of the right-hand sides of Eqs. (12) and (13) is also 1 (this is by chance, since all of the six states are involved in the two equations; we shall see later in Eqs. (17) and (18) that this is not the case). The equation in (10) is based on the average likelihood of someone becoming aware from being unaware and no one becoming unaware from being aware, whereas Eq. (11) is based on the average likelihood of someone becoming unaware from being aware and no one becoming aware from being unaware in the network j. Note that executing an awareness campaign in the network j would be modeled by a reduction in r_i (please refer to Eq. (1)) for each individual in the network by a factor that would be the inverse of the strength of the awareness campaign, i.e., the stronger the awareness campaign, the lower the value of r_i. Please note that the terms on the right-hand sides of Eqs. (12) and (13) are the steady state values of the network state probabilities in Eqs. (1–6). The difference between t and t + Δt (in Eqs. (9–11)) is the time duration between two consecutive MDP states (note that this is different from the time step used in Eqs. (1–6) because, for each network, we use steady state probabilities in the MDP model). This duration is equal to one decision epoch. The exact value of the duration of the decision epoch is chosen by the user, and the calculation formulas for the probabilities are not affected by this choice. Usually, the duration of an epoch may be a day, or a week, or a value in between, depending upon the severity of the situation and the rate of spread of the disease.
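The one-individual-at-a-time assumption gives the awareness ratio a birth-death structure: per decision epoch it moves up by 1/m, down by 1/m, or stays put. The sketch below takes the per-epoch up/down probabilities as given inputs (standing in for the network-level quantities of Eqs. (12)–(13), which are not re-derived here); the boundary handling, keeping the ratio inside [0, 1], is an added assumption.

```python
# Birth-death transition row for one awareness ratio, in the spirit of
# Eqs. (9)-(11): at most one individual changes awareness per epoch.
def awareness_transition_row(level, m, p_up, p_down):
    """Return {next_level: prob} for a current awareness level k/m."""
    k = round(level * m)
    up = p_up if k < m else 0.0        # ratio cannot exceed 1
    down = p_down if k > 0 else 0.0    # ratio cannot fall below 0
    row = {}
    if up:
        row[(k + 1) / m] = up
    if down:
        row[(k - 1) / m] = down
    row[k / m] = 1.0 - up - down       # remaining mass: ratio unchanged
    return row
```

Because the three outcomes partition the epoch, each row sums to 1, mirroring the property noted for (9)–(11) above.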
Finally, note that the right-hand sides of the probabilities in (9–11) and (14–16) are independent of t; this means that the MDP state transition probabilities are stationary in nature. This is important for calculating the optimal policy for infinite horizon decision-making.
The second type of probability in the MDP model is the infection ratio probability. Here, recall that the recovery rate of an individual in a network is μ and the infection rate is β. Consequently, under the assumptions stated earlier, the transition probability for the infection ratio of the jth group (I_j) is given by
P[I_j(t + Δt) = I_j(t)] = Q_up^j Q_down^j + (1 − Q_up^j)(1 − Q_down^j) | 14 |
P[I_j(t + Δt) = I_j(t) + 1/m] = Q_up^j (1 − Q_down^j) | 15 |
P[I_j(t + Δt) = I_j(t) − 1/m] = Q_down^j (1 − Q_up^j) | 16 |
where Q_up^j and Q_down^j for the jth network are given by,
Q_up^j = (1/m) Σ_i [p_{i,j}^{AS*} (1 − q_{i,j}^{A*}) + p_{i,j}^{US*} (1 − q_{i,j}^{U*})] | 17 |
Q_down^j = (μ/m) Σ_i (p_{i,j}^{AI*} + p_{i,j}^{UI*}) | 18 |
Here, (14) gives the probability that the fraction of infection in each subnetwork j remains the same. Expression (15) provides the probability of an increase in the infection fraction and (16) presents the probability of a decrease in infection. Note that the sum of the right-hand sides in (14), (15), and (16) is 1. Note that executing a treatment campaign in a network results in the replacement of an infected individual with a recovered individual, which increases the right-hand side of (18) and reduces the right-hand side of (17) depending upon the strength of the treatment campaign.
It is important here to mention how the assumption of a fixed maximum number of individuals shall work, especially when the treatment campaign is supposed to bring individuals into the hospital network. The proposed model works by replacing empty beds in the hospitals with recovered individuals, and every time an individual is shifted from another network j to a hospital, an imaginary recovered individual is placed in the network j to keep the total number of individuals consistent. Also, this implies that if the hospital network has 100% infected individuals, the treatment decision cannot be executed, which is true in real-life situations.
Based on the above discussion, the state transition probabilities can be specified using the joint probability model as follows
F(x′ | x, d) = Π_{j=1}^{n} P(A_j′ | x, d) P(I_j′ | x, d) | 19 |
Finally, the cost function involves the cost of being in 'bad' states (i.e., states with a low awareness ratio and/or a high infection ratio) as well as the costs of executing an awareness campaign and a treatment drive. A mathematical expression for the cost function is as follows
J(x, d) = Σ_{j=1}^{n} ρ_j [c_1 I_j + c_2 (1 − A_j)] + c(d) | 20 |
where,
c(d) = 0 if d = d_0, c(d) = c_3 if d is an awareness campaign, and c(d) = c_4 if d is a treatment drive | 21 |
In (20), c_1, c_2, c_3, and c_4 are positive constants representing the costs associated with the decisions and the states. For example, c_1 represents the cost associated with the infection ratio within a network, c_2 represents the cost associated with the lack of awareness, and c_3 represents the cost associated with the awareness campaign. Also, the priority levels (ρ_j) are constants that are used to differentiate the cost associated with a sensitive population network (e.g., pregnant women, children, and the elderly) from a resilient population network (e.g., healthy youngsters). Note that the only decision without any cost is d_0. Also, the cost in a state is associated with the accumulative infection ratio (I_j) and awareness ratio (A_j) scaled by the priority level (ρ_j).
Finally, the discount factor α in this case may be chosen to have any value close to one if it is desirable for the resulting optimal control policy to consider long-term effects. For short-term results, the value of α can be selected close to zero.
Calculation of control policy and practical implementation
In this section, the integration of the MMCA based models and the top layer of MDP model is discussed followed by a possible approach for calculation of the optimal policy.
A. Integration of MMCA based models with MDP
The working of the multi-layered approach has been sketched earlier in Fig. 1. Here we provide the details of the communication between the MMCA based networks and the MDP-based model.
Major steps involved in the implementation of the proposed approach are shown in Fig. 4. As shown in the figure, first, we need to calculate the steady state probabilities corresponding to each individual in each network using the equations in (4). Then these steady state values are used to calculate the intermediate values in (12), (13), (17), and (18) that are used for calculation of transition probabilities in the MDP model.
Fig. 4.
Implementation strategy for multi-layered approach
The next step is to calculate the MDP policy using any of the standard stochastic dynamic programming algorithms such as value iteration or policy iteration. Execution of the policy in real time shall be achieved by feeding the data pertaining to each network, i.e., the fraction of infected individuals and the fraction of aware individuals, to the MDP; the resulting optimal decision is then obtained from the optimal decision policy.
B. Calculation of optimal control policy
The calculation of the optimal policy for infinite time horizons using the value iteration method is discussed here. This method uses the concept of the value function in order to determine the optimal policy in an iterative manner. The value of a state (V(x)) is a measure of the importance of a state in terms of how little it costs and how likely it is to lead to another state with low cost using an appropriate decision. The value iteration method uses arbitrarily assigned initial values of the states along with the transition probabilities in the MDP model to determine the optimal value of each state (V*(x)). Based on these optimal values, optimal decisions are calculated for each state. An optimal decision as a function of state is known as the optimal policy (π*). Since the method is applied to stochastic decision-making problems, the optimization is in terms of the expected value given as
V^π(x) = E[ Σ_{t=0}^{h} α^t J(x_t, π(x_t)) | x_0 = x ] | 22 |
Note that the expected value function (V^π) in (22) is conditioned upon the control policy π. The expected value of a random variable is an average of the probable values. In our case, the cost over a decision-making horizon is random because we cannot determine the exact sequence of states that will result from making decisions under the policy π. The decision-making horizon h in value iteration is assumed to be infinite. Also, α is the discount factor as discussed in Sect. 2.2. The value of each state is calculated as
V(x) = min_d [ J(x, d) + α Σ_{x′} F(x′ | x, d) V(x′) ] | 23 |
From (23), the value of a state depends upon the sum of cost incurred by that state and the value of the states that it can lead to. In the value iteration algorithm, we initially assign arbitrary values to the states and then update the values iteratively using the following relation.
V_{k+1}(x) = min_d [ J(x, d) + α Σ_{x′} F(x′ | x, d) V_k(x′) ] | 24 |
Note that the iteration in (24) can be started with arbitrary (finite) state values and, after a sufficient number of iterations, the value function converges to an optimal value for each state (V*(x)). Once the optimal value is obtained, the optimal policy is calculated using the following expression
π*(x) = argmin_d [ J(x, d) + α Σ_{x′} F(x′ | x, d) V*(x′) ] | 25 |
Here V* represents the optimal value and π* represents the optimal policy, i.e., the optimal policy function that provides an optimal decision for each state x. Note that the optimal policy in this case is stationary, i.e., the optimal decision at each state is independent of the time at which the state is reached.
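The value iteration procedure just described is model-agnostic, so it can be sketched generically: the transition function and stage cost are supplied as inputs, and any discretization of the epidemic MDP could be plugged in. The container layouts (nested dicts keyed by state and decision) are assumptions of this sketch.

```python
# Minimal value iteration with a greedy (argmin) policy extraction at the end.
# F[x][d] is a dict {successor: probability}; J[x][d] is the stage cost.
def value_iteration(states, decisions, F, J, alpha=0.9, tol=1e-8):
    V = {x: 0.0 for x in states}              # arbitrary finite initial values
    while True:
        # Bellman backup over all states
        V_new = {x: min(J[x][d] + alpha *
                        sum(p * V[y] for y, p in F[x][d].items())
                        for d in decisions)
                 for x in states}
        delta = max(abs(V_new[x] - V[x]) for x in states)
        V = V_new
        if delta < tol:                       # converged to V* within tolerance
            break
    # greedy policy with respect to the converged values
    policy = {x: min(decisions,
                     key=lambda d: J[x][d] + alpha *
                     sum(p * V[y] for y, p in F[x][d].items()))
              for x in states}
    return V, policy
```

On a toy two-state model ("good"/"bad") where fixing a bad state costs 5 but staying in it costs 10 per epoch, the extracted policy repairs the bad state and leaves the good one alone, illustrating the argmin extraction step.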
Relationship among the networks
The relationship among the networks is represented using graph theory, where each network is a node and the link between two networks is an edge. A graph G = (N, E), where N is a set of nodes and E is a set of edges, is characterized by an adjacency matrix. The adjacency matrix is a square matrix of size equal to the number of networks in the model. Each element (e_ij) of the adjacency matrix is either 1 (meaning that network i is connected with network j) or 0 (meaning that network i has no connection with network j, i.e., the individuals in the two networks do not interact or communicate). The impact of interconnection among the networks (as far as the centralized control MDP is concerned) is upon the calculation of the transition probabilities. There are two possible approaches. One approach is to modify the transition probability Eqs. (9–18) in order to incorporate the impact of interaction among the networks upon the probability of change in the awareness ratio and the infection ratio. A second approach to dealing with the interconnected networks (the one that has been adopted in this paper) is to combine all interconnected networks into a single network. In this way, the interaction is incorporated in the consolidated adjacency matrices of the contact layer and the communication layer within the combined network (which is treated as a single network).
Now, one may ask: if the networks are independent, why use centralized control rather than distributed control? While distributed control is possible in principle, e.g., a separate MDP-based policy for each network, in real-life situations the budget is allocated to a town or a city involving various sub-communities, i.e., shopping malls, cinemas, offices, schools, banks, etc. Since the MDP policy decides on the allocation of resources, it is more realistic to have one MDP for the multiple networks that lie within the scope of one decision-making authority, e.g., a county or a municipal office.
Computational complexity analysis
A discussion of computational complexity is important wherever an MDP-based solution is involved in a stochastic optimization problem. In the problem at hand, there are two sources of complexity: the total number of networks, N, and the precision, M, with which the infection ratio and the awareness ratio are measured. Consequently, the size of the state space in the MDP is given by
|S| = (M + 1)^(2N) | 26 |
The above complexity follows from Eq. (7): there are 2N variables in each state, and each variable has M + 1 possible values. To get a rough estimate of the implication of (26), consider N = 3 and M = 5. This implies a total of 46,656 states, and this number grows exponentially with N, as shown in Fig. 5. Notice that the computational complexity of calculating the optimal policy from the MDP model is proportional to the square of the size of the state space. With current computing capabilities (gigahertz processors), complexity of this order is affordable, which may translate to somewhere between 5 and 7 networks depending upon the value of M. Therefore, a real-life application of the proposed model does demand distributed control, in the sense that small blocks of a town may be isolated via something like a smart lockdown (as practiced by many governments during the COVID-19 pandemic). A smart lockdown limits the number of networks (houses and other buildings) to be dealt with by a single decision-making authority. In terms of hospitals, the reduction in complexity may be achieved by allocating various blocks within a hospital to the individuals within a sub-group of networks that is controlled by a single MDP-based policy.
Fig. 5.

Size of the state space with respect to the number of networks
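Under the reading that each state consists of 2N quantized ratios with M + 1 levels each (which reproduces the 46,656 states quoted above for three networks), the growth shown in Fig. 5 can be tabulated directly; the function below is an illustrative sketch.

```python
def state_space_size(num_networks, precision):
    """Number of MDP states: each network contributes an infection ratio
    and an awareness ratio, each quantized to (precision + 1) levels."""
    return (precision + 1) ** (2 * num_networks)

# exponential growth in the number of networks (cf. Fig. 5), with M = 5
growth = {n: state_space_size(n, 5) for n in range(1, 8)}
```

Since policy computation scales with the square of the state space, even seven networks at this precision already push past billions of state pairs.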
Numerical example
In this section, a numerical example with three MMCA-based networks governed by an MDP is discussed. This numerical example not only provides insight into how to implement the proposed approach, but also serves as a guideline for evaluating the resulting optimal policy.
Note that the numerical example presented in this section consists of only three networks, each with three nodes. The reason for presenting a small network is to facilitate understanding of the parameter values and the calculations involved in the model. For networks with more than three nodes, the connectivity matrices (A and B) would be larger. Similarly, the steady state probability values (presented later in Table 1) would be larger in number: for a three-node network, we need 15 values per network (a total of 45 values for three networks, as shown in Table 1), whereas an example with five nodes would involve 75 steady state values for three networks, which would be cumbersome to present in a table. On the other hand, the three-node networks exhibit enough structure to convey the implications and trends in the optimal control policy.
Table 1.
Steady state probability values for MMCA networks
| Network 1 | Network 2 | Network 3 |
|---|---|---|
Parameter values
The example discussed considers three networks (N = 3), and each network consists of three individuals. Network 1 is assumed to consist of individuals that have chain-type communication, where some of the individuals do not communicate directly (see Fig. 6). Connectivity within network 1 is given by
| 27 |
Fig. 6.

Topology of network 1
Here, A1 represents the connectivity matrix for the communication layer of the network, whereas B1 represents the contact layer of the network. The corresponding graphical illustration of the network topology is presented in Fig. 6. Network 2 is selected to have direct communication links among all three members. Consequently, network 2 is set up with the following connectivity matrices
| 28 |
For the third network, we assume no communication among the members, i.e., the individuals are completely isolated; therefore, the corresponding connectivity matrices are zero matrices. The selection of three networks with low, moderate, and high levels of communication ensures diversity in the numerical example.
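As a concrete reading of the three communication-layer topologies (the exact matrices in Eqs. (27)–(28) are not reproduced here, and the specific chain ordering 1–2–3 is an assumption), the adjacency matrices could look as follows:

```python
# hypothetical communication-layer adjacency matrices for the three networks
A1 = [[0, 1, 0],   # chain: node 2 relays between nodes 1 and 3
      [1, 0, 1],
      [0, 1, 0]]
A2 = [[0, 1, 1],   # fully connected: every pair communicates directly
      [1, 0, 1],
      [1, 1, 0]]
A3 = [[0, 0, 0],   # isolated individuals: no communication at all
      [0, 0, 0],
      [0, 0, 0]]

# total edge-endpoint counts reflect the low / high / zero communication levels
degrees = [sum(map(sum, A)) for A in (A1, A2, A3)]
```

The contrast in total connectivity across the three matrices is what produces the diversity discussed above.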
The parameter values have been selected based on the discussions in [11], with some adaptations made in order to create a more difficult situation. For example, the probability of getting infected for an aware individual has been selected as 0.1 instead of 0. Also, the probability of becoming aware has been selected based on the proportion of educated versus uneducated people in Pakistan. The curing rate has been selected to be lower than what has been observed recently for COVID (to make the problem more challenging). Consequently, the values of the parameters are given as follows
| 29 |
Based on the above-mentioned parameter values, the steady state probabilities for all three networks were computed using (4); the results are given in Table 1.
Based on these steady state probabilities, the intermediate probability values for the MDP are as follows
| 30 |
Next, the optimal control policy for the resulting MDP model is calculated using value iteration. Salient features of the calculated policy are discussed in the next subsection.
Optimal control policy
For the calculation of the optimal policy, costs are assigned to the treatment drive and the awareness campaign, and cost weightings are placed on the infection ratio and the awareness ratio. Priorities have also been set for the networks. Note that network 3 (the hospital) has the lowest priority because it is already a network where there is less need for an additional awareness campaign.
A sample trajectory under the optimal control policy is presented in Fig. 7. The initial conditions for this trajectory are such that all three networks have zero awareness, while the infection ratios in networks 1, 2, and 3 are 0.33, 0.66, and 0, respectively. The bottom graph in Fig. 7 shows that the first priority of the optimal policy is to execute awareness campaigns in all three networks, resulting in full awareness; this behavior reflects the high cost assigned to the lack of awareness. The top graph indicates that, after the awareness has been maximized, the infected individuals from network 2 (the highest-priority network) are transferred to the hospital (network 3). Two characteristics of the optimal policy are worth discussing here. The first is that the decision making in the optimal policy is sequential, i.e., the awareness campaign and the treatment drive are not launched simultaneously. In real-life scenarios, decisions could be made simultaneously; to model simultaneous decision making in the MDP, we must define a single decision as a combination of multiple decisions, e.g., a combined decision meaning that the treatment drive is executed in network 1 while the awareness campaign is executed in network 2. Such a definition of decisions enlarges the decision space, as all possible combinations of treatment drives and awareness campaigns must be included. Another (simpler) way of approximating simultaneous decision making is to use a smaller decision epoch (with the same decision space as defined in (8)). A decision epoch is the time interval between two consecutive decisions, so a smaller epoch results in faster decision making. A drawback of a smaller epoch, however, is that the value of the next state could be virtually the same.
In our case, no change in the infection ratio may be observed during the time interval between two decisions (as it takes time for an individual to recover from the disease). The second characteristic of the optimal policy worth discussing is the prioritization among the networks. For example, in the results shown in Fig. 7, the optimal policy does not launch any treatment drive in network 1. Such prioritization is important when resources are limited (note that in our example, the cost of a treatment campaign is ten times that of an awareness campaign). Another justification for this behavior is that the infection ratio in network 1 (0.33) is within the threshold of epidemic outbreak (0.35, see Eq. 6). Note that the epidemic threshold for network 1 is the curing rate over the largest eigenvalue of the matrix in Eq. (27). Although the MDP does not consider this threshold explicitly, it is reflected in the policy because we utilize the steady state probability values from the two-layer networks in the state transition model of the third MDP layer.
Fig. 7.

Sample trajectory with the optimal control policy, showing the infection ratio and the corresponding awareness ratio for each network
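The epidemic-threshold argument above can be reproduced numerically. Assuming a chain adjacency matrix for Network 1 and a curing rate of 0.5 (both assumptions for illustration, not values confirmed by the paper), the threshold computed as curing rate over the largest eigenvalue comes out close to the 0.35 quoted in the text:

```python
def largest_eigenvalue(adj, iters=500):
    """Dominant eigenvalue of a nonnegative adjacency matrix via power
    iteration on (adj + I); the +1 shift avoids the oscillation that
    plain power iteration exhibits on bipartite graphs such as a chain."""
    n = len(adj)
    shifted = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    v = [1.0] * n
    lam = 1.0
    for _ in range(iters):
        w = [sum(shifted[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(w)                        # entries stay nonnegative here
        v = [x / lam for x in w]
    return lam - 1.0                        # undo the +1 shift

# hypothetical chain adjacency for Network 1 (nodes 1-2-3)
chain = [[0, 1, 0],
         [1, 0, 1],
         [0, 1, 0]]
mu = 0.5                                    # assumed curing rate, not from the paper
threshold = mu / largest_eigenvalue(chain)  # close to the 0.35 quoted in the text
```

The largest eigenvalue of the chain is the square root of 2 (about 1.414), so the threshold evaluates to roughly 0.354.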
For comparison with the results from the optimal policy, we also present sample trajectories without the optimal policy (generated based on the most likely state transitions) in Fig. 8. Notice that due to the low recovery and infection rates chosen in our example, the infection ratio does not change in any of the networks (see Eq. (30)). Also, the trajectories only indicate the most likely sequence; in general, the disease will eventually spread if the infection ratio is above the threshold presented in Eq. (6). On the other hand, due to the strong connectivity in network 1, the awareness about the disease spreads without any awareness campaign. Another factor in the spread of awareness in a network is the initial awareness in some of its nodes. In general, the infection ratio may increase or decrease even without any awareness or treatment campaigns, depending upon the rate of the spread of infection.
Fig. 8.

Sample trajectory without the optimal control policy, showing the infection ratio and the corresponding awareness ratio for each network
Another comparison between the presence and absence of the optimal policy, in terms of the cost incurred (as per Eq. (20)), is presented in Table 2. The comparison lists the cumulative cost incurred with and without the optimal policy for five sample trajectories. The length of each trajectory is 20 decision epochs (recall that the decision epoch is a user-specified time duration between two consecutive policy decisions). It is evident from the results that the cost without the policy may be more than three times that with the optimal policy. However, this difference is highly dependent upon the initial conditions. Specifically, for initial conditions involving a high awareness ratio, both the cost magnitude and the difference between the two costs are low. In the extreme case, with full awareness across all three networks and zero initial infection ratio, the cost with the optimal policy equals that without the optimal policy (both costs are zero).
Table 2.
Comparison of cost (Eq. 20) for sample trajectories of length 20 decision epochs

| Sr. No | Initial state | Cost with the optimal policy | Cost without the optimal policy |
|---|---|---|---|
| 1 | | 8600 | 26,800 |
| 2 | | 7200 | 25,200 |
| 3 | | 3300 | 6,800 |
| 4 | | 10,800 | 28,800 |
| 5 | | 0 | 0 |
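The savings reported in Table 2 can be expressed as ratios with a few lines; the cost pairs below copy the table's values, and trajectory 5 (both costs zero) is excluded from the ratio computation.

```python
# (cost with the optimal policy, cost without it) for the five sample
# trajectories of Table 2
costs = [(8600, 26800), (7200, 25200), (3300, 6800), (10800, 28800), (0, 0)]
ratios = [without / with_ for with_, without in costs if with_ > 0]
# ratios range from roughly 2.1 to 3.5, i.e., about a threefold saving
```

This confirms the claim above that the cost without the policy can exceed three times the cost with it, while also showing the strong dependence on the initial condition.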
In order to develop further insight into the trends in the optimal control policy, the frequency of each decision has been plotted with respect to the two cost parameters: the cost of an awareness campaign in Fig. 9 and the cost of a treatment campaign in Fig. 10. In both figures, the y-axis of each plot is the number of occurrences of a decision in the optimal policy, and the x-axis is the corresponding parameter value used for calculating the optimal policy. As discussed earlier in Sect. 2.2, there are three types of decisions: launching a treatment campaign in a network, taking no action, and launching an awareness campaign in a network. For each value-pair of the parameters, the optimal policy has been calculated and the histogram (number of occurrences) of decisions in the optimal policy has been plotted against the corresponding parameter values. It is clear in Fig. 9 that the number of occurrences of awareness campaigns falls as the cost of an awareness campaign increases, whereas the number of occurrences of treatment decisions remains almost the same, except for one case where the frequency increases (dotted graph at the top right of Fig. 9).
Fig. 9.

Trends in the optimal policy with respect to the cost of an awareness campaign. Specifically, the frequency of each decision, e.g., no operation, treatment campaign in network 1, awareness campaign in network 2, etc., is plotted against the range of parameter values. The blue graphs and the dotted red graphs correspond to two different parameter values
Fig. 10.

Trends in the optimal policy with respect to the cost of a treatment campaign. Specifically, the frequency of each decision, e.g., no operation, treatment campaign in network 1, awareness campaign in network 2, etc., is plotted against the range of parameter values. The blue graphs and the dotted red graphs correspond to two different parameter values
Trends in Fig. 10 indicate that an increase in the cost of a treatment campaign results in a decrease in the frequency of treatment decisions in the optimal control policy. Such results can be used by decision-making officials for devising an optimal response to changing circumstances and also for predicting the funds required to optimally control the spread of an epidemic disease. Furthermore, the trend indicates that, as resources become scarce or expensive, the optimal decision is to refrain from spending them without justification. The main advantage of analyzing the optimal policy with respect to cost is the determination of the thresholds at which an emergency situation may be declared. Similar analysis may be performed with respect to changes in the probabilities, for example, the curing rate.
Simulating networks with more nodes
The above example has discussed three networks, each having three nodes. While we have already discussed the impact of an increase in the number of networks on the computational complexity of the MDP layer, we still need to discuss the impact of having more nodes within each network. For example, what would happen if each of the three networks discussed above had 100 nodes instead of three? The good news is that the complexity of the MDP layer (i.e., the number of states in the MDP) would remain the same regardless of the number of nodes in each network. On the other hand, an increase in the number of nodes would enlarge the A and B matrices given in Eqs. (27) and (28); for a hundred-node network, the A and B matrices would be of size 100 × 100. This is not a computational issue, since we do not need to calculate the inverse of any of these matrices. We would use Eqs. (1–4) to calculate the steady state values of the probabilities associated with the network layers. Finally, the corresponding probabilities for the MDP would be computed using Eqs. (12), (13), (17), and (18), yielding results similar to the values presented in Eq. (30). Once we have the MDP-layer transition probabilities, it does not matter how many nodes each network has, because all of the information regarding the connectivity in the network layers has been consolidated into probabilities regarding the spread of information and the spread of the disease. Having said that, it remains an avenue of future research to study the impact of various connectivity patterns on the steady state probabilities in the MDP layer. In the opinion of the author, these probabilities would depend more on how strongly or weakly each network is connected and less on its size (number of nodes).
Conclusion
This paper has discussed a multi-layered model for epidemics control that facilitates the calculation of an optimal control policy. The proposed model is an extension of an existing multiplex network approach for modeling epidemic spread. Implementation of the proposed approach involves the calculation of steady state probabilities for the social networks and their incorporation into the transition probabilities of an MDP model. The resulting optimal policy has been discussed through a numerical example, and some trends in the optimal policy have been highlighted. It is evident from the numerical results that the optimal policy finds a tradeoff between resource consumption and corrective measures. For example, the trends indicated in the previous section show that the treatment and awareness campaign frequencies are not solely dependent upon the infection ratio; they also reflect the cost of treatment and the cost of spreading awareness (Figs. 9 and 10). This implies that the infection ratio can be minimized only if we have enough resources for spreading awareness (i.e., if the cost of an awareness campaign is low). Also, the treatment campaigns depend upon the cost of treatment, which in turn depends upon the availability of hospitals and medical staff. Further investigation into the structure of the optimal policy and the development of theoretical results is an avenue of future research on this topic. The major takeaways from the current research are: (1) the MDP policy makes sequential decisions, therefore the decision epoch must be defined with careful deliberation while implementing the proposed approach; (2) the computational complexity grows quickly as the number of networks increases, therefore a distributed (multiple-MDP) approach is recommended for large scale problems, which may require smart lockdowns by the authorities; (3) a spinoff benefit of the proposed approach is that the model allows for cost and chance analysis that can help the authorities define thresholds of emergency in terms of the availability of resources and the curing rate.
Based on the proposed model and the results discussed in the paper, it is evident that the proposed model can be used by a government for taking preventive measures against the spread of an epidemic disease in the following manner. First, the population should be divided into communication and contact networks, e.g., the people who communicate frequently with each other and the people who live or work together, such as family members and coworkers. Next, the costs of hospitalization, vaccination, and the awareness campaign should be entered appropriately into the proposed model for the calculation of the optimal policy. The policy then provides a guideline (optimal decision) as a function of the existing situation (system state). One future direction is to modify the proposed approach by incorporating available resources such as the health budget. Such a modification may enable the resulting policy to provide guidelines for different amounts of health budget; in this way, one can generate appropriate estimates of the health budget based on statistical data regarding the spread of an epidemic disease. Note that the proposed model is only effective when statistical data about the spread of a disease (or an appropriate approximation) is available for the calculation of the associated probabilities.
Although the proposed method presents an improvement over existing networked control approaches, there are some key challenges and limitations that should be discussed here. The two major limitations of the proposed approach are scalability and the availability of information. In terms of scalability, decentralization is inevitable: for large population groups, it may not be feasible to have a single MDP making all decisions, so the problem has to be decomposed into smaller subproblems. The second challenge, the availability of information, is also critical because the state information is required for the determination of the control policy (state information includes the exact ratios of infection and awareness among the population). Furthermore, the state transition probabilities are also determined from statistical data that must be collected before a practical implementation of the proposed approach. In the absence of correct information or data, the MDP model should be replaced with a Partially Observable Markov Decision Process (POMDP) model. Regarding the lack of statistics, machine learning may be employed for learning the trends in state transitions. Such investigations constitute avenues for future work. Another possible future direction is to incorporate the study of human behavior (as discussed in [17]) into the network-layer model of the proposed approach (i.e., feeding the state transition probabilities in the MDP by incorporating the cumulative disease information in the MRFC model). Such an innovation may reveal useful insights regarding the effective control of the spread of epidemics.
Data availability
Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.
Declarations
Conflict of interest
The author confirms that there is no associated conflict of interest with this manuscript.
Footnotes
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1.T. Götz, "First attempts to model the dynamics of the Coronavirus outbreak 2020." arXiv preprint arXiv:2002.03821 (2020)
- 2.Tang K, Huang Y, Chen M. Novel Coronavirus (Covid-19) epidemic scale estimation: topological network-based infection dynamic model. medRxiv. 2020. doi: 10.1101/2020.02.20.20023572. [DOI] [Google Scholar]
- 3.Nowzari C, Preciado VM, Pappas GJ. Analysis and control of epidemics: a survey of spreading processes on complex networks. IEEE Contrl Syst Mag. 2016;36(1):26–46. doi: 10.1109/MCS.2015.2495000. [DOI] [Google Scholar]
- 4.Ketcheson DI. Optimal control of an SIR epidemic through finite-time non-pharmaceutical intervention. J Math Biol. 2021;83:7. doi: 10.1007/s00285-021-01628-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Balderrama R, Peressutti J, Pinasco JP, et al. Optimal control for an SIR epidemic model with limited quarantine. Sci Rep. 2022;12:12583. doi: 10.1038/s41598-022-16619-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.G. Zauner, G. Popper, and Felix Breitenecker. "Evaluation of different modeling techniques for simulation of epidemics." Proceedings of the 7th EUROSIM Congress on Modelling and Simulation. Vol. 2. 2010.
- 7.Watkins NJ, Nowzari C, Pappas GJ. Robust economic model predictive control of continuous-time epidemic processes. IEEE Trans Autom Control. 2020;65(3):1116–1131. doi: 10.1109/TAC.2019.2919136. [DOI] [Google Scholar]
- 8.Wang Z, Xia C, Chen Z, Chen G. Epidemic propagation with positive and negative preventive information in multiplex networks. IEEE Transac Cybernet. 2021;51(3):1454–1462. doi: 10.1109/TCYB.2019.2960605. [DOI] [PubMed] [Google Scholar]
- 9.A. Nasir and H. Rehman, "Optimal control for stochastic model of epidemic infections," 2017 14th International Bhurban Conference on Applied Sciences and Technology (IBCAST), Islamabad, 2017, pp. 278–284.
- 10.Nasir A, Baig H, Rafiq M. Epidemics control model with consideration of seven-segment population model. SN Appl Sci. 2020;2:1674. doi: 10.1007/s42452-020-03499-z. [DOI] [Google Scholar]
- 11.Granell C, Gómez S, Arenas A. Dynamical interplay between awareness and epidemic spreading in multiplex networks. Phys Rev Lett. 2013;111(12):128701. doi: 10.1103/PhysRevLett.111.128701. [DOI] [PubMed] [Google Scholar]
- 12.Zheng C, Wang Z, Xia C. A novel epidemic model coupling the infectious disease with awareness diffusion on multiplex networks. Chinese Contr Decision Confer (CCDC) 2018;2018:3824–3830. doi: 10.1109/CCDC.2018.8407787. [DOI] [Google Scholar]
- 13.A. Alaeddini, and D. Klein. "Optimal Immunization Policy Using Dynamic Programming." arXiv preprint arXiv:1910.08677 (2019).
- 14.Gao S, Dai X, Wang L, Perra N, Wang Z. Epidemic spreading in metapopulation networks coupled with awareness propagation. IEEE Transactions on Cybernetics. 2022 doi: 10.1109/TCYB.2022.3198732. [DOI] [PubMed] [Google Scholar]
- 15.Darabi Sahneh F, Scoglio C, Van Mieghem P. Generalized epidemic mean-field model for spreading processes over multilayer complex networks. IEEE/ACM Transac Netw. 2013;21(5):1609–1620. doi: 10.1109/TNET.2013.2239658. [DOI] [Google Scholar]
- 16.Wang Z, Guo Q, Sun S, Xia C. The impact of awareness diffusion on SIR-like epidemics in multiplex networks. Appl Math Comput. 2019 doi: 10.1016/j.amc.2018.12.045. [DOI] [Google Scholar]
- 17.Bi K, Chen Y, Zhao S, Ben-Arieh D, Chih-Hang John Wu. Modeling learning and forgetting processes with the corresponding impacts on human behaviors in infectious disease epidemics. Comput Ind Eng. 2019 doi: 10.1016/j.cie.2018.04.035. [DOI] [Google Scholar]
- 18.Zhao S, Kuang Y, Chih-Hang Wu, Bi K, Ben-Arieh D. Risk perception and human behaviors in epidemics. IISE Transac Healthcare Syst Eng. 2018;8(4):315–328. doi: 10.1080/24725579.2018.1464085. [DOI] [Google Scholar]
- 19.Hamer WB, Birr T, Verreet J-A, Duttmann R, Klink H. Spatio temporal prediction of the epidemic spread of dangerous pathogens using machine learning methods. ISPRS Int J Geoinf. 2020;9(1):44. doi: 10.3390/ijgi9010044. [DOI] [Google Scholar]
- 20.Aurell A, et al. Optimal incentives to mitigate epidemics: a Stackelberg mean field game approach. SIAM J Contr Optim. 2022 doi: 10.1137/20M1377862. [DOI] [Google Scholar]
- 21.Goscé L, Phillips PA, Spinola P, Gupta DRK, Abubakar PI. Modelling SARS-COV2 spread in London: approaches to lift the lockdown. J Infect. 2020;81(2):260–265. doi: 10.1016/j.jinf.2020.05.037. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Kraemer MUG, et al. The effect of human mobility and control measures on the COVID-19 epidemic in China. Science. 2020;368(6490):493–497. doi: 10.1126/science.abb4218. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Chinazzi M, et al. The effect of travel restrictions on the spread of the 2019 novel coronavirus (COVID-19) outbreak. Science. 2020;368(6489):395–400. doi: 10.1126/science.aba9757. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Markovic R, Šterk M, Marhl M, Perc M, Gosak M. Socio-demographic and health factors drive the epidemic progression and should guide vaccination strategies for best COVID-19 containment. Results Phys. 2021 doi: 10.1016/j.rinp.2021.104433. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Gosak M, Duh M, Markovic R, Perc M. Community lockdowns in social networks hardly mitigate epidemic spreading. New J Phys. 2021 doi: 10.1088/1367-2630/abf459. [DOI] [Google Scholar]
- 26.Mancastroppa M, Guizzo A, Castellano C, Vezzani A, Burioni R. Sideward contact tracing and the control of epidemics in large gatherings. J R Soc Interface. 2022;19:20220048. doi: 10.1098/rsif.2022.0048. [DOI] [PMC free article] [PubMed] [Google Scholar]