PLOS One. 2014 May 1;9(5):e95133. doi: 10.1371/journal.pone.0095133

Epidemic Contact Tracing via Communication Traces

Katayoun Farrahi 1,*, Rémi Emonet 2, Manuel Cebrian 3,4,5
Editor: Renaud Lambiotte 6
PMCID: PMC4006791  PMID: 24787614

Abstract

Traditional contact tracing relies on knowledge of the interpersonal network of physical interactions, where contagious outbreaks propagate. However, due to privacy constraints and noisy data assimilation, this network is generally difficult to reconstruct accurately. Communication traces obtained by mobile phones are known to be good proxies for the physical interaction network, and they may provide a valuable tool for contact tracing. Motivated by this assumption, we propose a model for contact tracing in which an infection spreads in the physical interpersonal network, which can never be fully recovered, while contact tracing occurs in a communication network that acts as a proxy for the first. We apply this dual model to a dataset covering 72 students over a 9-month period, for which both the physical interactions and the mobile communication traces are known. Our results suggest that a wide range of contact tracing strategies may significantly reduce the final size of the epidemic, mainly by affecting its peak of incidence. However, we find that for low overlap between the face-to-face and communication interaction networks, contact tracing is only efficient at the beginning of the outbreak, due to rapidly increasing costs as the epidemic evolves. Overall, contact tracing via mobile phone communication traces may be a viable option to arrest contagious outbreaks.

Introduction

There is great potential to deepen our understanding of disease dynamics through the analysis of digital traces of individual and collective behaviour [1]-[7]. This is because, traditionally, data on how epidemics propagate in a population have been collected through individual self-reporting, which is known to be severely biased [8]-[10]. The bias is documented both in the self-reporting of symptoms and in the recollection of recent face-to-face interactions that could lead to contagion [11]. Increasing awareness of personal data privacy is unlikely to ease this limitation in the near future [12].

There are already examples in digital epidemiology that use large-scale digital traces for simulation. For instance, a large-scale sociotechnological network based on Facebook data was used to study the role of community structure in disease dynamics [13]. Close proximity interactions (CPIs) captured using wireless sensors have also been used to map the fine-grained, face-to-face interactions of a community [14]. By studying the CPIs of an American high school community and recovering the contact network, new prevention strategies were designed with the potential to be more effective than random immunization. Infectious disease dynamics have also been simulated with a focus on temporal and heterogeneity aspects, using CPIs sensed with RFID tags over a two-day period at a conference [15].

While these previously investigated sources of digital sensing (Facebook and CPIs from wearable badges) have the advantage of capturing large-scale interactions continuously, giving a more complete estimate of real human interactions, they also present some limitations. Online social networks represent online social behaviours, which differ from the physical proximity interactions through which disease transmission occurs, and may fail to capture the fine-grained, face-to-face interaction dynamics relevant for disease transmission [10]. CPI monitoring using wearable badges is a costly and limited resource that requires participants to wear an additional sensing device, and therefore cannot readily be extended to the population at large or to larger scales in the immediate future (e.g. Salathé et al. consider one school day [14], Stehlé et al. consider two conference days [15], and Isella et al. consider one week [16]).

In this regard, mobile phones provide a promising resource as they are ubiquitously carried by the population, irrespective of socio-economic status, and provide a much larger-scale, data-driven opportunity for epidemiology. Further, mobile phones are carried by people when they travel overseas, potentially serving as a global physical proximity sensor. Their pervasiveness in developing countries, where pandemic prevention is most critical, makes them a viable option [17].

Our present effort focuses on exploiting these phone communication and interaction traces for epidemic simulation and contact tracing [18]. Communication traces obtained by mobile phones are known to be good proxies for the physical interaction network [19]-[21], and therefore our goal is to consider how to simulate a contact tracing model over mobile sensed interaction data. We aim to explore the potential of communication datasets to serve as a realistic cue for physical proximity interactions at large.

We develop a model where the infection takes place over the close-proximity physical network (which can never be fully recovered in reality), and assume contact tracing occurs on a differing network, in this case a communication (phone, sms) inferred one. We explore the contact tracing model proposed in detail, particularly focusing on tracing efforts on noisy networks, representing a perturbed subset of the ideal network. Finally, we simulate our proposed model over the real mobile phone interaction data dynamics, demonstrating mobile phone interactions are a promising tool for large-scale epidemic simulations, and mobile phone communication logs can be used as a concrete source for contact tracing reducing the effects of an epidemic. Just as optimizing immunization strategies is of great interest if only incomplete immunization is possible [14], optimizing contact tracing is of great interest if tracing with incomplete or noisy information is the only possibility. In this regard, we also consider the complex relationship between physical and digital interaction overlap and contact tracing effort. This study can inform health policies aiming to use communication traces for contact tracing.

Methods

Contact Tracing

We consider a population of N individuals whose connections to each other form a graph. The degree Inline graphic of a node n is the number of links between n and the other individuals in the population. The mean degree of the network is represented by Inline graphic. We do not assume that all nodes have the same degree: as in random graphs, the node degrees follow a Poisson distribution. Individuals can be in one of four different states: susceptible (S), infected (I), traced (T), or recovered (R). The state transitions are defined as in the contact tracing model by Tsimring and Huerta [22], and the symbols are defined in Table 1. We summarize the contact tracing model below.

Table 1. Dual model symbol description.

N population size.
Inline graphic a small interval of time.
Inline graphic constant determining infection rate.
Inline graphic the ideal network in which the epidemic is actually spreading.
Inline graphic the number of infected neighbours of node Inline graphic in network Inline graphic.
Inline graphic constant determining random tracing rate.
Inline graphic constant determining contact tracing rate.
Inline graphic the dual network which is used for contact tracing.
Inline graphic the number of traced neighbours of node Inline graphic in network Inline graphic.
Inline graphic tracing-policy constant controlling the fading time for contact tracing.

Infection (S → I)

Initially, the whole population is susceptible to infection. One node is subsequently randomly infected, which then starts to infect its neighbours and may initiate an outbreak. The probability that a susceptible node becomes infected is given by Inline graphic, where Inline graphic represents the number of infectious neighbours of node n and Inline graphic is a small time interval.

Tracing (I → T)

We assume there is no spontaneous recovery; individuals become traced for a certain period of time, after which recovery takes place and the individual becomes removed. There are two types of tracing effort to identify infected individuals: random checking and contact tracing. Random checking is done by choosing an arbitrary node with probability Inline graphic. Contact tracing of a node n is done with probability Inline graphic, where Inline graphic is the number of neighbours of n which are in the traced state T. For simplicity, traced people can no longer infect other nodes. Additionally, in the current model an infectious individual that does not become traced remains infected: an infected node cannot recover (or become removed) unless it is traced.

Removal (T → R)

Traced individuals are transformed into the removed state, or recovered state, and are unable to become infected again. A node can recover from the traced state with a probability given by Inline graphic.

The contact tracing model can be summarized by the following equations.

[Equation (1), infection transition: equation image not recovered]
[Equation (2), tracing transition: equation image not recovered]
[Equation (3), removal transition: equation image not recovered]
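As a reading aid, the three transitions described in the text can be written compactly as follows. This is a hedged restatement, not the published equations: the symbols β (infection constant), γ (random tracing constant), c (contact tracing constant), τ (fading time), I_n (infected neighbours of node n), T_n (traced neighbours of node n) and Δt (small time interval) are notation assumed here for illustration. The sketch also assumes that random checking and contact tracing combine additively, and that removal occurs at the constant rate 1/τ.

```latex
\begin{align}
P_n(S \to I) &= \beta \, I_n \, \Delta t, \\
P_n(I \to T) &= \left(\gamma + c \, T_n\right) \Delta t, \\
P_n(T \to R) &= \frac{\Delta t}{\tau}.
\end{align}
```

Under these assumptions the three expressions are valid probabilities only when Δt is small enough that each right-hand side stays below 1.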

Dual Model for Contact Tracing

We first study a dual network topology which accounts for differing edge formations between the infection and tracing networks. Given the contact tracing model defined by equations 1-3, the traditional underlying assumption is that Inline graphic and Inline graphic are obtained from one network [22]. Here, we propose to relax this assumption and introduce two networks: one in which disease is spreading, used in equation 1, and one in which tracing occurs, used in equation 2. Our proposed extension, which we refer to as the dual model since it considers two differing networks, represents a more realistic situation in practice, which we simulate and validate using a concrete data source from mobile sensed interactions.

We define the network of physical interactions as Inline graphic. This is the network in which infection transmission takes place. It is also the network from which the number of infected nodes in proximity with node n, Inline graphic, is obtained. In this paper, we assume that this network can never be fully recovered due to, for example, people forgetting whom they interacted with, contagion through strangers and objects, or the impossibility of ever obtaining a complete network of face-to-face interactions of a population at large [11]. We therefore assume the existence of a dual network, Inline graphic, which is a noisy subset of Inline graphic. The derivation of Inline graphic for simulation purposes is described in section “Dual Network Topology”. We assume Inline graphic can be obtained by concrete means since it is the network from which tracing will take place. We consider a novel and automatic means of obtaining Inline graphic in this paper, by considering Inline graphic to be taken from ubiquitously sensed interactions obtained by people’s mobile phones. The parameter Inline graphic is taken from the dual network Inline graphic, which differs from Inline graphic (in [22] Inline graphic is taken from the same network as Inline graphic).

Next, we propose a formal method for obtaining Inline graphic from Inline graphic, and later demonstrate the differences in simulating such a dual network topology in comparison to an ideal topology. We then demonstrate the successful use of contact tracing on the dual network based on phone call history by simulating the dual network topology over real interaction and communication patterns obtained by a community’s mobile phone logs.
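The dual model can be illustrated with a single discrete update step in which infection reads the physical network while tracing reads the communication network. This is a minimal sketch, not the authors' implementation: the function name `step`, the parameter names `beta`, `gamma`, `c`, `tau`, `dt`, the additive form of the tracing probability, and the removal probability `dt / tau` are all assumptions made for illustration.

```python
import random

def step(state, infect_nbrs, trace_nbrs, beta, gamma, c, tau, dt, rng=random):
    """Advance every node by one interval dt and return the new state dict.

    state:       dict node -> 'S' | 'I' | 'T' | 'R'
    infect_nbrs: adjacency (dict node -> set) of the physical network
    trace_nbrs:  adjacency of the communication (dual) network
    All parameter names are hypothetical; the update is synchronous,
    i.e. every transition is decided from the old state.
    """
    new_state = dict(state)
    for n, s in state.items():
        if s == 'S':
            # Infection uses the physical network only.
            infected = sum(1 for m in infect_nbrs.get(n, ()) if state[m] == 'I')
            if rng.random() < beta * infected * dt:
                new_state[n] = 'I'
        elif s == 'I':
            # Tracing uses the dual (communication) network only.
            traced = sum(1 for m in trace_nbrs.get(n, ()) if state[m] == 'T')
            if rng.random() < (gamma + c * traced) * dt:
                new_state[n] = 'T'
        elif s == 'T':
            # Assumed removal probability: dt divided by the fading time.
            if rng.random() < dt / tau:
                new_state[n] = 'R'
    return new_state
```

Passing the same adjacency for both networks recovers the single-network setting, which makes it easy to compare the ideal and dual cases under otherwise identical parameters.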

Dual Network Topology

We define below the process by which we generate the dual network from the ideal network. By removing a portion of the actual ties we simulate a scenario in which the communication traces are only capturing a subset of the actual links. By adding new ties, we simulate the case where communication traces provide dyadic interactions that do not happen in the real world, only in the digital realm. One important measure for our study is the overlap between the two networks, which corresponds to the proportion of links that are present in both networks. The dual network topology is generated as follows:

  1. First generate the physical proximity network, Inline graphic, in which disease is spreading.

    • Generate the proximity network of N nodes. We assume N = 1000 in all of our simulated experiments.

    • Generate N*K unique links, where K is the average outgoing node degree. This results in Inline graphic bi-directional links, i.e. the disease can propagate in both directions.

  2. Second, generate the dual network, Inline graphic, which is a noisy version of the physical proximity network.

    • Remove Inline graphic links (and thus Inline graphic bi-directional links) from the network, where Inline graphic is the average number of forgotten links (per node) which cannot be traced for some arbitrary reason.

    • Add Inline graphic new links which were not among the original ones.
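The generation procedure above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function name `make_networks` and the parameters `k` (ideal links per node) and `m` (links per node removed, and the same number added) are hypothetical, and each link is stored once as an undirected pair.

```python
import random

def make_networks(n_nodes, k, m, seed=0):
    """Return (ideal_edges, dual_edges) as sets of undirected (u, v) pairs, u < v.

    k: ideal links per node, so n_nodes * k links in total (assumption).
    m: links per node removed from the ideal network and replaced by
       the same number of new links that were not in the ideal network.
    """
    rng = random.Random(seed)

    def random_edge():
        u, v = rng.sample(range(n_nodes), 2)  # two distinct nodes
        return (min(u, v), max(u, v))

    # Step 1: physical proximity network with n_nodes * k unique links.
    ideal = set()
    while len(ideal) < n_nodes * k:
        ideal.add(random_edge())

    # Step 2a: forget n_nodes * m of the true links.
    removed = set(rng.sample(sorted(ideal), n_nodes * m))

    # Step 2b: add the same number of links absent from the ideal network.
    added = set()
    while len(added) < n_nodes * m:
        e = random_edge()
        if e not in ideal:
            added.add(e)

    dual = (ideal - removed) | added
    return ideal, dual
```

Because the number of added links equals the number removed, the dual network keeps the same number of links as the ideal one, and only the overlap between the two changes with `m`.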

In the reported experiments we used Inline graphic. The first motivation for having Inline graphic is to be able to vary the overlap between networks while keeping a constant size (number of links) for the tracing network. We thus do not need to apply a corrective factor to the Inline graphic parameters. The second motivation comes from the observation that in real applications, Inline graphic can be either greater than Inline graphic (e.g., if the contact tracing uses a highly connected online social network) or less than Inline graphic (e.g., if we use meeting agendas of people). Although we use Inline graphic, figure 1 provides an example of the impact of varying Inline graphic in the case of an overlap of 8%.

Figure 1. Average maximum and total infected over Inline graphic.

Average maximum and total number of infected people for a network overlap Inline graphic, while varying the ratio Inline graphic between the number of removed and added edges. The known network (used for contact tracing) is supposed to be a noisy version of the real network (in which epidemics spread), obtained by removing some edges and adding new ones.

Note that Inline graphic is a noise factor parameter used to generate the dual network. This parameter Inline graphic is related to the percentage of overlap, Inline graphic, between Inline graphic and Inline graphic using the following relation:

[Equation (4), relation between the noise parameter and the network overlap: equation image not recovered]

These equations can be re-derived with the help of figure 2.

Figure 2. Overlap illustration.

Illustration of the overlap in terms of links between the ideal network and the dual network depending on Inline graphic, Inline graphic and Inline graphic. The intersection of the two networks, in blue, is of size Inline graphic and the union is of size Inline graphic.
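The relation between the noise parameter and the overlap can be re-derived from this picture. The derivation below is a sketch under assumed notation, since the original symbols are not available in this copy: K is the number of ideal links per node, M the number of links removed per node, M' the number of new links added per node, and Ω the overlap, taken here to be the size of the intersection divided by the size of the union.

```latex
\begin{align}
|G_{\text{ideal}} \cap G_{\text{dual}}| &= N (K - M), \\
|G_{\text{ideal}} \cup G_{\text{dual}}| &= N (K + M'), \\
\Omega &= \frac{K - M}{K + M'} .
\end{align}
% With as many links added as removed (M' = M):
\begin{align}
\Omega = \frac{K - M}{K + M}
\quad\Longleftrightarrow\quad
M = K \, \frac{1 - \Omega}{1 + \Omega} .
\end{align}
```

For example, under these assumptions K = 4 and M = 1 give Ω = 3/5, and Ω = 1 is recovered only when no links are removed (M = 0).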

Dataset

We present the dataset that motivates our dual model, and whose parameters, network structure, and dynamics are used in the rest of this paper. The participants in the study represent Inline graphic of the total population of an undergraduate dormitory in a North American university previously explored for other applications [23]. The dataset consists of the mobile phone logs of 72 participants, roughly equally distributed across all four academic years, and Inline graphic male. The participants used Windows Mobile devices for data collection as their primary phones with their existing voice plans. Data was collected over a 9-month period from October 2008 to June 2009. The data used in this study consists of Inline graphic Bluetooth interactions, Inline graphic phone call records, and Inline graphic SMS records. The study had IRB approval by the Massachusetts Institute of Technology Committee on the Use of Humans as Experimental Subjects (MIT COUHES) and written informed consent from participants; further details of the dataset can be found in [23].

We consider interaction data logged by the mobile phones. Bluetooth sensors monitored the physical proximity interaction. Other non-physical interactions were monitored by phone communication logs including phone calls and SMS activity. We only consider phone communication and proximity interaction with other study participants (known devices to the study). The data has been previously studied in the framework of real-life health and obesity diffusion [23], [24], opinion diffusion [25], as well as community relationship and interaction pattern analysis [26].

For each of the mobile phone proximity interaction (Bluetooth) and communication (call and SMS) events sensed, we consider the number of events (regardless of their duration), including missed calls. Users correspond to nodes, and undirected edges to interactions. The edges are weighted by the number of events. By considering the number of events, we can readily combine the two types of phone communication logs (calls and SMS). Phone communication is directed whereas Bluetooth proximity is undirected; we therefore symmetrize the data and assume undirected links, so that the proximity interactions can be easily compared to the communication data. The static average daily networks for the phone communication and physical proximity interactions can be seen in figure 3 (a) and (b), respectively. Note that the nodes are consistent between the two visualizations and that the physical proximity network is highly connected in comparison to the phone communication network. In this paper (in section “Contact tracing with empirical data in a dual network configuration”), we show that simulating the epidemic spread on network (b), including its dynamic nature, while tracing only on network (a) suffices to reduce the effects of an epidemic, even given the sparse nature of network (a) in this dataset.
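The construction described above (event counts as weights, symmetrized edges, participants only) could be sketched as follows. The function name `build_weighted_network` and the input format, a list of (source, target) pairs with one entry per sensed event, are assumptions for illustration and do not describe the actual dataset schema.

```python
from collections import Counter

def build_weighted_network(events, known_devices):
    """Count events per unordered pair of study participants.

    events:        iterable of (source, target) pairs, one per sensed event
                   (call, SMS, or Bluetooth sighting); duration is ignored.
    known_devices: set of participant identifiers; events involving
                   anyone outside this set are dropped.
    Returns a Counter mapping frozenset({a, b}) -> number of events.
    """
    weights = Counter()
    for a, b in events:
        if a == b or a not in known_devices or b not in known_devices:
            continue
        # frozenset discards direction, which symmetrizes the data.
        weights[frozenset((a, b))] += 1
    return weights
```

Since the weights are plain event counts, call and SMS logs can be combined by concatenating the two event lists before calling the function.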

Figure 3. Static network visualizations of the data.

The static networks obtained by the overall average number of daily mobile phone (a) communication (call and sms) and (b) physical proximity interactions.

Next, we consider the overlap between the real physical proximity and communication networks (figure 3 (a) and (b)) in more detail. We obtain the percentage of overlap between the communication network, Inline graphic, and the physical proximity network, Inline graphic, in figure 4, representing a key parameter for our dual model. For each participant, we compute the percentage of overlap within their community, which can be found using the relation Inline graphic. We plot the distribution of the average user’s overlap on a log-log scale. In figure 4 (a), the overlap is considered over an accumulated static network over the entire duration of the study, whereas in (b) the monthly accumulated static networks are considered. First we consider the overall networks. The maximum overlap is Inline graphic, meaning one user communicates by phone with about a quarter of the people they interacted with in the community over the duration of the study. There is never more than Inline graphic overlap between these networks. The minimum is no overlap (Inline graphic), meaning some users never called anyone within the community of people they interacted with. The overall network overlap (in terms of common edges) between the mobile phone network and the Bluetooth interaction network is Inline graphic. We consider this overlap between the two networks obtained by mobile phone sensed data as an approximate realistic measure for Inline graphic, where we set Inline graphic as a lower bound of Inline graphic to account for additional measurement error. Figure 4 (a) shows that the probability of having less than Inline graphic overlap is quite high, and drops significantly for greater than Inline graphic overlap. We do not assume Inline graphic to be equivalent to Inline graphic; however, as Inline graphic can never be fully obtained in reality, we use the mobile phone data to obtain an estimate of Inline graphic. Our experimental results are presented over a range of Inline graphic, though Inline graphic is used as a data-driven approximation and is the focus for the discussion.
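The per-user and overall overlap computations can be sketched as below. The exact relation used in the paper is not available in this copy, so the sketch assumes the overlap is the intersection divided by the union (of a user's neighbour sets, or of the two edge sets); the function names are hypothetical.

```python
def neighbours(edges, user):
    """Set of users sharing an edge with `user`; edges are frozenset pairs."""
    return {v for e in edges if user in e for v in e if v != user}

def user_overlap(comm_edges, prox_edges, user):
    """Assumed per-user overlap: shared neighbours over all neighbours (0..1)."""
    a = neighbours(comm_edges, user)
    b = neighbours(prox_edges, user)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def network_overlap(comm_edges, prox_edges):
    """Assumed overall overlap: common edges over all edges (0..1)."""
    comm, prox = set(comm_edges), set(prox_edges)
    union = comm | prox
    return len(comm & prox) / len(union) if union else 0.0
```

A user who never phones anyone they met in person gets an overlap of 0 under this definition, which matches the minimum described in the text.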

Figure 4. Real data-driven network overlap.

(a) Distribution of % overlap between the overall communication and Bluetooth networks on a log-log scale. (b) Monthly variations in the % overlap between the communication and Bluetooth networks averaged over all users.

In figure 4 (b) we plot the monthly variations in the overlap between the phone and Bluetooth networks. We plot three curves: (1) ‘overall’ is the mean overall overlap between Inline graphic and Inline graphic computed over the Inline graphic months from (a); (2) ‘monthly’ considers only the interactions which occurred during the specified month; and (3) ‘cum mo’ is the cumulative monthly overlap, which considers all the interactions that occurred up until the specified month. In any specific month, the overlap between these networks is much lower than the overall average. With our data, the cumulative monthly overlap approaches the overall mean after about Inline graphic months.

Simulation Results

Dual Networks

First we simulate the various network configurations to compare the spread of infection characteristics over the full range of the overlap parameter Inline graphic. For simulation, we assume a population of Inline graphic nodes and plot results as averages over Inline graphic random trials. The model parameters are as follows, Inline graphic, Inline graphic, and Inline graphic to Inline graphic. We assume Inline graphic. This corresponds to Inline graphic days, Inline graphic days, and Inline graphic ranging over 10 to 0 days and Inline graphic is chosen to be less than 1 second. These parameters correspond to a familiar SIR-type model with rescaled parameters and similar dynamics as in [22]. For comparison, the realistic infectious periods for various infectious diseases can be found in Table 5.1 in [27]. The optimal network case (considered in [22]) occurs at Inline graphic, representing 100% overlap between the infectious and tracing networks. The minimal network overlap, Inline graphic, suggested by real data is also highlighted in the results. For every simulation, there is one initial randomly selected infectious case. In the theoretical simulations (not using the real data), we assume Inline graphic.

Impact of tracing and network overlap on the size of the outbreak

Figure 5 presents how the peak of the epidemic is affected by the overlap Inline graphic, with Inline graphic, Inline graphic, Inline graphic, Inline graphic. Figure 5 (a) shows the value of the peak of the epidemic (i.e., the maximum value in curves such as those in figure 6). In general, the greater the overlap between the tracing and disease spreading networks, the more effective contact tracing is. More precisely, the maximum number of infected people decays exponentially with the network overlap (linear slope on a log scale), and the intensity of the decay increases with the contact tracing rate Inline graphic.

Figure 5. Theoretical epidemic simulations over varying tracing efforts and network overlap.

(a) The maximum number of infected individuals (representing the peak of the epidemic), (b) its time of occurrence, and (c)-(d) the overall number of infected individuals on log and non-log scales, respectively; all plotted as a function of Inline graphic, with Inline graphic and Inline graphic. The legend shows the range of contact tracing effort, with Inline graphic to Inline graphic. We can see in (a) that contact tracing is effective in reducing the peak number of infected people, with Inline graphic to Inline graphic times fewer maximum infected cases between Inline graphic and Inline graphic. We plot a line at Inline graphic, representing a minimal network overlap which corresponds to the values suggested by the analysis of mobile phone data (see figure 4). The greater the overlap between the tracing and disease spreading networks, the more effective the tracing. In the ideal but unrealistic case of 100% overlap, a Inline graphic of 2.5 yields Inline graphic times fewer maximum infected people in comparison to the case with Inline graphic. A low overlap such as Inline graphic has little effect on the size of the outbreak (the overall number of infected individuals does not decrease much), but the peak number of infected cases is still lowered. With higher overlap, the peak of infections not only decreases in intensity but also gets delayed (b).

Figure 6. Time varying simulation results of the ideal network scenario and the proposed dual network topology.

The infected population plotted as a function of time for (a) Inline graphic and (b) a network overlap of Inline graphic, where Inline graphic, Inline graphic, Inline graphic, Inline graphic. Contact tracing is always beneficial, even when there is a small overlap between Inline graphic and Inline graphic. We observe that contact tracing becomes increasingly effective as the number of infections increases in both network topologies (a) and (b). However, contact tracing becomes decreasingly effective as the number of infections decreases, particularly in the dual network topology case. This can be seen by the worsened effects of the second and sometimes third peaks for the dual network case (e.g., with Inline graphic).

However, plots from figures 5 (c) and 5 (d) suggest that the total number of infected people behaves differently from the peak value of the epidemic. With a small network overlap (e.g., Inline graphic), an increased contact tracing rate only slightly changes the total number of infected people. It is only with higher overlap (Inline graphic) that the contact tracing rate becomes a key factor in reducing the total size of the epidemic.

Figure 5 (b) shows the time at which the peak of the epidemic occurs. This peak occurs earlier when the contact tracing rate is increased, because the peak is smaller and thus reached earlier. When the tracing rate is high (Inline graphic), we observe that an increased overlap tends to delay the occurrence of the peak. This reflects the ability of contact tracing (with strong overlap) to effectively slow down the spread of the disease.

Temporal impact of contact tracing

The time-varying nature of the epidemic can be seen in figures 6 and 7, where the log of the average number of infected individuals is plotted over time. Figure 6 shows the case with full overlap (Inline graphic) in comparison to the case with Inline graphic overlap, which is the lower bound on the actual average network overlap found in the real social interaction networks (see figure 4). We can see that even with such a low percentage of network overlap, contact tracing continues to be effective as it notably reduces the outbreak of the epidemic. In general, we observe that contact tracing becomes increasingly effective as the number of infected cases increases. In many of the cases there are two to three peaks in the number of infected individuals over time. The differences between the two network configurations Inline graphic (figure 6 (a)) and Inline graphic (figure 6 (b)) become increasingly apparent as the amount of tracing increases: in the more realistic network configuration (b), even a decreased amount of tracing Inline graphic may result in fewer cases and a better tracing scenario. This occurs because there is no second epidemic peak in this scenario (Inline graphic), whereas with larger tracing effort the epidemic is reduced so rapidly that a great deal of the population is left susceptible to a second peak of infections.

Figure 7. Time varying simulation results of our proposed contact tracing dual network topology while varying network overlap, Inline graphic.

We observe the changing effects of the time-varying spread over Inline graphic. The difference in infectious spread over time becomes more apparent in the cases with two peaks, where Inline graphic particularly after the second peak, where an increase in network overlap results in fewer infected cases. Note, the log scale employed to make the graphs easily comparable tends to attenuate the differences between curves within a graph.

In figure 7 we consider four cases of fixed Inline graphic to observe the difference in the effects of the percentage of network overlap in contact tracing. Again, Inline graphic, Inline graphic, Inline graphic, Inline graphic. We see the worst case occurs for no contact tracing based on social interactions (plot (a) where Inline graphic). Different amounts of contact tracing have differing time-varying effects on the disease spreading. For small amounts of contact tracing there is only one peak ((b) Inline graphic). However as soon as Inline graphic increases beyond that point, there are two peaks ((c) Inline graphic, (d) Inline graphic). This is due to the contact tracing becoming so effective that the number of cases drops rapidly, resulting in tracing becoming less effective. Note, this effect is much more attenuated in the dual network case (figure 6 (a) versus (b)) which is the scenario closer to reality due to incomplete network information. This is due to having a smaller probability of tracing effectively (i.e. tracing an infected contact) given the smaller number of infectious cases, whereas in an optimal network case there is a higher chance of effective tracing given a smaller number of infections.

Why does contact tracing work with such low overlap?

Our results have shown that even with very small overlap between the two networks, contact tracing was still effective in limiting the peak size of the epidemic. With low overlap this behaviour might be surprising. It is actually explained by a simple fact: when using contact tracing, an increase in the number of infected people causes an increase in the tracing effort. This adaptation phenomenon is not present when only random tracing is used. We aim here at quantifying whether it is still worth doing contact tracing with a relatively small network overlap or if increasing random tracing is preferable.

We measure the tracing effort defined as the sum of the effort due to random tracing and the effort due to contact tracing:

[Equation (5), total tracing effort as the sum of the random tracing effort and the contact tracing effort: equation image not recovered]

What the formula encodes is that the random tracing (with intensity Inline graphic) is done on the population of both infected and susceptible nodes, as we cannot know in advance who is infected when doing tracing. Similarly, the effort due to contact tracing (with intensity Inline graphic) targets a population made of all the neighbours (in the tracing network) of traced nodes that are still either susceptible or infected.
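The effort measure described above can be sketched as follows. This is one possible reading rather than the published formula: the function name `tracing_effort` and the parameter names `gamma` and `c` are hypothetical, and the contact tracing term is assumed to count every (untraced node, traced neighbour) pair in the tracing network.

```python
def tracing_effort(state, trace_nbrs, gamma, c):
    """Total tracing effort at one instant (assumed form).

    state:      dict node -> 'S' | 'I' | 'T' | 'R'
    trace_nbrs: adjacency (dict node -> set) of the tracing network
    gamma:      random tracing intensity (hypothetical name)
    c:          contact tracing intensity (hypothetical name)
    """
    # Random tracing targets everyone still susceptible or infected,
    # since infection status is unknown before checking.
    traceable = [n for n, s in state.items() if s in ('S', 'I')]
    random_effort = gamma * len(traceable)

    # Contact tracing targets traceable neighbours of traced nodes;
    # each (traceable node, traced neighbour) pair counts once.
    contact_pairs = sum(
        1 for n in traceable for m in trace_nbrs.get(n, ()) if state[m] == 'T'
    )
    return random_effort + c * contact_pairs
```

With 1000 untraced individuals, no traced nodes and a random tracing intensity of 0.20, this gives an effort of 200, in line with the starting value discussed for figure 8.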

In figure 8, we plot both the number of infected nodes and the total tracing effort. Results are presented for the low network overlap case, Inline graphic, unless specified otherwise. The three dotted curves represent the tracing effort profile when no contact tracing is used. Considering the dotted blue curve with only random tracing, Inline graphic and Inline graphic, the tracing effort starts at 200 (1000 individuals times 0.20), and the number of infected nodes grows up to 300 (not shown), then it decreases greatly at the end of the epidemic. Generally, the tracing effort is smaller at the end of the epidemic because the traceable (susceptible and infected) population is reduced. Now, with the addition of contact tracing (Inline graphic and Inline graphic), the tracing effort increases greatly as the epidemic grows, however there is a significant reduction in the number of infected cases (below Inline graphic cases as opposed to Inline graphic).

Figure 8. Average temporal evolution of the tracing effort and the number of infected people with or without contact tracing.

Only the last curve considers the case with complete network overlap (Inline graphic) while all other curves are with Inline graphic.

Comparing the solid green curve (Inline graphic, Inline graphic) with the dotted red curve (Inline graphic, Inline graphic), figure 8 shows that it can be preferable to design a tracing policy that employs a lower random tracing rate but uses contact tracing (Inline graphic, even with an overlap of Inline graphic). There is a trade-off to consider in terms of tracing effort over time, maximum tracing effort, and peak number of infected individuals. At the peak of the epidemic, the tracing effort does exceed that of random tracing alone, but this increase is not constant: the effort is lower than for random tracing alone at the start and end of the epidemic. Further, the peak of the epidemic is reduced when contact tracing is added and random tracing is reduced (Inline graphic, Inline graphic vs. Inline graphic, Inline graphic). A similar trend is observed when the level of random tracing is varied and a reduced random tracing rate is combined with contact tracing. In figure 8, the case of Inline graphic illustrates the positive impact of having perfect knowledge of the spreading network. We observe that better knowledge of the network greatly reduces the size of the outbreak, down to 26, while allowing for very low random tracing (Inline graphic instead of, e.g., Inline graphic).

In general, contact tracing does not require a great effort at the beginning of the outbreak, but rapidly becomes costly as the epidemic evolves. However, it is effective in reducing the size of an epidemic with low network overlap, as is random tracing alone. A direction for future work is to vary the random and contact tracing efforts over time to optimize costs as the epidemic evolves. A tracing policy that includes contact tracing makes it possible both to adapt tracing to the number of infected people and to exploit the known information about people’s interactions. Such policies have the potential to reduce the constant effort required by random tracing, by using contact tracing at particular intervals, while containing an epidemic outbreak at minimal cost.

Relative proportion of contact tracing and random tracing

We observed that one benefit of contact tracing over pure random tracing is that it adapts the tracing effort to the number of detected infections and thus has a varying effort (and cost) over time. To further explore the role of contact tracing, we consider a setup where we assume a fixed amount of tracing effort is available. In such a case, we expect and observe that contact tracing with a low overlap is not advantageous.

In the simulation, we allow a fixed tracing budget (400 units). We allocate a fixed part of this budget to random tracing, and the rest goes to contact tracing. In practice, we continuously adapt the Inline graphic parameters to split the tracing budget between contact tracing and random tracing in the desired proportions. A special case arises at instants when no individual is in state T (traced after being infected), mostly at the beginning of the epidemic. In this case, the full budget goes to random tracing.
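One way to picture this budget split is the following Python sketch, which converts a fixed budget into per-step tracing intensities. The function name and arguments are hypothetical, and the handling of an empty contact-tracing population (treated like the "no individual in state T" case) is an assumption rather than a detail given in the text.

```python
from typing import Tuple


def split_budget(
    budget: float,
    random_share: float,
    n_traceable: int,
    n_contact_targets: int,
) -> Tuple[float, float]:
    """Return (random_rate, contact_rate) that spend a fixed budget.

    budget            total tracing effort available (400 in the text)
    random_share      fraction of the budget given to random tracing, in [0, 1]
    n_traceable       number of susceptible plus infected individuals
    n_contact_targets number of traceable neighbours of traced individuals
    """
    if not 0.0 <= random_share <= 1.0:
        raise ValueError("random_share must lie in [0, 1]")
    if n_traceable == 0:
        # Nobody left to trace, so no effort can be spent.
        return 0.0, 0.0
    if n_contact_targets == 0:
        # Special case: nothing to contact-trace (for example, no one is in
        # state T yet), so the full budget goes to random tracing.
        return budget / n_traceable, 0.0
    # Rates are intensities and are deliberately not clamped to 1 here.
    random_rate = random_share * budget / n_traceable
    contact_rate = (1.0 - random_share) * budget / n_contact_targets
    return random_rate, contact_rate
```

By construction, random_rate times n_traceable plus contact_rate times n_contact_targets equals the budget whenever both populations are non-empty, so the total effort stays constant while the proportions vary.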

Figure 9 shows the average maximum (and total) number of infected people depending on the part of the budget dedicated to random tracing. We provide curves for two different network overlaps (Inline graphic (left), Inline graphic (right)). With complete overlap, we clearly observe that contact tracing is strongly beneficial. Even with this perfect knowledge of the network, results show that spending more than half of the budget on contact tracing brings neither improvement nor degradation. With low overlap (Inline graphic), simulation results show that doing some contact tracing is better than doing none, but that it soon becomes detrimental. From these experiments we can conclude that, with low overlap, the main cause of the previously observed positive impact of contact tracing is actually the adaptive tracing effort.

Figure 9. Average maximum and total numbers of infected people against the amount of random tracing effort.

Simulations consider a network overlap of Inline graphic (left) and Inline graphic (right), when the total tracing effort is constant (400).

Contact Tracing with Empirical Data in a Dual Network Configuration

In this section, we consider experiments on the real data. We apply the dual contact tracing model to the full empirical interaction and communication dataset obtained by mobile sensing. While the physical interactions obtained by Bluetooth are not a complete picture of the interaction history, they do represent a large portion of the interactions (subjects were explicitly asked to leave their Bluetooth on all the time). We consider two timescales over which the real interaction data is aggregated, daily and weekly. These two easily interpretable timescales are chosen to account for the time-specific nature of real data in our evaluation. The results referred to as practice (as opposed to theory) are simulated using only the empirical data (daily and weekly); the real interaction events occurring within the community are used to model the dynamics of the parameters Inline graphic and Inline graphic. The interactions obtained by Bluetooth physical proximity are aggregated over weekly and daily intervals and used directly for Inline graphic to simulate the epidemic. Similarly, the interactions obtained by Bluetooth are first used directly to define the tracing effort (i.e. to determine Inline graphic) on weekly and daily timescales in figure 10, considering the case where Inline graphic. Next, the tracing effort (Inline graphic) is determined only by interactions occurring in the phone communication network, to assess the effects of tracing from the phone communication network in figure 11. In this case Inline graphic, which is the real overlap between these networks inherent to the dataset. The interactions are weighted according to the following equations.
Inline graphic, where Inline graphic represents the total number of interaction events between node Inline graphic and its infectious neighbours within the specified time frame and Inline graphic is the average weight over all connected participants. Inline graphic in all simulation results presented. Overall, Inline graphic (in equation 1) is the mean degree of the infectious interaction network, which is a reweighting of Inline graphic (used in the theoretical simulations) according to the real data. Similarly, Inline graphic.
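The daily and weekly aggregation step described above can be sketched in Python as follows. This is only an illustration of counting interaction events per pair within each time window; the event format (timestamp, node, node), the use of ISO calendar weeks, and the function name are assumptions, and the sketch does not reproduce the weighting equations themselves.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Hashable, Iterable, Tuple

Event = Tuple[datetime, Hashable, Hashable]


def aggregate_events(events: Iterable[Event], window: str) -> Dict[str, Counter]:
    """Count interaction events per undirected pair within each time window.

    window is "daily" or "weekly". Weekly windows follow ISO calendar weeks,
    which is an assumption made for this sketch.
    """
    if window not in ("daily", "weekly"):
        raise ValueError("window must be 'daily' or 'weekly'")
    counts: Dict[str, Counter] = {}
    for timestamp, a, b in events:
        if window == "daily":
            key = timestamp.date().isoformat()
        else:
            iso = timestamp.isocalendar()  # (ISO year, ISO week, weekday)
            key = f"{iso[0]}-W{iso[1]:02d}"
        # Treat interactions as undirected: (a, b) and (b, a) are one pair.
        pair = tuple(sorted((a, b), key=str))
        counts.setdefault(key, Counter())[pair] += 1
    return counts


# Hypothetical events, not taken from the dataset.
example = [
    (datetime(2009, 1, 5, 9, 0), "a", "b"),
    (datetime(2009, 1, 6, 14, 30), "b", "a"),
    (datetime(2009, 1, 12, 8, 15), "a", "c"),
]
daily = aggregate_events(example, "daily")
weekly = aggregate_events(example, "weekly")
```

The per-window pair counts produced this way are the kind of quantity that could then be normalised by an average weight, as the text describes, to obtain time-varying network weights.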

Figure 10. Simulation of contact tracing over the empirical data with Inline graphic.

Only the real physical proximity interactions are used to obtain Inline graphic and Inline graphic. The physical proximity interactions are obtained from the mobile phone Bluetooth data and are incorporated on (a) a weekly scale, and (b) a daily scale.

Figure 11. Dual network scenario simulated over the real mobile phone data.

Bluetooth physical proximity is used for Inline graphic, phone communication logs are used for tracing, Inline graphic.

First, we evaluate the difference between using the physical proximity data in an ideal network scenario and the theoretical case by comparing the model outputs on this community of Inline graphic participants (note Inline graphic in this section). The contact tracing model results obtained over the data-driven proximity network are labeled as weekly data and daily data in figure 12. For the case Inline graphic, the interactions for Inline graphic and Inline graphic are both taken from the physical proximity data since we are assuming perfect network overlap. We assume Inline graphic to remove the additional effects of tracing in the comparison. What we refer to as the theoretical case is the contact tracing model simulated according to equations (1)-(3) without any real data for Inline graphic and Inline graphic. For the theoretical case, the mean degree of the physical proximity network is used to set the network weights; Inline graphic, where Inline graphic is the overall node degree of the mobile phone proximity network (refer to figure S1). Figure 12 shows the difference between the theoretical case and the contact tracing model run over the physical proximity network. We observe that the results for the weekly aggregated interactions are in close agreement with the theoretical case. This confirms that the Bluetooth interaction data collected for this community, considered on a weekly scale, is in good agreement with the theoretical model, though not equivalent. Incorporating daily interaction data results in almost an order of magnitude difference in the peak of the epidemic, illustrating that the theoretical model generally overestimates the epidemic, since real daily interaction patterns are sparser than the weekly ones. Precise details of the simulation scenario can be found in “Simulation Details” in File S1.

Figure 12. Theory versus practice.

Considering the ideal network scenario, we run the simulated contact tracing model with Inline graphic set to the average daily (and weekly) node degree of the data (see figure S1), but consider a simulated network (labeled as Inline graphic). Two data-driven models are considered with the interactions taken from the Bluetooth proximity logs. For all cases, Inline graphic, and therefore Inline graphic. The real data is considered on weekly and daily scales, and Inline graphic are the real physical interactions logged by the community’s Bluetooth sensors.

After comparing the theoretical case with the data-driven case, in which only the physical proximity network is considered in simulating infectious spread, we evaluate the proposed dual network methodology entirely on the real dataset. First we consider the single network case (with Inline graphic) in figure 10. Again, the real physical proximity interaction network is the underlying network both for infection spread and contact tracing, considering (a) weekly aggregated interactions and (b) daily aggregated interactions. In these simulations, we are evaluating the effect of tracing on Bluetooth physical proximity data, which could potentially be available for tracing by service providers. We observe that contact tracing is very effective in reducing the effects of the epidemic on both the weekly and daily timescales. In figure 10 (a) we observe that an optimal tracing strategy occurs on this dataset for Inline graphic, which demonstrates that maximizing tracing effort is not always the best strategy; tracing efficiently is more important. There is no significant difference in the number of infected cases over time when Inline graphic or Inline graphic, meaning this additional effort is wasted in comparison to Inline graphic. The number of interactions occurring on a daily scale is smaller than in the weekly case, and therefore the tracing effort reaches its lower bound sooner (Inline graphic). However, even on such a fine-grained timescale of one day, contact tracing based on the Bluetooth physical interactions captured by the mobile phone plays a significant role in reducing the effects of the epidemic. Next, we consider tracing based only on the phone calling records obtained in the data collection.

In figure 11 we consider the dual network case on the mobile phone data, where the infection spreads on the physical proximity network and the tracing occurs only on the phone communication network. In figure 11 (a) the weekly aggregated physical proximity and phone communication networks are considered, and in (b) the overall aggregated daily networks are considered. Although we show in figure 4 that the overall average overlap between these networks is Inline graphic, we see that the effect of the epidemic is greatly reduced by tracing using the phone call records of the mobile phone users. These effects are less visible on the daily scale than on the weekly scale, which is due to the small number of daily phone communication traces in the dataset (see figure S1 (a)). The results of tracing on the phone communication records illustrate that a very small tracing effort (Inline graphic) is sufficient to reduce the effects of the epidemic, and that these records are a concrete and easily obtainable source for tracing. Note that the participants in this community are likely to make fewer calls to one another since they live together in a dormitory. However, even given this challenging data collection, the results are still very significant, particularly on a weekly scale, where tracing based on users’ phone call records over a given week reduces the effects of an epidemic outbreak.

Discussion

We explore a data-driven avenue for contact tracing in epidemic prevention using social interaction data from mobile phones. Data from a medium-sized real community is considered to gain insight into the relationship between physical interactions and mobile phone communication, and into whether the latter can be exploited to perform contact tracing on the former. We explore the effectiveness of such a strategy using, first, data-driven simulations with realistic parameters extracted from the social network dataset and, then, the full dual realistic network model of physical and communication interactions. Across multiple realistic scenarios for contact tracing, we find that contact tracing is an effective means for epidemic prevention, even when there is a low overlap between the physical and communication networks. When considering tracing effort, we observe that contact tracing is greatly beneficial when the epidemic is starting; however, this effort will increase greatly as the epidemic grows. With low overlap between the physical and communication networks, we find that this effect is mainly due to the automatic adaptation of the tracing effort to the number of infected people. We also uncover the relationship between the network overlap and the proportion of effort spent in random tracing versus contact tracing. The study thus gives insight into what proportion of the effort should be spent on contact tracing depending on the estimated network overlap (how much we trust that the communication network represents the interaction network). While contact tracing is effective in reducing the number of infected cases, a dynamic approach considering a time-evolving combination of random and contact tracing is most promising, and optimization of costs as a function of varying random and contact tracing efforts over time will be considered in future work.
We are also able to uncover the nonlinear relationship between overlap (between physical and communication networks) and contact tracing effort. This is important, as different communication technologies, present and future, are likely to have a different link to physical interactions. Quantifying how the overlap interacts with the tracing effort can inform public health policies aiming to exploit digital communication traces for epidemiology. Overall, we find interactions sensed by mobile phones to be a promising tool for epidemic simulation, particularly for future large-scale scenarios, for example city-scale infectious disease transmission. This work demonstrates mobile phone communication history to be a useful data source in disease prevention by obtaining contact information readily for epidemic contact tracing.

Supporting Information

Figure S1

Averaged user node degree per individual day (or week) in the study. These node degree distributions are plotted on a daily basis (a)–(b) and on a weekly basis (c)–(d). The node degrees averaged over the users and over the days (or weeks) are used to simulate the epidemic in figure 12.

(EPS)

Figure S2

Number of events logged over time. We observe that SMS activity is equally important to consider as call activity in mobile phone communication data. Bluetooth interaction data is highly dependent on time, more so than the phone activity, which remains more constant over time.

(EPS)

Figure S3

Community’s overall network structure. The average node degree as a function of the minimum edge weight for (a) the communication network (call and SMS) and (b) the physical proximity interaction network. Edge weight is defined as the minimum number of events necessary for an edge to connect two nodes in both networks indicating the tie strength between the pair of nodes. The units in both cases are the number of events.

(EPS)

Figure S4

Node degree distributions. Accumulated node degree distribution over the study duration for (a) the phone network and (b) the physical proximity interaction network. Each individual user’s node degree is obtained as an accumulation over the 9 month period, and the distribution is plotted given one overall node degree per user. The node degree for the accumulated static phone network is 7∶3 and for the Bluetooth network is 67∶25.

(EPS)

File S1

(PDF)

Acknowledgments

We would like to thank Anmol Madan and Alex Pentland for collecting and providing the empirical dataset, as well as Juliette Stehlé for help with the network simulation procedure.

Funding Statement

Manuel Cebrian is funded by the Australian Government as represented by The Department of Broadband, Communications and the Digital Economy, and The Australian Research Council through the ICT Centre of Excellence program. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1.Salathé M, Bengtsson L, Bodnar TJ, Brewer DD, Brownstein JS, et al. (2012) Digital epidemiology. PLoS Computational Biology 8. [DOI] [PMC free article] [PubMed]
  • 2. Salathé M, Freifeld CC, Mekaru SR, Tomasulo AF, Brownstein JS (2013) Inuenza a (h7n9) and the importance of digital epidemiology. New England Journal of Medicine 369: 401–404. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Holme P, Liljeros F (2013) Beginning and end times of dyadic relationships control disease spreading in empirical contact data. arXiv preprint arXiv: 13076436.
  • 4. Balcan D, Vespignani A (2011) Phase transitions in contagion processes mediated by recurrent mobility patterns. Nature physics 7: 581–586. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5. Balcan D, Gonçalves B, Hu H, Ramasco JJ, Colizza V, et al. (2010) Modeling the spatial spread of infectious diseases: the GLobal Epidemic and Mobility computational model. Journal of com-putational science 1: 132–145. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6. Balcan D, Colizza V, Gonçalves B, Hu H, Ramasco JJ, et al. (2009) Multiscale mobility networks and the spatial spreading of infectious diseases. Proceedings of the National Academy of Sciences of the United States of America 106: 21484–21489. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Christakis NA, Fowler JH (2009) Connected: The surprising power of our social networks and how they shape our lives. Hachette Digital, Inc.
  • 8. Eagle N, Pentland AS, Lazer D (2009) Inferring friendship network structure by using mobile phone data. Proceedings of the National Academy of Sciences 106: 15274–15278. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Margolin D, Lin YR, Brewer D, Lazer D (2013) Matching data and interpretation: Towards a rosetta stone joining behavioral and survey data. In: Seventh International AAAI Conference on Weblogs and Social Media.
  • 10. Jones JJ, Settle JE, Bond RM, Fariss CJ, Marlow C, et al. (2013) Inferring tie strength from online directed behavior. PloS one 8: e52168. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Pentland AS (2008) Honest Signals: How They Shape Our World. The MIT Press.
  • 12.Barabási AL (2013) Scientists must spearhead ethical use of big data. Politico.
  • 13. Salathé M, Jones JH (2010) Dynamics and control of diseases in networks with community structure. PLoS Comput Biol 6: e1000736. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14. Salathé M, Kazandjieva M, Lee JW, Levis P, Feldman MW, et al. (2010) A high-resolution human contact network for infectious disease transmission. Proceedings of the National Academy of Sciences 107: 22020–22025. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15. Stehlé J, Voirin N, Barrat A, Cattuto C, Colizza V, et al. (2011) Simulation of an seir infectious disease model on the dynamic contact network of conference attendees. BMC medicine 9: 87. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16. Isella L, Romano M, Barrat A, Cattuto C, Colizza V, et al. (2011) Close encounters in a pediatric ward: Measuring face-to-face proximity and mixing patterns with wearable sensors. PLoS ONE 6: e17144. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17. Wesolowski A, Eagle N, Tatem AJ, Smith DL, Noor AM, et al. (2012) Quantifying the impact of human mobility on malaria. Science 338: 267–270. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18. Eames KT, Keeling MJ (2003) Contact tracing and disease control. Proceedings of the Royal Society of London Series B: Biological Sciences 270: 2565–2571. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19. Crandall DJ, Backstrom L, Cosley D, Suri S, Huttenlocher D, et al. (2010) Inferring social ties from geographic coincidences. Proceedings of the National Academy of Sciences 107: 22436–22441. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Wang D, Pedreschi D, Song C, Giannotti F, Barabasi AL (2011) Human mobility, social ties, and link prediction. In: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 1100–1108.
  • 21.Cho E, Myers SA, Leskovec J (2011) Friendship and mobility: user movement in location-based social networks. In: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 1082–1090.
  • 22. Tsimring L, Huerta R (2003) Modeling of contact tracing in social networks. Physika A 325: 33–39. [Google Scholar]
  • 23. Madan A, Cebrian M, Moturu S, Farrahi K, Pentland S (2012) Sensing the health state of a community. IEEE Pervasive Computing 11: 36–45. [Google Scholar]
  • 24.Madan A, Cebrián M, Lazer D, Pentland A (2010) Social sensing for epidemiological behavior change. In: Ubiquitous Computing (UbiComp). Copenhagen, Denmark, 291–300.
  • 25.Madan A, Farrahi K, Gatica-Perez D, Pentland AS (2011) Pervasive sensing to model political opinions in face-to-face networks. In: Pervasive Computing, Springer. 214–231.
  • 26.Farrahi K, Emonet R, Ferscha A (2012) Socio-technical network analysis from wearable interactions. In: Wearable Computers (ISWC), 2012 16th International Symposium on. IEEE, 9–16.
  • 27.Stehlé J (2012) Réseaux de proximité humaine: analyse, modélisation et processus dynamiques. Ph.D. thesis, Aix-Marseille Université.
