PLOS Computational Biology. 2021 Jul 12;17(7):e1009146. doi: 10.1371/journal.pcbi.1009146

OpenABM-Covid19—An agent-based model for non-pharmaceutical interventions against COVID-19 including contact tracing

Robert Hinch 1,*,#, William J M Probert 1,#, Anel Nurtay 1, Michelle Kendall 1,2, Chris Wymant 1, Matthew Hall 1, Katrina Lythgoe 1, Ana Bulas Cruz 1, Lele Zhao 1, Andrea Stewart 1, Luca Ferretti 1, Daniel Montero 3, James Warren 3, Nicole Mather 3, Matthew Abueg 4, Neo Wu 4, Olivier Legat 4, Katie Bentley 5,6, Thomas Mead 5,6, Kelvin Van-Vuuren 5, Dylan Feldner-Busztin 5, Tommaso Ristori 7, Anthony Finkelstein 8,9, David G Bonsall 1,10, Lucie Abeler-Dörner 1, Christophe Fraser 1,11
Editor: Benjamin Muir Althouse 12
PMCID: PMC8328312  PMID: 34252083

Abstract

SARS-CoV-2 has spread across the world, causing high mortality and unprecedented restrictions on social and economic activity. Policymakers are assessing how best to navigate through the ongoing epidemic, with computational models being used to predict the spread of infection and assess the impact of public health measures. Here, we present OpenABM-Covid19: an agent-based simulation of the epidemic including detailed age-stratification and realistic social networks. By default the model is parameterised to UK demographics and calibrated to the UK epidemic; however, it can easily be re-parameterised for other countries. OpenABM-Covid19 can evaluate non-pharmaceutical interventions, including both manual and digital contact tracing, and vaccination programmes. It can simulate a population of 1 million people in seconds per simulated day, allowing parameter sweeps and formal statistical model-based inference. The code is open-source and has been developed by teams both inside and outside academia, with an emphasis on formal testing, documentation, modularity and transparency. Key features of OpenABM-Covid19 are its Python and R interfaces, which have allowed scientists and policymakers to simulate dynamic packages of interventions and help compare options to suppress the COVID-19 epidemic.

Author summary

Throughout the COVID-19 pandemic, computational modelling has been used to inform key uncertainties facing policymakers such as the number of cases and deaths, hospital capacity, tests and contact tracers. Models need to be: sufficiently complex to yield realistic predictions; computationally efficient to allow calibrations; and easy to use so that various policy mixes can be evaluated. OpenABM-Covid19 is a detailed epidemic model of the spread of COVID-19, simulating every individual in a population. Our model enables scientists and policymakers to quickly compare the effectiveness of non-pharmaceutical interventions like lockdowns, testing, quarantine, and digital and manual contact tracing. The model considers a hypothetical city with a default population of 1 million people whose ages and contact patterns are parameterised according to UK demographics. All of the parameters are openly documented and modifiable so that they can be adapted to fit other countries’ data, and refined to match our understanding of COVID-19 as the epidemic progresses. The computer model simulates people’s movement between their homes, workplaces, schools, and social interactions. OpenABM-Covid19 is open source and has been developed collaboratively by teams from academia and industry. Its modularity, documentation, testing framework, and accessibility via Python and R have provided validation, invited contributions, and encouraged wide adoption.

Introduction

The novel coronavirus SARS-CoV-2 first appeared in China in late 2019 and spread across the globe in early 2020, causing several hundred thousand deaths world-wide in the first half of the year and overwhelming health systems [1]. Restrictions on movement were imposed in many countries, with severe impacts on social life, education, and economies [2]. Mathematical models have long been used to explain and forecast the course of epidemics and to predict the effects of public health interventions [3,4]. Most governments and policymakers use mathematical models to inform their decision-making [5]. The scientific community has responded by adapting old models and designing new models to learn more about the COVID-19 epidemic and inform public health.

Compared to compartmental models and branching-process models, agent-based models (ABMs) of the spread of infection allow for a more complete representation of the social contact network in which contagion occurs [6]. Major advantages include the ability to simulate heterogeneity in contact rates and local saturation effects, and the ability to better simulate contact tracing. Alongside other non-pharmaceutical interventions, contact tracing is an important intervention to help reduce the spread of COVID-19 [7,8]. In an ABM, the full history of all contacts can be stored, allowing for the impact of contact tracing to be explored in detail. For example, ABMs can include clustering in the contact network, so if incidence is high in a region of the contact network, an uninfected person who is contact-traced will be protected from this high level of local incidence. A downside of ABMs is that they are comparatively complex to code, are often not very parsimonious, and can be very computationally intensive to run, limiting the ability to explore a wide range of parameter combinations. ABMs have been used throughout the COVID-19 pandemic to inform the public health response [9–14]. Here, we focus on developing OpenABM-Covid19, an agent-based simulation which addresses these downsides by focussing on parsimony, computational efficiency, code transparency, and a robust testing framework.

A particular focus of our work applying OpenABM-Covid19 has been exploring different ways in which contact tracing, and in particular digital contact tracing using mobile phone apps that record proximity events, can contribute to epidemic control [15]. Several other groups have approached this problem with similar ABMs [9,10].

We developed the agent-based model OpenABM-Covid19 to simulate an outbreak of COVID-19 in an urban environment. The default population is one million inhabitants with demographic structure based upon UK-wide census data, and household size and age-structure matched to data from the UK 2011 Census survey (for example, older people tend to live together and young children tend to live with younger adults).

On a daily basis all individuals in the model move between networks representing households and either workplaces, schools, or regular social environments for older people. Individuals also interact through random networks representing public transport, transient social gatherings, etc. Membership of each type of network is determined by age, giving rise to age-assortative mixing patterns. Network parameters are chosen such that the average number of interactions matches the age-stratified data reported in [16]. The number of daily interactions in random networks is drawn from a negative binomial distribution, allowing for rare super-spreading events.
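The negative binomial draw for random-network interactions can be illustrated with a short sketch. It samples through the gamma-Poisson mixture representation using only the Python standard library. The mean of 4 and dispersion of 2 are illustrative assumptions, not the model's calibrated values, and the function names are hypothetical rather than part of OpenABM-Covid19.

```python
import math
import random

def sample_poisson(rng, lam):
    """Knuth's multiplication method; adequate for the small means used here."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def sample_daily_interactions(rng, mean, dispersion):
    """Negative binomial draw expressed as a gamma-Poisson mixture.

    The gamma-distributed rate has mean `mean` and shape `dispersion`, so the
    count has variance mean + mean**2 / dispersion (overdispersed vs Poisson).
    """
    rate = rng.gammavariate(dispersion, mean / dispersion)
    return sample_poisson(rng, rate)

# Assumed, illustrative parameters (not taken from the paper).
rng = random.Random(1)
draws = [sample_daily_interactions(rng, mean=4.0, dispersion=2.0) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / len(draws)
```

The sample variance comes out well above the sample mean, which is the heavy right tail that lets a few individuals have many contacts on a given day.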

Infections are seeded in the population and spread through the networks. Biological and epidemiological characteristics of COVID-19 disease have been derived from the scientific literature. The model takes into account asymptomatic infections and different stages of severity, and includes the simulation of hospitalisations and ICU admissions. Since symptoms, disease progression and infectiousness are highly age-dependent, disease pathways in the model are age-stratified.

The ABM was developed to simulate different non-pharmaceutical interventions including lockdown, physical distancing, self-isolation on symptoms, testing and both manual and digital contact tracing. Modelling contact tracing requires the model to keep a record of previous interactions for a set number of days. A variety of contact tracing algorithms are included in the ABM, including tracing on symptoms and/or after a positive test, notifying first-degree contacts only or second-degree contacts as well, notifying household members or contacts of household members, testing of traced contacts, and imperfections in test-trace-isolate programmes such as delays, missed contacts and partial compliance. The model reports both aggregated data, such as incidence, tests required, individuals quarantined for various reasons, etc., and individual data such as transmission relationships.

OpenABM-Covid19 is available on GitHub (https://github.com/BDI-pathogens/OpenABM-Covid19), including model documentation, dictionaries for input parameters and output files, over 200 tests in a consistent testing framework used in model validation, and examples for running the model. The core of the model is implemented in the C language for speed; however, the model is run via Python using a SWIG interface (see Implementation Details). This interface allows for dynamic intervention strategies to be modelled, as well as providing full transparency about the state of the model. This manuscript was prepared using v1.0 of the model, and code for reproducing all figures in this manuscript from model output is publicly available online (https://github.com/BDI-pathogens/OpenABM-Covid19-model-paper).

OpenABM-Covid19 enables simulation of interventions to help policymakers determine the best options to suppress the COVID-19 epidemic in various settings. Default demographic parameters were chosen to reflect the UK and fit well to the COVID-19 epidemic in England; however, all parameters of the model can be changed by the user.

Results

OpenABM-Covid19 was originally developed for evaluating the design of digital contact-tracing applications for the technology division of the National Health Service (NHSX) in April 2020 [15]. This work quantified the importance of rapid test processing and the level of user uptake required for digital contact-tracing to be effective. In a further study, OpenABM-Covid19 was used to investigate the benefits of deploying an application based on Google’s Exposure Notification System (ENS) in Washington State [17]. This study included a calibration of the model to County-level data and demonstrated that digital contact-tracing provides benefits even when manual contact-tracing is deployed. OpenABM-Covid19 has been used throughout the pandemic by the National Health Service England (NHSE) to model hospital admissions at a regional level in England [18,19].

Whilst the principal aim of this paper is to give a detailed description of the model and its software interfaces, we now demonstrate some results of the model. First we look at some general features of the model including the interaction networks, the infection dynamics and the effect of population size. Next, we consider a case-study of the first wave of the COVID-19 epidemic in England to demonstrate that a lightweight calibration can provide a close fit to several pieces of observed data. Finally, we simulate some of the intervention strategies which can be modelled by OpenABM-Covid19, including the requisite Python or R code, to demonstrate how complex multi-component intervention strategies can be simulated.

General model properties

Interaction networks

In each of the interaction networks, individuals are represented as nodes. Constant and dynamic connections occur between the nodes in the networks, representing interactions between individuals. The three networks represent different types of daily interactions: household, occupation (workplaces, schools or regular social environments for older people), and random (public transport, essential shopping, transient social gatherings) (Fig 1). The interaction networks have two roles in the ABM. First, the infection can be transmitted between two individuals on a day that they interact. Second, the interactions for each individual are stored and can be used for contact tracing. The membership of different networks leads to age-group assortativity in the interactions. Details of the construction of these three networks are given in the Methods. The distributions of the number of interactions on a simulated network by type and by age are shown in Fig 2A and 2B. Note how the mean number of contacts decreases with age, as found in empirical studies [16]. The total number of interactions on the household network by age is shown in Fig 2C. Note how interactions are clustered on the diagonal (people of the same age tend to live together) and on the off-diagonal between children and adults aged 30 to 50 years (families).

Fig 1. Schematic depiction of the interaction networks within OpenABM-Covid19.

The household network is recurrent, the occupation network is a daily sample of a recurrent network, and the random network is transient and rebuilt each day.

Fig 2. Summary of interactions between individuals within OpenABM-Covid19.

(A) Distribution of daily simulated interactions stratified by the network upon which they occur. (B) Distribution of daily simulated interactions stratified by age group. Distribution of daily simulated interactions stratified by age group of both individuals in the (C) occupation, (D) household, and (E) random networks. Summarised interactions are from the first day of a single simulation in a population of 1 million individuals with UK-like demographics and household structure. Zero counts are shown in white in panels C, D, E.

Infection dynamics

The infection is spread by interactions between infected (source) and susceptible (recipient) individuals. The rate of transmission is determined by three factors: the infectiousness of the source, the age-dependent susceptibility of the recipient, and the type of interaction, i.e. on which network it occurred. Details of the infection model and how it was calibrated are in the Methods.
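As a minimal sketch of how the three factors might combine for a single interaction, the code below multiplies them into a rate and converts the rate to a probability with 1 - exp(-rate). Both the functional form and the multiplier values are assumptions made for illustration; the model's actual formulation is the one given in the Methods.

```python
import math

# Hypothetical network multipliers, chosen only for illustration.
NETWORK_MULTIPLIER = {"household": 2.0, "occupation": 1.0, "random": 1.0}

def transmission_probability(infectiousness, susceptibility, network):
    """Probability that one interaction transmits infection.

    The three factors are multiplied into a rate, which is converted to a
    probability with 1 - exp(-rate) so the result always lies in [0, 1).
    """
    rate = infectiousness * susceptibility * NETWORK_MULTIPLIER[network]
    return 1.0 - math.exp(-rate)

p_household = transmission_probability(0.1, 0.8, "household")
p_random = transmission_probability(0.1, 0.8, "random")
```

With the same source and recipient, the household interaction carries the higher probability simply because its assumed multiplier is larger.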

An example of how transmissions can be stratified by the infection status of the source and the age of both source and recipient is depicted in Fig 3. In this simulation of an uncontrolled epidemic, most transmissions occur from pre-symptomatic individuals with mild disease who are more numerous than individuals who go on to develop severe disease, followed by symptomatic individuals with mild disease. Interventions that reduce the rate of growth of transmission will change the relative contributions of different symptomatic stages. Note that the largest number of transmissions occur pre-symptomatically before a mild infection in adults and children of secondary school age.

Fig 3. Transmission events stratified by age of source and recipient and by infectious status of source.

Infectious status of source is specified in panel title. Data are from a single simulated epidemic of 1 million individuals with OpenABM-Covid19 following the first wave of the COVID-19 epidemic in England. Zero counts are shown in white.

An important property of the epidemic is the offspring distribution, which quantifies the amount of super-spreading. The offspring distribution was calculated by counting transmissions from the first 1% of people infected (i.e. the first 10,000 individuals) during the initial exponential growth phase of a simulation (S19 Fig). A negative binomial distribution was fitted using maximum likelihood estimation and gave an estimate of k = 0.5. This estimate is in line with empirical estimates from surveys, which give 0.49–0.52 [20], 0.35–1.18 [21] and 0.32–0.49 [22]. The simulation to estimate k was run using the R interface for OpenABM-Covid19 and the code is shown in S20 Fig.
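The maximum likelihood fit of the dispersion parameter k can be sketched as follows. This is not the code from S20 Fig: it is a standard-library illustration that fixes the mean at the sample mean (its maximum likelihood estimate) and searches a grid of k values. The synthetic offspring counts, the grid, and all function names are assumptions introduced here.

```python
import math
import random
from collections import Counter

def nb_log_likelihood(freq, mean, k):
    """Log-likelihood of a frequency table {count: n} under NB(mean, dispersion k)."""
    log_p = math.log(k / (k + mean))
    log_q = math.log(mean / (k + mean))
    return sum(
        n * (math.lgamma(x + k) - math.lgamma(k) - math.lgamma(x + 1)
             + k * log_p + x * log_q)
        for x, n in freq.items()
    )

def fit_dispersion(counts, grid=None):
    """Grid-search MLE for k; requires a non-zero sample mean."""
    mean = sum(counts) / len(counts)
    freq = Counter(counts)
    if grid is None:
        grid = [0.05 * i for i in range(1, 201)]  # k from 0.05 to 10
    return max(grid, key=lambda k: nb_log_likelihood(freq, mean, k))

def synthetic_offspring(rng, n, mean, k):
    """Gamma-Poisson draws, used only to make test data with a known k."""
    counts = []
    for _ in range(n):
        rate = rng.gammavariate(k, mean / k)
        threshold, count, product = math.exp(-rate), 0, rng.random()
        while product > threshold:  # Knuth's Poisson sampler
            count += 1
            product *= rng.random()
        counts.append(count)
    return counts

counts = synthetic_offspring(random.Random(42), n=5000, mean=3.0, k=0.5)
k_hat = fit_dispersion(counts)
```

A continuous optimiser would give a finer estimate; the grid is used here only to keep the sketch dependency-free.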

The household secondary attack rate was calculated by counting the number of intra-household transmissions from the first person infected in each household who was also among the first 1% of people infected in the initial exponential growth phase of a simulation. The household secondary attack rate for the default parameterisation of the model is 25%, which lies within the range of empirical estimates from Germany (21%) [23], the US (24%) [24] and the Netherlands (28%) [25]. Note that estimates from China are lower [26]; however, the European/US studies are more applicable for a UK simulation.
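The counting described above can be sketched in a few lines. The denominator used here (household size minus one per index case) is an assumption, since the text does not spell it out, and the input format is hypothetical.

```python
def household_secondary_attack_rate(index_cases):
    """Secondary attack rate from (household_size, transmissions_from_index) pairs.

    Assumes each index case exposes household_size - 1 other members.
    """
    cases = list(index_cases)
    contacts = sum(size - 1 for size, _ in cases)
    infections = sum(n for _, n in cases)
    return infections / contacts if contacts else float("nan")

# Three hypothetical households: sizes 4, 2 and 3.
example_rate = household_secondary_attack_rate([(4, 1), (2, 0), (3, 1)])
```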

Population-size effects

We investigate the sensitivity of the model to the total population size and the effect of aggregating sub-populations by estimating the systematic and stochastic variability of key statistics of the epidemic (S16 Fig, details in legend). The analysis showed that the stochastic variation in doubling time (mean 3.5 days) was less than 0.2 days and that in the total number infected (mean 85%) was less than 0.5% for simulations with at least 1 million people. The results showed no measurable difference in the mean value of the statistics for populations greater than 50k. Additionally, there was no measurable difference between running a simulation on a single population and aggregating across sub-populations with the same total population.

Simulated epidemic for first wave in England (spring 2020)

The model was run on a population of 56 million people with UK demographics for the first wave of the COVID-19 epidemic in England, by aggregating 56 simulations of 1 million people. An infection was seeded and grew exponentially with a doubling time of 3.5 days. A nationwide lockdown was introduced when prevalence reached 1.55% and the model then ran for a further 77 days. Several metrics are presented using the same parameterisation but for a simulation of 1 million individuals so that results can quickly be reproduced on a modern laptop or desktop computer. The age-dependent infection fatality ratio (IFR) for a representative simulation is depicted in Fig 4 and presented in Table 1, and is in line with other studies (e.g. [27,28]); other age-dependent outcomes are shown in S1 and S2 Figs. S3 Fig shows the corresponding waiting time distributions.
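The doubling time quoted above corresponds to an exponential growth rate of ln(2) / 3.5 per day. The sketch below computes that rate and, under the simplifying assumption of pure exponential growth in prevalence from a single seed per million people (an illustrative starting point, not the paper's seeding), the time needed to reach the 1.55% lockdown threshold.

```python
import math

def growth_rate_from_doubling_time(doubling_days):
    """Exponential growth rate r (per day) such that exp(r * doubling_days) = 2."""
    return math.log(2) / doubling_days

def days_to_reach(prevalence_target, prevalence_start, doubling_days):
    """Days of pure exponential growth needed to go from start to target prevalence."""
    r = growth_rate_from_doubling_time(doubling_days)
    return math.log(prevalence_target / prevalence_start) / r

r = growth_rate_from_doubling_time(3.5)        # roughly 0.198 per day
days = days_to_reach(0.0155, 1e-6, 3.5)        # assumed start: one in a million
```

The real simulation departs from this idealisation because infected individuals recover and the susceptible pool is depleted, so the figure is only a rough guide.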

Fig 4. Age-stratified infection fatality ratio (IFR) from a single simulation of OpenABM-Covid19.

Simulation in a population of 1 million with UK-like demography and with a lockdown when SARS-CoV-2 prevalence reached 1.55%.

Table 1. Age-stratified infection fatality ratio (IFR) from a single simulation of OpenABM-Covid19.

Age group          IFR (%)
0–9                0
10–19              0
20–29              0
30–39              0.0292
40–49              0.1173
50–59              0.3165
60–69              1.655
70–79              3.7406
80+                9.4691
Whole population   0.8659

Simulation in a population of 1 million with UK-like demography and with a lockdown when SARS-CoV-2 prevalence reached 1.55%.

The main outputs of the model include the number of infected individuals, hospitalisations, ICU admissions and deaths, and for the aggregate simulations of 56 million individuals these can be compared to observed data for England (Fig 5). Additional outputs are the number of people in quarantine and the number of tests required, which are of particular interest when comparing different interventions. Transmissions can be depicted according to their type (pre-symptomatic, symptomatic and asymptomatic). The model provides a good fit to data on the first wave of the COVID-19 epidemic in England with minimal calibration, matching the timing of the peak in daily deaths to within a few days, the trajectory of COVID-19 patients in hospital beds, the peak in hospital admissions, and national estimates of seroprevalence by early June 2020 (Fig 5). Calibration involved two steps. First, a transmission parameter (infectious_rate) was fitted so that the doubling time of deaths matched 3.5 days (estimates of the doubling time in the UK were between 3 and 4 days [29]). Second, a two-dimensional grid search was performed across the prevalence at which a national lockdown was implemented (calibrated to 1.55%) and the reduction in daily contacts under lockdown (calibrated to 0.33 of pre-lockdown levels, similar to values reported in [30] from the first wave in the UK).
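The two-dimensional grid search has a simple structure, sketched below. The loss function is a hypothetical stand-in for "run the model and score it against observed deaths": it is deliberately constructed to have its minimum at the calibrated values quoted above so that the sketch has a known answer, and it says nothing about the shape of the real calibration surface. The grid ranges are also assumptions.

```python
import itertools

def grid_search(loss, prevalence_grid, contact_grid):
    """Return the (lockdown prevalence, contact multiplier) pair with the lowest loss."""
    return min(itertools.product(prevalence_grid, contact_grid),
               key=lambda pair: loss(*pair))

def toy_loss(lockdown_prevalence, contact_multiplier):
    """Hypothetical loss with its minimum planted at (0.0155, 0.33)."""
    return ((lockdown_prevalence - 0.0155) ** 2 * 1e4
            + (contact_multiplier - 0.33) ** 2)

prevalence_grid = [0.0100 + 0.0005 * i for i in range(21)]  # 1.00% to 2.00%
contact_grid = [0.20 + 0.01 * i for i in range(31)]         # 0.20 to 0.50
best = grid_search(toy_loss, prevalence_grid, contact_grid)
```

In a real calibration each loss evaluation would be one or more full simulations, which is why the speed of the model matters for this step.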

Fig 5. Example of model outputs from OpenABM-Covid19 compared to observed data from the first wave in England.

Results are from 50 simulations in a population of 56 million individuals with UK-like demographics and control interventions. The beginning of the national lockdown is 23rd March 2020. Overlaid data are provisional counts of the number of deaths (measured by date of death) involving the coronavirus (COVID-19) registered in England (accessed on 5th June 2020), COVID-19 patients in hospital beds (England), daily hospital admissions (England) from the UK government’s COVID-19 dashboard, and estimates of seroprevalence in England from the UK Office for National Statistics. Simulations are not calibrated to hospitalisation data, which are shown only for completeness.

Non-pharmaceutical interventions and vaccinations

OpenABM-Covid19 can model a range of non-pharmaceutical interventions. Given the many types of intervention and interest in introducing them at different times, the interventions are controlled in the simulation dynamically through the Python interface. This allows for policy interventions to be applied in response to changes in the growth of the epidemic (e.g. stricter policies such as lockdown can be applied when prevalence is above a threshold). Below we give brief descriptions of the interventions; sample Python code is given in S5–S12 Figs with links to Jupyter Notebooks. All model parameters involved with non-pharmaceutical interventions are given in S10 and S11 Tables.
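The pattern of stepping a simulation one day at a time and switching a policy on when prevalence crosses a threshold can be sketched without the real package. `ToyEpidemic` below is a hypothetical deterministic SIR-style stand-in, not the OpenABM-Covid19 API, and its rates are illustrative assumptions; only the 1.55% threshold and the 0.33 contact multiplier are taken from the text.

```python
class ToyEpidemic:
    """Deterministic SIR-style stand-in for a simulation stepped daily."""

    def __init__(self, population=1_000_000, seeds=10, beta=0.4, gamma=0.2):
        self.n = population
        self.infected = float(seeds)
        self.susceptible = float(population - seeds)
        self.beta, self.gamma = beta, gamma   # assumed rates, per day
        self.contact_multiplier = 1.0         # lowered while lockdown is active

    def step(self):
        new_infections = (self.beta * self.contact_multiplier
                          * self.susceptible * self.infected / self.n)
        recoveries = self.gamma * self.infected
        self.susceptible -= new_infections
        self.infected += new_infections - recoveries

    @property
    def prevalence(self):
        return self.infected / self.n

def run_with_threshold_lockdown(model, days, threshold=0.0155, lockdown_multiplier=0.33):
    """Step daily; apply lockdown the first time prevalence exceeds the threshold."""
    lockdown_day, history = None, []
    for day in range(days):
        model.step()
        if lockdown_day is None and model.prevalence > threshold:
            model.contact_multiplier = lockdown_multiplier
            lockdown_day = day
        history.append(model.prevalence)
    return lockdown_day, history

lockdown_day, history = run_with_threshold_lockdown(ToyEpidemic(), days=150)
```

The key design point is that the decision is made inside the loop from the model's current state, rather than being fixed to a calendar date before the run starts.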

Self-isolation upon symptoms

A proportion of individuals self-isolate upon developing symptoms. Self-isolation is modelled by stopping interactions on the individual’s occupation network and greatly reducing their number of interactions on the random network. The default time for self-isolation is 7 days with a daily dropout. The ABM contains the option to quarantine everybody within the household of the symptomatic individual. The ABM also considers individuals without COVID-19 who develop flu-like symptoms and may therefore self-isolate. S5 Fig is a Jupyter Notebook demonstrating how self-isolation upon symptoms reduces the rate of spread of the infection.
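The effect of a daily dropout on time spent in isolation can be worked through directly. The sketch assumes dropout is an independent event with the same probability each day and that the first day is always spent isolating; the 2% daily dropout is an illustrative value, not the model default.

```python
def expected_isolation_days(max_days=7, daily_dropout=0.02):
    """Expected days in isolation when each day carries an independent
    chance `daily_dropout` of abandoning isolation before the next day."""
    still_isolating = 1.0
    total = 0.0
    for _ in range(max_days):
        total += still_isolating             # probability this day is spent isolating
        still_isolating *= 1.0 - daily_dropout
    return total

expected_days = expected_isolation_days()    # 7-day period, assumed 2% daily dropout
```

The same calculation applies to any fixed-length period with a constant daily dropout, such as the 14-day quarantine of traced contacts.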

Hospitalisation

Once admitted to hospital, a patient immediately stops interacting with the household, occupation and random networks. In the default model, we do not model interactions within hospitals. A preliminary module for hospital interactions has been developed and will be refined in future work.

Lockdown

Lockdown is modelled by reducing the number of interactions that people have on their occupation and random networks (by default by 67%). Additionally, given that during lockdown people stay at home, the transmission rate for interactions on the household network is increased by a factor of 1.5. S6 Fig is a Jupyter Notebook demonstrating the rapid reduction in the instantaneous reproduction number, R, when a lockdown is imposed. The impact of lockdown on the instantaneous and actual R is given in S13 Fig and an animation showing the age-stratified infection and disease compartments is in S1 Video.
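The default lockdown settings above can be written as per-network multipliers. The sketch also combines them into a rough relative transmission figure using assumed pre-lockdown shares of transmission by network; those shares are hypothetical, and the weighted average is a first-order approximation that ignores saturation within households.

```python
def lockdown_multipliers(contact_reduction=0.67, household_boost=1.5):
    """Per-network scaling under lockdown, using the defaults quoted in the text.

    Occupation and random entries scale the number of interactions; the
    household entry scales the transmission rate per interaction.
    """
    return {
        "household": household_boost,
        "occupation": 1.0 - contact_reduction,
        "random": 1.0 - contact_reduction,
    }

def relative_transmission(shares, multipliers):
    """Weighted average of multipliers by each network's share of transmission."""
    return sum(shares[name] * multipliers[name] for name in shares)

# Assumed shares of pre-lockdown transmission, for illustration only.
assumed_shares = {"household": 0.3, "occupation": 0.35, "random": 0.35}
relative = relative_transmission(assumed_shares, lockdown_multipliers())
```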

Shielding

Contact reductions can be applied to certain age groups only. For example, given that fatality ratio is highly skewed towards the over 70s, we have the option of applying a reduction in contacts to this demographic group only. S7 Fig is a Jupyter Notebook demonstrating how new infections can be kept low in a shielded group.

Physical distancing

Measures such as physical distancing and mask wearing will reduce the probability of transmission in certain types of interactions (i.e. random interactions but not household interactions). The ABM allows for this to be modelled by allowing for the network-specific transmission multipliers to be adjusted during a simulation. S8 Fig is a Jupyter Notebook demonstrating how new infections can be kept low after a lockdown with (extreme) physical distancing measures.

Testing and contact tracing

OpenABM-Covid19 is able to model contact tracing (both manual and digital) and how it operates with or without an integrated testing system. The model contains many of the real-world imperfections which affect test and contact tracing programmes, such as test sensitivity and specificity, delays in testing and contact tracing, incomplete coverage, failure to recall contacts, contact tracer resource limitations and partial adherence to quarantine requests. It also has the ability to model recursive contact tracing with and without testing. Below we give descriptions of the test and contact tracing features, with sample code given in S9–S12 Figs along with links to Jupyter Notebooks.

Testing for SARS-CoV-2 infection

Testing can occur in both the community and hospital (where an immediate clinical diagnosis is allowed). Tests are assumed to be sensitive from 3 days post-infection to 14 days post-infection with a default sensitivity of 80% and specificity of 99%. For community testing, delays can be introduced for ordering a test and for receiving the test result. Testing of an individual in the community is triggered by reporting symptoms and can also be triggered by being contact traced. S9 Fig demonstrates the importance of quick testing if self-isolation only occurs after a positive test (as opposed to on symptoms). The time-series output of the model shows both total infections (regardless of a test being performed) and total cases (those who have tested positive).
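The testing assumptions above (a detection window of 3 to 14 days post-infection, 80% sensitivity, 99% specificity) can be sketched as a single function. The function name and signature are hypothetical, and treating an infected person outside the window exactly like an uninfected person is an assumption of this sketch.

```python
import random

def simulate_test(rng, infected, days_since_infection=None,
                  sensitivity=0.8, specificity=0.99, window=(3, 14)):
    """Return True for a positive result.

    True positives are only possible inside the detection window; outside it,
    or for uninfected people, a positive occurs with probability 1 - specificity.
    `days_since_infection` must be given when `infected` is True.
    """
    detectable = infected and window[0] <= days_since_infection <= window[1]
    if detectable:
        return rng.random() < sensitivity
    return rng.random() >= specificity

rng = random.Random(0)
in_window_rate = sum(simulate_test(rng, True, 5) for _ in range(10000)) / 10000
too_early_rate = sum(simulate_test(rng, True, 1) for _ in range(10000)) / 10000
```

The gap between the two rates is what makes test timing matter: a test taken in the first couple of days after infection is very likely to come back negative.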

Digital contact tracing

Contact tracing is vital to control epidemics with a high level of pre-symptomatic transmission. A variable fraction of individuals in each age group can be assigned to have a digital contact tracing smartphone app. Ownership of smartphones is based on age-stratified OFCOM data (S4 Fig and S9 Table). Digital contact tracing can only occur between two app users. Digital proximity sensing is likely to miss some interactions, so a number of interactions are randomly dropped when contact tracing. For contact tracing, the model takes into account all interactions the individual has had with other app users over the past seven days which have not been dropped. The model can simulate different app-based contact tracing algorithms. The app can send out notifications with the request to quarantine based on symptoms, or based on a positive test result of the index case. It can ask the household members of the index case and/or household members of the contacts to quarantine and also send out notifications deeper into the network if desired. It can request tests for contacts of index cases if desired. S10 Fig demonstrates how digital contact tracing following rapid testing can prevent a second wave even when the average uptake is at only 50% of the total population. S11 Fig demonstrates the calculation of the benefit to individuals of digital contact tracing.
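A rolling seven-day log of app-to-app interactions is one way to picture the bookkeeping behind digital tracing. The class below is a hypothetical sketch, not the model's data structure: it drops undetected interactions at recording time rather than at tracing time (a simplification), and the 80% detection probability is an assumed value.

```python
import random
from collections import defaultdict, deque

class ProximityLog:
    """Rolling log of app-to-app interactions, keeping the last `window` days."""

    def __init__(self, app_users, window=7, detection_probability=0.8, seed=0):
        self.app_users = set(app_users)
        self.days = deque(maxlen=window)     # the oldest day is discarded automatically
        self.detection_probability = detection_probability
        self.rng = random.Random(seed)

    def record_day(self, interactions):
        """Store one day's (a, b) interactions, keeping detected app-to-app pairs."""
        contacts = defaultdict(set)
        for a, b in interactions:
            both_have_app = a in self.app_users and b in self.app_users
            if both_have_app and self.rng.random() < self.detection_probability:
                contacts[a].add(b)
                contacts[b].add(a)
        self.days.append(contacts)

    def trace(self, index_case):
        """Everyone recorded with the index case inside the window."""
        traced = set()
        for contacts in self.days:
            traced |= contacts.get(index_case, set())
        return traced

log = ProximityLog(app_users={1, 2, 3}, detection_probability=1.0)
log.record_day([(1, 2), (1, 4)])             # person 4 has no app
first_trace = log.trace(1)
for _ in range(7):
    log.record_day([(1, 3)])                 # pushes the first day out of the window
later_trace = log.trace(1)
```

Storing contacts symmetrically means a notification can be looked up from either side of the interaction without scanning the whole day.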

Manual contact tracing

Manual contact tracing works in a similar way to digital contact tracing with a few key differences. First, since it does not rely on an individual being a smartphone user, it can originate from anybody who tests positive (particularly important in the elderly, where smartphone usage is lower). However, since the identification of interactions relies on the index case recalling them, only a fraction of actual interactions are traced. In particular, the fraction of interactions recalled depends on the type of interaction (i.e. occupation-based interactions are more likely to be recalled than random interactions). Manual contact tracing only occurs after a delay following a positive test, to account for contact tracers contacting both the index and traced individuals. Finally, during a peak in the epidemic the amount of contact tracing required increases and risks overwhelming a manual contact tracing service. Therefore the model contains constraints on the total number of interviews that contact tracers can perform on a single day. S12 Fig demonstrates how a well-staffed manual contact tracing service following rapid testing can lessen a second wave.
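Two of the differences described above, network-dependent recall and a daily interview cap, can be sketched together. The recall probabilities and the input format are assumptions for illustration, and the tracing delay is left out for brevity.

```python
import random

# Assumed recall probabilities by interaction type, for illustration only.
RECALL_PROBABILITY = {"household": 1.0, "occupation": 0.8, "random": 0.3}

def manual_trace(rng, index_cases, contacts_by_case, daily_interview_capacity):
    """Interview index cases up to capacity and collect the contacts they recall.

    contacts_by_case maps an index case to a list of (contact_id, network) pairs.
    Index cases beyond the daily capacity are returned as a backlog.
    """
    interviewed = index_cases[:daily_interview_capacity]
    backlog = index_cases[daily_interview_capacity:]
    traced = set()
    for case in interviewed:
        for contact, network in contacts_by_case.get(case, []):
            if rng.random() < RECALL_PROBABILITY[network]:
                traced.add(contact)
    return traced, backlog

contacts = {"a": [("h1", "household")], "b": [("h2", "household")]}
traced, backlog = manual_trace(random.Random(0), ["a", "b"], contacts,
                               daily_interview_capacity=1)
```

With a capacity of one interview, only the first index case is processed and the second waits, which is how a surge in cases translates into tracing delays.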

Quarantine

Contact-traced individuals can be asked to quarantine (default 14 days) either because they are directly traced or because they are a household member of somebody who has been traced. Like self-isolation, quarantine is modelled by stopping interactions on the occupation network and greatly reducing the number of interactions on the random network. The model includes a daily dropout rate to simulate imperfect adherence. Quarantine can be ended if the index case later tests negative (after tracing based upon their symptoms), or if the quarantined individual tests negative.

Vaccination

OpenABM-Covid19 has the ability to model the effect of vaccination programmes in controlling the epidemic. Two types of vaccines are modelled: full protection where an individual cannot be infected; or protection from symptoms, where an individual can be infected but is asymptomatic. Efficacy is modelled as all-or-nothing for each individual, and for those who gain protection there is a delay between inoculation and the time at which the vaccine gives protection. The Python and R interfaces allow a vaccine schedule by age to be specified and multiple types of vaccine can be applied in a single simulation. S17 Fig demonstrates the effect of a vaccine programme (both full and from-symptoms protection) where 2% of the adult population are vaccinated each and S18 Fig shows the R code used to simulate a bespoke vaccination programme.
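The all-or-nothing efficacy and the delay before protection can be sketched with a small record per vaccinated individual. The names, the 90% efficacy and the 14-day delay are assumptions made for illustration, not the model's parameters or interface.

```python
import random
from dataclasses import dataclass
from typing import Optional

@dataclass
class Vaccination:
    vaccine_type: str                  # "full" or "symptoms_only"
    protection_start: Optional[int]    # None when the vaccine did not take

def vaccinate(rng, day, vaccine_type, efficacy=0.9, delay_days=14):
    """All-or-nothing efficacy: protection either starts after the delay or never."""
    if vaccine_type not in ("full", "symptoms_only"):
        raise ValueError(f"unknown vaccine type: {vaccine_type}")
    took = rng.random() < efficacy
    return Vaccination(vaccine_type, day + delay_days if took else None)

def _active(v, today):
    return v.protection_start is not None and today >= v.protection_start

def blocks_infection(v, today):
    """Only the full-protection vaccine prevents infection."""
    return v.vaccine_type == "full" and _active(v, today)

def blocks_symptoms(v, today):
    """Both vaccine types prevent symptoms once protection is active."""
    return _active(v, today)

rng = random.Random(0)
full = vaccinate(rng, day=10, vaccine_type="full", efficacy=1.0)
symptoms_only = vaccinate(rng, day=0, vaccine_type="symptoms_only", efficacy=1.0)
failed = vaccinate(rng, day=0, vaccine_type="full", efficacy=0.0)
```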

Discussion

We present OpenABM-Covid19, a COVID-19-specific agent-based model suitable for simulating the epidemic in different settings and assessing non-pharmaceutical interventions, including contact tracing using a mobile phone app. The model is well documented with a simple interface, allowing non-experts to easily evaluate complex dynamic intervention strategies in a few lines of Python or R code. OpenABM-Covid19 is an open-source project and is easily extensible, with new features already being added by multiple external teams. The model is fully documented and is thoroughly tested in a formal testing framework.

The model was designed to be as parsimonious as possible, with complexity only added when it was essential to model important features of COVID-19 or details of non-pharmaceutical interventions, and with parameters being inferred from published studies. Due to the substantial pre-symptomatic and asymptomatic transmission of the virus, it is necessary to model each individual’s normal daily interactions. Further, on developing symptoms or during interventions such as contact tracing, the interaction pattern of individuals changes to only include those in the household. We therefore took the decision to model interactions using three social networks (household/occupational/random) with non-pharmaceutical interventions affecting each network differently. Recurring small-world networks were used to model interactions at home and at work, whereas a transient random network was used to model other daily interactions such as on public transport or in shops. The strong association of COVID-19 disease progression with age along with the age assortativity of social networks, led us to use a decade age-structure. The model simulated an urban population of 1 million rather than the population of a whole country to allow realistic estimates for hospitalisation and ICU admission forecasts on a regional level. Nevertheless, its use can be extended to perform an analysis on a country-wide level, as in Fig 5 for England. Large national epidemics will also exhibit meta-population dynamics rather than the spatially unstructured mixing modelled here.

One of the key aims of OpenABM-Covid19 was to model non-pharmaceutical interventions and, in particular, different forms of contact tracing. The model of digital contact tracing allows for questions such as the role of: testing delays, different quarantine requests, compliance rates, recursive testing, and app uptake to be investigated. The model of manual contact tracing allows for questions such as resource limitations, partial contact recall and interview delays to be investigated. The vaccination model allows for questions such as the order in which people are vaccinated to be investigated. Importantly, due to the simple Python and R interfaces it is possible for non-experts to simulate all these features and to investigate the effect of applying multiple intervention policies at different stages of the epidemic.

The current version of the model does not include nosocomial transmissions, transmission in care-home settings, non-hospital deaths, gender/sex of individuals, comorbidities, or any geographical structure apart from that implicit within the three modelled networks. All of these limitations are currently being addressed by collaborators, and the extensions will become available on the GitHub repository in the near future. For example, a preliminary hospital model has been created to characterise the effect of SARS-CoV-2 transmission between patients, health-care workers and the wider community. The hospital model allocates patients to general and ITU wards according to symptoms, and then models the interactions between patients and healthcare workers within wards and the hospital as a whole. The hospital model is currently available on the GitHub repository.

Another important area to develop will be the modelling of multiple variants of SARS-CoV-2. The current version of the model supports multiple variants with different transmission rates, but assumes complete cross-immunity between variants. Whilst this is a reasonable model for the B.1.1.7 variant, it will be insufficient for modelling the B.1.351 or P.1 variants. Extensions addressing these limitations are being developed by collaborators and will become available on the GitHub repository in the near future.

OpenABM-Covid19 is a versatile tool to model the COVID-19 epidemic in different settings and simulate different non-pharmaceutical interventions including contact tracing. OpenABM-Covid19 is a modular tool that will help scientists and policymakers weigh decisions during this epidemic. Our vision is that, with the help of the world-wide modelling community, it will develop into a family of models for infectious diseases that are at risk of causing pandemics in the future, adding to the international toolkit for epidemic preparedness.

Methods

Demographics

Within the ABM, individuals are categorised into nine age groups by decade, from “0–9 years” to “80+ years”. Decades were used instead of broader age groups because of the strong age-structure of the disease progression. By default, the demographics of the ABM are set to UK national data for 2018 from the Office for National Statistics (ONS). The proportion of individuals in each age group is the same as that specified by the population level statistics in S1 Table. Since we only consider simulating the epidemic for up to a year, we do not consider changes in the population due to births, deaths from other causes, or migration.

Interaction network

A previous study of social contacts for infectious disease modelling, based on participants being asked to recall their interactions over the past day, has estimated the mean number of interactions that individuals have by age group [16]. We estimate mean interactions by age group by aggregating data (S2 Table). Fig 2A depicts the resulting distribution of contacts by network and Fig 2B by age.

Every individual is assigned to live in a single household. The household network is formed by all members of every household interacting with each other every day. The distribution of household sizes is the ONS estimate for the UK in 2018 (S1 Table). In addition to the household size, the mix of ages in households is important since multi-generational households provide a path by which the infection can be transmitted from young to old. To model this we used a reference panel of 10,000 households taken by down-sampling the UK-wide household composition data from the 2011 Census produced by the ONS. The overall household structure was generated by sampling from the reference household panel with replacement and using rejection-sampling to match the aggregate statistics for the age demographics and household size. The rejection-sampling method sequentially adds a sampled household to the network if the deviation between the resulting aggregate statistics and the target values (also from the ONS) is less than an acceptance threshold. The acceptance threshold is reduced as the sampled network grows and the final network is only accepted if the aggregate demographic and household size statistics are within tolerance of the targets (sum of squared errors < 10^−5). The household members then make up the population in the model. The number of daily interactions on a simulated network within each household by age is shown in Fig 2D.
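The rejection-sampling step described above can be illustrated with a minimal Python sketch. It is not the OpenABM-Covid19 implementation: the function names, the geometric schedule for the acceptance threshold, the tolerances and the toy reference panel are all assumptions made for illustration. Households are represented as tuples of age-group indices.

```python
import random

def deviation(age_counts, size_counts, target_age, target_size):
    """Sum of squared differences between sampled and target proportions."""
    n_people, n_houses = sum(age_counts), sum(size_counts)
    err_age = sum((c / n_people - t) ** 2 for c, t in zip(age_counts, target_age))
    err_size = sum((c / n_houses - t) ** 2 for c, t in zip(size_counts, target_size))
    return err_age + err_size

def sample_households(panel, target_age, target_size, n_households,
                      start_tol=4.0, end_tol=1e-3, max_tries=200_000, seed=1):
    """Sequentially accept sampled households while the deviation stays below a
    shrinking threshold (hypothetical schedule, assumed for this sketch)."""
    rng = random.Random(seed)
    age_counts = [0] * len(target_age)
    size_counts = [0] * len(target_size)
    chosen = []
    for _ in range(max_tries):
        if len(chosen) == n_households:
            break
        household = rng.choice(panel)  # sampling with replacement
        trial_age, trial_size = age_counts[:], size_counts[:]
        for age_group in household:
            trial_age[age_group] += 1
        trial_size[len(household) - 1] += 1
        # The acceptance threshold shrinks geometrically as the network grows.
        tol = start_tol * (end_tol / start_tol) ** (len(chosen) / n_households)
        if deviation(trial_age, trial_size, target_age, target_size) < tol:
            chosen.append(household)
            age_counts, size_counts = trial_age, trial_size
    return chosen, age_counts, size_counts

# Toy reference panel with three age groups and household sizes 1 to 4.
toy_panel = [(0, 0, 1, 1), (1, 1), (2,), (1, 2), (0, 1, 1)]
toy_target_age = [3 / 12, 7 / 12, 2 / 12]
toy_target_size = [0.2, 0.4, 0.2, 0.2]
households, ages, sizes = sample_households(
    toy_panel, toy_target_age, toy_target_size, n_households=200)
```

In this sketch a caller would still need to check the final deviation against the stricter acceptance criterion quoted in the text and restart if it is not met.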

Each individual is also a member of a recurring occupation network to model school, workplace or social networks. The occupation networks are modelled as small-world networks [31]. The network has a fixed set of connections between individuals, and each day a random subset (50%) of these connections is chosen as the interactions between individuals. When constructing the occupation networks, the ABM ensures the absence of overlaps between the household interactions and the local interactions on the small-world network. For children, there are separate occupation networks for the 0–9 year age group (i.e. nursery/primary school) and the 10–19 year age group (i.e. secondary school). On each of these networks we introduce a small number of adults (1 adult per 5 children) to represent teaching and other school staff. Similarly, for the 70–79 year age group and the 80+ year age group we created separate networks representing daytime social activities among elderly people (again with 1 younger adult per 5 elderly people to represent some mixing between the age groups). All remaining adults (the vast majority) are part of the 20–69 network. Due to the difference in total number of daily interactions, each age group has a different number of interactions in their occupation network. Parameters and values corresponding to the occupation network are shown in S3 Table. The number of daily interactions on a simulated occupational network by age is shown in Fig 2C.
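As an illustration of a recurring network with daily down-sampling, the sketch below builds a small-world network and then picks half of its fixed connections for one day. A Watts-Strogatz style construction (ring lattice plus random rewiring) is assumed here; the function names, network size, number of neighbours and rewiring probability are hypothetical and are not taken from the model code.

```python
import random

def small_world_edges(n, k, p_rewire, rng):
    """Ring lattice where each node links to its k nearest neighbours (k even),
    with each link rewired to a random node with probability p_rewire.
    Assumes n is much larger than k so a free partner always exists."""
    edges = set()
    for i in range(n):
        for j in range(1, k // 2 + 1):
            a, b = i, (i + j) % n
            if rng.random() < p_rewire:
                # Rewire the far end, avoiding self-loops and existing links.
                b = rng.randrange(n)
                while b == a or (min(a, b), max(a, b)) in edges:
                    b = rng.randrange(n)
            edges.add((min(a, b), max(a, b)))
    return sorted(edges)

def daily_interactions(edges, fraction, rng):
    """Choose a random subset of the fixed connections for one day."""
    return rng.sample(edges, int(len(edges) * fraction))

rng = random.Random(42)
network = small_world_edges(n=100, k=10, p_rewire=0.1, rng=rng)
today = daily_interactions(network, fraction=0.5, rng=rng)
```

Because the underlying edge list is fixed, the same pairs of individuals meet repeatedly over the course of a simulation, which is what distinguishes this network from the transient random network.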

In addition to the recurring structured networks of households and occupations, we include random interactions. These are drawn randomly each day, independent of previous connections. The number of random connections an individual makes is the same each day (in the absence of interventions), drawn at the start of the simulation from an over-dispersed negative-binomial distribution. This variation in the number of interactions introduces some “super-spreaders” into the network who have many more interactions than average. The mean numbers of connections were chosen so that the total number of daily interactions matched that from a previous study of social interaction [16]. The number of random interactions was chosen to be lower in children in comparison to other age groups. Interactions in the random network are listed in S4 Table. The number of daily interactions on a simulated random network by age is shown in Fig 2E.
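The following sketch shows one way to draw an over-dispersed number of random interactions per person and to pair people up each day. It uses only the Python standard library, so the negative-binomial draw is written as a gamma-Poisson mixture. The mean, the dispersion value and the stub-pairing scheme are illustrative assumptions, not the model's parameters or implementation.

```python
import math
import random

def poisson(rate, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    limit, k, p = math.exp(-rate), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def negative_binomial(mean, dispersion, rng):
    """Gamma-Poisson mixture with variance = mean + mean**2 / dispersion."""
    rate = rng.gammavariate(dispersion, mean / dispersion)
    return poisson(rate, rng)

def daily_random_pairs(n_interactions, rng):
    """Give each person one 'stub' per interaction, shuffle, and pair neighbours."""
    stubs = [person for person, n in enumerate(n_interactions) for _ in range(n)]
    rng.shuffle(stubs)
    pairs = zip(stubs[0::2], stubs[1::2])
    return [(a, b) for a, b in pairs if a != b]  # drop self-pairings

rng = random.Random(7)
# Drawn once at the start of the simulation (hypothetical mean and dispersion).
per_person = [negative_binomial(4.0, 2.0, rng) for _ in range(1000)]
# Redrawn every day, independently of previous days.
pairs_today = daily_random_pairs(per_person, rng)
```

The long right tail of the per-person counts is what produces the small number of individuals with many more interactions than average.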

OpenABM also allows users to specify their own networks, which can be added in addition to (or instead of) the default networks in the model. An example of how to add a user-specified network via the R interface is given in S14 Fig.

Infection dynamics

Infectiousness varies over the natural course of an infection, i.e. as a function of the amount of time the source has been infected, τ. Infectiousness starts at zero at the point of infection (τ = 0), increases to a peak at an intermediate time, and decreases to zero a long time after infection (large τ). Following [7], we took the functional form of infectiousness to be a scaled gamma distribution. We chose the mean and standard deviation as intermediate values between different studies [7,32,33]. Once infected, we split individuals into three groups based upon the eventual severity of the disease: asymptomatic, mild symptomatic and moderate-severe symptomatics. The level of infectiousness depends upon this grouping, i.e. pre-symptomatic individuals who go on to develop moderate-severe symptoms are more infectious than those who go on to develop mild symptoms. By default, the overall infectiousness of asymptomatic individuals and individuals with mild symptoms, is 0.33 and 0.72 times that of individuals with moderate-severe symptoms respectively [34].

The susceptibility of the recipient to infection is modelled with a scale factor dependent on the recipient’s age. To determine these factors, we identified studies of whether or not transmission occurred from index cases to monitored close contacts [21,35–41]. Lower probability of infection in children was reported in almost all studies, including that of Zhang et al. [35], which observed more infections than the rest of the studies combined, with consistent adjustment for other covariates of transmission risk. We used the susceptibility by age of Zhang et al., interpolated to match our ten-year age categories. The merged data and fit are shown in S5 Table.

Transmission dynamics

We model the type of interaction, i.e. on which of the three networks the interaction took place. Whilst we do not have data on the length of interactions, interactions which take place within a person’s home are likely to be closer than other types of interactions leading to higher rates of transmission. This is modelled using a scale factor, which is 2 by default and gives a good estimate of the household secondary attack rate (see Results section). Finally, to fully account for the over-dispersion in offspring infections, we add an individual infectious factor which is drawn independently for each individual (and is the same for all their interactions). Combining all effects, we model the hazard rate per interaction at which the virus is transmitted by

$$\lambda(t,d,a,n) = \frac{L\,S_a\,A_d\,B_n\,G}{\bar{I}} \int_{t-1}^{t} f_\Gamma(u;\mu_i,\sigma_i^2)\,du$$

where $t$ is the time since the source was infected; $d$ indicates the disease severity of the source (asymptomatic, mild, moderate/severe); $a$ is the age of the recipient; $n$ is the type of network where the interaction occurred; $\bar{I}$ is the mean number of daily interactions; $f_\Gamma(u;\mu,\sigma^2)$ is the probability density function of a gamma distribution; $\mu_i$ and $\sigma_i$ are the mean and width of the infectiousness curve; $L$ scales the overall infection rate; $S_a$ is the relative susceptibility of the recipient based on age; $A_d$ is the relative infectiousness of the source based on disease severity; $B_n$ is the scale factor for the network on which the interaction occurred; and $G$ is the individual infectious factor which is drawn for each individual from a Gamma distribution with mean 1 and s.d. σII. S6 Table contains the values of the parameters used in simulations. The transmission hazard rate $\lambda$ is converted to a probability of transmission via $P = 1 - e^{-\lambda}$. The epidemic is seeded by randomly infecting individuals on the day before the simulation starts.
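A short numerical sketch may help to show how the hazard and the transmission probability fit together. It evaluates the integral of the gamma density over one day with a simple midpoint rule. The household scale factor of 2 comes from the text above; the values used for the mean and width of the infectiousness curve, the mean number of daily interactions and the overall infection rate are placeholders chosen for illustration, and the function names are hypothetical.

```python
import math

def gamma_pdf(u, mean, sd):
    """Gamma density parameterised by its mean and standard deviation."""
    if u <= 0.0:
        return 0.0
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    log_pdf = ((shape - 1.0) * math.log(u) - u / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)

def daily_infectiousness(t, mean, sd, steps=200):
    """Integral of the gamma density from t-1 to t (midpoint rule)."""
    lo = max(t - 1.0, 0.0)
    h = (t - lo) / steps
    return sum(gamma_pdf(lo + (i + 0.5) * h, mean, sd) for i in range(steps)) * h

def transmission_probability(t, L, S_a, A_d, B_n, G, mean_interactions, mean, sd):
    """Hazard per interaction, converted to a probability via P = 1 - exp(-hazard)."""
    hazard = (L * S_a * A_d * B_n * G / mean_interactions
              * daily_infectiousness(t, mean, sd))
    return 1.0 - math.exp(-hazard)

# Placeholder parameter values (assumptions for illustration only).
common = dict(L=5.8, S_a=1.0, A_d=1.0, G=1.0,
              mean_interactions=12.0, mean=5.5, sd=2.1)
p_household = transmission_probability(5, B_n=2.0, **common)
p_random = transmission_probability(5, B_n=1.0, **common)
```

Because the daily integrals sum to one over the whole course of infection, the factor in front of the integral sets the total hazard accumulated per daily contact, which is why the scale factors can be read as simple multipliers.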

Natural history of infection

Upon infection, an individual enters a disease progression cascade where the outcome and rates of progression depend on the age of the infected person. The disease state transitions are shown in Fig 6 and the model parameters in S7 and S8 Tables.

Fig 6. Schematic of infection and disease transitions within OpenABM-Covid19.

The disease status of an individual, and the probability and time distribution of transitions. The Φxxx(age) variables are the probability of transition to a particular state, when the individual can progress to more than one state within the model, where the probability depends upon the age of the individual. The τxxx are the gamma distributed variables of the time taken to make the transition.

A fraction Φasym(age) of individuals are asymptomatic and do not develop symptoms, a fraction Φmild(age) will eventually develop mild symptoms, and the remainder develop moderate/severe symptoms. Each of these proportions depends on the age of the infected individual (S7 Table). Those who are asymptomatic are infectious at a lower level (see Infection dynamics section) and will move to a recovered state after a time τa,rec drawn from a gamma distribution.

Once an individual is recovered the model allows immunity to wane through time using two parameters: a fixed period for which every individual must wait, τwaning-shift, and then a geometric distribution of waiting times until individuals become susceptible, parameterised by its mean τwaning-mean. By default, the model assumes τwaning-shift to be 10,000 days (essentially no waning immunity). During this waiting period, infection is assumed to be completely immunising (recovered individuals cannot be reinfected).
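The two-parameter waning scheme can be sketched as a fixed shift plus a geometric waiting time. The sketch assumes the geometric distribution is supported on 1, 2, 3, ... days, so that a mean of m days corresponds to a daily probability of 1/m; that convention, the function name and the example values are assumptions and are not taken from the model code.

```python
import math
import random

def time_to_susceptible(shift, mean, rng):
    """Days from recovery until the individual becomes susceptible again:
    a fixed shift followed by a geometric waiting time with the given mean."""
    p = 1.0 / mean
    if p >= 1.0:
        return shift + 1
    u = rng.random()
    # Inverse-CDF draw of a geometric variable on {1, 2, 3, ...}.
    wait = max(1, math.ceil(math.log(1.0 - u) / math.log(1.0 - p)))
    return shift + wait

rng = random.Random(0)
# Hypothetical values: 10-day shift and 30-day mean waiting time.
example_times = [time_to_susceptible(10, 30.0, rng) for _ in range(5)]
```

With a very large shift, such as the default of 10,000 days quoted above, no individual would become susceptible again within a simulation lasting up to a year.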

Individuals who will develop symptoms start by being in a pre-symptomatic state, in which they are infectious but have no symptoms. The pre-symptomatic state is important for modelling interventions because individuals in this state do not realise they are infectious and therefore will not self-isolate based on symptoms to prevent infecting others. Individuals who develop mild symptoms do so after time τsym and then recover after time τrec (both drawn from gamma distributions). The remaining individuals develop moderate/severe symptoms after a time τsym drawn from the gamma distribution.

Whilst most individuals recover without requiring hospitalisation, a fraction Φhosp(age) of those with moderate/severe symptoms will require hospitalisation. This fraction is age-dependent. Those who do not require hospitalisation recover after a time τrec drawn from a gamma distribution, whilst those who require hospitalisation are admitted to hospital after a time τhosp, which is drawn from a shifted Bernoulli distribution. Among all hospitalised individuals, a fraction Φcrit(age) develop critical symptoms and require intensive care treatment, with the remainder recovering after a time τhosp,rec drawn from a gamma distribution. The time from hospitalisation to developing critical symptoms, τcrit, is drawn from a shifted Bernoulli distribution. Of those who develop critical symptoms, a fraction ΦICU(age) will receive intensive care treatment. For patients receiving intensive care treatment, a fraction Φdeath(age) die after a time τdeath drawn from a gamma distribution, with the remainder leaving intensive care after a time τcrit,surv. Patients who require critical care and do not receive intensive care treatment are assumed to die upon developing critical symptoms. Patients who survive critical symptoms remain in hospital for τhosp,rec before recovering.
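The branching structure of the cascade can be summarised in a few lines of Python. The sketch below samples only the sequence of states for one infected individual and omits the waiting times; the state names and the probabilities in the example dictionary are hypothetical values for a single age group, not the calibrated parameters in S7 Table.

```python
import random

def sample_disease_path(phi, rng):
    """Sample the sequence of disease states for one infected individual."""
    path = ["infected"]
    u = rng.random()
    if u < phi["asym"]:
        return path + ["asymptomatic", "recovered"]
    path.append("presymptomatic")
    if u < phi["asym"] + phi["mild"]:
        return path + ["mild symptoms", "recovered"]
    path.append("moderate/severe symptoms")
    if rng.random() >= phi["hosp"]:
        return path + ["recovered"]
    path.append("hospitalised")
    if rng.random() >= phi["crit"]:
        return path + ["recovered"]
    path.append("critical")
    if rng.random() >= phi["icu"]:
        return path + ["death"]  # critical care needed but not received
    path.append("intensive care")
    if rng.random() < phi["death"]:
        return path + ["death"]
    return path + ["hospital recovery", "recovered"]

# Hypothetical probabilities for one age group (illustration only).
phi_example = {"asym": 0.3, "mild": 0.5, "hosp": 0.2,
               "crit": 0.3, "icu": 0.9, "death": 0.5}
rng = random.Random(1)
example_path = sample_disease_path(phi_example, rng)
```

In a fuller version each transition would also carry a waiting time drawn from the gamma or shifted Bernoulli distributions described above.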

OpenABM-Covid19 also includes a meta-population model which runs a simulation on parallel sub-populations (which can be parameterised separately). After each time step, new cases can be seeded in each region based upon the incidence of infections in connected sub-populations.

Implementation details

The core of OpenABM-Covid19 is coded in C with object-oriented interfaces in Python and R. The code is written in a modular manner to ease readability and encourage extension of the code base. It is open source and is being actively developed by multiple teams. The model uses the GNU Scientific Library (GSL) for mathematical functions, statistical distributions, and random number generation [42] and so any distribution or function available within the GSL can be easily incorporated into the model (for instance in modelling waiting-time distributions). Memory is pre-allocated at the start of the simulation for efficiency.

An important feature of the implementation is its Python and R interfaces, built using SWIG, which is a package for providing interfaces between high-level languages and C/C++ [43]. Running the model via Python allows complex dynamic intervention strategies to be modelled easily (see examples in S5–S12 Figs). All states of the model (e.g. transmission events, interactions, individual characteristics) are exposed in Python, which gives full transparency to the results of the model. For example, S11 Fig is a Notebook showing how to calculate the relative personal protective effect of app users versus non-app users when digital contact tracing is used. Python is also a ubiquitous language amongst data scientists, and the interface allows them to fully interact with the model whilst keeping the high speed and memory performance of C.

The metapopulation model is implemented in Python using the standard module multiprocessing [44], with each sub-population running in a separate process. Both the initial model set-up and the daily time-step are run in parallel, allowing for approximate linear speed-ups when running on a multi-core machine.

The model codebase includes over 200 tests used to validate the model. Each test ensures an expected output from the model is realised for a specified set of input parameters. Tests are written in a consistent manner, using the pytest framework. All tests are automatically run when new contributions to the codebase are made. Tests vary input parameters ensuring that expected behaviour of the model is realised across a wide range of input parameter values. Tests cover a range of domains including: disease dynamics, infection and transmission dynamics, non-pharmaceutical interventions, network construction, the C and Python interface, the waiting time distributions, file concordance across the multiple output files from the model, and non-disease related demographics.

Performance

The computational efficiency was measured by simulating an epidemic with default parameters until the total prevalence reached 1%; then reducing the transmission rates on the occupational and random networks so that R was reduced to 1; and simulating for 100 days in total. On a 2019 MacBook Pro (2.4 GHz Quad-Core Intel Core i5), the simulation took 1.5 s per day for a population of 1 million people and scaled linearly with population size (S15A Fig). The time is dominated by daily rebuilding of networks, and running the simulation with static networks took 50 ms per day for a population of 1 million people. The meta-population model of OpenABM with multiple sub-populations of 100k running in parallel threads took 250 ms per day for a total population of 1 million people. The meta-population model with static networks took 15 ms per day for a population of 1 million people. The persistent memory used in the simulation was measured using the iprofiler command and Instruments. For a simulation where only 1 day of interactions is stored, the persistent memory is about 1 kB per person (S15B Fig) and scales linearly with the population. The memory usage is split in approximate thirds between: networks; interactions; and data about individuals (e.g. timings of events such as infection). For a simulation where 7 days of interactions are stored (i.e. for contact-tracing modelling), the persistent memory is about 3 kB per person, of which about 80% is used to store the interactions.

Supporting information

S1 Fig. Proportion of each age group ever infected, ever hospitalised, or deceased.

Simulations are from the end of a single simulation in a population of 1 million individuals with UK-like demographics and control interventions. The simulation was run for 77 days after lockdown started. The denominator in each calculation is the number of individuals in each age group in the total population (e.g. of the total population, the middle panel shows the proportion of each age group that was hospitalised in the simulation).

(TIF)

S2 Fig. Age-stratified hospital admissions, ICU admissions, and deaths.

Data are from a single simulation of 1 million individuals with UK-like demographics and control interventions. The simulation was run for 77 days after lockdown started. The denominator is the number of individuals ever having been in the state in question (e.g. of all simulated hospitalisations, the top panel shows the distribution of these by age).

(TIF)

S3 Fig. Waiting time distributions for transitions between infection and disease states.

All distributions are gamma except time to hospital which is a shifted Bernoulli distribution. Mean of each gamma distribution is shown with a vertical dashed line.

(TIF)

S4 Fig. Smartphone usage by age in the UK.

(TIF)

S5 Fig. Example notebook of self-isolation on symptoms.

Code for this notebook is provided at https://github.com/BDI-pathogens/OpenABM-Covid19-model-paper/blob/master/notebooks/example_self_isolation.ipynb.

(TIFF)

S6 Fig. Example notebook of a lockdown.

Code for this notebook is provided at https://github.com/BDI-pathogens/OpenABM-Covid19-model-paper/blob/master/notebooks/example_lockdown.ipynb.

(TIFF)

S7 Fig. Example of lockdown followed by shielding.

Code for this notebook is provided at https://github.com/BDI-pathogens/OpenABM-Covid19-model-paper/blob/master/notebooks/example_lockdown_shield.ipynb.

(TIFF)

S8 Fig. Example lockdown followed by social distancing.

Code for this notebook is provided at https://github.com/BDI-pathogens/OpenABM-Covid19-model-paper/blob/master/notebooks/example_lockdown_social_distance.ipynb.

(TIFF)

S9 Fig. Example of self-isolation after testing.

Code for this notebook is provided at https://github.com/BDI-pathogens/OpenABM-Covid19-model-paper/blob/master/notebooks/example_testing.ipynb.

(TIFF)

S10 Fig. Example of digital contact tracing.

Code for this notebook is provided at https://github.com/BDI-pathogens/OpenABM-Covid19-model-paper/blob/master/notebooks/example_digital_contact_tracing.ipynb.

(TIFF)

S11 Fig. Example app user protection calculation.

Code for this notebook is provided at https://github.com/BDI-pathogens/OpenABM-Covid19-model-paper/blob/master/notebooks/example_digital_contact_tracing_protect.ipynb.

(TIFF)

S12 Fig. Example of manual contact tracing.

Code for this notebook is provided at https://github.com/BDI-pathogens/OpenABM-Covid19-model-paper/blob/master/notebooks/example_manual_contact_tracing.ipynb.

(TIFF)

S13 Fig. Reproduction number.

Data from a single simulated outbreak with R calculated using the complete simulated transmission tree (actual) or using the time series (instantaneous). Simulation data are for a single simulation in a population of 1 million individuals with UK-like demographics. The vertical dashed lines mark where interventions were introduced (self-isolation on symptoms followed by lockdown). Note that the actual R is reduced prior to the introduction of each intervention.

(TIF)

S14 Fig. Example adding a user defined network.

R script demonstrating how to add a user specified network. The R code is at: https://github.com/BDI-pathogens/OpenABM-Covid19-model-paper/blob/master/R/figS14_example_add_network.R.

(TIFF)

S15 Fig. Performance.

A. The computation time per day for different size populations. The default networks are dynamic and are rebuilt each day, whereas the static networks are not changed after the first day. The meta-population model was run on a quad-core processor. B. The required memory for a simulation which is linear in population.

(TIFF)

S16 Fig. Scaling and stochastic variation.

Simulations for epidemics were run for different population sizes and split into different numbers of equal sub-populations in meta-models (zero case migration). Initially the epidemic was seeded with 0.05% infections and an uncontrolled epidemic was allowed to develop for 100 days, with approximately no new infections at the end. A minimum of 20k people was required in each subpopulation in order for there to be sufficient seed infections to prevent a stochastic extinction at the start. Each simulated epidemic was characterised by 2 basic statistics: the doubling time in days to go from 1% to 2% of the population infected; and the total fraction of the population infected. Each configuration was run 10 times and the figure is a box plot of the results, with the number of subpopulations shown as separate colours. The simulations show that the mean doubling time and fraction infected are roughly independent of the total population size. The stochastic variation is determined by the total population and is independent of the number of subpopulations. With a total population of at least 1 million people, the stochastic variation in the doubling time was <0.2 days and in the total number infected was <0.5%.

(TIFF)

S17 Fig. Vaccine result.

A simulation of a vaccine programme implemented after a lockdown period to control the epidemic. The epidemic was allowed to grow until 2% of the population had been infected, at which point a lockdown was implemented for 30 days along with a vaccination programme where 2% of adults were inoculated each day (vaccine 90% effective after 15 days). The figure compares the total deaths and infections for a vaccine which offers full protection, for one which only offers protection from symptoms, and for no vaccine programme. The R code for generating this figure is at https://github.com/BDI-pathogens/OpenABM-Covid19-model-paper/blob/master/R/figS17_vaccine.R.

(TIFF)

S18 Fig. Vaccine R code.

R code used for generating the simulations with vaccination programmes. The R code is at https://github.com/BDI-pathogens/OpenABM-Covid19-model-paper/blob/master/R/figS18_example_vaccination.R.

(TIFF)

S19 Fig. Offspring distribution.

The offspring distribution (in blue) and the sibling distribution (in grey). A negative-binomial fitted to the offspring distribution gives the estimate of k = 0.51. The inset is the cumulative sibling distribution against the cumulative offspring distribution and shows that 70% of infections are generated by the top 20% of individuals.

(TIFF)

S20 Fig. Offspring distribution R code.

R script used for generating the offspring distribution is at https://github.com/BDI-pathogens/OpenABM-Covid19-model-paper/blob/master/R/figS20-offspring-distribution.R.

(TIFF)

S21 Fig. Mean square error comparing simulated and observed data during the first wave of the COVID-19 epidemic in England across four data sources.

Simulations are of 56 million individuals, performed separately for each point on a two-dimensional grid of 1) the prevalence of SARS-CoV-2 at which lockdown was implemented (y-axis), and 2) the reduction in daily contacts during lockdown (x-axis). The surface has been interpolated from the grid of values. The transmission parameter (infectious_rate) was fixed to give a doubling time of approximately 3.5 days. Red dots highlight those parameter sets with the smallest 5% error with observed data. Observed data are from the UK Government's COVID-19 dashboard and the UK’s Office for National Statistics (seroprevalence).

(TIF)

S22 Fig. Epidemic doubling time (in daily deaths) as a function of infectious_rate parameter.

Simulations of 56 million individuals using OpenABM-Covid19 for values of the infectious_rate parameter from 3.5 to 8.5 in increments of 0.1 (black dots). Each black dot is the slope of a linear regression fitted to the log of daily simulated deaths (truncated to between the first 100 and 1000 deaths). The red line represents a fit of the form a(x-b)^c * exp(nu), where nu is a noise term, to these data. Each red dot gives the value of the parameter infectious_rate (in brackets) for a doubling time of 3 (7.1), 3.5 (5.8), and 4 (5.0) days respectively.

(TIF)

S1 Table. UK population stratified by age and UK households stratified by household size.

Data provided by the ONS. Parameter values match the OpenABM-Covid19 baseline parameters, April 25, 2021.

(CSV)

S2 Table. Average number of non-household interactions stratified by age.

Values shown are for random and occupational interactions for an individual in each age group per day from empirical estimates [16]. Parameter values match the OpenABM-Covid19 baseline parameters, April 25, 2020.

(CSV)

S3 Table. Occupational network parameters.

Mean numbers of daily occupational connections for members of each age group, fraction of adults involved in occupational networks for children and for elderly people, and rewiring parameters for randomisation of daily interactions. Parameter values match the OpenABM-Covid19 baseline parameters, April 25, 2020.

(CSV)

S4 Table. Parameters for numbers of random connections that members of each age group have per day.

Parameter values match the OpenABM-Covid19 baseline parameters, April 25, 2020.

(CSV)

S5 Table. Susceptibility by age.

(CSV)

S6 Table. Infection parameters.

The mean of the generation time distribution and the standard deviation of the infectious period were calculated from [7,45–48]. Parameter values match the OpenABM-Covid19 baseline parameters, April 25, 2020.

(CSV)

S7 Table. Age-stratified infection and disease parameters.

Proportion of people in each stage of illness whose disease progresses further [Calibration of [49] & Spanish Serology Survey for fraction of asymptomatic and mild symptoms; Calibration of [27,49] & Spanish Serology Survey for fraction hospitalised; [27] for fraction of hospitalised that require critical care; [27,50–52] for fatality fraction]; [53,54]. Parameter values match the OpenABM-Covid19 baseline parameters, April 25, 2020.

(CSV)

S8 Table. Parameters for waiting time distributions.

Mean and standard deviation for density functions of the times that each transition (disease progression or recovery) takes [29,51,55]. For the shape of the functions see S3 Fig. Parameter values match the OpenABM-Covid19 baseline parameters, April 25, 2020. * Personal communication with SPI-M; data soon to be published.

(CSV)

S9 Table. Smartphone usage stratified by age in the UK.

Data based on the Technology Tracker (fieldwork 9 Jan– 7 Mar 2020) and the Children’s Media Literacy tracker (fieldwork 25 April– 11 July 2019), data provided by the Office of Communications.

(CSV)

S10 Table. Parameters for hospitalisation and self-quarantine upon symptoms.

Parameter values match the OpenABM-Covid19 baseline parameters, April 25, 2021.

(CSV)

S11 Table. Parameters corresponding to testing and contact tracing.

Parameter values match the OpenABM-Covid19 baseline parameters, April 25, 2020.

(CSV)

S1 Video. Animation of a simulated outbreak.

Data from a simulated outbreak in a population of 1 million individuals with UK-like demographics and control interventions showing age-stratified histograms of individuals in each compartment (https://github.com/BDI-pathogens/OpenABM-Covid19-model-paper/blob/d01baf1ca160aec649ce52c26001d8721dfb6bf9/figures/fig_outbreak_animation.mp4). The sub-panels are arranged in the same way as the model compartments (Fig 4).

(MP4)

Data Availability

All data generated by the model are available without restriction from the following repository: https://github.com/BDI-pathogens/OpenABM-Covid19. The analysis described in this paper is fully reproducible. The data used to parametrise the model are publicly available, with all sources stated and linked to in the manuscript and its Supporting Information files. The observed hospitalisation data for Fig 5 are from the UK government coronavirus dashboard, available here: https://coronavirus.data.gov.uk/. The seroprevalence data for Fig 5 are from the ONS: https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/articles/coronaviruscovid19infectionsurveyantibodydatafortheuk/3february2021.

Funding Statement

This work was funded by the Li Ka Shing Foundation (www.lksf.org), through an award to C.F. (funding the contributions of R.H., W.P., M.K., C.W., M.H., A.B.C., L.Z., A.S., L.F., D.B., L.A.D., and C.F.) and by research grant funding from the UK Department of Health and Social Care (DHSC), through an award to C.F. (funding the contributions of C.W., D.B., L.A.D., L.F., M.K., R.H. and C.F.). A.N. is funded by the ARTIC Network (Wellcome Trust Collaborators Award 206298/Z/17/Z). K.B., T.M., K.V.V. and D.F.B. were supported by the Francis Crick Institute which receives its core funding from Cancer Research UK (FC001751), the UK Medical Research Council (FC001751), and the Wellcome Trust (FC001751). For the purpose of Open Access, the author has applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1009146.r001

Decision Letter 0

Virginia E Pitzer, Benjamin Muir Althouse

25 Jan 2021

Dear Bulas Cruz,

Thank you very much for submitting your manuscript "OpenABM-Covid19 - an agent-based model for non-pharmaceutical interventions against COVID-19 including contact tracing" for consideration at PLOS Computational Biology.

As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments.

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Benjamin Muir Althouse

Associate Editor

PLOS Computational Biology

Virginia Pitzer

Deputy Editor-in-Chief

PLOS Computational Biology

***********************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The manuscript describes a new agent-based code to simulate the evolution of a pandemic and to study the impact of some non-pharmaceutical interventions on that evolution. This approach makes it possible to include complex phenomena, such as heterogeneous social interactions, the impact of physical distancing on transmission, contact tracing, and social interactions in schools, at home, on public transport, etc. This open-source code is written in C with a Python interface that supports various interactions between the user and the code, and the display of results.

The authors are invited to add to the manuscript some clarifications of the limitations of their model (~ line 391), and some further details, as described below.

- The authors use a fully randomized contact network to simulate public transport (line 157). What is the impact of such an approach compared with "real" contacts, where people have habits? Is a partially randomized network feasible?

- Are the different interactions split in time to avoid overlap? If not, did you study the impact of the order in which the different daily interactions are computed on the evolution of the pandemic?

- line 206: 65 million people are simulated by aggregating 65 simulations of 1 million people. Are 65 non-interacting simulations equivalent to a single simulation of 65 million agents in which everybody can interact with everybody?

- Is a spatially homogeneous distribution of people representative of the real demographic distribution, where some regions are denser than others (in particular for public transport and the number of daily social interactions)?

- What data are used for model calibration? The number of deaths?

- line 270: the lockdown is represented by a 71% reduction in social interactions. Is this value chosen to fit the measurements or taken from a previous study?

- line 283: did you derive the probability of transmission with and without physical distancing?

- line 431: every individual interacts with every other individual each day. Same question as before: is this split in time to avoid overlapping contamination (A <-> B <-> C contributing each day to the contamination A <-> C)?

- line 504: is the factor of 2 chosen to fit the observed data, or is it documented?

- line 600: is the code parallelized (OpenMP or MPI)? Does it scale correctly (in terms of computing time versus number of agents, and in terms of the number of cores used if parallelized)?

- line 600: 1.7 Gb of memory for 1 million agents means ~400 single-precision (or ~200 double-precision) values per agent without contact tracing. What are the main sources of memory consumption that require so much RAM?

- Because the evolution of a pandemic is highly stochastic (due to the random contacts between agents and the random transmission from one agent to another), did you study the reproducibility of the simulations and the range of results obtained with exactly the same parameters? Some other agent-based approaches, which use a fixed contact network, can show a wide variation in the maximum number of infected people, due only to the random nature of transmission at each social contact. A study of these variations needs to be added to the manuscript (these effects are visible for initial conditions with a low number of infected people). Are the results used for decision support an average of many simulations, or a single simulation without any error bars for the stochastic effects? This clarification is imperative, and the conclusions drawn from the simulations described in the manuscript may need to be qualified accordingly.

- What are the initial conditions? Are some individuals infected at t = 0? How many?
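
Reviewer #1's per-agent memory estimate (the second "line 600" comment above) can be reproduced with a short back-of-envelope calculation. The sketch below is illustrative only: it assumes that "1.7 Gb" means 1.7 × 10^9 bytes, and the function and variable names are hypothetical and not part of OpenABM-Covid19.

```python
# Back-of-envelope check of the per-agent memory estimate.
# Assumption (not stated in the review): "1.7 Gb" is read as 1.7e9 bytes.


def values_per_agent(total_bytes: float, n_agents: int, bytes_per_value: int) -> float:
    """Average number of stored values per agent for a given value size."""
    return total_bytes / n_agents / bytes_per_value


TOTAL_BYTES = 1.7e9    # assumed decimal gigabytes
N_AGENTS = 1_000_000   # population size quoted in the comment

single_precision = values_per_agent(TOTAL_BYTES, N_AGENTS, 4)  # 4-byte floats
double_precision = values_per_agent(TOTAL_BYTES, N_AGENTS, 8)  # 8-byte doubles

print(f"bytes per agent: {TOTAL_BYTES / N_AGENTS:.0f}")
print(f"single-precision values per agent: {single_precision:.0f}")
print(f"double-precision values per agent: {double_precision:.1f}")
```

Under this reading, each agent accounts for about 1,700 bytes, which is consistent with the reviewer's rounded figures of ~400 single-precision or ~200 double-precision values.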

Reviewer #2: The paper describes the implementation of an ABM for simulating the spread of Covid-19 within a population of up to 1 million people, although as it is described this number could be increased. Spreading occurs through both static and dynamic contact networks using statistics on contact patterns, which addresses the inhomogeneous spread in different age-groups. Also, disease progression and asymptomatic cases are modelled. The strength of the presented simulation model is the inclusion of a range of different prevention measures in addition to the classic NPIs for contact reduction. As this is a core advantage of this implementation, systematic assessment and testing of intervention strategies should be performed and discussed more deeply.

• The authors state that the model is parameterized for the UK, however, the paper lacks information on the calibration and validation process and how well the model reproduces the historic epidemic curve and efficiency of actual NPIs.

• The model seems to be able to reproduce the characteristics of a certain (?) time interval of the COVID epidemic in the UK. Line 229 "The model provides a good fit to UK data, correctly matching: the cumulative number of deaths; the magnitude of the peak in daily deaths; the timing of the peak in daily deaths to within a few days; and peak hospitalisation to within 25% of the recorded number" This could be supported with additional quantification of errors/correspondence in the comparison between real data and simulation results. How was this correspondence achieved? How good is the quality of fit compared with other models? Several times in the text the authors mention that the model (and parameters) were calibrated, but a description or discussion of the calibration approaches is not found in the manuscript.

• Regarding the line "The model was run on a population of 65 million people with UK demographics, by aggregating 65 simulations of 1 million people. An infection was seeded and grew exponentially with a doubling time of 3.5 days.": it should be explained why the model nevertheless fits. Based on the methods, the doubling time is a result of the simulation. This should be explained, along with why it (?) is constant.

• Paragraphs with important "messages" should be revised thoroughly. To give a more or less random example in line 600 ff "Performance. The ABM for 1 million individuals takes approximately 3s per day to run…" It is described in the abstract, but missing at this point for what scenario and which time range. Such shortcomings can be resolved easily and can improve the quality of the paper to a large extent.

• Evaluation of results of prevention measures is provided in supporting information only (see comments below). The analysis and assessment of interventions should be systematic with visual displays supporting a thorough discussion. In the current state, intervention scenarios are somehow treated as technical demonstrations.

• The abstract description of the model is well written and insightful, but technical details and modelling decisions are not provided. The appendix is very useful and the parameters are well described.

• Established approaches in the agent-based simulation of epidemic spread and the state of the art (and its limitations as a motivation for the paper) should be recognized (if it is the goal of the paper to show the methodological improvement). The motivation for modelling decisions and assumptions should be discussed in the presentation of a simulation model. E.g. motivate the use of certain types of networks/graphs and probability distributions! What are the features of the specific mathematical concepts? Why are they suitable to model certain aspects?

• The paper lacks technical details on implementation for assessing whether the efficiency claims hold, however as it is open source it is possible to review the code itself.

• A technical overview of the implementation as a simulator/framework should be provided. E.g. what is an "object-oriented programming style"? To the knowledge of the reviewer, C is not an object-oriented programming language. What is the procedure for sampling a population? A description of the initialization phase would be helpful. How are individuals aggregated into households on a technical level? How are the networks sampled from data? This should be included even though the source code is publicly available and well documented.

• In the discussion, a kind of outlook is included, which should be described in more detail. A focus is set on hospitals, but the ability to also include pharmaceutical interventions such as vaccination will be of great importance for epidemiological modelling.

• How is the model dealing with "unreported cases"? As it is immanently clear, this aspect should be at least mentioned.

The simulation model is of high quality and shows also high potential. But at the current state, the paper is a conceptual presentation of a simulation model but does not include or present interesting results on one of the three areas 1) epidemiology, 2) HCI and Usability, or 3) technical novelty. The paper should increase focus towards one of those directions. If the model can be adapted as easily as claimed by the authors the model could provide a good framework for additional research for the assessment of NPIs if the model is correctly parameterized for the addressed research question.

Some of the figures in the manuscript do not look very appealing. For instance, Fig1 could be supported by context if placed in the Methods section. The inclusion of parameters is important. All mathematical symbols should also be used in the parameter tables. It would be good to provide additional context to the parameter values with some of the supporting tables.

The model is fully documented and is thoroughly tested. The formal testing framework (mentioned in line 358) could be described in more detail e.g. line 589 "The model codebase includes over 200 tests used to validate the model." What was included in the tests? How was validation implemented?

The model description and modelling process could be aligned with international standards or guidelines such as "Modelling Good Research Practices of ISPOR-SMDM" (https://www.sciencedirect.com/science/article/pii/S109830151201652X). For example "V-8 If using an agent-based model, thoroughly describe the rules governing the agents, the input parameter values, initial conditions, and all sub-models."

Reviewer #3: This paper presents a detailed description of the OpenABM-Covid19 model, which is an agent-based COVID-19 transmission model informed by detailed data on contact networks and validated against data from the UK. As a methods paper, it does not include results per se, but rather illustrates the analyses the model can be used for.

Overall, I found the paper to be exceptionally well written and clearly laid out. The model has been carefully conceived, and the use of modern software practices (testing, documentation, concern for adaptability, an easy-to-use Python user interface, etc.) make this study an exemplar for how such models should be developed and communicated. The following comments are mostly intended as nonbinding suggestions for improving the paper.

p. 4, line 56: This could be interpreted to mean that the population size is fixed at 1 million, rather than that 1 million is the default.

p. 5, line 85: I'm not sure I understand how people who are contact-traced are themselves "protected" -- wouldn't they be contact traced following exposure to a known positive?

p. 6, line 96: Quite a few groups have developed COVID ABMs; while a comprehensive literature review is probably beyond the scope of the introduction, additional citations of influential ABMs might help the reader better understand the modeling landscape. The following are suggestions only:

* The Imperial model, which was influential in UK policy decisions (https://www.imperial.ac.uk/media/imperial-college/medicine/sph/ide/gida-fellowships/Imperial-College-COVID19-NPI-modelling-16-03-2020.pdf)

* Blakeley et al., which was influential in Australian policy decisions (https://www.mja.com.au/journal/2020/213/8/probability-6-week-lockdown-victoria-commencing-9-july-2020-achieving)

* Koo et al. (https://www.thelancet.com/journals/laninf/article/PIIS1473-3099(20)30162-6/fulltext)

* Aleta et al. (https://www.nature.com/articles/s41562-020-0931-9)

* Rockett et al. (https://www.nature.com/articles/s41591-020-1000-7)

In addition, the specific claims about OpenABM-Covid19 in comparison to other models may not be entirely accurate. Rockett et al. and Bicher et al. used 23 million and 9 million agents, respectively, which are larger populations than are typically used in OpenABM-Covid19. In addition, my understanding is that Covasim's computational efficiency is comparable to OpenABM-Covid19 (1 second per 2 million person-days; see Fig. S6 of https://www.medrxiv.org/content/10.1101/2020.07.15.20154765v4.full.pdf), and has also been designed with extensibility and ease of development in mind (e.g. 100 forks on GitHub). The authors are encouraged to rephrase in a way that emphasizes the strengths of OpenABM-Covid19 while also acknowledging the strengths (and weaknesses!) of other models. (Disclosure: I am one of the authors of Covasim.)

p. 7, line 111: While I can see a (strong) argument for community and perhaps workplace contacts to be drawn from a negative binomial distribution, is this true of household and school contacts as well? We have found that overdispersion in infectiousness, rather than overdispersion in number of contacts, is the most important factor for driving superspreading events. If in your model the latter are alone sufficient to account for the observed distribution of secondary infections, this is an interesting finding!

p. 8, line 139: Out of curiosity, is there any reason why this version cannot be considered 1.0? The codebase seems mature, tested, and documented, and the bulk of development seems to have been completed >6 months ago, which would seem to exceed the threshold for a 1.0 release in most contexts. (Very minor point: the Python package installs as version 0.2, not 0.3.) While acknowledging that the software practices are light-years ahead of most models, I did find myself wishing for a changelog, at least for backwards-incompatible changes (i.e. if the same parameters would no longer run, or would give a different result).

p. 10, Figure 2:

1. For Fig. 2A, showing the three distributions separately might be easier to read (as is, it looks like some of them are negative).

2. I am also surprised at how low the number of "random" contacts is -- I assume this does not count the 50+ people one would be in "contact" with on the metro or at a grocery store.

3. I didn't realize until getting to the Methods section that school students were included in the "occupation" network. This surprised me since school class sizes and workplace sizes tend to have fairly different distributions (the former having a larger mean and smaller variance than the latter). In addition, given the central importance of school closures as a COVID policy measure, including school networks explicitly would seem to be desirable. Unless this can be added quickly, it could be noted as a limitation of the model.

p. 11, line 208: Typo, "day"

p. 11, Fig. 4: Perhaps a few words could be said about how the model-derived IFR compares to empirical estimates, e.g. Ferguson et al. (which seems like it was used for some of the input parameters), O'Driscoll et al. (https://www.nature.com/articles/s41586-020-2918-0), and/or Brazeau et al. (https://www.imperial.ac.uk/mrc-global-infectious-disease-analysis/covid-19/report-34-ifr). By eye at least, the match looks quite good (i.e., at least as consistent as these estimates are with each other), which is nice evidence for model validation.

p. 13, Fig. 5:

1. What process, if any, was used for calibrating the parameters to produce these outputs? It seems some relatively straightforward tuning would be able to produce a better fit (e.g., lowering the infection hospitalization rate; increasing the hospitalization mortality rate). But if no tuning was done at all to parameters, the fact that the parameter "priors" produce such a good fit is worth highlighting.

2. X-axis labels are missing -- I assume this is days? If actual date labels could be used, it would make it easier to read.

3. It would be interesting to know how well the model can fit the UK's 2nd and 3rd waves, but one can easily get lost in an infinite spiral of fitting the model to the latest data, so I would not consider doing this a requirement.

4. Does the model produce estimates of diagnoses, and if so, would it be possible to see the projections for these as well?

5. Some indication of uncertainty would be valuable -- stochastic uncertainty if not parametric uncertainty. I understand that this will be highly dependent on the number of seed infections used, however.

6. The commit hash mentioned here does not match the previous one, even though this figure seems to be the central result of the paper. Does the previously mentioned commit hash refer to the IFR results? Would one get different results if running with a different commit hash (e.g., 536adae at the time of writing)?

p. 18, line 365: If I'm not mistaken, it has recently become possible to run from R as well as Python? (Although this seems less well documented.) This might be worth mentioning given the large number of epidemiologists who use R.

p. 20, line 394: It is also probably worth mentioning pharmaceutical interventions, since I see vaccination has also been recently implemented in the model.

p. 20, line 403: I was surprised there was no mention in this paper of the contexts and applications OpenABM-Covid19 has been used for. I feel silly saying this, but citing https://www.medrxiv.org/content/10.1101/2020.08.29.20184135v1 would seem to be needed at minimum!

p. 23, line 468: It sounds like some individuals are assigned a high daily number of contacts and persist with that number of contacts. It may be more realistic to redraw the number of contacts per person each day as well since superspreading events tend to happen at non-daily venues (e.g. churches, restaurants). However, this is unlikely to make much difference to the results. In addition, it would be interesting to see the distribution of primary vs. secondary cases, such as how well the model matches the data that e.g. ~20% of people are responsible for ~80% of transmissions.

p. 28, line 571: I am curious to know why it's claimed that the approach is object-oriented -- C is not generally considered an object-oriented programming language (https://softwareengineering.stackexchange.com/questions/113533/why-is-c-not-considered-an-object-oriented-language). I see structs being used extensively to handle data of different types, but I don't see much use of pointers to functions being used to emulate class methods, for example, and as such it looks a bit more functional to me. (Of course, one could "just" recode the whole thing in C++!)

p. 28, line 579: It might help to explain what SWIG is (I had to google it).

p. 29, line 603: Is it possible to disable rebuilding the daily interaction network? I imagine this would increase performance by perhaps an order of magnitude, and can be approximated as a larger network with lower transmission probabilities (i.e., 10 new contacts per day for 5 days each with a 1% transmission probability is virtually identical to 50 static contacts for 5 days with a 0.2% transmission probability -- given the wide uncertainty bounds of how many contacts should exist in the first place).

p. 52, line 814: Perhaps consider a protected branch instead of a commit hash, e.g. https://github.com/BDI-pathogens/OpenABM-Covid19-model-paper/blob/ploscb/notebooks/example_digital_contact_tracing.ipynb

-- Cliff Kerr
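
The equivalence suggested in the comment on p. 29, line 603 can be checked with a short calculation. The sketch below is illustrative only and is not taken from OpenABM-Covid19 or Covasim. It assumes that the 10 daily contacts are with different people each day (50 distinct people over 5 days), that every contact is susceptible, and that each exposure transmits independently with the stated probability. The function and parameter names are hypothetical.

```python
# Compare expected secondary infections from one infectious individual under
# a dynamic (rebuilt daily) network and a larger static network.
# Assumptions (illustrative): all contacts are susceptible and each exposure
# is an independent Bernoulli trial.


def expected_infections(n_contacts: int, exposures_per_contact: int,
                        p_per_exposure: float) -> float:
    """Expected number of contacts infected at least once."""
    p_infected = 1.0 - (1.0 - p_per_exposure) ** exposures_per_contact
    return n_contacts * p_infected


# Dynamic network: 10 new contacts per day for 5 days, one exposure each at 1%.
dynamic = expected_infections(n_contacts=10 * 5, exposures_per_contact=1,
                              p_per_exposure=0.01)

# Static network: the same 50 contacts on each of 5 days, at 0.2% per day.
static = expected_infections(n_contacts=50, exposures_per_contact=5,
                             p_per_exposure=0.002)

relative_gap = abs(dynamic - static) / dynamic

print(f"dynamic network: {dynamic:.4f} expected infections")
print(f"static network:  {static:.4f} expected infections")
print(f"relative gap:    {relative_gap:.2%}")
```

Under these assumptions the two set-ups differ by well under 1% in expected infections, because repeated low-probability exposures to the same contact almost never overlap. The calculation says nothing about network structure effects such as clustering, which the static approximation would change.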

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: Yes: Cliff Kerr

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms, etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, PLOS recommends that you deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions, please see http://journals.plos.org/compbiol/s/submission-guidelines#loc-materials-and-methods

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1009146.r003

Decision Letter 1

Virginia E Pitzer, Benjamin Muir Althouse

4 Jun 2021

Dear Bulas Cruz,

We are pleased to inform you that your manuscript 'OpenABM-Covid19 - an agent-based model for non-pharmaceutical interventions against COVID-19 including contact tracing' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. 

Best regards,

Benjamin Muir Althouse

Associate Editor

PLOS Computational Biology

Virginia Pitzer

Deputy Editor-in-Chief

PLOS Computational Biology

***********************************************************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The new version answers all my questions.

Reviewer #2: Thank you very much for dealing with our comments. I think all important aspects have been addressed and changes have been implemented.

Reviewer #3: The authors have done an excellent job responding to all points. I was especially pleased to see the changes related to the code, including the performance section and the vaccination interventions.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: Yes: Cliff Kerr

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1009146.r004

Acceptance letter

Virginia E Pitzer, Benjamin Muir Althouse

8 Jul 2021

PCOMPBIOL-D-20-02225R1

OpenABM-Covid19 - an agent-based model for non-pharmaceutical interventions against COVID-19 including contact tracing

Dear Dr Bulas Cruz,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Katalin Szabo

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. Proportion of each age group ever infected, ever hospitalised, or deceased.

    Data are from the end of a single simulation in a population of 1 million individuals with UK-like demographics and control interventions. The simulation was run for 77 days after lockdown started. The denominator in each calculation is the number of individuals in each age group in the total population (e.g. of the total population, the middle panel shows the proportion of each age group that was hospitalised in the simulation).

    (TIF)

    S2 Fig. Age-stratified hospital admissions, ICU admissions, and deaths.

    Data are from a single simulation of 1 million individuals with UK-like demographics and control interventions. The simulation was run for 77 days after lockdown started. The denominator is the number of individuals ever having been in the state in question (e.g. of all simulated hospitalisations, the top panel shows the distribution of these by age).

    (TIF)

    S3 Fig. Waiting time distributions for transitions between infection and disease states.

    All distributions are gamma except time to hospital, which is a shifted Bernoulli distribution. The mean of each gamma distribution is shown with a vertical dashed line.

    (TIF)

    S4 Fig. Smartphone usage by age in the UK.

    (TIF)

    S5 Fig. Example notebook of self-isolation on symptoms.

    Code for this notebook is provided at https://github.com/BDI-pathogens/OpenABM-Covid19-model-paper/blob/master/notebooks/example_self_isolation.ipynb.

    (TIFF)

    S6 Fig. Example notebook of a lockdown.

    The Python code used for this simulation is also available at https://github.com/BDI-pathogens/OpenABM-Covid19-model-paper/blob/master/notebooks/example_lockdown.ipynb.

    (TIFF)

    S7 Fig. Example of lockdown followed by shielding.

    Code for this notebook is provided at https://github.com/BDI-pathogens/OpenABM-Covid19-model-paper/blob/master/notebooks/example_lockdown_shield.ipynb.

    (TIFF)

    S8 Fig. Example lockdown followed by social distancing.

    Code for this notebook is provided at https://github.com/BDI-pathogens/OpenABM-Covid19-model-paper/blob/master/notebooks/example_lockdown_social_distance.ipynb.

    (TIFF)

    S9 Fig. Example of self-isolation after testing.

    Code for this notebook is provided at https://github.com/BDI-pathogens/OpenABM-Covid19-model-paper/blob/master/notebooks/example_testing.ipynb.

    (TIFF)

    S10 Fig. Example of digital contact tracing.

    Code for this notebook is provided at https://github.com/BDI-pathogens/OpenABM-Covid19-model-paper/blob/master/notebooks/example_digital_contact_tracing.ipynb.

    (TIFF)

    S11 Fig. Example app user protection calculation.

    Code for this notebook is provided at https://github.com/BDI-pathogens/OpenABM-Covid19-model-paper/blob/master/notebooks/example_digital_contact_tracing_protect.ipynb.

    (TIFF)

    S12 Fig. Example of manual contact tracing.

    Code for this notebook is provided at https://github.com/BDI-pathogens/OpenABM-Covid19-model-paper/blob/master/notebooks/example_manual_contact_tracing.ipynb.

    (TIFF)

    S13 Fig. Reproduction number.

    Data are from a single simulated outbreak, with R calculated using either the complete simulated transmission tree (actual) or the time series (instantaneous). Simulation data are for a single simulation in a population of 1 million individuals with UK-like demographics. The vertical dashed lines mark where interventions were introduced (self-isolation on symptoms followed by lockdown). Note that Ractual is reduced prior to the introduction of each intervention.

    (TIF)

    S14 Fig. Example adding a user defined network.

    R script demonstrating how to add a user specified network. The R code is at: https://github.com/BDI-pathogens/OpenABM-Covid19-model-paper/blob/master/R/figS14_example_add_network.R.

    (TIFF)

    S15 Fig. Performance.

    A. The computation time per day for populations of different sizes. The default networks are dynamic and are rebuilt each day, whereas the static networks are not changed after the first day. The meta-population model was run on a quad-core processor. B. The memory required for a simulation, which is linear in population size.

    (TIFF)

    S16 Fig. Scaling and stochastic variation.

    Simulations for epidemics were run for different population sizes and split into different numbers of equal sub-populations in meta-models (zero case migration). Initially the epidemic was seeded with 0.05% infections and an uncontrolled epidemic was allowed to develop for 100 days, with approximately no new infections at the end. A minimum of 20k people was required in each subpopulation in order for there to be sufficient seed infections to prevent a stochastic extinction at the start. Each simulated epidemic was characterised by 2 basic statistics: the doubling time in days to go from 1% to 2% of the population infected; and the total fraction of the population infected. Each configuration was run 10 times and the figure is a box plot of the results, with the number of subpopulations shown as separate colours. The simulations show that the mean doubling time and fraction infected are roughly independent of the total population size. The stochastic variation is determined by the total population and is independent of the number of subpopulations. With a total population of at least 1 million people, the stochastic variation in the doubling time was <0.2 days and in the total number infected was <0.5%.

    (TIFF)
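    The two summary statistics described in the S16 Fig caption can be sketched as follows. This is an illustrative reconstruction, not the authors' analysis code: the function names are hypothetical, the crossing times are found by linear interpolation between days (an assumption), and the input series is synthetic.

```python
def first_crossing_time(cumulative, threshold):
    """Day (linearly interpolated) at which the series first reaches threshold."""
    for day, value in enumerate(cumulative):
        if value >= threshold:
            if day == 0:
                return 0.0
            previous = cumulative[day - 1]
            return (day - 1) + (threshold - previous) / (value - previous)
    raise ValueError("threshold never reached")


def epidemic_summary(cumulative_infected, population):
    """Return (doubling time from 1% to 2% infected, final fraction infected)."""
    t_one = first_crossing_time(cumulative_infected, 0.01 * population)
    t_two = first_crossing_time(cumulative_infected, 0.02 * population)
    return t_two - t_one, cumulative_infected[-1] / population


# Synthetic series for illustration: 0.05% seeded, doubling every 3 days,
# capped at an arbitrary final size of 60% of the population.
population = 1_000_000
series = [min(500 * 2 ** (day / 3), 0.6 * population) for day in range(101)]
doubling_time, fraction_infected = epidemic_summary(series, population)
print(doubling_time, fraction_infected)
```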

    S17 Fig. Vaccine result.

    A simulation of a vaccine programme implemented after a lockdown period to control the epidemic. The epidemic was allowed to grow until 2% of the population had been infected, at which point a lockdown was implemented for 30 days along with a vaccination programme in which 2% of adults were inoculated each day (vaccine 90% effective after 15 days). The figure compares the total deaths and infections for a vaccine which offers full protection, for one which only offers protection from symptoms, and for no vaccine programme. The R code for generating this figure is at https://github.com/BDI-pathogens/OpenABM-Covid19-model-paper/blob/master/R/figS17_vaccine.R.

    (TIFF)

    S18 Fig. Vaccine R code.

    R code used for generating the simulations with vaccination programmes. The R code is at https://github.com/BDI-pathogens/OpenABM-Covid19-model-paper/blob/master/R/figS18_example_vaccination.R.

    (TIFF)

    S19 Fig. Offspring distribution.

    The offspring distribution (in blue) and the sibling distribution (in grey). A negative binomial distribution fitted to the offspring distribution gives an estimate of k = 0.51. The inset is the cumulative sibling distribution against the cumulative offspring distribution and shows that 70% of infections are generated by the top 20% of individuals.

    (TIFF)
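    To illustrate the two quantities quoted in the S19 Fig caption, the sketch below computes the share of infections generated by the most infectious individuals and a dispersion parameter k from a list of offspring counts. The caption does not state how the negative binomial was fitted, so the method-of-moments estimator used here is a stand-in chosen for simplicity, not the authors' procedure. The function names and the toy data are hypothetical.

```python
import statistics


def top_share(offspring_counts, top_fraction=0.2):
    """Fraction of all onward infections caused by the top_fraction of individuals."""
    ordered = sorted(offspring_counts, reverse=True)
    total = sum(ordered)
    if total == 0:
        raise ValueError("no onward infections in the data")
    n_top = max(1, round(top_fraction * len(ordered)))
    return sum(ordered[:n_top]) / total


def moment_estimate_k(offspring_counts):
    """Method-of-moments estimate of the negative binomial dispersion k.

    Uses variance = mean + mean**2 / k, so k = mean**2 / (variance - mean).
    """
    mean = statistics.mean(offspring_counts)
    variance = statistics.variance(offspring_counts)
    if variance <= mean:
        raise ValueError("data are not overdispersed; k is undefined here")
    return mean ** 2 / (variance - mean)


# Toy data for illustration: 10 infected individuals and their offspring counts
toy_counts = [0, 0, 0, 0, 0, 0, 1, 1, 3, 5]
print(top_share(toy_counts), moment_estimate_k(toy_counts))
```

    A smaller k indicates a more overdispersed offspring distribution, in which a small fraction of individuals accounts for a large share of transmission.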

    S20 Fig. Offspring distribution R code.

    R script used for generating the offspring distribution is at https://github.com/BDI-pathogens/OpenABM-Covid19-model-paper/blob/master/R/figS20-offspring-distribution.R.

    (TIFF)

    S21 Fig. Mean square error comparing simulated and observed data during the first wave of the COVID-19 epidemic in England across four data sources.

    Simulations are of 56 million individuals, performed separately for each point on a two-dimensional grid of 1) the prevalence of SARS-CoV-2 at which lockdown was implemented (y-axis), and 2) the reduction in daily contacts during lockdown (x-axis). The surface has been interpolated from the grid of values. The transmission parameter (infectious_rate) was fixed to give a doubling time of approximately 3.5 days. Red dots highlight the parameter sets whose error against the observed data is in the lowest 5%. Observed data are from the UK Government's COVID-19 dashboard and the UK's Office for National Statistics (seroprevalence).

    (TIF)

    S22 Fig. Epidemic doubling time (in daily deaths) as a function of infectious_rate parameter.

    Simulations of 56 million individuals using OpenABM-Covid19 for values of the infectious_rate parameter from 3.5 to 8.5 in increments of 0.1 (black dots). Each black dot is derived from the slope of a linear regression fitted to the log of daily simulated deaths (truncated to between the first 100 and 1000 deaths). The red line represents a fit of the form a(x-b)^c * exp(nu) to these data, where nu is a noise term. Each red dot gives the value of the infectious_rate parameter (in brackets) for a doubling time of 3 (7.1), 3.5 (5.8), and 4 (5.0) days respectively.

    (TIF)
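    The regression step in the S22 Fig caption can be sketched as below: fit a straight line to the log of daily deaths and convert the slope r to a doubling time via ln(2)/r. This is an illustrative sketch only. The function name is hypothetical, and the truncation is interpreted here as keeping days on which daily deaths lie between 100 and 1000, which is an assumption about the caption's wording.

```python
import math


def doubling_time_from_deaths(daily_deaths, lower=100, upper=1000):
    """Doubling time (days) from a least-squares fit to log daily deaths.

    Only days with lower <= deaths <= upper are used (assumed truncation).
    """
    points = [
        (day, math.log(deaths))
        for day, deaths in enumerate(daily_deaths)
        if lower <= deaths <= upper
    ]
    if len(points) < 2:
        raise ValueError("need at least two days inside the truncation window")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx  # exponential growth rate per day
    if slope <= 0:
        raise ValueError("deaths are not growing; doubling time is undefined")
    return math.log(2) / slope


# Synthetic series for illustration: deaths doubling every 3.5 days
synthetic_deaths = [10 * 2 ** (day / 3.5) for day in range(40)]
estimated_doubling_time = doubling_time_from_deaths(synthetic_deaths)
print(estimated_doubling_time)
```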

    S1 Table. UK population stratified by age and UK households stratified by household size.

    Data provided by the ONS. Parameter values match the OpenABM-Covid19 baseline parameters, April 25, 2021.

    (CSV)

    S2 Table. Average number of non-household interactions stratified by age.

    Values shown are for random and occupational interactions for an individual in each age group per day from empirical estimates [16]. Parameter values match the OpenABM-Covid19 baseline parameters, April 25, 2020.

    (CSV)

    S3 Table. Occupational network parameters.

    Mean numbers of daily occupational connections for members of each age group, fraction of adults involved in occupational networks for children and for elderly people, and rewiring parameters for randomisation of daily interactions. Parameter values match the OpenABM-Covid19 baseline parameters, April 25, 2020.

    (CSV)

    S4 Table. Parameters for numbers of random connections that members of each age group have per day.

    Parameter values match the OpenABM-Covid19 baseline parameters, April 25, 2020.

    (CSV)

    S5 Table. Susceptibility by age.

    (CSV)

    S6 Table. Infection parameters.

    The mean of the generation time distribution and the standard deviation of the infectious period were calculated from [7,45–48]. Parameter values match the OpenABM-Covid19 baseline parameters, April 25, 2020.

    (CSV)

    S7 Table. Age-stratified infection and disease parameters.

    Proportion of people in each stage of illness whose disease progresses further [Calibration of [49] & Spanish Serology Survey for fraction of asymptomatic and mild symptoms; Calibration of [27,49] & Spanish Serology Survey for fraction hospitalised; [27] for fraction of hospitalised that require critical care; [27,50–52] for fatality fraction]; [53,54]. Parameter values match the OpenABM-Covid19 baseline parameters, April 25, 2020.

    (CSV)

    S8 Table. Parameters for waiting time distributions.

    Mean and standard deviation for density functions of the times that each transition (disease progression or recovery) takes [29,51,55]. For the shape of the functions see S3 Fig. Parameter values match the OpenABM-Covid19 baseline parameters, April 25, 2020. * Personal communication with SPI-M; data soon to be published.

    (CSV)

    S9 Table. Smartphone usage stratified by age in the UK.

    Data based on the Technology Tracker (fieldwork 9 Jan–7 Mar 2020) and the Children’s Media Literacy tracker (fieldwork 25 April–11 July 2019), data provided by the Office of Communications.

    (CSV)

    S10 Table. Parameters for hospitalisation and self-quarantine upon symptoms.

    Parameter values match the OpenABM-Covid19 baseline parameters, April 25, 2021.

    (CSV)

    S11 Table. Parameters corresponding to testing and contact tracing.

    Parameter values match the OpenABM-Covid19 baseline parameters, April 25, 2020.

    (CSV)

    S1 Video. Animation of a simulated outbreak.

    Data are from a simulated outbreak in a population of 1 million individuals with UK-like demographics and control interventions, showing age-stratified histograms of individuals in each compartment. Sub-panels are arranged in the same layout as the model compartments (Fig 4). The video is available at https://github.com/BDI-pathogens/OpenABM-Covid19-model-paper/blob/d01baf1ca160aec649ce52c26001d8721dfb6bf9/figures/fig_outbreak_animation.mp4.

    (MP4)

    Attachment

    Submitted filename: OpenABM PLOS CB Response to Reviewers.pdf

    Data Availability Statement

    All data generated by the model are available without restriction from the following repository: https://github.com/BDI-pathogens/OpenABM-Covid19. The analysis described in this paper is fully reproducible. The data used to parametrise the model are publicly available, with all sources stated and linked to in the manuscript and its Supporting Information files. The observed hospitalisation data for Fig 5 are from the UK government coronavirus dashboard, available at https://coronavirus.data.gov.uk/. The seroprevalence data for Fig 5 are from the ONS: https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/articles/coronaviruscovid19infectionsurveyantibodydatafortheuk/3february2021.


    Articles from PLoS Computational Biology are provided here courtesy of PLOS
