Abstract
Edge-based network models, especially those based on bond percolation methods, can be used to model disease transmission on complex networks and accommodate social heterogeneity while keeping tractability. Here we present an application of an edge-based network model to the spread of syphilis in the Kingston, Frontenac and Lennox & Addington region of southeastern Ontario, Canada. We compared the results of using a network-based susceptible-infectious-recovered (SIR) model to those generated from using a traditional mass action SIR model. We found that the network model yields very different predictions, including a much lower estimate of the final epidemic size. We also used the network model to estimate the potential impact of introducing a rapid syphilis point of care test and treatment intervention strategy that has recently been implemented by the public health unit to mitigate syphilis transmission.
Keywords: epidemiology, percolation, syphilis
1. Introduction
1.1. Syphilis
Syphilis is a sexually transmitted and blood-borne infection (STBBI) caused by the bacterium Treponema pallidum subspecies pallidum [1,2].
Despite record low syphilis rates in the 1990s, the infection has resurfaced as a growing public health issue in Canada, with a 109% increase in reported cases from 2018 to 2022 [3]. In Ontario, the largest province by population, the number of syphilis cases between 2013 and 2022 grew from 3.9 to 23.6 cases per 100 000 people [4]. Women aged 15 to 39 were identified as a growing at-risk population [4,5], which has also led to congenital syphilis cases increasing, going from six cases in 2021 to 26 cases in 2022 [5]. A significant proportion of syphilis infections have been associated with underserved groups with less access to medical resources and increased risk factors.
The Kingston, Frontenac and Lennox & Addington Public Health (KFL&A PH) unit reported a syphilis outbreak in December 2022 when the rates of infectious syphilis (41.2 per 100 000) surpassed the provincial average (23.6 per 100 000) [4].
Congenital syphilis occurs when a pregnant woman with untreated syphilis passes the infection to her baby during pregnancy or at birth. This can result in severe adverse pregnancy outcomes including miscarriage, stillbirth and infant death shortly after birth. Infected infants can also be born seemingly healthy but later on develop serious problems including nerve damage and hearing loss [5]. Five congenital cases (2.8 cases per 1000 live births) were also reported to KFL&A PH, which is one of the highest rates within the 34 public health units in Ontario. Such rapid increase in congenital syphilis also raises the concern about syphilis prevalence and increases the significance of controlling the outbreak.
Typically, a one-time intramuscular injection of benzathine penicillin G. can effectively treat early syphilis infections (less than 1-year duration) [6,7]. Left untreated, syphilis can have very serious health consequences and move through the following four stages:
(i) primary stage: symptoms may include swollen glands and one or more painless chancres [8], typically appear around three weeks after infection. This stage usually lasts 9−90 days [9,10] and the chancre heals regardless of treatment [11];
(ii) secondary stage: symptoms include fever, headache, patchy hair loss, swollen glands in the groin or neck, condylomata lata and rashes. These symptoms can develop 4−10 weeks after exposure and then symptoms last for 3−12 weeks [9,10];
-
(iii) latent stage: there are no noticeable symptoms. This can last for up to 20 years. Cases in this stage can be further separated into two sub-classes based on infectious duration:
-
–
early latent stage: no symptoms but infected with syphilis for less than 12 months. At this stage the disease can still be transmitted to others [8]; and
-
–
late latent stage: no symptoms but infected for more than 12 months. Late latent stage syphilis is typically not transmissible to others [8,11];
in practice, diagnoses of early versus late latent stages are determined based on serological testing and clinical input; and
-
–
(iv) tertiary stage: syphilis invades significant organ systems like the brain, heart and blood vessels. Untreated tertiary syphilis can eventually lead to death.
Neurosyphilis occurs when syphilis infection reaches the central nervous system, and can occur at any stage of the disease [8,11]. The typical duration of each stage is based on the literature and consultation with local public health professionals. These ranges are important for our modelling work and are summarized in figure 1.
Figure 1.
Time frame of syphilis stages after infection. The upper and lower bounds of each interval is the maximum and minimum durations we found in our review of the literature [6–11]. The crosses represent the typical observed duration of each stage, determined from consultation with local public health professionals, which we use as the expected duration of each stage in our modelling work. For neurosyphilis, the typical value is taken to be at the maximum duration of interest of our project, which is 260 weeks (about 5 years).
As part of a strategic response to this outbreak, KFL&A PH implemented a rapid test and treat protocol using a recently approved point-of-care test (POCT) in Canada. This public health intervention allowed trained healthcare providers to screen for syphilis infections in conjunction with existing outreach services and immediately treat persons who screened positive for syphilis. The goal was to reduce the time between traditional test and treat with serology, and ultimately reducing the number of people who become lost to follow-up. More details on the intervention are available in Mackrell et al. [10].
The impact of this rapid test and treat intervention will be important to inform public health policy in areas experiencing similar outbreaks. Here we present a method to model syphilis transmission and estimate the impact of this intervention using edge-based percolation techniques.
1.2. Percolation network
Mathematical models of infectious disease spread are often based on the classical susceptible-infectious-recovered (SIR) framework originally introduced by Kermack & McKendrick [12]. Many implementations of the SIR models come with an assumption of homogeneous mixing of individuals in the population yielding a ‘mass action’ (MA) term in the resulting system of ordinary differential equations. We refer to these traditional SIR models with a MA assumption as MA-SIR models. These have been widely used for many types of diseases, since this assumption keeps the dynamics of the model tractable for analysis. However, the MA assumption can be problematic for modelling sexually transmitted infections where assuming homogeneous mixing can cause overestimates of the actual outbreak size. There have been many studies that attempt to eliminate the MA assumption without significantly sacrificing analytical tractability. Newman et al. [13] developed the basic network percolation model for epidemiology based on random simple graph theory introduced by Bollobás [14]. For these models, vertices in a network represent individuals in the community, and edges between vertices represent potentially disease-causing connections. The number of edges connected to each vertex is called the degree of the vertex. For a sexually transmitted infection, the relevant network would be the sexual contact network and degrees would represent the number of sexual partners. The social structure is represented by degree sequences, and predictions can be made on disease spreading by computing the expectation among all possible graphs with such degree sequences. In further work, Newman [15] generalized and optimized the method using the bond percolation model by Frisch & Hammersley [16] and Grassberger [17]. We refer to this generalized method as the typical percolation method.
Percolation theory can be applied to model outbreaks on epidemic networks with generalized degree sequences that reflect more diversity in social structures among different communities. Moreover, these methods also provide an expression for transmissibility, and allows the model to work with a distribution of key epidemiological parameters. Even under such generalization, the method manages to retain a relatively simple form for analysis and forecasting. This method has gained a lot of attention as one of the most significant network approaches for epidemiology models. It has also been modified to fit more complicated assumptions, such as partial immunity [18].
The typical bond percolation method has two significant limitations: it lacks dynamical information about the outbreak, and it can only be applied to static configuration networks without network changes. However, its idea of using percolation theory inspired many other approaches for modelling epidemics in networks. One of the most well-known is the method first introduced as a pair-approximation model by Keeling [19]. This model was then generalized by Volz & Meyers [20], and further formalized by Miller et al. [21]. We refer to the Miller et al. [21] model as the modified percolation model. This constructs a connection between the network patterns and a relatively tractable dynamical system, which provides dynamical information about the outbreak. In a previous paper, we proved that the results are equivalent to Newman’s method on configuration networks [22], and can be applied to some networks with evolutionary changes with reasonable tractability. More details about the theoretical background, definition, assumptions of the edge-based model and the connection between typical and modified percolation models can be found in the author’s previous work [22].
To model syphilis transmission, we use the modified percolation model with an SIR-type disease progression. In this paper, we refer to this as the network-SIR model. We compare the predictions provided by the network-SIR model with those from more traditional MA-SIR models.
2. Data
We obtained KFL&A PH syphilis case data collected from January 2019 to December 2023, which included 306 reported cases, with information on the date of encounter/diagnoses, age at the time of encounter, syphilis stage, medical, social and environmental risk factors, and contacts with potential for transmission in the past six months (P6M) provided by cases. The reported cases consisted of 300 adult cases and six newborn cases. As we can see in figure 2a, among the adults, 237 (79.0%) of them provided both the number of transmissible contacts in the P6M with names, four (1.3%) provided only their number of contacts and 59 (19.7%) did not provide any information on their contacts. Figure 2b shows the stage distribution of syphilis among reported adult cases. Moreover, figure 2c illustrates the proportion that reported reinfection of syphilis, with 289 (96.3%) of the cases having had syphilis once during the monitoring period, seven (2.3%) twice, two (0.7%) three times and two more (0.7%) only reporting having it once during the monitoring period but with a history of syphilis before monitoring.
Figure 2.
Proportions of adult case data.
The visualized data of the time series of the number of reported syphilis cases is shown later in figure 8 in §5, together with the trajectories from the fitted models. Since the KFL&A PH unit implemented the rapid test and treat protocol in June 2023, which possibly resulted in a change in both the reporting probability of infected individuals and recovery rate (owing to shorter time to treatment), we only use the first 116 bi-weeks (terminating on 29 May 2023) for model fitting. Throughout this paper we set time zero to be 1 January 2019 and plot time in units of bi-weeks.
3. Model set-up
We constructed a network-SIR model, as described in §1.2, to evaluate syphilis transmission within the underserved population in the KFL&A area. This model serves as the base for understanding the transmission dynamics of the disease and its interaction with the community structure before POCT implementation. The construction of this base model consisted of two phases: the first is to generate representative random networks, characterized by their size and degree distribution, of the community. This will be discussed in §§3.1−3.2. The second phase is the use of the modified percolation process discussed in §1.2 and fitting this to the time series data on syphilis cases. These results will be presented in §3.3. The results derived using a network-SIR model were compared against those from a traditional MA-SIR model in §6.
3.1. Estimating the size of the target population
The KFL&A PH unit identified the underserved communities of people experiencing homelessness or having housing instability, and people using drugs as a high-risk population. Even though individual-level data on each case’s social and economic status is not routinely collected, this correlation is evident from the high incidence of related risk factors. Therefore, we constructed the network-SIR model based on the collected data from the cases’ reported contacts. We supplemented this data with information from a literature review and consultation with local public health experts.
To estimate the size of the relevant network of individuals at risk of syphilis infection within the KFL&A area, we made the following considerations: the size of the population served by KFL&A PH was 214 513 in 2023. However, it is clear that only a subset of that population is at an elevated risk of syphilis transmission. We refer to this subpopulation as the target population. From consultation with KFL&A PH, we make the assumptions that the individuals in this target population have the following characteristics:
(i) financially or socially constrained, so they have reduced access to public health services; and
(ii) have at least one direct or indirect potentially disease-causing connection with other members of the at-risk population.
Since we expect that the majority of the transmission in KFL&A is via sexual contact, one way to incorporate both (i) and (ii) is to estimate the number of individuals in the financially or socially constrained subpopulation that are also sexually active. Specifically, to take into account assumption (i), we restrict to the fraction of population with low-income measure after tax (LIM-AT) as defined by Statistics Canada [23]. We took assumption (ii) into account by restricting to individuals between ages 18 to 64 since this is considered to be an age range that includes most sexually active individuals. This choice is supported by the local and provincial data [4] as shown in the electronic supplementary material, figure S2, which also shows the concentration of syphilis cases in these age ranges.
Based on economic data and demographic data, approximately 10.6% of the population qualify as low income (using LIM-AT) and are between the ages 18 to 64. KFL&A PH also noted a small spillover of syphilis transmission to the local medium/high-income community so we assumed that the target population is slightly larger, at 12.0% of the population, yielding approximately 26 000 individuals. Thus, we set the size of the network N = 26 000.
We note that age and economic status were only used to estimate the target population size. The random network model is originally built on networks with infinite size, and all predictions are given as proportions of population. Therefore the population size is only required for fitting the model to the data (mainly affecting estimates of the reporting probability), but has limited impact on the dynamical behaviour of the model.
For further simplicity and tractability of the model equations, we need to make several more assumptions about the target population. To make these assumptions explicit we clearly label them as (H1)–(H10). The first assumption is given below.
(H1) No individual enters or leaves the target population during the time duration of interest.
Assumption (H1) is a strong assumption to impose on the model, but it is necessary for a simple approach to our first pass at fitting a network-SIR model to the data. Under assumption (H1) there is no need to consider birth, death and immigration. The size of the target population is assumed to be constant. The general difficulty in tracking changes in relationships among the underserved community, owing to many issues such as the lack of permanent addresses, avoidance of administrative services, etc., makes (H1) a necessary assumption at this point. It is also not an unreasonable assumption owing to the following reasons:
(i) we fitted the data over a relatively short time period (2019−2023) so that the change in the population owing to births and natural death is relatively small; and
(ii) syphilis is curable and has a low direct death rate, especially in the age group we are looking at, so the impact of death owing to disease is limited.
3.2. Degree distribution
The next step to build the network-SIR model is to construct a degree distribution of the target population. To do this, we require a few more simplifying assumptions.
(H2) For each vertex, its degree is the number of other vertices that could have had transmissible contact with it during the whole duration of interest.
We made this assumption because of the absence of detailed contact tracing data. Without any information on duration of contacts, we cannot properly model changes in the networks during the time period that we are interested in. Thus, by making assumption (H2), we assume that the edges in networks reflect all transmissible contacts within the whole duration of interest, regardless of the duration of each contact, such that the edges and network will be invariant with respect to time. Together, assumptions (H1)–(H2) allows us to construct a static model with no changes of vertices and edges once formed, which further allows us to directly apply the configuration network model used by Miller et al. [21] as discussed in §1.2. It also makes the base model more comparable with MA-SIR models, where contact rates are also assumed to be invariant with respect to time.
The degree distribution of the network is assumed to be the distribution of the number of sexual contacts for the target population. Since edge-based percolation techniques are based on degree distributions instead of any specific network structure, we do not need the actual social structure of the community. We only need to determine the degree distribution. In the absence of other data on this, we decided to treat the reported cases as a random sample of the target population. While this would clearly be a biased sample of the general population, the bias is reduced if we only consider the target population.
(H3) The cases in the data who reported their contacts represent an unbiased random sample from the target population.
We acknowledge the limitations caused by this assumption. Since our data only contains contact information of infected and reported cases, (H3) would clearly make our target population network more close to the notion of ‘core groups’ [24]. Member of core groups have higher risk of being infected and are more likely to have higher degrees than a random individual of the general population. Therefore, this could lead to an overestimate to the connectivity of the network and to consequentially the outbreak size. However, we note that in the literature dealing with network models [24–26], improvements to assumption (H3) usually requires extra knowledge of the background population, which in turn requires a much more comprehensive survey of the population, contact tracing data and information on the specific community population. Owing to limitations in data, time and resources, this is not available to us for now, and we need to perform our analysis of the outbreak given the limited information available. We acknowledge that (H3) is likely to lead to an overestimate of the degree distribution, and therefore also of the epidemic size. However, we will see later in §6, that even with the potential overestimation, the application of our edge-based network model still provides a more reasonable estimation and prediction closer to the real observation than the more traditionally used MA-SIR models.
Our data includes 237 cases with a contact list within P6M of diagnosis. It also includes four cases who were only able to provide their estimated numbers of sexual contacts in the P6M. The rest of the cases did not provide information on their contacts. The data also contains the risk factors associated with each case, including sexual, behavioural, medical, pregnancy-related, financial and social status. This list includes factors that are commonly considered to have a significant impact on the risk of syphilis infection by provincial public health [4] and additional information KFL&A PH determined important given the local context. A complete list of risk factors is shown in the electronic supplementary material, table S2 and these factors are grouped into 10 aggregate risk classes based on type.
The degree distribution could be based directly on the reported numbers of P6M sexual contacts from the data. However, upon discussion we decided that this is likely to underestimate the degrees owing to the following reasons:
(i) stigma is widely recognized as a significant barrier to STBBI prevention, diagnosis and treatment [27]. Even those who provided information on their contacts might not be willing to reveal all of their contacts for fear of identification if public health notifies the contact of the risk of syphilis;
(ii) based on the high frequency of cases reporting anonymous sex, sex trade and judgement impaired due to alcohol or drugs as risk factors, it is possible diagnosed individuals might not be able to remember or identify all of their contacts;
(iii) the contacts listed within P6M may not reflect all the transmissible contacts; and
(iv) there were multiple individuals with risk factors usually associated with high numbers of sexual contacts (e.g. sex trade workers) who reported much lower numbers of contacts than would be expected based on the literature.
We adjusted up the reported number of sexual contacts of each case using their ages and reported risk factors. The details of the adjustment are provided in the electronic supplementary material, S4. A comparison of the original degree sequence and the adjusted degree sequence is shown in the electronic supplementary material, figure S1.
Studies based on empirical research [28–30] suggest that the degree distributions of sexual networks generally follow a power-law scale-free type of distribution. In such a distribution, for any randomly chosen vertex in the network, its degree equals to with probability
(3.1) |
for some constant . We refer to this type of distribution as a power-law distribution. Networks with this distribution are famous for its ‘small-world’ [31] and ‘the rich get richer’ [29] effect, which are the small average path lengths between nodes and small amount of vertices have high degree while the majority of vertices have low degree. This characteristic is also evident in our observations from the collected data as in the electronic supplementary material, figure S1: more than cases would have degree less than even after modification adjustment, while two cases have degree larger than . Thus, we take the following assumption for the degree distribution.
(H4) The degree distribution of the target population follows a power-law distribution.
However, please note that in the electronic supplementary material, S3, we also justify the usage of power-law distribution by comparing its fit with negative binomial and geometric distribution, where the power-law distribution provides largest maximum likelihood when fitting with observed data.
The adjusted degree sequence was fitted to a power-law distribution using the fit_power_law function from the igraph R package, following the algorithm developed by Clauset et al. [32]. The fitting resulted in a discrete power-law distribution with , which was also capped at maximum degree and normalized. We denote for as the normalized probability mass function (PMF) for the degree distribution, satisfying equation (3.1). The PMF of the fitted distribution is presented in figure 3 together with the frequency probability distribution of the adjusted data sequence used for fitting.
Figure 3.
Comparison between the adjusted sequence and fitted distribution. For better illustration (a) only shows probability from degree 1 to degree 50. A logarithmic scale comparison for all degrees no less than is given in (b) to illustrate the fit of the power-law degree distribution.
3.3. Epidemiological assumptions
Individual differences on epidemiology parameters, like recovery rate and per-infection transmission rate, can be averaged out since the percolation model considers accounts for randomness in networks (see Zhao & Magpantay [22] for details). Thus we assume the following.
(H5) There is no individual differences between vertices other than their degree.
As a result, all the vertices in the network are homogeneous on disease transmission dynamics, so the network is a single layer when considering dynamics on the compartment level. This allows us to use uniform constant parameters when setting up the dynamics of every vertex. This makes our model simpler, and also makes the comparison clearer between the network-SIR model and the MA-SIR model, since the only difference between them now is the social heterogeneity.
There are methods in the literature that consider multiple risk groups with different transmission parameters with multi-layer networks, which have different types of vertices and edges. Bansal & Meyers [18] created a framework based on Newman’s typical percolation model [13,15,33] for partial immunity. Other works considering other types of network models are reviewed by Kinsley et al. [34]. However, multi-layer networks require more parameters which we are unable to ascertain owing to our limited data.
It is also possible to consider more complex models than SIR-type models using percolation methods, such as those allowing for reinfection. As discussed in Johnson & Geffen [35], the reinfection of syphilis is possible with immunity waning in about weeks following primary or secondary stages of syphilis and weeks following latent stages. As shown in figure 2c, a few cases of reinfection were actually observed in the data. However, owing to limitations in our data, it is difficult to construct a proper model for reinfection with reliable parameter estimation, especially if we consider heterogeneity in social structure for the reinfection process. Since only a relatively small portion of cases involves reinfection, the impact of reinfection to infection dynamics would also be limited. Therefore, for now we treat each reported reinfection case as an independent new case and make the following simplifying assumption.
(H6) There is no reinfection. The disease is modelled using SIR compartments.
This also allows us to use the SIR compartment model proposed by Miller et al. in their original paper [21]. Considering reinfection and immunity will be a future topic if more data can be obtained.
Under assumptions (H1)–(H6), the model satisfies the conditions for applying a modified bond percolation process on a configuration model network based on an SIR compartmental model. Following Miller et al. [21], the dynamics of the disease transmission can be described by the following system of ordinary differential equations (ODEs):
(3.2) |
Similar to the normalized ODE system of the traditional MA-SIR model (reviewed later in §6), , and , respectively, represent proportions that are susceptible, infectious and recovered. The parameter is the uniform per-infected transmission rate and is the uniform per-infected recovery rate. Miller et al. [21] introduced to be the probability that a randomly chosen edge of a randomly chosen vertex in the network has not transmitted the infection at time . The function is the probability generating function of the degree distribution, which we derived in §3.2. Using this, one can define the basic reproduction number of the network-SIR model, as the average number of infections caused by a node infected early in an epidemic [21]. Unlike in equation (6.2) of the MA-SIR model discussed later in §6, the expression for the network-SIR model is given by:
(3.3) |
which depends on the network structure represented by derivatives of . Further details on this can be found in the literature [21,22].
To fit this network-SIR model (equation (3.2)) to data, we require initial conditions. Since data collection began in 2019, we set the initial time to correspond to 1 January 2019 and set parameter to be the initial number of infected individuals. We assume that initially the rest of the population is susceptible to the disease. The remaining parameters associated with equation (3.2) are the transmission rate and the recovery rate . In order to fit this model to data, we need to also account for imperfect reporting of syphilis, so we introduce a reporting probability which is further discussed in the next section. Thus the model has four epidemiological parameters ( , , and ).
Before discussing the estimation of parameters (§§4 and 5), we note how the case data is related to the model trajectories. Since the development of tertiary syphilis requires a long time (typically more than 5 years), the time frame we consider precludes tertiary cases from developing. Similarly, we also exclude cases with an unknown stage when fitting the dynamic of the current outbreak. We aggregated the reported case numbers on a bi-weekly basis (14 days) when fitting the parameters. Starting from 1 January 2019 , there are 116 bi-weeks until 29 May 2023 (the end time of data to be fitted), and 131 bi-weeks until 9 January 2024 (the end time of all the data available to us). This leaves 230 reported cases that fall within the first 116 bi-weeks, including reinfection cases, which are treated as independent cases owing to (H6).
From figure 1, we also note that syphilis takes weeks and even months to go through each of its stages, so there is potentially a longer duration between exposure and being reported compared with the duration between reporting and treatment. Thus, we consider reported cases as being reported as they transition from infectious to ‘recovered’ instead of classifying them as actively infectious. Additionally, we note that the data only contains the encounter date, where the case has been recorded and tested. Before implementation of POCT, the only testing procedure for syphilis in the KFL&A area was the serology test through the laboratory of Public Health Ontario. According to KFL&A PH, the typical turn-around time (TAT) between getting tested and receiving treatment is 9 days, considering transportation, notification and 4−5 days at a laboratory. Only after the test result is received will the individual get notification and then treatment. To reflect this, we add 9 days to each encounter date of cases as our final date of recovery.
4. Relationship between reporting probability and recovery rate
In this section, we show that the recovery rate and reporting probability are not independent, and we present a simple way to tie them together. As discussed in §1.1, syphilis is a treatable but not self-limiting disease. Without treatment, syphilis will not self-recover, however individuals will move on to late stages that might not have noticeable symptoms for up to 20 years [8]. Syphilis is typically considered not to be transmissible when it develops to late latent stage or later, which typically happens after 12 months (26 bi-weeks) of exposure.
From the transmission dynamics point of view, we can consider an infected individual as ‘recovered’ if they are no longer infectious, and thus if they have been treated or if they were not treated but are no longer transmitting the disease.
(H7) The infected individual moves from infectious compartment to recovered compartment if one of the following happens:
(i) the individual is tested, treated and reported before their syphilis infection reaches late latent stage; or
(ii) the individual’s syphilis develops to late latent stage, regardless of whether they were tested, treated and reported afterwards.
For simplicity, we assume a uniform constant probability of reporting.
(H8) Every ‘recovered’ individual has a probability of being tested, reported and recorded, regardless of time of testing and duration of infection.
Now, recall that we aggregated the data by bi-week. To fit the data, we also separate the contiguous time into short time intervals each with length of one bi-week by discrete time points . Consider the number of cases ‘recovered’ during the time point to is given by . If is an integer then since each individual has the same probability of reporting, the number of reported cases would follow a binomial distribution. We make the following assumptions.
(H9) The number of reported cases is a random variable following the binomial distribution.
(H10) The reporting process at any different reporting times are mutually independent. Consequentially, the corresponding random variables and are also mutually independent for any .
The corresponding PMF is given by
(4.1) |
In the next section, we will introduce the continuous binomial distribution adjustment used when is not an integer. For now, we note that . By (H9), the expectation of total reported recovered case until any time point is given by
(4.2) |
As discussed in Zhao & Magpantay [22], the application of the system (equation (3.2)) also requires the assumption that the infectious period (or ‘recovery time’) of all infected cases follows an exponential distribution with rate . The base MA-SIR model also requires similar assumptions. For such an assumption, the expectation of the infectious period is equal to the infectious duration given by .
The exponential distribution, together with (H7) and (H8) indicates that we can find a positive correlation between and . Increasing reporting probability means more cases are reported and treated. Since we assume is the same for all different stages, this results in more cases being reported and getting treatment at early stages, thus lowering the average infectious period and leading to higher .
As shown in figure 2, the collected data includes stage of syphilis based on interpretation of the serological test results. Therefore, we can estimate the infectious duration of reported cases, which we can use to further simplify the model by creating a bijection between and .
Consider the 292 total reported cases and equation (4.2) based on the properties of independent binomial distributions, we take as an estimate of the total recovered number . We now use this on the data to find as a function of . With (H7), the cases reported at primary, secondary and early latent stage would be the only cases that recovered before natural loss of transmission ability at the late latent stage. Based on typical duration of each syphilis stage given in figure 1, there are 88 reported cases of primary syphilis so we made an initial estimate that is the proportion of all ‘recovered’ cases that recovered before four bi-weeks (eight weeks is the estimated typical duration of primary syphilis). Next we noted that there are reported cases of secondary syphilis so we made an initial estimation as the proportion of ‘recovered’ cases before 10 bi-weeks (20 weeks is the estimated end of secondary syphilis). Since neurosyphilis can occur at any stage, for consistency and simplicity, we distributed all nine cases uniformly across 130 bi-weeks (about 5 years) so that each bi-week duration gets value of adjustment. As a result of adjusting we get the probabilities of ‘recovery’ at four and 10 bi-weeks respectively to be and . By (H7), we assume that the infectious period in most cases is finished before the late latent stage so we add at bi-weeks. We fitted these three percentiles (at four, 10 and 26 bi-weeks) with an exponential distribution using the ‘get.exp.par’ function from the R package rriskDistributions, which finds the optimized exponential distribution with least sum of square difference between the given probabilities and the theoretical probabilities evaluated at the given percentile points. An example of such optimization results with input percentiles is shown in figure 4a.
Figure 4.
Relationship between ‘recovery’ rate and reporting probability : (a) is an example of fitting the exponential distribution from percentiles to determine ‘recovery’ rate from reporting probability with as input and the resulting estimate is ; (b) is the curve of relationship between best fitted ‘recovery’ rate and reporting probability .
The departure from the percentiles is also partially offset by the individual difference in ‘recovery’. The only variable during such fitting procedure is , and it generates a unique value for each given , which reflects the correlation as shown in figure 4b.
This bijection relationship between and accounts for both factors of ‘recovery’ mentioned in (H7). As sexually transmitted infection(STI), the factors that determine the reporting probability in (H8) is affected by the social structure of the target population, so it cannot be directly determined from the literature. However, it is hard to model this probability with very limited information about reporting behaviour of the target population. Therefore, our approach is a simplifying assumption that allows us to estimate the two related parameters efficiently using only the diagnosed stages of cases at the time being reported as data. We acknowledge that this approach has limitations. As shown in figure 4a, there is a noticeable departure of the exponential distribution curve from the three points of percentile aggregated from data. This is partially explained by the individual differences in recovery time at each stage of syphilis. However, we also acknowledge that the exponential distribution, which is commonly used to model these recovery times, might not accurately reflect the real recovery time distribution. A better fit might be possible if we consider a bottom-up approach for reporting probability, or a multi-layer network in which infected individuals are treated as two or more risk groups by their reporting status and stages. This could be the topic of future work with more available data for the reporting probability and recovery time.
5. Maximum likelihood estimation of parameters
We showed in the previous section that the recovery rate can be determined from a given value of . Thus, to fit the model to the time series data we only need to find estimations of three parameters: the initial number of infectious individuals , the reporting probability and the transmission rate . We did this using maximum likelihood.
5.1. Computation of likelihood
Here we describe how to compute the likelihood that a given set of values of the parameters , and gave rise to our time series of syphilis cases aggregated by bi-weeks. We set (where corresponds to 1 January 2019). Since, there were limited cases reported before the initial time then we also assume that the start of this outbreak is not likely to be more than 1 year prior to 2019. Thus we set since there should not be enough time for any infected cases develop to late latent stage. Therefore, we have the following initial conditions: , , and can be found as solution of the equation
(5.1) |
Solving this yields all the required initial conditions to generate the trajectory of equation (3.2) for a given , and .
Given any trajectory of the recovered individuals (whether computed via an MA-SIR model or the network-SIR model), we can determine the likelihood using the same under-reporting binomial model discussed in §4. We use the notation for the trajectory given parameter vector . Therefore, the number of recoveries over each time duration between and (one bi-week later) is denoted by . Since is not necessarily an integer and we are unable to use the standard discrete binomial distribution as described in equation (4.2). Instead, to compute the likelihood we consider a continuous analogue of the binomial distribution [36], which has cumulative density function given by
(5.2) |
Here is the incomplete function given by
(5.3) |
For our computation of likelihood, we approximate the probability density function by
(5.4) |
We note that unlike the discrete binomial distribution which requires , the continuous analogue allows . Thus to compute the likelihood, we replace the in equation (4.1) by in equation (5.4) when fitting the data. Figure 5 provides an example comparing the with .
Figure 5.
Comparison between the probability mass function of binomial distribution and the probablity density function of its continuous analogue (equation (5.4)) with and .
Now let be the adjusted aggregated bi-weekly reported case count from the data, where , and let be the vector of data. Thus, given case count data for the first 116 bi-weeks, for any parameter vector using the binomial under-reporting model the log-likelihood function is given by
(5.5) |
5.2. Maximum likelihood estimates
Using equation (5.5) we searched for the maximum likelihood estimate (MLE) of the parameters , and for the network-SIR model. The result is presented in table 1. All model trajectories were generated by EpiNetPerco, the R package that we developed. Code used to generate the figures in this paper will be uploaded as part of EpiNetPerco documentation.
Table 1.
MLEs and predictions of the network-SIR model. (Rates ( and ) have unit bi-week . The intervals in brackets are the Wald confidence intervals (CIs) of the corresponding fitted single parameter.)
notation |
definition |
model value |
---|---|---|
|
population/network size |
26 000 |
|
optimal power law parameter of degree distribution |
1.738004 |
|
optimal initial active case count |
27 |
|
optimal uniform per-infected transmission rate |
0.001926 |
|
optimal uniform reporting probability |
0.2755 |
|
optimal uniform per-infected recovery rate |
0.05192518 |
|
maximal log likelihood |
−180.2353 |
|
basic reproductive number |
1.8548 |
|
final infected proportion |
0.06606 |
|
final infected size |
1717.59 |
5.2.1. Finding the optimal
To better illustrate the results, we first look at the likelihood profile over . For each fixed value, we calculate the log-likelihood for a large grid of values of the other parameters and (for example as in figure 7), and recorded the maximum log-likelihood attained. This plot is shown in figure 6. We see also that attains the highest likelihood.
Figure 6.
Profile over for the network-SIR model. The red point corresponds to the optimal parameter value .
5.2.2. Log-likelihood as a function of and for the optimal
The log-likelihood as a function of and is shown as a heat map in figure 7. For clarity, the colour in the heat map is actually determined by the adjusted log-likelihood which is given by
Figure 7.
Heat map of log-likelihood for the network-SIR model with . In (a), the single variable confidence interval is shown using the black bars while the multivariate confidence area is shaded in blue.
(5.6) |
This means the higher is, the closer it is to red in the heat map and the corresponding log-likelihood is higher and closer to the maximum likelihood. This adjustment is applied to all the heat maps in this paper.
In figure 7a, the grey area indicates parameters values for which some data points are actually larger than from the prediction, which gives it a value of and the corresponding log-likelihood is . We also present the Wald 95% confidence interval for each single variable and the multivariate confidence area.
The results are presented in table 1. We included here the epidemiological predictions, such as the final infected proportion and final infected size, computed using our R package EpiNetPerco. Further details on the theory behind this are provided in Zhao & Magpantay [22]. We also present a comparison of the expected trajectory of equation (4.2) and all of aggregated data in figure 8.
Figure 8.
Comparison of the trajectories of the expected reported cases of the network-SIR model with data. We note that the expected reported number of cases resulting from the network-SIR model follows the general tendency of the fitted data, based on both the running average of bi-weekly case-counts and cumulative case-counts. The dotted vertical lines in both (a) and (b) reflect the date when the rapid test and treat protocol was implemented. We note that we expect that this protocol led to an increase in the reporting probability (and thus the reported case-counts). However, owing to the limited time duration and coverage of the early implementation, we do not expect the new protocol would significantly change the tendency of the disease dynamics in the short-term period included in the data.
6. Comparing the network susceptible-infectious-recovered and mass action-susceptible-infectious-recovered models
To illustrate the difference between network-SIR and MA-SIR models, we present the results of fitting the standard MA-SIR model using the same data, same under-reporting model and same algorithms. The only thing we need to alter is the assumption about social structure. Instead of a network characterized by degree distribution discussed in §3.2, we assume that the population is fully mixed and the dynamics are given by
(6.1) |
We again assume and the rest of the population is susceptible at time . By contrast to the expression in equation (3.3), the basic reproduction number of this MA-SIR model is given by
(6.2) |
In figure 9, we present the profile likelihood over the initial condition , which can be compared directly with the results we found for the network-SIR model in figure 6. We see that the MA-SIR model is far less sensitive to the initial number of infected individuals owing to trade-offs with other parameters (such as the reporting probability). In this case, the likelihood profile is relatively flat for a large range of values and the MLE is at a much higher initial active case count at compared with for the network-SIR model. We present the results of the MA-SIR model with both and , and compare these with the results from the network-SIR model.
Figure 9.
Maximum log-likelihood of different values for the MA-SIR model. The red point is the optimal value , and the green point is the optimal value for network-based model .
For the case, the heat map of log-likelihood is shown in figure 10, the numerical parameters, confidence intervals and predictions are given in table 2. For the case, these are shown in figure 11 and table 2.
Figure 10.
Heat map of log-likelihood for the MA-SIR model with .
Table 2.
MLE and predictions of the MA-SIR model with and . (Rates ( and ) have unit bi-week . The intervals in brackets are the Wald CIs of the corresponding fitted single parameter.)
notation |
definition |
MA-SIR |
MA-SIR |
---|---|---|---|
|
population/network size |
26 000 |
|
|
optimal initial active case count |
117 |
N/A |
|
assumption initial active case count |
N/A |
27 |
|
optimal uniform per-infected transmission rate |
0.007390 |
0.07654 |
|
optimal uniform reporting probability |
0.0605 |
0.2485 |
|
optimal uniform per-infected recovery rate |
0.04463588 |
0.05091438 |
|
maximal log likelihood |
−176.8698 |
−178.8482 |
|
basic reproductive number |
1.6556 |
1.5033 |
|
final infected proportion |
0.6731 |
0.5835 |
|
final infected size |
17501.32 |
15171.09 |
Figure 11.
Heat map of log-likelihood for the MA-SIR model with .
We see from table 2 that in both cases the MA-SIR model tends to predict a larger final infected size that is more than nine times greater than what we get from the network-SIR model. These MA-SIR models actually predict that more than half of the target population will be infected. We can further see such behaviour from the comparison of the dynamics in figures 12 and 13. It is clear that the homogeneous mixing MA assumption has very strong consequences.
Figure 12.
Comparison of the trajectories of the expected reported cases of the two MA-SIR models and the network-SIR model with data. The dotted vertical lines in both (a) and (b) reflect the date when the POCT was implemented. Like the network-SIR model in figure 8, the expected number of reported cases of MA-SIR models still match with the general tendency of the fitted data. However, once not guided by the fitted data, both MA-SIR models predict a much faster growth to reported cases, which does not follow the trends we see in the unfitted data.
Figure 13.
The prediction trajectories for and until the end of outbreak. The trajectories of the two MA-SIR models and network-SIR model are shown. For each sub-figure, the first dotted vertical line is when the POCT is implemented and the second is the end of time in current collected data.
We also note that in figures 10a and 11a, the boundary of the grey area on right side of the heat maps is no longer a relatively smooth curve like in figure 7 for network models. As mentioned in §5.2, the grey areas on the left for all three cases reflect prediction trajectories where is lower than contributing to a likelihood of zero. This is also true for the right grey area for network case, however since the MA-SIR models tend to estimate more cases and more rapid bi-weekly incidence while some of the case counts from data are low, the corresponding probability density function (PDF) can go very close to zero. This becomes more of an issue as increases (going to the right on the heat maps). Numerical underflow eventually causes and , which is also plotted in grey. Our investigation of this issue says that the parts of this grey area on the right side of the heat maps for MA-SIR models is still feasible but with extremely low probability or likelihood. Moreover, with further empirical investigation, the irregular boundary is combined by several relatively smooth curves, each corresponding to issues with a low value data point.
We present the results of a sensitivity analysis of the final infected size with respect to some of the fitted parameters in figure 14. For the network model, we consider the impact of the per-infected transmission rate , reporting probability and the power law parameter of degree distribution . For the MA-SIR models, we show the impact of and . We perturb one parameter each time while keep all other parameters fixed and calculate the resulting final infected size accordingly. Note that and are still tied by the relationship as described in §4. The range of values used for and is given by their confidence intervals given in tables 1 and 2. The range used for is given by . We did not include sensitivity to here since this can already be observed in figures 6 and 9.
Figure 14.
Sensitivity analysis of final infected size and corresponding proportion for the network-SIR and MA-SIR models. The range of values used when varying and is their confidence intervals in tables 1 and 2. The range used for is given by . The dashed vertical line represent the final infected size from the optimal fitted model.
We also note again that the results from the network-SIR model is normalized as proportions of the total target population since it is originally built on networks with infinite size. So the population size is mainly required when fitting to data to determine the magnitude of parameters, but has limited impact on normalized prediction results. To confirm this, we also present the result of changing the estimated population size N = 26 000 to N × (1−5%) = 24 700 and N × (1+5%) = 27 300 the network model as sensitive analysis of the population size. All parameters other than is fitted again based on new population size. The comparison of the parameter and model prediction is given in table 3.
Table 3.
MLE and predictions of the network-SIR model with N = 24 700, N = 26 000 and N = 27 300. (Rates ( and ) have unit bi-week .)
notation |
definition |
model value |
||
---|---|---|---|---|
|
population/network size |
24 700 |
26 000 |
27 300 |
|
optimal power law parameter of degree distribution |
1.738004 |
||
|
optimal initial active case count |
23 |
27 |
26 |
|
optimal uniform per-infected transmission rate |
0.0020 |
0.0019 |
0.0019 |
|
optimal uniform reporting probability |
0.260 |
0.275 |
0.285 |
|
optimal uniform per-infected recovery rate |
0.05134114 |
0.05192518 |
0.05228828 |
|
basic reproductive number |
1.9444 |
1.8548 |
1.8184 |
|
final infected proportion |
0.0712246 |
0.06606 |
0.06391 |
|
final infected size |
1759.248 |
1717.588 |
1744.713 |
7. Point of care test and treat
Typically, syphilis diagnosis requires a suspected case to identify symptoms or be informed from a known case of a possible infection, find an appropriate clinic, access health services with a valid health insurance number, attend the initial appointment for serological testing and come back to the clinic days later to learn the results of the test. Each step in this process makes it less accessible for underserved community members to be diagnosed and treated, and more likely to be lost to follow up during the early stages when the infection is most infectious.
As a response to the reported local outbreak, KFL&A PH implemented a rapid test and treatment protocol using a rapid syphilis antibody POCT. The medical directive was to treat suspected cases based on POCT results immediately, significantly reducing the TAT time and allowing testing without clinic or laboratory environments. This protocol is employed by outreach nurses, and implemented at various sites, including community-based organizations and advertised outreach events (Blitzes), making it more accessible to the population at greatest risk. Since this POCT was only recently approved by Health Canada, it is still in the very early stages of real-world investigation [10]. Using our existing model, we aimed to support this effort by building prediction models with some conjectures to illustrate the potential impact of POCT.
We assume that the test is accurate and would be applied to every case reported. The network-SIR model in §5.2 is considered our baseline model, for the scenario when POCT is not implemented. We consider different scenarios where we increase coverage of POCT implementation, leading to an increase in the reporting probability while also decreasing the test TAT for all reporting cases. We consider the following scenarios:
(i) reporting probability is increased to and infectious duration of reported cases is decreased by 9 days;
(ii) reporting probability is increased to and infectious duration of reported cases is decreased by 9 days; and
(iii) reporting probability is increased to and infectious duration of reported cases is decreased by 9 days.
The decrease in the infectious duration is done by reducing by bi-week units the data point for the percentiles of primary infections (from four bi-weeks to bi-weeks) and secondary stage (from 10 bi-weeks to bi-weeks when fitting the relationship between the reporting probability and the corresponding modified recovery rate for scenarios using the procedure discussed in §4. The per-infected transmission rate is left unchanged.
In figure 15, we show the effect of scenarios (i)−(iii) in two different settings. First, we consider the effect of POCT on an entirely new epidemic by generating trajectories as in figure 13 using the same transmission rate as the baseline model but with , new recovery rate and new reporting probability for each of the scenarios. The resulting trajectories are shown in in figure 15 as solid curves.
Figure 15.
Comparison of the trajectories under three different scenarios and two different settings of POCT implementation. The trajectories of the baseline network-SIR model is also included for comparison. The vertical lines in both (a) and (b) denote when POCT was actually implemented by KFL&A PH.
The second setting we consider is the effect of POCT on the ongoing outbreak in KFL&A by assuming there was a switch in the recovery rate and reporting probability in June 2023 (, after the end of the fitted data). To do this, we stopped the baseline model at and then restarted to generate trajectories for the next time period with the new reporting probability and new recovery rate for each scenario. The results are shown in figure 15 as dotted curves.
From the theory discussed in the literature [21,22], we know that the final size of the epidemic is determined by the equilibrium of the system (equation (3.2)), and thus independent of the setting. We see this in figure 15. The values of the equilibrium under different POCT implementation is presented in table 4.
Table 4.
Model assumptions and predicted final epidemic sizes for the different POCT implementation scenarios. (Rates ( and ) have unit bi-week .)
notation |
definition |
baseline |
POCT 1 |
POCT 2 |
POCT 3 |
---|---|---|---|---|---|
|
population/network size |
26 000 |
|||
|
power law parameter of degree distribution |
1.738004 |
|||
|
initial active case count |
1 |
|||
|
uniform per-infected transmission rate |
0.001926 |
|||
|
uniform reporting probability |
0.2755 |
0.3255 |
0.3755 |
0.4255 |
|
optimal uniform per-infected recovery rate |
0.05192518 |
0.05652490 |
0.05874144 |
0.06111252 |
|
basic reproductive number |
1.8548 |
1.7088 |
1.6464 |
1.5845 |
|
final infected proportion |
0.06606 |
0.05722 |
0.05325 |
0.04917 |
|
final infected size |
1717.588 |
1487.721 |
1384.389 |
1278.466 |
N/A |
final infection size relative to baseline model |
100.00% |
86.62% |
80.60% |
74.43% |
We see from table 4 that increasing reporting probability by , together with the TAT reduction on reported cases has a significant impact on the final infected size, with a reduction of compared with the baseline model. After the first increase in reporting probability, the other two additional increases in reporting probability each lead to an additional decrease of approximately 6% of the final infected size. Since the final infected sizes are independent of initial conditions, these results suggest that there can be a great advantage in implementing POCT even if the intervention was started after the peak of the infection. The POCT can still efficiently decrease the final infection size even if it only adds a increase in the reporting probability.
To illustrate the benefit and efficiency of the rapid test and treat strategy, we present a simple economic analysis here based on the POCT 1 scenario. Our model prediction in table 4 suggests that increasing the reporting probability by 5% and using the rapid test and treat strategy in the KFL&A area would lead to an average of infections (0.05722 of the target population), which is about less infections than the baseline. Since this scenario requires an overall reporting probability, an expected total of of cases would need to be reported. The infected fraction of the population under this scenario is 0.05722 so we assume that strategies can be devised to find this many cases after testing approximately individuals. Based on KFL&A PH estimates, the average cost of each POCT is $45 CAD ($10–$16 for test kit and $25–$40 for average nursing time and associated costs) so the POCT 1 scenario has an estimated cost of 8643 × 45 = 388 935 CAD. On the other hand, Chesson & Peterman [37] estimated that the discounted lifetime cost per infection of syphilis was 1 190 USD in the United States in 2019. After adjusting for the exchange rate and inflation, we projected an average discounted lifetime cost of 2 000 CAD for KFL&A PH in 2024. Even without accounting for the cost of the baseline scenario using traditional testing, the decrease of just 230 infections from baseline in the POCT 1 scenario would already yield a savings of 460 000 CAD, enough to cover the entire estimated cost of the POCT 1 strategy. This very conservative economic analysis favours implementing POCT 1, assuming the test is accurate. The estimated economic benefits could be even higher if we also consider (i) the public health resources that are saved when attempting to track ‘lost’ cases, and (ii) the reduction in average infection time, since earlier treatment of syphilis would lead to better health outcomes [8].
From figure 15a, another potential effect of POCT would be ‘flattening the curve’ and delaying the peak of the outbreak, which creates more time for public health professionals to implement more time-consuming interventions such as awareness campaigns to promote safe sex to suppress .
8. Conclusions
In this paper, we applied percolation-based network epidemiology models to model the transmission of syphilis in the KFL&A PH area. In §3, we described the construction of the baseline network-SIR model from the collected data, including assumptions we made on the size of the target population and the role of risk factors in scaling up the reported number of contacts from the data. We discussed how we fitted a power law distribution to the scaled sample degree distribution and presented the equations used for the dynamics of the model. In §4, we discussed the connection between reporting probability and recovery rate, tying them together by fitting the cumulative distribution function of an exponential distribution for the infectious period.
In §5, we presented the parameters of the network-SIR model are determined using maximum likelihood. We also presented the expected trajectory and final epidemic size. In §6, we compared the results of the network-SIR model with those of traditional MA-SIR models. The comparison clearly illustrates the advantages of percolation-based network models and the limitations of MA-SIR models when modelling disease outbreaks wherein network structure is very important. The MA-SIR model tends to estimate far larger final epidemic sizes than the network-SIR model.
Finally in §7, we modelled the impact of rapid test and treat intervention under different scenarios. For each scenario, we also considered two different settings, reflecting different timings of the implementation of POCT. The results shows that the POCT implementation can be very impactful in suppressing the final outbreak size, even with relatively low level of reporting. Moreover, the model indicates that if the POCT is implemented before the outbreak, it will flatten the curve and delay the peak of infection, allowing public health workers time to implement other control strategies. We included a conservative economic analysis of the POCT strategy which supports the use of POCT.
As discussed in §§3 and 4, our model employs many assumptions owing to limitations in data and resources. The estimates and predictions we generated should only be used to compare the scenarios considered and the models used (MA-SIR versus network-SIR). We believe this exercise is still useful as a tool for supporting public health in modelling the spread of syphilis and the effect of different control efforts.
We plan to conduct further investigation on this work from multiple perspectives. One goal is to remove some of our simplifying assumptions. For example, we would be interested in modelling reinfection by using a susceptible-infectious-recovered-susceptible model and removing (H6). Another aim for future work concerns the construction of the network. We would be interested in comparing our scaled degree distribution to real-life degree distributions if more data were available through behaviour surveys in the underserved communities. This can improve our estimates of the size of the target population as well as the network degree distribution discussed in §4, and the results would also apply to modelling other types of sexually transmitted infections. In addition, for simplicity we had assumed that the sensitivity of the POCT strategy is perfect, but empirical evidence from the current intervention process indicates otherwise. We plan to extend our economic analysis to explore imperfect sensitivity in future work.
The rapid test and treat intervention has been expanded to include eight public health units across Ontario, known as the SPRITE study [10] Implementation data will be collected using the Reach, Effectiveness, Adoption, Implementation and Maintenance framework, following a community-based participatory approach. The data will be used to refine our estimates of the impact of rapid test and treat better, without relying on hypothetical scenarios discussed in §7. We also aim to expanding our framework to other communities with different social structures or to other diseases with different transmission dynamics.
Acknowledgements
We thank Duy A. Dinh for his help with the literature review.
Contributor Information
Sicheng Zhao, Email: zhaos126@mcmaster.ca.
Sahar Saeed, Email: sahar.saeed@queensu.ca.
Megan Carter, Email: megan.carter@queensu.ca.
Bradley Stoner, Email: bradley.stoner@queensu.ca.
Maggie Hoover, Email: maggienhoover@gmail.com.
Hugh Guan, Email: hugh.guan@kflaph.ca.
Felicia Maria G. Magpantay, Email: felicia.magpantay@queensu.ca.
Ethics
This work did not require ethical approval from a human subject or animal welfare committee.
Data accessibility
Data and relevant code for this research work are stored in GitHub: https://github.com/RichardSichengZhao/Data-Repository-for-EBMforDisTransOnRandGraphs-An-Application-to-Mitigate-a-Syphilis-Outbreak and have been archived within the Zenodo repository: [38].
Supplementary material is available online [39].
Declaration of AI use
We have not used AI-assisted technologies in creating this article.
Authors’ contributions
S.Z.: conceptualization, data curation, formal analysis, investigation, methodology, project administration, software, visualization, writing—original draft, writing—review and editing; S.S.: conceptualiza‑ tion, funding acquisition, investigation, methodology, project administration, writing—review and editing; M.C.: conceptualization, data curation, funding acquisition, investigation, methodology, project administration, writing— review and editing; B.S.: conceptualization, funding acquisition, investigation, methodology, project administration, writing—review and editing; M.H.: conceptualization, data curation, funding acquisition, investigation, methodol‑ ogy, project administration, writing—review and editing; H.G.: conceptualization, data curation, funding acquisition, investigation, methodology, project administration, writing—review and editing; F.M.G.M.: conceptualization, formal analysis, funding acquisition, investigation, methodology, project administration, supervision, writing—review and editing.
All authors gave final approval for publication and agreed to be held accountable for the work performed therein.
Conflict of interest declaration
We declare we have no competing interests.
Funding
This work was supported by the Canadian Institutes of Health Research (CIHR) through a Catalyst (SR8 190795), Operating (AS1-192619) and Knowledge Mobilization (EKS 193138) granted to the NPA SS. F.M.G.M. also acknowledges the support of the Natural Sciences and Engineering Research Council of Canada (NSERC), Mathematics for Public Health and the Canadian Foundation for Innovation.
References
- 1. de Schryver A, Meheus AZ. 1990. Syphilis and blood transfusion: a global perspective. Transfusion 30, 844–847. ( 10.1046/j.1537-2995.1990.30991048793.x) [DOI] [PubMed] [Google Scholar]
- 2. Ghanem K, Hook E. 2020. Syphilis. In Goldman-cecil medicine, vol. 2, 26th edn (eds Goldman L, Schafer AI), pp. 1983–1989. Philadelphia, PA: Elsevier. [Google Scholar]
- 3. Aho J, Lybeck C, Tetteh A, Issa C, Kouyoumdjian F, Wong J, Anderson A, Popovic N. 2022. Rising syphilis rates in Canada, 2011–2020. Can. Commun. Dis. Rep. 48, 52–60. ( 10.14745/ccdr.v48i23a01) [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4. Public Health Ontario (PHO) . 2024. Syphilis in Ontario- focus on 2022. See https://www.publichealthontario.ca/-/media/Documents/S/2023/syphilis-ontario-focus-2022.pdf.
- 5. Public Health Ontario (PHO) . 2024. Maternal and early congenital syphilis in Ontario: 2020-2022. See https://www.publichealthontario.ca/-/media/Documents/S/2023/syphilis-ontario-focus-2022.pdf.
- 6. Singh AE, Romanowski B. 1999. Syphilis: review with emphasis on clinical, epidemiologic, and some biologic features. Clin. Microbiol. Rev. 12, 187–209. ( 10.1128/cmr.12.2.187) [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7. Valentine JA, Bolan GA. 2018. Syphilis elimination: lessons learned again. Sex. Transm. Dis. 45, S80–S85. ( 10.1097/olq.0000000000000842) [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8. Public Health Agency of Canada (PHAC) . 2024. Syphilis: symptoms and treatment. See https://www.canada.ca/en/public-health/services/diseases/syphilis.html.
- 9. French P. 2007. Syphilis. Br. Med. J. 334, 143–147. ( 10.1136/bmj.39085.518148.BE) [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10. Mackrell L, et al. 2024. Syphilis Point of Care Rapid Test and Immediate Treatment Evaluation (SPRITE) study: a mixed-methods implementation science research protocol of eight public health units in Ontario, Canada. BMJ Open 14. ( 10.1136/bmjopen-2024-089021) [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11. The Centers for Disease Control and Prevention (CDC) . 2023. About syphilis. See https://www.cdc.gov/ syphilis/about/.
- 12. Kermack W, McKendrick A. 1927. A contribution to the mathematical theory of epidemics. Proc. R. Soc. A 115, 700–721. ( 10.1098/rs) [DOI] [Google Scholar]
- 13. Newman MEJ, Strogatz SH, Watts DJ. 2001. Random graphs with arbitrary degree distributions and their applications. Phys. Rev. E 64, 026118. ( 10.1103/physreve.64.026118) [DOI] [PubMed] [Google Scholar]
- 14. Bollobás B. 1980. A probabilistic proof of an asymptotic formula for the number of labelled regular graphs. Eur. J. Comb. 1, 311–316. ( 10.1016/s0195-6698(80)80030-8) [DOI] [Google Scholar]
- 15. Newman MEJ. 2002. Spread of epidemic disease on networks. Phys. Rev. E 66, 016128. ( 10.1103/physreve.66.016128) [DOI] [PubMed] [Google Scholar]
- 16. Frisch HL, Hammersley JM. 1963. Percolation processes and related topics. J. Soc. Ind. Appl. Math. 11, 894–918. ( 10.1137/0111066) [DOI] [Google Scholar]
- 17. Grassberger P. 1983. On the critical behavior of the general epidemic process and dynamical percolation. Math. Biosci. 63, 157–172. ( 10.1016/0025-5564(82)90036-0) [DOI] [Google Scholar]
- 18. Bansal S, Meyers LA. 2012. The impact of past epidemics on future disease dynamics. J. Theor. Biol. 309, 176–184. ( 10.1016/j.jtbi.2012.06.012) [DOI] [PubMed] [Google Scholar]
- 19. Keeling MJ. 1999. The effects of local spatial structure on epidemiological invasions. Proc. R. Soc. B 266, 859–867. ( 10.1098/rspb.1999.0716) [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20. Volz E, Meyers LA. 2009. Epidemic thresholds in dynamic contact networks. J. R. Soc. Interface 6, 2008. ( 10.1098/rsif.2008.0218) [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21. Miller JC, Slim AC, Volz EM. 2012. Edge-based compartmental modelling for infectious disease spread. J. R. Soc. Interface 9, 890–906. ( 10.1098/rsif.2011.0403) [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22. Zhao S, Magpantay FMG. 2025. Disease transmission on random graphs using edge-based percolation. Math. Meth. Appl. Sci ( 10.1002/mma.10963) [DOI] [Google Scholar]
- 23. Statistics Canada . 2011. Low-income measure after tax (LIM-AT). See https://www12.statcan.gc.ca/nhs-enm/2011/ref/dict/fam021-eng.cfm.
- 24. Jolly AM, Wylie JL. 2001. Sampling individuals with large sexual networks. Sex. Transm. Dis. 28, 200–207. ( 10.1097/00007435-200104000-00003) [DOI] [PubMed] [Google Scholar]
- 25. Jolly A, Muth S, Wylie J, Potterat J. 2001. Sexual networks and sexually transmitted infections: a tale of two cities. J. Urban Health 78, 433–445. ( 10.1093/jurban/78) [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26. Chan PA, et al. 2018. A network analysis of sexually transmitted diseases and online hookup sites among men who have sex with men. Sex. Transm. Dis. 45, 462–468. ( 10.1097/olq.0000000000000784) [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27. MacLean R. 2018. Resources to address stigma related to sexuality, substance use and sexually transmitted and blood-borne infections. Can. Commun. Dis. Rep. 44, 62–67. ( 10.14745/ccdr.v44i02a05) [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28. Liljeros F, Edling CR, Amaral LAN. 2003. Sexual networks: implications for the transmission of sexually transmitted infections. Microbes Infect. 5, 189–196. ( 10.1016/s1286-4579(02)00058-8) [DOI] [PubMed] [Google Scholar]
- 29. Liljeros F, Edling CR, Amaral LAN, Stanley HE, Åberg Y. 2001. The web of human sexual contacts. Nature 411, 907–908. ( 10.1038/35082140) [DOI] [PubMed] [Google Scholar]
- 30. Schneeberger A, Mercer CH, Gregson SAJ, Ferguson NM, Nyamukapa CA, Anderson RM, Johnson AM, Garnett GP. 2004. Scale-free networks and sexually transmitted diseases. Sex. Transm. Dis. 31, 380–387. ( 10.1097/00007435-200406000-00012) [DOI] [PubMed] [Google Scholar]
- 31. Watts DJ, Strogatz SH. 1998. Collective dynamics of ‘small-world’ networks. Nature 393, 440–442. ( 10.1038/30918) [DOI] [PubMed] [Google Scholar]
- 32. Clauset A, Shalizi CR, Newman MEJ. 2009. Power-law distributions in empirical data. SIAM Rev. 51, 661–703. ( 10.1137/070710111) [DOI] [Google Scholar]
- 33. Newman MEJ. 2003. The structure and function of complex networks. SIAM Rev. 45, 167–256. ( 10.1137/s003614450342480) [DOI] [Google Scholar]
- 34. Kinsley AC, Rossi G, Silk MJ, VanderWaal K. 2020. Multilayer and multiplex networks: an introduction to their use in veterinary epidemiology. Front. Vet. Sci. 7, 00596. ( 10.3389/fvets.2020.00596) [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35. Johnson LF, Geffen N. 2016. A comparison of two mathematical modeling frameworks for evaluating sexually transmitted infection epidemiology. Sex. Transm. Dis. 43, 139–146. ( 10.1097/olq.0000000000000412) [DOI] [PubMed] [Google Scholar]
- 36. Ilienko A. 2013. Continuous counterparts of Poisson and binomial distributions and their properties. Annales Universitatis Scientiarum Budapestinensis de Rolando Eötvös Nominatae. Sectio Computatorica 39, 03. ( 10.48550/arXiv.1303.5990) [DOI] [Google Scholar]
- 37. Chesson HW, Peterman TA. 2021. The estimated lifetime medical cost of syphilis in the United States. Sex. Transm. Dis. 48, 253–259. ( 10.1097/olq.0000000000001353) [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38. Zhao S. 2025. RichardSichengZhao/Data-Repository-for-EBM forDisTransOnRandGraphs-An -Application-to-Mitigate-a-Syphilis-Outbreak: Release for RSOS submission (v2.0.1). Zenodo. ( 10.5281/zenodo.15090772) [DOI]
- 39. Zhao S, Saeed S, Carter M, Stoner B, Hoover M, Guan H. 2025. Supplementary material from: Edge-based modeling for disease transmission on random graphs: an application to mitigate a syphilis outbreak. Figshare. ( 10.6084/m9.figshare.c.7750313) [DOI]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Data Availability Statement
Data and relevant code for this research work are stored in GitHub: https://github.com/RichardSichengZhao/Data-Repository-for-EBMforDisTransOnRandGraphs-An-Application-to-Mitigate-a-Syphilis-Outbreak and have been archived within the Zenodo repository: [38].
Supplementary material is available online [39].