Scientific Reports. 2024 Apr 1;14:7619. doi: 10.1038/s41598-024-58390-3

The epidemic forest reveals the spatial pattern of the spread of acute respiratory infections in Jakarta, Indonesia

Yuki Novia Nasution 1, Marli Yehezkiel Sitorus 1, Kamal Sukandar 2, Nuning Nuraini 1, Mochamad Apri 1, Ngabila Salama 3
PMCID: PMC10982301  PMID: 38556584

Abstract

Acute respiratory infection (ARI) is a communicable disease of the respiratory tract that impairs breathing. The infection can spread from one region to neighboring regions through a human mobility network. In this study, we use records of ARI incidence across four outbreak periods in 42 regions of Jakarta to study its spatio-temporal spread using the concept of the epidemic forest. This framework generates a forest-like graph representing an explicit spread of disease that takes the onset time, spatio-temporal distance, and case prevalence into account. To support this framework, we use logistic curves to infer the onset time of the outbreak for each region. The results show that regions with earlier onset dates tend to have a higher burden of cases, suggesting that the culprits of the disease spread are the regions with a high load of cases. To test this, we generate the epidemic forest for the four periods of ARI outbreaks and identify the dominant trees (those with the most child cases). We find that the primary infected city of the dominant tree has a relatively higher burden of cases than those of other trees. In addition, we evaluate the temporal (Rt) and spatial (Rc) reproduction numbers directly from the inferred graphs. We find that Rt for dominant trees is significantly higher than for non-dominant trees across all periods, with regions in western Jakarta tending to have higher values of Rc. Lastly, we simulate graphs in which the case load of the primary infected city in the dominant tree is reduced by 50%, which results in a reduced Rc and suggests a potential target of intervention to suppress the overall ARI spread.

Keywords: ARI, Epidemic forest, Early detection

Subject terms: Diseases, Health care, Mathematics and computing

Introduction

Acute respiratory infection (ARI) affects the respiratory tract and is classified into upper and lower respiratory tract infections. Common symptoms include fever, cough, wheezing, and short, rapid breathing1. ARI, including pneumonia and influenza, has become a global health burden, particularly for infants and elderly people2–6. In 2019, ARI caused 4.43% of deaths worldwide across all ages7, and at the end of that year coronavirus disease (COVID-19) emerged and subsequently became a global pandemic. ARI infections tended to decline in most regions during the COVID-19 pandemic owing to the implementation of public health measures8–10. However, some studies report an increasing burden of respiratory viral infections in several countries after the pandemic11–13.

Various bacteria and viruses can cause ARI. The infection is transmitted from infected to susceptible individuals through physical contact and airborne routes. Human mobility is one factor that spreads infection from one region to another14. Studying the spread of infection supports efforts to control outbreaks and relieve the burden on health institutions. Some studies model the spread of infection in both its temporal and spatial aspects. For example, the spread of COVID-19 has been studied through temporal models15,16 that incorporate human mobility17 and limited testing capacity18 and that provide information for public policy making19–21. The spatio-temporal spread of COVID-19 has also been modeled for South America22, Bangladesh23, Iran24, Italy25, and Singapore26. The spread of respiratory infections between regions has been modeled with network concepts through traffic connections14 and airline routes27–30.

The concept of the epidemic tree was introduced to understand the spatial pattern of disease spread, particularly in the 2001 UK foot-and-mouth outbreak31. In that research, an algorithm was constructed to determine the source of infection of each case that arose in several pig farms during the outbreak. The epidemic forest approach was later constructed to spatialize epidemic modeling; it identified primary cases at the individual level and built the forest for the 2013 dengue fever epidemic in Guangzhou, China32.

The identification of primary cases is also important for region-scale epidemic modeling. It can provide information for early detection of an outbreak in a particular city or country. A better understanding of disease spread at the regional scale is therefore crucial for governments to build awareness and prepare to control an outbreak. In this study, we developed an epidemic forest model that describes the spread of infectious diseases between regions by adapting an inter-individual spread model. Under this adaptation, a region is assumed to receive infection from only one neighboring region, which corresponds to close contact in the inter-individual setting. Thus, a unique source of infection can be determined. This study aims to identify the origin of ARI spread in the Special Region of Jakarta from 2017 to 2021 by observing the consistency of the seasonality and severity of the disease at the district level, for early detection using the epidemic forest.

Results

The dataset comprises monthly records of ARI incidences from 2017 to 2021 for 42 regions in Jakarta. These records were collected from hospitals and health care centers in each corresponding region that we assumed represented the actual situation of ARI incidences.

A data-based evidence of ARI’s seasonality

We examined the spatio-temporal dataset of ARI incidence using a heatmap, with each row representing one of the observed regions. To cluster regions with similar ARI incidence, we used hierarchical clustering based on the ratio of ARI cases to population size. As shown in Fig. 1, we identified four outbreak periods, marked by peaks of ARI incidence in roughly all 42 regions: (1) period 1: July 2017 - July 2018, (2) period 2: Aug 2018 - July 2019, (3) period 3: September 2019 - Aug 2020, and (4) period 4: September 2020 - September 2021. Within each period (red boxes), the ratios gradually increase from around zero (blue cells) to white-to-reddish cells as the outbreak reaches its peak, and then decrease back to zero. In other words, ARI incidence roughly follows a Gaussian-like curve centered on the peak time of the outbreak. While we inferred the first period to last as long as 14 months, the other three periods consistently lasted about 11-12 months, suggesting an annual ARI outbreak.

Figure 1.

ARI outbreak occurs seasonally, identified from the heatmap data: The heatmap shows the ratio of ARI cases to the population size in 42 regions in Jakarta, with a median of 168 cases across regions. Due to scale limitation, the heatmap explicitly indexes only seven regions on the y-axis. We found four ARI outbreaks denoted by red boxes. For each period, all regions consistently experienced increased cases marked with white-to-red grids as they peaked, followed by decreased cases at the end of the outbreak period.

In addition, Fig. 1 shows that some regions consistently had more cases than others across periods, such as the regions from Makasar Region downward and a small cluster at the top of the diagram. In the first three periods, these regions experienced an earlier increase in cases than the rest, marked by whitish cells at the start of each period in Fig. 1. This result suggests that regions that experienced the outbreak earlier potentially contribute to the infection of neighboring regions. Although hierarchical clustering can identify the seasonality of ARI outbreaks, it does not reveal the explicit spread of the disease among regions.

The inferred onset time negatively correlates with the load of cases across regions

A region is classified as infected if its cumulative number of cases exceeds a threshold value. We set the outbreak threshold at 20 cases per 100,000 population. In other words, a region is called infected by the ARI outbreak if the total cases recorded in a period exceed 0.02% of its total population. For each infected region, we record its onset as the first time the outbreak occurs. This concept of onset resembles that for individual cases in clinical data, where onset time is defined as the first time a person develops disease symptoms. This measure is essential for inferring the spatial behavior of the spread of ARI, since regions with earlier onset potentially infect neighboring regions.

In practice, we first calculated the cumulative ARI incidence per period, which generally produced S-shaped trends across regions. We obtained explicit formulas for the cumulative case data using a generalized logistic curve, also known as the Richards curve. The onset time of the outbreak for each region was determined directly as the intersection between the fitted curve and the threshold line. We rounded the estimated onset times to integers to match the monthly resolution of the data. Per period, the estimated onset time is either 0 or a positive integer, with an onset time of 0 representing the regions with the earliest outbreak onset. Figure 2 compares the actual cumulative ARI incidence with the fitted logistic curves for a sample of regions. For the six observed regions in Period 2, the logistic curves track the actual data and allow us to estimate the onset time of the outbreak. Relative to the other observed regions, Kelapa Gading and Pademangan Region took more time for the data to exceed the threshold, indicating a later outbreak onset. In contrast, although Cilincing Region has a higher threshold (85 total cases) to reach outbreak status, its cumulative data increased rapidly and exceeded the threshold within the first two months from the reference month 0. Even among these six observed cases, we found a marked difference in the inferred outbreak onset, indicating that some regions contribute to the case prevalence of others.
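The threshold-crossing step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' released code: the function names, the population of 425,000 (chosen only because it gives a threshold of 85 cases), and the curve parameters K, r, mu and t_i are hypothetical, and the curve is assumed to have already been fitted. The crossing time comes from solving the Richards curve given in the Methods for C(t) equal to the threshold.

```python
import math

def outbreak_threshold(population, per_100k=20):
    """Cumulative cases a region needs before it is classified as infected."""
    return per_100k * population / 100_000

def richards(t, K, r, mu, t_i):
    """Richards curve: C(t) = K * [1 + exp(-r*mu*(t - t_i))] ** (-1/mu)."""
    return K * (1.0 + math.exp(-r * mu * (t - t_i))) ** (-1.0 / mu)

def onset_time(threshold, K, r, mu, t_i):
    """Time at which the fitted curve equals the threshold (closed-form inverse)."""
    if not 0 < threshold < K:
        raise ValueError("threshold must lie strictly between 0 and K")
    return t_i - math.log((K / threshold) ** mu - 1.0) / (r * mu)

# Hypothetical region and hypothetical fitted parameters (illustration only).
theta = outbreak_threshold(425_000)
t_star = onset_time(theta, K=1200.0, r=1.0, mu=1.0, t_i=4.0)
onset_month = round(t_star)  # rounded to an integer month, as in the text
```

A closed-form inverse is used here because the Richards curve is monotone in t; a numerical root finder would serve equally well when a different curve family is fitted.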

Figure 2.

The inferred onset time of the outbreak: the logistic curves fit the cumulative ARI incidence data well across regions; the comparison is shown for six regions in Period 2. The inferred onset time of the outbreak is the intersection between the fitted curve (orange line) and the threshold line; the blue dots show the actual data. The threshold varies among regions depending on population size: regions with larger populations require more cases to reach outbreak status.

To extend our analysis, we examined the inferred onset times and found that they are negatively associated with the total load of cases. Figure 3 shows consistent decreases in total cases as the onset time increases across outbreak periods. Given the evidence that highly infected regions start the outbreak earlier than others, one plausible scenario is that the overall infections start from the culprits and expand to other regions through human mobility. Since the culprits must be those with the earliest onset time, Fig. 3 suggests that the culprits are the highly infected regions. Although this analysis does not provide an exact spatial pattern of disease spread, it may indicate that the culprits are the appropriate target of intervention to suppress the overall spread of ARI in Jakarta. These inferred onset times are used in the next section to construct the forest-like graph of disease spread.

Figure 3.

Onset times are negatively associated with the load of cases across all periods: we constructed a linear model that links the inferred onset times with the total cases for Period 1-4 (A)–(C). To ease the inference of slopes and intercepts of the models, we standardized the onset time and total cases data depicted by blue scatters. The model suggests a negative association between the onset time and load of cases, which is captured by the model mean and its 95%-credibility interval.

The epidemic forest infers a robust spatial pattern of ARI spread across weight combinations

Using the concept of the epidemic forest, we leveraged the inferred onset times to construct the spatial pattern. The pattern is represented by tree-like graphs that depict how the disease spread within the set of regions. An infected region can infect others; the region that infects is termed the parent, and those infected are termed children. For each region, we examine the neighbouring regions and select as parent the one with the highest strength of linkage (SoL). Besides the onset time, we consider three additional aspects to determine the parent of a child region: (1) spatial distance, (2) temporal distance, and (3) case prevalence, weighted by Ws, Wt, and Wp, respectively. The construction of the epidemic forest is described in detail in the Methods section. In this study, we tested eight combinations of the weights to assess how sensitive the resulting graph is to changes in the weights. Table 1 lists all the combinations assessed.

Table 1.

The eight selected combinations of Ws, Wt, and Wp, which weight the spatial distance, temporal distance, and case prevalence in the calculation of the strength of linkage.

| Combination of weights | Spatial distance weight (Ws) | Temporal distance weight (Wt) | Case prevalence weight (Wp) |
|---|---|---|---|
| 1 | 1/3 | 1/3 | 1/3 |
| 2 | 0.10 | 0.45 | 0.45 |
| 3 | 0.45 | 0.10 | 0.45 |
| 4 | 0.25 | 0.50 | 0.25 |
| 5 | 0.15 | 0.70 | 0.15 |
| 6 | 0.45 | 0.45 | 0.10 |
| 7 | 0.25 | 0.25 | 0.50 |
| 8 | 0.15 | 0.15 | 0.70 |

We assume that these weights sum up to one, with higher weights representing the higher effects of the particular aspect.

The Special Region of Jakarta is a metropolitan region with a well-developed road network and transportation system, so the spatial weight Ws is chosen to be non-dominant compared with the other two weights. Accordingly, we choose the combinations that satisfy Ws < 1/3. Figure 4 shows the epidemic forest for period 2 with combination 2. The size of each node in Fig. 4 indicates its case prevalence: larger nodes mean higher prevalence. The color of a node represents the onset time of the outbreak: a darker shade of blue indicates an earlier onset time, and a lighter shade a later one. From Fig. 4 we obtain eight trees, T1, T2, ..., T8. The tree denoted T5 has the most child cases and is called the dominant tree. We observe that the primary case of the dominant tree has a smaller node than the primary cases of other trees, which shows that the spread size is not determined by prevalence alone. The red edges in Fig. 4 are the edges that appear in all combinations with Ws < 1/3. This intersection set of edges serves as a reference in the sensitivity analysis and is denoted E.
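The reference edge set E can be illustrated with a short Python sketch. The weight values are those of Table 1; everything else is an assumption made for illustration: the function names are hypothetical, and the two toy forests with regions R1 to R5 are invented and do not correspond to any forest in the paper. Each forest is represented as a set of (parent, child) edges.

```python
# Table 1 weights: combination -> (Ws, Wt, Wp)
COMBINATIONS = {
    1: (1 / 3, 1 / 3, 1 / 3),
    2: (0.10, 0.45, 0.45),
    3: (0.45, 0.10, 0.45),
    4: (0.25, 0.50, 0.25),
    5: (0.15, 0.70, 0.15),
    6: (0.45, 0.45, 0.10),
    7: (0.25, 0.25, 0.50),
    8: (0.15, 0.15, 0.70),
}

def low_spatial_combinations(combos, limit=1 / 3):
    """Combination numbers whose spatial weight Ws is strictly below the limit."""
    return [k for k, (ws, _, _) in sorted(combos.items()) if ws < limit]

def common_edges(forests):
    """Edges (parent, child) that appear in every forest of the collection."""
    edge_sets = [set(edges) for edges in forests.values()]
    return set.intersection(*edge_sets)

# Invented toy forests keyed by combination number (illustration only).
toy_forests = {
    2: {("R1", "R2"), ("R1", "R3"), ("R4", "R5")},
    7: {("R1", "R2"), ("R4", "R3"), ("R4", "R5")},
}
selected = low_spatial_combinations(COMBINATIONS)
reference_edges = common_edges(toy_forests)
```

Storing edges as ordered pairs keeps the direction of infection, so an edge counts as shared only when both the parent and the child agree across forests.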

Figure 4.

Resulting epidemic forest for Period 2 with combination 2. The red edges are the edges that appear in all combinations with Ws < 1/3. All data sets and Python code are available on GitHub (https://github.com/marliyehez/Epidemic-Forest); the map is taken from33.

Next, we perform a sensitivity analysis for the temporal and prevalence aspects, using period 2 for illustration. For the temporal aspect, if the weight Wt is made non-dominant, as in combination 3, E still appears in the epidemic forest, as shown by the red edges in Fig. 5a. This result indicates that the forest is not sensitive to the temporal aspect, since changing Wt does not affect E. For the prevalence aspect, if Wp is made non-dominant, as in combination 6, E does not fully appear in the resulting epidemic forest. This indicates sensitivity to changes in Wp. The comparison of the sensitivity analysis for both aspects is shown in Fig. 5.

Figure 5.

Comparison of the resulting epidemic forests with a non-dominant temporal aspect (a) and a non-dominant prevalence aspect (b).

Based on the sensitivity analysis, a relatively low Wp can eliminate a few edges that ought to be retained because they belong to E; in Fig. 5b, orange edges that do not belong to E appear instead. The Wt selected in combination 7 also does not affect the edges of E in the resulting epidemic forest. Hence, we have two combinations with relatively large Wp, namely combinations 7 and 8. We chose combination 7 for Period 2 because the differences between its weights are not large. The selected weight combination for each period is presented in Table 2, and the resulting epidemic forest for each period is presented in Fig. 6.

Table 2.

Selected weight combination of all three factors for SoL calculation at each period.

| Period | Ws | Wt | Wp |
|---|---|---|---|
| 1 | 0.15 | 0.15 | 0.70 |
| 2 | 0.25 | 0.25 | 0.50 |
| 3 | 0.25 | 0.25 | 0.50 |
| 4 | 0.25 | 0.25 | 0.50 |

Figure 6.

Resulting epidemic forests for all periods.

Figure 6 shows that in each period the dominant tree in the forest contains at least eight of the 42 districts. Despite being dominant, these trees do not always have substantially more child cases than other trees. For example, in Period 2 (Fig. 6b) the dominant tree has eight child cases, while two other trees have seven child cases each. In periods 1 and 3, the primary case of the dominant tree has the highest prevalence among the primary cases of all trees, whereas in periods 2 and 4 it does not. This indicates seasonality in the effect of prevalence on ARI spread in the region. Table 3 provides the basic features of the epidemic forest for all periods. No district is consistently a primary case across periods; in particular, the dominant tree has a different root in each of the four periods. Period 3 has the most trees and the most child cases in the dominant tree.

Table 3.

Basic features of the resulting epidemic forest.

| Period | Number of trees | Primary case of the dominant tree | Number of child cases at dominant tree |
|---|---|---|---|
| 1 | 8 | Palmerah (PAL) | 8 |
| 2 | 8 | Senen (SEN) | 8 |
| 3 | 9 | Grogol Petamburan (GP) | 14 |
| 4 | 6 | Gambir (GAM) | 11 |
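Summary features of this kind can be derived from a child-to-parent map, as in the following Python sketch. It is an illustration under assumptions rather than the authors' code: the names `root_of` and `summarize_forest` are hypothetical, the toy forest is invented, "child cases" is read as all descendants of a tree's primary case, and ties between trees are broken by taking the first root encountered.

```python
def root_of(node, parent_of):
    """Follow parent links until reaching a primary case (parent is None)."""
    while parent_of[node] is not None:
        node = parent_of[node]
    return node

def summarize_forest(parent_of):
    """Count trees and find the tree with the most descendants (the dominant tree)."""
    roots = [n for n, p in parent_of.items() if p is None]
    descendants = {r: 0 for r in roots}
    for node, parent in parent_of.items():
        if parent is not None:
            descendants[root_of(node, parent_of)] += 1
    # max() returns the first root with the largest count, so ties go to the earliest root.
    dominant_root = max(descendants, key=descendants.get)
    return {
        "n_trees": len(roots),
        "dominant_root": dominant_root,
        "n_child_cases": descendants[dominant_root],
    }

# Invented toy forest: A -> B -> C and D -> E (illustration only).
toy_parent_of = {"A": None, "B": "A", "C": "B", "D": None, "E": "D"}
summary = summarize_forest(toy_parent_of)
```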

From the resulting epidemic forest, we calculated Rt for each period at each epidemic month at three spatial scales. Each period has a different onset time, which depends on the selected tree. The calculation of Rt started from each onset time. Figure 7 shows the tree-wise Rt for the dominant tree and two selected non-dominant trees (NDT1 and NDT2) in each period. From Fig. 7 we observe that, in general, the Rt of the dominant tree in period 1 has a trend similar to that in period 3, and likewise for periods 2 and 4. This indicates that the dominant tree can illustrate the character of disease spread in each period, which supports the finding of seasonality in the effect of prevalence on ARI spread.

Figure 7.

Tree-wise Rt for the dominant tree and two selected non-dominant trees in each period.

Discussion

In Indonesia, ARI is considered a health burden particularly for children as its mortality and morbidity rate is relatively high34. Bacteria, viruses, and fungi have been considered as etiological agents for ARI35. The infections due to bacteria and viruses are transmittable through contact, airborne, droplet, vectors, and vehicular36,37. Many variables can affect the transmission of these routes such as environmental factors and crowds of people36. The spread of pathogen efficiently happens among humans, particularly through airborne transmission, and may cause an outbreak that has to be controlled36.

Studies of ARI incidence have mainly focused on surveillance of children in developing countries, owing to the urgency of the problem and limited health resources34,35,38. Associations between the incidence of virus-caused ARI and several factors, such as meteorological factors39–43, gender42, housing34,41,42,44, reliability of healthcare facilities45, and chronic diseases42, have also been observed. The incidence has also been studied in its spatio-temporal aspects to help understand how incidence is associated spatially46–50. These studies can map the spatial distribution of incidence; however, they cannot demonstrate the spread of disease from one region to another. A better understanding of disease spread between regions may help mitigate and control outbreaks. Few previous studies have examined ARI transmission between regions. These studies mainly focus on spread by human mobility through existing transportation networks, namely city networks14, airline networks27, and both51.

Our research studies the spread of ARI by utilizing the concept of epidemic forest32 at a region-scale level. The previous study of epidemic forest which was modeling the disease spread at individual level32 can determine the primary cases and the size of spread with different parameter settings. The urgency of determining primary cases at the regional level arises to provide early warning information about the disease’s spread. The objective is to give awareness to health institutions, particularly the health institutions responsible for treating patients in the smallest region-scale, as an attempt to prevent an outbreak.

Aside from being a health burden, an outbreak of ARI can cause social and economic burden2. The government should prepare for the costs that arise from it, such as costs for medicine and vaccines. Early warning information given from the resulting epidemic forest can be beneficial for cost management by the government.

The epidemic forest presents a tree-graph model, which is simpler and easier to analyze than a network model. For instance, one multiple-to-multiple network model close to our work is presented in52. The model in52 requires more prior knowledge to construct the network itself, most of which, such as human mobility, is not available in our case. While multiple-to-multiple network models seem more realistic, they require more information than the epidemic forest model.

The epidemic forest at each period in Fig. 6 provides information on the consistent presence of disease in a certain area. We observe that almost at all periods, the outbreak began at West Jakarta which was marked by early onset time in the area. Based on this result, a suggestion can be made regarding outbreak prevention with priority in West Jakarta.

For illustrative purposes, we conducted a simulation in which we generated an epidemic forest after reducing by 50% the number of cases at the primary case of the dominant tree in the respective period. This simulation gives a brief indication of whether prevention at primary cases, represented by a reduction in case numbers, can control the outbreak. Table 4 shows the simulation results. In general, the simulated forests have more trees than the original forests, but the number of child cases of the dominant tree is reduced. A substantial reduction in child cases of the dominant tree is seen in period 3. Under the simulation settings, Tambora district becomes the primary case of the dominant tree in two periods. The resulting epidemic forests from the simulation (Fig. 8) show that the forest changes mainly within the area where the number of cases is reduced, whereas other areas show the same trees as the original.

Table 4.

Basic features of the resulting epidemic forest from simulations.

| Period | Number of trees | Primary case of the dominant tree | Number of child cases at dominant tree |
|---|---|---|---|
| 1 | 9 | Johar Baru (JB) and Jatinegara (JAT) | 4 |
| 2 | 8 | Tambora (TAM) | 10 |
| 3 | 11 | Tambora (TAM) | 4 |
| 4 | 6 | Grogol Petamburan (GP) | 9 |

Figure 8.

Resulting epidemic forests for all periods from simulations.

Our study has several limitations, which we summarize below together with plans for future study. First, we assume that a region can only be infected by a neighboring region. In reality, a non-neighboring region may also infect a region. We will develop a model that accommodates this possibility in future research. Second, we set the same threshold value as the marker of an outbreak in every region, without considering each region’s population density. Dynamic threshold values for each region are very important for determining a more accurate onset time in the respective regions. In future research, we will develop a model with dynamic threshold values for each region. Third, we did not consider inter-regional connectivity and networks. These play an important role in the spread of infectious diseases because human mobility is a means of transmission14,27,51. Inter-regional connectivity and networks will be one of the model developments in future research. Fourth, our model still uses simple geographic data, i.e., geographic coordinates, to build a network between regions.

Furthermore, we would like to highlight some important points in this paper concerning the tree-like model and the measurement method for determining the distance. In this study, we build the model with the assumption of a single-source infection. We model the spread of the disease between regions as a tree-like model, in which the source of infection is only from one region. To validate this assumption, one interesting future agenda is to collect data on the spread of a disease and do the explanatory data analysis to test our hypothesis. Findings from this approach should either strengthen our results or refine our model to capture the real-world phenomena of the disease spread spatially. On the other hand, identifying this primary source will be very useful for developing countries that may have limited budgets for disease control and management. For these countries, handling outbreaks in the primary source can be a priority so that necessary measures can be applied in the precise area to control the disease spread effectively. In this research, simple spatiotemporal information of the case in the regions is applied to the algorithm. We utilize the centroid-based method to measure the distance between regions for the spatial aspect. However, this method provides an adaptable framework that can incorporate additional data and techniques. For instance, information about human mobility between regions can be used to confirm and enhance the accuracy of the epidemic forest.

Methods

The study site

The Special Capital Region of Jakarta is located at 5°12′S to 6°22′S and 106°23′E to 106°58′E and covers a total area of 664.01 square kilometers. Its population density is the highest in Indonesia, reaching 15,978 people per square kilometer in 2021. There are 44 districts, 5 cities, and 1 regency under its jurisdiction. The study site, drawn from the geographic locations of the districts using geopandas (v.0.13.2), is shown in Fig. 9.

Figure 9.

The District of Special Capital Region of Jakarta.

Data collection

The monthly number of pneumonia and influenza infection cases in 42 districts and the population of the Special Capital Region of Jakarta were collected for 2017-2021. The case data come from two types of health facilities, i.e. hospitals and the districts’ public health centers. Three other data sets are used as inputs: the geographical coordinates, district adjacency, and district-level population. Geographic coordinates represent the central coordinates of each district, while district adjacency is represented by a symmetric matrix in which the diagonal elements are 0 and the off-diagonal elements are either 0 or 1. A value of 1 in the adjacency matrix indicates that the districts in the corresponding row and column are adjacent, while 0 indicates that they are not. The geographical coordinates are used to build a network between regions. This should be sufficient given that Jakarta is roughly homogeneous, with uniformly distributed public transportation. The district adjacency is used to accommodate the assumption that a region is infected by only one neighboring region. Population data are obtained by processing the population density data at the sub-district level.
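The adjacency matrix described above can be built and checked with a few lines of Python. This is a sketch under assumptions: the function names are hypothetical, the districts D1 to D3 and their adjacency pairs are invented, and a plain list of lists stands in for whatever matrix type the released code uses.

```python
def adjacency_matrix(districts, adjacent_pairs):
    """Build a symmetric 0/1 matrix with a zero diagonal from unordered district pairs."""
    index = {name: i for i, name in enumerate(districts)}
    n = len(districts)
    matrix = [[0] * n for _ in range(n)]
    for a, b in adjacent_pairs:
        if a == b:
            raise ValueError("a district cannot be adjacent to itself")
        i, j = index[a], index[b]
        matrix[i][j] = matrix[j][i] = 1  # set both entries to keep symmetry
    return matrix

def is_valid_adjacency(matrix):
    """Check the stated properties: square, zero diagonal, 0/1 entries, symmetric."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        return False
    if any(matrix[i][i] != 0 for i in range(n)):
        return False
    return all(
        matrix[i][j] in (0, 1) and matrix[i][j] == matrix[j][i]
        for i in range(n)
        for j in range(n)
    )

# Invented example: D1 borders D2, and D2 borders D3 (illustration only).
adj = adjacency_matrix(["D1", "D2", "D3"], [("D1", "D2"), ("D2", "D3")])
```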

The population density data at the sub-district level is processed first to obtain district-level population data. For simplification purposes, the population count is assumed to be constant by taking the average value from the population count between 2017 and 2020 for each district. The data for the year 2021 is not included due to data limitations.

Infectious period identification

The period of occurrence of spikes in cases will be determined for each district. This is accomplished by examining the ratio of ARI cases to the population size. Hierarchical clustering will be applied using the average linkage method and the Euclidean distance metric. This approach aims to group the districts based on their similarities. The results will be presented in the form of a heat map. Visually, the periods with spikes in cases for each district will be identified. Among the three disease data sets, only the Pneumonia data from hospitals shows periods of spikes in cases for each district. Therefore, this data set will be used for the subsequent epidemic forest algorithm.

Onset of the outbreak

In epidemiology, the beginning of the infectious period of an infected individual is known as onset time32. In this research, for the region-scale level, the onset time is determined as the starting point of an outbreak in the particular region that can be identified as the beginning of the peak of cumulative cases with the assumption that there is exactly one single wave of the outbreak for all regions. The data of cases were fitted to the Richards model. The Richards model is shown as the following equation53,54:

$$C(t) = K\left[1 + e^{-r\mu (t - t_i)}\right]^{-1/\mu} \tag{1}$$

C(t) is the cumulative number of cases at time t. The parameters K and r are the total number of cases and the per capita growth rate of the cumulative case number, respectively. The exponent of deviation of the cumulative curve is denoted by μ, and the point of inflection on the x-axis is denoted by ti, which marks the time at which an upturn or downturn in the growth rate of the cumulative case number occurs. In this research, the onset of the outbreak for every city is determined by fitting the cumulative cases to the Richards curve to obtain the value of ti.

Construction of region-scale epidemic forest

An epidemic tree is constructed through an algorithm that generates the possible source of infection of each infected case31. The tree consists of nodes and links. Cases are represented by nodes, and each link represents a parent-child relationship in the infectious process, in which the parent infected the child. The candidate parents of a child case are all cases that occur within a certain distance of the child case and were infected before it. Each child case is assumed to have only one parent. The root of the epidemic tree is called the primary case, which is the first case that arises at the beginning of the spread of disease. The other nodes of the tree are the descendants of the primary case and are called secondary cases. An epidemic forest is built when there are several primary cases32.

A child case may have several candidates for its parent. The strength of linkage between the child and each candidate is evaluated to eliminate the less likely candidates. It is calculated from the spatio-temporal distance between the child case and each candidate. In this study, the prevalence of each region is also included in the calculation. Spatio-temporal distance and prevalence are treated as measures of the possibility of transmission between regions. Transmission is more likely to occur when the spatio-temporal distance is smaller and the prevalence between regions is higher. Because the three measures in the strength of linkage have different units, we scale them to make them dimensionless so that they can be combined. The scaled spatio-temporal distances between the child case and the j-th candidate parent are given as follows:

$$\tilde{D}_{sj} = \frac{D_{sj}}{D_{s}^{\min}} \tag{2}$$

$$\tilde{D}_{tj} = \frac{D_{tj}}{D_{t}^{\min}} \tag{3}$$

Dsj and Dtj are the spatial distance and the time difference between the child case and the j-th candidate, respectively, whereas Dsmin and Dtmin denote the minimum spatial distance and the minimum time difference between the child case and all candidate parents, respectively. The spatial distance is the distance between the centroids of the regions. Centroids have previously been used to determine distances between regions by Ward et al.55, who measured the distance between individuals in different cities, and by Eggo et al.56, who constructed epidemic trees. The scaled prevalence for the j-th candidate parent is calculated by:

$$\tilde{P}_{j} = \frac{P_{j} - P_{c}}{\underset{j}{\arg\max}\,\{|P_{j} - P_{c}|\}} \tag{4}$$

Pj and Pc are the prevalence of the j-th candidate parent and of the child case, respectively.
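The scaling in equations (2)–(4) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the dictionary keys and the numbers are hypothetical, the denominator of equation (4) is read as the largest absolute prevalence difference among the candidates, and all distances and time differences are assumed to be strictly positive.

```python
def scale_candidates(candidates, child_prevalence):
    """Return dimensionless distance, time and prevalence terms per candidate.

    candidates: list of dicts with keys
      'ds' (spatial distance to the child), 'dt' (onset-time difference
      to the child) and 'p' (prevalence of the candidate parent).
    """
    ds_min = min(c["ds"] for c in candidates)
    dt_min = min(c["dt"] for c in candidates)
    # Largest absolute prevalence gap; assumed reading of equation (4).
    max_gap = max(abs(c["p"] - child_prevalence) for c in candidates)
    scaled = []
    for c in candidates:
        scaled.append({
            "ds": c["ds"] / ds_min,                      # equation (2)
            "dt": c["dt"] / dt_min,                      # equation (3)
            "p": (c["p"] - child_prevalence) / max_gap   # equation (4)
                 if max_gap > 0 else 0.0,
        })
    return scaled

# Hypothetical example: two candidate parents for one child region.
example = scale_candidates(
    [{"ds": 4.0, "dt": 2.0, "p": 30.0},
     {"ds": 2.0, "dt": 6.0, "p": 10.0}],
    child_prevalence=20.0,
)
```

With this scaling the nearest candidate in space (or time) gets a scaled distance of 1 and all others get values above 1, while the scaled prevalence lies between −1 and 1.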

In this research, the concept of the epidemic forest is used to understand the spread of infectious disease at the regional scale. The construction of the region-scale epidemic forest is given in Algorithm 1.

Algorithm 1 The algorithm flow of region-scale epidemic forest

Input: Onset time, geographical coordinates

Output: Epidemic forest

Steps:

  1. Choose an arbitrary case and consider it as the child case.

  2. Given the network, determine all cases neighbouring the child case. These cases are called Candidate 1.

  3. Select the cases in Candidate 1 that comply with the rule on the parent's onset time, i.e. the onset time of the parent must be prior to that of the child. These cases are called Candidate 2.

  4. Evaluate the Strength of Linkage (SoL) value of each case j in Candidate 2 relative to the child case. The formula of SoL is given as follows:

     $$\mathrm{SoL}(\tilde{D}_{sj}, \tilde{D}_{tj}, \tilde{P}_{j}) = W_{s} \cdot \frac{1}{\tilde{D}_{sj}} + W_{t} \cdot \frac{1}{\tilde{D}_{tj}} + (1 - W_{s} - W_{t})\,\tilde{P}_{j} \tag{5}$$

     where $\tilde{D}_{sj}$, $\tilde{D}_{tj}$ and $\tilde{P}_{j}$ are given in equations (2)–(4), and $W_{s}$ and $W_{t}$ are the weights assigned to the spatial and temporal components, respectively.

  5. Choose the case with the greatest value of SoL as the parent of the child.

  6. Repeat steps 1 to 5 for all child cases.

  7. A tree is formed when all child cases have found their parents.

  8. Determine the primary case, i.e. the case that is the initial source of the spread of the disease. In a formed tree, the primary case is the root of the tree.

  9. An epidemic forest is formed when there is more than one primary case.
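The steps of Algorithm 1 can be sketched in a few lines of Python. This is an illustrative reading under stated assumptions rather than the authors' code: the function name, the data layout, the use of Euclidean distance between centroid coordinates, the default weights of 0.4 and the four-region example are all hypothetical, and a region with no earlier-onset neighbour is treated as a primary case.

```python
import math

def build_forest(regions, neighbours, ws=0.4, wt=0.4):
    """Assign each region the neighbouring parent with the greatest SoL.

    regions: name -> {'xy': centroid (x, y), 'onset': time, 'prev': prevalence}
    neighbours: name -> iterable of neighbouring region names
    Returns name -> parent name, or None for a primary case.
    """
    parent = {}
    for child, info in regions.items():
        # Steps 2-3: neighbours whose onset precedes the child's onset.
        cands = [n for n in neighbours.get(child, ())
                 if regions[n]["onset"] < info["onset"]]
        if not cands:
            parent[child] = None  # no possible parent: primary case
            continue
        ds = {n: math.dist(regions[n]["xy"], info["xy"]) for n in cands}
        dt = {n: info["onset"] - regions[n]["onset"] for n in cands}
        dp = {n: regions[n]["prev"] - info["prev"] for n in cands}
        ds_min, dt_min = min(ds.values()), min(dt.values())
        gap = max(abs(v) for v in dp.values()) or 1.0

        def sol(n):
            # Step 4: equation (5) with the scaled terms of equations (2)-(4).
            return (ws * ds_min / ds[n] + wt * dt_min / dt[n]
                    + (1 - ws - wt) * dp[n] / gap)

        parent[child] = max(cands, key=sol)  # step 5
    return parent

# Hypothetical four-region example.
regions = {
    "A": {"xy": (0, 0), "onset": 0, "prev": 50},
    "B": {"xy": (1, 0), "onset": 3, "prev": 30},
    "C": {"xy": (2, 0), "onset": 5, "prev": 20},
    "D": {"xy": (5, 0), "onset": 1, "prev": 40},
}
neighbours = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
parent = build_forest(regions, neighbours)
roots = sorted(r for r, p in parent.items() if p is None)
```

In this example region C has two candidate parents, B and D, and B wins because it is closer in both space and time. Regions A and D have no earlier-onset neighbour, so the result has two primary cases and is therefore a forest of two trees.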

Acknowledgements

This research is funded by Riset P2MI 2024 of Industrial and Financial Mathematics Research Group, Department of Mathematics, Faculty of Mathematics and Natural Sciences, Institut Teknologi Bandung.

Author contributions

Conceptualization, N.N., Y.N.N. and K.K.S.; methodology, Y.N.N., K.K.S. and N.N.; software, M.Y.S. and K.K.S.; validation, N.N., M.A. and N.S.; formal analysis, Y.N.N., N.N. and K.K.S.; investigation, M.Y.S., Y.N.N., N.N. and K.K.S.; resources, N.S.; writing (original draft preparation), Y.N.N., M.Y.S., N.N. and K.K.S.; writing (review and editing), Y.N.N., M.Y.S., M.A. and N.N.; visualization, M.Y.S. and K.K.S.; supervision, N.N. and M.A.; data, N.S. All authors have read and agreed to the published version of the manuscript.

Data availability

The data that support the findings of this study are available from Jakarta Health Office, but restrictions apply to the availability of these data, which were used under license for the current research, and so are not publicly available. Data are, however, available from the authors upon reasonable request and with permission of the Jakarta Health Office.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  1. Langer S, Horn J, Gottschick C, Klee B, et al. Symptom burden and factors associated with acute respiratory infections in the first two years of life-results from the LoewenKIDS cohort. Microorganisms. 2022 doi: 10.3390/microorganisms10010111.
  2. Zhang S, Wahi-Singh P, Wahi-Singh B, et al. Costs of management of acute respiratory infections in older adults: A systematic review and meta-analysis. J. Global Health. 2022 doi: 10.7189/jogh.12.04096.
  3. Childs A, Zullo AR, Joyce NR, et al. The burden of respiratory infections among older adults in long-term care: A systematic review. BMC Geriatrics. 2019 doi: 10.1186/s12877-019-1236-6.
  4. Gessner BD. Acute lower respiratory infection in the developing world. Expert Rev. Respir. Med. 2011 doi: 10.1586/ers.11.47.
  5. Collaborators GDI. Global burden of 369 diseases and injuries in 204 countries and territories, 1990–2019: A systematic analysis for the global burden of diseases study 2019. Global Health Metrics. 2020 doi: 10.1016/S0140-6736(20)30925-9.
  6. Simoes E, Cherian T, Chow J, et al. Chapter 25: Acute respiratory infections in children. In: Jamison D, Breman J, Measham A, et al., editors. Disease Control Priorities in Developing Countries. 2nd ed. Oxford University Press; 2006.
  7. IHME. GBD data visualizations (2023). Accessed on 2023-06-09.
  8. Chow EJ, Uyeki TM, Chu HY. The effects of the covid-19 pandemic on community respiratory virus activity. Nat. Rev. Microbiol. 2023 doi: 10.1038/s41579-022-00807-9.
  9. Kim HM, et al. Impact of coronavirus disease 2019 on respiratory surveillance and explanation of high detection rate of human rhinovirus during the pandemic in the Republic of Korea. Influenza Other Respir. Viruses. 2021 doi: 10.1111/irv.12894.
  10. Park KY, Seo S, Han J, Park JY. Covid-19 pandemic: An epidemiological analysis of the effectiveness of pandemic-related public health measures in reducing seasonal respiratory viruses test positivity. PLoS ONE. 2021 doi: 10.1371/journal.pone.0253451.
  11. Ali S, Lau Y, Shan S, et al. Prediction of upcoming global infection burden of influenza seasons after relaxation of public health and social measures during the covid-19 pandemic: A modelling study. Lancet Global Health. 2022 doi: 10.1016/S2214-109X(22)00358-8.
  12. Boudewijns B, Paget J, Riccio MD, Coudeville L, Crépey P. Preparing for the upcoming 2022/23 influenza season: A modelling study of the susceptible population in Australia, France, Germany, Italy, Spain and the United Kingdom. Influenza Other Respir. Viruses. 2022 doi: 10.1111/irv.13091.
  13. Madaniyazi L, Seposo X, Ng CF, et al. Respiratory syncytial virus outbreaks are predicted after the covid-19 pandemic in Tokyo, Japan. Jpn. J. Infect. Dis. 2022 doi: 10.7883/yoken.jjid.2021.312.
  14. Guo Z, Li J, Xiao G, Gong L, Wang Y. Dynamic model of respiratory infectious disease transmission by population mobility based on city network. R. Soc. Open Sci. 2022 doi: 10.1098/rsos.221232.
  15. Hasan A, et al. Superspreading in early transmissions of Covid-19 in Indonesia. Sci. Rep. 2020 doi: 10.1038/s41598-020-79352-5.
  16. Nuraini N, Khairudin K, Apri M. Modeling simulation of Covid-19 in Indonesia based on early endemic data. Commun. Biomath. Sci. 2020 doi: 10.5614/cbms.2020.3.1.1.
  17. Aini W, Sukandar KK, Nuraini N, Handayani D. The impact of mass exodus on the resurgence of Covid-19 cases: Case study of regions in Indonesia. Front. Appl. Math. Stat. 2022 doi: 10.3389/fams.2022.912150.
  18. Hasan A, Nasution Y. A compartmental epidemic model incorporating probable cases to model Covid-19 outbreak in regions with limited testing capacity. ISA Trans. 2022 doi: 10.1016/j.isatra.2021.01.029.
  19. Fuady A, Nuraini N, Sukandar KK, Lestari BW. Targeted vaccine allocation could increase the Covid-19 vaccine benefits amidst its lack of availability: A mathematical modeling study in Indonesia. Vaccines. 2021 doi: 10.3390/vaccines9050462.
  20. Hasan A, Putri E, Susanto H, Nuraini N. Data-driven modeling and forecasting of Covid-19 outbreak for public policy making. ISA Trans. 2022;124:135–143. doi: 10.1016/j.isatra.2021.01.028.
  21. Nuraini N, et al. Infectious disease modeling with socio-viral behavioral aspects-lessons learned from the spread of SARS-CoV-2 in a university. Trop. Med. Infect. Dis. 2022 doi: 10.3390/tropicalmed7100289.
  22. Oyedotun TDT, Moonsammy S. Spatiotemporal variation of Covid-19 and its spread in South America: A rapid assessment. Ann. Am. Assoc. Geograph. 2020;111(6):1868–1879.
  23. Rahmad MR, Islam AHMH, Islam MN. Geospatial modelling on the spread and dynamics of 154 day outbreak of the novel coronavirus (Covid-19) pandemic in Bangladesh towards vulnerability zoning and management approaches. Model. Earth Syst. Environ. 2021;7:2059–2087. doi: 10.1007/s40808-020-00962-z.
  24. Moazeni M, Maracy MR, Dehdashti B, Ebrahimi A. Spatiotemporal analysis of Covid-19, air pollution, climate, and meteorological conditions in a metropolitan region of Iran. Environ. Sci. Pollut. Res. 2022;29:24911–24924. doi: 10.1007/s11356-021-17535-x.
  25. Spassiani I, Sebastiani G, Palú G. Spatiotemporal analysis of Covid-19 incidence data. Viruses. 2021 doi: 10.3390/v13030463.
  26. Jiang P, et al. Spatial-temporal potential exposure risk analytics and urban sustainability impacts related to Covid-19 mitigation: A perspective from car mobility behaviour. J. Clean. Prod. 2021 doi: 10.1016/j.jclepro.2020.123673.
  27. Kenah E, Chao DL, Matrajt L, Halloran ME, Longini IM Jr. The global transmission and control of influenza. PLoS ONE. 2011 doi: 10.1371/journal.pone.0019515.
  28. Wang R, Wu H, Wu Y, Zheng J, Li Y. Improving influenza surveillance based on multi-granularity deep spatiotemporal neural network. Comput. Biol. Med. 2021 doi: 10.1016/j.compbiomed.2021.104482.
  29. Bowen JT Jr, Laroe C. Airline networks and the international diffusion of severe acute respiratory syndrome (SARS). Geograph. J. 2006 doi: 10.1111/j.1475-4959.2006.00196.x.
  30. Gao J, et al. Stan: Spatio-temporal attention network for pandemic prediction using real-world evidence. J. Am. Med. Inf. Assoc. 2021;28:733–743. doi: 10.1093/jamia/ocaa322.
  31. Haydon DT, et al. The construction and analysis of epidemic trees with reference to the 2001 UK foot-and-mouth outbreak. Proc. R. Soc. Lond. B. 2003;270:121–127. doi: 10.1098/rspb.2002.2191.
  32. Li M, et al. Epidemic forest: A spatiotemporal model for communicable diseases. Ann. Am. Assoc. Geograph. 2019;109:812–836. doi: 10.1080/24694452.2018.1511413.
  33. Abdillah, S. Geojson for districts (Kecamatan) list in Jakarta. https://github.com/SakifAbdillah/jakartaKecamatanGeoJSON (2016).
  34. Lutpiatina L, et al. Multilevel analysis of lifestyle and household environment for toddlers with symptoms of acute respiratory infection (ARI) in Indonesia in 2007, 2012, and 2017. Global Pediatric Health. 2022;9:1–13. doi: 10.1177/2333794X221078700.
  35. Bhuyan GS, Hossain MA, Sarker SK, et al. Bacterial and viral pathogen spectra of acute respiratory infections in under-5 children in hospital settings in Dhaka city. PLoS ONE. 2017 doi: 10.1371/journal.pone.0174488.
  36. Kutter JS, Spronken MI, Fraaij PL, Frouchier RA, Herfst S. Transmission routes of respiratory viruses among humans. Curr. Opin. Virol. 2018;28:142–151. doi: 10.1016/j.coviro.2018.01.001.
  37. Doron, S. & Gorbach, S. Bacterial infections: Overview. In: International Encyclopedia of Public Health, Academic Press, UK, pp. 273–282 (2008).
  38. Assane D, Makhtar C, Abdoulaye D, et al. Viral and bacterial etiologies of acute respiratory infections among children under 5 years in Senegal. Microbiol. Insights. 2018;11:1–5. doi: 10.1177/1178636118758651.
  39. Seah A, Loo LH, Jamali N, Maiwald M, Aik J. The influence of air quality and meteorological variations on influenza A and B virus infections in a paediatric population in Singapore. Environ. Res. 2023 doi: 10.1016/j.envres.2022.114453.
  40. Xu B, Wang J, Li Z, et al. Seasonal association between viral causes of hospitalised acute lower respiratory infections and meteorological factors in China: A retrospective study. Lancet Planet. Health. 2021 doi: 10.1016/S2542-5196(20)30297-7.
  41. Murray E, Klein M, Brondi L, et al. Rainfall, household crowding, and acute respiratory infections in the tropics. Epidemiol. Infect. 2012;140:78–86. doi: 10.1017/s0950268811000252.
  42. Chen Y, Williams E, Kirk M. Risk factors for acute respiratory infection in the Australian community. PLoS ONE. 2014 doi: 10.1371/journal.pone.0101440.
  43. Sundell N, Andersson L-M, Brittain-Long R, Lindh M, Westion J. A four year seasonal survey of the relationship between outdoor climate and epidemiology of viral respiratory tract infections in a temperate climate. J. Clin. Virol. 2016 doi: 10.1016/j.jcv.2016.10.005.
  44. Wimasalena NN, Chang-Richards A, Wang KI-K, Dirks KN. Housing risk factors associated with respiratory disease: A systematic review. Int. J. Environ. Res. Public Health. 2021 doi: 10.3390/ijerph18062815.
  45. Tang J, Li Y, Eames I, Chan P, Ridgway G. Factors involved in the aerosol transmission of infection and control of ventilation in healthcare premises. J. Hosp. Infect. 2006 doi: 10.1016/j.jhin.2006.05.022.
  46. Mao Y, He R, Zhu B, Liu J, Zhang N. Notifiable respiratory infectious diseases in China: A spatial-temporal epidemiology analysis. Int. J. Environ. Res. Public Health. 2020 doi: 10.3390/ijerph17072301.
  47. Ho NT, Thompson C, Nhan LNT, et al. Retrospective analysis assessing the spatial and temporal distribution of paediatric acute respiratory tract infections in Ho Chi Minh city, Vietnam. BMJ Open. 2018 doi: 10.1136/bmjopen-2017-016349.
  48. Fatima M, Khattak RM, Grady SC. Spatial and temporal analysis of acute respiratory infections (ARIs) in southern Punjab, Pakistan. Spatial Inf. Res. 2022 doi: 10.1007/s41324-022-00447-4.
  49. Balasubramani K, Prasad KA, Kodali NK. Spatial epidemiology of acute respiratory infections in children under 5 years and associated risk factors in India: District-level analysis of health, household, and environmental datasets. Front. Public Health. 2022 doi: 10.3389/fpubh.2022.906248.
  50. Fetene MT, Fenta HM, Tesfaw LM. Spatial heterogeneities in acute lower respiratory infections prevalence and determinants across Ethiopian administrative zones. J. Big Data. 2022 doi: 10.1186/s40537-022-00618-y.
  51. Charu V, Zeger S, Gog J. Human mobility and the spatial transmission of influenza in the United States. PLoS Comput. Biol. 2017 doi: 10.1371/journal.pcbi.1005382.
  52. Kim M, Paini D, Jurdak R. Modeling stochastic processes in disease spread across a heterogenous social system. Proc. Natl. Acad. Sci. 2019;116(2):401–406. doi: 10.1073/pnas.1801429116.
  53. Wang X-S, Wu J, Yang Y. Richards model revisited: Validation by and application to infection dynamics. J. Theor. Biol. 2012;313:12–19. doi: 10.1016/j.jtbi.2012.07.024.
  54. Fauzi IS, Nuraini N, Ayu RWS, Lestari BW. Temporal trend and spatial clustering of the dengue fever prevalence in West Java, Indonesia. Heliyon. 2022 doi: 10.1016/j.heliyon.2022.e10350.
  55. Ward C, Brown GD, Oleson JJ. An individual level infectious disease model in the presence of uncertainty from multiple, imperfect diagnostic tests. Biometrics. 2021 doi: 10.1111/biom.13579.
  56. Eggo RM, Cauchemez S, Ferguson NM. Spatial dynamics of the 1918 influenza pandemic in England, Wales and the United States. J. R. Soc. Interface. 2011 doi: 10.1098/rsif.2010.0216.


Articles from Scientific Reports are provided here courtesy of Nature Publishing Group
