2021 Jan 28;124:182–190. doi: 10.1016/j.isatra.2021.01.050

Analysis and prediction of COVID-19 epidemic in South Africa

Wei Ding a,b, Qing-Guo Wang c, Jin-Xi Zhang d
PMCID: PMC7842146  PMID: 33551132

Abstract

The coronavirus disease 2019 (COVID-19) has been spreading rapidly in South Africa (SA) since its first case on 5 March 2020. In total, 674,339 confirmed cases and 16,734 mortality cases were reported by 30 September 2020, and the pandemic has severely affected the economy and daily life. In this paper, analysis and long-term prediction of the epidemic dynamics of SA are made, which could assist the government and public in assessing the past Infection Prevention and Control Measures and designing future ones to contain the epidemic more effectively. A Susceptible–Infectious–Recovered model is adopted to analyse the epidemic dynamics. The model parameters are estimated over different phases with the SA data. They indicate variations in the transmissibility of COVID-19 across phases and thus reveal weaknesses of the past Infection Prevention and Control Measures in SA. The model also shows that the transient behaviours of the daily growth rate and the cumulative removal rate exhibit periodic oscillations. Such dynamics indicate that the underlying signals are not stationary and that conventional linear and nonlinear models would fail for long-term prediction. Therefore, a large class of mappings with rich functions and operations is chosen as the model class, and an evolutionary algorithm is used to obtain the optimal model for long-term prediction. The resulting models of the daily growth rate, the cumulative removal rate and the cumulative mortality rate predict that the peak and the inflection point will occur on November 4, 2020 and October 15, 2020, respectively; that the virus will cease spreading on April 28, 2021; and that the ultimate numbers of COVID-19 cases and mortality cases will be 785,529 and 17,072, respectively. The approach is also benchmarked against other methods and shows better accuracy in long-term prediction.

Keywords: COVID-19, Epidemic situation analysis, Epidemic forecasting, Evolution algorithm, South Africa

1. Introduction

The first case of COVID-19 was reported in Wuhan, China in December 2019, and the disease then spread rapidly across nearly the whole world. Within eight months, more than 16 million people in 213 countries were infected and about 645,000 of them died. This indicates strong human-to-human transmission and some biological features that distinguish COVID-19 from other epidemics. Effective Infection Prevention and Control Measures (IPCMs) are urgently needed and, to this end, modelling the epidemic is necessary. The dynamical behaviour of the COVID-19 spread has been analysed [1], [2], [3], [4], [5], [6], [7], [8] for China [1], [2], [3], Japan [4], South Korea [5], Iran [6], Italy [7] and India [8]. The effectiveness of IPCMs has also been evaluated [9], [10], [11], [12]. Among these studies, the effectiveness of the quarantine of Wuhan was assessed by calculating the contact rate of latent individuals with the SEIR model [9]. The conclusion is that quarantine and isolation effectively reduced the potential peak number of COVID-19 infections and delayed the date of peak infection. Similarly, the impact of the disease control measures in Wuhan was studied [10] using a modified SEIR model with non-constant transmission rates. In addition, two simple approaches to data analysis were adopted to evaluate the influence of the intervention measures [11], [12]. Specifically, the second derivative of the cumulative number of diagnosed cases was calculated [11] to show the effect of the massive interventions in China, and a stochastic model that predicts the cumulative number of laboratory-confirmed patients was introduced [12] to simulate the evolution of the epidemic under intervention measures. It is noted that their estimates of the transmission parameter rely on many assumptions about the epidemiological model, e.g., the number of exposed cases in the incubation period. Furthermore, asymptomatic cases and infected cases still in incubation make the reported daily number of confirmed cases inaccurate. Therefore, the aforementioned approaches to evaluating the epidemic situation are oversimplified and not accurate, as shown by recent epidemic data.

Since COVID-19 continues to spread around the world, it is necessary to model its dynamics in order to predict its future trend. The existing epidemic models can be divided into two categories: first-principle models [13], [14], [15], [16], [17] and data-driven models [18], [19], [20], [21], [22], [23], [24], [25], [26], [27], [28], [29]. A first-principle model is able to show clearly how and why an input affects the output. Building such a model, however, requires specific knowledge that is difficult to acquire. For example, to predict the status of one person via an epidemic model on a network, we have to know the statuses of those who have been in contact with him/her and determine the probability with which the person is infected by them. In addition, human interventions, e.g., precautions taken by individuals, isolation of suspected cases, and improved ascertainment of infections, need to be explicitly specified in advance. Otherwise, the prediction may be far from the true outcome [20].

Data-driven modelling, which builds the relationship between the system inputs and outputs without explicit domain knowledge, is sometimes preferable. An exponential model was fitted to the number of daily cumulative cases in the early phase of the outbreak in China and gives the basic reproduction number [21]. Similarly, another data-driven model was developed [22], which matches the mean and standard deviation of the number of reported daily cumulative cases on the Diamond Princess cruise ship with a gamma distribution and also gives the basic reproduction number. The end time and the total numbers of infectious cases and mortality cases of COVID-19 in China were predicted by different types of data-driven models, i.e., the logistic model, the Bertalanffy model and the Gompertz model [23]. Social media search indices (SMSIs) were also taken into consideration and fitted to the data of the confirmed cases via a subset selection model [24]. In Castorina [25], a generalized Gompertz law was found to predict the maximum number of infected individuals in China, Singapore, South Korea and Italy. In Li [26], Gaussian distribution theory was used to analyse and predict the transmission of COVID-19. Prediction algorithms based on machine learning were also provided [27], [28]. Among them, the epidemic trend of COVID-19 in India was predicted by a model trained on the data of China [27], and the risk category of a country was assessed [28] by shallow long short-term memory (LSTM) networks.

In South Africa (SA), the first case was confirmed on March 5, 2020. Since then, COVID-19 has spread rapidly throughout SA, and the number of cumulative infectious cases (CICs) is still increasing. In total, 674,339 confirmed cases and 16,734 mortality cases were reported by 30 September 2020. The pandemic has thus exposed South Africans to huge health risks. To contain it, the SA government has carried out a series of domestic containment measures. These, however, have other social impacts. For example, according to the World Bank, the Gross Domestic Product (GDP) of SA is expected to shrink by 7.1% this year. To the best of our knowledge, studies of the COVID-19 epidemic in SA are rare in the literature.

This paper presents analysis and long-term prediction of the epidemic dynamics of SA, which could assist the government and public in assessing the past IPCMs and designing future ones to contain the epidemic more effectively, and which contributes to the global study of the virus as a case from the African part of the world population. A Susceptible–Infectious–Recovered model is adopted to analyse the epidemic dynamics. The model parameters are estimated over different phases with the SA data. They indicate variations in the transmissibility of COVID-19 across phases and thus reveal weaknesses of the past IPCMs in SA. For long-term prediction, it is noted that data-driven modelling is well developed with wide applications, but its success depends on prior knowledge of the system to be modelled, which enables selection of the model structure and use of big data. COVID-19, however, is a new type of epidemic with high transmissibility and unknown pathogenicity, for which no past experience or data exist. Our analysis model shows that the transient behaviours of the daily growth rate (DGR) and the cumulative removal rate (CRR) exhibit a persistent rise mixed with periodic oscillations. Such dynamics indicate that the underlying signals are not stationary and that conventional linear and nonlinear models would fail for long-term prediction. Therefore, a large class of mappings with rich functions and operations is chosen as the model class, and an evolutionary algorithm is used to obtain the optimal model for long-term prediction. The resulting models of the DGR, the CRR and the cumulative mortality rate (CMR) predict that the peak and the inflection point will occur on November 4, 2020 and October 15, 2020, respectively; that the virus will cease spreading on April 28, 2021; and that the ultimate numbers of COVID-19 cases and mortality cases will be 785,529 and 17,072, respectively. The approach is also benchmarked against other methods and shows better accuracy in long-term prediction.

The rest of the paper is organized as follows. Section 2 introduces SA together with the epidemic and the data. The epidemic analysis and the long-term prediction are presented in Sections 3 and 4, respectively. Conclusions are drawn in Section 5.

2. South Africa and COVID-19 epidemic

SA is located in the southernmost region of Africa, with a long coastline that stretches more than 2500 km along the South Atlantic and the Indian Oceans. With a total area of 1,221,037 km², SA is the 24th largest country in the world. The interior of SA consists of a vast, in most places almost flat, plateau at an altitude of between 1000 m and 2100 m, with a generally temperate climate. SA is bordered to the north by Namibia, Botswana and Zimbabwe and to the east and northeast by Mozambique and Eswatini, and it surrounds the enclaved country of Lesotho [30].

According to the Worldometer elaboration of the latest United Nations data in 2020, the population of SA is estimated at 59,308,690, which ranks 25th in the world. SA is a nation of diverse origins, cultures, languages, and religions, with 79.2% Black Africans, 8.9% Whites, 8.9% Coloureds, 2.5% Asians, and 0.5% unspecified [31].

SA is a developing country with a mixed economy. In 2019, its GDP was worth 350 billion US dollars, ranking 42nd in the world. It is burdened by relatively high rates of crime, poverty, and unemployment, and is also ranked among the top ten countries in the world for income inequality. In 2015, the richest 10% of the population held 71% of net wealth, whereas the poorest 60% held only 7%, with a Gini coefficient of 0.63 [32].

The health system of SA comprises the public sector and the private sector. The public health services are divided into primary, secondary and tertiary through health facilities that are located in and managed by the provincial departments of health. The health care system of SA owns more than 400 public hospitals and 200 private hospitals, and consumes about 8.8% of the GDP in this country. Nonetheless, the vacancy rates for doctors and nurses are estimated at 56% and 46%, respectively. Moreover, 84% of the population depends on the public healthcare system, which is the preferred government health provision within a primary health care approach. However, only 21% of doctors work in it [33]. In addition, SA has an estimated seven million people living with HIV, more than any other country in the world [34]. Thus, the health care in SA is beset with chronic human resource shortages and limited resources.

COVID-19 spread to nearly all countries after it broke out in Wuhan, China in December 2019. The first known COVID-19 patient in SA, a 38-year-old male citizen infected while travelling to Italy, was confirmed on March 5, 2020. After that, a series of IPCMs were introduced in succession by the National Institute for Communicable Diseases (NICD) of SA, e.g., contact tracing and isolation by the Emergency Operating Centre (EOC), travel restrictions from March 18, 2020, closure of schools and universities from March 18, 2020, and a 500 billion rand stimulus spending plan. In addition, to prevent and control COVID-19 in SA, the government imposed a nationwide lockdown on March 26, 2020. It started at the most restrictive level, 5, and was relaxed to level 4 on May 1, 2020; for example, railway stations were reopened. The lockdown was further relaxed to level 3 on June 1, 2020, where it remains. Nonetheless, COVID-19 spread rapidly throughout SA, and the number of daily new cases (DNCs) reached a record high of 13,944 on July 24, 2020, which further weakened the health care system of SA. The long lockdown also brought a series of economic and social problems, such as an increase in domestic violence and intimate partner violence, and geopolitical dysfunctions. In particular, the rand fell to an all-time low against the US dollar on April 5, 2020. Effective IPCMs are urgently needed. To this end, an accurate model helps reveal the features of the epidemic and forecast its trend, e.g., the inflection point, the peak, and the final numbers of infections and deaths.

Our study is on SA and uses the COVID-19 data from Worldometer [35]. Specifically, the data consists of the CICs, the active cases (ACs) and the cumulative mortality cases (CMCs) and covers the days from March 5, 2020 to September 20, 2020. The data is divided into four sections according to the level of lockdown, as shown in Table 1. The total population in 2020 is 59,308,690 [36]. The cases are plotted in Fig. 1, which shows that the epidemic has become increasingly severe in SA. Our study compares SA with China. The data for China is from the National Health Committee of China [37]. It covers the days from January 22, 2020 to April 18, 2020, including a 62-day period of lockdown from January 24, 2020 to March 25, 2020. The data is divided into four sections as well, as shown in Table 2. Note that the length of each phase is the same as that of SA. Since the spread of COVID-19 in China was mainly confined to Hubei province before April 2020, the population of Hubei province is used, namely 59,270,000, as reported by the Institute of National Statistics of China (INSC) [38].

Table 1. Transmission coefficients of COVID-19 in SA.

| Lockdown level | Time period | $\hat{\beta}$ | $\hat{R}_0$ | Drop rate |
|---|---|---|---|---|
| 0 | 2020.03.05–2020.03.26 | 0.3993 | 5.5901 | |
| 5 | 2020.03.27–2020.04.30 | 0.2231 | 3.124 | 44.1155% |
| 4 | 2020.05.01–2020.05.31 | 0.1854 | 2.5958 | 16.9078% |
| 3 | 2020.06.01–2020.06.20 | 0.1726 | 2.4167 | 6.8996% |

Fig. 1. Incidence data of SA.

Table 2. Transmission coefficients of COVID-19 in China.

| State | Time period | $\hat{\beta}$ | $\hat{R}_0$ | Drop rate |
|---|---|---|---|---|
| Non-lockdown | 2019.12.17–2020.01.23 | 0.2448 | 3.4266 | |
| Lockdown | 2020.01.24–2020.02.27 | 0.2289 | 3.2041 | 6.4933% |
| Lockdown | 2020.02.28–2020.03.29 | 0.1629 | 2.2809 | 28.813% |
| Lockdown | 2020.03.30–2020.04.18 | 0.1325 | 1.8545 | 18.6944% |

3. Epidemic analysis

Let $x(i)$ be the number of CICs on the $i$th day, $i = 1, 2, \ldots, N$, and define the number of DNCs and the DGR as $d(i) = x(i) - x(i-1)$ and $X(i) = (x(i) - x(i-1))/x(i-1)$, $i = 2, 3, \ldots, N$, respectively. Let $y(i)$ be the number of cumulative cured cases on the $i$th day, $i = 1, 2, \ldots, N$, and define the cumulative cure rate (CCR) as $Y(i) = y(i)/x(i)$. Let $z(i)$ be the number of CMCs on the $i$th day, $i = 1, 2, \ldots, N$, and define the CMR as $Z(i) = z(i)/x(i)$. Let $W(i) = y(i) + z(i)$, $i = 1, 2, \ldots, N$, be the number of cumulative removed cases (CRCs) on the $i$th day, and define the CRR as $w(i) = W(i)/x(i)$. Let $I(i) = x(i) - y(i) - z(i)$, $i = 1, 2, \ldots, N$, be the number of ACs on the $i$th day.
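The definitions above translate directly into array operations. The following is a minimal sketch, assuming the three cumulative series are available as equal-length arrays; the function name `epidemic_rates` and the toy numbers are illustrative only and are not the SA data.

```python
import numpy as np

def epidemic_rates(x, y, z):
    """Compute DNC, DGR, CCR, CMR, CRC, CRR and AC from cumulative series.

    x: cumulative infectious cases, y: cumulative cured cases,
    z: cumulative mortality cases (1-D, same length, x > 0).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    d = np.diff(x)          # DNC: d(i) = x(i) - x(i-1), defined for i >= 2
    X = d / x[:-1]          # DGR: X(i) = d(i) / x(i-1)
    W = y + z               # CRC: cumulative removed cases
    return {
        "DNC": d,
        "DGR": X,
        "CCR": y / x,       # cumulative cure rate
        "CMR": z / x,       # cumulative mortality rate
        "CRC": W,
        "CRR": W / x,       # cumulative removal rate
        "AC": x - y - z,    # active cases
    }

# Toy numbers, made up for illustration only.
rates = epidemic_rates(x=[100, 110, 132], y=[10, 20, 33], z=[1, 2, 3])
```

Note that the DNC and DGR arrays are one element shorter than the inputs, because both are defined only from the second day onwards.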

With the SA and China data, we calculate their DGR, CCR and CMR. To show the effect of lockdown, these rates are plotted from the lockdown date onwards in Fig. 2. During the level-5 lockdown in SA, the DGR decreased and the CCR increased, while the mortality rate remained low and stable. However, after the lockdown was relaxed to level 4 on May 1, 2020, the DGR showed small-amplitude oscillations, although the CCR continued to trend upwards. For comparison, consider the China data, assuming that the COVID-19 outbreak in China began on December 17, 2019 [39]. The DGR of China fell to 0.05453% on the 44th day after lockdown and eventually went to zero, whereas the DGR of SA has kept oscillating. Moreover, the CCR in China is much higher than that in SA in the middle and later stages. Specifically, on the 86th day of lockdown, the CCR was 93.1432% in China but only 54.3002% in SA. In addition, the CMR in China is generally slightly higher than that in SA.

Fig. 2. COVID-19 trends in SA and China.

Now we analyse the spread of COVID-19 by modelling. The popular Susceptible–Infectious–Removed (SIR) model [40] is adopted in this paper. Consider a closed population, that is, the population in a given region does not change over the time horizon of the study. Denote by $P$ and $S$ the total population and the susceptible population, respectively. The SIR model consists of the following three equations:

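$$\frac{dS(t)}{dt} = -\beta \frac{I(t) S(t)}{P}, \tag{1}$$
$$\frac{dI(t)}{dt} = \beta \frac{I(t) S(t)}{P} - \gamma I(t), \tag{2}$$
$$\frac{dW(t)}{dt} = \gamma I(t), \tag{3}$$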

where $\beta$ denotes the effective contact rate and $\gamma$ is the removal rate, i.e., the inverse of the expected infection duration for COVID-19. Here $\gamma$ is chosen as $1/14$ for two reasons. First, the WHO indicates that the recovery time of people with mild symptoms of COVID-19 is about two weeks [41]. Second, mild cases (including asymptomatic cases) account for 96.79%–99.49% of the total infectious cases in SA [42].
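To make the three SIR equations concrete, the sketch below integrates them numerically with SciPy. It is an illustration under stated assumptions rather than the authors' code: the value $\beta = 0.3993$ is the pre-lockdown estimate from Table 1, the 60-day horizon is arbitrary, and the helper name `sir_rhs` is hypothetical.

```python
import numpy as np
from scipy.integrate import solve_ivp

def sir_rhs(t, state, beta, gamma, P):
    """Right-hand side of the SIR equations (1)-(3)."""
    S, I, W = state
    new_infections = beta * I * S / P
    return [-new_infections, new_infections - gamma * I, gamma * I]

P = 59_308_690          # population of SA used in the paper
beta = 0.3993           # pre-lockdown estimate from Table 1 (illustrative choice)
gamma = 1 / 14          # removal rate

days = np.arange(0, 61)
sol = solve_ivp(
    sir_rhs, (0, 60), [P - 1, 1, 0],
    args=(beta, gamma, P), t_eval=days, rtol=1e-8, atol=1e-8,
)
S, I, W = sol.y          # one row per compartment, one column per day
```

Because the population is closed, `S + I + W` stays equal to `P` at every time step, which is a convenient sanity check on any implementation.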

In the initial phase of the epidemic, the infectious population accounts for a small fraction of the total population, and thus $S \approx P$. Substituting $S = P$ in (2) yields

$$\frac{dI(t)}{dt} = (\beta - \gamma) I(t), \tag{4}$$

whose solution is given by

$$I(t) = I(0) \times e^{(\beta - \gamma) t}. \tag{5}$$

$\beta$ is estimated by the least-squares method as

$$\hat{\beta} = \arg\min_{\beta} \sum_{t=n_1}^{n_2} \left\| \hat{I}(t) - I(t) \right\|^2, \tag{6}$$

where $\hat{I}(t)$ is the prediction from (5) and $I(t)$ is the recorded number; $n_1$ and $n_2$ denote the first and last days of a phase, respectively, e.g., of the level-4 lockdown in SA. $\beta$ in the SIR model indicates the transmission rate of an epidemic.
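The least-squares fit of $\beta$ to the exponential solution (5) can be sketched with `scipy.optimize.curve_fit`, the routine named in the paper. The example below is only a sketch: it fits synthetic, noise-free data generated with a known $\beta$, and the function name `fit_beta`, the initial guess and the 22-day window are assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

GAMMA = 1 / 14  # removal rate used in the paper

def fit_beta(t, I_obs, I0, beta_guess=0.5):
    """Least-squares estimate of beta in I(t) = I0 * exp((beta - GAMMA) * t)."""
    def model(t, beta):
        return I0 * np.exp((beta - GAMMA) * t)
    popt, _ = curve_fit(model, t, I_obs, p0=[beta_guess])
    return float(popt[0])

# Synthetic active-case curve generated with a known beta (not SA data).
t = np.arange(0, 22)
true_beta = 0.3993
I_obs = 1.0 * np.exp((true_beta - GAMMA) * t)

beta_hat = fit_beta(t, I_obs, I0=1.0)
R0_hat = beta_hat / GAMMA   # basic reproduction number, beta / gamma
```

On real data each lockdown phase would be fitted separately, with `t` counted from the first day of the phase and `I0` set to the number of active cases on that day.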

To measure the capacity of epidemic spreading, the basic reproduction number, R0, which denotes the average number of secondary infections produced by an infected host in a completely susceptible population [43], is introduced as follows

$$R_0 = \frac{\beta}{\gamma}. \tag{7}$$

To obtain $R_0$ for COVID-19 in different phases, the incidence data of SA is divided into four sections according to the lockdown levels. The initial conditions in (1)–(3) are set based on the population of SA as $S(0) = 59{,}308{,}689$, $I(0) = 1$ and $W(0) = 0$. $\hat{\beta}$ is obtained from (6), and $R_0$ is calculated from (7) as $\hat{R}_0 = \hat{\beta}/\gamma$. The fit is carried out with scipy.optimize.curve_fit in Python, and the result is given in Table 1. It indicates that although $\hat{R}_0$ decreases during lockdown in SA, its drop rate, $(\hat{R}_0(T_{k-1}) - \hat{R}_0(T_k))/\hat{R}_0(T_{k-1})$, also gradually decreases, where $T_k$ denotes the $k$th time period in Table 1 and $T_{k-1}$ the period preceding it.
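As a worked check of the arithmetic, the reproduction numbers and drop rates can be recomputed from the rounded $\hat{\beta}$ values printed in Table 1. This is a sketch for illustration; because the table reports $\hat{\beta}$ to four decimals only, the recomputed figures agree with the tabulated ones up to rounding, not exactly.

```python
GAMMA = 1 / 14  # removal rate used in the paper

# Rounded beta estimates for SA, copied from Table 1 in phase order.
beta_hat = {"level 0": 0.3993, "level 5": 0.2231, "level 4": 0.1854, "level 3": 0.1726}

# R0 = beta / gamma for each phase.
R0_hat = {phase: b / GAMMA for phase, b in beta_hat.items()}

# Drop rate of each phase relative to the phase immediately before it.
values = list(R0_hat.values())
drop_rate = [(prev - cur) / prev for prev, cur in zip(values, values[1:])]

for phase, r0 in R0_hat.items():
    print(f"{phase}: R0 = {r0:.4f}")
print("drop rates:", [f"{100 * r:.2f}%" for r in drop_rate])
```

Since $\gamma$ is the same in every phase, the drop rate of $\hat{R}_0$ equals the relative drop of $\hat{\beta}$ itself.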

The same analysis is carried out for China, where $S(0) = 59{,}269{,}999$, $I(0) = 1$ and $W(0) = 0$. $\hat{\beta}$ and $\hat{R}_0$ are given in Table 2, which shows that the drop rate of $\hat{R}_0$ in China is higher than that in SA in the middle and later periods of lockdown.

Note that $R_0$ is obtained under the assumption that everyone is susceptible. If only part of the population is susceptible, the effective reproduction number, $R(t)$, is defined [44] as

$$R(t) = \frac{S(t)}{P(t)} \times R_0. \tag{8}$$

Note that S(t) is unknown. It follows [44] that

$$d(t) = e^{\gamma \times (R(t) - 1)} \times d(t-1). \tag{9}$$

This is an AR(1) model. The observations $d(t)$ are available, so it is desirable to make a robust estimate of $R(t)$ from $d(t)$, for which Bayesian estimation is probably best. Prior distributions for $R(t)$ and $d(t)$ are assumed, and the posterior distribution of the autoregressive parameter $R(t)$ is then calculated by Bayes' theorem. The mean of this posterior is taken as the estimate of $R(t)$. By applying this successively at each $t$ with a rolling window [44], a recursive estimation scheme that uses the observations up to $t$ is constructed: the posterior distribution for $R(t)$ serves as the prior in the next estimation step at time $t+1$. The resulting probability distribution for $R(t)$ includes information on all observations up to time $t$, in contrast to the “instantaneous” $R(t)$ in (9), which only considers the data at $t$ and $t-1$. Thus, it is a robust estimator of the effective reproduction number, assumed to be constant for the whole epidemic up to time $t$. Any change in $R(t)$ over time results from the assimilation of each new data point, leading to an updated estimate of $R(t)$.
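The recursive scheme described above can be sketched on a grid of candidate values of $R$. The paper does not state the likelihood or the prior, so the choices below are assumptions made for illustration: a Poisson likelihood for $d(t)$ with mean given by (9), a flat initial prior on $[0, 8]$, and no rolling window. The function name `estimate_rt` is hypothetical.

```python
import numpy as np
from scipy.stats import poisson

GAMMA = 1 / 14
R_GRID = np.linspace(0.0, 8.0, 801)   # candidate values of R (assumed range)

def estimate_rt(d, gamma=GAMMA, r_grid=R_GRID):
    """Posterior-mean estimates of R(t) from daily new cases d(0), d(1), ...

    At each step the previous posterior is reused as the prior, so the
    estimate at time t assimilates all observations up to t.
    """
    d = np.asarray(d, dtype=float)
    posterior = np.full(r_grid.size, 1.0 / r_grid.size)   # flat prior (assumption)
    means = []
    for t in range(1, d.size):
        # Expected d(t) under each candidate R, from d(t) = exp(gamma*(R-1)) * d(t-1).
        lam = np.maximum(d[t - 1] * np.exp(gamma * (r_grid - 1.0)), 1e-9)
        log_post = np.log(posterior + 1e-300) + poisson.logpmf(np.round(d[t]), lam)
        log_post -= log_post.max()                         # avoid underflow
        posterior = np.exp(log_post)
        posterior /= posterior.sum()
        means.append(float(np.sum(r_grid * posterior)))
    return np.array(means)

# Synthetic daily counts growing with a constant R = 2.5 (not SA data).
true_R = 2.5
d_synth = np.round(100 * np.exp(GAMMA * (true_R - 1.0)) ** np.arange(31))
rt_means = estimate_rt(d_synth)
```

With a constant true $R$, the posterior mean settles near that value as more days are assimilated, which mirrors the smoothing behaviour attributed to the recursive estimator in the text.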

Applying the above approach to our data, $R(t)$ is plotted in Fig. 3, which shows that the COVID-19 epidemic in SA is not stable and that $R(t)$ shows a slight upward trend in the middle and later periods of the lockdown. In contrast, Fig. 4 shows that $R(t) < 1$ in China after the 34th day of lockdown, which means that COVID-19 in China is under control and will die out. Therefore, the IPCMs in China work better than those in SA.

Fig. 3. Estimation of $R_e$ for SA.

Fig. 4. Estimation of $R_e$ for China.

The above analysis shows that although $R_0$ in SA decreased over the time sections, it still stands at 2.4167, which means that COVID-19 is still prevalent in SA. The values of the DGR and $R(t)$ show upward trends with oscillations in the middle and later periods of lockdown, indicating that a large number of virus carriers, e.g., latent patients and asymptomatic carriers, are not traced and that early detection is slow. By comparison, China has a higher CCR and its IPCMs are more effective. Therefore, while the lockdown in SA has helped suppress COVID-19, the epidemic is still not fully under control and carries a high risk of rebound in the middle and later periods. The containment measures should therefore be strengthened; for example, all close contacts should be promptly traced and effectively quarantined.

The SIR model was also applied to long-term epidemic prediction in China and South Korea [2], [17], [20]. Recent data shows, however, that the predictions of the approaches in [2], [20] are not accurate. We attempt to predict the epidemic trend in SA. As shown in Section 4, however, recent data in SA reveals that the prediction accuracy of the SIR model is not satisfactory.

4. Forecasting

Long-term forecasting of COVID-19 has been well studied for many countries. Among these, the work on China has received considerable attention, including the exponential model [21], the logistic model [23] and the Gompertz model [25]. To the best of our knowledge, however, no work on the long-term forecasting of COVID-19 in SA has been reported in the literature to date.

Notably, the epidemic curves of COVID-19 in China and SA exhibit quite different features. The epidemic in China is more stable, with a convergent tendency, but the same does not hold for SA. Forecasting the epidemic trend of COVID-19 in SA is therefore more challenging. Analysis of the incidence data in SA from March 5, 2020 to August 31, 2020 shows that the DGR, which is related to the DNCs and CICs, tends to oscillate periodically. In addition, the CRR, which is the sum of the CMR and the CCR, tends to rise, while the CMR remains basically flat. The dynamics of the DGR, CRR and CMR are thus nonlinear and different from each other. The evolution algorithm [45] is well suited to learning the unknown dynamics of a nonlinear coupled system and does not need the model structure to be specified a priori, which the existing long-term forecasting methods require. It is therefore adopted to train models of the DGR, CRR and CMR for epidemic prediction in SA.


The flowchart of modelling is shown in Fig. 5 and explained as follows. Given a time series $u(i)$, $i = 1, 2, \ldots, N$, we want to build a model $f(i)$ whose prediction $\hat{u}(i) = f(i)$ is as close to $u(i)$ as possible. The mapping $f$ is taken from a set of functions with free parameters. The optimal one, $f^*$, is found by maximizing a chosen fitness function. A function set, $\Omega_f$, is chosen [46], [47] as

Ωf={+,,,/,,,sin(),cos(),tan(),exp(),log(),factorial(),(),log(),gauss(),tanh(),floor(),ceil(),round(),abs(),sinh(),cosh(),asin(),acos(),atan(),atan2,asinh(),acosh(),atanh()}.

Algorithm 1 outlines the overall modelling process. The initial generation of the population, $f^0 = \{f_1^0, f_2^0, \ldots, f_q^0\}$, is created randomly. By iteratively applying the genetic operators, i.e., selection, crossover and mutation, a series of new generations, $f^k$, $k = 1, 2, \ldots$, is produced. The optimal model, $f^*$, is obtained [48] as

$$f^* = \arg\max_{f^k} \{ r^2 \},$$

where

$$r^2 = 1 - \frac{\sum_{i=m_1}^{m_2} \left\| u(i) - f^k(i) \right\|^2}{\sum_{i=m_1}^{m_2} \left\| u(i) - \bar{u} \right\|^2},$$

where $f^k(i)$, $i \in [m_1, m_2]$, is the predicted value on the $i$th day, and $\bar{u} = \frac{1}{m_2 - m_1 + 1} \sum_{i=m_1}^{m_2} u(i)$ is the mean of $u(i)$. The outputs $u(i)$, $i = m_1, m_1 + 1, \ldots, m_2$, are used to train the above model.
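The fitness evaluation and the selection step can be sketched as follows. This is not the Eureqa procedure used in the paper: the sketch scores a small hand-written population with the standard coefficient of determination and returns the fittest member, while crossover and mutation are omitted. The candidate functions, the synthetic series and the name `select_best` are assumptions for illustration.

```python
import numpy as np

def r_squared(u, u_hat):
    """Coefficient of determination between observations u and predictions u_hat."""
    u = np.asarray(u, dtype=float)
    u_hat = np.asarray(u_hat, dtype=float)
    ss_res = np.sum((u - u_hat) ** 2)
    ss_tot = np.sum((u - u.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def select_best(population, i, u):
    """Return the candidate model with the highest r^2 on the training window."""
    scores = [r_squared(u, f(i)) for f in population]
    best = int(np.argmax(scores))
    return population[best], scores[best]

# Synthetic training series (not SA data): a decaying growth rate.
i = np.arange(1, 41, dtype=float)
u = 0.05 * np.exp(-0.03 * i)

# One generation of hypothetical candidate mappings.
population = [
    lambda i: 0.05 - 0.001 * i,
    lambda i: 0.05 * np.exp(-0.03 * i),
    lambda i: 0.03 + 0.01 * np.sin(0.3 * i),
]
best_f, best_r2 = select_best(population, i, u)
```

In a full evolutionary search this selection would be repeated every generation, with new candidates produced from the fittest ones by crossover and mutation.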

Fig. 5. Flowchart of modelling.

Now we apply the above model to the SA data. Consider the DGR with $m_1 = 119$ and $m_2 = 180$ for training. Using the Eureqa software [49], the optimal prediction model of the DGR is obtained with $r^2 = 0.997$ and

$$f_X = 0.96^{\,i-119} \times \exp\Big(-\big(1.7109718740005 + 0.915052571455619 \times \tanh(0.000400767440371075 \times (i-119)^2) - 0.0120437904816557 \times (i-119) - 0.0263440613969856 \times \sin(6.03655745282247 + 0.899692679789361 \times (i-119)) \times (1.57031206300012 + 0.0214894750987405 \times (i-119)) \times \operatorname{atan2}(i - 119.899692679789361,\ 1.7109718740005/(i-119)^{0.5})\big)^2\Big). \tag{10}$$

This model is used for prediction, i.e., $i = 181, 182, \ldots, 200$ is substituted into (10) to obtain $X(i)$. The result is depicted in Fig. 6. The CICs and DNCs generally attract more attention, because their long-term prediction reveals when the epidemic stops spreading. The CIC and DNC are obtained from the DGR, respectively, as

$$x(i) = x(i-1) \times (1 + X(i)), \quad i = 181, 182, \ldots, 420,$$
$$d(i) = x(i) - x(i-1), \quad i = 181, 182, \ldots, 420.$$

They are shown in Figs. 7 and 8, respectively. Fig. 7 shows that the CIC stops increasing on April 28, 2021, and the total number of COVID-19 cases is expected to be around 785,529. Fig. 8 shows that the peak of the DNC is expected to be 13,944, occurring on July 24, 2020.
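Rebuilding the cumulative and daily case counts from forecast growth rates is a cumulative product followed by a difference. In the sketch below, each forecast rate is taken to be the one that carries $x(i-1)$ to $x(i)$, consistent with the definition of the DGR in Section 3; the function name `cases_from_dgr` and the toy numbers are illustrative assumptions.

```python
import numpy as np

def cases_from_dgr(x_last, dgr):
    """Rebuild cumulative (CIC) and daily new (DNC) cases from growth rates.

    x_last: last observed cumulative count.
    dgr: forecast daily growth rates for the following days, in order.
    """
    dgr = np.asarray(dgr, dtype=float)
    x = x_last * np.cumprod(1.0 + dgr)           # x(i) = x(i-1) * (1 + X(i))
    d = np.diff(np.concatenate(([x_last], x)))   # d(i) = x(i) - x(i-1)
    return x, d

# Toy numbers, made up for illustration only.
cic, dnc = cases_from_dgr(1000.0, [0.10, 0.05, 0.0])
```

The cumulative count stops increasing on the first day the forecast growth rate reaches zero, which is how an end date of the spread can be read off a DGR forecast.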

Fig. 6. Modelling of daily growth rate.

Fig. 7. Forecasting of cumulative infectious cases.

Fig. 8. Forecasting of daily new cases.

Consider now the CRR. Our simulation gives $r^2 = 0.997$ and the optimal prediction model as

$$f_w = \frac{2}{\pi} \times \operatorname{atan}\Big(1.005^{\,i-7} \times \operatorname{atan2}\big(i - 7 + 3.58344312241981 \times \cos(0.180642079027816 \times (i-7)),\ 60.8845419842163 + 0.13429637454712 \times (i-7) + |i - 75.2844145431054| \times \cos(0.0523522020801796 \times (i-7) + \operatorname{atan2}(\cos(i - 6.355998362951549),\ 0.13429637454712 \times (i-7)))\big)^2\Big). \tag{11}$$

This model is used to predict $w(i)$, which is depicted in Fig. 9.

Fig. 9. Modelling of cumulative removal rate.

Consider the CMR with the same procedure as above. The optimal prediction model is found with $r^2 = 0.997$ and

$$f_Z = \frac{1}{|H(i)|},$$

where

$$H(i) = \log\Big(\cosh\big(50.0179251655876 + (0.0149857496976635 \times (i-7))^{-4.69976682036215} + 3.43657143039247 \times \sin(4.93733980670225 + 0.139102002005918 \times (i-7)) + \sin(0.300255825306365 \times (i-7)) + \sinh(\sin(\exp((3.43657143039247 \times \sin(4.93733980670225 + 0.139102002005918 \times (i-7)))^2) - 50.6215493185659 \times (i-7)))\big)\Big). \tag{12}$$

This model is used to predict $Z(i)$, which is depicted in Fig. 10. To predict when the epidemic ends, the CRC and AC are calculated into the future, respectively, by

$$W(i) = w(i) \times x(i), \quad i = 181, 182, \ldots,$$
$$I(i) = x(i) - W(i), \quad i = 181, 182, \ldots.$$

They are plotted in Fig. 11. The predicted number of ACs falls below 10,000 after August 18, 2021. We treat the rising inflection point as the point at which the curvature of the AC curve changes sign. It is predicted that the peak and the inflection point of the number of ACs will occur on November 4, 2020 and October 15, 2020, respectively. In addition, we predict the ultimate number of CMCs caused by COVID-19 in SA from the forecasted CMR and CIC. The result is 17,072.
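The peak and the rising inflection point of an active-case curve can be located numerically. The sketch below is one possible discrete reading of the definition in the text, not the authors' procedure: curvature is approximated by the second difference, and the inflection point is taken as the first day before the peak on which it stops being positive. The bell-shaped test curve and the function names are assumptions for illustration.

```python
import numpy as np

def active_cases(x, w):
    """Active cases I(i) = x(i) - W(i), with removed cases W(i) = w(i) * x(i)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    return x - w * x

def peak_and_inflection(I):
    """Indices of the peak and of the rising inflection point of a curve I."""
    I = np.asarray(I, dtype=float)
    peak = int(np.argmax(I))
    # second[k] approximates the curvature at index k + 1 (rising part only).
    second = np.diff(I[: peak + 1], n=2)
    change = np.where((second[:-1] > 0) & (second[1:] <= 0))[0]
    inflection = int(change[-1] + 2) if change.size else None
    return peak, inflection

# Synthetic bell-shaped curve (not SA data): peak at day 50, inflection near day 40.
days = np.arange(101)
I_synth = 1e5 * np.exp(-((days - 50.0) ** 2) / (2 * 10.0 ** 2))
peak_day, inflection_day = peak_and_inflection(I_synth)
```

On noisy forecasts the second difference can change sign several times, so smoothing the curve first may be needed before this rule gives a stable date.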

Fig. 10. Forecasting of cumulative mortality rate.

Fig. 11. Forecasting of cumulative removal cases and active cases.

We compare our models with popular ones in the literature, namely the SIR model [40] and the logistic growth model [23]. Both predict CICs and ACs, which are also obtained from our models. The root mean square error, $R$, and the coefficient of determination, $r^2$, are used to assess the prediction accuracy. Specifically, $R_1$ and $r_1^2$ correspond to the number of CICs, and $R_2$ and $r_2^2$ to the number of ACs. The results are provided in Table 3. The predicted epidemic curves of CICs and ACs are plotted in Figs. 12 and 13, respectively. Our approach has a lower $R$ and a higher $r^2$ than the other two models, indicating a higher prediction accuracy.
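The two accuracy measures can be computed as in the sketch below, which also illustrates why $r^2$ can be strongly negative: whenever a prediction is worse than simply predicting the mean of the observations, the residual sum of squares exceeds the total sum of squares. The numbers are made up for illustration and are not the SA data.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean square error between observed and predicted values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def r2_score(actual, predicted):
    """Coefficient of determination; negative when worse than the mean predictor."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Toy observations and two hypothetical forecasts.
actual = np.array([100.0, 110.0, 120.0, 130.0])
close_forecast = actual + np.array([1.0, -1.0, 1.0, -1.0])
poor_forecast = 3.0 * actual

scores = {
    "close": (rmse(actual, close_forecast), r2_score(actual, close_forecast)),
    "poor": (rmse(actual, poor_forecast), r2_score(actual, poor_forecast)),
}
```

A forecast that overshoots the observations by a large factor therefore produces both a large root mean square error and a large negative $r^2$.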

Table 3. Prediction on SA case.

| Model | $R_1$ | $r_1^2$ | $R_2$ | $r_2^2$ |
|---|---|---|---|---|
| SIR model | 168605.7152 | −310.0851 | 103922.2234 | −954.8337 |
| Logistic model | 5446.8394 | 0.6753 | 71873.2667 | −456.1937 |
| Evolution model | 1381.9596 | 0.9791 | 3337.2853 | 0.0143 |

Fig. 12. Prediction of cumulative infectious cases.

Fig. 13. Prediction of active cases.

5. Conclusions

In this paper, a Susceptible–Infectious–Recovered model is adopted to analyse the epidemic dynamics. The model parameters are estimated over different phases with the SA data. They indicate variations in the transmissibility of COVID-19 across phases and thus reveal weaknesses of the past IPCMs in SA. Furthermore, a novel model is developed to forecast the long-term epidemic trend of COVID-19 in SA. The model class is wide, and the evolution algorithm learns the optimal model iteratively from a random initial model, which does not require prior knowledge of the underlying system or of the data properties. The trained model shows that: (1) the peak and the inflection point would occur on November 4, 2020 and October 15, 2020, respectively; (2) the epidemic would be basically under control by April 28, 2021; and (3) the ultimate number of COVID-19 cases could be 785,529, of whom 17,072 would lose their lives. The experimental results on the historical incidence data of SA illustrate the effectiveness of our approach, and the comparative results show that it achieves a higher prediction accuracy than the other methods.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work was supported in part by the National Research Foundation of South Africa under Grant Nos. 113340, 120106, in part by the Financial Support of UIC Start-up Fund, China under Grant No. R72021115, in part by the Natural Science Foundation of the Higher Education Institutions of Jiangsu Province of China under Grant No. 19KJB520019, in part by the Financial Support of Changshu Institute of Technology Start-up Fund, China under Grant No. XZ1734, in part by Natural Science Foundation of Jiangsu Province, China under Grant Nos. BK20181033, BK20191029, and in part by National Natural Science Foundation of China under Grant Nos. 61901062, 61903050, 62003057.


Articles from ISA Transactions are provided here courtesy of Elsevier
