Abstract
Background and aims
COVID-19, which started as an epidemic in China in November 2019, was first reported to the WHO in December 2019. It had spread to almost all countries by March 2020. The pandemic severely affected health and the economy worldwide, prompting countries to take drastic measures to combat the virus. This study analyzes different governments' responses to the pandemic to gain insights into how best to fight the Coronavirus.
Methodology
Various data analysis operations like clustering and bivariate analysis were carried out using Python, Pandas, Scikit-Learn, and Matplotlib to clean up, consolidate, and visualize data. Insights were drawn from the analysis conducted.
Results
We identified that the mortality rate/case fatality rate is directly proportional to the percentage of elderly (people above 65 years of age) for the top thirty countries by cases. Countries in Western Europe showed the highest mortality rates, whereas countries in South Asia and the Middle East showed the lowest mortality rate (controlling for all other variables).
Conclusion
Lockdowns are effective in curbing the spread of the virus. More testing was associated with less spread of the virus and better control. In most regions, countries that conducted a large number of tests also seemed to have lower mortality rates.
Keywords: Case fatality rate, COVID-19, Lockdowns, Mortality rate
Highlights
- Presents the COVID-19 pandemic in a country and government response.
- Identified percentage of elderly as a significant factor in the mortality rate in a country.
- Established the spread (cases) and severity (percentage mortality) of the pandemic.
- Presented a case for implementing standardized testing and data reporting protocols.
1. Introduction
In December 2019, doctors in the Hubei province of China started reporting cases of a new type of viral disease that they found hard to treat. The disease seemed to have originated in the city of Wuhan. Within a few weeks, it spread like wildfire across many provinces in China, prompting the authorities to impose a complete shutdown of economic, commercial, social, and cultural activities to quell the spread of the virus [1]. Within 1–2 months, the virus had reached Europe and America, spreading rapidly with the help of air travel and, in Europe, the unrestricted movement of people across open borders. By mid-March, the virus had reached almost all of the UN member states and was declared a pandemic. The countries most affected by it enacted some form of lockdown to stop its spread. In the early stages, countries enforced social distancing to varying degrees to keep the growth in cases from becoming exponential. However, thus far, only a handful of countries have had success in curbing the virus. As of June 1, 2020, 6.22 million cumulative cases had been reported globally, with deaths and recoveries numbering 370 thousand and 2.7 million, respectively.
This study aims to establish a basis for the causal relationship between the severity of the pandemic in a country and the government's handling of it. If such a relationship can be established, it can give governments vital insights for enacting effective policy and legislation against the pandemic.
2. Scope of the study
This study will focus on drawing insights from publicly available data and statistics on the Coronavirus. The data that will be considered for the study are aggregated COVID-19 patient statistics like daily cases, deaths, recoveries, testing data, etc. Python has been used for carrying out the analysis. The study's depth will be limited to some exploratory data analysis, data analysis for correlation and cause-and-effect relationships, bivariate analysis, data visualization, and some k-Means clustering. Data from the worst affected countries (by cases) are considered in this analysis.
3. Methodology
The data used in the analysis were obtained from publicly available, government-reported statistics on COVID-19 patients in each country. The primary dataset is sourced from the Johns Hopkins University Center for Systems Science and Engineering (CSSE). The dataset consists of aggregated data from various sources such as the WHO, the European Centre for Disease Prevention and Control, the United States CDC, and various other governmental and non-governmental organizations.
The raw data consisted of total cases broken down to city and district level for every country. For the analysis, the cases were grouped by country.
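The grouping step can be sketched with Pandas as follows. This is a minimal illustration, not the authors' code: the column names and all numbers are assumptions made up for the example.

```python
import pandas as pd

# Hypothetical rows in the shape of a sub-national case table. The column
# names and the numbers are illustrative assumptions, not the study's data.
raw = pd.DataFrame(
    {
        "Country/Region": ["Canada", "Canada", "China", "China", "Germany"],
        "Province/State": ["Ontario", "Quebec", "Hubei", "Beijing", None],
        "Confirmed": [100, 150, 500, 20, 300],
        "Deaths": [5, 10, 25, 1, 6],
    }
)

# Sum the sub-national rows so that each country appears exactly once.
by_country = (
    raw.groupby("Country/Region", as_index=False)[["Confirmed", "Deaths"]].sum()
)
print(by_country)
```

After this step every country is a single row of totals, which is the form the later per-country comparisons need.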
Link to the data source (GitHub): https://github.com/CSSEGISandData/COVID-19.
Testing data was collected from ourworldindata.org.
Link to the data source (ourworldindata.org): https://ourworldindata.org/grapher/number-of-covid-19-tests-per-confirmed-case.
The analysis is based on data up to June 1, 2020.
Statistical analyses were carried out on the data using well-documented and widely practised methods. The software used is ubiquitous, open-source, and widely trusted for its accuracy and mode of operation. The sample size was considered large enough for the data to represent the population, and as such, generalizations about the whole population were made from the insights obtained by analyzing the data.
Countries were separated into groups based on the percentage of their elderly population, and the COVID-19 mortality rate (total deaths/total cases).
The method used to divide the countries into groups is k-Means clustering, based on the Elkan algorithm [2]. The k-Means method aims to reduce intra-cluster variance while maximizing inter-cluster variance.
Feature scaling was not needed, as both features were on a similar scale from the start, and both were given equal priority in the clustering. The number of clusters and the number of dimensions/features were small enough that computational cost and complexity did not need special consideration. The initial centroids were selected at random from the dataset.
Five clusters were chosen because they seemed to stratify the data in the most convenient and explanatory manner, without sacrificing too much in terms of the sum of the squared distances within all the clusters (the loss/error function).
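The clustering setup described above can be sketched with scikit-learn as follows. The `algorithm="elkan"` and `init="random"` arguments mirror the text; the data points, the range of k values tried, and the `n_init` and `random_state` settings are assumptions chosen for illustration, not the study's configuration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Illustrative (percentage elderly, percentage mortality) pairs. These are
# made-up values, not the figures used in the study.
features = np.array(
    [
        [1.0, 0.5], [1.5, 0.8], [2.0, 0.6],
        [6.0, 2.5], [6.5, 3.0], [7.0, 2.0],
        [9.0, 6.0], [9.5, 5.5], [10.0, 6.5],
        [15.0, 1.5], [15.5, 2.0], [16.0, 1.0],
        [19.0, 12.0], [20.0, 13.0], [21.0, 14.0],
    ]
)

# Record the within-cluster sum of squared distances (inertia) for several
# values of k, so the trade-off between k and the loss can be inspected.
inertia_by_k = {}
for k in range(2, 8):
    model = KMeans(n_clusters=k, init="random", n_init=10,
                   algorithm="elkan", random_state=0)
    model.fit(features)
    inertia_by_k[k] = model.inertia_

# Fit the five-cluster model and read off one cluster label per row.
final = KMeans(n_clusters=5, init="random", n_init=10,
               algorithm="elkan", random_state=0).fit(features)
labels = final.labels_
print(inertia_by_k)
print(labels)
```

Plotting `inertia_by_k` against k is one common way to judge how much loss is given up by choosing a smaller, more interpretable number of clusters.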
The analysis was carried out on the Jupyter Notebooks platform, using the Python programming language.
Python libraries used:
- Pandas - Used for storing data; Pandas is a data storage, handling, and manipulation package used with Python.
- Matplotlib - Charts and graphs were generated using the pyplot library from Matplotlib, a popular data visualization package for Python.
- Scikit-Learn - K-Means clustering was done with the help of the sklearn.cluster module from scikit-learn, a popular machine learning and data preprocessing package used with Python.
Link to the code for this analysis: https://github.com/aymanimtyaz/COVID-19.
4. Analysis and observations
Government response to the Coronavirus pandemic can be divided into two parts:
- Efforts in curtailing the spread of the virus (i.e., flattening the curve)
- Efforts in the handling and treatment of COVID-19 positive patients
One metric to gauge the efficacy of government response in handling COVID-19 positive patients is the mortality rate/case fatality rate. The mortality rate is the total number of deaths attributed to the virus divided by the total number of COVID-19 positive cases. Analysis of early cases in China led to the observation that the virus poses a greater danger to the elderly and to people with underlying comorbidities such as hypertension, diabetes, and heart disease [1,3]. This trend continued as the virus spread around the world.
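As a quick worked example of this definition, the sketch below applies it to the global totals quoted in the introduction. The function name is a hypothetical helper introduced here for illustration.

```python
def case_fatality_rate(total_deaths: int, total_cases: int) -> float:
    """Return deaths as a percentage of confirmed cases."""
    if total_cases <= 0:
        raise ValueError("total_cases must be positive")
    return 100.0 * total_deaths / total_cases


# Global totals quoted in the introduction (as of June 1, 2020).
global_cfr = case_fatality_rate(370_000, 6_220_000)
print(f"{global_cfr:.2f}%")  # prints 5.95%
```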
4.1. Analysis of each group
It has been established that the Coronavirus poses an enormous amount of danger to the elderly. More than 50% of the deaths attributed to the virus are in elderly patients, considering a worldwide average. As we can see from the chart below, there seems to be a linear relationship between the mortality rate and the percentage of older people (people above the age of 65) in a country.
The variance in the chart may be attributed to other factors, such as the handling of COVID-19 positive patients; methods of data collection and reporting; other population demographics like genetic makeup; trends in disease, disabilities, and malnutrition; the competency, scale, and accessibility of the country's medical apparatus; and the economic status of the country (GDP, PPP, poverty levels, etc.).
The countries on the chart have been clustered into five groups using the k-Means clustering algorithm (See Fig. 1). We examine each group below.
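One way to make the linear trend and the remaining variance concrete is an ordinary least-squares line with per-country residuals. The sketch below uses made-up data points and a hypothetical helper, `fit_line`; it is not necessarily how the chart's trend was produced.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative values only (percentage elderly, percentage mortality).
pct_elderly = [2.0, 5.0, 8.0, 12.0, 16.0, 20.0]
pct_mortality = [0.6, 2.1, 3.9, 5.8, 8.4, 9.6]

slope, intercept = fit_line(pct_elderly, pct_mortality)

# A residual is how far a country sits above (+) or below (-) the trend line.
residuals = [
    y - (slope * x + intercept) for x, y in zip(pct_elderly, pct_mortality)
]
print(slope, intercept)
print(residuals)
```

A country with a large negative residual has a lower mortality rate than its share of elderly would suggest, which is the sense in which some countries are later described as doing well relative to their age demographics.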
4.2. Group 1
The cluster towards the upper right-hand corner of the chart contains the countries with the highest percentage mortality and, apart from the cluster containing Germany and Portugal, the highest percentage of older people. The prime reason for the high mortality is evident from the chart itself: a larger share of older people in these countries. All these countries belong to Western Europe, except Sweden. Sweden's mortality rate has, however, been raised by other factors, which are discussed below (See Fig. 2).
More than half of Belgium's COVID-19 deaths occurred in care homes for older people. Belgium ranks third in Europe for the number of people in old-age homes per 1000. This, coupled with Belgium having among the highest percentages of people above 65, may have increased the rate.
One point to note is how Belgium counts deaths due to the virus. A significant percentage of the people counted among the deaths had not tested positive for the virus. Almost all of these people resided in old-age homes. The justification given for counting them is that if there is even one confirmed case in an old-age home, and a significant number of residents die with similar symptoms shortly after that case is diagnosed, there is a high probability that those people also died due to the virus. This method may have resulted in a small number of false positives, which may have wrongly increased the rate.
Unlike other countries in the top cases list, Sweden did not implement a lockdown. It merely encouraged its citizens to stay indoors. Public places like restaurants, bars, businesses, schools, and universities were allowed to remain open. This may have contributed to the increased mortality rate; however, how much the decision against a lockdown influenced the mortality rate remains to be seen.
Apart from the decision not to implement a lockdown, a large proportion of Sweden's elderly also reside in nursing homes, just as in Belgium. Unlike Belgium, however, Sweden only attributes deaths to the virus after a positive test has been confirmed.
4.3. Group 2
This cluster consists of the United States, Canada, Switzerland, Germany, and Portugal (See Fig. 3).
In this group, Germany and Portugal seem to be doing very well relative to the percentage of elderly in their populations.
Up until late April, Germany had a case fatality rate of 1%–2%. This has been attributed to the amount of testing Germany had been carrying out. Unlike other European countries with similar age demographics, Germany was testing at a much higher rate and was even testing young people with mild symptoms. The number of cases is directly proportional to the testing level, and as these two figures increased, the mortality rate started to drop.
Germans also have a large amount of trust in their government, which, throughout the pandemic, has maintained a very high level of transparency and communication with the public, giving daily updates. As such, social distancing norms set by the government were rarely broken by the German public.
Portugal's low mortality rate is credited to its early response, which began well before the pandemic spiralled out of control. Portugal declared a state of medical emergency when it had a few hundred cases, whereas Spain declared an emergency when growth had already gone exponential and it had around 6000 cases.
Portugal is also unique because, unlike other European countries, it has only one land neighbour through which inter-country road-based transmission of the disease was possible. It had also managed to confine more than 90% of the cases to two of its cities, Lisbon and Porto. People with mild symptoms were instructed to stay at home, while serious cases were admitted to hospitals.
The United States currently leads the charts in cases and deaths by a wide margin. Lockdowns were imposed at varying levels across the country, and different states have handled the pandemic differently. The situation has also been highly politicized, with different media outlets putting a different spin on it. Much misinformation is being spread, resulting in sections of the public flouting social distancing norms.
4.4. Group 3
This group consists of the Latin American countries of Peru, Brazil, Ecuador, and Mexico, along with Iran, China, and Turkey. The clustering algorithm seems to have clustered these countries together based on the percentage of elderly, as the intra-cluster variance in mortality rate is very high (See Fig. 4).
Mexico warrants a little discussion here. It has an unusually high case fatality rate of 11%. One of the prime reasons for this is that Mexico has one of the lowest testing rates globally, at around 1.5 tests per confirmed case. This means that it is not testing enough: the WHO recommends a testing rate of 25–30 tests per confirmed case [4] for most countries. With so little testing, many infections go unconfirmed, so the mortality rate gets inflated. Low testing rates may also result in improper handling of the spread of the virus [5,6]. The more people are tested, the more positive cases are isolated, and the fewer untested COVID-19 positive people remain to spread the virus.
4.5. Group 4
The cluster towards the lower-left corner consists of countries that show a low mortality rate and have the lowest percentage of elderly among all the countries (See Fig. 5). The cluster can be further divided into three groups. The first group consists of Saudi Arabia, Qatar, and the United Arab Emirates, three of the foremost GCC countries. These countries have large immigrant worker populations that mostly consist of young males who reside in large dormitories, much like Singapore. These countries also have substantial financial resources that they can use to treat COVID-19 positive patients with a high standard of care. As such, they all report mortality rates of less than 1%.
The second group consists of the countries in the Indian subcontinent: India, Pakistan, and Bangladesh. They had mortality rates between 1% and 3% in the early stages of the epidemic. These countries implemented some of the strictest lockdowns in the world. India and Bangladesh have only just lifted their lockdowns. The lockdowns have had a definite impact on curtailing the spread of cases, as both India and Bangladesh have seen record daily increases in new cases after the lockdowns were lifted.
The only remaining country in this group, South Africa, also implemented a lockdown a few weeks after detecting its first case. Much like other countries in Africa, South Africa has a large proportion of young people, which may help lower the mortality rate. However, health conditions like obesity and hypertension are prevalent there. According to some statistics, over half of all South Africans are considered to be overweight. The effect of these comorbidities seems to be reflected in the age distribution of deaths: two-thirds of deaths due to the virus are in people below 65.
4.6. Group 5
This group consists of Singapore, Chile, Russia, and Belarus. Relative to the percentage of elderly in these countries, deaths seem low. If we follow the graph's linear trend, these countries should have a mortality rate of 4%–6%.
Singapore has among the lowest COVID-19 mortality rates of the worst-affected countries by cases. This can be attributed to the fact that over 90% of the confirmed cases are young migrant workers living in large, tightly packed dormitories where the probability of transmission is high. An overwhelming majority of these workers show no symptoms or only very mild ones.
Singapore is also one of the world's wealthiest countries, and as such, it can allocate substantial resources towards combating the virus (See Fig. 6).
Russia has the third-highest number of cases globally; however, it reports one of the lowest mortality rates. The Russian government attributes the low mortality rate to the late emergence of the virus compared to Europe and North America, which gave it time to set up the infrastructure to handle the virus and a precedent for what to do and what not to do. Russia also has a high per capita testing rate. Russia, Belarus, and Chile have been accused of manipulating statistics related to the pandemic.
5. Major findings
The significant findings of this study are:
- European countries were found to have the highest case fatality rates, possibly because of age demography and comorbidity
- Variance in the chart can be explained as being a result of government response to the pandemic
- Countries like Germany, Portugal, and Singapore seem to have implemented reasonable measures against the virus, as their mortality rates are lower than in other countries with similar age demographics
- Countries like Mexico and Brazil need to increase their testing rate in terms of both per capita testing and the number of tests per positive case (See Fig. 7, Fig. 8)
6. Inference
This analysis makes the following major inferences:
- A relationship exists between the case fatality rate and the percentage of elderly in a country
- A high testing rate (tests per capita) and a rate of 20–30 tests per confirmed case help reduce the virus's spread and either reduce the case fatality rate or give a more accurate value of it
- Standardized testing and data collection protocols are needed across the globe to ensure that the data used in these kinds of analyses is reliable
7. Conclusion
This research draws the following conclusions. Government response to the pandemic can affect the pandemic's severity in a country. Steps like the enforcement of lockdowns and social distancing norms effectively curb the virus's spread, as we have seen in countries like India. Smaller countries with less dispersed population centres and good travel infrastructure provide a medium through which the virus can spread rapidly (see: Europe). Social distancing norms and lockdowns would have to be enforced more stringently to bring about any meaningful containment of the virus. Testing is of paramount importance when it comes to combating the virus, as it is through testing that statistics related to the pandemic are obtained. Keeping this in mind, governments should allocate more resources towards testing. The effect of other factors that may be related to the pandemic should also be explored.
8. Limitations and future research implications
8.1. Limitations
This study has not covered any kind of forecasting. The topic of the study (the COVID-19 pandemic) is an evolving situation, and the insights drawn from this study may not apply down the road. The data considered is publicly available, government-reported patient data from January 20, 2020 to June 1, 2020.
8.2. Drawbacks of conducting a bivariate analysis for a complex problem like COVID-19
A problem like COVID-19 cannot be modelled accurately in a bivariate system. Such a complex problem is almost certainly dependent on a host of other factors, apart from the percentage of elderly in a country.
8.3. No international standards for data collection, aggregation, and reporting
Every country follows its own protocols for reporting data and statistics related to the COVID-19 pandemic. Furthermore, the protocols may differ between subdivisions within a country: different states, districts, jurisdictions, etc. may have different methods of counting. The problem is exacerbated in developing countries where there are no protocols for data counting and reporting at all. This lack of consistency in reporting protocols may result in inaccurate data, which in turn makes the analyses that use that data inaccurate.
8.4. Manipulated data
Russia, Belarus, Chile, China, etc. have been accused of manipulating their testing and patient data.
8.5. Future scope
Other factors that can affect the mortality rate, apart from the percentage of elderly, are:
- Genetic makeup
- Trends in disease, disabilities, and malnutrition
- Competency, scale, and accessibility of the country's medical apparatus
- Vaccination history
- The economic status of the country (GDP, PPP, poverty levels, etc.)
These factors should be taken into consideration in future analyses.
A more in-depth study of the effects of lockdowns has to be done. Lockdowns are harder to implement for poorer countries, as they put strain on the economy. Studies must be done to determine whether cycles of lockdowns are a viable option in fighting the COVID-19 pandemic and, if so, how to time and size the different cycles of lockdown.
Declaration of competing interest
None.
Contributor Information
Ayman Imtyaz, Email: aymanimtyaz@gmail.com.
Abid Haleem, Email: haleem.abid@gmail.com.
Mohd Javaid, Email: mjavaid@jmi.ac.in.
References
- 1. Wang L., He W., Yu X. Coronavirus Disease 2019 in elderly patients: characteristics and prognostic factors based on 4-week follow-up. J Infect. 2020. doi: 10.1016/j.jinf.2020.03.019.
- 2. Elkan C. Using the triangle inequality to accelerate k-means. International Conference on Machine Learning (ICML) (ICML03-022). AAAI Press. 2003.
- 3. Hamade Mohammad Ali. COVID-19: How to Fight Disease Outbreaks with Data. World Economic Forum COVID Action Platform. https://www.weforum.org/agenda/2020/03/covid-19-how-to-fight-disease-outbreaks-with-data/
- 4. WHO. COVID-19 virtual Press Conference. 30 March 2020. Recommended testing rates. https://www.who.int/docs/default-source/coronaviruse/transcripts/who-audio-emergencies-coronavirus-press-conference-full-30mar2020.pdf
- 5. Gupta R., Misra A. Contentious issues and evolving concepts in the clinical presentation and management of patients with COVID-19 infection with reference to use of therapeutic and other drugs used in Co-morbid diseases (hypertension, diabetes, etc.). Diabetes & Metabolic Syndrome: Clin Res Rev. 2020;14(3):251–254. doi: 10.1016/j.dsx.2020.03.012.
- 6. Gupta R., Ghosh A., Singh A.K., Misra A. Clinical considerations for patients with diabetes in times of COVID-19 epidemic. Diabetes & Metabolic Syndrome: Clin Res Rev. 2020;14(3):211–212. doi: 10.1016/j.dsx.2020.03.002.