2021 Jun 7;425:132968. doi: 10.1016/j.physd.2021.132968

Trends in COVID-19 prevalence and mortality: A year in review

Nick James a,⁎,1, Max Menzies b,1
PMCID: PMC8183049  PMID: 34121785

Abstract

This paper introduces new methods to study the changing dynamics of COVID-19 cases and deaths among the 50 worst-affected countries throughout 2020. First, we analyse the trajectories and turning points of rolling mortality rates to understand at which times the disease was most lethal. We demonstrate five characteristic classes of mortality rate trajectories and determine structural similarity in mortality trends over time. Next, we introduce a class of virulence matrices to study the evolution of COVID-19 cases and deaths on a global scale. Finally, we introduce three-way inconsistency analysis to determine anomalous countries with respect to three attributes: countries’ COVID-19 cases, deaths and human development indices. We demonstrate the most anomalous countries across these three measures are Pakistan, the United States and the United Arab Emirates.

Keywords: COVID-19, Time series analysis, Population dynamics, Nonlinear dynamics, Epidemiology

1. Introduction

2020 will be remembered as the year that the world first battled the COVID-19 pandemic. Almost 2 million people lost their lives, substantial restrictions on population movement and activities were imposed, and almost every country experienced an economic recession. During that year, treatments improved substantially [1], [2], [3], [4] and several vaccines were produced by the end of the year [5], [6]. However, the disease remains highly prevalent around the world as of the start of 2021, and measures to contain and reduce its transmission remain highly relevant for the reduction of casualties as well as economic and other social consequences [7], [8].

Throughout the year, government responses to the pandemic varied substantially, both over time and between countries. Early government responses included banning travel [9], the implementation of testing and contact tracing programmes [10], and lockdowns. Due to the economic consequences of lockdowns, many countries implemented them too late [11], [12] and lifted restrictions before cases had sufficiently reduced [13]. Such disparate responses to the virus led to great variability in case and death counts, creating different waves of the outbreak across many countries. Such later waves often exhibited higher case and death counts than the first [14], [15].

The response of the scientific community to COVID-19 has also been varied and multifaceted, producing research from many perspectives and disciplines. In addition to the aforementioned medical research [1], [2], [3], [4], [5], [6], mathematical approaches to model and analyse the virus and its impact have been broad. First, models based on existing statistical techniques, such as the Susceptible–Infected–Recovered (SIR) model and the basic reproductive ratio R0, have been proposed and systematically collated by researchers [16], [17]. These have been used for various purposes, including diagnosis and prognosis of COVID-19 patients, efficacy of medications, and vaccine development. Next, nonlinear dynamics researchers have proposed several sophisticated extensions to the classical predictive SIR model, including analytic techniques to find explicit solutions [18], [19], modifications to the SIR model with additional variables [20], [21], [22], [23], [24], incorporation of Hamiltonian dynamics [25] or network models [26] and a closer analysis of uncertainty in the SIR approach [27]. Other mathematical approaches to prediction and analysis include power-law models [28], [29], [30], distance analysis [31], [32], network models [33], [34], [35], [36], analyses of the dynamics of transmission and contact [37], [38], forecasting models [39], Bayesian methods [40], clustering [41], [42] and many others [43], [44], [45], [46].

We have a different motivation and approach relative to the aforementioned work. Rather than performing predictions on an individual country basis or comparing parameters among different countries (such as R0 or power-law exponents), we seek to reveal structural similarity in COVID-19 case, death and mortality time series across many countries of the world. Rather than predicting the future, which is always challenging due to unpredictable changes in government policy, we aim to be descriptive, revealing similarity and anomalous countries in outcomes. Indeed, close analysis of the case, death and mortality dynamics on a country-by-country basis is necessary to inform governments of the most successful strategies for reducing transmission of cases and progression to deaths. Identifying structural similarities between countries’ trajectories can support conclusions that certain government responses will likely result in better or worse outcomes. Moreover, identifying anomalous countries can provide insights on which responses to the pandemic were exceptionally good or poor. For this purpose, we present three sections, each of which contributes a new mathematical method for analysing the world’s COVID-19 cases, deaths and mortality, or any multivariate time series more generally.

This paper is therefore structured as follows. In Section 2, we analyse the trajectories of mortality rates on a country-by-country basis. In particular, we build upon a recently introduced algorithmic framework to identify the turning points of the mortality trajectories, which reveal when the disease was most and least lethal (with respect to the progression from cases to deaths). We then use a new semi-metric between finite sets to assign countries into classes of mortality trajectories. We believe this is the first work to classify different mortality trajectories among countries, rather than a more traditional comparison of overall mortality rates without considering its changing dynamics over time. In Section 3, we analyse the eigenspectra of virulence matrices as a new means of understanding trends in the worldwide prevalence and mortality of COVID-19. This reveals periods in which COVID-19 was most severe and most heterogeneous between countries. In Section 4, we compare countries’ case and death counts with their human development index (HDI) and use a new method to identify the most anomalous countries between these attributes. In Section 5, we discuss our findings from the aforementioned analyses regarding COVID-19 trends throughout the year 2020. We conclude in Section 6.

2. Mortality rate analysis

In this section, we study the dynamics of the COVID-19 mortality rate among $n=50$ countries. Our data spans 01/01/2020 to 31/12/2020, a period of $T=366$ days. We choose the countries with the 50 greatest total case counts of COVID-19 as of 31/12/2020, order these alphabetically, and index them $i=1,\dots,n$. Let $x_i(t), y_i(t)\in\mathbb{R}$ be the multivariate time series of new daily cases and deaths, respectively, for $i=1,\dots,n$ and $t=1,\dots,T$. Throughout this paper, the subscript $i$ pertains to the $i$th country, ordered alphabetically, while evaluating a function at $t$ gives its value at the $t$th day of the year. For a given country, let $r_i(t)$ be its 30-day rolling mortality rate, defined by

$$r_i(t)=\frac{\sum_{s=t-29}^{t} y_i(s)}{\sum_{s=t-29}^{t} x_i(s)},\quad t=30,\dots,T, \tag{1}$$

or zero if no cases have been observed over the last 30 days. This gives a multivariate time series $r_i(t)$, for $i=1,\dots,n$ and $t=30,\dots,T$. The data point at time $t$ describes the rolling mortality rate over the prior 30 days.
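As a concrete illustration of Eq. (1), the following is a minimal sketch of the rolling mortality rate for a single country. The function name `rolling_mortality` and the list-based data layout are assumptions made for this example, not part of the original analysis; mapping a negative window total to zero is likewise an assumption, since the text only specifies the case of no observed cases.

```python
def rolling_mortality(cases, deaths, window=30):
    """Rolling mortality rate of Eq. (1) for one country.

    cases, deaths: lists of new daily counts, where index 0 is day t = 1.
    Returns one value per day t = window, ..., T, so the first entry
    corresponds to day `window`.
    """
    if len(cases) != len(deaths):
        raise ValueError("cases and deaths must have equal length")
    rates = []
    for end in range(window, len(cases) + 1):
        total_cases = sum(cases[end - window:end])
        total_deaths = sum(deaths[end - window:end])
        # Convention from the text: zero if no cases were observed in the
        # window. Treating a negative total the same way is an assumption.
        rates.append(total_deaths / total_cases if total_cases > 0 else 0.0)
    return rates


# Toy data: 10 cases and 1 death every day for 40 days.
toy_rates = rolling_mortality([10] * 40, [1] * 40)
print(len(toy_rates), toy_rates[0])
```

With 40 days of data and a 30-day window, the sketch returns 11 values, one for each of days 30 to 40.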

2.1. Methodology

The aim of this section is to study these mortality trends on a country-by-country basis and identify structural similarity across different countries. For this purpose, we use two (semi)-metrics between the mortality rate time series and apply hierarchical clustering [47], [48] to these measures. Hierarchical clustering has been used in several epidemiological applications, including inflammatory diseases [49], airborne diseases [50], Alzheimer’s disease [51], Ebola [52], SARS [53], and COVID-19 [41].

These mortality rates $r_i(t)$ exhibit highly undulating behaviour, moving between clear peaks and troughs (turning points). Our first semi-metric measures distance between algorithmically-identified turning points as a proxy for each time series’ behaviour. We modify an existing algorithmic framework for this purpose. First, we apply a Savitzky–Golay filter to produce a collection of smoothed time series $\hat{r}_i(t)$, $t=30,\dots,T$ and $i=1,\dots,n$. A beneficial effect of this smoothing is to ameliorate some of the noise present in the COVID-19 case count data. In addition to the smoothing procedure, our computation includes a rolling mortality rate that further reduces the effect of perturbations in the data’s underlying signal. We choose a 30-day rolling mortality rate for two reasons: first, this window length provides a compromise between denoising the data and not over-smoothing; second, 30 days provides a good look at the mortality rate behaviour of the last month of data. Next, we follow [15] and apply a two-step algorithm where we select and then refine a set of turning points. We assign each smoothed mortality rate time series non-empty sets $P_i$ and $T_i$ of local maxima (peaks) and local minima (troughs). To better suit our specific application, we modify the second step of this algorithm, in which the turning point list is refined. Full details are included in the Appendix, including a discussion of the procedure’s robustness against noisy data. We display 12 countries’ mortality rate time series and annotate their turning points in Fig. 1.
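To make the idea of smoothing followed by turning point selection concrete, here is a deliberately naive sketch. It is not the two-step algorithm of [15] or its modification described in the Appendix: the centred moving average is a simple stand-in for the Savitzky–Golay filter, only strict interior local extrema are selected, and no refinement step is applied. The function names are hypothetical.

```python
def moving_average(series, half_width=3):
    """Centred moving average; a stand-in for Savitzky-Golay smoothing.

    Near the boundaries the window is truncated to the available points.
    """
    smoothed = []
    for t in range(len(series)):
        lo = max(0, t - half_width)
        hi = min(len(series), t + half_width + 1)
        smoothed.append(sum(series[lo:hi]) / (hi - lo))
    return smoothed


def local_turning_points(series):
    """Return (peaks, troughs): indices of strict interior local extrema.

    This corresponds only to a crude 'select' step; a real pipeline would
    then refine the list, for example by discarding insignificant extrema.
    """
    peaks, troughs = [], []
    for t in range(1, len(series) - 1):
        if series[t - 1] < series[t] > series[t + 1]:
            peaks.append(t)
        elif series[t - 1] > series[t] < series[t + 1]:
            troughs.append(t)
    return peaks, troughs


# Toy series with two peaks and one trough between them.
peaks, troughs = local_turning_points([0, 1, 2, 1, 0, 1, 2, 3, 2])
print(peaks, troughs)
```

Without the refinement step, noisy data would typically produce many spurious extrema, which is why the selection is followed by refinement in the framework described above.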

Fig. 1.

Smoothed mortality rate time series and identified turning points for various countries: (a) Brazil (b) India (c) Mexico (d) the US (e) the Netherlands (f) Sweden (g) France (h) Germany (i) Italy (j) Russia (k) Ecuador and (l) Bulgaria. Green and red vertical lines denote algorithmically detected troughs and peaks, respectively. The rolling mortality rate at a given time calculates the mortality over the previous 30 days. The aforementioned countries represent at least one member of every characteristic class of trajectories.

To quantify distance between time series’ turning points, we modify the semi-metric of [32] (with $p=1$). Given two non-empty finite sets $S_1,S_2\subseteq\{1,2,\dots,T\}$, this is defined as

$$D(S_1,S_2)=\frac{1}{2T}\left(\frac{\sum_{b\in S_2} d(b,S_1)}{\#S_2}+\frac{\sum_{a\in S_1} d(a,S_2)}{\#S_1}\right), \tag{2}$$

where $d(b,S_1)$ is the minimal distance from $b\in S_2$ to the set $S_1$, and $\#S_1$ is the cardinality of $S_1$, and analogously for $S_2$. By the choice of normalisation, this is always bounded between 0 and 1. To more appropriately separate different behaviours among mortality trends, we modify this semi-metric by including a regularisation term. This treatment is inspired by various regularisation penalties in the statistical literature [54]. We construct our semi-metric as follows:

$$D_\beta(S_1,S_2)=D(S_1,S_2)+\beta\,|\#S_1-\#S_2|, \tag{3}$$

where $0<\beta\le 1$ is a constant. The resulting values $D_\beta(S_1,S_2)$ are symmetric, non-negative, and zero if and only if $S_1=S_2$. Then, we define the $n\times n$ matrix $D^{TP}$ between turning point sets by

$$D^{TP}_{ij}=D_\beta(P_i,P_j)+D_\beta(T_i,T_j). \tag{4}$$

Here, $P_i$ denotes the set of peaks for the $i$th country’s mortality series, while the subscript $ij$ gives a distance between countries $i$ and $j$, ordered alphabetically. In Fig. 2, we perform hierarchical clustering on $D^{TP}$ with a range of values of $\beta$. These distances do not capture the absolute values of the mortality rate time series; they only distinguish between their undulating behaviour, reflected in their sets of turning points. To round out our analysis, we include another metric, an $L^1$ norm that does account for difference in the absolute values of mortality. We define another matrix by

$$D^{1}_{ij}=\sum_{t=30}^{T}|r_i(t)-r_j(t)|, \tag{5}$$

and perform hierarchical clustering on $D^1$ in Fig. 3. Again, the subscript $ij$ refers to a distance computed between countries $i$ and $j$, ordered alphabetically.
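The distances of Eqs. (2) to (5) can be sketched in a few lines. This is an illustrative implementation under stated assumptions: turning point sets are represented as Python sets of day indices, mortality series as equal-length lists, and all function names are hypothetical. The hierarchical clustering step is not shown.

```python
def set_distance(s1, s2, T):
    """Normalised semi-metric of Eq. (2) between non-empty finite sets."""
    if not s1 or not s2:
        raise ValueError("both sets must be non-empty")

    def d(point, target):
        # Minimal distance from a point to a set.
        return min(abs(point - q) for q in target)

    mean_to_s1 = sum(d(b, s1) for b in s2) / len(s2)
    mean_to_s2 = sum(d(a, s2) for a in s1) / len(s1)
    return (mean_to_s1 + mean_to_s2) / (2 * T)


def regularised_distance(s1, s2, T, beta):
    """Eq. (3): Eq. (2) plus a penalty on differing cardinalities."""
    return set_distance(s1, s2, T) + beta * abs(len(s1) - len(s2))


def turning_point_matrix(peaks, troughs, T, beta):
    """Eq. (4): peaks[i] and troughs[i] are the sets for country i."""
    n = len(peaks)
    return [[regularised_distance(peaks[i], peaks[j], T, beta)
             + regularised_distance(troughs[i], troughs[j], T, beta)
             for j in range(n)] for i in range(n)]


def l1_distance(r_i, r_j):
    """Eq. (5): L1 distance between two mortality rate series."""
    return sum(abs(a - b) for a, b in zip(r_i, r_j))


# Two hypothetical countries: one with a single peak, one with two.
peaks = [{100}, {100, 200}]
troughs = [{150}, {150}]
print(turning_point_matrix(peaks, troughs, T=366, beta=0.5))
```

In this toy example the off-diagonal entry is dominated by the penalty term, which illustrates how the regularisation separates countries with different numbers of turning points even when their shared turning points coincide.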

Fig. 2.

Hierarchical clustering on the turning point distance matrix $D^{TP}$, defined in Section 2, for (a) $\beta=1/3$, (b) $\beta=1/2$. This groups countries according to their similarity in undulating behaviour, measured by distances between turning point sets. Five characteristic classes are observed: Russia has two turning points; Brazil, India and the US have three; most European countries have four, with a strong subcluster of similarity observed including Austria, Belgium, and others. Two smaller classes are observed containing five and six turning points, respectively. The cluster structure in the two dendrograms is near identical, with a consolidation of the five- and six-turning point classes in (b). There, these classes are clearly observed as subclusters.

Fig. 3.

Hierarchical clustering on the $L^1$ distance matrix $D^1$, defined in Section 2. Mexico and Ecuador emerge as outliers, characterised by a consistently high mortality rate over the full period and the highest peaks in mortality of all, respectively. Belgium, France, Hungary, Spain and the UK are revealed as a secondary cluster, characterised by high mortality in April and May, rapidly decreasing from then.

2.2. Results

In Fig. 1, we display the rolling mortality rate and turning points for 12 countries: Brazil, India, Mexico, the United States (US), the Netherlands, Sweden, France, Germany, Italy, Russia, Ecuador and Bulgaria. These countries display highly heterogeneous behaviours, which are suitably captured in Fig. 2. Fig. 2(a) reveals four clusters of similarity, and one outlier. Russia (Fig. 1(j)) is the unique country with just two detected turning points. Several developing countries such as Brazil (Fig. 1(a)), India (Fig. 1(b)) and Mexico (Fig. 1(c)), as well as developed countries including the US (Fig. 1(d)), the Netherlands (Fig. 1(e)) and Sweden (Fig. 1(f)), have three turning points. France (Fig. 1(g)), Germany (Fig. 1(h)) and Italy (Fig. 1(i)) have four turning points. Ecuador (Fig. 1(k)) and others have five, while Bulgaria (Fig. 1(l)) and others have six. Fig. 2(b) gives a near-identical result, where the clusters pertaining to five and six turning points merge. However, examining the dendrogram closely, both are clearly visible as subclusters, and we are comfortable identifying five categories of trajectories. To demonstrate the robustness of our method, we record the cluster structure for a greater range of $\beta$ in Table 1.

Table 1.

Number of clusters and cluster sizes for different values of the parameter $\beta$, used to define the semi-metric in Eq. (3). While a different number of clusters is observed for $\beta=1/3$, subclusters with 5 and 4 elements are clearly visible in Fig. 2(a).

Cluster robustness vs β
β # Clusters Cluster sizes
1/5 4 {21,19,9,1}
1/4 4 {21,19,9,1}
1/3 5 {21,19,5,4,1}
1/2 4 {21,19,9,1}
1 4 {21,19,9,1}

Within the 4-turning point cluster, we see a dense subcluster of similarity containing Austria, Belgium, Canada, Czechia, France, Georgia, Germany, Hungary, Italy, Poland, Portugal, Switzerland and the United Kingdom (UK). All these countries experienced a peak in the mortality rate in April or May (corresponding to the previous 30 days) and a local minimum near the beginning of September (corresponding to the previous 30 days during August). This similarity can be seen by examining members of this cluster: France (Fig. 1(g)), Germany (Fig. 1(h)) and Italy (Fig. 1(i)).

Turning to Fig. 3, several other insights concerning the mortality rate trajectories emerge. First, Mexico and Ecuador are identified as outliers in the collection of countries, with only slight similarity to each other. For Mexico (Fig. 1(c)), this is due to a consistently high mortality rate over time, over 10% for most of the period. Ecuador (Fig. 1(k)) is an outlier due to peaks in mortality over 30%, higher than any other country. Belgium, France, Hungary, Spain, and the UK form their own smaller cluster characterised by high mortality rates (of around 20%) in their first wave of COVID-19. Indeed, these countries experienced higher mortality in March–April than anywhere else in the world.

3. Virulence matrix analysis

In this section, we develop a new framework of time-varying analysis of 30-day rolling virulence matrices, inspired by, but differing from, covariance matrices in finance [55]. Let $t=30,\dots,T$ be a particular time. We form vectors $\mathbf{x}_i(t)=(x_i(t-29),\dots,x_i(t))$, and analogously $\mathbf{y}_i(t)$. These two vectors record the case and death counts over the past 30 days, with the subscript $i$ referring to the $i$th country, ordered alphabetically. We may also form $\mathbf{r}_i(t)=(r_i(t-29),\dots,r_i(t))$ for $t=59,\dots,T$, as the time series $r_i(t)$ only begin at $t=30$. Define (unscaled) inner products by

$$\langle \mathbf{x}_i(t),\mathbf{x}_j(t)\rangle=\sum_{s=t-29}^{t}x_i(s)\,x_j(s). \tag{6}$$

We then define $n\times n$ (unscaled) virulence matrices with respect to cases, deaths and mortality rates by the following ($i,j=1,\dots,n$):

$$V^{c}_{ij}(t)=\langle \mathbf{x}_i(t),\mathbf{x}_j(t)\rangle,\quad t=30,\dots,T; \tag{7}$$
$$V^{d}_{ij}(t)=\langle \mathbf{y}_i(t),\mathbf{y}_j(t)\rangle,\quad t=30,\dots,T; \tag{8}$$
$$V^{r}_{ij}(t)=\langle \mathbf{r}_i(t),\mathbf{r}_j(t)\rangle,\quad t=59,\dots,T. \tag{9}$$

The subscript $ij$ refers to an inner product between the $i$th and $j$th countries, while the superscripts $c,d,r$ refer to case, death and mortality rate time series, respectively. Due to the summation procedure used to form these inner products, these virulence matrices implicitly average over 30 days’ worth of case counts and are thus robust against the noise present in day-to-day data. We could also analogously define normalised virulence matrices by using normalised inner products in place of the unscaled inner products above. These matrices are thus named because they provide a representation of the global spread of COVID-19 over the last 30 days and contain relationships between different countries’ trajectories. The use of a standard covariance matrix here would not appropriately measure this prevalence: a country with a constant (but severe) number of cases for the past 30 days would yield a zero covariance with any other country. Each matrix $V(t)$ is an $n\times n$ symmetric real matrix, and thus is diagonalisable with all real eigenvalues. By the positivity of the inner product, each matrix satisfies a non-negativity condition $u^{T}Vu\ge 0$ for $u\in\mathbb{R}^n$, and so all eigenvalues are non-negative. We list and order the eigenvalues $\lambda_1\ge\lambda_2\ge\dots\ge\lambda_n\ge 0$. This produces a time-varying eigenspectrum, which we display in Fig. 4 for the first ten eigenvalues. Moreover, for any such symmetric matrix, the greatest eigenvalue $\lambda_1$ holds particular significance. By the spectral theorem, $\lambda_1$ coincides with the operator norm of the matrix [56], a measure of its total size. That is,

$$\lambda_1=\|V\|_{op}=\max_{u\in\mathbb{R}^n\setminus\{0\}}\frac{\|Vu\|}{\|u\|}. \tag{10}$$

Subsequent eigenvalues also have a real-world interpretation. $\lambda_2=0$ if and only if the matrix $V$ is rank 1, which occurs if and only if all trajectories $\mathbf{x}_i$ (in the instance of the cases matrix) differ by a multiplicative constant. In general, a small value of $\lambda_2$ relative to $\lambda_1$ indicates substantial homogeneity in the trajectories.
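The construction of Eqs. (6) to (10) can be illustrated with a small sketch. The function names are hypothetical, and power iteration is used here only as a dependency-free way to approximate $\lambda_1$; it is an assumption of this example, not necessarily how the eigenspectra in Fig. 4 were computed. The toy data also shows the contrast with covariance noted above: two countries with constant case counts have zero sample covariance but a non-zero virulence matrix.

```python
import math


def virulence_matrix(series, t, window=30):
    """Unscaled virulence matrix of Eqs. (6)-(9) at day t (1-indexed).

    series: list of n lists, one per country, where index 0 is day 1.
    Requires t >= window.
    """
    windows = [s[t - window:t] for s in series]
    n = len(windows)
    return [[sum(a * b for a, b in zip(windows[i], windows[j]))
             for j in range(n)] for i in range(n)]


def largest_eigenvalue(V, iterations=500):
    """Approximate lambda_1 of a symmetric PSD matrix by power iteration.

    Starts from the normalised all-ones vector, which is not orthogonal
    to the leading eigenvector when all entries of V are non-negative.
    """
    n = len(V)
    u = [1.0 / math.sqrt(n)] * n
    lam = 0.0
    for _ in range(iterations):
        w = [sum(V[i][j] * u[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        u = [x / norm for x in w]
        lam = norm  # ||V u|| for unit u converges to lambda_1
    return lam


def sample_covariance(a, b):
    """Ordinary sample covariance, shown only for comparison."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b)
               for x, y in zip(a, b)) / (len(a) - 1)


# Two hypothetical countries with constant daily counts of 5 and 3.
constant_cases = [[5] * 30, [3] * 30]
V = virulence_matrix(constant_cases, t=30)
print(V, largest_eigenvalue(V))
print(sample_covariance(constant_cases[0], constant_cases[1]))
```

Because the two toy trajectories differ only by a multiplicative constant, the matrix has rank 1 and $\lambda_1$ equals its trace, consistent with the interpretation of $\lambda_2=0$ given above.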

Fig. 4.

Time-varying eigenspectra (first ten eigenvalues) for the virulence matrices associated to (a) cases (b) deaths (c) mortality rate. The first eigenvalue demonstrates broad trends in the total size of the matrices, and shows (a) a large increase of cases towards the end of 2020, (b) two or three waves of significant deaths, (c) the highest mortality early on in the year. The second eigenvalue reveals more heterogeneity in case trajectories towards the end of the year, and mortality towards the beginning of the year.

In Fig. 4(a), (b) and (c), respectively, we display the time-varying eigenspectra for the virulence matrices associated to cases, deaths and mortality rates. There are several interesting properties of these time-varying eigenspectra. The first eigenvalue $\lambda_1$ of Fig. 4(a) demonstrates the general increase of new COVID-19 cases over the course of 2020. The sharp spike towards the end of the year demonstrates the rapid growth in cases in the final months of 2020. Fig. 4(b) has two prominent peaks in its first eigenvalue, corresponding to the periods of March–April and November–December. These peaks highlight the natural history of COVID-19, where many countries suffered significant deaths during their first wave of the virus, enforced harsh restrictions resulting in fewer cases and deaths, and subsequently experienced further growth in cases and deaths upon such restrictions’ easing. Finally, the first eigenvalue in Fig. 4(c) highlights an interesting trend in the mortality rate. There is a marked spike in March–April, followed by a significant reduction throughout the remainder of 2020. This shape in the first eigenvalue likely represents vulnerable people dying earlier and/or under-reporting of cases early in the year, contributing to a higher calculated mortality rate from reported cases and deaths.

The relationship between the first eigenvalue and subsequent eigenvalues is also of interest. Fig. 4(a) shows the second eigenvalue $\lambda_2$ becoming quite significant for cases towards the end of 2020, when the total number of cases is larger than ever. This shows that the behaviour of new cases in late 2020 is more heterogeneous than the first wave, when all cases were rising quite uniformly throughout the world. Fig. 4(b) and (c) show a more moderate, but similar phenomenon concerning deaths and mortality rate at various stages of the year. The second eigenvalue in Fig. 4(b) is slightly more pronounced in the second wave of the virus, displaying more heterogeneity in COVID-19 deaths later in the year. The second eigenvalue in Fig. 4(c) is more pronounced during the first wave of the virus, highlighting more heterogeneity in mortality during that period. Indeed, Fig. 1 shows that European countries experienced substantial mortality in their first wave of COVID-19, which characterised them as anomalous in Fig. 3. This contributed to a meaningful heterogeneity of mortality rates across the world during the early stages of the year.

4. Inconsistency analysis

In this section, we describe how we measure the consistency between three attributes, and reveal anomalous countries in the process. To do so, we introduce a new method of comparing three distance matrices and apply this to distances between case and death time series, and human development indices (HDI). This generalises prior work studying anomalies between two attributes [57].

4.1. Methodology

Let $h_i$ be the HDI of each country. Calculated by the United Nations Development Programme [58], this index combines a country’s life expectancy, educational standards and economic standard of living. Bounded between 0 and 1, the HDI $h_i$ reflects a substantially lower living standard the further $h_i$ moves from 1. To reflect this, we use a logarithmic distance between these indices that penalises movement away from 1 more than a linear distance:

$$D^{h}_{ij}=|\log h_i-\log h_j|,\quad i,j=1,\dots,n. \tag{11}$$

As before, the subscript $i$ refers to the $i$th country, ordered alphabetically, the subscript $ij$ refers to a distance between the $i$th and $j$th country, while the superscript $h$ signals a distance relative to HDI. This forms an $n\times n$ distance matrix between countries’ development indices. Given the exponential nature of the spread of the virus, we also use a logarithmic distance between the case and death time series. Some of these time series have negative counts due to retrospective adjustments in the data. In order to ensure non-negative counts, we first apply a Savitzky–Golay filter to produce smoothed case and death time series $\hat{x}_i(t)$ and $\hat{y}_i(t)$ respectively. Due to its moving average and polynomial smoothing, this eliminates almost all negatives, except when there are very few counts. We replace any non-positive count with a 1. Then, we may calculate a logarithmic $L^1$ distance as follows:

$$D^{c}_{ij}=\|\log\hat{x}_i-\log\hat{x}_j\|_1=\sum_{t=1}^{T}|\log\hat{x}_i(t)-\log\hat{x}_j(t)|; \tag{12}$$
$$D^{d}_{ij}=\|\log\hat{y}_i-\log\hat{y}_j\|_1=\sum_{t=1}^{T}|\log\hat{y}_i(t)-\log\hat{y}_j(t)|. \tag{13}$$

Above, the superscripts $c,d$ refer to distances relative to the case and death time series, respectively, between countries $i$ and $j$. Again, the summation across many days of data has the effect of smoothing out the noise inherent in day-to-day variations of case counts. We use such a metric between case or death time series rather than a simple difference between the total yearly counts to distinguish between countries (and hence reveal potential anomalies) according to when the cases or deaths occurred. Thus, we have defined three $n\times n$ distance matrices between countries. Given an $n\times n$ distance matrix $D$, its corresponding affinity matrix is defined as

$$A_{ij}=1-\frac{D_{ij}}{\max\{D\}},\quad i,j=1,\dots,n. \tag{14}$$

All elements of these affinity matrices lie in $[0,1]$, so it is appropriate to compare them directly by taking their difference. Given an $n\times n$ matrix $C$, let $|C|$ be the matrix given by taking the absolute value of all elements, that is $|C|_{ij}=|C_{ij}|$. Then, define three $n\times n$ symmetric pairwise inconsistency matrices:

$$\mathrm{INC}^{c,d}=|A^{c}-A^{d}|; \tag{15}$$
$$\mathrm{INC}^{c,h}=|A^{c}-A^{h}|; \tag{16}$$
$$\mathrm{INC}^{d,h}=|A^{d}-A^{h}|; \tag{17}$$

and a total inconsistency matrix

$$\mathrm{INC}^{c,d,h}=\mathrm{INC}^{c,d}+\mathrm{INC}^{c,h}+\mathrm{INC}^{d,h}. \tag{18}$$

Above and below, a superscript $c,d$ refers to inconsistency between cases and deaths, while $c,h$ refers to an inconsistency between cases and HDI, and similarly for $d,h$. A superscript $c,d,h$ refers to an inconsistency between all three attributes. Next, we can define pairwise anomaly scores by

$$a_i^{c,d}=\sum_{j=1}^{n}\mathrm{INC}^{c,d}_{ij}; \tag{19}$$
$$a_i^{c,h}=\sum_{j=1}^{n}\mathrm{INC}^{c,h}_{ij}; \tag{20}$$
$$a_i^{d,h}=\sum_{j=1}^{n}\mathrm{INC}^{d,h}_{ij}. \tag{21}$$

For each country, we record an anomaly vector $a_i=(a_i^{c,d},a_i^{c,h},a_i^{d,h})$ and a total anomaly score given by $a_i^{c,d,h}=a_i^{c,d}+a_i^{c,h}+a_i^{d,h}$. We can also define a weighted anomaly score to reduce bias from one set of anomaly scores being systematically larger than another. Let $M^{c,d}=\max_i\{a_i^{c,d}\}$, and analogously for $M^{c,h}$ and $M^{d,h}$. Let the weighted anomaly score be $\tilde{a}_i^{c,d,h}=\frac{a_i^{c,d}}{M^{c,d}}+\frac{a_i^{c,h}}{M^{c,h}}+\frac{a_i^{d,h}}{M^{d,h}}$. This aims to record a neutral contribution from each anomaly score. In Tables 2 and 3, we record the anomaly vectors, total anomaly score and weighted anomaly score for all 50 countries under consideration. In Fig. 5, we plot the total inconsistency matrix $\mathrm{INC}^{c,d,h}$, where anomalous countries can easily be seen due to larger entries in their respective rows and columns. An analogous weighted inconsistency matrix can also be defined, which is broadly similar to the one shown.
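The pipeline from distance matrices to anomaly scores, Eqs. (14) to (21) and the weighted score, can be sketched as follows. All function names are hypothetical, matrices are plain nested lists, and the guards against an all-zero matrix or all-zero scores are assumptions added for this example.

```python
def affinity(D):
    """Eq. (14): A_ij = 1 - D_ij / max(D)."""
    largest = max(max(row) for row in D)
    if largest == 0:
        # Assumption: identical items everywhere means full affinity.
        return [[1.0 for _ in row] for row in D]
    return [[1 - d / largest for d in row] for row in D]


def inconsistency(A, B):
    """Eqs. (15)-(17): elementwise absolute difference of affinities."""
    return [[abs(a - b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(A, B)]


def anomaly_scores(INC):
    """Eqs. (19)-(21): row sums of an inconsistency matrix."""
    return [sum(row) for row in INC]


def weighted_scores(a_cd, a_ch, a_dh):
    """Weighted anomaly score: each pairwise score divided by its maximum."""
    maxima = [max(a_cd) or 1.0, max(a_ch) or 1.0, max(a_dh) or 1.0]
    return [x / maxima[0] + y / maxima[1] + z / maxima[2]
            for x, y, z in zip(a_cd, a_ch, a_dh)]


# Toy distance matrices for three hypothetical countries.
D_cases = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
D_deaths = [[0, 2, 2], [2, 0, 2], [2, 2, 0]]
INC_cd = inconsistency(affinity(D_cases), affinity(D_deaths))
print(anomaly_scores(INC_cd))
```

As a check against the tabulated values, the column maxima in Tables 2 and 3 are 10.29, 22.73 and 21.65, so Pakistan's weighted score is 2.43/10.29 + 22.73/22.73 + 21.65/21.65 ≈ 2.24, which agrees with its entry in Table 2.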

Table 2.

Anomaly vectors, total anomaly scores and weighted anomaly scores, as defined in Section 4, for the first 37 countries under consideration. Pairwise anomaly scores quantify the inconsistency in measurements between two quantities, while the total and weighted anomaly scores incorporate all three attributes. The weighted anomaly score is chosen to more appropriately weight the contributions from the three pairwise scores.

Country anomaly scores relative to cases, deaths and HDI (1)
Country $a^{c,d}$ $a^{c,h}$ $a^{d,h}$ $a^{c,d,h}$ $\tilde{a}^{c,d,h}$
Argentina 3.23 8.96 10.59 22.78 1.98
Austria 3.43 8.73 10.45 22.61 1.20
Azerbaijan 3.87 7.47 9.72 21.06 1.15
Bangladesh 2.47 14.69 14.93 32.10 1.58
Belarus 4.41 7.86 9.91 22.19 1.23
Belgium 3.88 8.27 8.52 20.67 1.13
Brazil 4.04 12.29 15.31 31.64 1.64
Bulgaria 2.26 8.84 8.31 19.41 0.99
Canada 2.87 9.15 9.21 21.23 1.11
Chile 3.00 9.18 10.85 23.04 1.20
Colombia 3.55 6.28 6.48 16.30 0.92
Croatia 2.91 10.53 11.04 24.48 1.26
Czechia 4.06 8.23 9.86 22.15 1.21
Ecuador 3.75 7.52 6.96 18.23 1.02
France 2.98 9.77 10.70 23.45 1.21
Georgia 4.86 13.33 11.92 30.11 1.61
Germany 3.33 9.83 8.36 20.62 1.10
Hungary 3.52 10.74 8.94 23.21 1.23
India 2.75 9.67 9.72 22.14 1.14
Indonesia 3.72 8.37 7.47 19.56 1.07
Iran 5.25 8.77 11.68 25.69 1.43
Iraq 3.42 7.35 5.88 16.65 0.93
Israel 4.98 9.74 11.15 25.86 1.43
Italy 4.03 9.27 11.67 24.96 1.34
Japan 2.79 9.75 10.34 22.88 1.18
Jordan 4.66 11.06 10.94 26.66 1.44
Mexico 9.73 6.82 13.06 29.60 1.85
Morocco 3.97 10.34 9.67 24.00 1.29
Nepal 4.53 16.84 17.78 39.15 2.00
Netherlands 3.97 8.41 8.70 21.08 1.16
Pakistan 2.43 22.73 21.65 46.80 2.24
Panama 3.34 7.17 7.92 18.42 1.01
Peru 3.06 6.48 6.98 16.52 0.90
Philippines 2.51 7.49 7.29 17.29 0.91
Poland 2.68 7.69 8.58 18.94 0.99
Portugal 3.11 7.42 8.10 18.63 1.00
Romania 3.18 7.24 7.68 18.10 0.98

Table 3.

Anomaly vectors, total anomaly scores and weighted anomaly scores, as defined in Section 4, for the remaining 13 countries under consideration. Pairwise anomaly scores quantify the inconsistency in measurements between two quantities, while the total and weighted anomaly scores incorporate all three attributes. The weighted anomaly score is chosen to more appropriately weight the contributions from the three pairwise scores.

Country anomaly scores relative to cases, deaths and HDI (2)
Country $a^{c,d}$ $a^{c,h}$ $a^{d,h}$ $a^{c,d,h}$ $\tilde{a}^{c,d,h}$
Russia 2.62 10.62 10.75 23.99 1.21
Saudi Arabia 3.10 10.77 10.54 24.41 1.26
Serbia 3.57 7.98 9.80 21.36 1.15
Slovakia 3.34 12.86 13.13 29.33 1.50
South Africa 3.16 7.01 5.72 15.89 0.88
Spain 4.05 9.82 11.27 25.14 1.35
Sweden 4.23 9.57 9.36 23.16 1.26
Switzerland 3.14 8.52 9.73 21.38 1.13
Turkey 2.50 7.81 7.78 18.08 0.95
Ukraine 2.78 6.86 6.45 16.08 0.87
UAE 10.29 8.78 13.56 32.63 2.01
UK 3.78 9.87 10.80 24.44 1.30
US 3.18 18.46 19.81 41.45 2.04

Fig. 5.

Total inconsistency matrix $\mathrm{INC}^{c,d,h}$, as defined in Section 4. Lighter entries indicate higher values of the matrix, and hence more inconsistency between the attributes under consideration: cases, deaths and HDI. The US and Pakistan can be seen to have substantial inconsistency with many other countries.

Remark 4.1

In this brief aside, we explore the edge cases of maximal consistency and maximal inconsistency, and interpret their meaning. Consider a single entry $\mathrm{INC}^{c,d}_{ij}=|A^{c}_{ij}-A^{d}_{ij}|$. As both $A^{c}_{ij},A^{d}_{ij}\in[0,1]$, the inconsistency entry $\mathrm{INC}^{c,d}_{ij}$ has greatest possible value 1. It attains that value when $A^{c}_{ij}=1$ and $A^{d}_{ij}=0$, or vice versa. These equations can be reinterpreted as $D^{c}_{ij}=0$ and $D^{d}_{ij}=\max D^{d}$, respectively. That is, greatest inconsistency occurs when countries $i$ and $j$ have equal case counts, but the greatest difference in death counts among any pair of countries, or vice versa. The exact same statement applies for greatest inconsistency between cases and HDI or deaths and HDI.

On the other hand, greatest possible consistency across the entire matrix would mean $A^{c}_{ij}-A^{d}_{ij}=0$, for all $i,j$. Rearranging this yields $\frac{D^{c}_{ij}}{\max D^{c}}=\frac{D^{d}_{ij}}{\max D^{d}}$, for all $i,j$. That is, the distance matrices $D^{c}$ and $D^{d}$ are scalar multiples of each other. One example where this can occur is if there are constants $a$ and $\tau$ such that $y_i(t)=a\,x_i(t+\tau)$, for all $i=1,\dots,n$, $t=1,\dots,T$. Then this relationship passes to the smoothed counts by linearity, and so $\sum_t|\log\hat{y}_i(t)-\log\hat{y}_j(t)|=\sum_t|\log a\hat{x}_i(t+\tau)-\log a\hat{x}_j(t+\tau)|=\sum_t|\log\hat{x}_i(t)-\log\hat{x}_j(t)|$. That is, maximal consistency would occur if every country has an identical progression of cases to deaths, up to a multiplicative constant $a$ and a time-offset $\tau$.

4.2. Results

The total inconsistency matrix and all computed anomaly scores yield several insights. First, the three most anomalous countries with respect to the weighted anomaly score are Pakistan, the US and the United Arab Emirates (UAE). A near-identical result applies if we use the unscaled total anomaly score, with Pakistan, the US, Nepal and then the UAE exhibiting the largest unscaled scores. For the US and Pakistan, the highest contribution to the total or weighted anomaly score comes from their high pairwise anomaly scores $a^{c,h}$ and $a^{d,h}$, which are the two highest of any country. Interestingly, these high scores have differing explanations. The US is highly inconsistent between cases (and analogously deaths) and HDI due to its much higher case and death counts than other countries of similar HDI. Pakistan is classified as inconsistent due to an extreme HDI, the lowest of any country under consideration, combined with case and death time series that are similar to many others. Thus, due to a lower HDI than other countries with similar case and death counts, it is registered as inconsistent. We remark that high anomaly scores do not necessarily indicate a straightforward anomalous quotient between cases and HDI, for example. Instead, a high anomaly score reflects inconsistency in relationships with other countries.

On the other hand, the UAE has a high weighted and total anomaly score due to its value of $a^{c,d}$, which is the highest of any country. Indeed, the UAE experienced the lowest mortality rate across 2020 of any country under consideration. The country with the second-highest value of $a^{c,d}$ is Mexico. This is anomalous for the opposite reason: a consistently high progression from cases to deaths, as first noted in Fig. 1(c).

5. Discussion

In this paper, we analyse the natural history of COVID-19 across 50 countries over 2020. We observe significant structural similarity between certain countries as well as heterogeneity across the world with respect to COVID-19 prevalence and mortality, and identify anomalous countries therein. Such insights cannot be gained with conventional techniques, such as a comparison of reproductive ratios across countries. Our analysis consistently considers the changing dynamics with time.

In Section 2, we analyse mortality rate trajectories for 50 countries. By modifying a recently introduced turning point algorithm and introducing a new semi-metric between turning point sets, we assign these time series into five characteristic classes according to their differing trajectories. Russia is identified as an outlier: its mortality rate rose consistently until July and never dropped substantially enough to register a subsequent trough in our algorithmic framework. It is unique in this sense among the 50 countries, possessing a consistently stable mortality rate after its first peak. Nineteen countries exhibit three turning points, including Brazil, India and the US, indicating a substantial reduction in mortality from a first peak. Twenty-one countries exhibit four turning points, indicating a second wave in which mortality increased once again. In particular, a strong subcluster contains most Western European countries: Austria, Belgium, Czechia, France (Fig. 1(g)), Germany (Fig. 1(h)), Hungary, Italy (Fig. 1(i)), Portugal, Switzerland, and the UK. These all share highly similar mortality trajectories, with a first peak in April–May, a trough around September, and another peak at the end of the year.

Three wealthy Western European countries do not fit into this cluster. The Netherlands and Sweden, both displayed in Fig. 1, do not register a second peak in mortality. Indeed, these countries both kept their mortality low towards the end of the year, while France, Germany and Italy experienced an increase. Prior research has noted that the Netherlands reduced its mortality rate substantially in its second wave of COVID-19 [42], while Sweden changed its COVID-19 response substantially relative to the first half of the year [59]. The third country, Spain, registers six turning points, primarily due to highly irregular reporting that featured negative counts and large numbers of cases and deaths consolidated and reported on single sporadic days.

A smaller number of countries exhibited more turning points: five with 5 turning points and four with 6. We observe that the majority of developed countries exhibit 3 or 4 turning points, as visible in Fig. 2, while the outlier countries (with 2, 5 or 6 turning points) were mostly developing countries. This reflects more regular (and less undulating) behaviour in the mortality rate trajectories of developed countries and has two possible explanations. First, more developed countries may have implemented more consistent testing, which could have caused fewer fluctuations in the reported mortality rate. Second, more developed countries may have more healthcare resources to improve their treatment of COVID-19 and thereby reduce and stabilise the mortality rate over time.

As a whole, the most significant finding from Section 2 is the identification of five categories of mortality trajectories, obtained from both the turning point algorithm and clustering based on our new semi-metric between sets. This reveals close similarity among mortality trajectories when considered as varying functions over time, and carries more weight than an overall comparison of mortality obtained by simply dividing the number of deaths observed throughout 2020 by the number of cases. Our results are robust with respect to the variation of parameters.

In Section 3, we introduce a new class of virulence matrices for cases, deaths and mortality rates and analyse their eigenspectra. The first eigenvalue $\lambda_1$ provides a measure of the total scale of the matrices and summarises worldwide trends in prevalence and mortality throughout 2020. Fig. 4(a) reflects a substantial surge in cases towards the end of the year, Fig. 4(b) shows multiple waves of deaths of comparable magnitude, while Fig. 4(c) shows an early peak that dominates the rest of the period. The second eigenvalue $\lambda_2$ provides a measure of the heterogeneity among the studied time series. Fig. 4(a) exhibits a considerable rise in heterogeneity towards the end of the year, during a time in which new case trajectories across different countries were substantial but quite non-uniform. In Fig. 4(b), we see a much greater value of $\lambda_2$ during the second wave of deaths, in which $\lambda_1$ is in fact lower than in the first wave. The much milder drop-off between $\lambda_1$ and $\lambda_2$ indicates the greatest heterogeneity with respect to deaths during this period in the middle of the year. Fig. 4(c) similarly reveals substantial heterogeneity in mortality rates during the earlier part of the year.

When viewed in conjunction, these three figures provide several insights into the natural history of the disease throughout 2020. Case counts generally increased in global severity throughout the year, while death counts followed a much clearer pattern of multiple waves. The mortality rate trajectory in Fig. 4(c) can explain this: in March and April, the progression from reported cases to deaths was much more severe throughout Europe, causing substantial deaths despite fewer cases than in late 2020. During the middle of the year, the heterogeneity in death counts was at its highest. Indeed, the months of June to August featured relatively few new cases in Europe [60], while Brazil [61], India and other developing countries experienced substantial growth in cases [62]. Towards the end of the year, the pandemic once again impacted the entire world, with higher counts observed than at any earlier point. During this time, mortality was low, but cases were so high that deaths reached their highest level of the year. Heterogeneity in case trajectories also increased substantially, with COVID-19 trajectories differing markedly between countries: many increasing, some decreasing, but most with high total counts. One could more closely examine heterogeneity by considering normalised virulence matrices obtained from normalised inner products, as explained in Section 3.

This analysis provides a new means of identifying periods of maximal severity and heterogeneity in case, death and mortality trajectories across the world. The temporal dimension is critical in such analyses as both severity and heterogeneity change over time. Specifically, cases are most severe and heterogeneous at the end of the year; deaths are most severe in March/April and year-end, but most heterogeneous in the middle of the year; mortality is most severe and heterogeneous in March/April.

In Section 4, we study the consistency between cases, deaths and HDI for all 50 countries under consideration. We believe that this is the first method proposed to study (in)consistencies among a collection of time series for up to three measures. We propose two measures of anomaly across these three quantities: a total anomaly score and a weighted anomaly score (which more appropriately combines the contributions of the three pairwise anomaly components). The three most anomalous countries with respect to the weighted score are Pakistan, the US and the UAE. Closer inspection of the pairwise anomaly components in Table 2, Table 3 can reveal which quantities most contribute to a country’s total or weighted score. For the UAE, this is the high anomaly score between cases and deaths, caused by the lowest progression from cases to deaths among our collection of countries. For the US, both anomaly scores $a^{c,h}$ and $a^{d,h}$ contribute highly; these reflect the fact that the US has considerably more cases and deaths than other countries of similar HDI. For Pakistan, the same two anomaly scores $a^{c,h}$ and $a^{d,h}$ are the largest of any country, but for the opposite reason: its HDI is substantially lower than that of any country with a similar case and death time series.

The full collection of anomaly scores can also reveal broad trends regarding consistency between the three measures. In Table 2, Table 3, we see that the two pairwise anomaly scores relative to HDI are systematically greater than the pairwise score between case and death counts. Indeed, we have $a_i^{c,d} < a_i^{d,h}$ for every single country and $a_i^{c,d} < a_i^{c,h}$ for every country except Mexico (which has the second-highest case–death anomaly score after the UAE due to its consistently and anomalously high mortality). These patterns reveal systematically more consistency between case and death counts than between case or death counts and HDI. Qualitatively, this reveals there is little relationship between a country’s HDI and its case or death counts. In addition, a closer examination reveals that $a_i^{c,h} < a_i^{d,h}$ for 34 out of the 50 countries, roughly two-thirds of the collection. Thus, to a lesser extent, there is greater consistency between case counts and HDI than there is between death counts and HDI. This is a surprising finding: one would naively expect more consistency between a lower HDI and higher deaths, due to poorer healthcare quality resulting in a greater progression of cases to deaths, regardless of the number of cases.

The originality of Section 4 is two-fold: first, it provides a new mathematical method for identifying inconsistencies across three attributes; second, it is the first analysis of cases, deaths and HDI of different countries simultaneously, again taking temporal dynamics into account. The main findings are the identification of specific anomalous countries, including Mexico and the UAE between cases and deaths, and the US and Pakistan between cases (or deaths) and HDI.

Several limitations and opportunities for future research exist in this inconsistency framework. First, the results could also be repeated for case and death time series as a proportion of each country’s population. Alternative metrics between cases and deaths could be used, such as a simple difference between the total yearly counts, without the temporal component provided by the L1 metric. A closer analysis of the relationship between the varying sizes of the anomaly scores could quantitatively characterise the differing consistency between three quantities as a whole. One limitation in this analysis framework is that anomalies are measured purely by their relative deviation from the rest of the collection, and direction (positive or negative) is ignored. A closer inspection is necessary to determine the nature of the anomaly. However, this could be seen as a benefit of the methodology as well, as it is flexible in the detection of different sorts of inconsistent behaviour. Further research could also incorporate several different attributes other than HDI, such as countries’ age demographics, size, and population density.

More broadly speaking, any analysis of reported cases and deaths due to COVID-19 will have limitations. First, the reported counts of COVID-19 may have been under-reported [63] throughout the pandemic. Not only did early cases spread throughout Europe and the US before testing programmes had been established, but testing protocols were far from uniform across the year and between countries. Indeed, several countries changed their testing protocols on various occasions, including within the same wave [64], [65], [66]. Even deaths may have been under-reported, with substantial differences observed between excess mortality and reported COVID-19 deaths [67]. Nonetheless, our analysis of reported case and death counts may reveal structural similarity and anomalies, help governments in their decision-making, and motivate further research that examines other data attributes in more involved studies.

6. Conclusion

Overall, this paper introduces new methods for analysing COVID-19 prevalence and mortality on a country-by-country and worldwide basis and chronicles the natural history of COVID-19 during 2020. On a global scale, we reveal broad trends in case and death counts as well as mortality trajectories, which present a coherent picture of the changing impacts of COVID-19 over time. On a country-by-country basis, we reveal both heterogeneity and structural similarity with respect to mortality over time and study consistency between COVID-19 prevalence and human development, revealing specific anomalous countries. Moreover, the framework presented in this paper could be applied broadly to various epidemiological or economic crises. The consistent theme in our analysis, and motivation for it, is to always seek structure and associated anomalies in case, death and mortality time series, with an essential consideration of changing dynamics with time.

The primary strength of this analysis is that our findings are difficult to detect with existing methods. For example, SIR models and their extensions, together with an analysis of the reproductive ratio $R_0$, may be fit independently for each country and create predictions, but they are not suited to detecting structure in all the world’s case, death and mortality trajectories at once. Such methods would neither reveal the five classes of trajectories we find, nor identify the periods of the greatest heterogeneity in prevalence or mortality, nor identify anomalous countries with respect to our chosen data attributes. Measurements such as $R_0$ are more useful for early analysis of the transmissibility of the virus; we aim to find structure while comparing the plights of different countries over time.

As 2021 begins, the world remains severely affected by COVID-19. Though vaccination distribution is underway in many countries, the analysis of trends in cases, deaths and mortality remains of substantial relevance to governments. The identification of structural similarity in mortality rate trajectories between European states may inspire additional cooperation [7] and coordination of their strategic response to the pandemic. Our methods highlight countries that have responded particularly well or poorly, and our analysis highlights points in time where cases, deaths and mortality rates changed substantially for candidate countries. Finally, we reveal global changes in the relationship between cases, deaths and mortality rates over time. Such changes should inform governments regarding their response to the pandemic. This will be particularly crucial in the coming months, as various vaccines are administered over the world.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

Many thanks to Kerry Chen for helpful comments and edits.

Funding sources

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Communicated by V.M. Perez-Garcia

Appendix. Turning point methodology

In this section, we provide more details for the identification of turning points of a mortality rate time series $r(t)$. First, some smoothing is necessary due to irregularities in the data set and discrepancies between different data sources. The Savitzky–Golay filter ameliorates these issues by combining polynomial smoothing with a moving average computation, and yields a smoothed time series $\hat{r}(t) \in \mathbb{R}_{\geq 0}$. Subsequently, we perform a two-step process to select and then refine a non-empty set $P$ of local maxima (peaks) and $T$ of local minima (troughs).

Following [15], we apply a two-step algorithm to the smoothed time series rˆ(t). The first step produces an alternating sequence of troughs and peaks. The second step refines this sequence according to chosen conditions and parameters. The most important conditions to identify a peak or trough, respectively, in the first step, are the following:

$\hat{r}(t_0) = \max\{\hat{r}(t) : \max(1, t_0 - l) \leq t \leq \min(t_0 + l, T)\}$, (A.1)
$\hat{r}(t_0) = \min\{\hat{r}(t) : \max(1, t_0 - l) \leq t \leq \min(t_0 + l, T)\}$, (A.2)

where $l$ is a parameter to be chosen. Due to the smoothing of the Savitzky–Golay filter, noise in day-to-day counts will change the local maxima and minima of the smoothed time series minimally, and will not affect either the total number of turning points or the distances between different turning point sets. Following [15], we select $l=17$, which accounts for the 14-day incubation period of the virus [69] and less testing on weekends. Defining peaks and troughs according to this definition alone has several flaws, such as the potential for two consecutive peaks.
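Conditions (A.1) and (A.2) can be sketched as a windowed maximum and minimum test. The following is a minimal illustration, not the implementation used for the paper: the function names and the toy series are hypothetical, indices are 0-based (so the window is clipped to 0 and len(r) - 1 rather than 1 and T), and a small window is used instead of l = 17 so that the example stays readable.

```python
def is_candidate_peak(r, t0, l):
    """Condition (A.1): r[t0] is the maximum over the window of radius l around t0."""
    lo = max(0, t0 - l)
    hi = min(t0 + l, len(r) - 1)
    return r[t0] == max(r[lo:hi + 1])

def is_candidate_trough(r, t0, l):
    """Condition (A.2): r[t0] is the minimum over the window of radius l around t0."""
    lo = max(0, t0 - l)
    hi = min(t0 + l, len(r) - 1)
    return r[t0] == min(r[lo:hi + 1])

# Toy (hypothetical) smoothed mortality series.
r_hat = [1, 2, 3, 2, 1, 2, 5, 2, 1]
l = 2

peaks = [t for t in range(len(r_hat)) if is_candidate_peak(r_hat, t, l)]
troughs = [t for t in range(len(r_hat)) if is_candidate_trough(r_hat, t, l)]
print(peaks)    # [2, 6]
print(troughs)  # [0, 4, 8]
```

These candidate sets need not alternate in general, which is why a further selection step is required.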

Instead, we implement an inductive procedure to choose an alternating sequence of peaks and troughs. Suppose $t_0$ is the last determined peak. We search in the period $t > t_0$ for the first of two cases: if we find a time $t_1 > t_0$ that satisfies (A.2) as well as a non-triviality condition $\hat{r}(t_1) < \hat{r}(t_0)$, we add $t_1$ to the set of troughs and proceed from there. If we find a time $t_1 > t_0$ that satisfies (A.1) and $\hat{r}(t_0) \geq \hat{r}(t_1)$, we ignore this lower peak as redundant; if we find a time $t_1 > t_0$ that satisfies (A.1) and $\hat{r}(t_1) > \hat{r}(t_0)$, we remove the peak $t_0$, replace it with $t_1$ and continue from $t_1$. A similar process applies from a trough at $t_0$.
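The inductive procedure can be sketched as a single left-to-right pass over the candidate turning points. This is a minimal sketch under stated assumptions rather than the authors' code: all function names are hypothetical, indices are 0-based, the sequence is initialised with the first candidate found (the text does not specify the initialisation), and times that satisfy both (A.1) and (A.2), which can only happen on a flat window, are skipped.

```python
def window(r, t0, l):
    """Values of r within distance l of t0, clipped to the series."""
    return r[max(0, t0 - l):min(t0 + l, len(r) - 1) + 1]

def alternating_turning_points(r, l):
    """Return a list of (time, kind) pairs alternating between peaks and troughs."""
    points = []
    for t in range(len(r)):
        peak = r[t] == max(window(r, t, l))    # condition (A.1)
        trough = r[t] == min(window(r, t, l))  # condition (A.2)
        if peak == trough:
            continue  # not a candidate, or a flat window (assumption: skip)
        kind = "peak" if peak else "trough"
        if not points:
            points.append((t, kind))  # assumption: start from the first candidate
            continue
        last_t, last_kind = points[-1]
        higher, lower = r[t] > r[last_t], r[t] < r[last_t]
        if kind != last_kind:
            # Opposite kind: accept only if it is non-trivial.
            if (kind == "peak" and higher) or (kind == "trough" and lower):
                points.append((t, kind))
        else:
            # Same kind: a more extreme candidate replaces the previous one,
            # while a less extreme one is ignored as redundant.
            if (kind == "peak" and higher) or (kind == "trough" and lower):
                points[-1] = (t, kind)
    return points

# Toy (hypothetical) series with candidate peaks at t = 2 and t = 5
# and no candidate trough between them.
r_hat = [0, 2, 5, 4, 3, 6, 2, 0]
result = alternating_turning_points(r_hat, l=2)
print(result)  # [(0, 'trough'), (5, 'peak'), (7, 'trough')]
```

In this example the peak at t = 2 is replaced by the higher peak at t = 5, so the output alternates even though the raw candidates do not.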

At this point, the time series is assigned an alternating sequence of troughs and peaks. However, some turning points are immaterial and should be excluded. The second step is a flexible approach introduced in [15] for this purpose. In this paper, we introduce new conditions within this framework. First, we use the same peak ratio procedure: let $t_1 < t_3$ be two peaks, necessarily separated by a trough. We select a parameter $\delta = 0.2$; if the peak ratio $\hat{r}(t_3)/\hat{r}(t_1)$ is less than $\delta$, we remove the peak $t_3$. That is, if the second peak has size less than $\delta$ of the first peak, we remove it. If two consecutive troughs $t_2, t_4$ remain, we remove $t_2$ if $\hat{r}(t_2) > \hat{r}(t_4)$, and otherwise remove $t_4$.
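The peak ratio step can be illustrated as follows. This is a sketch under assumptions: the function name and the toy data are hypothetical, each peak is compared with the most recent retained peak (one reading of the description above), and the ratio test is written as a product to avoid dividing by a peak value.

```python
def apply_peak_ratio(points, r, delta=0.2):
    """Drop peaks smaller than delta times the previous retained peak,
    then merge any two consecutive troughs by keeping the lower one."""
    kept = []
    for t, kind in points:
        if kind == "peak":
            earlier_peaks = [p for p, k in kept if k == "peak"]
            if earlier_peaks and r[t] < delta * r[earlier_peaks[-1]]:
                continue  # peak ratio below delta: remove this peak
            kept.append((t, kind))
        elif kept and kept[-1][1] == "trough":
            # Two consecutive troughs remain: keep whichever is lower.
            if r[kept[-1][0]] > r[t]:
                kept[-1] = (t, kind)
        else:
            kept.append((t, kind))
    return kept

# Toy (hypothetical) smoothed mortality values and an alternating sequence.
r_hat = [0.5, 3, 10, 4, 1.2, 1.4, 1.5, 1.2, 1.0]
points = [(0, "trough"), (2, "peak"), (4, "trough"), (6, "peak"), (8, "trough")]

refined = apply_peak_ratio(points, r_hat, delta=0.2)
print(refined)  # [(0, 'trough'), (2, 'peak'), (8, 'trough')]
```

Here the second peak has value 1.5, below 0.2 times the first peak's value of 10, so it is removed, and of the two troughs left adjacent the lower one at t = 8 is kept.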

Finally, let $t_1, t_2$ be adjacent turning points (one a trough, one a peak). We choose a parameter $\epsilon = \log(2)$; if

$|\log \hat{r}(t_2) - \log \hat{r}(t_1)| < \epsilon$, (A.3)

that is, the values at the two turning points differ by less than a factor of 2, we remove $t_2$ from our sets of peaks and troughs. If $t_2$ is not the final turning point, we also remove $t_1$. This is a different condition from previous work: whereas [15] considers the average change with time between turning points of new case trajectories, we consider only the absolute change between turning points in mortality rate. Indeed, there is no need to consider how much time has passed when determining whether mortality has increased or decreased by a sufficient amount (in our implementation, a factor of 2) to warrant a turning point being included.
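Condition (A.3) can be sketched as a scan over adjacent turning points. The following is an illustration under assumptions, not the authors' implementation: the function name and the toy data are hypothetical, pairs are examined from left to right with a re-check after each removal (the text does not fix an order), and the smoothed values at turning points are assumed to be strictly positive so that the logarithm is defined.

```python
import math

def apply_log_gap(points, r, eps=math.log(2)):
    """Remove adjacent turning points whose values differ by less than a factor exp(eps)."""
    kept = list(points)
    i = 0
    while i + 1 < len(kept):
        t1, t2 = kept[i][0], kept[i + 1][0]
        if abs(math.log(r[t2]) - math.log(r[t1])) < eps:  # condition (A.3)
            if i + 1 == len(kept) - 1:
                del kept[i + 1]      # t2 is the final turning point: remove only t2
            else:
                del kept[i:i + 2]    # otherwise remove both t1 and t2
            i = max(i - 1, 0)        # re-check the pair formed by the removal
        else:
            i += 1
    return kept

# Toy (hypothetical) values: the peak at t = 1 and trough at t = 2 differ
# by a factor of 1.6, which is below the threshold of 2.
r_hat = [1, 8, 5, 9, 2]
points = [(0, "trough"), (1, "peak"), (2, "trough"), (3, "peak"), (4, "trough")]

filtered = apply_log_gap(points, r_hat)
print(filtered)  # [(0, 'trough'), (3, 'peak'), (4, 'trough')]
```

Removing an interior peak and trough together preserves the alternating structure of the sequence, as does removing only the final turning point.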

Data availability

Daily COVID-19 case and death counts and human development index data can be accessed at “Our World in Data” [68].

References


Articles from Physica D. Nonlinear Phenomena are provided here courtesy of Elsevier
