Abstract
In this paper, we estimate the seroprevalence against COVID-19 by country and derive the seroprevalence over the world. To estimate seroprevalence among adults, we use serological surveys (also called serosurveys) conducted within each country. Two issues arise when the serosurveys are combined to estimate world seroprevalence. First, there are countries in which no serological survey has been conducted. Second, the sample collection dates differ from country to country. We tackle these problems using vaccination data, confirmed cases data, and national statistics. We construct Bayesian models to estimate separately the numbers of people who have antibodies produced by infection and by vaccination. For the number of people with antibodies due to infection, we develop a hierarchical model that combines the information in the confirmed cases data and national statistics. We also propose regression models to estimate missing values in the vaccination data. As of 31st of July 2021, using the proposed methods, we obtain the credible interval of the world seroprevalence as .
Keywords: Bayesian model, hierarchical model, SARS-CoV-2 antibodies, vaccination, world seroprevalence
1. Introduction
At the beginning of December 2019, the first coronavirus disease 2019 (abbreviated COVID-19) patient, infected with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), was identified in Wuhan, China [12]. In the following weeks, the disease spread rapidly across China and to other countries, causing worldwide damage, and it remains widespread. According to the official statement, COVID-19 has so far caused more than 317 million infections and 5.5 million deaths globally.
Vaccines are a critical tool for protecting people because they induce antibodies against infectious diseases. Every country in the world is struggling to block the spread of the virus and treat patients. As part of that effort, countries are administering COVID-19 vaccines, and in many countries the majority of people have been vaccinated. A variety of COVID-19 vaccines are available, e.g. AstraZeneca, Johnson & Johnson, Moderna, Novavax, and Pfizer-BioNTech, along with candidates currently in Phase III clinical trials [7].
Seroprevalence is the proportion of people in a population who have antibodies to a particular virus, produced by previous infection or by vaccination. In this paper, we study the seroprevalence of SARS-CoV-2 antibodies in people all over the world, particularly in adults, using information officially reported by countries. The available information includes confirmed cases, the number of people vaccinated, types of vaccines, and serosurvey data.
Recently, there have been various approaches for estimating the seroprevalence of antibodies to SARS-CoV-2. For example, [6] proposed a Bayesian method that uses a user-specific likelihood function able to incorporate the variability of the specificity and sensitivity of the antibody tests, [14] utilized a Bayesian logistic regression model with random effects for age and sex, and [10] developed a Bayesian multilevel poststratification approach with multiple diagnostic tests. Lee et al. [11] presented a Bayesian binomial model with an informative prior distribution based on clinical trial data of the plaque reduction neutralization test (PRNT), a kind of serology test. While these approaches were developed for populations in particular regions rather than for the world, [2] considered global seroprevalence by proposing a meta-regression method. In this paper, we propose a new Bayesian method for estimating the world seroprevalence of SARS-CoV-2 antibodies and verify the proposed method via numerical studies. This method estimates the percentage of people who have developed antibodies due to viral infection or vaccination in each country and combines these estimates using a hierarchical Bayesian model. Additionally, the method utilizes informative priors constructed from external information to enhance the accuracy of our estimates. By doing so, we can provide global seroprevalence estimates that better reflect available information and uncertainty. We assess the accuracy of the proposed method via a simulation study and leave-one-out cross-validation on the real data.
The rest of the paper is organized as follows. In the next section, we introduce the serology test and vaccination datasets for SARS-CoV-2 and briefly review the model proposed in [11] for constructing an informative prior. In Section 3, we propose a new Bayesian approach to estimate the world seroprevalence of SARS-CoV-2. Section 4 presents the results of empirical analysis using real data. Finally, a discussion is given in Section 5.
2. Materials
2.1. Vaccine data
In this subsection, we introduce the notation used in the rest of the paper and describe datasets for estimating the number of effectively vaccinated people by country. The datasets include the vaccinations, delivery amount of vaccines, and observational studies for vaccine effectiveness.
2.1.1. Vaccination data by country
We utilize the vaccination data given in [13], which are collected from official public reports on vaccinations against COVID-19 by country. The dataset contains the cumulative vaccine doses administered, the cumulative number of fully vaccinated people, the report dates, and information on vaccine manufacturers. As of 31 July 2021, the dataset covers 182 countries.
We denote the jth report date of the ith country using where and , and the cumulative doses administrated and the cumulative number of fully vaccinated people until the date are denoted by and , respectively, for the jth report of the ith country. Note that is observed for all i and j, while is not observed in some reports. Specifically, is not observed at all in two countries, Cote d'Ivoire and Ethiopia, and is partially observed in 113 countries. We denote the set of vaccine manufacturers used at the corresponding date by . For example, if the vaccines produced by AstraZeneca and Pfizer-BioNTech are only used at the jth report date of the ith country, then .
We define as the cumulative doses by vaccines from the kth manufacturer for , where K is the number of vaccine manufacturers in the whole vaccination data. With this definition, we have . In the vaccination data we consider, are observed in 32 countries.
2.1.2. Delivery amount of vaccines
The delivery data in [16], as of 31 July 2021, give the number of doses that each country has received. The delivery data consist of publicly reported delivered vaccine amounts, including bilateral agreements, COVAX shipments, and donations. Among the 182 countries providing vaccination reports (Section 2.1.1), delivery data are available for 140 countries. We use these data for the estimation of missing values of .
Let be the set of country indexes having the delivery amount data, and let be the delivery amount of the kth vaccine in the ith country, . We define as
which denotes the proportion of the kth vaccine delivered in the ith country. Note that for the case , this definition is based on the assumption that the delivery amount of the kth vaccine in a country is affected by the total supply of this vaccine.
2.1.3. Observational studies for vaccine effectiveness
In the vaccination data introduced in Section 2.1.1, twelve kinds of vaccines are used. We identify the vaccines by the names of their manufacturers, which are listed in Table 1. We categorize these vaccines into three groups called type 1, 2, and 3 vaccines. The type number is the number of doses required for one person to be fully vaccinated.
Table 1.
The list of vaccine manufacturers in the vaccination data [13].
type | manufacturer | interval (days) | number of studies (fully/partially) |
---|---|---|---|
1 | Janssen | – | 8/0 |
1 | CanSino | – | 0/0 |
2 | AstraZeneca (AZ) | 84 | 17/15 |
2 | Pfizer | 21 | 80/46 |
2 | Sinopharm | 21 | 1/0 |
2 | Sputnik V | 21 | 1/0 |
2 | Sinovac | 14 | 2/1 |
2 | Moderna | 28 | 40/26 |
2 | Covaxin | 28 | 0/0 |
2 | QazVac | 21 | 0/0 |
2 | EpiVacCorona | 21 | 0/0 |
3 | RBD-Dimer | 56 | 0/0 |
Note: The third column gives the recommended interval between the first and last doses of each vaccine, obtained from [15]. The fourth column gives the number of studies on vaccine effectiveness, counted separately for fully vaccinated and partially vaccinated people.
A vaccine is evaluated by its efficacy or effectiveness, which measures how well vaccination protects people against infection, symptomatic illness, hospitalization, or death. While efficacy is based on controlled clinical trials, effectiveness is based on real-world observational studies. In this paper, we consider effectiveness since we analyze real-world vaccination data.
Higdon et al. [9] conducted a systematic review of COVID-19 effectiveness studies. They collected 107 effectiveness studies, categorized into four groups: effectiveness studies against death, severe disease, symptomatic disease, and any infection. We use the effectiveness studies against any infection for seroprevalence estimation. This leaves 69 studies for 7 vaccines: Pfizer, Moderna, AstraZeneca (AZ), Sputnik V, Janssen, Sinovac, and Sinopharm. We summarize the 69 observational studies in Table 1. Observational studies for type 2 vaccines report effectiveness separately for fully vaccinated and partially vaccinated people.
2.2. Serological survey data
We collect the serological survey data from SeroTracker, a knowledge hub of COVID-19 serosurveillance [1]. We exclude surveys that are at risk of sample bias. Specifically, we apply the following two exclusion conditions:
The sample is collected from a sub-population.
The seroprevalence is lower than the COVID confirmed population rates.
The survey data collected by SeroTracker include surveys of specific sub-populations, such as a particular region, age group, or healthcare workers. We exclude these surveys so that our analysis represents the national population, which leaves 126 serological surveys after the first exclusion criterion. The second exclusion criterion rests on the distinction between individuals with confirmed COVID-19 and those with detectable SARS-CoV-2 antibodies: confirmed cases are expected to be a subset of those with antibodies, so a survey reporting a seroprevalence below the confirmed-case rate is likely to be biased and is excluded. After applying the second criterion, as of 31 July 2021, we have 99 serological surveys from 45 countries. Each survey has its own sampling period. The histogram of the last dates in the sampling periods is shown in Figure 1.
Figure 1.
The histogram of the last dates in the sampling periods for 99 nationwide serological surveys.
3. A Bayesian method for the seroprevalence estimation
We present a Bayesian method to estimate the seroprevalence. Specifically, we estimate the seroprevalence from two parts: the proportions of the effectively vaccinated and of the infected, which are denoted by and , respectively.
Recall that the effectively vaccinated are people with antibodies produced by vaccines and that the infected are those who have acquired antibodies through infection. We define the seroprevalence of the ith country at date t as
where the product terms represent the cases in which the infected are vaccinated without knowledge of their infection. We provide Bayesian models to estimate and in Subsections 3.1 and 3.2, respectively, for each country and date.
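As a minimal sketch of how the two parts can be combined, assume that infection and effective vaccination are independent within a country, so that the overlap between the two groups is the product of the two proportions. The names `theta_v`, `theta_i` and `combined_seroprevalence` are illustrative and do not come from the paper, and the exact form of the definition above may differ.

```python
def combined_seroprevalence(theta_v: float, theta_i: float) -> float:
    """Proportion with antibodies from vaccination, infection, or both.

    Assumption (not taken from the paper): the two events are independent,
    so the overlap (infected people who are also effectively vaccinated)
    equals theta_v * theta_i and is subtracted once.
    """
    for p in (theta_v, theta_i):
        if not 0.0 <= p <= 1.0:
            raise ValueError("proportions must lie in [0, 1]")
    return theta_v + theta_i - theta_v * theta_i
```

Under this assumption the result equals one minus the probability of having neither source of antibodies, `1 - (1 - theta_v) * (1 - theta_i)`.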
3.1. Models for vaccine induced seroprevalence
For the estimation of , we propose a Bayesian model to estimate the number of effectively vaccinated people. Let denote the number of effectively vaccinated people at the jth report date in the ith country. Note that the index j in indicates the report index of the vaccination data (Section 2.1.1), and vaccination reports are not given for every day. If s for are given, we can obtain as
(1) |
where , and is the population of the ith country. When there is no report on date t, we use the most recent report before date t. Thus, we focus on the estimation of for the estimation of .
Let be the number of people fully vaccinated with the kth vaccine at the jth report date in the ith country, and and be the effectiveness of the kth vaccine for fully vaccinated people and for those who have received at least one dose but have not finished the required doses, respectively. We assume that the distribution of is
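The carry-forward rule for dates without a report can be sketched as a lookup in the sorted list of report dates. This is an illustration only: the function name `latest_report_index` is hypothetical, and we assume that the rule means the latest report dated on or before t.

```python
from bisect import bisect_right
from datetime import date
from typing import List, Optional


def latest_report_index(report_dates: List[date], t: date) -> Optional[int]:
    """Index of the most recent report dated on or before t.

    `report_dates` must be sorted in increasing order. Returns None when
    every report is later than t (no vaccination report is available yet).
    """
    pos = bisect_right(report_dates, t)  # number of reports dated <= t
    return pos - 1 if pos > 0 else None
```

For example, with reports on 1, 10 and 20 July 2021, a query for 15 July returns the index of the 10 July report.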
(2) |
where denotes the required doses of the kth vaccine.
The term in (2) represents the people partially vaccinated with the kth vaccine. If , since by definition, this term is zero. If , , which is the number of people who have received only one dose. If , is the sum of the number of people vaccinated with one dose and twice the number of people vaccinated with two doses. Under the assumption that the number of people vaccinated once and twice is the same, is equal to the number of people who have received at least one dose but have not finished the required number of doses. We are aware that this assumption is not warranted, but since the vaccine requiring 3 doses is used in only one country, Uzbekistan, we believe that its effect is not critical.
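The case analysis above can be illustrated with a small sketch for a single vaccine. The function names and the argument names `cum_doses`, `fully_vaccinated` and `required_doses` are ours. The 2/3 scaling in the three-dose case is our own arithmetic under the equal-counts assumption stated above (n people with one dose and n with two doses account for 3n doses and 2n people); the scaling used in (2) may be written differently.

```python
def unfinished_doses(cum_doses: int, fully_vaccinated: int, required_doses: int) -> int:
    """Doses given to people who have not yet completed the schedule."""
    if required_doses not in (1, 2, 3):
        raise ValueError("required_doses must be 1, 2 or 3")
    return cum_doses - required_doses * fully_vaccinated


def partially_vaccinated_people(cum_doses: int, fully_vaccinated: int,
                                required_doses: int) -> float:
    """People with at least one dose who have not finished the schedule."""
    rest = unfinished_doses(cum_doses, fully_vaccinated, required_doses)
    if required_doses == 3:
        # rest = n1 + 2 * n2; assuming n1 == n2 == n, rest = 3n and the
        # number of partially vaccinated people is n1 + n2 = 2n.
        return rest * 2 / 3
    # One-dose vaccines leave nobody partially vaccinated (rest == 0);
    # for two-dose vaccines each unfinished dose is one person.
    return float(rest)
```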
Since some of , , and are not observed, we need statistical models for these variables. In Sections 3.1.1–3.1.3, we suggest a method to specify and . In Section 3.1.4, we suggest a method to specify and .
3.1.1. Model for
We consider a multinomial regression model for given and , which are defined in Sections 2.1.1 and 2.1.2, respectively. Let be the response vector and be a covariate vector, which is to be defined with and for , where for a positive integer n. We assume
(3) |
where is the regression coefficient. Model (3) implies that
(4) |
for all . Equation (4) means that the ratio of usage probability of the xth vaccine to that of the yth vaccine, , is proportional to the ratio of to after logarithm transformation. This assumption is examined via visualization after the definition of .
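One reading of (4) is that the usage probability of each vaccine is proportional to its covariate raised to the power of the regression coefficient, so that the log ratio of two usage probabilities is the coefficient times the log ratio of the covariates. The sketch below illustrates this reading only. The names `usage_probabilities`, `x` and `beta` are hypothetical, and giving probability zero to vaccines with a zero covariate is our assumption.

```python
from typing import List


def usage_probabilities(x: List[float], beta: float) -> List[float]:
    """Usage probabilities with p_k proportional to x_k ** beta.

    Vaccines with x_k == 0 (not in use) receive probability zero.
    """
    weights = [v ** beta if v > 0 else 0.0 for v in x]
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one covariate must be positive")
    return [w / total for w in weights]
```

With `beta` equal to 1 the probabilities are simply the normalized covariates, which is the case suggested by a coefficient concentrated around 1.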
We now define using the variables for delivery amount and the numbers of doses administrated for . In the definition of , we reflect the idea that is positively dependent both on the delivery amount of the kth vaccine in the ith country and the period during which the kth vaccine is used. First, let
(5) |
where is the kth vaccine, and . The variable is defined by multiplying the number of doses administrated at the date of the th report, , to the delivery amount of the kth vaccine in the ith country if the kth vaccine is used at this date. Otherwise, we set as zero. Then, we define . Figure 2 is the scatter plot for the points in the set , and shows that the linearity assumption in (4) is reasonable.
Figure 2.
The scatter plot for the points in the set .
We assign a non-informative prior distribution for :
Theorem 3.1 shows that the posterior distribution under the flat prior is proper. The proof is given in the supplementary material.
Theorem 3.1
Suppose follows the distribution (3) for and . Let . If there exists such that , then
where is constructed by , and is the density function with parameter and observation .
3.1.2. Model for
There are missing values in (the cumulative number of fully vaccinated people at the jth report date of the ith country), and we propose a distribution for the missing values. To do this, we first present methods for three simple cases in which only one type of vaccine is used in the country i up to the report date , and then extend them to the general case in which mixed types of vaccines are used in the country i up to the report date .
In Case 1 in which only type 1 vaccines are used, is easily derived from since the vaccination is completed with only one dose. Thus, we have
(6) |
In Case 2, in which only type 2 vaccines are used, we model the random variable with a Poisson distribution. Note that is the number of doses administered to people who have received one dose but have not finished vaccination as of the jth report date of the ith country. We assume that the longer the interval between the first and the last doses, the larger is. We also assume that the more doses recently administered, the larger is.
To specify the doses recently administered, we address the relation between the report index j and the corresponding report date. For each report index j, is defined as the report date, and satisfies . In the vaccination data, there exists an index j such that , i.e. reports are not given for every day. When we need vaccination data for date d with , we use the data from the closest report. Specifically, we define , to indicate the closest report index from date , as
for country index i, report index j and positive integer δ. According to the definition of , when there is more than one minimizer in , we use the smallest index. In this paper, we set , and if there is no confusion, we let denote . Using the definition of , we define representing the average of daily doses recently administered, and we define approximating the doses administered during the most recent T days, where T is the required interval between the first and last doses.
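The closest-report rule and the recent-doses approximation can be sketched as follows. This is an illustration under stated assumptions: the target date is taken to be the date of report j minus δ days, the daily average is the difference in cumulative doses divided by the number of days between the two reports, and all function and argument names are ours.

```python
from datetime import date
from typing import List


def closest_report_index(report_dates: List[date], j: int, delta: int) -> int:
    """Index of the report closest to (date of report j) minus delta days.

    Ties are resolved in favour of the smallest index.
    """
    target = report_dates[j].toordinal() - delta
    gaps = [abs(d.toordinal() - target) for d in report_dates]
    return gaps.index(min(gaps))  # list.index returns the first minimizer


def recent_doses(report_dates: List[date], cum_doses: List[float],
                 j: int, delta: int, interval_days: float) -> float:
    """Approximate doses given in the `interval_days` days before report j."""
    j0 = closest_report_index(report_dates, j, delta)
    days = report_dates[j].toordinal() - report_dates[j0].toordinal()
    if days <= 0:
        return 0.0  # no earlier report to difference against
    daily_average = (cum_doses[j] - cum_doses[j0]) / days
    return daily_average * interval_days
```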
Supposing only one kind of type 2 vaccine is used, we propose the regression model
(7) |
This model reflects the assumption that is positively related to the doses administered during the most recent T days. Recall that is the number of doses administered to people who have received one dose but have not finished vaccination as of the jth report date of the ith country. We suppose that people who have received only one dose had their first dose within the most recent T days, based on the required interval.
Model (7) can be used only when one kind of type 2 vaccine is used. We extend (7) to the case in which kinds of type 2 vaccines are possibly used, where is a positive integer larger than 1. We replace T in with the weighted mean of the intervals, . Here is the required interval between the first and last doses of the kth vaccine. We define as
(8) |
Recall the definition of in (5). The variable is zero when the kth vaccine is not used at the th report date of the ith country; otherwise, this variable represents the delivery amount of the kth vaccine in the ith country multiplied by the doses administrated at the corresponding date. Thus, is constructed from the three factors: the delivery amount, the doses administrated during recent days, and whether the kth vaccine is used. Using the weighted mean of the intervals, we define to replace in (7). We suggest the distribution for Case 2 as
(9) |
where is the index set for type 2 vaccines.
Next, we propose a model for Case 3, in which only type 3 vaccines are used, using an idea similar to that of Case 2. To do this, we use the random variable instead of . Here the variable represents the doses administered to people who have not finished vaccination. Then we consider the Poisson model as
where , and is the index set of type 3 vaccines. We can re-express this distribution as
(10) |
Finally, we combine the models (6), (9) and (10) to construct the model for the general case. Let be the weight of type l vaccines for l = 1, 2, 3 with , which are defined as for l = 1, 2, 3. By combining (6), (9) and (10), we propose the generalized model as
(11) |
We choose the flat prior distribution on and ,
The following theorem shows that this prior induces a proper posterior distribution. The proof of this theorem is given in the supplementary material.
Theorem 3.2
Let n be a positive integer with , and let and . If there exists a pair of indexes i and j such that , then
where .
3.1.3. Distributional assumption for
In this subsection, we provide a distribution for given and for . This distribution is based on the following three premises:
for
is positively dependent on for
The first premise is obvious from the definitions of and , and the second premise is based on the definitions of and . When a type 1 vaccine is considered, the number of fully vaccinated people coincides with the number of doses since only one dose is required for this type of vaccine. Next, we address the third premise. Recall that is the interval between the first and last doses of the kth manufacturer's vaccine, and is defined so that . Those who have received the first dose of the kth vaccine by the th report date are expected to be fully vaccinated by the jth report date. Thus, we assume that is positively dependent on .
Using the premises, we suggest a distribution for for . We let , which is the vector comprised of s excluding the type 1 vaccines. Likewise we let . Given , and , we suggest the distribution for as
3.1.4. Model for the estimation of the vaccine effectiveness parameters
We propose a hierarchical model to analyze the vaccine effectiveness studies introduced in Section 2.1.3. The hierarchical model extends the random effect model for meta-analysis proposed in [3]. The model in [3] is designed for the meta-analysis of one vaccine. We propose a hierarchical model that handles more than one vaccine.
We review the model in [3]. Suppose we have n studies for the effectiveness of one vaccine. Bodnar et al. [3] define the effect size as the log risk ratio, , where is the vaccine effectiveness. The ith study gives the effect size with the standard error . The random effect model supposes that are generated from
(12) |
where represents the true effect size of the ith study and represents the overall mean of the true effect size. The parameter ω is the heterogeneity parameter, which represents environmental differences among the studies. For the Bayesian inference of model (12), [4] and [3] suggest the Berger and Bernardo reference prior as
We extend model (12) to analyze the observational studies of more than one vaccine. Suppose we have observational studies for K vaccines. Let and denote the results of the ith study for partially and fully vaccinated of the kth vaccine, respectively, and , where is the number of observational studies for the kth vaccine. The hierarchical model assumes that and are generated from the following distribution:
(13) |
where and represent the true effect size of the ith study for partially and fully vaccinated of the kth vaccine, respectively. The and represent the overall effect size for partially and fully vaccinated of the kth vaccine, respectively, and and represent the effectiveness of overall vaccines for partially and fully vaccinated of the kth vaccine, respectively. We append constraints on , and in the hierarchical model as follows:
The constraints imply that the effectiveness increases as the number of doses increases.
We suggest the following prior distribution:
For the parameter ω, we employ the prior distribution used with the random effect model (12). We assign flat priors to the parameters and , motivated by Gelman et al. [8]. We estimate the effectiveness of the kth vaccine from the posterior distribution of . If no observational studies of the kth vaccine exist, i.e. , the posterior distribution of is derived from the overall mean parameter in (13).
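To make the effect-size scale concrete, the sketch below converts a vaccine effectiveness to a log risk ratio and back, using the fact that effectiveness is one minus the risk ratio, and simulates observed effect sizes from a normal-normal random effect model for a single vaccine. We assume here that (12) has this standard form; the function names and the argument names `mu`, `omega` and `std_errors` are illustrative.

```python
import math
import random
from typing import List


def ve_to_log_rr(ve: float) -> float:
    """Log risk ratio for effectiveness ve, since ve = 1 - risk ratio."""
    return math.log(1.0 - ve)


def log_rr_to_ve(log_rr: float) -> float:
    """Inverse transform: effectiveness from a log risk ratio."""
    return 1.0 - math.exp(log_rr)


def simulate_study_effects(mu: float, omega: float, std_errors: List[float],
                           rng: random.Random) -> List[float]:
    """Observed effect sizes from a normal-normal random effect model.

    Each study has a true effect drawn around the overall mean `mu` with
    heterogeneity `omega`; the observed effect adds sampling noise with
    the study's standard error.
    """
    observed = []
    for se in std_errors:
        true_effect = rng.gauss(mu, omega)
        observed.append(rng.gauss(true_effect, se))
    return observed
```

An effectiveness of 0.9 corresponds to a log risk ratio of log(0.1), so more effective vaccines have more negative effect sizes.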
3.2. Models for infection induced seroprevalence
In this section, we propose a method to estimate using a hierarchical model, an extension of the model (14) proposed by Lee et al. [11],
(14) |
where N is the number of subjects in a serosurvey, X is the number of subjects who test positive, and are the sensitivity and specificity of the serology test, respectively, and θ is the seroprevalence. While model (14) is used to analyze a single serosurvey in one country, we propose a hierarchical model to analyze the serosurvey data over countries given in Section 2.2.
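The role of sensitivity and specificity can be illustrated with the standard binomial form, in which a subject tests positive with probability θ times the sensitivity plus (1 − θ) times one minus the specificity. We assume that (14) takes this form. The second function is the classical Rogan-Gladen point estimate, shown only for intuition; it is not the Bayesian estimator used in this paper, and both function names are ours.

```python
def apparent_positive_rate(theta: float, sensitivity: float, specificity: float) -> float:
    """Probability that a randomly chosen subject tests positive."""
    return theta * sensitivity + (1.0 - theta) * (1.0 - specificity)


def rogan_gladen(positives: int, n: int, sensitivity: float, specificity: float) -> float:
    """Classical point estimate of seroprevalence corrected for test error.

    Inverts `apparent_positive_rate` at the observed positive fraction and
    clips the result to [0, 1].
    """
    raw = positives / n
    estimate = (raw + specificity - 1.0) / (sensitivity + specificity - 1.0)
    return min(1.0, max(0.0, estimate))
```

For instance, with seroprevalence 0.1, sensitivity 0.9 and specificity 0.98, the expected positive fraction is 0.108 rather than 0.1.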
First, we introduce a reparameterized form of model (14) in Section 3.2.1, and we propose the hierarchical model in Section 3.2.2 using the reparameterized model. We introduce notations for this section. We use 99 serosurveys introduced in Section 2.2, and let and denote the numbers of survey samples and test-positive samples in the lth serosurvey, respectively, . The index represents the country index in which the lth serosurvey is conducted, and the index indicates the last date in the sampling period of the lth serosurvey.
3.2.1. Reparameterization of model for one serosurvey
We reparametrize model (14) since we are interested in the seroprevalence by infection . The reparameterized model is
(15) |
, where and are the sensitivity and specificity of the serology test used in the lth survey, respectively. Recall that and denote the seroprevalence by infection and the proportion of the effectively vaccinated, respectively, in the th country at date. If a serosurvey is conducted before vaccination, then . Note that among 99 serosurveys, 80 surveys are conducted before vaccination.
We construct a prior distribution on from the number of effectively vaccinated in (2), divided by the population. Recall that the distribution for the number of effectively vaccinated is derived only for dates when the vaccination report is provided. If there is no vaccination report of the th country in date , we use the most recent report from the date. Given the prior on , we propose a Bayesian method to estimate in the following section.
3.2.2. Model for serosurvey data over countries
We propose a hierarchical model to analyze the serosurvey data over countries. Let denote the proportion of cumulative confirmed cases, referred to as the confirmed ratio, in the th country at date. We assume that the random variable is explained by a country-specific random effect and country statistics: the adult population density and GDP per capita of the corresponding country. Note that the random variable represents the ratio of the number of infected to that of confirmed. We represent this assumption as
(16) |
where and are the standardized log adult population density and log of GDP per capita, and is the truncated normal distribution with mean μ, covariance and the range of . Combining (15) and (16), we construct the hierarchical model as
(17) |
Next, we describe prior distributions on , τ, , σ, , , and . As suggested in Section 3.2.1, we use the distribution (2) for the prior on . Gelman et al. [8] suggested the flat prior for the standard deviation σ in hierarchical models, and they also showed that this prior gives a proper posterior distribution when flat priors are given for the other parameters, , τ, and for our model. For and , we construct prior distributions based on the method in Section 4 of [11]. Details are given in the supplementary material.
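The regression in (16) can be illustrated by simulating the infection-induced seroprevalence for one country and date. The sketch assumes that the log ratio of seroprevalence to confirmed ratio follows a normal distribution truncated to the interval from 0 to minus the log of the confirmed ratio, so that the seroprevalence lies between the confirmed ratio and 1. This truncation range, the simple rejection sampler, and all names are our assumptions rather than the specification used in the paper.

```python
import math
import random


def sample_infection_seroprevalence(confirmed_ratio: float, country_effect: float,
                                    beta_density: float, beta_gdp: float,
                                    x_density: float, x_gdp: float, sigma: float,
                                    rng: random.Random, max_tries: int = 100000) -> float:
    """Draw a seroprevalence by infection given the confirmed ratio.

    log(theta_I / confirmed_ratio) is drawn from a normal distribution with
    a country effect plus covariate terms as its mean, truncated by
    rejection to (0, -log(confirmed_ratio)).
    """
    mean = country_effect + beta_density * x_density + beta_gdp * x_gdp
    upper = -math.log(confirmed_ratio)
    for _ in range(max_tries):
        log_ratio = rng.gauss(mean, sigma)
        if 0.0 < log_ratio < upper:
            return confirmed_ratio * math.exp(log_ratio)
    raise RuntimeError("rejection sampler did not find a valid draw")
```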
4. Results
In this section, we give the results of the Bayesian inference for the regression models and the hierarchical models in Section 3, and we give the results of world seroprevalence estimation. We use NIMBLE [5] for the Bayesian inference of these models. In each inference, we generate 4000 posterior samples, including 2000 burn-in samples, for 4 chains. All code is available at https://github.com/klee564/worldsero.
In Section 4.1, we present a simulation study for the models proposed in Section 3. In Section 4.2, we give the posterior distributions of the regression coefficients and the vaccine effectiveness parameters. In Section 4.3, we derive the predictive posterior distributions of and for each date and country and summarize the posterior distributions to obtain the world seroprevalence.
4.1. Simulation study
We conduct a simulation study to evaluate the models proposed in Section 3. We set the parameters in (4), in (11), and in (13) as the random values from the following distributions:
We set the parameters and σ in (17) as the random values from the following distributions:
We set the covariates and the number of observations as the real data values and generate the simulation data with 100 repetitions.
We estimate the parameter of the models by the Bayesian method, which gives credible intervals for the parameters. We evaluate the accuracy of the Bayesian method by the coverage probability for the true parameter of the credible intervals since we are interested in interval estimation. Table 2 gives the coverage probabilities of the credible intervals for the regression coefficients and vaccine effectiveness parameters. The coverage probabilities attain the nominal probability.
Table 2.
Parameter | ||||||
---|---|---|---|---|---|---|
Coverage probability |
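The coverage computation can be illustrated on a toy model, which is not one of the models in Section 3. For normal data with known standard deviation and a flat prior on the mean, the posterior of the mean is normal around the sample mean, so the 95% credible interval has a closed form. The function below repeats the simulate-and-fit cycle and reports the fraction of intervals that contain the true parameter. All names are ours.

```python
import math
import random


def coverage_probability(n_rep: int, n_obs: int, sigma: float,
                         rng: random.Random, z: float = 1.959964) -> float:
    """Fraction of 95% credible intervals that contain the true mean.

    Toy model: y ~ Normal(mean, sigma^2) with sigma known and a flat prior
    on the mean, so the posterior is Normal(ybar, sigma^2 / n_obs).
    """
    hits = 0
    for _ in range(n_rep):
        true_mean = rng.uniform(-1.0, 1.0)  # random true parameter per repetition
        ys = [rng.gauss(true_mean, sigma) for _ in range(n_obs)]
        post_mean = sum(ys) / n_obs
        half_width = z * sigma / math.sqrt(n_obs)
        if post_mean - half_width <= true_mean <= post_mean + half_width:
            hits += 1
    return hits / n_rep
```

With only 100 repetitions, as in the simulation study, the estimated coverage carries a standard error of roughly two percentage points, which is worth keeping in mind when comparing with the nominal level.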
4.2. Posterior distributions for regression coefficients and vaccine effectiveness
First, we present the posterior distributions for models (3) and (11), i.e. we give the posterior distributions of , and . The posterior distributions are represented in Figure 3.
Figure 3.
The posterior distribution of is concentrated around 1. Note that, by (4), represents the relation between the usage rate by vaccine and the ratio of vaccine delivery amounts. The posterior means of and are and 1.15 respectively. For the convenience of interpretation, we interpret and via the model (7), a simplified version for the case when only a type 2 vaccine is used. According to (7), we have
(18) |
The regression coefficients explain the relation between and via (18). Recall that the random variable represents the number of doses administrated to people who have gotten one dose but not finished vaccination, and approximates the doses administrated for recent T days, where T is the required interval of the vaccine.
We give the posterior distributions of and in model (17). The posterior samples are summarized in Figure 4.
Figure 4.
The posterior samples of and in model (17).
Recall that the regression coefficients and appear in the following distribution:
The left term represents the log ratio of the seroprevalence by infection to the confirmed ratio, and and are the regression coefficients for the population density and the GDP, respectively. The posterior mean and the credible interval of are 0.105 and - , respectively. For the , the posterior mean and the credible intervals are and - - , respectively.
Next, we give the posterior distributions of the vaccine effectiveness parameter in Figure 5.
Figure 5.
The box plot represents the posterior sample of each vaccine and vaccination status, where vaccination status means whether the subject is partially or fully vaccinated. The x-axis represents the vaccine, and the type in the legend represents the vaccination status.
We find that the vaccine effectiveness for people fully vaccinated with Pfizer and Moderna is and , respectively, which is high compared with the other vaccines.
4.3. Estimation of world seroprevalence
We derive the predictive posterior distributions of and for the ith country at date t. Recall that and denote the proportion of the effectively vaccinated population and the seroprevalence by infection of the ith country at date t, respectively. We also define the seroprevalence of the ith country at date t as
The predictive posterior distribution of is derived from the effectively vaccinated population, in (2), divided by the population . Recall that the index j in indicates the report index, and reports are not given for every day. When there is no report on date t, we use the most recent report before date t. The predictive posterior distribution of is derived from the distribution
in (16), given , and .
Next, we define the trend of world seroprevalences using , and . We define , and as
where P is the total population of all countries considered. The variables , and describe the trends of the world seroprevalence by infection, the proportion of the effectively vaccinated in the world and the world seroprevalence, respectively, and these are shown in Figure 6.
Figure 6.
The trends of , and from the beginning of January 2021 to the end of July 2021. The gray area denotes the credible interval. The black line represents the posterior mean. The left, center, and right graphs represent the trends of , and , respectively.
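The aggregation from countries to the world can be sketched as a population-weighted average applied to each posterior draw, followed by an interval taken from the sorted draws. This assumes that the world quantities are population-weighted means of the country quantities, as suggested by the role of P above. The function names, the dictionary layout, and the simple order-statistic interval are our own choices.

```python
import math
from typing import Dict, List, Tuple


def world_draws(country_draws: Dict[str, List[float]],
                populations: Dict[str, float]) -> List[float]:
    """Population-weighted world seroprevalence for each posterior draw."""
    total = sum(populations[c] for c in country_draws)
    n_draws = len(next(iter(country_draws.values())))
    return [
        sum(populations[c] * country_draws[c][s] for c in country_draws) / total
        for s in range(n_draws)
    ]


def credible_interval(draws: List[float], level: float = 0.95) -> Tuple[float, float]:
    """Equal-tailed interval from sorted draws (conservative index rounding)."""
    xs = sorted(draws)
    lower = math.floor((1.0 - level) / 2.0 * (len(xs) - 1))
    upper = math.ceil((1.0 + level) / 2.0 * (len(xs) - 1))
    return xs[lower], xs[upper]
```

Averaging within each draw before summarizing keeps the uncertainty of the country-level estimates in the world-level interval.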
As of 31st July the credible intervals of , and θ are , and , respectively. We compare our estimate with the result by Bergeri et al. [2], which estimates the world seroprevalence in July 2021 as . The centers of our estimate and [2] are and , respectively. The difference between the centers is . The range of our interval estimate covers the interval by Bergeri et al. [2]. Thus, our result is consistent with the result of [2].
We present a treemap in Figure 7, which shows the posterior means of seroprevalences by country on 31st July 2021. The seroprevalences of China and India are and , respectively, which are similar to the world's seroprevalence on this date. France and UK attain over seroprevalence on this date.
Figure 7.
The treemap presents the posterior means of seroprevalence by country on 31st July 2021. Each tile represents a country, and its area is proportional to the corresponding population. The color and the value in each tile represent the seroprevalence when t is 31st July 2021.
We evaluate the accuracy of the credible intervals for the seroprevalence via leave-one-out cross-validation. Each observation in the serosurvey data is a serosurvey for a particular date in a particular country. In each round, we hold out one observation as the test observation, fit the proposed model to the remaining training data, and derive the predictive posterior distribution for the test observation. Out of 99 test observations, 91 fall within the credible interval of the corresponding predictive posterior distribution, i.e. we get a coverage probability of approximately 0.92 (91/99).
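The leave-one-out coverage computation can be sketched as below. This is only an outline under stated assumptions: `predict_interval` stands in for fitting the proposed model and extracting a predictive interval, and `toy_interval` (mean plus or minus two standard deviations of the training values) is a hypothetical placeholder, not the paper's model. The toy data are invented; only the counts 91 and 99 come from the text.

```python
from statistics import mean, stdev

def loo_coverage(observations, predict_interval):
    """Leave-one-out coverage of predictive intervals.

    For each observation, fit on all the others, build an interval, and
    check whether the held-out value lies inside it.
    """
    hits = 0
    for k, y in enumerate(observations):
        train = observations[:k] + observations[k + 1:]
        lo, hi = predict_interval(train)
        if lo <= y <= hi:
            hits += 1
    return hits, hits / len(observations)

def toy_interval(train):
    """Placeholder predictor: mean +/- 2 sample standard deviations."""
    m, s = mean(train), stdev(train)
    return m - 2 * s, m + 2 * s

# Hypothetical seroprevalence observations; the last one is an outlier.
toy_data = [0.10, 0.12, 0.11, 0.13, 0.50]
hits, rate = loo_coverage(toy_data, toy_interval)

# Coverage implied by the counts reported in the text.
reported_rate = 91 / 99
```

In the toy data, only the outlying value 0.50 falls outside its leave-one-out interval, so four of the five observations are covered.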
5. Discussion
We have proposed a novel Bayesian approach to estimate the seroprevalence of COVID-19 antibodies among the global adult population. This approach begins by estimating the seroprevalences due to infection and vaccination for each country, and then employs a Bayesian hierarchical model to combine these estimates into a global seroprevalence figure. Additionally, we constructed informative priors by utilizing external sources, such as data from clinical trials.
There are many studies on the estimation of seroprevalence in a population. However, these studies focus on estimating the seroprevalence for the date and country in which the sample was collected, and hence they do not directly yield an estimate of the world seroprevalence. Furthermore, previous work on vaccination data has mainly concerned the cumulative doses administered and the fully vaccinated population, whereas the method proposed in this paper predicts the effectively vaccinated population using information on the efficacies of the vaccines.
The methods used in this paper could be improved. Firstly, in the hierarchical model for the seroprevalence by infection, additional covariates could be explored and included in the model. The covariates used in this study are national statistics that do not depend on the date. Thus, we expect that the explanatory power could be improved by adding date-dependent covariates, such as the daily number of COVID-19 tests in a country. Secondly, the model could be refined by taking the whole sampling period into account, as the current analysis uses only the last day of the sampling period and therefore may not fully capture the trends in infection rates over time. Lastly, the current study has some limitations: it is based on data up to July 2021, and it does not account for the decline in neutralizing antibodies among vaccinated individuals, potential underreporting of COVID-19 cases, changes in social isolation regulations, population adherence, or test availability. To improve the accuracy of the results, it may be necessary to update the data and to incorporate these factors into the model.
Supplementary Material
Acknowledgments
The first and second authors equally contributed to this work.
Funding Statement
Kwangmin Lee was supported by Chonnam National University (Grant number: 2023-0482) and the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. RS-2023-00211979). Seongil Jo was supported by an Inha University research grant and the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (NRF-2022R1A5A7033499 and RS-2023-00209229). Jaeyong Lee was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2020R1A4A1018207 and NRF-2023R1A2C1003050).
Disclosure statement
No potential conflict of interest was reported by the author(s).
References
1. Arora R.K., Joseph A., Van Wyk J., Rocco S., Atmaja A., May E., Yan T., Bobrovitz N., Chevrier J., Cheng M.P., Williamson T., and Buckeridge D.L., SeroTracker: a global SARS-CoV-2 seroprevalence dashboard, Lancet Infect. Dis. 21 (2021), pp. e75–e76.
2. Bergeri I., Whelan M., Ware H., Subissi L., Nardone A., Lewis H.C., Li Z., Ma X., Valenciano M., Cheng B., Ariqi L.A., Rashidian A., Okeibunor J., Azim T., Wijesinghe P., Le L.-V., Vaughan A., Pebody R., Vicari A., Yan T., Yanes-Lane M., Cao C., Clifton D.A., Cheng M.P., Papenburg J., Buckeridge D., Bobrovitz N., Arora R.K., and Van Kerkhove M.D., Unity Studies Collaborator Group, Global epidemiology of SARS-CoV-2 infection: A systematic review and meta-analysis of standardized population-based seroprevalence studies, Jan 2020-Oct 2021, medRxiv (2021), pp. 2021–12.
3. Bodnar O., Link A., Arendacká B., Possolo A., and Elster C., Bayesian estimation in random effects meta-analysis using a non-informative prior, Stat. Med. 36 (2017), pp. 378–399.
4. Bodnar O., Link A., and Elster C., Objective Bayesian inference for a generalized marginal random effects model, Bayesian Anal. 11 (2016), pp. 25–45.
5. de Valpine P., Turek D., Paciorek C.J., Anderson-Bergman C., Lang D.T., and Bodik R., Programming with models: Writing statistical algorithms for general model structures with NIMBLE, J. Comput. Graph. Stat. 26 (2017), pp. 403–413.
6. Dong Q. and Gao X., Bayesian estimation of the seroprevalence of antibodies to SARS-CoV-2, JAMIA Open 3 (2020), pp. 496–499.
7. Forni G. and Mantovani A., on behalf of the COVID-19 Commission of Accademia Nazionale dei Lincei, Rome, COVID-19 vaccines: Where we stand and challenges ahead, Cell Death Differ. 28 (2021), pp. 626–639.
8. Gelman A., Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper), Bayesian Anal. 1 (2006), pp. 515–534.
9. Higdon M.M., Wahl B., Jones C.B., Rosen J.G., Truelove S.A., Baidya A., Nande A.A., ShamaeiZadeh P.A., Walter K.K., Feikin D.R., Patel M.K., Knoll M.D., and Hill A.L., A systematic review of COVID-19 vaccine efficacy and effectiveness against SARS-CoV-2 infection and disease, medRxiv (2021), pp. 9.
10. Kline D., Li Z., Chu Y., Wakefield W.C., Miller J., Turner A.N., and Clark S.J., Estimating seroprevalence of SARS-CoV-2 in Ohio: A Bayesian multilevel poststratification approach with multiple diagnostic tests, preprint (2020), arXiv:2011.09033.
11. Lee K., Jo S., and Lee J., Seroprevalence of SARS-CoV-2 antibodies in South Korea, J. Korean Stat. Soc. 50(3) (2021), pp. 891–904.
12. Lu H., Stratton C.W., and Tang Y.W., Outbreak of pneumonia of unknown etiology in Wuhan, China: The mystery and the miracle, J. Med. Virol. 92 (2020), pp. 401–402.
13. Mathieu E., Ritchie H., Ortiz-Ospina E., Roser M., Hasell J., Appel C., Giattino C., and Rodés-Guirao L., A global database of COVID-19 vaccinations, Nat. Hum. Behav. 5(7) (2021), pp. 947–953.
14. Stringhini S., Wisniak A., Piumatti G., Azman A.S., Lauer S.A., Baysson H., De Ridder D., Petrovic D., Schrempft S., Marcus K., Yerly S., Vernez I.A., Keiser O., Hurst S., Posfay-Barbe K.M., Trono D., Pittet D., Gétaz L., Chappuis F., Eckerle I., Vuilleumier N., Meyer B., Flahault A., Kaiser L., and Guessous I., Seroprevalence of anti-SARS-CoV-2 IgG antibodies in Geneva, Switzerland (SEROCoV-POP): A population-based study, Lancet 396 (2020), pp. 313–319.
15. The New York Times, Coronavirus vaccine tracker, 2021. Available at https://www.nytimes.com/interactive/2020/science/coronavirus-vaccine-tracker.html. Accessed July 31, 2021.
16. UNICEF, COVID-19 vaccine market dashboard, 2021. Available at https://www.unicef.org/supply/covid-19-vaccine-market-dashboard/. Accessed July 31, 2021.