Estimation of world seroprevalence of SARS-CoV-2 antibodies

Kwangmin Lee; Seongmin Kim; Seongil Jo; Jaeyong Lee

2024 Apr 2;51(15):3039–3058. doi:10.1080/02664763.2024.2335569

PMCID: PMC11536633 PMID: 39507207

Abstract

In this paper, we estimate the seroprevalence of SARS-CoV-2 antibodies by country and derive the seroprevalence for the whole world. To estimate seroprevalence among adults, we use serological surveys (hereafter serosurveys) conducted within each country. Two issues arise when serosurveys are used to estimate world seroprevalence. First, some countries have not conducted a serosurvey. Second, sample collection dates differ from country to country. We address these problems using vaccination data, confirmed case data, and national statistics. We construct Bayesian models that separately estimate the numbers of people with antibodies produced by infection and by vaccination. For the number of people with infection-induced antibodies, we develop a hierarchical model that combines the information in confirmed case data and national statistics. We also propose regression models to estimate missing values in the vaccination data. Using the proposed methods, we obtain a 95% credible interval for the world seroprevalence as of 31 July 2021 of [35.5%, 56.8%].

Keywords: Bayesian model, hierarchical model, SARS-CoV-2 antibodies, vaccination, world seroprevalence

1. Introduction

In early December 2019, the first patient with coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), was identified in Wuhan, China [12]. In the following weeks, the disease spread rapidly throughout China and to other countries, causing worldwide damage, and it remains widespread. According to official reports, COVID-19 has so far caused more than 317 million infections and 5.5 million deaths globally.

Vaccines are a critical tool for protecting people because they induce antibodies against infectious diseases. Every country is working to block the spread of the virus and to treat patients; as part of this effort, countries are administering COVID-19 vaccines, and the majority of people in many countries have been vaccinated. A variety of COVID-19 vaccines are available, e.g. AstraZeneca, Johnson & Johnson, Moderna, Novavax, and Pfizer-BioNTech, and further candidates are currently in Phase III clinical trials [7].

Seroprevalence is the proportion of people in a population who have antibodies to a particular virus, whether produced by previous infection or by vaccination. In this paper, we study the seroprevalence of SARS-CoV-2 antibodies worldwide, particularly among adults, using information officially reported by countries. The available information includes confirmed cases, the number of people vaccinated, the types of vaccines, and serosurvey data.

Various approaches have recently been proposed for estimating the seroprevalence of antibodies to SARS-CoV-2. For example, [6] proposed a Bayesian method with a user-specified likelihood function that incorporates the variability in the specificity and sensitivity of antibody tests, [14] used a Bayesian logistic regression model with random effects for age and sex, and [10] developed a Bayesian multilevel poststratification approach with multiple diagnostic tests. Lee et al. [11] presented a Bayesian binomial model with an informative prior distribution based on clinical trial data for the plaque reduction neutralization test (PRNT), a type of serology test. While these approaches target populations in specific regions rather than the whole world, [2] considered global seroprevalence using a meta-regression method. In this paper, we propose a new Bayesian method for estimating the world seroprevalence of SARS-CoV-2 antibodies and verify it through numerical studies. The method estimates the percentage of people in each country who have developed antibodies through infection or vaccination and combines these estimates using a hierarchical Bayesian model. It also uses informative priors constructed from external information to improve the accuracy of the estimates. As a result, the global seroprevalence estimates better reflect the available information and its uncertainty. We assess the accuracy of the proposed method through a simulation study and, for the real data, leave-one-out cross-validation.

The rest of the paper is organized as follows. In the next section, we introduce the serological survey and vaccination datasets for SARS-CoV-2 and briefly review the model proposed in [11] for constructing an informative prior. In Section 3, we propose a new Bayesian approach to estimating the world seroprevalence of SARS-CoV-2 antibodies. Section 4 presents the results of an empirical analysis of real data. Section 5 concludes.

2. Materials

2.1. Vaccine data

In this subsection, we introduce the notation used in the rest of the paper and describe the datasets used to estimate the number of effectively vaccinated people in each country: vaccination records, vaccine delivery amounts, and observational studies of vaccine effectiveness.

2.1.1. Vaccination data by country

We use the vaccination data from [13], which are collected from official public reports on COVID-19 vaccinations by country. The dataset contains the cumulative number of vaccine doses administered, the cumulative number of fully vaccinated people, the report dates, and the vaccine manufacturers. As of 31 July 2021, the reports cover 182 countries.

We denote the jth report date of the ith country by $d_{i, j}$, where $i = 1, 2, \dots, 182$ and $j = 1, 2, \dots, J_{i}$, and we denote the cumulative number of doses administered and the cumulative number of fully vaccinated people up to date $d_{i, j}$ by $X_{i, j}$ and $Y_{i, j}$, respectively. Note that $X_{i, j}$ is observed for all i and j, while $Y_{i, j}$ is missing in some reports. Specifically, $Y_{i, j}$ is not observed at all in two countries, Cote d'Ivoire and Ethiopia, and is partially observed in 113 countries. We denote the set of vaccine manufacturers whose vaccines are in use on date $d_{i, j}$ by $V_{i, j}$. For example, if only the vaccines produced by AstraZeneca and Pfizer-BioNTech are used at the jth report date of the ith country, then $V_{i, j} = \{\text{AstraZeneca}, \text{Pfizer-BioNTech}\}$.

We define $X_{i, j, k}$ as the cumulative number of doses of the vaccine from the kth manufacturer, $k = 1, \dots, K$, where K is the number of vaccine manufacturers in the whole vaccination data. With this definition, $X_{i, j} = \sum_{k = 1}^{K} X_{i, j, k}$. In the vaccination data we consider, $X_{i, j, k}$ is observed in 32 countries.

2.1.2. Delivery amount of vaccines

As of 31 July 2021, [16] provides delivery data, i.e. the numbers of doses each country has received. The delivery data consist of publicly reported delivered vaccine amounts, including bilateral agreements, COVAX shipments, and donations. Of the 182 countries with vaccination reports (Section 2.1.1), delivery data are available for 140. We use these data to estimate the missing values of $X_{i, j, k}$.

Let $D$ be the set of indices of countries with delivery data, and let ${\tilde{s}}_{i, k}$ be the delivered amount of the kth vaccine in the ith country, $i \in D$. We define $s_{i, k}$ as

s_{i, k} = \begin{cases} {\tilde{s}}_{i, k} \big/ \sum_{k' = 1}^{K} {\tilde{s}}_{i, k'} & \text{if } i \in D, \\ \sum_{i' \in D} {\tilde{s}}_{i', k} \big/ \sum_{k' = 1}^{K} \sum_{i' \in D} {\tilde{s}}_{i', k'} & \text{if } i \notin D, \end{cases}

which is the share of the kth vaccine among the doses delivered to the ith country. For $i \notin D$, the definition uses the pooled share across all countries in $D$, which assumes that the vaccine mix of a country without delivery data reflects the total supply of each vaccine.
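The share computation can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the authors' code: the function name `delivery_shares` is hypothetical, deliveries are stored as a country-by-vaccine array, and a country outside $D$ is marked by a row of NaN values.

```python
import numpy as np


def delivery_shares(deliveries):
    """Return s_{i,k} for an (n_countries, K) array of delivered doses.

    A row of NaN marks a country without delivery data (i not in D);
    rows for countries in D must be NaN-free (use 0 for undelivered vaccines).
    """
    deliveries = np.asarray(deliveries, dtype=float)
    in_d = ~np.all(np.isnan(deliveries), axis=1)  # membership in D
    shares = np.empty_like(deliveries)
    # i in D: the country's own delivery mix over vaccines k'
    own = deliveries[in_d]
    shares[in_d] = own / own.sum(axis=1, keepdims=True)
    # i not in D: pooled mix over all countries in D
    pooled = own.sum(axis=0)
    shares[~in_d] = pooled / pooled.sum()
    return shares


s = delivery_shares([[30, 10, 0], [0, 50, 50], [np.nan, np.nan, np.nan]])
print(s.round(3))
```

Here the third (hypothetical) country has no delivery data, so it receives the pooled shares 30/140, 60/140 and 50/140.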

2.1.3. Observational studies for vaccine effectiveness

The vaccination data introduced in Section 2.1.1 involve twelve vaccines. We identify the vaccines by manufacturer name, as listed in Table 1, and categorize them into three groups, called type 1, 2, and 3 vaccines, where the type number is the number of doses required for one person to be fully vaccinated.

Table 1.

The list of vaccine manufacturers in the vaccination data [13].

type	manufacturer	interval (days)	number of studies (fully/partially vaccinated)
1	Janssen	–	8/0
	CanSino	–	0/0
2	AstraZeneca (AZ)	84	17/15
	Pfizer	21	80/46
	Sinopharm	21	1/0
	Sputnik V	21	1/0
	Sinovac	14	2/1
	Moderna	28	40/26
	Covaxin	28	0/0
	QazVac	21	0/0
	EpiVacCorona	21	0/0
3	RBD-Dimer	56	0/0


Note: The third column gives the recommended interval between the first and last doses of each vaccine, obtained from [15]. The fourth column gives the number of vaccine effectiveness studies, with studies of fully and partially vaccinated people counted separately.

A vaccine is evaluated by its efficacy or effectiveness, both of which measure how well vaccination protects people against infection, symptomatic illness, hospitalization, or death. Efficacy is measured in controlled clinical trials, whereas effectiveness is measured in real-world observational studies. Because we analyze real-world vaccination data, we use effectiveness.

Higdon et al. [9] conducted a systematic review of COVID-19 vaccine effectiveness studies. They collected 107 effectiveness studies, categorized into four groups by outcome: death, severe disease, symptomatic disease, and any infection. For seroprevalence estimation, we use the studies of effectiveness against any infection, which leaves 69 studies covering 7 vaccines: Pfizer, Moderna, AstraZeneca (AZ), Sputnik V, Janssen, Sinovac, and Sinopharm. These 69 observational studies are summarized in Table 1. For type 2 vaccines, the studies report effectiveness separately for fully and partially vaccinated people.

2.2. Serological survey data

We collect the serological survey data from SeroTracker, a knowledge hub for COVID-19 serosurveillance [1]. We exclude surveys at risk of sampling bias, using the following two exclusion conditions:

The sample is collected from a sub-population.
The seroprevalence is lower than the proportion of confirmed COVID-19 cases in the population.

The surveys collected by SeroTracker include surveys of specific sub-populations, such as a particular region, age group, or healthcare workers. We exclude such surveys so that our analysis represents national populations, which leaves 126 serological surveys after the first criterion. The second criterion rests on the distinction between people with confirmed COVID-19 and people with detectable SARS-CoV-2 antibodies: confirmed cases are expected to have antibodies, so a survey's seroprevalence should not fall below the confirmed case rate, and surveys that violate this are excluded. After applying the second criterion, we have 99 serological surveys from 45 countries as of 31 July 2021. Each survey has its own sampling period; the histogram of the last dates of the sampling periods is shown in Figure 1.

Figure 1. — The histogram of the last dates in the sampling periods for 99 nationwide serological surveys.

3. A Bayesian method for the seroprevalence estimation

We present a Bayesian method for estimating the seroprevalence. Specifically, we decompose the seroprevalence into two parts, the proportion of the effectively vaccinated and the proportion of the infected, denoted by $θ^{(V)}$ and $θ^{(I)}$, respectively.

Recall that the effectively vaccinated are people with antibodies produced by vaccines and that the infected are people who acquired antibodies through infection. We define the seroprevalence $θ_{i} (t)$ of the ith country on date t as

θ_{i} (t) = θ_{i}^{(V)} (t) + θ_{i}^{(I)} (t) - θ_{i}^{(V)} (t) θ_{i}^{(I)} (t),

where the product term $θ_{i}^{(V)} (t) θ_{i}^{(I)} (t)$ removes the double counting of infected people who were also vaccinated; writing this overlap as a product assumes that people are vaccinated without knowledge of a previous infection, i.e. that vaccination is independent of infection status. In Sections 3.1 and 3.2, we provide Bayesian models to estimate $θ_{i}^{(V)} (t)$ and $θ_{i}^{(I)} (t)$, respectively, for each country and date.
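As a purely illustrative check of the combination rule, take the hypothetical values $θ_{i}^{(V)} (t) = 0.40$ and $θ_{i}^{(I)} (t) = 0.20$ (not estimates from the data). The rule equals one minus the probability of having neither kind of antibody:

```latex
\theta_i(t) = 0.40 + 0.20 - 0.40 \times 0.20 = 0.52 = 1 - (1 - 0.40)(1 - 0.20),
```

so 8 percentage points are removed as the overlap between the two groups.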

3.1. Models for vaccine-induced seroprevalence

To estimate $θ_{i}^{(V)} (t)$, we propose a Bayesian model for the number of effectively vaccinated people. Let $M_{i, j}$ denote the number of effectively vaccinated people at the jth report date in the ith country. Note that the index j in $M_{i, j}$ is the report index of the vaccination data (Section 2.1.1), and vaccination reports are not available for every day. Given $M_{i, j}$ for $j \in [J_{i}]$, we obtain $θ_{i}^{(V)} (t)$ as

θ_{i}^{(V)} (t) = \begin{cases} 0 & \text{if } \{j \in [J_{i}] : d_{i, j} \leq t\} = \emptyset, \\ M_{i, \tilde{j}} / P_{i} & \text{otherwise}, \end{cases}

(1)

where $\tilde{j} = \max \{j \in [J_{i}] : d_{i, j} \leq t\}$ and $P_{i}$ is the population of the ith country. That is, when there is no report on date t, we use the most recent report before t. Hence, estimating $θ_{i}^{(V)} (t)$ reduces to estimating $M_{i, j}$.
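Equation (1) is a step function of t that carries the latest report forward. A minimal Python sketch, assuming reports are sorted by date and stored as plain lists; the function name is hypothetical:

```python
import bisect
from datetime import date


def vaccine_seroprevalence(report_dates, m_values, population, t):
    """theta_i^(V)(t) in eq. (1) for one country.

    report_dates: increasing dates d_{i,1} < ... < d_{i,J_i}
    m_values[j]: M_{i,j}, the effectively vaccinated count at report j
    """
    n_on_or_before = bisect.bisect_right(report_dates, t)  # #{j : d_{i,j} <= t}
    if n_on_or_before == 0:
        return 0.0  # no report up to t
    j_tilde = n_on_or_before - 1  # max{j : d_{i,j} <= t}, 0-based
    return m_values[j_tilde] / population


dates = [date(2021, 3, 1), date(2021, 3, 5), date(2021, 3, 10)]
print(vaccine_seroprevalence(dates, [100, 400, 900], 1000, date(2021, 3, 7)))  # 0.4
```

Using `bisect_right` makes a report dated exactly on t count as "on or before t", matching the condition $d_{i, j} \leq t$.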

Let $Y_{i, j, k}$ be the number of people fully vaccinated with the kth vaccine at the jth report date in the ith country, and let $E_{k}^{(f)}, E_{k}^{(p)} \in [0, 1]$ be the effectiveness of the kth vaccine for fully vaccinated people and for people who have received at least one dose but have not completed the required doses, respectively. We assume that the distribution of $M_{i, j}$ is

M_{i, j} \sim \sum_{k} (Binom (Y_{i, j, k}, E_{k}^{(f)}) + Binom (\frac{2}{d (k)} (X_{i, j, k} - d (k) Y_{i, j, k}), E_{k}^{(p)})),

(2)

where $d (k)$ denotes the required doses of the kth vaccine.

The term $2 (d (k))^{- 1} (X_{i, j, k} - d (k) Y_{i, j, k})$ in (2) represents the number of people partially vaccinated with the kth vaccine. If $d (k) = 1$, then $X_{i, j, k} = Y_{i, j, k}$ by definition, so this term is zero. If $d (k) = 2$, the term equals $X_{i, j, k} - 2 Y_{i, j, k}$, the number of people who have received only one dose. If $d (k) = 3$, $X_{i, j, k} - 3 Y_{i, j, k}$ is the number of people vaccinated with one dose plus twice the number of people vaccinated with two doses. Under the assumption that these two groups are of equal size, $2 (X_{i, j, k} - 3 Y_{i, j, k}) / 3$ equals the number of people who have received at least one dose but have not completed the required doses. We acknowledge that this assumption may not hold, but since the only vaccine requiring 3 doses is used in a single country, Uzbekistan, we expect its effect to be minor.
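To make the counting in (2) concrete, the following Python sketch draws one value of $M_{i, j}$ for given per-vaccine counts and effectiveness values. It is an illustration only: the function names are hypothetical, and rounding the $d (k) = 3$ approximation to an integer is our own choice, needed because a binomial size must be an integer.

```python
import numpy as np


def partially_vaccinated(x_k, y_k, doses_required):
    """2 (X_{i,j,k} - d(k) Y_{i,j,k}) / d(k), the partially vaccinated count in eq. (2)."""
    return 2.0 * (x_k - doses_required * y_k) / doses_required


def sample_effectively_vaccinated(x, y, d, e_full, e_part, rng):
    """Draw one M_{i,j} from eq. (2); all arguments are per-vaccine sequences."""
    total = 0
    for x_k, y_k, d_k, ef, ep in zip(x, y, d, e_full, e_part):
        # Round because a binomial size must be an integer (our choice for d(k) = 3)
        n_part = int(round(partially_vaccinated(x_k, y_k, d_k)))
        total += rng.binomial(int(y_k), ef) + rng.binomial(n_part, ep)
    return int(total)


rng = np.random.default_rng(0)
print(sample_effectively_vaccinated([100, 150], [100, 60], [1, 2], [0.7, 0.9], [0.0, 0.6], rng))
```

With all effectiveness values set to 1, the draw is deterministic and equals the number of people with at least one dose (here 100 + 60 + 30 = 190).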

Since some of $X_{i, j, k}$, $Y_{i, j, k}$, $E_{k}^{(f)}$ and $E_{k}^{(p)}$ are not observed, we need statistical models for these variables. Sections 3.1.1–3.1.3 model $X_{i, j, k}$ and $Y_{i, j, k}$, and Section 3.1.4 models $E_{k}^{(f)}$ and $E_{k}^{(p)}$.

3.1.1. Model for $X_{i, j, k}$

We consider a multinomial regression model for $X_{i, j, k}$ given $X_{i, j}$ and $s_{i, k}$, which are defined in Sections 2.1.1 and 2.1.2, respectively. With a slight abuse of notation, let $X_{i, j} = (X_{i, j, 1}, X_{i, j, 2}, \dots, X_{i, j, K})^{⊤} \in \mathbb{R}^{K}$ also denote the response vector, whose entries sum to the scalar $X_{i, j}$, and let $w_{i, j} = (w_{i, j, 1}, \dots, w_{i, j, K})^{⊤} \in \mathbb{R}^{K}$ be a covariate vector, to be defined using $s_{i, k}$ and $X_{i, j^{'}}$ for $j^{'} \in [j]$, where $[n] := \{1, 2, \dots, n\}$ for a positive integer n. We assume

\begin{aligned} X_{i, j} & \sim Multinom (X_{i, j}, p_{i, j}), \\ p_{i, j} = (p_{i, j, 1}, \dots, p_{i, j, K})^{⊤} & \propto \big(\exp \{β^{(V_{1})} \log (w_{i, j, 1})\}, \dots, \exp \{β^{(V_{1})} \log (w_{i, j, K})\}\big)^{⊤}, \end{aligned}

(3)

where $β^{(V_{1})} \in R$ is the regression coefficient. Model (3) implies that

\log (p_{i, j, x} / p_{i, j, y}) = β^{(V_{1})} \log (w_{i, j, x} / w_{i, j, y}),

(4)

for all $x, y \in [K]$. Equation (4) states that the log of the ratio of the usage probability of the xth vaccine to that of the yth vaccine, $p_{i, j, x} / p_{i, j, y}$, is proportional to the log of the ratio of $w_{i, j, x}$ to $w_{i, j, y}$. We examine this assumption graphically after defining $w_{i, j}$.
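The probabilities in (3) are a softmax of $β^{(V_{1})} \log (w_{i, j, k})$, i.e. $p_{i, j, k} \propto w_{i, j, k}^{β^{(V_{1})}}$. The sketch below, with hypothetical function names, computes them and the corresponding multinomial log-likelihood; under a flat prior on $β^{(V_{1})}$, summing this log-likelihood over (i, j) gives the log posterior up to a constant. We assume that vaccines with $w_{i, j, k} = 0$ receive probability zero.

```python
import numpy as np


def usage_probabilities(w, beta):
    """p_{i,j,k} proportional to exp(beta * log w_{i,j,k}), as in (3); zero where w = 0."""
    w = np.asarray(w, dtype=float)
    used = w > 0
    logits = np.full(w.shape, -np.inf)
    logits[used] = beta * np.log(w[used])
    logits -= logits[used].max()  # stabilise the softmax
    p = np.exp(logits)  # exp(-inf) = 0 for unused vaccines
    return p / p.sum()


def multinomial_loglik(beta, counts, w):
    """Log-likelihood of one observation of (3), without the multinomial coefficient."""
    counts = np.asarray(counts, dtype=float)
    p = usage_probabilities(w, beta)
    pos = counts > 0
    return float(np.sum(counts[pos] * np.log(p[pos])))


p = usage_probabilities([2.0, 8.0, 0.0], beta=0.5)
print(p)
```

The log-ratio of any two returned probabilities equals $β^{(V_{1})}$ times the log-ratio of the covariates, which is exactly relation (4).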

We now define $w_{i, j}$ using the delivery shares $s_{i, k}$ and the numbers of doses administered $X_{i, j^{'}}$ for $j^{'} \in [j]$. The definition reflects the idea that $w_{i, j, k}$ should increase with both the delivery amount of the kth vaccine in the ith country and the length of the period during which the kth vaccine has been used. First, let

\begin{aligned} d w_{i, j^{'}} & = (d w_{i, j^{'}, 1}, \dots, d w_{i, j^{'}, K}), \\ d w_{i, j^{'}, k} & = s_{i, k} \, d X_{i, j^{'}} \, I (v (k) \in V_{i, j^{'}}), \quad \text{for } k = 1, \dots, K, \end{aligned}

(5)

where $v (k)$ is the kth vaccine, $d X_{i, j^{'}} = X_{i, j^{'}} - X_{i, j^{'} - 1}$, and $X_{i, 0} = 0$. The variable $d w_{i, j^{'}, k}$ is the product of the number of doses administered since the previous report, $d X_{i, j^{'}}$, and the delivery share $s_{i, k}$ of the kth vaccine in the ith country if the kth vaccine is in use at the $j^{'}$th report date; otherwise, $d w_{i, j^{'}, k}$ is zero. We then define $w_{i, j} := \sum_{j^{'} \leq j} d w_{i, j^{'}}$. Figure 2 shows the scatter plot of the points in the set $\{(\log (X_{i, j, x} / X_{i, j, y}), \log (w_{i, j, x} / w_{i, j, y})) : \text{both } X_{i, j, x} \text{ and } X_{i, j, y} \text{ are observed}\}$, which supports the linearity assumption in (4).

Figure 2. — The scatter plot for the points in the set $\{(\log (X_{i, j, x} / X_{i, j, y}), \log (w_{i, j, x} / w_{i, j, y})) : \text{both } X_{i, j, x} \text{ and } X_{i, j, y} \text{ are observed}\}$.
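The construction in (5) amounts to an element-wise product followed by a cumulative sum over reports. A minimal NumPy sketch for a single country, with hypothetical argument names (`in_use[j', k]` encodes $I (v (k) \in V_{i, j^{'}})$):

```python
import numpy as np


def covariate_w(x_total, in_use, shares):
    """Rows w_{i,1}, ..., w_{i,J} of eq. (5) for one country.

    x_total: (J,) cumulative doses X_{i,j}
    in_use:  (J, K) booleans, in_use[j, k] = I(v(k) in V_{i,j})
    shares:  (K,) delivery shares s_{i,k}
    """
    x_total = np.asarray(x_total, dtype=float)
    in_use = np.asarray(in_use, dtype=float)
    shares = np.asarray(shares, dtype=float)
    dx = np.diff(x_total, prepend=0.0)  # dX_{i,j'} with X_{i,0} = 0
    dw = shares[None, :] * dx[:, None] * in_use  # dw_{i,j',k}
    return np.cumsum(dw, axis=0)  # w_{i,j} = sum over j' <= j of dw_{i,j'}


w = covariate_w([100, 300, 600], [[True, False], [True, True], [False, True]], [0.5, 0.5])
print(w)
```

In this toy example the second vaccine enters at the second report, so its covariate starts accumulating only from then on.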

We assign a non-informative prior distribution for $β^{(V_{1})}$ :

π (β^{(V_{1})}) \propto 1.

Theorem 3.1 shows that the posterior distribution under the flat prior is proper. The proof is given in the supplementary material.

Theorem 3.1

Suppose $X_{i, j}$ follows the distribution (3) for $j = 1, 2, \dots, J_{i}$ and $i = 1, 2, \dots, N_{1}$. Let $U_{i, j} = \{k \in [K] : w_{i, j, k} > 0\}$. If there exists $(i, j)$ such that $| \{k \in U_{i, j} : X_{i, j, k} > 0\} | \geq 2$, then

$\int_{- \infty}^{\infty} \prod_{i, j} p (X_{i, j} ∣ p_{i, j} (β^{(V_{1})})) d β^{(V_{1})} < \infty,$

where $p_{i, j} (β^{(V_{1})})$ denotes $p_{i, j}$ as a function of $β^{(V_{1})}$, and $p (X_{i, j} ∣ p_{i, j} (β^{(V_{1})}))$ is the multinomial probability mass function of $X_{i, j}$ with parameter $p_{i, j} (β^{(V_{1})})$.

3.1.2. Model for $Y_{i, j}$

Some values of $Y_{i, j}$ (the cumulative number of fully vaccinated people at the jth report date of the ith country) are missing, and we propose a distribution for them. We first present methods for three simple cases, in which vaccines of only one type have been used in country i up to the report date $d_{i, j}$, and then extend them to the general case, in which vaccines of several types have been used.

In Case 1, in which only type 1 vaccines are used, $Y_{i, j}$ follows directly from $X_{i, j}$ because vaccination is completed with a single dose. Thus,

2 (X_{i, j} - Y_{i, j}) = 0 .

(6)

In Case 2, in which only type 2 vaccines are used, we model the random variable $X_{i, j} - 2 Y_{i, j}$ with a Poisson distribution. Note that $X_{i, j} - 2 Y_{i, j}$ is the number of doses administered to people who have received one dose but have not completed vaccination as of the jth report date of the ith country. We assume that $X_{i, j} - 2 Y_{i, j}$ is larger when the interval between the first and last doses is longer, and larger when more doses have been administered recently.

To quantify the recently administered doses, we relate the report index j to the corresponding report date. For each report index j, $d_{i, j}$ is the report date, and $d_{i, 1} < d_{i, 2} < \dots < d_{i, J_{i}}$. In the vaccination data, there exist indices j with $d_{i, j} - d_{i, j - 1} > 1$, i.e. reports are not available for every day. When we need vaccination data for a date d on which no report exists, i.e. $\{j : d_{i, j} = d\} = \emptyset$, we use the data from the closest report. Specifically, we define $j^{*} (j, δ; i)$, the index of the report closest to date $d_{i, j} - δ$, as

j^{*} (j, δ; i) = \min \Big\{ \mathop{\mathrm{argmin}}_{j^{'} \leq j - 1} | d_{i, j} - d_{i, j^{'}} - δ | \Big\},

for country index i, report index j, and positive integer δ. When the argmin contains more than one index, $j^{*} (j, δ; i)$ is the smallest one. In this paper, we set $δ = 21$, and when no confusion arises, we write $j^{*}$ for $j^{*} (j, δ; i)$. Using $j^{*}$, we define $Z_{i, j} = (X_{i, j} - X_{i, j^{*}}) / (d_{i, j} - d_{i, j^{*}})$, the average number of doses administered per day over the recent period, and $W_{i, j} = Z_{i, j} T$, which approximates the number of doses administered over the most recent T days, where T is the required interval between the first and last doses.
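The report lookup $j^{*} (j, δ; i)$ and the covariates $Z_{i, j}$ and $W_{i, j}$ can be sketched as below. This is illustrative only: report dates are assumed to be integer day numbers, indices are 0-based, and the function names are hypothetical.

```python
def closest_earlier_report(days, j, delta):
    """j*(j, delta; i): among reports j' <= j - 1, the one whose date is closest to
    d_{i,j} - delta; ties go to the smallest index. Indices are 0-based."""
    if j == 0:
        raise ValueError("j* is undefined for the first report")
    gaps = [abs(days[j] - days[jp] - delta) for jp in range(j)]
    return gaps.index(min(gaps))  # list.index returns the first (smallest) minimiser


def recent_dose_covariates(days, x_total, j, delta=21, interval=21):
    """Return (Z_{i,j}, W_{i,j}), where W_{i,j} = Z_{i,j} * T with T = interval."""
    js = closest_earlier_report(days, j, delta)
    z = (x_total[j] - x_total[js]) / (days[j] - days[js])  # average daily doses
    return z, z * interval


days = [0, 10, 20, 30, 40]  # report dates as day numbers
doses = [0, 100, 300, 600, 1000]  # cumulative doses X_{i,j}
print(recent_dose_covariates(days, doses, j=4))  # (35.0, 735.0)
```

Because `list.index` returns the first occurrence of the minimum, ties are resolved toward the smallest index, as in the definition of $j^{*}$.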

Supposing only one kind of type 2 vaccine is used, we propose the regression model

(X_{i, j} - 2 Y_{i, j}) \sim Pois (\exp (β_{0}^{(V_{2})} + β_{1}^{(V_{2})} \log (W_{i, j}))) .

(7)

This model reflects the assumption that $X_{i, j} - 2 Y_{i, j}$ increases with the number of doses administered over the most recent T days: based on the required interval, we suppose that people who have received only one dose received it within the last T days.

Model (7) applies only when a single type 2 vaccine is used. We extend (7) to the case in which $K^{'} > 1$ type 2 vaccines may be used. We replace T in $W_{i, j}$ by the weighted mean of the intervals, $\sum_{k = 1}^{K^{'}} w_{i, j, k}^{*} T_{k}$, where $T_{k}$ is the required interval between the first and last doses of the kth vaccine and $w_{i, j, k}^{*}$ is defined as

w_{i, j, k}^{*} = \frac{\sum_{j^{'} = j^{*}}^{j} d w_{i, j^{'}, k}}{\sum_{k^{'} = 1}^{K} \sum_{j^{'} = j^{*}}^{j} d w_{i, j^{'}, k^{'}}} .

(8)

Recall the definition of $d w_{i, j^{'}, k}$ in (5): it is zero when the kth vaccine is not used at the $j^{'}$th report date of the ith country, and otherwise it is the delivery share of the kth vaccine in the ith country multiplied by the doses administered at that date. Thus, $w_{i, j, k}^{*}$ combines three factors: the delivery amount, the doses administered during the most recent $d_{i, j} - d_{i, j^{*}}$ days, and whether the kth vaccine is in use. Using this weighted mean of the intervals, we define $W_{i, j}^{(2)} = Z_{i, j} \sum_{k \in V^{(2)}} w_{i, j, k}^{*} T_{k}$ to replace $W_{i, j}$ in (7). Since $2 X_{i, j} - 2 Y_{i, j} = X_{i, j} + (X_{i, j} - 2 Y_{i, j})$, we suggest the distribution for Case 2 as

(2 X_{i, j} - 2 Y_{i, j}) \sim X_{i, j} + Pois (\exp (β_{0}^{(V_{2})} + β_{1}^{(V_{2})} \log (W_{i, j}^{(2)}))),

(9)

where $V^{(2)}$ is the index set for type 2 vaccines.
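A sketch of the weights $w_{i, j, k}^{*}$ in (8) and of $W_{i, j}^{(2)}$, with hypothetical names, where `dw` holds the rows $d w_{i, j^{'}}$ of (5) for one country and `type_mask` selects $V^{(2)}$. The same function gives $W_{i, j}^{(3)}$ with a mask for $V^{(3)}$, and summing `w_star[type_mask]` gives the weight $q_{l}$ of the corresponding type.

```python
import numpy as np


def recent_weights(dw, j_star, j):
    """w*_{i,j,k} from eq. (8), using rows j* to j (inclusive) of dw for one country."""
    window = np.asarray(dw, dtype=float)[j_star:j + 1].sum(axis=0)
    return window / window.sum()


def weighted_interval_covariate(z, w_star, intervals, type_mask):
    """W^{(l)}_{i,j} = Z_{i,j} * sum over k in V^{(l)} of w*_{i,j,k} T_k."""
    w_star = np.asarray(w_star, dtype=float)
    intervals = np.asarray(intervals, dtype=float)
    mask = np.asarray(type_mask, dtype=bool)
    return z * np.sum(w_star[mask] * intervals[mask])


# Hypothetical example: Pfizer (21 days), AZ (84 days), Janssen (single dose, T unused)
dw = [[10, 0, 0], [20, 30, 0], [0, 30, 40]]
w_star = recent_weights(dw, j_star=1, j=2)
type2 = [True, True, False]
print(w_star, weighted_interval_covariate(10.0, w_star, [21, 84, 0], type2))
```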

Next, we propose a model for Case 3, in which only type 3 vaccines are used, following the same idea as in Case 2. Instead of $X_{i, j} - 2 Y_{i, j}$, we use the random variable $X_{i, j} - 3 Y_{i, j}$, which counts the doses administered to people who have not completed vaccination. We then consider the Poisson model

(X_{i, j} - 3 Y_{i, j}) \sim Pois (\exp (β_{0}^{(V_{2})} + β_{1}^{(V_{2})} \log (Z_{i, j} \sum_{k \in V^{(3)}} w_{i, j, k}^{*} T_{k}))),

where $W_{i, j}^{(3)} = Z_{i, j} \sum_{k \in V^{(3)}} w_{i, j, k}^{*} T_{k}$ and $V^{(3)}$ is the index set of type 3 vaccines. Since $2 X_{i, j} - 2 Y_{i, j} = \frac{4}{3} X_{i, j} + \frac{2}{3} (X_{i, j} - 3 Y_{i, j})$, we can re-express this distribution as

(2 X_{i, j} - 2 Y_{i, j}) \sim \frac{4}{3} X_{i, j} + \frac{2}{3} Pois (\exp (β_{0}^{(V_{2})} + β_{1}^{(V_{2})} \log (W_{i, j}^{(3)}))) .

(10)

Finally, we combine models (6), (9) and (10) to construct a model for the general case. Let $q_{l} = \sum_{k \in V^{(l)}} w_{i, j, k}^{*}$ be the weight of type l vaccines, l = 1, 2, 3, so that $q_{1} + q_{2} + q_{3} = 1$. Combining (6), (9) and (10), we propose the generalized model

\begin{aligned} 2 (X_{i, j} - Y_{i, j}) & \sim Pois (q_{2} (X_{i, j} + \exp (β_{0}^{(V_{2})} + β_{1}^{(V_{2})} \log (W_{i, j}^{(2)}))) \\ + q_{3} (\frac{4}{3} X_{i, j} + \frac{2}{3} \exp (β_{0}^{(V_{2})} + β_{1}^{(V_{2})} \log (W_{i, j}^{(3)})))) . \end{aligned}

(11)
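The mean of the Poisson distribution in (11) can be computed as follows; the sketch is illustrative and its function names are hypothetical. Type 1 vaccines contribute $q_{1} \cdot 0$ by (6), and a draw D of $2 (X_{i, j} - Y_{i, j})$ is converted back via $Y_{i, j} = X_{i, j} - D / 2$. We skip a type when its weight is zero to avoid evaluating $\log (0)$.

```python
import numpy as np


def general_poisson_mean(x_total, q2, q3, w2, w3, beta0, beta1):
    """Mean of 2 (X_{i,j} - Y_{i,j}) in eq. (11); type 1 vaccines contribute q1 * 0."""
    mean = 0.0
    if q2 > 0:  # skip unused types so log(0) is never evaluated
        mean += q2 * (x_total + np.exp(beta0 + beta1 * np.log(w2)))
    if q3 > 0:
        mean += q3 * (4.0 / 3.0 * x_total + 2.0 / 3.0 * np.exp(beta0 + beta1 * np.log(w3)))
    return float(mean)


def fully_vaccinated_from_draw(x_total, draw):
    """Convert a draw D of 2 (X_{i,j} - Y_{i,j}) into Y_{i,j} = X_{i,j} - D / 2."""
    return x_total - draw / 2.0


rng = np.random.default_rng(1)
lam = general_poisson_mean(1000, q2=1.0, q3=0.0, w2=50.0, w3=0.0, beta0=0.0, beta1=1.0)
print(lam, fully_vaccinated_from_draw(1000, rng.poisson(lam)))
```

For example, with only type 2 vaccines, $X_{i, j} = 1000$ and an expected 50 one-dose recipients, the mean is 1050, and plugging in D = 1050 gives $Y_{i, j} = 475 = (1000 - 50) / 2$.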

We choose the flat prior distribution on $(β_{0}^{(V_{2})}, β_{1}^{(V_{2})})$,

π (β_{0}^{(V_{2})}, β_{1}^{(V_{2})}) \propto 1.

The following theorem shows that the prior induces the proper posterior distribution. The proof for this theorem is given in the supplementary material.

Theorem 3.2

Let n be a positive integer with $n \geq 2$, and let $x_{1}, x_{2}, \dots, x_{n} \in \mathbb{R}$ and $y_{1}, y_{2}, \dots, y_{n} \in \mathbb{N}$. If there exists a pair of indices i and j such that $x_{i} \neq x_{j}$, then

$\int_{- \infty}^{\infty} \int_{- \infty}^{\infty} \prod_{i = 1}^{n} λ_{i}^{y_{i}} \exp (- λ_{i}) d β_{0}^{(V_{2})} d β_{1}^{(V_{2})} < \infty,$

where $λ_{i} = \exp (β_{0}^{(V_{2})} + β_{1}^{(V_{2})} x_{i})$ .
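Under the flat prior, the posterior mode of $(β_{0}^{(V_{2})}, β_{1}^{(V_{2})})$ coincides with the Poisson regression maximum-likelihood estimate. The sketch below finds it with Newton's method; it is a point summary for illustration, not the sampler used for the full posterior, and the function name is hypothetical. The Newton system is solvable only when the covariates are not all equal, the same condition as in Theorem 3.2.

```python
import numpy as np


def poisson_posterior_mode(x, y, n_iter=100, tol=1e-10):
    """Mode of the flat-prior posterior for y_l ~ Pois(exp(beta0 + beta1 * x_l)).

    With a flat prior this is the Poisson-regression MLE, found by Newton's method.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones_like(x), x])
    beta = np.array([np.log(y.mean() + 1e-8), 0.0])  # intercept-only starting value
    for _ in range(n_iter):
        lam = np.exp(design @ beta)
        score = design.T @ (y - lam)  # gradient of the log-likelihood
        info = design.T @ (design * lam[:, None])  # negative Hessian
        step = np.linalg.solve(info, score)  # singular if all x_l are equal
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


rng = np.random.default_rng(2)
x_sim = rng.uniform(0.0, 3.0, size=2000)  # simulated covariates, e.g. log W
y_sim = rng.poisson(np.exp(0.5 + 0.8 * x_sim))
beta_hat = poisson_posterior_mode(x_sim, y_sim)
print(beta_hat)
```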

3.1.3. Distributional assumption for $Y_{i, j, k}$

In this subsection, we provide a distribution for $Y_{i, j, k}$ given $Y_{i, j}$ and $X_{i, j^{'}}$ for $j^{'} \in [j]$ . This distribution is based on the following three premises:

$\sum_{k = 1}^{K} Y_{i, j, k} = Y_{i, j}$
$Y_{i, j, k} = X_{i, j, k}$ for $k \in V^{(1)}$
$Y_{i, j, k}$ is positively dependent on $X_{i, j^{*} (j, T_{k}; i), k}$ for $k \notin V^{(1)}$

The first premise follows from the definitions of $Y_{i, j, k}$ and $Y_{i, j}$, and the second from the definitions of $X_{i, j, k}$ and $Y_{i, j, k}$: for a type 1 vaccine, the number of fully vaccinated people $Y_{i, j, k}$ coincides with the number of doses $X_{i, j, k}$ because only one dose is required. For the third premise, recall that $T_{k}$ is the interval between the first and last doses of the kth manufacturer's vaccine and that $j^{*} (j, T_{k}; i)$ is defined so that $d_{i, j} - d_{i, j^{*}} \approx T_{k}$. People who received the first dose of the kth vaccine by the $j^{*} (j, T_{k}; i)$th report date are expected to be fully vaccinated by the jth report date. Thus, we assume that $Y_{i, j, k}$ is positively dependent on $X_{i, j^{*} (j, T_{k}; i), k}$.

Using the premises, we suggest a distribution for $Y_{i, j, k}$, $k \notin V^{(1)}$. Let $k (1), \dots, k (\tilde{K})$ enumerate the vaccines that are not of type 1, and let ${\tilde{Y}}_{i, j} = (Y_{i, j, k (1)}, Y_{i, j, k (2)}, \dots, Y_{i, j, k (\tilde{K})})$ be the vector of the corresponding $Y_{i, j, k}$. Likewise, let ${\tilde{X}}_{i, j^{*}} = (X_{i, j^{*} (j, T_{k (1)}; i), k (1)}, X_{i, j^{*} (j, T_{k (2)}; i), k (2)}, \dots, X_{i, j^{*} (j, T_{k (\tilde{K})}; i), k (\tilde{K})})$. Given ${\tilde{X}}_{i, j^{*}}$, $Y_{i, j}$ and $X_{i, j, k}$, we suggest the distribution

{\tilde{Y}}_{i, j} \sim Multinom \Big(Y_{i, j} - \sum_{k \in V^{(1)}} X_{i, j, k}, \; {\tilde{X}}_{i, j^{*}} \Big/ \sum_{l = 1}^{\tilde{K}} X_{i, j^{*} (j, T_{k (l)}; i), k (l)} \Big) .
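A sampling sketch for this step, with hypothetical names: type 1 vaccines are fixed at $X_{i, j, k}$, and the remaining fully vaccinated people are split multinomially in proportion to the lagged doses ${\tilde{X}}_{i, j^{*}}$.

```python
import numpy as np


def sample_full_by_vaccine(y_total, x_now, x_lagged, is_type1, rng):
    """Draw (Y_{i,j,1}, ..., Y_{i,j,K}) given Y_{i,j}.

    Type 1 vaccines: Y_{i,j,k} = X_{i,j,k} (second premise).
    Other vaccines: multinomial split of the remainder with probabilities
    proportional to the lagged doses X_{i, j*(j, T_k; i), k} (third premise).
    """
    x_now = np.asarray(x_now, dtype=np.int64)
    x_lagged = np.asarray(x_lagged, dtype=float)
    is_type1 = np.asarray(is_type1, dtype=bool)
    y = np.zeros(x_now.shape, dtype=np.int64)
    y[is_type1] = x_now[is_type1]
    remainder = int(y_total - x_now[is_type1].sum())
    probs = x_lagged[~is_type1] / x_lagged[~is_type1].sum()
    y[~is_type1] = rng.multinomial(remainder, probs)
    return y


rng = np.random.default_rng(3)
# Hypothetical counts: Janssen (type 1), Pfizer and AZ (type 2)
y = sample_full_by_vaccine(1000, [200, 900, 700], [0, 600, 200], [True, False, False], rng)
print(y)
```

By construction every draw satisfies the first premise, $\sum_{k} Y_{i, j, k} = Y_{i, j}$.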

3.1.4. Model for the estimation of the vaccine effectiveness parameters

We propose a hierarchical model for the vaccine effectiveness studies introduced in Section 2.1.3. It extends the random-effects model for meta-analysis proposed in [3], which is designed for a single vaccine, to more than one vaccine.

We first review the model in [3]. Suppose we have n studies of the effectiveness of one vaccine. Bodnar et al. [3] define the effect size as the log risk ratio, $\log (1 - VE)$, where $VE \in [0, 1]$ is the vaccine effectiveness. The ith study reports the effect size $y_{i}$ with standard error $σ_{i}$. The random-effects model assumes that $(y_{1}, σ_{1}), \dots, (y_{n}, σ_{n})$ are generated from

\begin{aligned} y_{i} ∣ θ_{i}, σ_{i} & \sim N (θ_{i}, σ_{i}^{2}), \\ θ_{i} ∣ \bar{θ}, ω^{2} & \sim N (\bar{θ}, ω^{2}), \end{aligned}

(12)

where $θ_{i}$ is the true effect size of the ith study and $\bar{θ}$ is the overall mean of the true effect sizes. The heterogeneity parameter ω represents differences in study settings. For Bayesian inference under model (12), [4] and [3] suggest the Berger and Bernardo reference prior

π (\bar{θ}, ω) \propto ω \sqrt{\sum_{i = 1}^{n} (σ_{i}^{2} + ω^{2})^{- 2}} .
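For a single vaccine, integrating out $θ_{i}$ in (12) gives $y_{i} ∣ \bar{θ}, ω \sim N (\bar{θ}, σ_{i}^{2} + ω^{2})$, so the joint posterior of $(\bar{θ}, ω)$ under the reference prior can be evaluated on a grid. The sketch below does this with NumPy; the grid approach, function names and study values are our illustration, not the authors' implementation or data, and the constrained multi-vaccine model (13) would require a sampler instead. A summary of $\bar{θ}$ is converted to effectiveness via $VE = 1 - \exp (\bar{θ})$.

```python
import numpy as np


def reference_log_prior(omega, sigma):
    """log of omega * sqrt(sum_i (sigma_i^2 + omega^2)^(-2)), up to a constant."""
    return np.log(omega) + 0.5 * np.log(np.sum((sigma**2 + omega**2) ** -2.0))


def grid_posterior(y, sigma, mu_grid, omega_grid):
    """Normalised posterior of (theta_bar, omega) on a grid for model (12),
    with theta_i integrated out: y_i ~ N(theta_bar, sigma_i^2 + omega^2)."""
    y = np.asarray(y, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    log_post = np.empty((mu_grid.size, omega_grid.size))
    for b, om in enumerate(omega_grid):
        var = sigma**2 + om**2
        resid2 = (y[None, :] - mu_grid[:, None]) ** 2 / var  # one row per theta_bar
        loglik = -0.5 * np.sum(np.log(var)) - 0.5 * resid2.sum(axis=1)
        log_post[:, b] = loglik + reference_log_prior(om, sigma)
    post = np.exp(log_post - log_post.max())
    return post / post.sum()


# Hypothetical study results, converted to effect sizes log(1 - VE)
ve = np.array([0.90, 0.88, 0.92])
y_eff, se = np.log(1.0 - ve), np.array([0.10, 0.15, 0.12])
mu_grid = np.linspace(-4.0, 0.0, 801)
omega_grid = np.linspace(1e-3, 1.0, 400)  # omega > 0 avoids log(0)
post = grid_posterior(y_eff, se, mu_grid, omega_grid)
theta_bar_mean = np.sum(post.sum(axis=1) * mu_grid)
print("VE at the posterior mean of theta_bar:", 1.0 - np.exp(theta_bar_mean))
```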

We extend model (12) to observational studies of more than one vaccine. Suppose we have observational studies for K vaccines. Let $(y_{k, i}^{(0)}, σ_{k, i}^{(0)})$ and $(y_{k, i}^{(1)}, σ_{k, i}^{(1)})$ denote the results of the ith study of the kth vaccine for partially and fully vaccinated people, respectively, $i = 1, \dots, n_{k}$, $k = 1, \dots, K$, where $n_{k}$ is the number of observational studies of the kth vaccine. The hierarchical model assumes that, for $d \in \{0, 1\}$, $y_{k, i}^{(d)}$ is generated from the following distribution:

\begin{aligned} y_{k, i}^{(d)} & \sim N (θ_{k, i}^{(d)}, (σ_{k, i}^{(d)})^{2}), \\ θ_{k, i}^{(d)} & \sim N ({\bar{θ}}_{k}^{(d)}, ω^{2}), \\ {\bar{θ}}_{k}^{(d)} & \sim N (μ_{0}^{(d)}, (κ_{0}^{(d)})^{2}), \end{aligned}

(13)

where $θ_{k, i}^{(0)}$ and $θ_{k, i}^{(1)}$ are the true effect sizes of the ith study of the kth vaccine for partially and fully vaccinated people, respectively, ${\bar{θ}}_{k}^{(0)}$ and ${\bar{θ}}_{k}^{(1)}$ are the overall effect sizes of the kth vaccine for partially and fully vaccinated people, and $μ_{0}^{(0)}$ and $μ_{0}^{(1)}$ are the overall effect sizes across all vaccines for partially and fully vaccinated people. We impose the following constraints on $θ_{k, i}^{(d)}$, ${\bar{θ}}_{k}^{(d)}$ and $μ_{0}^{(d)}$ in the hierarchical model:

\begin{aligned} I (θ_{k, i}^{(1)} & \leq θ_{k, i}^{(0)}), \\ I ({\bar{θ}}_{k}^{(1)} & \leq {\bar{θ}}_{k}^{(0)}), \\ I (μ_{0}^{(1)} & \leq μ_{0}^{(0)}) . \end{aligned}

Since the effect size $\log (1 - VE)$ decreases as VE increases, the constraints imply that the effectiveness increases with the number of doses.

We suggest the following prior distribution:

\begin{aligned} π (ω) & \propto ω \sqrt{\sum_{k, i, d} ((σ_{k, i}^{(d)})^{2} + ω^{2})^{- 2}}, \\ π (μ_{0}^{(d)}) & \propto 1, \\ π (κ_{0}^{(d)}) & \propto 1. \end{aligned}

For ω, we adapt the reference prior of the random-effects model (12). Following Gelman et al. [8], we assign flat priors to $μ_{0}^{(d)}$ and $κ_{0}^{(d)}$. We estimate the effectiveness of the kth vaccine from the posterior distribution of ${\bar{θ}}_{k}^{(d)}$, using $VE = 1 - \exp ({\bar{θ}}_{k}^{(d)})$. If no observational study of the kth vaccine exists, i.e. $n_{k} = 0$, the posterior distribution of ${\bar{θ}}_{k}^{(d)}$ is informed only through the overall mean parameter $μ_{0}^{(d)}$ in (13).

3.2. Models for infection-induced seroprevalence

In this subsection, we propose a method to estimate $θ_{i}^{(I)} (t)$ using a hierarchical model that extends model (14) of Lee et al. [11],

X \sim Binom (N, p^{+} θ + (1 - p^{-}) (1 - θ)),

(14)

where N is the number of subjects in a serosurvey, X is the number of subjects who test positive, $p^{+}$ and $p^{-}$ are the sensitivity and specificity of the serology test, respectively, and θ is the seroprevalence. While model (14) analyzes a single serosurvey in one country, we propose a hierarchical model to analyze the serosurvey data across the countries described in Section 2.2.
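To make model (14) concrete, here is a minimal Python sketch (stdlib only; all function names are hypothetical). It computes the probability of a positive test and the binomial log-likelihood, and, for comparison only, the classical Rogan–Gladen moment estimator obtained by solving $X/N = p^{+} θ + (1 - p^{-})(1 - θ)$ for θ. The Rogan–Gladen estimator is a swapped-in frequentist point estimate, not the Bayesian method used in this paper; it assumes $p^{+} + p^{-} > 1$.

```python
import math

def apparent_prevalence(theta, sens, spec):
    """P(test positive) under model (14): sens*theta + (1 - spec)*(1 - theta)."""
    return sens * theta + (1.0 - spec) * (1.0 - theta)

def binom_loglik(x, n, theta, sens, spec):
    """Log-likelihood of x positives out of n under model (14); assumes 0 < p < 1."""
    p = apparent_prevalence(theta, sens, spec)
    log_choose = math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
    return log_choose + x * math.log(p) + (n - x) * math.log1p(-p)

def rogan_gladen(x, n, sens, spec):
    """Moment estimate solving x/n = apparent_prevalence(theta), clipped to [0, 1]."""
    theta = (x / n + spec - 1.0) / (sens + spec - 1.0)
    return min(max(theta, 0.0), 1.0)

# Hypothetical survey: 99 positives out of 1000, sensitivity 0.9, specificity 0.99.
print(rogan_gladen(99, 1000, 0.9, 0.99))
```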

We first introduce a reparameterized form of model (14) in Section 3.2.1 and then, building on it, propose the hierarchical model in Section 3.2.2. We use the 99 serosurveys introduced in Section 2.2. For the lth serosurvey, $l = 1, \dots, 99$, let $N_{l}$ and $X_{l}$ denote the numbers of samples and of test-positive samples, respectively, let $i_{l}$ denote the index of the country in which it was conducted, and let $t_{l}$ denote the last date of its sampling period.

3.2.1. Reparameterization of model for one serosurvey

We reparameterize model (14) because our quantity of interest is the seroprevalence by infection, $θ_{i_{l}}^{(I)} (t_{l})$. The reparameterized model is

\begin{aligned} X_{l} & \sim Binom (N_{l}, p_{l}^{+} θ_{i_{l}} (t_{l}) + (1 - p_{l}^{-}) (1 - θ_{i_{l}} (t_{l}))), \\ θ_{i_{l}} (t_{l}) & = θ_{i_{l}}^{(I)} (t_{l}) + θ_{i_{l}}^{(V)} (t_{l}) - θ_{i_{l}}^{(I)} (t_{l}) θ_{i_{l}}^{(V)} (t_{l}), \end{aligned}

(15)

$l = 1, \dots, 99$, where $p_{l}^{+}$ and $p_{l}^{-}$ are the sensitivity and specificity of the serology test used in the lth survey, respectively. Recall that $θ_{i_{l}}^{(I)} (t_{l})$ and $θ_{i_{l}}^{(V)} (t_{l})$ denote the seroprevalence by infection and the proportion of effectively vaccinated people, respectively, in the $i_{l}$th country on date $t_{l}$. If a serosurvey was conducted before vaccination began, then $θ_{i_{l}}^{(V)} (t_{l}) = 0$ and hence $θ_{i_{l}} (t_{l}) = θ_{i_{l}}^{(I)} (t_{l})$. Of the 99 serosurveys, 80 were conducted before vaccination.
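The second line of (15) can be read as the probability of having antibodies from infection or from effective vaccination when the two events are independent, i.e. $θ = 1 - (1 - θ^{(I)})(1 - θ^{(V)})$; the independence reading is our interpretation of the product term. A small Python check of the identity and of the pre-vaccination case (the function name is hypothetical):

```python
def combined_seroprevalence(theta_inf, theta_vac):
    """Overall seroprevalence from the second line of (15)."""
    return theta_inf + theta_vac - theta_inf * theta_vac

# Equivalent complement form: 1 - (1 - theta_I)(1 - theta_V).
theta_I, theta_V = 0.2, 0.3
assert abs(combined_seroprevalence(theta_I, theta_V)
           - (1 - (1 - theta_I) * (1 - theta_V))) < 1e-12
# Before vaccination (theta_V = 0) the overall seroprevalence equals theta_I.
assert combined_seroprevalence(theta_I, 0.0) == theta_I
```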

We construct the prior distribution on $θ_{i_{l}}^{(V)} (t_{l})$ by dividing the number of effectively vaccinated people in (2) by the population. Recall that the distribution of the number of effectively vaccinated people is derived only for dates on which a vaccination report is available. If the $i_{l}$th country has no vaccination report on date $t_{l}$, we use its most recent report before that date. Given this prior on $θ_{i_{l}}^{(V)} (t_{l})$, we propose a Bayesian method to estimate $θ_{i_{l}}^{(I)} (t_{l})$ in the following section.
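The "most recent report" rule is a last-observation-carried-forward lookup over sorted report dates. A minimal Python sketch (stdlib only; `most_recent_report` is a hypothetical helper), assuming a report dated exactly on the target date is used as is, and returning `None` when every report postdates the target; how the paper treats dates before a country's first report is not specified here.

```python
import bisect
from datetime import date

def most_recent_report(report_dates, values, target):
    """Return the value of the latest report dated on or before `target`.

    `report_dates` must be sorted in ascending order; returns None when
    every report postdates `target`.
    """
    idx = bisect.bisect_right(report_dates, target) - 1
    return values[idx] if idx >= 0 else None

# Hypothetical weekly reports of the effectively vaccinated proportion.
reports = [date(2021, 7, 1), date(2021, 7, 8), date(2021, 7, 15)]
proportions = [0.10, 0.15, 0.20]
print(most_recent_report(reports, proportions, date(2021, 7, 10)))
```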

3.2.2. Model for serosurvey data over countries

We propose a hierarchical model to analyze the serosurvey data across countries. Let $θ_{i_{l}}^{(C)} (t_{l})$ denote the proportion of cumulative confirmed cases, referred to as the confirmed ratio, in the $i_{l}$th country on date $t_{l}$. We assume that the log ratio $\log (θ_{i_{l}}^{(I)} (t_{l}) / θ_{i_{l}}^{(C)} (t_{l}))$ is explained by a country-specific random effect and two country statistics: the adult population density and the GDP per capita. Note that $θ_{i_{l}}^{(I)} (t_{l}) / θ_{i_{l}}^{(C)} (t_{l})$ is the ratio of the number of infected people to the number of confirmed cases. We represent this assumption as

\begin{aligned} \log (θ_{i_{l}}^{(I)} (t_{l}) / θ_{i_{l}}^{(C)} (t_{l})) & \sim T N_{(0, - \log (θ_{i_{l}}^{(C)} (t_{l})))} (β_{i_{l}} + β_{1}^{(I)} P D_{i_{l}} + β_{2}^{(I)} G_{i_{l}}, τ^{2}), \\ β_{i_{l}} & \sim N (μ_{0}, σ^{2}), \end{aligned}

(16)

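where $P D_{i_{l}}$ and $G_{i_{l}}$ are the standardized log adult population density and log GDP per capita, respectively, and $T N_{(a, b)} (m, s^{2})$ denotes the normal distribution with mean m and variance $s^{2}$ truncated to the interval $(a, b)$. The truncation interval $(0, - \log θ_{i_{l}}^{(C)} (t_{l}))$ ensures that $θ_{i_{l}}^{(C)} (t_{l}) < θ_{i_{l}}^{(I)} (t_{l}) < 1$: the infected proportion exceeds the confirmed ratio and stays below one. Combining (15) and (16), we construct the hierarchical model as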

\begin{aligned} X_{l} & \sim Binom (N_{l}, p_{l}^{+} θ_{i_{l}} (t_{l}) + (1 - p_{l}^{-}) (1 - θ_{i_{l}} (t_{l}))), \\ θ_{i_{l}} (t_{l}) & = θ_{i_{l}}^{(I)} (t_{l}) + θ_{i_{l}}^{(V)} (t_{l}) - θ_{i_{l}}^{(I)} (t_{l}) θ_{i_{l}}^{(V)} (t_{l}), \\ \log (θ_{i_{l}}^{(I)} (t_{l})) - \log (θ_{i_{l}}^{(C)} (t_{l})) & \sim T N_{(0, - \log (θ_{i_{l}}^{(C)} (t_{l})))} (β_{i_{l}} + β_{1}^{(I)} P D_{i_{l}} + β_{2}^{(I)} G_{i_{l}}, τ^{2}), \\ β_{i_{l}} & \sim N (μ_{0}, σ^{2}), \end{aligned}

(17)

Next, we describe the prior distributions on $θ_{i_{l}}^{(V)} (t_{l})$, τ, $μ_{0}$, σ, $β_{1}^{(I)}$, $β_{2}^{(I)}$, $p_{l}^{+}$ and $p_{l}^{-}$. As described in Section 3.2.1, the prior on $θ_{i_{l}}^{(V)} (t_{l})$ is obtained from distribution (2). Gelman [8] recommended a flat prior for the standard deviation σ in hierarchical models and showed that this prior yields a proper posterior distribution when flat priors are placed on the other parameters, which in our model are $μ_{0}$, τ, $β_{1}^{(I)}$ and $β_{2}^{(I)}$. For $p_{l}^{+}$ and $p_{l}^{-}$, we construct prior distributions following the method in Section 4 of [11]; details are given in the supplementary material.
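The truncated normal layer of (16) and (17) can be simulated by inverse-CDF sampling, which makes the bounds $θ^{(C)} < θ^{(I)} < 1$ visible. This is a minimal Python sketch (stdlib `statistics.NormalDist`); the function name and the example inputs are hypothetical, `loc` stands for $β_{i} + β_{1}^{(I)} P D_{i} + β_{2}^{(I)} G_{i}$, and it only illustrates the distribution, not the NIMBLE sampler used for inference.

```python
import math
import random
from statistics import NormalDist

def sample_theta_infection(theta_c, loc, tau, rng=random):
    """Draw theta_I given theta_C under (16) by inverse-CDF sampling.

    log(theta_I / theta_C) ~ N(loc, tau^2) truncated to (0, -log theta_C).
    """
    dist = NormalDist(mu=loc, sigma=tau)
    lower, upper = 0.0, -math.log(theta_c)
    a, b = dist.cdf(lower), dist.cdf(upper)
    u = a + (b - a) * rng.random()
    u = min(max(u, 1e-12), 1.0 - 1e-12)           # inv_cdf needs 0 < u < 1
    r = min(max(dist.inv_cdf(u), lower), upper)   # guard against rounding at the bounds
    return min(theta_c * math.exp(r), 1.0)

# Hypothetical inputs: 5% confirmed ratio, linear predictor 1.8, tau = 0.5.
rng = random.Random(1)
draws = [sample_theta_infection(0.05, 1.8, 0.5, rng) for _ in range(1000)]
```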

4. Results

In this section, we present the results of Bayesian inference for the regression models and hierarchical models of Section 3, together with the resulting estimates of world seroprevalence. We use NIMBLE [5] for Bayesian inference. For each model, we run 4 chains, each producing 4000 posterior samples of which the first 2000 are discarded as burn-in. All code is available at https://github.com/klee564/worldsero.

Section 4.1 presents a simulation study for the models proposed in Section 3. Section 4.2 gives the posterior distributions of the regression coefficients and the vaccine effectiveness parameters. In Section 4.3, we derive the predictive posterior distributions of $θ^{(V)}$ and $θ^{(I)}$ for each date and country and summarize them to estimate the world seroprevalence.

4.1. Simulation study

We conduct a simulation study to evaluate the models proposed in Section 3. We set the true values of the parameters $β^{(V_{1})}$ in (4), $(β_{0}^{(V_{2})}, β_{1}^{(V_{2})})$ in (11), and $(μ_{0}^{(0)}, μ_{0}^{(1)}, κ_{0}^{(0)}, κ_{0}^{(1)}, ω)$ and $σ_{k, i}^{(d)}$ in (13) to random values drawn from the following distributions:

\begin{aligned} β^{(V_{1})} & \sim U (0.5, 1.5), \\ β_{0}^{(V_{2})} & \sim U (- 1, 0), \\ β_{1}^{(V_{2})} & \sim U (0, 1), \\ μ_{0}^{(1)} & \sim U (- 4, - 2), \\ μ_{0}^{(0)} ∣ μ_{0}^{(1)} & \sim μ_{0}^{(1)} + U (0, 2), \\ κ_{0}^{(0)}, κ_{0}^{(1)}, ω & \sim U (0, 1), \\ σ_{k, i}^{(d)} & \sim U (0, 1) . \end{aligned}

We set the true values of $β_{1}^{(I)}, β_{2}^{(I)}, τ, μ_{0}$ and σ in (17) to random values drawn from the following distributions:

\begin{aligned} β_{1}^{(I)} & \sim U (- 0.5, 0.5), \\ β_{2}^{(I)} & \sim U (- 1, 0), \\ τ & \sim U (0, 1), \\ μ_{0} & \sim U (1.5, 2.5), \\ σ & \sim U (0, 1) . \end{aligned}

The covariates and the numbers of observations are fixed at their values in the real data, and we generate 100 simulated data sets.
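The sampling of true parameter values can be sketched as follows, assuming (our assumption, not stated above) that a fresh set of true values is drawn for each of the 100 repetitions. The per-study standard deviations $σ_{k, i}^{(d)}$, also $U(0, 1)$, would be drawn separately for each study and are omitted; the dictionary keys are hypothetical names. Python, stdlib only:

```python
import random

def draw_true_parameters(rng):
    """One draw of the true parameter values for a simulation repetition."""
    mu0_full = rng.uniform(-4.0, -2.0)                   # mu_0^{(1)}
    return {
        "beta_V1": rng.uniform(0.5, 1.5),
        "beta0_V2": rng.uniform(-1.0, 0.0),
        "beta1_V2": rng.uniform(0.0, 1.0),
        "mu0_full": mu0_full,
        "mu0_partial": mu0_full + rng.uniform(0.0, 2.0),  # mu_0^{(0)} given mu_0^{(1)}
        "kappa0_partial": rng.uniform(0.0, 1.0),
        "kappa0_full": rng.uniform(0.0, 1.0),
        "omega": rng.uniform(0.0, 1.0),
        "beta1_I": rng.uniform(-0.5, 0.5),
        "beta2_I": rng.uniform(-1.0, 0.0),
        "tau": rng.uniform(0.0, 1.0),
        "mu0": rng.uniform(1.5, 2.5),
        "sigma": rng.uniform(0.0, 1.0),
    }

rng = random.Random(2021)
truths = [draw_true_parameters(rng) for _ in range(100)]  # one set per repetition
```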

We estimate the model parameters with the Bayesian method, which yields credible intervals for the parameters. Because we are interested in interval estimation, we evaluate the method by the coverage probability of the credible intervals, i.e. the proportion of simulated data sets in which the interval contains the true parameter value. Table 2 gives the coverage probabilities of the 95% credible intervals for the regression coefficients and the vaccine effectiveness parameters. The coverage probabilities, which range from 93.4% to 97%, are close to the nominal level of 95%.
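Coverage is a simple proportion, and its Monte Carlo standard error indicates how close to 95% the estimates should be expected to land. A minimal Python sketch (stdlib only; function names are hypothetical): for a coverage estimated from 100 intervals with true coverage 0.95, the standard error is $\sqrt{0.95 \times 0.05 / 100} \approx 0.022$, so values between 93.4% and 97% lie within about one standard error of the nominal level.

```python
import math

def coverage(intervals, truths):
    """Proportion of (lower, upper) intervals that contain the matching true value."""
    hits = sum(lo <= t <= hi for (lo, hi), t in zip(intervals, truths))
    return hits / len(truths)

def coverage_mc_se(p, n_reps):
    """Binomial Monte Carlo standard error of an estimated coverage probability."""
    return math.sqrt(p * (1.0 - p) / n_reps)

print(round(coverage_mc_se(0.95, 100), 4))
```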

Table 2.

Coverage probabilities of the 95% credible intervals for the parameters $β^{(V_{1})}$ in (3), $(β_{0}^{(V_{2})}, β_{1}^{(V_{2})})$ in (11), ${\bar{θ}}_{k}^{(d)}$ in (13) and $(β_{1}^{(I)}, β_{2}^{(I)})$ in (17).

Parameter	$β^{(V_{1})}$	$β_{0}^{(V_{2})}$	$β_{1}^{(V_{2})}$	${\bar{θ}}_{k}^{(d)}$	$β_{1}^{(I)}$	$β_{2}^{(I)}$
Coverage probability	94%	96%	97%	93.4%	95%	95%


4.2. Posterior distributions for regression coefficients and vaccine effectiveness

First, we present the posterior distributions of the coefficients $β^{(V_{1})}$, $β_{0}^{(V_{2})}$ and $β_{1}^{(V_{2})}$ in models (3) and (11); they are shown in Figure 3.

Figure 3. — The posterior samples of $β^{(V_{1})}$ , $β_{0}^{(V_{2})}$ and $β_{1}^{(V_{2})}$ in models (3) and (11).

The posterior distribution of $β^{(V_{1})}$ is concentrated around 1. Note that, by (4), $β^{(V_{1})}$ describes the relation between the usage rate of each vaccine and the ratio of vaccine delivery amounts. The posterior means of $β_{0}^{(V_{2})}$ and $β_{1}^{(V_{2})}$ are $- 0.935$ and 1.15, respectively. For ease of interpretation, we interpret $β_{0}^{(V_{2})}$ and $β_{1}^{(V_{2})}$ through model (7), a simplified version for the case in which only a type 2 vaccine is used. Under (7), we have

E (X_{i, j} - 2 Y_{i, j}) = \exp (β_{0}^{(V_{2})}) W_{i, j}^{β_{1}^{(V_{2})}} .

(18)

The regression coefficients describe the relation between $X_{i, j} - 2 Y_{i, j}$ and $W_{i, j}$ through (18). Recall that $X_{i, j} - 2 Y_{i, j}$ is the number of doses administered to people who have received one dose but have not completed vaccination, and $W_{i, j}$ approximates the number of doses administered over the most recent T days, where T is the required dosing interval of the vaccine.
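Because (18) is a power law, $β_{1}^{(V_{2})}$ acts as an elasticity: doubling $W_{i, j}$ multiplies the expected number of partial-vaccination doses by $2^{β_{1}^{(V_{2})}}$, about 2.22 at the posterior mean 1.15, regardless of the level of $W_{i, j}$. A small Python sketch plugging in the posterior means (plugging in point estimates is a simplification; the analysis itself propagates the full posterior), with a hypothetical function name:

```python
import math

# Posterior means of the coefficients in (18), as reported above.
BETA0_V2, BETA1_V2 = -0.935, 1.15

def expected_partial_doses(w, beta0=BETA0_V2, beta1=BETA1_V2):
    """E(X - 2Y) = exp(beta0) * W**beta1 from (18), at fixed coefficients."""
    return math.exp(beta0) * w ** beta1

# Doubling W multiplies the expectation by 2**beta1, independent of W.
growth = expected_partial_doses(20_000) / expected_partial_doses(10_000)
print(round(growth, 3))
```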

We give the posterior distributions of $β_{1}^{(I)}$ and $β_{2}^{(I)}$ in model (17). The posterior samples are summarized in Figure 4.

Recall that the regression coefficients $β_{1}^{(I)}$ and $β_{2}^{(I)}$ appear in the following distribution:

\log (θ_{i_{l}}^{(I)} (t_{l}) / θ_{i_{l}}^{(C)} (t_{l})) \sim T N_{(0, - \log (θ_{i_{l}}^{(C)} (t_{l})))} (β_{i_{l}} + β_{1}^{(I)} P D_{i_{l}} + β_{2}^{(I)} G_{i_{l}}, τ^{2}) .

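The left-hand side is the log ratio of the seroprevalence by infection to the confirmed ratio, and $β_{1}^{(I)}$ and $β_{2}^{(I)}$ are the regression coefficients for the adult population density and the GDP per capita, respectively. The posterior mean and the 95% credible interval of $β_{1}^{(I)}$ are 0.105 and $[-0.199, 0.413]$, respectively. For $β_{2}^{(I)}$, the posterior mean and the 95% credible interval are $-0.751$ and $[-1.015, -0.436]$, respectively. Since the interval for $β_{1}^{(I)}$ contains zero while the interval for $β_{2}^{(I)}$ lies entirely below zero, the data show no clear association with population density, but countries with higher GDP per capita tend to have a smaller ratio of infections to confirmed cases.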
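On the ratio scale, $\exp(β_{2}^{(I)})$ is the factor applied to $\exp$ of the location of the log ratio, i.e. to the median infected-to-confirmed ratio before truncation, per one-unit increase in the standardized covariate $G_{i}$, holding the other terms fixed. The truncation in (16) shifts the actual median, so this is an approximate reading. A short Python computation of the factor and its interval (exp is monotone, so interval endpoints map directly):

```python
import math

# Posterior summaries of beta_2^{(I)} reported above.
beta2_mean, beta2_lo, beta2_hi = -0.751, -1.015, -0.436

# Multiplicative factor on the untruncated median ratio theta_I / theta_C.
factor = math.exp(beta2_mean)
factor_interval = (math.exp(beta2_lo), math.exp(beta2_hi))
print(round(factor, 3), tuple(round(f, 3) for f in factor_interval))
```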

Next, we give the posterior distributions of the vaccine effectiveness parameter $E_{k}$ in Figure 5.

The estimated effectiveness for full vaccination with the Pfizer and Moderna vaccines is 84.2% and 87.7%, respectively, higher than that of the other vaccines.

4.3. Estimation of world seroprevalence

We derive the predictive posterior distributions of $θ_{i}^{(V)} (t)$ and $θ_{i}^{(I)} (t)$ for the ith country on date t. Recall that $θ_{i}^{(V)} (t)$ and $θ_{i}^{(I)} (t)$ denote the proportion of the effectively vaccinated population and the seroprevalence by infection, respectively, of the ith country on date t. We define the seroprevalence of the ith country on date t as

θ_{i} (t) = θ_{i}^{(V)} (t) + θ_{i}^{(I)} (t) - θ_{i}^{(V)} (t) θ_{i}^{(I)} (t) .

The predictive posterior distribution of $θ_{i}^{(V)} (t)$ is derived from the effectively vaccinated population $M_{i, j}$ in (2) divided by the population $P_{i}$. Recall that the index j in $M_{i, j}$ indexes the reports, which are not available for every day; when there is no report on date t, we use the most recent report before date t. The predictive posterior distribution of $θ_{i}^{(I)} (t)$ is derived from the distribution

\log (θ_{i}^{(I)} (t) / θ_{i}^{(C)} (t)) \sim T N_{(0, - \log (θ_{i}^{(C)} (t)))} (β_{i} + β_{1}^{(I)} P D_{i} + β_{2}^{(I)} G_{i}, τ^{2})

in (16), given $θ_{i}^{(C)} (t)$ , $P D_{i}$ and $G_{i}$ .

Next, we define the world-level trends from $θ_{i}^{(I)} (t)$, $θ_{i}^{(V)} (t)$ and $θ_{i} (t)$ as the population-weighted averages $θ_{t}^{(I)}$, $θ_{t}^{(V)}$ and $θ_{t}$:

\begin{aligned} θ_{t}^{(V)} & = \sum_{i} P_{i} θ_{i}^{(V)} (t) / P, \\ θ_{t}^{(I)} & = \sum_{i} P_{i} θ_{i}^{(I)} (t) / P, \\ θ_{t} & = \sum_{i} P_{i} θ_{i} (t) / P, \end{aligned}

where $P = \sum_{i} P_{i}$ is the total population of all countries considered. The quantities $θ_{t}^{(I)}$, $θ_{t}^{(V)}$ and $θ_{t}$ describe the trends of the world seroprevalence by infection, the proportion of effectively vaccinated people in the world, and the world seroprevalence, respectively; they are shown in Figure 6.
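Credible intervals for the world quantities are obtained by aggregating each joint posterior draw across countries and only then taking quantiles; combining country-level interval endpoints would not give a valid interval. A minimal Python sketch (stdlib only; function names and the toy inputs are hypothetical), assuming draw s of every country comes from the same joint posterior draw:

```python
import statistics

def world_draws(populations, country_draws):
    """Population-weighted world value, one per posterior draw.

    populations: {country: P_i}
    country_draws: {country: [draw_1, ..., draw_S]} with a common length S.
    """
    total = sum(populations.values())
    n_draws = len(next(iter(country_draws.values())))
    return [sum(populations[c] * country_draws[c][s] for c in populations) / total
            for s in range(n_draws)]

def summarize(draws):
    """Posterior mean and equal-tailed 95% credible interval."""
    q = statistics.quantiles(draws, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return statistics.fmean(draws), (q[0], q[-1])

# Toy example with two countries and two posterior draws.
w = world_draws({"A": 1, "B": 3}, {"A": [0.2, 0.4], "B": [0.6, 0.8]})
print(summarize(w))
```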

Figure 6. — The trends of $θ_{t}^{(V)}$, $θ_{t}^{(I)}$ and $θ_{t}$ from the beginning of January 2021 to the end of July 2021. The gray area denotes the 95% credible interval, and the black line represents the posterior mean. The left, center and right panels show $θ_{t}^{(V)}$, $θ_{t}^{(I)}$ and $θ_{t}$, respectively.

As of 31 July 2021, the 95% credible intervals of $θ^{(V)}$, $θ^{(I)}$ and θ are [22.3%, 25.2%], [15.2%, 40.0%] and [35.5%, 56.8%], respectively. We compare our estimate with that of Bergeri et al. [2], who estimate the world seroprevalence in July 2021 as [40.7%, 49.8%]. The midpoints of our interval and of the interval in [2] are 46.15% and 45.25%, respectively, a difference of 0.9 percentage points. Moreover, our interval contains the interval of Bergeri et al. [2]. Thus, our result is consistent with that of [2].

Figure 7 presents a treemap of the posterior means of seroprevalence by country on 31 July 2021. The seroprevalences of China and India are 47% and 55%, respectively, similar to the world seroprevalence on this date. France and the UK exceed 70% seroprevalence on this date.

We evaluate the accuracy of the credible intervals for the seroprevalence by leave-one-out cross-validation. Each observation is a serosurvey from a particular country on a particular date. For each of the 99 serosurveys in turn, we hold it out as the test observation, fit the proposed model to the remaining 98 serosurveys, and derive the predictive posterior distribution for the held-out observation. Of the 99 test observations, 91 fall within the corresponding predictive credible intervals, a coverage probability of 92%.
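The leave-one-out loop can be sketched generically. In this minimal Python sketch, `fit_and_predict_interval` and `observed_value` are hypothetical callbacks standing in for refitting the model (done in NIMBLE in the paper) and for extracting the quantity compared with the predictive interval; they are assumptions, not the authors' interface. Note that 91/99 ≈ 0.919, which rounds to the reported 92%.

```python
def loo_coverage(surveys, fit_and_predict_interval, observed_value):
    """Leave-one-out coverage of predictive credible intervals.

    fit_and_predict_interval(train, held_out) -> (lower, upper): refits the
    model without `held_out` and returns its predictive interval.
    observed_value(held_out): the observed quantity compared with that interval.
    """
    hits = 0
    for l, held_out in enumerate(surveys):
        train = surveys[:l] + surveys[l + 1:]   # all surveys except the lth
        lo, hi = fit_and_predict_interval(train, held_out)
        hits += lo <= observed_value(held_out) <= hi
    return hits / len(surveys)

# Toy usage: a dummy "model" whose interval is always (0, 6.5).
print(loo_coverage(list(range(10)), lambda train, s: (0.0, 6.5), lambda s: s))
```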

5. Discussion

We have proposed a novel Bayesian approach to estimate the seroprevalence of COVID-19 antibodies among the global adult population. This approach begins by estimating the seroprevalences due to infection and vaccination for each country, and then employs a Bayesian hierarchical model to combine these estimates into a global seroprevalence figure. Additionally, we constructed informative priors by utilizing external sources, such as data from clinical trials.

There are many studies on estimating seroprevalence in a population. However, these studies focus on the seroprevalence for the date and country in which the sample was collected and hence do not directly yield an estimate of world seroprevalence. Furthermore, previous work on vaccination data has mainly considered the cumulative doses administered and the fully vaccinated population, whereas the method proposed in this paper predicts the effectively vaccinated population using information on vaccine efficacy.

The methods used in this paper could be improved in several ways. First, additional covariates could be explored for the hierarchical model of seroprevalence by infection. The covariates used in this study are national statistics that do not vary over time, so we expect that adding date-dependent covariates, such as the daily number of COVID-19 tests in a country, could improve the explanatory power. Second, the model could be refined to account for the whole sampling period; the current analysis uses only the last day of the sampling period and may therefore not fully capture trends in infection over time. Lastly, the current study has some limitations: it is based on data up to July 2021, and it does not account for the decline of neutralizing antibodies among vaccinated individuals, potential underreporting of COVID-19 cases, changes in social isolation regulations, population adherence, or test availability. Improving the accuracy of the results may require updating the data and incorporating these factors into the model.

Supplementary Material

Supplemental Material

CJAS_A_2335569_SM1677.pdf (188.1 KB)

Acknowledgments

The first and second authors contributed equally to this work.

Funding Statement

Kwangmin Lee was supported by Chonnam National University (Grant number: 2023-0482) and the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. RS-2023-00211979). Seongil Jo was supported by INHA UNIVERSITY Research grant and the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (NRF-2022R1A5A7033499 and RS-2023-00209229). Jaeyong Lee was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2020R1A4A1018207 and NRF-2023R1A2C1003050).

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

1. Arora R.K., Joseph A., Van Wyk J., Rocco S., Atmaja A., May E., Yan T., Bobrovitz N., Chevrier J., Cheng M.P., Williamson T., and Buckeridge D.L., SeroTracker: a global SARS-CoV-2 seroprevalence dashboard, Lancet Infect. Dis. 21 (2021), pp. e75–e76.
2. Bergeri I., Whelan M., Ware H., Subissi L., Nardone A., Lewis H.C., Li Z., Ma X., Valenciano M., Cheng B., Ariqi L.A., Rashidian A., Okeibunor J., Azim T., Wijesinghe P., Le L.-V., Vaughan A., Pebody R., Vicari A., Yan T., Yanes-Lane M., Cao C., Clifton D.A., Cheng M.P., Papenburg J., Buckeridge D., Bobrovitz N., Arora R.K., and Van Kerkhove M.D., Unity Studies Collaborator Group, Global epidemiology of SARS-CoV-2 infection: A systematic review and meta-analysis of standardized population-based seroprevalence studies, Jan 2020–Oct 2021, medRxiv (2021), pp. 2021–12.
3. Bodnar O., Link A., Arendacká B., Possolo A., and Elster C., Bayesian estimation in random effects meta-analysis using a non-informative prior, Stat. Med. 36 (2017), pp. 378–399.
4. Bodnar O., Link A., and Elster C., Objective Bayesian inference for a generalized marginal random effects model, Bayesian Anal. 11 (2016), pp. 25–45.
5. de Valpine P., Turek D., Paciorek C.J., Anderson-Bergman C., Lang D.T., and Bodik R., Programming with models: Writing statistical algorithms for general model structures with NIMBLE, J. Comput. Graph. Stat. 26 (2017), pp. 403–413.
6. Dong Q. and Gao X., Bayesian estimation of the seroprevalence of antibodies to SARS-CoV-2, JAMIA Open 3 (2020), pp. 496–499.
7. Forni G. and Mantovani A., on behalf of the COVID-19 Commission of Accademia Nazionale dei Lincei, Rome, COVID-19 vaccines: Where we stand and challenges ahead, Cell Death Differ. 28 (2021), pp. 626–639.
8. Gelman A., Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper), Bayesian Anal. 1 (2006), pp. 515–534.
9. Higdon M.M., Wahl B., Jones C.B., Rosen J.G., Truelove S.A., Baidya A., Nande A.A., ShamaeiZadeh P.A., Walter K.K., Feikin D.R., Patel M.K., Knoll M.D., and Hill A.L., A systematic review of COVID-19 vaccine efficacy and effectiveness against SARS-CoV-2 infection and disease, medRxiv (2021), pp. 9.
10. Kline D., Li Z., Chu Y., Wakefield W.C., Miller J., Turner A.N., and Clark S.J., Estimating seroprevalence of SARS-CoV-2 in Ohio: A Bayesian multilevel poststratification approach with multiple diagnostic tests, preprint (2020), arXiv:2011.09033.
11. Lee K., Jo S., and Lee J., Seroprevalence of SARS-CoV-2 antibodies in South Korea, J. Korean Stat. Soc. 50(3) (2021), pp. 891–904.
12. Lu H., Stratton C.W., and Tang Y.W., Outbreak of pneumonia of unknown etiology in Wuhan, China: The mystery and the miracle, J. Med. Virol. 92 (2020), pp. 401–402.
13. Mathieu E., Ritchie H., Ortiz-Ospina E., Roser M., Hasell J., Appel C., Giattino C., and Rodés-Guirao L., A global database of COVID-19 vaccinations, Nat. Hum. Behav. 5(7) (2021), pp. 947–953.
14. Stringhini S., Wisniak A., Piumatti G., Azman A.S., Lauer S.A., Baysson H., De Ridder D., Petrovic D., Schrempft S., Marcus K., Yerly S., Vernez I.A., Keiser O., Hurst S., Posfay-Barbe K.M., Trono D., Pittet D., Gétaz L., Chappuis F., Eckerle I., Vuilleumier N., Meyer B., Flahault A., Kaiser L., and Guessous I., Seroprevalence of anti-SARS-CoV-2 IgG antibodies in Geneva, Switzerland (SEROCoV-POP): A population-based study, Lancet 396 (2020), pp. 313–319.
15. The New York Times, Coronavirus vaccine tracker, 2021. Available at https://www.nytimes.com/interactive/2020/science/coronavirus-vaccine-tracker.html. Accessed July 31, 2021.
16. UNICEF, COVID-19 vaccine market dashboard, 2021. Available at https://www.unicef.org/supply/covid-19-vaccine-market-dashboard/. Accessed July 31, 2021.
