Underreporting of SARS-CoV-2 infections during the first wave of the 2020 COVID-19 epidemic in Finland—Bayesian inference based on a series of serological surveys

Tuomo A Nieminen; Kari Auranen; Sangita Kulathinal; Tommi Härkänen; Merit Melin; Arto A Palmu; Jukka Jokinen

doi:10.1371/journal.pone.0282094

PLoS One. 2023 Jun 23;18(6):e0282094. doi: 10.1371/journal.pone.0282094

Editor: Timothy J Wade⁴

PMCID: PMC10289354 PMID: 37352274

Abstract

In Finland, the first wave of the COVID-19 epidemic caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) took place from March to June 2020, with the majority of COVID-19 cases diagnosed in the Helsinki-Uusimaa region. Following the magnitude and trend of the incidence of COVID-19 is one way to monitor the course of the epidemic. The diagnosed COVID-19 cases are a subset of the infections and therefore the COVID-19 incidence underestimates the SARS-CoV-2 incidence. The likelihood that an individual with SARS-CoV-2 infection is diagnosed with COVID-19 depends on the clinical manifestation as well as the infection testing policy and capacity. These factors may fluctuate over time and the underreporting of infections changes accordingly. Quantifying the extent of underreporting allows the assessment of the true incidence of infection. To obtain information on the incidence of SARS-CoV-2 infection in Finland, a series of serological surveys was initiated in April 2020. We develop a Bayesian inference approach and apply it to data from the serological surveys, registered COVID-19 cases, and external data on antibody development, to estimate the time-dependent underreporting of SARS-CoV-2 infections during the first wave of the COVID-19 epidemic in Finland. During the entire first wave, there were 1 to 5 (95% probability) SARS-CoV-2 infections for every COVID-19 case. The underreporting was highest before April when there were 4 to 17 (95% probability) infections for every COVID-19 case. It is likely that between 0.5%–1.0% (50% probability) and no more than 1.5% (95% probability) of the adult population in the Helsinki-Uusimaa region were infected with SARS-CoV-2 by the beginning of July 2020.

Introduction

When a novel virus initiates an epidemic, an important question is how fast the virus spreads in the population. If the virus causes specific clinical disease, the rate of epidemic growth can be monitored by the incidence of diagnosed disease cases. However, mild or asymptomatic infections may be difficult or impossible to observe directly, and therefore the true incidence of infection cannot be learned from the diagnosed cases alone. Infection usually leaves a mark in the form of antibodies, i.e. immunoglobulin proteins developed by the immune system and capable of identifying and neutralising the virus. Consequently, the true incidence of infection can be learned through serological surveys, i.e. studies of the prevalence of individuals with antibodies (seroprevalence). Comparing the seroprevalence to the cumulative incidence of diagnosed cases allows one to learn about the underreporting of infections, which in turn allows monitoring the true spread of the virus.

There are challenges in estimating the level of underreporting. The rate of infections and diagnostic practises may quickly change, and there may be different delays from infection to disease onset and to developing antibodies. In this case study, we propose a Bayesian approach for estimating the time-dependent underreporting of infections during the beginning of an epidemic and we apply our method to data from the 2020 COVID-19 epidemic in Finland. In our analysis we integrate three data sources: a series of serological surveys, registered disease cases, and external data on antibody development.

In Finland, the first wave of the COVID-19 epidemic caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) occurred from March through June 2020. In early March, tens of weekly COVID-19 cases were diagnosed in the Helsinki-Uusimaa region (HUS area) with a population of 1.7 million, marking the beginning of the epidemic in the region, while relatively few cases were diagnosed in other parts of the country. Fig 1 shows the numbers of new COVID-19 cases by week and municipality in the HUS area. Already by mid March, hundreds of weekly cases were diagnosed. The rate of new cases started to decline in early April, most likely because of a partial lockdown in the country. By mid June, the rate of weekly cases, both in the HUS area and the country as a whole, reduced to the tens of cases, and the first wave of the COVID-19 epidemic ended by the end of June. A total of 7286 COVID-19 cases were diagnosed during the first epidemic wave, of which 5347 cases were diagnosed in individuals residing in the HUS area.

Fig 1 — In each map, the number of cases in each municipality is shown if it is 5 or more. Detailed data for three weeks in June (2020-06-08 / 2020-06-15, 2020-06-15 / 2020-06-22 and 2020-06-22 / 2020-06-29), with total 147 cases, are not shown in the figure.

The clinical manifestations of SARS-CoV-2 infection range from asymptomatic to severe and potentially fatal disease. To be diagnosed as a COVID-19 case, a SARS-CoV-2 infection needs to be laboratory confirmed or, alternatively, a clinical diagnosis of COVID-19 made by a medical doctor. The likelihood of a SARS-CoV-2 infection being detected thus depends on the clinical manifestation as well as the infection testing policy and capacity at the time of infection.

It is likely that a relatively large proportion of infections went undetected during the first wave of the epidemic in Finland. No widespread testing of asymptomatic individuals was in place, making it probable that at least almost all asymptomatic infections were missed. Many symptomatic infections were likely missed as well due to the care instructions and testing policy in place. In Finland, the underreporting was probably most prominent among the young and healthy in the beginning of the first epidemic wave, when the official care instructions for healthy individuals with symptoms compatible with COVID-19 were to stay at home with no contact to health care [1]. These instructions were affected by the limited infection testing capacity. During the epidemic peak, the daily number of infection tests in the HUS area was still increasing through rapid capacity building. The daily testing capacity increased from approximately 300 during March to 1000 during April to 1500 tests from May onward [2].

Based on a population-based seroepidemiological study in Spain in April-May 2020, Pollán et al. found that approximately one third of SARS-CoV-2 infections were asymptomatic and that a substantial proportion of symptomatic infections also went undetected [3]. Stringhini et al. analysed the prevalence of immunoglobulin G (IgG) antibodies in Geneva during spring 2020 and estimated that there were 11 SARS-CoV-2 infections for every detected COVID-19 case [4]. Erikstrup et al. analysed blood donation data in April-May 2020 in Denmark and estimated that the ratio of the expected number of seropositives to the number of COVID-19 cases was between 7 and 20 [5]. To obtain information on the incidence of SARS-CoV-2 infection in Finland, a series of serological surveys (serosurveys) was initiated in April 2020.

While there may be significant delays from SARS-CoV-2 infection until developing detectable antibodies, i.e. until seroconversion, symptoms and diagnosis of COVID-19 usually occur with less delay. This means that the two sources of observations are not directly comparable at any given time. One solution to this problem is to compare the SARS-CoV-2 seroprevalence to the cumulative incidence of COVID-19 from 2–3 weeks ago, thus accounting for the average delay in developing antibodies after COVID-19 symptoms. This approach can provide an estimate of underreporting but it does not take into account the uncertainty and individual-level variation in the time lag from COVID-19 symptoms to seroconversion.

To better address the delays in antibody responses, in this paper, we utilise previously published data about the time lag from COVID-19 symptom onset to seroconversion [6]. We estimate the distribution of the time lag and project the SARS-CoV-2 seroprevalence based on the COVID-19 incidence. We then estimate the SARS-CoV-2 seroprevalence based on the observations from the series of serosurveys. Finally, we compare the seroprevalence projections with the estimated seroprevalences over time and learn the time-evolving underreporting of SARS-CoV-2 infections based on the ratio of the two measures of seroprevalence.

We utilise Bayesian inference and data from the HUS area to carry out the analysis. The novelty of our methodology is in accounting for the uncertainty in the time lag from disease symptoms to seroconversion when estimating the time-evolving underreporting of infections. Our analysis shows how the underreporting of SARS-CoV-2 infections evolved over time during the first epidemic wave in Finland.

Data sources

Study population

The target population in this study includes individuals aged 18–69 years and living in the HUS area with native language Finnish, Swedish, English or Russian. We utilised the Finnish Population Information System (PIS) to retrieve the native languages of all COVID-19 cases and the serological survey participants. We also retrieved the age distribution of the study population from the same system. The PIS includes the Finnish personal identity code, birth date, native language and municipality of residence for all Finnish residents [7]. We present some data for the whole HUS area population, but our main analysis is based on data from the study population.

COVID-19 cases

The Finnish National Infectious Diseases Register (FNIDR) collects individual-level data on patients infected with SARS-CoV-2 [8]. These data consist of COVID-19 cases notified as either a positive SARS-CoV-2 finding from a microbiological laboratory or a clinical diagnosis by a medical doctor. Approximately 95% of the COVID-19 cases during the first epidemic wave in Finland were based on a positive SARS-CoV-2 finding from a polymerase chain reaction (PCR) test. The data was extracted for analysis on 31st November 2021.

The sample date of each positive PCR test and/or a doctor’s diagnosis is recorded in the FNIDR along with information regarding the patient, including the Finnish personal identity code. Records related to the same patient during a 12-month period are combined as a single COVID-19 case. In our analysis, the COVID-19 diagnosis date is taken to be the first positive PCR sample date or the first doctor’s diagnosis date, whichever occurred first. According to expert evaluation during early 2020, the delay from symptom onset to COVID-19 diagnosis was deemed to be on average 3.5 days in the Helsinki-Uusimaa region.

Serological surveys

In April 2020, the Finnish Institute for Health and Welfare (THL) initiated a series of serological surveys (serosurveys) to obtain information on how large a proportion of the population had developed antibodies to SARS-CoV-2 in different regions in Finland over time [9]. Each survey targeted most of the largest municipalities in Finland and individuals aged 18–69. In each survey round, individuals were randomly sampled from PIS and invited to participate. Successive surveys were conducted weekly or biweekly.

Fig 2 shows the recruitment to and participation in the surveys in the HUS area during spring 2020. For practical reasons, only Finnish speaking individuals were recruited during the first two weeks, after which the study expanded to cover individuals with native language Swedish, English or Russian. The questionnaire was translated to each language. Other language groups were included in June 2020. The recruitment targeted only a few of the largest municipalities during the first two weeks and then expanded to cover all municipalities in the HUS area. The sample size in the HUS area decreased after the second week and the participation rate declined from 64% to 50% during spring 2020.

The age distribution of the study population and the survey participants during the first epidemic wave are shown in S1 Fig. The median and the 25% quantile of the age of the survey participants were slightly higher than in the study population, indicating that the participation rate was higher in older age groups. Otherwise the age distribution of the participants was similar to the study population.

Participation in the survey included giving a blood sample. The first and last blood samples during the first epidemic wave were taken on 9th April 2020 and 3rd July 2020, respectively.

Laboratory methods

Blood samples from the serosurvey participants were analysed using a two-stage procedure: (1) a screening test, and (2) a microneutralisation test (MNT) following a positive result at stage (1). The screening test was a bead-based fluorescence immunoassay that measures IgG antibodies to the SARS-CoV-2 nucleoprotein [10]. The MNT is a cytopathic effect-based test, which measures the capacity of neutralising antibodies to prevent an infectious virus from causing damage in cell culture. SARS-CoV-2 strains circulating in Finland in early 2020 were used in the MNT assay; CoV-19/Finland/1/2020 (GISAID accession ID EPI_ISL_407079) and hCoV-19/Finland/FIN-25/2020 (EPI_ISL_412971). MNT was used as the second test as it is highly specific to SARS-CoV-2 [10, 11]. Obtaining positive results from the two tests, the screening test and the MNT combined, was considered a confirmed presence of antibodies due to a past or ongoing SARS-CoV-2 infection (seroconversion). In the following, the combined test is referred to as the confirmation test.

In order to maximise accuracy, the confirmation test was calibrated utilising data unrelated to the surveys [10]. The ground truth for a past or ongoing SARS-CoV-2 infection was based on a positive PCR test close to 30 days prior to the antibody tests. The ground truth for no SARS-CoV-2 infection was based on blood samples from 2019. Based on calibration, a sample was considered positive for the screening test if the mean fluorescent intensity (MFI) value of the test was above 500. In the MNT, neutralising antibodies were detected from 2-fold serially diluted serum samples starting from dilution 1:4. Based on calibration, a titer of ≥4 was considered positive. S2 Fig describes the optimised test performance on the calibration data for both the screening and confirmation tests. The screening test was 100% sensitive, after which the MNT was both 100% specific and 100% sensitive. Therefore the optimised performance of the confirmation test was 100% sensitive and 100% specific. The sensitivity and specificity of the screening test alone were 100% and 97.59%, respectively.

Development and detection of antibodies

For the screening test, we say that an individual is seropositive if the test gives a positive result. If the seropositivity is due to a SARS-CoV-2 infection, we say that the individual is seroconverted. An individual may be seropositive but not seroconverted, because the screening test may produce a false positive result due to cross-reactive IgG antibodies induced by other human coronaviruses. Neutralising antibodies measured by the MNT are always due to SARS-CoV-2 and therefore an individual with a positive confirmation test is always both seropositive and seroconverted.

The time from infection to seroconversion is subject to individual-level variation. If time from infection is short, the antibody concentration may not have reached the test detection threshold. If time from infection is long, the antibodies may wane below the detection threshold. The sensitivity of antibody detection (e.g. the confirmation test) is therefore likely to be lower than 100% in both of these cases. When modelling the time-dependent seroconversion, we take into account the slow development of antibodies after infection. However, we omit waning immunity due to the relatively short study period.

For symptomatic individuals, the symptoms usually develop sooner than detectable antibodies. Tan et al. present results where symptomatic SARS-CoV-2 infected patients were followed for 6 weeks starting from symptom onset and reported the IgG positive proportions of patients for each week [6]. The antibody test utilised in their analysis was similar to the screening test of the current study. The data are reproduced in Table 1. A total of 312 tests were performed on 65 patients, with 3–7 days between consecutive tests. At day 7 since symptom onset, only 3.4% of the patients tested positive for IgG antibodies. At day 14, 50% tested positive and when 28–49 days had passed, between 74% and 87% tested positive. Tan et al. report that of the 67 patients included in their study, 29 were classified with severe pneumonia [6]. The median age of the patients was 49 years and twenty-five patients had underlying diseases.

Table 1. Percentage of seroconverted COVID-19 patients by time since symptom onset.

Day	Patients	IgG positive	%
7	58	2	3.4
10	62	12	19.4
14	61	31	50.8
21	54	32	59.3
28	35	26	74.3
35	22	17	77.3
42	15	13	86.7
49	5	4	80.0

A total of 312 tests were performed on 65 patients. Day is the number of days passed since COVID-19 symptom onset, Patients is the number of patients tested and IgG positive is the number of patients who tested positive for SARS-CoV-2 IgG antibodies. Data from Tan et al. [6].
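The percentages and the test total in Table 1 can be recomputed directly from the counts. The short sketch below is only a consistency check on the reproduced table; the names `TABLE_1` and `positive_percentages` are illustrative and not part of the original analysis.

```python
# Rows of Table 1: (days since symptom onset, patients tested, IgG positive).
TABLE_1 = [
    (7, 58, 2), (10, 62, 12), (14, 61, 31), (21, 54, 32),
    (28, 35, 26), (35, 22, 17), (42, 15, 13), (49, 5, 4),
]


def positive_percentages(rows):
    """Return {day: percentage of tested patients who were IgG positive}."""
    return {day: round(100 * positive / tested, 1) for day, tested, positive in rows}


percentages = positive_percentages(TABLE_1)

# Summing the "Patients" column gives the total number of tests performed.
total_tests = sum(tested for _, tested, _ in TABLE_1)
```

The column sum reproduces the 312 tests quoted in the table note, and the rounded percentages match the last column of the table.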

Statistical models and methods

Let T = [0, D], D = 86, denote the study period, i.e. the time period starting on 9th April 2020 (the date of the first blood sample taken from the serosurvey participants), until 3rd July 2020 (the date of the last blood sample taken during the first epidemic wave). Let τ_i denote the day of SARS-CoV-2 infection in individual i, i = 1, …, N. Here N = 1000821 is the size of the study population. The infections we consider may have occurred before the study period but not after (i.e. τ_i may be negative and τ_i < D).

After the infection, on day s_i, the individual may develop symptoms of COVID-19. Then, C days after the symptom onset, on day r_i = s_i + C, the individual may be diagnosed with COVID-19. In this case, information about the diagnosis and the individual is recorded in the FNIDR as a COVID-19 case. We assume that the delay C from symptom onset to diagnosis is 3.5 days and is the same for all individuals. The cumulative number of COVID-19 cases by day t is R(t), where $R (t) = \sum_{i = 1}^{N} 1 (r_{i} \leq t)$ .
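As a minimal sketch of this notation, the cumulative count R(t) and the back-calculated symptom onset days can be computed from a list of diagnosis days. The diagnosis days in `example_r` are hypothetical and serve only to illustrate the definitions; the function names are likewise illustrative.

```python
from bisect import bisect_right

C = 3.5  # delay (days) from symptom onset to diagnosis, assumed equal for all cases


def cumulative_cases(diagnosis_days, t):
    """R(t): number of cases whose diagnosis day r_i satisfies r_i <= t."""
    return bisect_right(sorted(diagnosis_days), t)


def symptom_onset_days(diagnosis_days):
    """Back-calculate s_i = r_i - C for every diagnosed case."""
    return [r - C for r in diagnosis_days]


# Hypothetical diagnosis days relative to day 0 of the study period.
example_r = [-20, -12, -12, -3, 0, 5, 30]
```

Days are measured relative to the start of the study period, so diagnoses made before 9th April 2020 have negative values.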

An individual i has seroconverted by day t if t > a_i > τ_i, where a_i is the day after which the SARS-CoV-2 antibodies in the individual are detectable. We define $A_{i} (t) = 1 (a_{i} < t)$ as an indicator function taking value 1 for individual i if seroconversion has occurred by day t and 0 otherwise. For individuals with diagnosed COVID-19 we assume that seroconversion occurs after the symptom onset day (i.e. a_i > s_i). In those cases, we use U_i to denote the number of days from symptom onset to seroconversion. Fig 3 summarises the notation and describes the timeline from SARS-CoV-2 infection to seroconversion.

Regardless of the infection status, an individual from the study population may be randomly selected to participate in one of the serosurveys. Let $y_{i, t}^{(z)} \in \{0, 1\}$ denote the binary test result (i.e. seropositivity) for individual i who was randomly selected into the survey and gave a sample for antibody testing on day t ∈ T, where z ∈ {Screen, Confirmation} denotes the test used to derive the result. We denote the specificity of test z as $δ^{(z)} = Pr (y_{i, t}^{(z)} = 0 ∣ τ_{i} > t)$ . If the test z is not fully specific, i.e. δ^(z) < 1, then the result may be positive ( $y_{i, t}^{(z)} = 1$ ) without a SARS-CoV-2 infection.

Fig 4 displays how SARS-CoV-2 infections may have been observed as COVID-19 cases or positive antibody test results. To compare estimates of seroprevalence based on the two types of observation (serosurveys and COVID-19 cases), we quantify the distribution of the time lag from COVID-19 symptom onset to seroconversion. We then project the time-dependent seroprevalence based on the diagnosed COVID-19 cases, which allows for comparison to the seroprevalence estimated from the serosurveys.

Estimation target

Under two independent models, the quantity of interest is seroprevalence π(t), i.e. the proportion of the population that has seroconverted by time t, where $π (t) = Pr (A_{i} (t) = 1) = E (A_{i} (t))$ , for i = 1, …, N. We estimate π(t) using (i) observations from the serosurveys and (ii) the incidence of COVID-19 cases. We denote π⁽⁰⁾(t) to indicate the seroprevalence when based on the serosurveys and π⁽¹⁾(t) when based on COVID-19 cases. Our interest is in estimating the ratio of these two seroprevalence parameters on each day t ∈ T during the study period:

$$
\Delta(t) = \frac{\pi^{(0)}(t)}{\pi^{(1)}(t)}. \tag{1}
$$

We estimate π⁽⁰⁾(t) and the corresponding Δ(t) separately for data from the two antibody tests but consider the analysis based on the confirmation test as the main result. In section Models we describe an Estimation model used to estimate π⁽⁰⁾(t) and a Projection model used to estimate π⁽¹⁾(t). We expect that π⁽⁰⁾(t) gives a reasonably unbiased estimate of the true seroprevalence π(t) but expect that the projection π⁽¹⁾(t) gives an underestimate of the true π(t). We therefore expect that Δ(t) > 1 and interpret Δ(t) as an underreporting ratio, i.e. quantifying the extent of underreporting of SARS-CoV-2 infections up until time t.

Models

In this section, we specify the Estimation and Projection model of the seroprevalence. We then describe the estimation of seroprevalence and underreporting under both models. We utilise a Bayesian framework for statistical inference and numerical methods to derive the posterior distributions of all unknown quantities.

Estimation model

This model relates to the lower part of Fig 4 (Sampling). The Estimation model is used to estimate the time-dependent seroprevalence based on antibody test results in the serosurvey participants. Due to the small numbers of daily blood samples in the serosurveys, we split the study period T into 13 non-overlapping seven-day periods (weeks), W = {[0, 7), [7, 14), …, [84, 86]}. We assume that the seroprevalence is piecewise constant by week and let $π_{w}^{(0)}$ denote the seroprevalence during week w ∈ W.
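The weekly partition can be sketched as a mapping from a study day to a week index. This is an illustrative helper, not code from the study; it assumes weeks are numbered 1 to 13 and that the last, shorter period [84, 86] forms week 13.

```python
D = 86        # last day of the study period T = [0, D]
N_WEEKS = 13  # number of weekly periods in W


def week_index(t):
    """Map a study day t in [0, D] to its 1-based week index in W."""
    if not 0 <= t <= D:
        raise ValueError("t must lie within the study period [0, D]")
    # Days 84 to 86 all belong to the final (shorter) week.
    return min(int(t // 7), N_WEEKS - 1) + 1
```

Under the piecewise constant assumption, every day with the same week index shares the same seroprevalence parameter.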

We describe the prior uncertainty in the weekly seroprevalence as follows. For the first week, the logit of the seroprevalence $g (π_{1}^{(0)})$ is assumed to be normally distributed with expectation μ₁ and variance $σ_{1}^{2}$ . Note that the normal distribution is the maximum entropy distribution for $g (π_{1}^{(0)})$ under the constraints that its expectation is μ₁ and variance is $σ_{1}^{2}$ . The logit of the prevalence in any later week is assumed to depend on the prevalence during the previous week with a non-decreasing trend. A shared variance parameter σ² (which is different from σ₁) controls the strength of the dependency on the previous weeks, with σ given a gamma prior with parameters α and β. The structure of the prior model thus is:

$$
\begin{aligned}
\sigma &\sim \mathrm{Gamma}(\alpha, \beta), \\
g(\pi_{1}^{(0)}) &\sim N(\mu_{1}, \sigma_{1}^{2}), \\
g(\pi_{w}^{(0)}) &\sim N\big(g(\pi_{w-1}^{(0)}) + \mathrm{trend}_{w}, \sigma^{2}\big) \quad \text{for } w \geq 2, \text{ where} \\
\mathrm{trend}_{w} &=
\begin{cases}
0, & \text{when } w = 2, \\
\max\{0,\ g(\pi_{w-1}^{(0)}) - g(\pi_{w-2}^{(0)})\}, & \text{when } w > 2,
\end{cases}
\end{aligned} \tag{2}
$$

where g(π) = log(π/(1 − π)) is the logit function. This defines a prior distribution of the parameter vector $g (π^{(0)}) = (g (π_{1}^{(0)}), \ldots, g (π_{13}^{(0)}))$ . We denote the prior distribution of g(π⁽⁰⁾) as p(g(π⁽⁰⁾);Φ), where the vector Φ = (α, β, μ₁, σ₁) collects the hyperparameters. The seroprevalence for week w is $π_{w}^{(0)} = g^{- 1} (g (π_{w}^{(0)}))$ , where g⁻¹(x) = 1/(1 + exp(−x)) is the inverse-logit function.
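To make the structure of the prior in Eq 2 concrete, the sketch below draws one path of weekly seroprevalences from it. It is an illustration only: the function name is hypothetical, the default hyperparameters are the values reported for this study, and the gamma prior is assumed to use the shape and rate parameterisation (as in Stan), which the text does not state explicitly.

```python
import math
import random


def logit(p):
    return math.log(p / (1.0 - p))


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def draw_prior_path(rng, n_weeks=13, mu1=logit(0.05), sigma1=2.0,
                    alpha=2.0, beta=40.0):
    """Draw one vector of weekly seroprevalences from the prior in Eq 2."""
    # Assumption: Gamma(alpha, beta) is shape/rate, so the scale is 1 / beta.
    sigma = rng.gammavariate(alpha, 1.0 / beta)
    g = [rng.gauss(mu1, sigma1)]  # logit seroprevalence in week 1
    for w in range(2, n_weeks + 1):
        # The trend is zero for week 2 and the previous non-negative
        # increment on the logit scale for later weeks.
        trend = 0.0 if w == 2 else max(0.0, g[-1] - g[-2])
        g.append(rng.gauss(g[-1] + trend, sigma))
    return [inv_logit(x) for x in g]
```

For example, `draw_prior_path(random.Random(1))` returns 13 values between 0 and 1. The `max` term carries forward only upward movements, which encodes the non-decreasing trend while the normal noise still allows small decreases from week to week.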

The observations $y_{i, w}^{(z)} \in \{0, 1\}$ arise when n_w randomly selected individuals from the population give a blood sample during week w and a result is derived via antibody test z. The probability that the test result is positive for individual i is

$$
\begin{aligned}
\Pr(y_{i, w}^{(z)} = 1) &= f(\pi_{w}^{(0)}, \delta^{(z)}) \\
&= \pi_{w}^{(0)} + (1 - \pi_{w}^{(0)}) \cdot (1 - \delta^{(z)}),
\end{aligned} \tag{3}
$$

where 1 − δ^(z) is the probability that an individual without SARS-CoV-2 infection gives a (false) positive test result.
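Eq 3 is a two-term mixture: seroconverted individuals test positive, and the remainder test positive at the false positive rate. A minimal sketch follows, with an illustrative function name.

```python
def positive_probability(pi, delta):
    """Eq 3: probability of a positive test given seroprevalence pi and
    test specificity delta (sensitivity is taken to be 100%)."""
    if not (0.0 <= pi <= 1.0 and 0.0 <= delta <= 1.0):
        raise ValueError("pi and delta must be probabilities")
    return pi + (1.0 - pi) * (1.0 - delta)
```

With a fully specific test (δ = 1) the function returns the seroprevalence itself. With zero seroprevalence it returns the false positive rate 1 − δ, so a low-specificity test produces positive results even in an uninfected population.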

Let $y_{w}^{(z)} = \sum_{i = 1}^{n_{w}} y_{i, w}^{(z)}$ denote the number of positive samples during week w. We assume that, conditionally on the weekly seroprevalence, the observations $y_{i, w}^{(z)}$ are independent and identically distributed. The conditional probability model of the total count $y_{w}^{(z)}$ , where w ∈ W, then is

$$
y_{w}^{(z)} \mid g(\pi_{w}^{(0)}); \delta^{(z)} \sim \mathrm{Binom}\big(n_{w}, f(\pi_{w}^{(0)}, \delta^{(z)})\big). \tag{4}
$$

Based on the vector of observations $y^{(z)} = (y_{1}^{(z)}, \ldots, y_{13}^{(z)})$ , the likelihood function of the logit seroprevalence g(π⁽⁰⁾) is

$$
p(y^{(z)} \mid g(\pi^{(0)}); \delta^{(z)}) = \prod_{w \in W} \mathrm{Binom}\big(y_{w}^{(z)} \mid n_{w}, f(\pi_{w}^{(0)}, \delta^{(z)})\big). \tag{5}
$$
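The likelihood in Eq 5 can be sketched on the log scale as a sum of binomial log-probabilities. The weekly counts `n_w` and `y_w` below are hypothetical placeholders, not the survey data, and the function names are illustrative.

```python
import math


def positive_probability(pi, delta):
    """Eq 3: probability of a positive test result."""
    return pi + (1.0 - pi) * (1.0 - delta)


def binom_logpmf(y, n, p):
    """Log of the binomial probability of y positives out of n samples."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    log_coef = math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
    return log_coef + y * math.log(p) + (n - y) * math.log1p(-p)


def log_likelihood(pi_by_week, y_by_week, n_by_week, delta):
    """Log of Eq 5: sum of weekly binomial log-probabilities."""
    return sum(
        binom_logpmf(y, n, positive_probability(pi, delta))
        for pi, y, n in zip(pi_by_week, y_by_week, n_by_week)
    )


# Hypothetical weekly sample sizes and positive counts (illustration only).
n_w = [300, 200, 100]
y_w = [1, 1, 1]
```

With these placeholder counts, a weekly seroprevalence of 0.5% has a much higher log-likelihood than 20%, which is the behaviour the posterior in Eq 6 builds on.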

The posterior distribution of g(π⁽⁰⁾) is proportional to the product of the prior (2) and the likelihood (5):

$$
p(g(\pi^{(0)}) \mid y^{(z)}; \Phi, \delta^{(z)}) \propto p(g(\pi^{(0)}); \Phi)\, p(y^{(z)} \mid g(\pi^{(0)}); \delta^{(z)}). \tag{6}
$$

The estimation model is described graphically in Fig 5. We defined an informative prior distribution for the Estimation model seroprevalence. The chosen hyperparameter values μ₁ = logit(0.05) and σ₁ = 2 correspond to an approximate prior expectation 0.13 for the seroprevalence at the start of the study but with large variance. The chosen hyperparameter values α = 2 and β = 40 correspond to expected value of approximately 0.5 for the standard deviation between weekly seroprevalences on the probability scale. S3 Fig shows the prior distribution for π⁽⁰⁾. In the prior distribution, each weekly seroprevalence π⁽⁰⁾(t) has a large variance. The prior mean and variance both increase as t increases.
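The stated prior expectation for the first-week seroprevalence can be checked numerically. The logit-normal distribution has no closed-form mean, so the sketch below approximates E[g⁻¹(X)] for X ~ N(μ₁, σ₁²) with the trapezoidal rule; the function name and the integration settings are illustrative choices.

```python
import math


def logit_normal_mean(mu, sigma, half_width=10.0, steps=4000):
    """Trapezoidal approximation of E[inv_logit(X)] for X ~ N(mu, sigma^2).

    Integrates over the standardised variable z on [-half_width, half_width].
    """
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        z = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += weight * density / (1.0 + math.exp(-(mu + sigma * z)))
    return total * h


# Hyperparameters from the text: mu_1 = logit(0.05), sigma_1 = 2.
prior_mean = logit_normal_mean(math.log(0.05 / 0.95), 2.0)
```

The result is roughly 0.13, in line with the approximate prior expectation quoted above. It exceeds 0.05 because the inverse-logit transformation is convex in this range, so the wide normal spread pulls the mean upward.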

Projection model

This model relates to the upper part of Fig 4 (Selection). The model is learned from previously published data on antibody development after COVID-19 symptoms. We first describe the model and then show how it is utilised to project the time-dependent seroprevalence based on COVID-19 cases in the FNIDR.

For individual j, the number of days from COVID-19 symptom onset to seroconversion is described by the random variable U_j. We assume that each U_j has a lognormal distribution with parameters μ_U and $σ_{U}^{2}$ . The probability that patient j has seroconverted by day u since symptom onset is Pr(U_j ≤ u) = F_U(u;θ), where $θ = (μ_{U}, σ_{U}^{2})$ .

To estimate the parameters θ, we utilise data based on patients who had SARS-CoV-2 antibodies tested on multiple days after COVID-19 symptoms [6]. The data are shown in Table 1. We denote the test result by $y_{j}^{q} \in \{0, 1\}$ for individuals j = 1, …, n^q, where n^q is the number of individuals tested q days after symptom onset, and q ∈ Q^Tan = {7, 10, 14, …, 42, 49}. If the test result is positive (i.e. $y_{j}^{q} = 1$ ), the patient is seroconverted and the seroconversion must have occurred before day q. The probability model for the individual observation is

$$
y_{j}^{q} \mid \theta \sim \mathrm{Bern}\big(F_{U}(q; \theta)\big). \tag{7}
$$

We assume that the test results are independent given day q and the parameters θ. Based on the observations $y^{\mathrm{Tan}} = \{y_{j}^{q} : j = 1, \ldots, n^{q},\ q \in Q^{\mathrm{Tan}}\}$ , the likelihood function of the parameters θ is

$$
p(y^{\mathrm{Tan}} \mid \theta) = \prod_{q \in Q^{\mathrm{Tan}}} \prod_{j = 1}^{n^{q}} \mathrm{Bern}\big(y_{j}^{q} \mid F_{U}(q; \theta)\big). \tag{8}
$$
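Because the test results on a given day share the same success probability, the product in Eq 8 depends on the data only through the counts in Table 1. The sketch below evaluates the log-likelihood for a given θ. It takes the standard deviation σ_U rather than the variance as its second argument, the small clamping constant is a numerical safeguard added here, and the parameter values used for comparison are arbitrary illustrations rather than estimates from the study.

```python
import math

# Rows of Table 1: (days since symptom onset, patients tested, IgG positive).
TABLE_1 = [
    (7, 58, 2), (10, 62, 12), (14, 61, 31), (21, 54, 32),
    (28, 35, 26), (35, 22, 17), (42, 15, 13), (49, 5, 4),
]


def lognormal_cdf(u, mu, sigma):
    """F_U(u; theta): lognormal CDF with log-scale mean mu and sd sigma."""
    if u <= 0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(u) - mu) / (sigma * math.sqrt(2.0))))


def tan_log_likelihood(mu, sigma, rows=TABLE_1):
    """Log of Eq 8, grouped by test day."""
    eps = 1e-12  # keeps the logarithms finite for extreme parameter values
    total = 0.0
    for day, tested, positive in rows:
        p = min(max(lognormal_cdf(day, mu, sigma), eps), 1.0 - eps)
        total += positive * math.log(p) + (tested - positive) * math.log1p(-p)
    return total


# Arbitrary illustrative parameter values: median lag of 14 days vs 60 days.
ll_median_14 = tan_log_likelihood(math.log(14.0), 0.8)
ll_median_60 = tan_log_likelihood(math.log(60.0), 0.8)
```

A median lag of 14 days fits the table far better than a median of 60 days, since about half of the patients were IgG positive by day 14.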

We assume an uninformative prior distribution:

$$
p(\theta) = p(\mu_{U}, \sigma_{U}^{2}) \propto 1 / \sigma_{U}^{2}. \tag{9}
$$

The posterior distribution is proportional to the product of the prior (9) and the likelihood (8):

$$
p(\theta \mid y^{\mathrm{Tan}}) \propto p(\theta)\, p(y^{\mathrm{Tan}} \mid \theta). \tag{10}
$$

The posterior predictive distribution of F_U is

$$
\hat{F}_{U}(u) = p(y_{j}^{u} \mid y^{\mathrm{Tan}}) = \int F_{U}(u; \theta)\, p(\theta \mid y^{\mathrm{Tan}})\, d\theta. \tag{11}
$$

We utilise the posterior predictive distribution $\hat{F_{U}}$ to project seroprevalence based on the FNIDR COVID-19 cases. For each day t ∈ T during the study period, we first predict the probability of seroconversion in each case i, for whom q_i days have passed since symptom onset. We assume that the symptom onset day was C = 3.5 days before the diagnosis day r_i, and so q_i = t − (r_i − C). The probabilities of seroconversion, each given by $\hat{F_{U}} (q_{i})$ , are then combined as the expected number of cases seroconverted, and the seroprevalence is obtained by dividing by the population size N. Formally, we project the seroprevalence for day t ∈ T as

$$
\begin{aligned}
\pi^{(1)}(t) &= \frac{1}{N} \sum_{i = 1}^{R(t + C)} E[A_{i}(t) \mid y^{\mathrm{Tan}}] \\
&= \frac{1}{N} \sum_{i = 1}^{R(t + C)} \hat{F}_{U}\big(t - (r_{i} - C)\big),
\end{aligned} \tag{12}
$$

where R(t + C) is the number of COVID-19 cases with symptom onset before day t. We call π⁽¹⁾(t) the projected seroprevalence. The Projection model is described graphically in Fig 6.
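The projection in Eq 12 can be sketched for a single value of θ, which corresponds to evaluating the sum for one posterior sample. The diagnosis days and the parameter values in the usage note are hypothetical, the second parameter is taken as the standard deviation σ_U, and the function names are illustrative.

```python
import math

C = 3.5      # assumed delay (days) from symptom onset to diagnosis
N = 1000821  # size of the study population


def lognormal_cdf(u, mu, sigma):
    """F_U(u; theta): lognormal CDF with log-scale mean mu and sd sigma."""
    if u <= 0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(u) - mu) / (sigma * math.sqrt(2.0))))


def projected_seroprevalence(t, diagnosis_days, mu, sigma, population=N):
    """Eq 12 for one value of theta = (mu, sigma)."""
    total = 0.0
    for r in diagnosis_days:
        if r <= t + C:  # symptom onset r - C occurred on or before day t
            days_since_onset = t - (r - C)
            total += lognormal_cdf(days_since_onset, mu, sigma)
    return total / population
```

As a usage example with hypothetical diagnosis days `[-30, -20, -10, 0, 10]`, a population of 1000 and θ = (log 14, 0.8), the projection rises with t and never exceeds the cumulative incidence R(t + C)/N, because each case contributes a probability of at most one.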

Fig 6 — Left plate: The duration from symptoms to seroconversion was modelled based on external data. Individuals j, j = 1, …, N^(Tan), experienced COVID-19 symptoms on day s_j = 0 and were tested for antibodies q days later, where q varied from 7 to 49 days. Individuals were tested on multiple days. Here, A_j(u) denotes whether individual j had seroconverted by day u, and ( $y_{j}^{(q)}$ ) indicates the result of an antibody test taken on the q:th day. The duration from symptoms to seroconversion was modelled as a lognormal distribution with parameters (μ_U, σ_U). Right plate: The posterior distribution of (μ_U, σ_U) is utilised to project the seroconversion status A_i(t) for each individual i = 1, …, R(t + C) with COVID-19 symptom onset before day t ∈ T during the study period. The symptoms are assumed to have occurred on day ${\hat{s}}_{i} = r_{i} - C$ , where C is the lag from symptom onset to the COVID-19 diagnosis day r_i. The individual projections are used to derive the population level projection for the seroprevalence on day t, π⁽¹⁾(t).

Estimation of seroprevalence and underreporting

In the Estimation model, the posterior distribution for the parameter vector g(π⁽⁰⁾) was obtained by sampling from p(g(π⁽⁰⁾) ∣ y^(z);Φ, δ^(z)), see Eq 6. Each sample was then transformed with g⁻¹(.) to obtain samples from the posterior distribution of each weekly seroprevalence $π_{w}^{(0)}$ . This provided samples for each day t ∈ w of the week, resulting in samples from the posterior distribution of each daily seroprevalence π⁽⁰⁾(t), t ∈ T.

In the Projection model, the posterior distribution for θ was obtained by sampling from p(θ ∣ y^Tan), see Eq 10. For each posterior sample and for each day t ∈ T during the study period, seroprevalence was projected as described in Eq 12, resulting in samples from the posterior predictive distribution of each daily seroprevalence π⁽¹⁾(t), t ∈ T.

An identical number of samples (S = 40000) was drawn from the posterior distributions of π⁽⁰⁾(t) and π⁽¹⁾(t). For each pair of samples from π⁽⁰⁾(t) and π⁽¹⁾(t), a sample from Δ(t) was obtained by division, and this was repeated for each day t ∈ T during the study period.
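The sample-wise division can be sketched as follows. This is a minimal illustration in Python, not the R and Stan code used in the study: the draws for π⁽⁰⁾(t) and π⁽¹⁾(t) are hypothetical stand-ins generated from arbitrary distributions, and the function names are introduced here for illustration only.

```python
import random
import statistics


def underreporting_samples(pi0_draws, pi1_draws):
    """Divide estimated by projected seroprevalence, draw by draw."""
    if len(pi0_draws) != len(pi1_draws):
        raise ValueError("need the same number of draws from both posteriors")
    return [p0 / p1 for p0, p1 in zip(pi0_draws, pi1_draws)]


def summarise(draws):
    """Return the mean and the 2.5% and 97.5% quantiles of the draws."""
    cuts = statistics.quantiles(draws, n=40, method="inclusive")
    return statistics.fmean(draws), cuts[0], cuts[-1]


rng = random.Random(1)
S = 40000
# Hypothetical stand-ins for posterior draws on one day t (percentage scale).
# In the real analysis these would come from the two fitted models.
pi0 = [rng.lognormvariate(-0.6, 0.4) for _ in range(S)]
pi1 = [rng.uniform(0.214, 0.241) for _ in range(S)]

delta = underreporting_samples(pi0, pi1)
mean, lower, upper = summarise(delta)
print(f"ratio: mean {mean:.2f}, 95% interval {lower:.2f} to {upper:.2f}")
```

Because the ratio is formed draw by draw, the uncertainty in both seroprevalences carries over to Δ(t); the posterior mean of the ratio is in general not equal to the ratio of the two posterior means.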

We utilised the No-U-Turn Sampler algorithm for sampling, which is an efficient Markov Chain Monte Carlo algorithm [12]. We used Stan and the R package RStan to carry out the sampling and monitored convergence via the Rhat statistic [13–15]. The Stan model code and an R code example are available on github.com/TuomoNieminen/covid19underreporting.
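For readers unfamiliar with the Rhat statistic, the idea can be sketched with the classic potential scale reduction factor, which compares between-chain and within-chain variance. This is an assumption-laden illustration in Python: Stan reports a more refined variant of the statistic, the function name `rhat` is ours, and the chains below are simulated rather than taken from the study.

```python
import random
import statistics


def rhat(chains):
    """Classic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    if len(chains) < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length")
    chain_means = [statistics.fmean(c) for c in chains]
    # W: average of the within-chain sample variances.
    within = statistics.fmean([statistics.variance(c) for c in chains])
    # B: n times the sample variance of the chain means.
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


rng = random.Random(0)
# Four simulated chains that sample the same distribution.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
# The same chains shifted apart, mimicking chains stuck in different regions.
stuck = [[x + 3.0 * k for x in chain] for k, chain in enumerate(mixed)]

print(f"well mixed: {rhat(mixed):.3f}, stuck: {rhat(stuck):.3f}")
```

Values close to 1 suggest that the chains agree with each other, while values clearly above 1 indicate that they have not converged to a common distribution.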

The choices for hyperparameters and other quantities needed to carry out the estimation are shown in S1 Table. See section Sensitivity analysis for how the results depend on the hyperparameter choices.

Ethics

The study protocol was approved by the ethical committee of the Hospital District of Helsinki and Uusimaa (HUS/1137/2020). Written informed consent was obtained from all participants.

Results

SARS-CoV-2 seroprevalence and the cumulative incidence of COVID-19

Table 2 shows the weekly numbers of blood samples and antibody test results in the serosurveys during the first epidemic wave. Out of 1465 samples taken between 9th April 2020 and 3rd July 2020, a total of 35 (2.39%) were screening test positive and a total of 7 (0.48%) were confirmation test positive. Five of the confirmed positive samples were taken before 4th May 2020, when the weekly numbers of samples were high, and they correspond to weekly sample seroprevalences 0.29%, 0.43% and 1.18%. After 4th May 2020, the weekly number of available samples decreased significantly and only two confirmed positive samples were observed.

Table 2. COVID-19 cases and serology survey results in the Helsinki-Uusimaa region during spring 2020.

	COVID-19 cases^a (cumulative)			Serological surveys^b (weekly)
Period	HUS (%)	Study (%)	Samples	Screening pos. (%)	Confirmation pos. (%)
10.02.2020—16.02.2020	0–10	0–10	-	-	-
17.02.2020—23.02.2020	0–10	0–10	-	-	-
24.02.2020—01.03.2020	0–10	0–10	-	-	-
02.03.2020—08.03.2020	24 (0)	20 (0)	-	-	-
09.03.2020—15.03.2020	220 (0.01)	190 (0.02)	-	-	-
16.03.2020—22.03.2020	611 (0.04)	505 (0.05)	-	-	-
23.03.2020—29.03.2020	965 (0.06)	737 (0.07)	-	-	-
30.03.2020—05.04.2020	1578 (0.09)	1030 (0.1)	-	-	-
06.04.2020—12.04.2020	2212 (0.13)	1332 (0.13)	23	1 (4.35)	0 (0)
13.04.2020—19.04.2020	2825 (0.16)	1621 (0.16)	339	8 (2.36)	1 (0.29)
20.04.2020—26.04.2020	3436 (0.2)	1895 (0.19)	465	13 (2.8)	2 (0.43)
27.04.2020—03.05.2020	3965 (0.23)	2138 (0.21)	170	4 (2.35)	2 (1.18)
04.05.2020—10.05.2020	4466 (0.26)	2415 (0.24)	139	2 (1.44)	0 (0)
11.05.2020—17.05.2020	4804 (0.28)	2636 (0.26)	88	2 (2.27)	1 (1.14)
18.05.2020—24.05.2020	4987 (0.29)	2747 (0.28)	47	0 (0)	0 (0)
25.05.2020—31.05.2020	5118 (0.3)	2825 (0.28)	48	0 (0)	0 (0)
01.06.2020—07.06.2020	5200 (0.3)	2863 (0.29)	48	2 (4.17)	1 (2.08)
08.06.2020—14.06.2020	5240 (0.31)	2885 (0.29)	44	1 (2.27)	0 (0)
15.06.2020—21.06.2020	5279 (0.31)	2899 (0.29)	23	0 (0)	0 (0)
22.06.2020—28.06.2020	5315 (0.31)	2915 (0.29)	9	0 (0)	0 (0)
29.06.2020—04.07.2020	5347 (0.31)	2932 (0.29)	22	2 (9.09)	0 (0)
All weeks	5347 (0.31)	2932 (0.29)	1465	35 (2.39)	7 (0.48)


The column COVID-19 cases (cumulative) shows the cumulative number and cumulative incidence of COVID-19 cases by the end of each week (Period). The column Serological surveys (weekly) shows weekly results from the serological surveys for the target population of the current study.

^a HUS: COVID-19 cases in the Helsinki-Uusimaa region of Finland; Study: COVID-19 cases in the target population of the current study. Populations 1.72M and 1.00M, respectively.

^b Samples gives the weekly number of blood samples. Screening pos. (%) gives the weekly number and proportion of samples where SARS-CoV-2 IgG antibodies were detected with the screening test. Confirmation pos. (%) gives the weekly number and proportion of positive samples confirmed via a microneutralisation test.

Table 2 also shows the cumulative incidence of COVID-19 cases in the study population and in all HUS area residents. Three weeks prior to the first confirmed positive blood sample, the cumulative incidence of COVID-19 in the study population was 0.07% (736 cases, population 1.0 million), and in three weeks it increased to 0.13% (1330 cases). In all HUS area residents the cumulative incidence of COVID-19 was 0.31% by the end of the first epidemic wave (5348 cases, population 1.7 million).
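The percentages in Table 2 can be checked with a few lines of arithmetic. The sketch below is only an illustration in Python; it uses the rounded population sizes from the table footnote (1.72 million and 1.00 million), so the results are approximate, and the helper name `percent` is ours.

```python
def percent(count, total):
    """Express count as a percentage of total."""
    return 100.0 * count / total


HUS_POPULATION = 1_720_000    # rounded value from the Table 2 footnote
STUDY_POPULATION = 1_000_000  # rounded value from the Table 2 footnote
TOTAL_SAMPLES = 1465

hus_incidence = percent(5347, HUS_POPULATION)
study_incidence = percent(2932, STUDY_POPULATION)
screening_positive = percent(35, TOTAL_SAMPLES)
confirmed_positive = percent(7, TOTAL_SAMPLES)

print(f"cumulative incidence, HUS: {hus_incidence:.2f}%")
print(f"cumulative incidence, study population: {study_incidence:.2f}%")
print(f"screening positive: {screening_positive:.2f}%")
print(f"confirmation positive: {confirmed_positive:.2f}%")
```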

Fig 7 shows the estimates and projections of the seroprevalence, obtained under the Estimation model and Projection model. Results are shown for both the screening and confirmation tests. Based on the confirmation test, the posterior mean of the seroprevalence remains around 0.5% until the end of the study period where it slightly increases. The increase at the end is affected by the prior trend, combined with a low number of available blood samples. Based on the screening test, the seroprevalence behaves similarly but the posterior mean is lower and the posterior variance is greater. In both cases, the posterior mean of the projected seroprevalence (based on the COVID-19 cases) remains lower than the posterior mean of the estimated seroprevalence. The discrepancy to the estimated seroprevalence is greater during the beginning of the study period compared to the rest of the study period.

Table 3 shows the estimates and projections of the seroprevalence for selected dates during the study period. Based on the confirmation test, the estimated seroprevalence in the HUS area was 0.49% (95% CrI: 0.20–0.91) on 9th April 2020 and 0.58% (95% CrI: 0.23–1.16) on 28th May 2020. The corresponding seroprevalence projections based on COVID-19 cases are 0.06% (95% CrI: 0.05–0.06) and 0.23% (95% CrI: 0.21–0.24), respectively. Fig 8 shows the posterior distributions of the seroprevalence obtained under the Estimation model on 28th May 2020. Based on the confirmation test, the interquartile range (IQR) for the seroprevalence was 0.40%–0.67%. The seroprevalence based on the screening test has more uncertainty and the posterior median is lower.

Table 3. Estimated and projected seroprevalences and the underreporting ratios during the study period.

	COVID-19 cases	Confirmation test		Screening test
date	π⁽¹⁾(t)	π⁽⁰⁾(t)	Δ(t)	π⁽⁰⁾(t)	Δ(t)
09.04.2020	0.055 (0.050–0.061)	0.49 (0.20–0.91)	8.92 (3.64–16.53)	0.30 (0.022–0.91)	5.49 (0.40–16.53)
16.04.2020	0.080 (0.072–0.087)	0.49 (0.20–0.89)	6.14 (2.54–11.26)	0.30 (0.022–0.90)	3.79 (0.28–11.34)
23.04.2020	0.106 (0.096–0.114)	0.49 (0.21–0.89)	4.68 (1.95–8.53)	0.30 (0.023–0.91)	2.89 (0.21–8.58)
30.04.2020	0.132 (0.121–0.141)	0.50 (0.21–0.91)	3.81 (1.59–6.95)	0.31 (0.023–0.92)	2.37 (0.17–7.02)
07.05.2020	0.158 (0.145–0.168)	0.51 (0.22–0.94)	3.27 (1.37–6.00)	0.32 (0.024–0.96)	2.05 (0.15–6.12)
14.05.2020	0.184 (0.170–0.194)	0.53 (0.22–0.99)	2.90 (1.20–5.43)	0.34 (0.024–1.01)	1.84 (0.13–5.52)
21.05.2020	0.209 (0.194–0.220)	0.56 (0.23–1.06)	2.67 (1.08–5.11)	0.36 (0.025–1.09)	1.72 (0.12–5.24)
28.05.2020	0.230 (0.214–0.241)	0.58 (0.23–1.16)	2.54 (1.01–5.04)	0.39 (0.027–1.21)	1.69 (0.12–5.28)
04.06.2020	0.246 (0.231–0.257)	0.62 (0.24–1.29)	2.51 (0.96–5.27)	0.43 (0.028–1.41)	1.73 (0.11–5.74)
11.06.2020	0.259 (0.244–0.268)	0.66 (0.24–1.49)	2.56 (0.93–5.75)	0.48 (0.029–1.73)	1.86 (0.11–6.69)
18.06.2020	0.268 (0.254–0.276)	0.71 (0.25–1.76)	2.67 (0.92–6.57)	0.56 (0.031–2.26)	2.08 (0.11–8.50)
25.06.2020	0.274 (0.262–0.281)	0.78 (0.25–2.16)	2.86 (0.91–7.91)	0.67 (0.032–3.14)	2.43 (0.12–11.53)
02.07.2020	0.279 (0.268–0.285)	0.88 (0.25–2.73)	3.14 (0.91–9.80)	0.83 (0.033–4.66)	2.97 (0.12–16.71)


The column COVID-19 cases shows the projected seroprevalence π⁽¹⁾(t), and the columns Confirmation test and Screening test show the estimated seroprevalence π⁽⁰⁾(t) and the underreporting ratio Δ(t) (see Eq 1) of SARS-CoV-2 infections during the study period. The seroprevalences are shown in percentage scale. Displayed are the posterior means along with 95% credible intervals, derived from the 2.5% and 97.5% quantiles of the distributions.

Fig 8 — Posterior densities of the seroprevalence π⁽⁰⁾(t), learned from the screening (left image) and confirmation (right image) test data. In the images, t corresponds to 28th May 2020. The seroprevalence is shown in percentage scale. (a) Posterior density of π⁽⁰⁾(t) (Screening test). (b) Posterior density of π⁽⁰⁾(t) (Confirmation test).

Underreporting

Fig 9 shows the posterior mean and quantiles of the underreporting ratio Δ(t) (see Eq 1), based on either the confirmation or the screening tests. For both tests, the posterior mean of Δ(t) first decreases, indicating higher underreporting during the beginning of the epidemic, then settles at around 2–3, and finally increases slightly toward the end of the first wave.

Table 3 shows the posterior mean and credible interval of Δ(t) for selected dates during the study period. Based on the confirmation test, there had been 8.9 (95% CrI: 3.6–16.5) infections for every COVID-19 case up until 9th April 2020. The underreporting then decreased, and up until 28th May 2020 our estimate is that there had been 2.5 (95% CrI: 1.0–5.0) SARS-CoV-2 infections for every COVID-19 case. The estimate of the underreporting ratio then remained at the same level until the end of the first wave.

Fig 10 shows the posterior distribution for the underreporting ratio by 28th May 2020, based on either the screening or confirmation tests. Based on the confirmation test, the IQR for underreporting was 1.8–3.0. The estimate derived from the screening test data has more uncertainty and shows lower underreporting.

Time from COVID-19 symptom onset to seroconversion

S4 Fig describes the posterior distributions of μ_U and σ_U, the parameters of the lognormal distribution of the time from COVID-19 symptom onset to seroconversion. The posterior medians of μ_U and σ_U are 2.87 and 0.72, respectively. The figure also shows the posterior predictive distribution for the time from symptom onset to seroconversion. The predicted median delay from symptom onset to seroconversion is close to 18 days and the 75% quantile is over 29 days. By day 60 since symptom onset, the probability of seroconversion is over 95%.
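The reported figures can be roughly cross-checked by plugging the posterior medians of μ_U and σ_U into a lognormal distribution. The sketch below is a simplification in Python: it ignores the posterior uncertainty in (μ_U, σ_U), so its quantiles are somewhat narrower than those of the posterior predictive distribution, and the function names are ours.

```python
import math
from statistics import NormalDist

MU_U = 2.87     # posterior median of mu_U reported in the text
SIGMA_U = 0.72  # posterior median of sigma_U reported in the text


def prob_seroconverted(days, mu=MU_U, sigma=SIGMA_U):
    """Lognormal CDF: probability of seroconversion within `days` of onset."""
    if days <= 0:
        return 0.0
    return NormalDist(mu, sigma).cdf(math.log(days))


def delay_quantile(p, mu=MU_U, sigma=SIGMA_U):
    """Lognormal quantile: delay (in days) not exceeded with probability p."""
    return math.exp(NormalDist(mu, sigma).inv_cdf(p))


print(f"median delay: {delay_quantile(0.5):.1f} days")
print(f"P(seroconverted by day 30): {prob_seroconverted(30):.2f}")
print(f"P(seroconverted by day 60): {prob_seroconverted(60):.2f}")
```

Under this plug-in approximation the median delay is exp(2.87), a little under 18 days, and the probability of seroconversion by day 60 exceeds 95%, in line with the text.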

Sensitivity analysis

S5 Fig shows the prior and posterior distributions of σ, the strength of dependency in the Estimation model, learned from the screening and confirmation test data. In both cases, the posterior distribution is similar to the (informative) prior distribution, indicating that the data do not contain much information about σ and the analysis may be sensitive to the prior distribution of σ.

S2 Table shows estimates of the underreporting ratio Δ(t), based on data from the confirmation test, using different values for hyperparameters μ₁, σ₁ and β. Smaller values of β correspond to a higher prior expectation and higher prior variance for σ and in turn higher posterior variance for Δ(t). Smaller values of β also correspond to slightly higher posterior means for Δ(t), but only at the start and end of the study. For example, comparing choices β = 2 to β = 40 when α = 2, logit(μ₁) = 0.05 and σ₁ = 2, the 95% credible intervals for Δ(t) on 28th May 2020 are 0.4–7.3 and 1.0–5.0, respectively; however, the posterior means are almost identical (2.54 and 2.56). With a choice of β = 120, the underreporting ratio estimates are similar on 28th May 2020 and at the end of the study, while with smaller values of β there is more uncertainty in the estimates at the end of the study.

Choice of a larger logit(μ₁) corresponds to a higher posterior mean for Δ(t), but only marginally. For example, comparing the choice logit(μ₁) = 0.005 to logit(μ₁) = 0.05 when σ₁ = 2, β = 40, the posterior means for Δ(t) on 28th May 2020 are 2.4 and 2.6, respectively. A choice of smaller σ₁ reduces the posterior variance of Δ(t) and elevates the effect of the chosen μ₁, but the effects are small.

In all cases, the effects of the hyperparameter choices are magnified towards the end of the study period, when the number of available samples from the serosurveys is low.

Discussion

We estimated that with 95% probability there were 1 to 5 SARS-CoV-2 infections for every COVID-19 case during the first epidemic wave in Finland. A 50% probability interval for the underreporting was 1.8–3.0. The underreporting was highest before April 2020 when there were 4 to 17 infections for every COVID-19 case (95% probability). It is likely that the seroprevalence in the Helsinki-Uusimaa region was over 0.5% already by the end of May 2020 (95% CrI: 0.2–1.2), while the cumulative incidence of COVID-19 cases in the region was 0.3% by the end of June. Based on the estimate of underreporting and the cumulative incidence of COVID-19 cases, we estimate that between 0.5% and 1% (50% probability) and no more than 1.5% (95% probability) of the population in the Helsinki-Uusimaa region were infected with SARS-CoV-2 by the end of June 2020.
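One way to read the last estimate is as the product of the cumulative incidence of COVID-19 cases and the underreporting ratio. The following sketch in Python uses the rounded figures quoted above (0.3% cumulative incidence, ratios of 1.8, 3.0 and 5.0); it is a back-of-the-envelope check under that assumption rather than the exact computation behind the reported intervals.

```python
def infected_percent(case_incidence_percent, underreporting_ratio):
    """Cumulative infections (%) implied by case incidence and the ratio."""
    return case_incidence_percent * underreporting_ratio


CASE_INCIDENCE = 0.3  # % of the population diagnosed, rounded, end of June

lower_50 = infected_percent(CASE_INCIDENCE, 1.8)  # lower end of 50% interval
upper_50 = infected_percent(CASE_INCIDENCE, 3.0)  # upper end of 50% interval
upper_95 = infected_percent(CASE_INCIDENCE, 5.0)  # upper end of 95% interval

print(f"50% interval: {lower_50:.2f}% to {upper_50:.2f}%")
print(f"95% upper bound: {upper_95:.2f}%")
```

The products, roughly 0.5%, 0.9% and 1.5%, agree with the stated range of 0.5% to 1% and the upper bound of 1.5%.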

There is great uncertainty about the estimated seroprevalence and the corresponding estimate of underreporting at the end of the study period, due to the small number of samples available in the serosurveys. The estimates are therefore sensitive to the model specification (i.e. hyperparameters). Accordingly, we consider the most robust estimate of underreporting during the first wave pertaining to the end of May 2020. We do not expect that the magnitude of underreporting changed significantly during the rest of the first wave, as there were no changes in virus testing policy or capacity. While our analysis included prior information related to the dependency between seroprevalences on consecutive weeks, our sensitivity analysis indicated that a prior choice of stronger dependency could result in more robust estimates of underreporting towards the end of the first wave.

Our analysis leaves a small but reasonable probability that by the end of the first wave there was no underreporting at all. Our estimation approach allowed values of the underreporting ratio below one, which would correspond to there being more COVID-19 cases than SARS-CoV-2 infections. This could occur in theory if the diagnosis procedure for COVID-19 (i.e. PCR test) was unspecific and the virus testing was widespread. Nevertheless, we believe this to be unrealistic in our study and simply interpret values below one to represent absence of underreporting. It seems, however, also unrealistic that no underreporting occurred, as in the general population the virus testing was targeted to symptomatic individuals only. Findings from a population-based screening in Iceland during March 2020 show that 43% of individuals who tested positive for SARS-CoV-2 were asymptomatic and findings from Spain indicate that one third of infections were asymptomatic in April-May 2020 [3, 16]. A systematic review and meta-analysis of 95 published studies estimates that globally 41% (34%–48%) of confirmed COVID-19 cases were asymptomatic during the pre-COVID-19-vaccine era [17]. Our analysis also leaves a small but reasonable probability that only 20% or less of SARS-CoV-2 infections were detected during the first epidemic wave. We believe that this may still be plausible, as other countries show even higher underreporting [3, 5].

A key assumption in our analysis was that the serosurvey participants represented the population of interest. The participation rate was 50%–64% and there were several factors which may have caused selection bias, as survey participation may correlate with the likelihood of SARS-CoV-2 infection. First, during the first two weeks, the surveys targeted only a few of the largest municipalities. These had the highest numbers of COVID-19 cases, which may lead to overestimating the seroprevalence and thus the underreporting. However, an analysis using data only from the largest municipality (Helsinki) showed similar estimates of underreporting. Secondly, the participation rate in younger age groups (18–29) was lower than in other age groups. Age is likely associated with the incidence of SARS-CoV-2 infection due to differences in social behaviour. In April 2020, Finns aged 18–29 had a similar frequency of daily social contacts as those aged 30–59, but a higher frequency of contacts than those aged 60–69 [18]. The underrepresentation of young adults in our study can lead to underestimation of the seroprevalence and of underreporting. Third, in several population health examination surveys, participation rates have been found lower among individuals with lower education [19]. Those individuals often work in professions where working remotely and social distancing may be more difficult to arrange, and thus they may be more exposed to infection. If those previous findings hold in this survey, this can lead to underestimation of seroprevalence and thus underreporting. Fourth, historically, the participation rate in Finnish health examination surveys has been lower in language groups other than Finnish and Swedish [20]. The incidence of COVID-19 during the first epidemic wave was several times higher in language groups other than Finnish, Swedish, English or Russian (S6 Fig). However, as the target population of our study includes only those four language groups, we do not believe that the possible underrepresentation of language groups other than Finnish and Swedish is likely to bias our results. Finally, our preliminary analyses from the serosurveys beyond the first wave indicate that subjects with a past confirmed SARS-CoV-2 infection tend to have a lower participation rate. It is possible that those with a confirmed infection were less willing to participate. This can lead to underestimation of the seroprevalence and of underreporting.

Our study was limited to those 18–69 years old. For ethical reasons, the elderly most vulnerable to severe COVID-19 were not invited to participate during the beginning of the epidemic as participation required a medical site visit and therefore could increase the risk of infection with SARS-CoV-2. Children were not invited due to difficulties in obtaining informed consent from minors. The median age of COVID-19 cases in the HUS area showed a decreasing trend during spring 2020, most likely due to the increase in testing capacity, allowing detection of milder disease cases (S7 Fig). It is therefore likely that the underreporting was both higher and decreased more in the younger age groups during the first epidemic wave. Other serological studies have used regression analysis or post stratification to account for differences in the age and sex distributions between the survey participants and the underlying population [3, 4, 21]. These methods could help reduce bias and allow for the estimation of age-dependent underreporting. We decided not to use such analytical methods due to the very small number of confirmed positive samples.

Another key assumption in our analysis was that the time-dependent probability of seroconversion after COVID-19 symptoms, as estimated from the external data set from Tan et al., is similar to how the antibody detection in the serological surveys would perform. Otherwise, the underreporting ratio, i.e. the ratio of the estimated seroprevalence (based on serosurveys) to the projected seroprevalence (based on COVID-19 cases), may not accurately describe underreporting. The patients in Tan et al. were all hospitalised and several of them were classified with severe pneumonia. By contrast, the majority of the FNIDR COVID-19 cases during the first epidemic wave did not require hospital care. Severe cases may have higher antibody responses, and this may cause us to overestimate the projected seroprevalence and hence underestimate the underreporting [22]. Additionally, the SARS-CoV-2 antibody detection method utilised in Tan et al. differed from the methods utilised in the serosurveys. The serosurvey antibody detection was calibrated to be 100% sensitive by day 30 since infection. By contrast, in Tan et al., only 74% of the patients had seroconverted by day 28 since symptom onset, and accordingly, our seroprevalence projection yielded approximately 75% probability of seroconversion by day 30 since symptom onset. This discrepancy indicates that we may overestimate the time lag to developing detectable antibodies after COVID-19 symptoms. This in turn indicates that we may overestimate the underreporting during the beginning of the epidemic, so that the true ratio could, at worst, be around 0.75 times our estimate. Therefore, instead of 4–17 there were perhaps 3–13 infections for every COVID-19 case before April.

When projecting the seroprevalence, we assumed that the probability of seropositivity following COVID-19 symptoms is strictly increasing over time. In reality, the antibody levels eventually wane, and 8 months after SARS-CoV-2 infection the N-IgG antibodies are detectable in only 66% of individuals [22]. Our analysis covers a period of four months, and there were not many infections in Finland before March 2020, so at worst we measured antibodies from serosurvey participants who had been infected four months earlier. The detectability of antibodies would then be at least 66% and possibly over 80%, assuming a linear decrease from the 100% detectability at one month. By contrast, our seroprevalence projection gives an almost 100% probability of seropositivity at four months since COVID-19 symptom onset. This worst-case discrepancy would correspond to overestimating the underreporting by 20% at the end of the study period. To analyse data beyond the first epidemic wave, the seroprevalence projection should be modified to allow for a decrease in the probability of seropositivity after appropriate time.
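The linear-decrease assumption can be made concrete with a short interpolation. The sketch below, in Python, takes 100% detectability at one month and 66% at eight months as its only inputs; the function name and the choice to hold the value constant after eight months are our assumptions for illustration.

```python
def detectability(months, full_until=1.0, end_month=8.0, end_value=0.66):
    """Linearly interpolated probability that antibodies are still detectable."""
    if months <= full_until:
        return 1.0
    if months >= end_month:
        # Assumption: no extrapolation beyond the last reported time point.
        return end_value
    slope = (end_value - 1.0) / (end_month - full_until)
    return 1.0 + slope * (months - full_until)


at_four_months = detectability(4.0)
print(f"detectability at four months: {at_four_months:.1%}")
```

Under this assumption the detectability four months after infection is about 85%, consistent with the statement that it would possibly be over 80%.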

We included an analysis based on the screening test to demonstrate how our method can be used with tests which are not fully specific. The estimates of seroprevalence based on the screening test were lower than those based on the confirmation test, when adjusting for the expected false positive rate of the screening test. This implies that either the specificity of the screening test was higher than expected, or alternatively, the confirmation test was not fully specific. The confirmation test utilises a microneutralisation test (MNT) as the second test to confirm the presence of SARS-CoV-2 antibodies. Based on an analysis of a large number of pre-pandemic sera from different age cohorts, the MNT can be considered to be fully specific [10]. It is therefore extremely unlikely that any of our 7 confirmed positive samples was a false positive; more likely the true specificity of the screening test was higher than we assumed. In our analysis, we assumed that the specificity of the screening test was a known constant, based on an 81/83 true negative finding. In reality, however, there is uncertainty in the exact specificity, and the results derived from the screening test therefore have more uncertainty than our analysis implies. For analysing data from a test with unknown specificity, we agree with treating the specificity as an unknown parameter, as recommended by Gelman and Carpenter, and as implemented by e.g. Stringhini et al. [4, 23].
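A rough calculation illustrates why the screening-based estimates come out low. The sketch below, in Python, pools the samples over all weeks and applies the standard Rogan-Gladen correction with the assumed specificity of 81/83 and sensitivity of 100%; this is a simple point estimate used for illustration, not the Bayesian model of the study, and the function names are ours.

```python
def expected_false_positives(n_samples, specificity):
    """Expected false positives if every sample were truly negative."""
    return n_samples * (1.0 - specificity)


def rogan_gladen(apparent_prevalence, sensitivity, specificity):
    """Prevalence corrected for test error, clipped to the range [0, 1]."""
    adjusted = (apparent_prevalence + specificity - 1.0) / (
        sensitivity + specificity - 1.0
    )
    return min(1.0, max(0.0, adjusted))


N_SAMPLES = 1465                # all serosurvey samples, pooled over weeks
SCREENING_POSITIVES = 35
CONFIRMED_POSITIVES = 7
SCREENING_SPECIFICITY = 81 / 83  # assumed known constant, as in the analysis

false_pos = expected_false_positives(N_SAMPLES, SCREENING_SPECIFICITY)
screening_prev = rogan_gladen(
    SCREENING_POSITIVES / N_SAMPLES, 1.0, SCREENING_SPECIFICITY
)
confirmed_prev = rogan_gladen(CONFIRMED_POSITIVES / N_SAMPLES, 1.0, 1.0)

print(f"expected false positives: {false_pos:.1f} of {N_SAMPLES}")
print(f"corrected prevalence, screening test: {screening_prev:.2%}")
print(f"corrected prevalence, confirmation test: {confirmed_prev:.2%}")
```

With the assumed specificity, about 35 false positives would be expected among 1465 samples, which is as many as the 35 screening positives actually observed. The corrected point estimate from the screening test is therefore zero, whereas the confirmation test gives about 0.48%, which supports the interpretation that the true specificity of the screening test was higher than assumed.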

We used a constant value 3.5 days as the delay from symptom onset to COVID-19 diagnosis, which was based on expert evaluation and information available during early 2020. In reality, the exact delay is unknown and subject to variation. It is likely that 3.5 days is a reasonable estimate for the average delay in the Helsinki-Uusimaa region during spring 2020, as according to internal infection tracking data at our institute, in the capital city (Helsinki) the delay was close to 6 days in March 2020, close to 4 days in April and close to 3 days in May. Small variations in this delay do not affect our analysis, as small changes in the COVID-19 symptom onset day would not significantly alter the seroprevalence projection. Our results are therefore not sensitive to small changes in the choice of delay.

Our results imply that the spread of SARS-CoV-2 infection was very limited in Finland during spring 2020 compared to other European countries, as seroprevalence was still likely under 1% in the densely populated Helsinki-Uusimaa region by the beginning of June. For example, in Spain seroprevalence was likely over 10% around Madrid by May 11 [3], and in Geneva, Switzerland, it was 10.8% (8.2–13.9) by May 9th. Finland had the advantage of being slightly isolated from mainland Europe and therefore the epidemic started a few weeks later, giving more time to implement social distancing. The general public’s compliance with epidemic recommendations was likely very high. There was a large reduction in the daily numbers of social contacts in the early part of the 2020 COVID-19 epidemic in Finland, which was likely a major contributor to the steady decline of the epidemic in the country [18].

In summary, we presented a Bayesian approach to estimate the time-dependent underreporting of SARS-CoV-2 infections during the COVID-19 epidemic. We applied the proposed approach to data from adults living in the Helsinki-Uusimaa region of Finland during the first epidemic wave in 2020. The analysis we describe here can also be applied in real time, and our method informed about the spread, detection, and severity of SARS-CoV-2 infection in Finland during 2020. Our results indicate that most SARS-CoV-2 infections were not detected and the underreporting was most severe during the beginning of the epidemic. However, as the cumulative incidence of COVID-19 was very low, it is likely that less than 1.5% of the population in the Helsinki-Uusimaa region had been infected with SARS-CoV-2 by the beginning of July 2020. Assuming that the underreporting was similar in other parts of the country and in children and the elderly, the first wave of the COVID-19 epidemic left a vast majority of the Finnish population unaffected, with almost the entire population still unexposed and susceptible to SARS-CoV-2.

Supporting information

S1 Table. Parameters of the prior distribution in the estimation model, and the specificities of the screening and confirmation tests.

(PDF)

S2 Table. Influence of choices of hyperparameters on the estimation of underreporting ratio Δ(t).

Shown are posterior means and 95% credible intervals for Δ(t), based on the confirmation test data, for 9th April 2020 (t = 0), 28th May 2020 (t = 49) and 2nd July 2020 (t = 84), using different values for the parameters μ₁, σ₁, and β. The value used for the parameter α was 2.

(PDF)

S1 Fig. Age distributions of study sub-populations.

Age distributions of: population in the Helsinki-Uusimaa region at the end of 2021 (HUS); COVID-19 cases for the HUS population during the first wave of the COVID-19 epidemic in 2020 (FNIDR); the study population, i.e. the target population of the current study (HUS (incl.)); COVID-19 cases from the study population during the first wave of the COVID-19 epidemic in 2020 (FNIDR (incl.)); serological survey participants from the study population during the first wave (Serosurveys).

(TIF)

S2 Fig. The serological survey antibody tests and their performances on the calibration data.

The screening test is the result of the IgG antibody test, which may give false positive results. The confirmation test is a combination of the IgG and microneutralization tests (MNT), where the IgG positive samples are tested again with the MNT. After optimizing performance on the calibration data, which includes samples from PCR positive and negative individuals, the sensitivity and specificity of the screening test are 33/33 (100%) and 81/83 (97.59%), respectively, while the sensitivity and specificity of the confirmation test are 33/33 (100%) and 83/83 (100%), respectively.

(TIF)

S3 Fig. Estimation model seroprevalence prior distribution.

Prior mean, and 2.5% and 97.5% quantiles for each weekly seroprevalence $π_{w}^{(0)}$ in the Estimation model. The estimates were computed based on 40000 samples generated from the prior distribution of π.

(TIF)

S4 Fig. Time from COVID-19 symptom onset to seroconversion.

The three images show, starting from the left: the posterior distribution for μ_U, the posterior distribution for σ_U, and the posterior predictive distribution for U, the time from COVID-19 symptom onset to seroconversion. The distribution for U was obtained by sampling from the lognormal distribution, using samples from the joint posterior distribution for (μ_U, σ_U).

(TIF)

S5 Fig. Prior and posterior distributions for the parameter σ.

Image on the left shows the prior distribution, the middle image shows the posterior distribution based on confirmation test data, and the image on the right shows the posterior distribution based on the screening test data.

(TIF)

S6 Fig. Incidence of COVID-19 cases in the Helsinki-Uusimaa region by age group and language during the first wave of the epidemic in 2020.

The language groups are Finnish (fi), Swedish (sv), English (en), Russian (ru) and other.

(TIF)

S7 Fig. Age distribution of COVID-19 cases in the Helsinki-Uusimaa region during the first wave of the COVID-19 epidemic in 2020.

(TIF)

Acknowledgments

We thankfully acknowledge the fluent collaboration with the Digital and Population Data Services Agency DVV for access to the Finnish Population Information System (PIS) and especially for HUS Diagnostic Center HUSLAB for study sample logistics. We thank all the study participants. We thank Juha Oksanen, Esa Ruokokoski, Elina Isosaari, Niina Ikonen, Dennis Ahlfors, Timo Koskenniemi, Nina Ekström, Pamela Österlund, Anu Haveri and Camilla Virta for their contributions related to data management and analyses.

Data Availability

Aggregated data which support the findings of the study are included within the paper. Data sufficient to approximately reproduce the main results are also available online in machine readable format (csv) here: github.com/TuomoNieminen/covid19underreporting. Individual level data are not available because of data privacy restrictions under Finnish law. More information may be requested from the Finnish Institute for Health and Welfare by contacting kirjaamo@thl.fi.

Funding Statement

This study was funded internally by the Finnish Institute for Health and Welfare with state budget item for SARS-CoV-2 studies. The authors were employees of the Finnish Institute for Health and Welfare, but the funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Helsinki University Hospital. Kaikkia koronavirusepäilyjä ei enää testata; 2020. Available from: https://www.hus.fi/ajankohtaista/kaikkia-koronavirusepailyja-ei-enaa-testata.
2. Jarva H, Lappalainen M, Luomala O, Jokela P, Jääskeläinen AE, Jääskeläinen AJ, et al. Laboratory-based surveillance of COVID-19 in the Greater Helsinki area, Finland, February-June 2020. International Journal of Infectious Diseases. 2021;104:111–116. doi: 10.1016/j.ijid.2020.12.038
3. Pollán M, Pérez-Gómez B, Pastor-Barriuso R, Oteo J, Hernán MA, Pérez-Olmeda M, et al. Prevalence of SARS-CoV-2 in Spain (ENE-COVID): a nationwide, population-based seroepidemiological study. The Lancet. 2020;396(10250):535–544. doi: 10.1016/S0140-6736(20)31483-5
4. Stringhini S, Wisniak A, Piumatti G, Azman AS, Lauer SA, Baysson H, et al. Seroprevalence of anti-SARS-CoV-2 IgG antibodies in Geneva, Switzerland (SEROCoV-POP): a population-based study. The Lancet. 2020;396(10247):313–319. doi: 10.1016/S0140-6736(20)31304-0
5. Erikstrup C, Hother CE, Pedersen OBV, Mølbak K, Skov RL, Holm DK, et al. Estimation of SARS-CoV-2 Infection Fatality Rate by Real-time Antibody Screening of Blood Donors. Clinical Infectious Diseases. 2020;72:249–253. doi: 10.1093/cid/ciaa849
6. Tan W, Lu Y, Zhang J, Wang J, Dan Y, Tan Z, et al. Viral Kinetics and Antibody Responses in Patients with COVID-19. medRxiv. 2020.
7. Digital and population data service agency. Population Information System; 2021. Available from: https://dvv.fi/en/population-information-system.
8. Finnish Institute for Health and Welfare. Finnish National Infectious Diseases Register; 2021. Available from: https://thl.fi/en/web/infectious-diseases-and-vaccinations/surveillance-and-registers/finnish-national-infectious-diseases-register.
9. Finnish Institute for Health and Welfare. Serological population study of the coronavirus epidemic; 2020. Available from: https://thl.fi/en/web/thlfi-en/research-and-development/research-and-projects/serological-population-study-of-the-coronavirus-epidemic.
10. Ekström N, Virta C, Haveri A, Dub T, Hagberg L, Solastie A, et al. Analytical and clinical evaluation of antibody tests for SARS-CoV-2 serosurveillance studies used in Finland in 2020. medRxiv. 2021.
11. Haveri A, Smura T, Kuivanen S, Österlund P, Hepojoki J, Ikonen N, et al. Serological and molecular findings during SARS-CoV-2 infection: the first case study in Finland, January to February 2020. Eurosurveillance. 2020;25(11). doi: 10.2807/1560-7917.ES.2020.25.11.2000266
12. Homan MD, Gelman A. The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo. J Mach Learn Res. 2014;15(1):1593–1623.
13. Carpenter B, Gelman A, Hoffman MD, Lee D, Goodrich B, Betancourt M, et al. Stan: A probabilistic programming language. Journal of statistical software. 2017;76(1):1–32. doi: 10.18637/jss.v076.i01
14. R Core Team. R: A Language and Environment for Statistical Computing; 2021. Available from: https://www.R-project.org/.
15. Stan Development Team. RStan: the R interface to Stan; 2020. Available from: http://mc-stan.org.
16. Gudbjartsson DF, Helgason A, Jonsson H, Magnusson OT, Melsted P, Norddahl GL, et al. Spread of SARS-CoV-2 in the Icelandic Population. New England Journal of Medicine. 2020;382(24):2302–2315. doi: 10.1056/NEJMoa2006100
17. Ma Q, Liu J, Liu Q, Kang L, Liu R, Jing W, et al. Global Percentage of Asymptomatic SARS-CoV-2 Infections Among the Tested Population and Individuals With Confirmed COVID-19 Diagnosis: A Systematic Review and Meta-analysis. JAMA Network Open. 2021;4(12):e2137257–e2137257. doi: 10.1001/jamanetworkopen.2021.37257
18. Auranen K, Shubin M, Karhunen M, Sivelä J, Leino T, Nurhonen M. Social Distancing and SARS-CoV-2 Transmission Potential Early in the Epidemic in Finland. Epidemiology. 2021;32(4):525–532. doi: 10.1097/EDE.0000000000001344
19. Härkänen T, Karvanen J, Tolonen H, Lehtonen R, Djerf K, Juntunen T, et al. Systematic handling of missing data in complex study designs-experiences from the Health 2000 and 2011 Surveys. Journal of Applied Statistics. 2016;43(15):2772–2790. doi: 10.1080/02664763.2016.1144725
20. Tolonen H, Koponen P, Borodulin K, Männistö S, Peltonen M, Vartiainen E. Language as a determinant of participation rates in Finnish health examination surveys. Scandinavian Journal of Public Health. 2018;46(2):240–243. doi: 10.1177/1403494817725243
21. Merkely B, Szabó AJ, Kosztin A, Berényi E, Sebestyén A, Lengyel C, et al. Novel coronavirus epidemic in the Hungarian population, a cross-sectional nationwide survey to support the exit policy in Hungary. GeroScience. 2020;42(4):1063–1074. doi: 10.1007/s11357-020-00226-9
22. Haveri A, Ekström N, Solastie A, Virta C, Österlund P, Isosaari E, et al. Persistence of neutralizing antibodies a year after SARS-CoV-2 infection in humans. European Journal of Immunology. 2021;51:3202–3213. doi: 10.1002/eji.202149535
23. Gelman A, Carpenter B. Bayesian analysis of tests with unknown specificity and sensitivity. Journal of the Royal Statistical Society: Series C (Applied Statistics). 2020;69(5):1269–1283. doi: 10.1111/rssc.12435

PLoS One. doi: 10.1371/journal.pone.0282094.r001

Decision Letter 0

Timothy J Wade

21 Mar 2023

PONE-D-22-33699

Underreporting of SARS-CoV-2 infections during the first wave of the 2020 COVID-19 epidemic in Finland - Bayesian inference based on a series of serological surveys

PLOS ONE

Dear Dr. Nieminen,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. In your revised manuscript, please pay particular attention to the issues raised by Reviewer 3 regarding the model being poorly estimated. Also note that although Reviewer 2 raises concerns about the novelty of the approach, PLOS One manuscripts are evaluated on the basis of methodological rigor and high ethical standards, regardless of perceived novelty.

Please submit your revised manuscript by May 05 2023 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Timothy J Wade, Ph.D

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We note that your ethics statement indicates "The study protocol was submitted for ethical review to the ethical review board of the Hospital District of Helsinki and Uusimaa. Written informed consent was obtained from all participants of the serological surveys." To ensure that your submission complies with our policy on human subject research (https://journals.plos.org/plosone/s/human-subjects-research) please clarify in the methods section of the manuscripts whether the ethical review board of the Hospital District of Helsinki and Uusimaa approved this study. If applicable please provide approval numbers.

3. Thank you for stating the following financial disclosure:

“This study was funded by the Finnish Institute for Health and Welfare.”

Please state what role the funders took in the study. If the funders had no role, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript."

If this statement is not correct you must amend it as needed.

Please include this amended Role of Funder statement in your cover letter; we will change the online submission form on your behalf.

4. Thank you for stating the following in the Competing Interests section:

“We report no conflict of interests related to the current work. Finnish Institute for Health and Welfare (THL) conducts Public-Private Partnership with vaccine manufacturers and has received research funding from Sanofi Inc., Pfizer Inc., and GlaxoSmithKline Biologicals SA for studies not related to COVID-19. Nieminen, Melin, Palmu and Jokinen have been investigators in these studies, but they have received no personal remuneration.”

Please confirm that this does not alter your adherence to all PLOS ONE policies on sharing data and materials, by including the following statement: "This does not alter our adherence to PLOS ONE policies on sharing data and materials.” (as detailed online in our guide for authors http://journals.plos.org/plosone/s/competing-interests). If there are restrictions on sharing of data and/or materials, please state these. Please note that we cannot proceed with consideration of your article until this information has been declared.

Please include your updated Competing Interests statement in your cover letter; we will change the online submission form on your behalf.

5. In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability.

Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. Any potentially identifying patient information must be fully anonymized.

Important: If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail. Please see our guidelines for more information on what we consider unacceptable restrictions to publicly sharing data: http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access.

We will update your Data Availability statement to reflect the information you provide in your cover letter.

6. We note that you have included the phrase “data not shown” in your manuscript. Unfortunately, this does not meet our data sharing requirements. PLOS does not permit references to inaccessible data. We require that authors provide all relevant data within the paper, Supporting Information files, or in an acceptable, public repository. Please add a citation to support this phrase or upload the data that corresponds with these findings to a stable repository (such as Figshare or Dryad) and provide and URLs, DOIs, or accession numbers that may be used to access these data. Or, if the data are not a core part of the research being presented in your study, we ask that you remove the phrase that refers to these data.

7. Your ethics statement should only appear in the Methods section of your manuscript. If your ethics statement is written in any section besides the Methods, please move it to the Methods section and delete it from any other section. Please ensure that your ethics statement is included in your manuscript, as the ethics statement entered into the online submission form will not be published alongside your manuscript.

8. We note that Figures 1 and 2 in your submission contain map images which may be copyrighted. All PLOS content is published under the Creative Commons Attribution License (CC BY 4.0), which means that the manuscript, images, and Supporting Information files will be freely available online, and any third party is permitted to access, download, copy, distribute, and use these materials in any way, even commercially, with proper attribution. For these reasons, we cannot publish previously copyrighted maps or satellite images created using proprietary data, such as Google software (Google Maps, Street View, and Earth). For more information, see our copyright guidelines: http://journals.plos.org/plosone/s/licenses-and-copyright.

We require you to either (1) present written permission from the copyright holder to publish these figures specifically under the CC BY 4.0 license, or (2) remove the figures from your submission:

a. You may seek permission from the original copyright holder of Figures 1 and 2 to publish the content specifically under the CC BY 4.0 license.

We recommend that you contact the original copyright holder with the Content Permission Form (http://journals.plos.org/plosone/s/file?id=7c09/content-permission-form.pdf) and the following text:

“I request permission for the open-access journal PLOS ONE to publish XXX under the Creative Commons Attribution License (CCAL) CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). Please be aware that this license allows unrestricted use and distribution, even commercially, by third parties. Please reply and provide explicit written permission to publish XXX under a CC BY license and complete the attached form.”

Please upload the completed Content Permission Form or other proof of granted permissions as an "Other" file with your submission.

In the figure caption of the copyrighted figure, please include the following text: “Reprinted from [ref] under a CC BY license, with permission from [name of publisher], original copyright [original copyright year].”

b. If you are unable to obtain permission from the original copyright holder to publish these figures under the CC BY 4.0 license or if the copyright holder’s requirements are incompatible with the CC BY 4.0 license, please either i) remove the figure or ii) supply a replacement figure that complies with the CC BY 4.0 license. Please check copyright information on all replacement figures and update the figure caption with source information. If applicable, please specify in the figure caption text when a figure is similar but not identical to the original image and is therefore for illustrative purposes only.

The following resources for replacing copyrighted map figures may be helpful:

USGS National Map Viewer (public domain): http://viewer.nationalmap.gov/viewer/

The Gateway to Astronaut Photography of Earth (public domain): http://eol.jsc.nasa.gov/sseop/clickmap/

Maps at the CIA (public domain): https://www.cia.gov/library/publications/the-world-factbook/index.html and https://www.cia.gov/library/publications/cia-maps-publications/index.html

NASA Earth Observatory (public domain): http://earthobservatory.nasa.gov/

Landsat: http://landsat.visibleearth.nasa.gov/

USGS EROS (Earth Resources Observatory and Science (EROS) Center) (public domain): http://eros.usgs.gov/#

Natural Earth (public domain): http://www.naturalearthdata.com/

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: No

Reviewer #3: Yes

Reviewer #4: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: To assess under-reporting, the authors compared SARS-CoV-2 infections from serological surveys to reported COVID-19 illnesses during the first pandemic wave in Finland. Their analyses are rigorous, clearly described, and results are comparable to others (e.g., JAMA 2021; 326:1400-09, Lancet Reg Health Am 2023; 18:100403). As severity increases with co-morbidities, for which age is a proxy, and reporting increases with severity, I imagine that under-reporting of childhood infections was disproportionate. Could the authors stratify by age?

Reviewer #2: Review of “Underreporting of SARS-CoV-2 infections during the first wave of the 2020 COVID-19 epidemic in Finland – Bayesian inference based on a series of serological surveys”

by Tuomo A. Nieminen, Kari Auranen, Sangita Kulathinal, Tommi Härkänen, Merit Melin, Arto A. Palmu, Jukka Jokinen

Summary

In this manuscript, a Bayesian approach is used to estimate the underreporting of SARS-CoV-2 infections during the first wave of COVID-19 in Finland, that is, from March to June 2020. The analysis is based on a series of serological surveys.

It is estimated by the authors that there were 1 to 5 infections for every detected case during the first wave. Reporting is estimated to have been much poorer during the first months of this period (before April) with 4 to 17 infections for every detected case.

General comments

To estimate the underreporting of a disease is particularly important when the number of asymptomatic cases is high, as it is known to be the case for the COVID-19 disease. To do so, to use a series of serological surveys, when available as it is the case here, is a proper choice and to perform the analyses based on a Bayesian inference method appears also well designed for this purpose.

However, although the paper is well written and the analysis well driven, I found the results of rather limited reach. Maybe this methodology was not applied yet to COVID-19 in Finland, but it is rather common in itself (see e.g. [1-2]), and it has been applied to many other countries by other authors since the beginning of the pandemic (e.g. [3-5]). From our point of view, methodologically speaking, the approach is not sufficiently new to deserve a publication in PLoS.

From an epidemiological point of view, these results are nice and of some interest, but to make it really useful, one would expect to have these results put in perspective with other factors and/or with explanations on the behaviors specifically observed.

In particular, results appear rather different from what was observed in other countries in Europe but the present analysis does not help to understand these differences. For instance, the prevalence in Finland is significantly lower than most of the other European countries. Can the underestimation contribute to explain such a behavior? Or may it result from dynamical reasons (a model was recently obtained showing that, some epidemiological systems can have a very different time evolution in amplitude under strictly the same sanitary conditions [6]) or due to specific policies (as it has been the case in several Asian countries)?

Here, a retrospective analysis is performed by the authors based on a serological survey, but such a serological survey was not available at the very beginning of the epidemic. To cope with this difficulty, other authors have used more basic approaches based on the case fatality ratio [7]. What would have been the effect of such a rough approach in comparison to the (more robust) Bayesian approach used here?

I think these types of questions will deepen the investigations and make the discussions and the work interesting to a wider audience.

Despite its technical interest and quality, at this stage, I don’t think the present work can help much to understand the behavior observed in Finland in comparison to other countries in Europe or in the world. For this reason, I cannot recommend it for publication in a PLoS journal.

References

[1] M. Dvorzak and H. Wagner, Sparse Bayesian modelling of underreported count data, Statistical Modelling, 2016, 16, 24-46.

[2] Turbé H, Bjelogrlic M, Robert A, Gaudet-Blavignac C, Goldman JP, Lovis C. Adaptive Time-Dependent Priors and Bayesian Inference to Evaluate SARS-CoV-2 Public Health Measures Validated on 31 Countries. Front Public Health, 2021, 8, 583401.

[3] Lope DJ, Demirhan H. 2022. Spatiotemporal Bayesian estimation of the number of under-reported COVID-19 cases in Victoria Australia. PeerJ, 10, e14184 http://doi.org/10.7717/peerj.14184

[4] Paixão B, Baroni L, Pedroso M, Salles R, Escobar L, de Sousa C, de Freitas Saldanha R, Soares J, Coutinho R, Porto F, Ogasawara E. Estimation of COVID-19 Under-Reporting in the Brazilian States Through SARI. New Gener Comput., 2021, 39(3-4), 623-645.

[5] Ricardo Cao & José E. Chacón (2022) Introduction to the special issue on Data Science for COVID-19, Journal of Nonparametric Statistics, 34(3), 555-569.

[6] Thenon N, Peyre M, Huc M, Touré A, Roger F, Mangiarotti S (2022) COVID-19 in Africa: Underreporting, demographic effect, chaotic dynamics, and mitigation strategy impact. PLoS Negl. Trop. Dis., 16(9), e0010735.

[7] Russell Timothy W, Hellewell Joel, Jarvis Christopher I, van Zandvoort Kevin, Abbott Sam, Ratnayake Ruwan, CMMID COVID-19 working group, Flasche Stefan, Eggo Rosalind M, Edmunds W John, Kucharski Adam J. Estimating the infection and case fatality ratio for coronavirus disease (COVID-19) using age-adjusted data from the outbreak on the Diamond Princess cruise ship, 2020. Euro Surveill. 2020, 25(12), 2000256. https://doi.org/10.2807/1560-7917.

Detailed comments

*p.2 “If the virus causes clinical disease, the rate of […]”: Maybe I would say instead “If the virus causes specific clinical disease, the rate of […]”?

*p.3 line 26 “the numbers of COVID-19 cases”: To avoid any misunderstanding I’d say “the numbers of COVID-19 new cases”

*p.4 line 1: Ref. [2] was published more than one year ago; the authors did not update their bibliography before submitting the manuscript. See https://doi.org/10.1016/j.ijid.2020.12.038

*p.4 line 53-60: Indeed, it is important to have information from other countries for comparison.

*p.4 line 57-58 “[…] estimated that there were 11 SARS-Cov-2 infections for every COVID-19 case.”: Maybe I’d say “detected cases” instead of just “case”.

*p.4, line 73: Here also, the Ref. [6] was not updated. See 10.1172/JCI138759

*lines 72-85: it seems it is here that you explain what will be investigated in the present study. But it is not very clear; we have to deduce it by reading the two paragraphs. I think you should state it more clearly, maybe with “in this paper” at line 72: “To better address the delays in antibody responses, in this paper, we utilise […]”

*line 81-83 “The novelty of our methodology is in accounting for the uncertainty in the time lag from disease symptoms to seroconversion when estimating the time-evolving underreporting of infections. Our analysis shows how the underreporting of SARS-CoV-2 infections evolved over time during the first epidemic wave in Finland.”: It appears to be the main contribution of this paper. I find it rather narrow for an international publication.

* lines 98-100 “These data consist of COVID-19 cases notified as either a positive SARS-CoV-2 finding from a microbiological laboratory or a clinical diagnosis by a medical doctor.”: If the two sources of information are separated, the analysis could be performed on the two datasets to investigate the robustness of the analysis.

Reviewer #3: General comments

This paper focusses on the underreporting of SARS-CoV-2 during the first wave of the 2020 epidemic in Finland. It focusses on Bayesian inference based on serological surveys. It uses different data sources to identify infection rates.

The paper is mostly well written and clear. The method is applied to data for the age group 18–69 but doesn’t include older people. Older people, possibly in care homes, could have a higher risk of spread or infection. There must be a greater explanation/justification for the exclusion.

Page 4: what is the extended capital region? It’s not defined anywhere.

The focus is the estimation of the underreporting ratio which is a function of seroprevalence divided by case count. The seroprevalence is assumed to have a random walk prior distribution in the logit of the prevalence, and the count of tests is used to estimate the posterior distribution of the seroprevalence; whereas the case count is used in a binary model and the underlying

Table 3 displays the results of estimation for the seroprevalence from surveys and case counts.

There is an issue with this table, however. A credible interval is shown for the ratio (underreporting), but the two data streams are modelled separately. How can an interval be constructed for the ratio when these models are run separately using MCMC? The samples can’t be shared.

An interpretation issue: according to Table 3 while the ratio stabilizes from June onwards (around 2.5) the credible interval crosses 1.0 and so the underreporting is poorly estimated. This is not mentioned but is a serious problem.

In general for a ratio to be estimated with a credible interval the ratio should have been computed within a joint model for seroprevalence and case counts.

Finally I note that various prior parameters are assumed for the distributions included in the models and these are given in Table S1 for the estimation model, and while Table S2 shows effects of varying some of these, it is noticeable that at later time the credible interval for the ratio crosses 1.0 for most entries and so the underreporting ratio is poorly estimated.
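
Editorial illustration (not part of the reviewer's text): the concern about the ratio's credible interval can be sketched in a few lines. When both seroprevalences are sampled within one model, each MCMC iteration yields a paired draw of the numerator and the denominator, so the ratio can be formed draw by draw and summarised by its quantiles. The sketch below assumes such paired draws are available; the function name `ratio_credible_interval`, the Beta parameters, and the simulated draws are hypothetical stand-ins and are not taken from the paper or its data. It is not claimed to be the authors' computation.

```python
import random


def ratio_credible_interval(numer_draws, denom_draws, level=0.95):
    """Equal-tailed interval for numer/denom, computed draw by draw.

    Both sequences are assumed to hold paired posterior draws, i.e. the
    i-th entries come from the same MCMC iteration of a joint model.
    """
    if len(numer_draws) != len(denom_draws):
        raise ValueError("draws must be paired (same number of iterations)")
    # Form the ratio within each iteration, then sort to read off quantiles.
    ratios = sorted(n / d for n, d in zip(numer_draws, denom_draws))
    tail = (1.0 - level) / 2.0
    lo_idx = int(tail * (len(ratios) - 1))
    hi_idx = int(round((1.0 - tail) * (len(ratios) - 1)))
    return ratios[lo_idx], ratios[hi_idx]


# Hypothetical simulated draws standing in for joint posterior samples:
# a survey-based seroprevalence and a case-based seroprevalence.
rng = random.Random(1)
sero_survey = [rng.betavariate(12, 988) for _ in range(4000)]
sero_cases = [rng.betavariate(50, 9950) for _ in range(4000)]

lower, upper = ratio_credible_interval(sero_survey, sero_cases)
# An interval that contains 1.0 is compatible with no underreporting.
includes_one = lower <= 1.0 <= upper
print(f"95% interval for the ratio: {lower:.2f} to {upper:.2f}")
print(f"interval includes 1.0: {includes_one}")
```

Forming the ratio per draw carries any posterior dependence between the two quantities into the interval, which is what separately fitted models cannot provide. Checking whether the interval contains 1.0 corresponds to the interpretation issue raised above for the later time points.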

Minor Comments

Abstract mentions ‘extended capital region’; it’s not clear what this is.

Abstract: infection statistics are not really necessary in the abstract. These can be removed.

Page 5 line 105: Notifications of what? It’s not defined.

Page 5 line 109: How were the symptom onset delays estimated over time? This is not explained.

Page 10 line 219 ‘…have been observed…’

Page 12 line 260-261: are sigma and sigma_1 the same or not?

Reviewer #4: The authors use Bayesian Inference and three different sources of data – COVID 19 cases, serological surveys, and external data on antibody development to estimate time-dependent underreporting of COVID-19 cases during the first wave of the COVID-19 epidemic in Finland.

The authors measure the underreporting of SARS-CoV-2 infections as the ratio of two seroprevalences – (i) based on observations from the serosurveys, and (ii) estimated using the reported COVID-19 incidence and data on antibody development. The paper is interesting, written in detail and with sound modeling and analysis.

Some minor comments:

Abstract line 15: change ‘external data’ to ‘external data on antibody development’

How is the value of the delay from symptom onset to diagnosis, C, set at 3.5 days? The authors later mention that the result is very sensitive to this value. However, some explanation as to why 3.5 days was chosen is warranted.

Please provide some details on how the priors (for \sigma and other parameters) were chosen. This could be done in the Estimation Model section or the Sensitivity analysis section. It is mentioned in the section ‘Sensitivity analysis’ that the data are not very informative about some parameters, making the selection of model priors more salient.

Line 328-332: This could probably go in the Appendix/Supplementary Information

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

Reviewer #4: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2023 Jun 23;18(6):e0282094. doi: 10.1371/journal.pone.0282094.r002

Author response to Decision Letter 0

1 Jun 2023

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

TN: Our comments are prefixed with “TN: “. The line references in our comments refer to the manuscript file (without tracked changes).

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

TN: We have used PLOS ONE’s style including references to figures and tables, and those related to supplementary files. We have added the corresponding author initials.

TN: We have clarified in the methods section as follows: ”The study protocol was approved by the ethical committee of the Hospital District of Helsinki and Uusimaa (HUS/1137/2020).”

3. Thank you for stating the following financial disclosure:

“This study was funded by the Finnish Institute for Health and Welfare.”

If this statement is not correct you must amend it as needed.

Please include this amended Role of Funder statement in your cover letter; we will change the online submission form on your behalf.

TN: We have clarified the financial disclosure statement in the cover letter.

4. Thank you for stating the following in the Competing Interests section:

Please include your updated Competing Interests statement in your cover letter; we will change the online submission form on your behalf.

TN: We have included an updated version of the competing interests statement in the cover letter.

We will update your Data Availability statement to reflect the information you provide in your cover letter.

TN: We have included an updated data availability statement in the cover letter.

TN: The phrases “data not shown” were in reference to data which were not a core part of the research presented. We removed those phrases from the manuscript.

TN: We have included an ethics statement within the manuscript at the end of the Methods section.

We require you to either (1) present written permission from the copyright holder to publish these figures specifically under the CC BY 4.0 license, or (2) remove the figures from your submission:

a. You may seek permission from the original copyright holder of Figures 1 and 2 to publish the content specifically under the CC BY 4.0 license.

We recommend that you contact the original copyright holder with the Content Permission Form (http://journals.plos.org/plosone/s/file?id=7c09/content-permission-form.pdf) and the following text:

Please upload the completed Content Permission Form or other proof of granted permissions as an "Other" file with your submission.

The following resources for replacing copyrighted map figures may be helpful:

USGS National Map Viewer (public domain): http://viewer.nationalmap.gov/viewer/

The Gateway to Astronaut Photography of Earth (public domain): http://eol.jsc.nasa.gov/sseop/clickmap/

Maps at the CIA (public domain): https://www.cia.gov/library/publications/the-world-factbook/index.html and https://www.cia.gov/library/publications/cia-maps-publications/index.html

NASA Earth Observatory (public domain): http://earthobservatory.nasa.gov/

Landsat: http://landsat.visibleearth.nasa.gov/

USGS EROS (Earth Resources Observatory and Science (EROS) Center) (public domain): http://eros.usgs.gov/#

Natural Earth (public domain): http://www.naturalearthdata.com/

TN: The map figures 1 and 2 were created by the authors, using open-source software (R program). The figures are not previously copyrighted.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Yes

Reviewer #2: No

Reviewer #3: Yes

Reviewer #4: Yes

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Yes

3. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Yes

4. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Yes

5. Review Comments to the Author

TN: Thank you for the comments. The serological surveys targeted adults only and we have no data on children. We have now clarified in the discussion that “Our study was limited to those 18–69 years old. For ethical reasons, the elderly most vulnerable to severe COVID-19 were not invited to participate during the beginning of the epidemic as participation required a medical site visit and therefore could increase the risk of infection with SARS-CoV-2. Children were not invited due to difficulties in obtaining informed consent from minors.”. (lines 505-).

TN: We also note that it is indeed likely that underreporting was higher in younger age groups, as the detected COVID-19 cases showed a decreasing trend in age: “It is therefore likely that the underreporting was both higher and decreased more in the younger age groups during the first epidemic wave.” (lines 511-). We also updated the last paragraph of the discussion to further note the age limitation in our study.

TN: Stratification by age is generally a good suggestion. However, the very low number of confirmed positive samples available (only 7 total) restricts adjusted or stratified analyses in our case, as we comment in the discussion: “Other serological studies have used regression analysis or post stratification to account for differences in the age and sex distributions between the survey participants and the underlying population … We decided not to use such analytical methods due to the very small number of confirmed positive samples.” (lines 517-).

Reviewer #2: Review of “Underreporting of SARS-CoV-2 infections during the first wave of the 2020 COVID-19 epidemic in Finland – Bayesian inference based on a series of serological surveys”

by Tuomo A. Nieminen, Kari Auranen, Sangita Kulathinal, Tommi Härkänen, Merit Melin, Arto A. Palmu, Jukka Jokinen

Summary

General comments

TN: We agree that utilizing serological data to assess underreporting is not novel; however, our analysis does include methodological novelty in that it incorporates data on the time lag to developing antibodies. We believe our work is worth publishing: our study clearly shows that in Finland the spread of SARS-CoV-2 was very limited during the early phases of the pandemic.
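To illustrate what accounting for the time lag to developing antibodies can look like in practice, here is a minimal sketch and not the model used in the manuscript. It assumes daily counts of cases by day of symptom onset and a cumulative probability of seroconversion by days since onset; the numbers in `onsets` and `serocon_cdf` and the helper `expected_seroconverted` are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical daily counts of cases by day of symptom onset (illustrative only).
onsets = np.array([1, 3, 6, 10, 12, 9, 6, 4, 2, 1], dtype=float)

# Hypothetical cumulative probability of having seroconverted k days after
# symptom onset, k = 0, 1, 2, ... (these are NOT the values from the paper).
serocon_cdf = np.array([0.0, 0.0, 0.05, 0.10, 0.20, 0.35, 0.50, 0.65, 0.80, 0.90])


def expected_seroconverted(onsets, cdf):
    """Expected number of cases that have seroconverted by the end of each day."""
    n = onsets.size
    expected = np.zeros(n)
    for t in range(n):
        # Days elapsed since onset for cases with onset on days 0..t.
        lags = t - np.arange(t + 1)
        expected[t] = np.sum(onsets[: t + 1] * cdf[lags])
    return expected


seroconverted = expected_seroconverted(onsets, serocon_cdf)
cumulative_cases = np.cumsum(onsets)
for day, (c, s) in enumerate(zip(cumulative_cases, seroconverted)):
    print(f"day {day}: cumulative cases {c:.0f}, expected seroconverted {s:.1f}")
```

The point of the sketch is that, during a growing epidemic, the number of cases expected to be seropositive on a given day lags behind the cumulative case count, so a comparison with a serosurvey taken on that day should use the lagged quantity.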

TN: Our analysis is retrospective, but the methodology we describe can also be applied in real time to accumulating data. In fact, we originally applied the method in real-time during the beginning of the 2020 COVID-19 epidemic in Finland, and the results from our analysis informed about the spread of SARS-CoV-2 in Finland during that time. We added the following sentence to the end of the Discussion: “The analysis we here describe can also be applied in real time, and our method informed about the spread, detection, and severity of SARS-CoV-2 infection in Finland during 2020.” (lines 596-)

TN: We agree that the comparison of the seroprevalence estimates to those from other countries is relevant, and the possible reasons for the differences are worth discussing. Finland had the advantage of being slightly isolated from mainland Europe; thus the epidemic started a few weeks later and never really developed into a large epidemic during spring 2020, as our study implies.

TN: The general public’s compliance with recommendations was likely very good due to Finland being a high-trust society. The political decisions during spring 2020 were also rather extreme, for example the whole extended capital area (Helsinki-Uusimaa region) was isolated from the rest of the country for several weeks. The result was that there was a large reduction in the daily numbers of social contacts in the early part of the 2020 COVID-19 epidemic in Finland, which was likely a major contributor to the steady decline of the epidemic in the country (Auranen et al. 2021). We have now added a paragraph to the end of discussion where we discuss the differences to other European countries as well as the possible reasons for these differences (lines 582-).

TN: Of note, another study utilising the same data shows that the prevalence of infection-induced antibodies remained at < 7% in Finland until the emergence of the Omicron variant at the end of 2021 (Solastie et al 2023).

Auranen K, Shubin M, Karhunen M, Sivelä J, Leino T, Nurhonen M. Social Distancing and SARS-CoV-2 Transmission Potential Early in the Epidemic in Finland. Epidemiology. 2021;32(4):525–532. doi: 10.1097/EDE.0000000000001344

Anna Solastie, Tuomo Nieminen, Nina Ekström, Hanna Nohynek, Lasse Lehtonen, Arto A. Palmu, Merit Melin. Changes in SARS-CoV-2 seroprevalence and population immunity in Finland, 2020–2022. medRxiv 2023.02.17.23286042; doi: https://doi.org/10.1101/2023.02.17.23286042

I think these types of questions will deepen the investigations and make the discussions and the work interesting to a wider audience.

TN: We believe that the results of this study are quite interesting. Our study implies that the incidence of SARS-CoV-2 infection was very low in Finland compared to other countries. We also present ideas for methodological development in the evaluation of underreporting of infections. As noted above, we have now added additional discussion related to the differences in seroprevalence compared to other countries (lines 582-).

TN: The serological surveys were started quite quickly in Finland and the first samples were collected during early April 2020, only a month after the epidemic had started off. The analysis which we present here was originally performed in real-time during 2020. We have now added the following sentence to the last paragraph of discussion: “The analysis we here describe can also be applied in real time, and our method informed about the spread, detection, and severity of SARS-CoV-2 infection in Finland during 2020.”

TN: As noted above, there is also additional evidence now that indeed the spread of SARS-CoV-2 remained quite limited in Finland until the emergence of the Omicron variant (Solastie 2023).

References

[1] M. Dvorzak and H. Wagner, Sparse Bayesian modelling of underreported count data, Statistical Modelling, 2016, 16, 24-46.

[3] Lope DJ, Demirhan H. 2022. Spatiotemporal Bayesian estimation of the number of under-reported COVID-19 cases in Victoria Australia. PeerJ, 10, e14184 http://doi.org/10.7717/peerj.14184

[5] Ricardo Cao & José E. Chacón (2022) Introduction to the special issue on Data Science for COVID-19, Journal of Nonparametric Statistics, 34(3), 555-569.

Detailed comments

*p.2 “If the virus causes clinical disease, the rate of […]”: Maybe I would say instead “If the virus causes specific clinical disease, the rate of […]”?

TN: Thank you, we changed the phrase as suggested.

*p.3 line 26 “the numbers of COVID-19 cases”: To avoid any misunderstanding I’d say “the numbers of COVID-19 new cases”

TN: Thank you, we changed the phrase to “the numbers of new COVID-19 cases”.

TN: Thank you, we have updated the reference.

*p.4 line 53-60: Indeed, it is important to have information from other countries for comparison.

TN: We agree. We have now added an additional paragraph to the discussion, in which we compare the seroprevalence observed in our study to those in a few other European countries (lines 582-).

*p.4 line 57-58 “[…] estimated that there were 11 SARS-Cov-2 infections for every COVID-19 case.”: Maybe I’d say “detected cases” instead of just “case”.

TN: We clarified as “… detected COVID-19 case.”.

*p.4, line 73: Here also, the Ref. [6] was not updated. See 10.1172/JCI138759

TN: The publication referenced by 10.1172/JCI138759 is a different publication with a very similar title. The content is different: this newer peer-reviewed publication does not include the data which we reference in our manuscript. Therefore, we keep the original reference.

TN: Thank you, we clarified as suggested.

TN: The main contribution is understood correctly. However, the results, namely that the cumulative incidence of SARS-CoV-2 was very low in Finland during spring 2020 compared to other European countries, are also interesting. We have now added comparisons to the seroprevalences in a few other European countries (lines 582-).

TN: Restricting the analysis to the laboratory-confirmed COVID-19 cases is a possible sensitivity analysis. However, as we note in the manuscript, the proportion of cases notified on the basis of a clinical diagnosis was very low, under 5%: “Approximately 95% of the COVID-19 cases during the first epidemic wave in Finland were based on a positive SARS-CoV-2 finding from a polymerase chain reaction (PCR) test” (lines 99-101). Therefore, excluding the cases based on clinical diagnoses could not significantly affect our main results. We respectfully suggest that there is no need for this sensitivity analysis.

Reviewer #3: General comments

TN: Thank you for pointing out the need for justification of the target age group. We have clarified in the discussion as follows: “For ethical reasons, the elderly most vulnerable to severe COVID-19 were not invited to participate during the beginning of the epidemic as participation required a medical site visit and therefore could increase the risk of infection with SARS-CoV-2. Children were not invited due to difficulties in obtaining informed consent from minors.” (lines 505-)

Page 4: what is the extended capital region? It's not defined anywhere.

TN: We have changed “extended capital region” -> “Helsinki-Uusimaa region” in all places.

Table 3 displays the results of estimation for the seroprevalence from surveys and case counts.

TN: Thank you for the comment. Our estimate of underreporting is based on two separate models for seroprevalence. We post-process samples obtained from these two models to obtain samples from the distribution of the underreporting ratio. We do not expect this to cause any issue.

TN: Note that the underreporting ratio is not a parameter but a posterior quantity. This comment led us to notice that we had misleadingly labeled the underreporting ratio as a parameter in S2 Table, which we have now fixed.

In general for a ratio to be estimated with a credible interval the ratio should have been computed within a joint model for seroprevalence and case counts.

TN: In our current modeling approach, values of the underreporting ratio below one have non-zero probability, i.e., they are possible. We interpret such values as indicating no underreporting. We have added to the discussion: “Our estimation approach allowed values of the underreporting ratio below one, which would correspond to there being more COVID-19 cases than SARS-CoV-2 infections. This could occur in theory, in case the diagnosis procedure for COVID-19 (i.e. PCR test) was unspecific and the virus testing was widespread. Nevertheless, we believe this to be unrealistic in our study, and we simply interpret values below one to represent absence of underreporting.” (lines 458-).

TN: It is true that one could build a joint model in which one could incorporate additional assumptions related to the underreporting ratio; for example that it can only take values greater than one. This more complex model could describe the phenomenon more accurately. However, the absence of additional assumptions and the lack of a more complex analytical approach do not necessarily mean, in our opinion, that the underreporting ratio is poorly estimated. We do, however, agree that values of the underreporting ratio below one are unrealistic in this case, as noted above.
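The post-processing described in the responses above can be sketched as follows. This is a hedged illustration under stated assumptions, not the authors' implementation: the two sets of posterior draws are simulated from hypothetical Beta distributions rather than taken from the fitted models, the two models are assumed to be independent so that draws can be paired arbitrarily, and all variable names are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n_draws = 4000

# Hypothetical posterior draws (NOT the study's output).
# Model A: seroprevalence implied by the serological surveys.
seroprev_surveys = rng.beta(8, 992, size=n_draws)
# Model B: seroprevalence implied by the registered COVID-19 cases.
seroprev_cases = rng.beta(30, 9970, size=n_draws)

# Assuming the two models are independent, pairing the draws index by index
# gives draws from the distribution of the underreporting ratio.
ratio = seroprev_surveys / seroprev_cases

# Summaries of the derived (posterior) quantity.
lower, median, upper = np.quantile(ratio, [0.025, 0.5, 0.975])
prob_below_one = np.mean(ratio < 1.0)

print(f"median ratio: {median:.2f}")
print(f"95% interval: {lower:.2f} to {upper:.2f}")
print(f"share of draws below one: {prob_below_one:.3f}")
```

Because the ratio is computed draw by draw, nothing constrains it to be at least one; the share of draws below one shows how much posterior mass falls in the region that the authors interpret as absence of underreporting. A joint model with an explicit constraint, as the reviewer suggests, would remove that mass by construction.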

TN: We could, of course, choose the prior distributions differently. It would be possible to construct a more informative prior distribution in the Estimation model, which would result in narrower credible intervals for underreporting. We comment more on this in the sensitivity analysis (lines 411-) and also note in the discussion that a more informative prior distribution is a possibility (lines 452-).

TN: Our study describes an analysis performed already during 2020, utilising knowledge/information available at the time, and the results were used to inform about the spread of SARS-CoV-2 in Finland in real time. We now note this fact in the last paragraph of discussion. Since current understanding of the epidemic in Finland is partly based on the data which we are presenting, using that knowledge for constructing very informative prior distributions would not seem appropriate in this case.

TN: Our current approach is a step forward in the methodology of the analysis of underreporting of infections during the early phases of an epidemic, and in the utilisation of different sources of information in such analyses. Future work could focus on yet more complex modelling of the phenomenon.

TN: We agree that the underreporting ratio estimates from later times during the study are unreliable. We comment on this in the discussion as follows: “There is great uncertainty about the estimated seroprevalence and the corresponding estimate of underreporting at the end of the study period, due to the small number of samples available in the serosurveys.”. As noted above, we have now also added a note to the discussion that the Estimation model prior distribution could be more informative with regard to the dependency between seroprevalences on consecutive weeks (lines 452-).

TN: See our previous comments regarding the underreporting ratio being poorly estimated.

Minor Comments

Abstract mentions ‘extended capital region’; it's not clear what this is.

TN: We have changed “extended capital region” -> “Helsinki-Uusimaa region” in all places.

Abstract: infection statistics are not really necessary in the abstract . These can be removed.

TN: We think that the implications of the study with regards to the incidence of SARS-CoV-2 infection in Finland during 2020 are important, as the incidence was likely quite different compared to other European countries. Based on other comments, we have added discussion about the differences to other European countries to the second to last paragraph of Discussion.

Page 5 line 105: Notifications of what? It's not defined.

TN: We changed “Notifications” to “Records”, which are described in the previous sentence. Hopefully this is now clearer.

Page 5 line 109 How were the symptom onset delays estimated over time? This is not explained

TN: Thank you, this was indeed a bit unclear. At the time of the original analysis during 2020 (which we are describing in our current study), we only had available an expert evaluation of these delays. We have clarified that “According to expert evaluation during early 2020, the delay from symptom onset to COVID-19 diagnosis was deemed to be on average 3.5 days in the Helsinki-Uusimaa region.” (lines 108-110)

TN: However, later on we could verify the accuracy of this expert evaluation based on internal infection tracking data available at our institute, which allow estimation of how the delay from symptom onset to diagnosis evolved in the capital city (Helsinki) during 2020. We do not have direct access to these data, but we asked for summary statistics to verify the accuracy of the expert evaluation of the delay. We now note in the Discussion that based on these internal data the expert evaluation was likely reasonably accurate (lines 572-).

Page 10 line 219 ‘…have been observed…’

TN: Thank you, we corrected the typo.

Page 12 line 260-261 are sigma and sigma_1 the same or not?

TN: These are different. We added a clarification to the text that these are indeed different parameters (line 258). Hopefully this helps.

TN: Thank you for the comments.

Some minor comments:

Abstract line 15: change ‘external data’ to ‘external data on antibody development’

TN: We made this change as suggested.

TN: We have now clarified in the methods and discussion that the choice of delay C = 3.5 was based on expert evaluation and data available during early 2020 (lines 108-110). We also mention in the discussion that, based on internal infection tracking data from the capital city (Helsinki), this was likely a reasonably accurate estimate, and we note that our results are not sensitive to small changes in the choice of delay C (lines 572-).

TN: The individual variation in this delay is not accounted for in our analysis. However, unless the individual variation around those approximately 4 days is very large, its effect on the analysis cannot be significant.
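One simple way to see how a fixed delay C enters such an analysis, and to check how much small changes in C matter, is sketched below. It is an illustration under assumptions, not the authors' code: the daily counts are hypothetical, every case is assumed to have symptom onset exactly C days before diagnosis, and the cumulative curve is interpolated linearly so that non-integer delays such as 3.5 days can be used. The function name `cumulative_onsets` is made up for the example.

```python
import numpy as np

# Hypothetical daily counts of newly diagnosed cases (illustrative only).
daily_diagnosed = np.array([0, 2, 5, 9, 14, 20, 18, 15, 11, 8, 5, 3], dtype=float)
days = np.arange(daily_diagnosed.size)
# Cumulative number of diagnosed cases at the end of each day.
cum_diagnosed = np.cumsum(daily_diagnosed)


def cumulative_onsets(t, delay):
    """Number of cases with symptom onset by time t, assuming onset = diagnosis - delay.

    onset <= t is equivalent to diagnosis <= t + delay, so the cumulative
    diagnosis curve is read off at t + delay (linear interpolation).
    Beyond the last observed day np.interp returns the final total, which
    is only appropriate for a completed series, not for real-time data.
    """
    return np.interp(t + delay, days, cum_diagnosed)


for t in (2, 4, 6):
    values = [cumulative_onsets(t, c) for c in (3.0, 3.5, 4.0)]
    print(f"t = {t}: C=3.0 -> {values[0]:.1f}, C=3.5 -> {values[1]:.1f}, C=4.0 -> {values[2]:.1f}")
```

How much the choice of C matters in such a check depends on how quickly the case counts are changing around the time point of interest: the flatter the cumulative curve, the smaller the effect of shifting it by half a day.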

TN: Thank you for the suggestion. We have added a paragraph to the end of the “Estimation model” section, which describes the hyperparameter choices (lines 279-). We also made small edits to the Sensitivity analysis section to further clarify the effects of some possible choices (lines 411-).

Line 328-332: This could probably go in the Appendix/Supplementary Information

TN: This is a fine suggestion, but, respectfully, we would like to keep the references to the computational methods in the main text, as we feel that the implementation is of some relevance as well.

TN: We used PACE to process our figure files.

Attachment

Submitted filename: ResponseToReviewers.docx

PLoS One. doi: 10.1371/journal.pone.0282094.r003

Decision Letter 1

Timothy J Wade

6 Jun 2023

Underreporting of SARS-CoV-2 infections during the first wave of the 2020 COVID-19 epidemic in Finland - Bayesian inference based on a series of serological surveys

PONE-D-22-33699R1

Dear Dr. Nieminen,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Timothy J Wade, Ph.D

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

PLoS One. doi: 10.1371/journal.pone.0282094.r004

Acceptance letter

Timothy J Wade

13 Jun 2023

PONE-D-22-33699R1

Underreporting of SARS-CoV-2 infections during the first wave of the 2020 COVID-19 epidemic in Finland - Bayesian inference based on a series of serological surveys

Dear Dr. Nieminen:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Timothy J Wade

Academic Editor

PLOS ONE

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

S1 Table. Parameters of the prior distribution in the estimation model, and the specificities of the screening and confirmation tests.

(PDF)

S2 Table. Influence of choices of hyperparameters on the estimation of underreporting ratio Δ(t).

(PDF)

S1 Fig. Age distributions of study sub-populations.

(TIF)

S2 Fig. The serological survey antibody tests and their performances on the calibration data.

(TIF)

S3 Fig. Estimation model seroprevalence prior distribution.

(TIF)

S4 Fig. Time from COVID-19 symptom onset to seroconversion.

(TIF)

S5 Fig. Prior and posterior distributions for the parameter σ.

(TIF)

S6 Fig. Incidence of COVID-19 cases in the Helsinki-Uusimaa region by age group and language during the first wave of the epidemic in 2020.

The language groups are Finnish (fi), Swedish (sv), English (en), Russian (ru) and other.

(TIF)

S7 Fig. Age distribution of COVID-19 cases in the Helsinki-Uusimaa region during the first wave of the COVID-19 epidemic in 2020.

(TIF)

Data Availability Statement

[pone.0282094.ref001] 1.Helsinki University Hospital. Kaikkia koronavirusepäilyjä ei enää testata; 2020. https://www.hus.fi/ajankohtaista/kaikkia-koronavirusepailyja-ei-enaa-testata.

[pone.0282094.ref002] 2. Jarva H, Lappalainen M, Luomala O, Jokela P, Jääskeläinen AE, Jääskeläinen AJ, et al. Laboratory-based surveillance of COVID-19 in the Greater Helsinki area, Finland, February-June 2020. International Journal of Infectious Diseases. 2021;104:111–116. doi: 10.1016/j.ijid.2020.12.038 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0282094.ref003] 3. Pollán M, Pérez-Gómez B, Pastor-Barriuso R, Oteo J, Hernán MA, Pérez-Olmeda M, et al. Prevalence of SARS-CoV-2 in Spain (ENE-COVID): a nationwide, population-based seroepidemiological study. The Lancet. 2020;396(10250):535–544. doi: 10.1016/S0140-6736(20)31483-5 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0282094.ref004] 4. Stringhini S, Wisniak A, Piumatti G, Azman AS, Lauer SA, Baysson H, et al. Seroprevalence of anti-SARS-CoV-2 IgG antibodies in Geneva, Switzerland (SEROCoV-POP): a population-based study. The Lancet. 2020;396(10247):313–319. doi: 10.1016/S0140-6736(20)31304-0 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0282094.ref005] 5. Erikstrup C, Hother CE, Pedersen OBV, Mølbak K, Skov RL, Holm DK, et al. Estimation of SARS-CoV-2 Infection Fatality Rate by Real-time Antibody Screening of Blood Donors. Clinical Infectious Diseases. 2020;72:249–253. doi: 10.1093/cid/ciaa849 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0282094.ref006] 6. Tan W, Lu Y, Zhang J, Wang J, Dan Y, Tan Z, et al. Viral Kinetics and Antibody Responses in Patients with COVID-19. medRxiv. 2020. [Google Scholar]

[pone.0282094.ref007] 7.Digital and population data service agency. Population Information System; 2021. https://dvv.fi/en/population-information-system. Available from: https://dvv.fi/en/population-information-system.

[pone.0282094.ref008] 8.Finnish Institute for Health and Welfare. Finnish National Infectious Diseases Register; 2021. https://thl.fi/en/web/infectious-diseases-and-vaccinations/surveillance-and-registers/finnish-national-infectious-diseases-register.

[pone.0282094.ref009] 9.Finnish Institute for Health and Welfare. Serological population study of the coronavirus epidemic; 2020. https://thl.fi/en/web/thlfi-en/research-and-development/research-and-projects/serological-population-study-of-the-coronavirus-epidemic.

[pone.0282094.ref010] 10. Ekström N, Virta C, Haveri A, Dub T, Hagberg L, Solastie A, et al. Analytical and clinical evaluation of antibody tests for SARS-CoV-2 serosurveillance studies used in Finland in 2020. medRxiv. 2021 [Google Scholar]

[pone.0282094.ref011] 11. Haveri A, Smura T, Kuivanen S, Österlund P, Hepojoki J, Ikonen N, et al. Serological and molecular findings during SARS-CoV-2 infection: the first case study in Finland, January to February 2020. Eurosurveillance. 2020;25(11). doi: 10.2807/1560-7917.ES.2020.25.11.2000266 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0282094.ref012] 12. Homan MD, Gelman A. The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo. J Mach Learn Res. 2014;15(1):1593–1623. [Google Scholar]

[pone.0282094.ref013] 13. Carpenter B, Gelman A, Hoffman MD, Lee D, Goodrich B, Betancourt M, et al. Stan: A probabilistic programming language. Journal of statistical software. 2017;76(1):1–32. doi: 10.18637/jss.v076.i01 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0282094.ref014] 14.R Core Team. R: A Language and Environment for Statistical Computing; 2021. Available from: https://www.R-project.org/.

[pone.0282094.ref015] 15.Stan Development Team. RStan: the R interface to Stan; 2020. http://mc-stan.org. Available from: http://mc-stan.org.

[pone.0282094.ref016] 16. Gudbjartsson DF, Helgason A, Jonsson H, Magnusson OT, Melsted P, Norddahl GL, et al. Spread of SARS-CoV-2 in the Icelandic Population. New England Journal of Medicine. 2020;382(24):2302–2315. doi: 10.1056/NEJMoa2006100 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0282094.ref017] 17. Ma Q, Liu J, Liu Q, Kang L, Liu R, Jing W, et al. Global Percentage of Asymptomatic SARS-CoV-2 Infections Among the Tested Population and Individuals With Confirmed COVID-19 Diagnosis: A Systematic Review and Meta-analysis. JAMA Network Open. 2021;4(12):e2137257–e2137257. doi: 10.1001/jamanetworkopen.2021.37257 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0282094.ref018] 18. Auranen K, Shubin M, Karhunen M, Sivelä J, Leino T, Nurhonen M. Social Distancing and SARS-CoV-2 Transmission Potential Early in the Epidemic in Finland. Epidemiology. 2021;32(4):525–532. doi: 10.1097/EDE.0000000000001344 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0282094.ref019] 19. Härkänen T, Karvanen J, Tolonen H, Lehtonen R, Djerf K, Juntunen T, et al. Systematic handling of missing data in complex study designs-experiences from the Health 2000 and 2011 Surveys. Journal of Applied Statistics. 2016;43(15):2772–2790. doi: 10.1080/02664763.2016.1144725 [DOI] [Google Scholar]

20. Tolonen H, Koponen P, Borodulin K, Männistö S, Peltonen M, Vartiainen E. Language as a determinant of participation rates in Finnish health examination surveys. Scandinavian Journal of Public Health. 2018;46(2):240–243. doi: 10.1177/1403494817725243

21. Merkely B, Szabó AJ, Kosztin A, Berényi E, Sebestyén A, Lengyel C, et al. Novel coronavirus epidemic in the Hungarian population, a cross-sectional nationwide survey to support the exit policy in Hungary. GeroScience. 2020;42(4):1063–1074. doi: 10.1007/s11357-020-00226-9

22. Haveri A, Ekström N, Solastie A, Virta C, Österlund P, Isosaari E, et al. Persistence of neutralizing antibodies a year after SARS-CoV-2 infection in humans. European Journal of Immunology. 2021;51:3202–3213. doi: 10.1002/eji.202149535

23. Gelman A, Carpenter B. Bayesian analysis of tests with unknown specificity and sensitivity. Journal of the Royal Statistical Society: Series C (Applied Statistics). 2020;69(5):1269–1283. doi: 10.1111/rssc.12435
