Abstract
The proportion of SARS-CoV-2 infections ascertained through healthcare and community testing is generally unknown and expected to vary depending on natural factors and changes in test-seeking behaviour. Here we use population surveillance data and reported daily case numbers in the United Kingdom to estimate the rate of case ascertainment. We mathematically describe the relationship between the ascertainment rate, the daily number of reported cases, population prevalence, and the sensitivity of PCR and Lateral Flow tests as a function of time since exposure. Applying this model to the data, we estimate that 20%–40% of SARS-CoV-2 infections in the UK were ascertained with a positive test, with results varying by time and region. Cases of the Alpha variant were ascertained at a higher rate than the wild type variants circulating in the early pandemic, and higher again for the Delta variant and Omicron BA.1 sub-lineage, but lower for the BA.2 sub-lineage. Case ascertainment was higher in adults than in children. We further estimate the daily number of infections and compare this to mortality data to estimate that the infection fatality rate increased by a factor of 3 during the period dominated by the Alpha variant, and declined in line with the distribution of vaccines. This manuscript was submitted as part of a theme issue on “Modelling COVID-19 and Preparedness for Future Pandemics”.
Keywords: SARS-CoV-2, Ascertainment rate, Infection fatality rate
1. Introduction
Testing for SARS-CoV-2 in the UK aims to accomplish two things — first, to rapidly confirm suspected cases of COVID-19 disease via symptomatic testing in order to contain outbreak clusters, and second, to establish the overall burden of infection by taking a random sample of the population. Since not all infected individuals receive a test, and some of those who do will receive a false negative result, the number of positive diagnostic tests provides a lower estimate of the number of people exposed to the virus (Russell et al., 2020). In contrast, random testing can provide an unbiased estimate of prevalence, but is an inefficient way to rapidly identify infection clusters, and may also have biases depending on the extent to which a positive test indicates the true infection status of the individual.
Both types of data are available in the UK: the number of positive tests from people with suspected infection is published daily on the UK government dashboard (UK coronavirus dashboard, 2022a), and the Office for National Statistics COVID-19 Infection Survey (CIS) regularly publishes estimates of the population prevalence based on unbiased sampling (Office for national statistics, 2022a). The existence of these sources creates an opportunity to answer an important question: what proportion of all infections are being reported through diagnostic testing? Knowing this can help to estimate true incidence rates, a quantity central to understanding how the virus is spreading, and determine the infection fatality rate (IFR) of the disease.
Here we describe how diagnostic case numbers can be used to model the proportion of the population testing positive. By calibrating this model against surveillance data, we estimate the case ascertainment rate, defined as the proportion of infections that were reported through diagnostic testing; the incidence, defined as the number of newly infected individuals each day; and the IFR. Our approach differs from previous work as we do not rely on prior assumptions about the IFR to estimate incidence (Noh and Danuser, 2021, Reese et al., 2020). Moreover, we estimate the IFR using a data set substantially larger than any that has previously been used (Meyerowitz-Katz and Merone, 2020).
The model incorporates the different types of test and differences caused by variants of SARS-CoV-2, which have been shown to result in higher severity and a different range of clinical symptoms (Wang et al., 2021, Ong et al., 2021). New variants might have different pathological characteristics that could potentially affect the test-seeking behaviour of those infected, which we expect to directly affect case ascertainment. Examination of age related and regional variation in case ascertainment provides a novel way to consider these developments, and to enrich our understanding of the epidemiology of the virus.
2. Materials and methods
2.1. Data
We are primarily concerned with daily Pillar 1 and 2 case data (UK coronavirus dashboard, 2022a), hereafter referred to as diagnostic test cases, which represent tests done in health care settings and the community, respectively. We use $C_\tau(t)$ to denote the number of Pillar 1 and 2 cases from test type $\tau$ on day $t$. Here, $\tau$ can be PCR or LFD. These counts come from lab-based PCR tests and lateral flow device (LFD) testing, as performed in many community settings (UK coronavirus dashboard, 2022b). We use data provided for the regions of England and the other nations of the UK, and for the published age bands, which we aggregate into age bands consistent with the CIS data. Since the age bands for the Pillar 1 & 2 data are not perfectly aligned with those for the CIS data, we first distribute them into 1-year age brackets, assuming cases are spread equally across the years within each band, before re-aggregating.
At the time of writing, the number of cases detected by each test type was available for England, but not at the regional level or for different age bands. We therefore approximate the proportion of cases that come from each test type by partitioning the total case numbers according to the proportion calculated at the national scale.
The CIS provides estimates of the percentage of people testing positive for coronavirus for the regions of England, other nations of the UK, and age bands in England. Data for nations and age bands represent samples collected over -day intervals. We take the th day as the representative time point of this estimate. Data for the nations are provided weekly and so we take them to represent the th day of the -day period. Population counts for the regions and the age categories were compiled from CIS data (Office for national statistics, 2022b).
There is uncertainty in the CIS which we transfer to our own analysis as follows. From the CIS, we use the “rate” and 95% confidence interval over a series of time points between September 2020 and June 2022 (Office for national statistics, 2022a). The exact distributions are not provided by the CIS source, so we approximate them with Normal distributions with mean equal to the CIS rate values and variance calculated to be as consistent as possible with the CIS confidence intervals. We construct a sampled time series by taking one sample from the distribution at each time point. The sampled time series of percentages is then applied to the population to give $P_{\mathrm{CIS}}(t)$, the total number of test-positive people, where $t$ is the midpoint of the time interval that the data represent. We repeat our analysis on multiple time series independently constructed in this way to obtain a distribution of results.
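The sampling step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the Normal standard deviation is chosen so that the central 95% interval has the same width as the CIS interval (one of several ways to be "as consistent as possible" with an asymmetric interval), it truncates sampled percentages at zero, and the input rows, population size and function names are hypothetical.

```python
import random

Z95 = 1.959964  # standard Normal quantile for a two-sided 95% interval


def normal_params_from_ci(rate, lower, upper):
    """Return (mean, sd) of a Normal whose central 95% interval matches the CI width.

    Assumption: the mean is the reported rate and the sd is set from the interval width.
    """
    sd = (upper - lower) / (2.0 * Z95)
    return rate, sd


def sample_positive_counts(cis_rows, population, rng):
    """Draw one sampled time series of test-positive counts.

    cis_rows: list of (rate_percent, lower_percent, upper_percent), one per time point.
    """
    series = []
    for rate, lower, upper in cis_rows:
        mean, sd = normal_params_from_ci(rate, lower, upper)
        # Assumption: truncate at zero, since a percentage cannot be negative.
        pct = max(0.0, rng.gauss(mean, sd))
        series.append(pct / 100.0 * population)
    return series


# Hypothetical CIS rows (percent testing positive with 95% interval); not real data.
cis_rows = [(0.50, 0.40, 0.60), (1.00, 0.85, 1.15), (2.00, 1.80, 2.20)]
rng = random.Random(0)
samples = [sample_positive_counts(cis_rows, 1_000_000, rng) for _ in range(200)]
```

Each element of `samples` plays the role of one independently constructed time series; repeating the downstream analysis on every element gives a distribution of results.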
The CIS provides an estimate of the proportion of tests that achieve different testing targets using the TaqPath test (Public Health England, 2020). We use these to estimate the proportion of infections that are from wild-type, Alpha, Delta and Omicron variants (BA.1 and BA.2 sub-lineages) of SARS-CoV-2. We consider tests that are negative for the S target gene and positive for the two other targets, known as S-gene target failure (SGTF), to be a proxy for the Alpha and Omicron BA.1 variants. Since tests that are positive for S and exactly one of the other targets (N or ORF1ab) may indicate any lineage (Sanderson, 2021), we discard those that are negative on the S target and one other target from our calculation of the SGTF proportion.
Based on the time points when the SGTF proportion reached locally maximum or minimum values, we assume the variant class follows from SGTF as follows. All infections from the beginning of the pandemic until November 1st 2020, and all infections up to March 1st 2021 that are not SGTF, are of the wild-type variants. Infections that are SGTF are of the Alpha variant if they were reported between November 1st 2020 and November 1st 2021, and the Omicron BA.1 sub-lineage if they were reported after November 1st 2021. Infections that are not SGTF are assumed to be the Delta variant if they were reported between March 1st 2021 and January 9th 2022, and the Omicron BA.2 sub-lineage if they were reported after January 9th 2022. We therefore consider five variant classes of interest: wild-type, Alpha, Delta, Omicron BA.1 and Omicron BA.2, whose proportions we denote using $\rho_1(t)$, $\rho_2(t)$, $\rho_3(t)$, $\rho_4(t)$, $\rho_5(t)$, respectively.
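The date rules above can be written as a small lookup. The sketch below is illustrative only: the function names are hypothetical, and the treatment of infections reported exactly on a boundary date (assigned to the later period here) is an assumption, since the text does not specify it.

```python
from datetime import date

NOV_1_2020 = date(2020, 11, 1)
MAR_1_2021 = date(2021, 3, 1)
NOV_1_2021 = date(2021, 11, 1)
JAN_9_2022 = date(2022, 1, 9)

CLASSES = ["wild-type", "Alpha", "Delta", "Omicron BA.1", "Omicron BA.2"]


def variant_class(report_date, sgtf):
    """Assign a variant class from the report date and SGTF status."""
    if report_date < NOV_1_2020:
        return "wild-type"  # everything before Alpha emerged
    if sgtf:
        return "Alpha" if report_date < NOV_1_2021 else "Omicron BA.1"
    if report_date < MAR_1_2021:
        return "wild-type"  # non-SGTF before Delta
    return "Delta" if report_date < JAN_9_2022 else "Omicron BA.2"


def variant_proportions(report_date, sgtf_fraction):
    """Split infections on one date into class proportions given the SGTF fraction."""
    props = {name: 0.0 for name in CLASSES}
    props[variant_class(report_date, True)] += sgtf_fraction
    props[variant_class(report_date, False)] += 1.0 - sgtf_fraction
    return props


# Hypothetical SGTF fraction on a date when BA.1 was replacing Delta.
example = variant_proportions(date(2021, 12, 20), 0.6)
```

On any given date at most two classes have non-zero proportion, one for SGTF infections and one for the rest.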
The number of deaths in England of individuals who have tested positive for coronavirus within 28 days is provided in 5-year age bands (UK coronavirus dashboard, 2022c). As with the case numbers, these data were first distributed into 1-year age brackets assuming an equal distribution of cases within each bracket, before being re-aggregated into age bands consistent with the CIS.
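The redistribution into 1-year brackets, used for both case and death counts, can be illustrated with a short sketch. The band boundaries and counts below are hypothetical and are not the CIS age bands; the only modelling assumption carried over from the text is that counts are spread equally over the years within each source band.

```python
def to_single_years(band_counts):
    """Spread each band's count equally over its single years of age.

    band_counts: dict mapping (lowest_age, highest_age), inclusive, to a count.
    """
    per_year = {}
    for (lo, hi), count in band_counts.items():
        width = hi - lo + 1
        for age in range(lo, hi + 1):
            per_year[age] = per_year.get(age, 0.0) + count / width
    return per_year


def reaggregate(per_year, target_bands):
    """Sum single-year counts into the target bands (inclusive bounds)."""
    return {
        (lo, hi): sum(v for age, v in per_year.items() if lo <= age <= hi)
        for lo, hi in target_bands
    }


# Hypothetical example: three 5-year source bands re-aggregated into two target bands.
source = {(0, 4): 50, (5, 9): 100, (10, 14): 150}
target = reaggregate(to_single_years(source), [(0, 6), (7, 14)])
```

Here the band (5, 9) is split across both targets: two of its five years fall in (0, 6) and three in (7, 14), so the totals are preserved.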
2.2. Modelling the time from exposure to the time of positive test
We define two random variables, $T_E$ and $T_P$, representing the time an individual was exposed and the time they received a positive test, respectively. Assuming daily time steps, we express the probability that an individual who received a positive test from a sample taken at time $t$ was first exposed to the virus at time $s$,
$$ P(T_E = s \mid T_P = t) = \frac{P(T_E = s,\, T_P = t)}{\sum_{s'} P(T_E = s',\, T_P = t)}. \tag{1} $$
The joint probability distribution $P(T_E = s,\, T_P = t)$ can be pieced together from various sources by considering the sequence of events that result in an individual testing positive.
First, we consider the time the individual was exposed to the virus and acquired the infection. We denote the prior probability that the infection was acquired at time $s$ by $\pi(s)$. Next, we consider the time between exposure and the time that they received a test. For symptomatic cases we assume that the test occurs shortly after symptom onset, i.e. the time since exposure is equal to the sum of the incubation period and a delay parameter $\delta_\tau$ that we assume to be a fixed quantity. The subscript $\tau$ represents the type of test being performed, and we have chosen fixed values of $\delta_{\mathrm{PCR}}$ and $\delta_{\mathrm{LFD}}$ (which we later test in a sensitivity analysis).
The probability of a test on day $t$ is thus $g(t - s - \delta_\tau)$, where $g(d)$ is the probability that the duration of the incubation period is $d$ days, which we assume to be Log-normal with a mean of 5.5 days and dispersion parameter 1.52 (Lauer et al., 2020, Xin et al., 2021). To get a probability distribution expressing the length of the incubation period in discrete days, we integrate over consecutive intervals of length one day. For simplicity, cases ascertained independently of symptoms, for example those found through screening, contact tracing, or on hospital admission, are assumed to follow the same distribution.
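A daily incubation-period distribution of this kind can be computed from the Log-normal cumulative distribution function. The sketch below is an illustration under two stated assumptions: the dispersion parameter is taken to be the exponential of the standard deviation of the underlying Normal, and day $d$ is taken to cover the interval from $d$ to $d+1$ days. The function names are hypothetical.

```python
import math

MEAN_DAYS = 5.5    # mean incubation period quoted in the text
DISPERSION = 1.52  # dispersion parameter quoted in the text

# Assumption: dispersion = exp(sigma), where sigma is the sd of log(incubation period).
SIGMA = math.log(DISPERSION)
# Chosen so that the Log-normal mean exp(MU + SIGMA^2 / 2) equals MEAN_DAYS.
MU = math.log(MEAN_DAYS) - 0.5 * SIGMA ** 2


def lognormal_cdf(x):
    """Cumulative distribution function of the Log-normal incubation period."""
    if x <= 0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(x) - MU) / (SIGMA * math.sqrt(2.0))))


def incubation_pmf(max_days=30):
    """Return g, where g[d] is the probability that the period lies in [d, d + 1) days."""
    return [lognormal_cdf(d + 1) - lognormal_cdf(d) for d in range(max_days)]


g = incubation_pmf()
```

Truncating at 30 days discards a negligible tail, so `g` sums to very nearly one without renormalisation.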
Once the individual has acquired the infection and has had a test, the test must be positive to become an ascertained infection. The probability of testing positive varies as a function, $S_\tau(d)$, of the time since exposure $d$. We use the functions provided by Hellewell et al. (2021) and shown in Fig. 1. The PCR curve is similar in shape to the shedding profile found in other studies (He et al., 2020, Long et al., 2020, Wölfel et al., 2020, Smith et al., 2021), for which the typical timing of peak viral load and the mean duration of shedding are reported by Cevik et al. (2020). Variation is associated with severity of illness but not age or sex (Chen et al., 2021, Yonker et al., 2021, Jones et al., 2020). Studies that look for a difference between asymptomatic and symptomatic infections do not report consistent results (Kissler et al., 2020, Long et al., 2020). While one study with a small sample found that the Alpha variant had a longer viral course than the wild type (Kissler et al., 2021a), studies generally show that shedding profiles do not differ significantly between variants (Ke et al., 2021b, Kissler et al., 2021b). In contrast, vaccination has been shown to reduce the incidence of high shedding rates and the duration of shedding (Kissler et al., 2021b, Pritchard et al., 2021, Ke et al., 2021a, Antonelli et al., 2021), which we address in a later section.
Fig. 1.
(A) The test sensitivity as a function of time from Hellewell et al. (2021). The function $S_{\mathrm{PCR}}(d)$ gives the probability that a PCR test will be positive if performed on an infected person $d$ days after exposure. (B) The incubation period probability distribution from Lauer et al. (2020); shown here is the probability of symptom onset on each day since exposure.
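We express the probability that an infected individual was exposed on day $s$ and tested positive on day $t$, written $P(T_E = s,\, T_P = t)$, by multiplying the three probabilities described above: the prior $\pi(s)$ of exposure on day $s$, the probability $g(t - s - \delta_\tau)$ that the test is taken on day $t$, and the probability $S_\tau(t - s)$ that the test is positive,
$$ P(T_E = s,\, T_P = t) = \pi(s)\, g(t - s - \delta_\tau)\, S_\tau(t - s). \tag{2} $$
If we assume an uninformative prior probability of exposure on day $s$, that is $\pi(s)$ constant in $s$, then substituting Eq. (2) into Eq. (1) gives
$$ P(T_E = s \mid T_P = t) = \frac{g(t - s - \delta_\tau)\, S_\tau(t - s)}{\sum_{s'} g(t - s' - \delta_\tau)\, S_\tau(t - s')}. \tag{3} $$
Substituting $d = t - s$, and using the notation $f_\tau(d) = P(T_E = t - d \mid T_P = t)$, we have
$$ f_\tau(d) = \frac{g(d - \delta_\tau)\, S_\tau(d)}{\sum_{d'} g(d' - \delta_\tau)\, S_\tau(d')}, \tag{4} $$
giving the probability distribution of time between exposure and test for the set of ascertained cases corresponding to test type $\tau$.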
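Numerically, the distribution of time between exposure and test is a normalised product of the shifted incubation distribution and the sensitivity curve. The sketch below assumes both are supplied as lists indexed by whole days; the toy values are hypothetical and are not the Hellewell et al. or Lauer et al. curves.

```python
def delay_distribution(g, sensitivity, delta):
    """Return f, where f[d] is proportional to g[d - delta] * sensitivity[d].

    g: incubation-period probabilities by day.
    sensitivity: probability of a positive test by days since exposure.
    delta: fixed delay in days between symptom onset and the test.
    """
    n = min(len(sensitivity), len(g) + delta)
    weights = [
        (g[d - delta] if d - delta >= 0 else 0.0) * sensitivity[d]
        for d in range(n)
    ]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical toy inputs, for illustration only.
toy_g = [0.0, 0.2, 0.5, 0.3]
toy_sensitivity = [0.0, 0.4, 0.8, 0.6, 0.2, 0.0]
f = delay_distribution(toy_g, toy_sensitivity, delta=1)
```

Because the prior on exposure time is taken to be flat, the result depends only on the number of days between exposure and test, not on the calendar date.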
2.3. Estimating the ascertainment rate
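We define the ascertainment rate as the proportion of SARS-CoV-2 infections that result in a positive PCR or LFD test and are recorded in the Pillar 1 & 2 case data. We introduce the time-dependent ascertainment rate $\boldsymbol{\theta}$, a vector whose $t$th element, $\theta_t$, is the proportion of infections that occurred at time $t$ that get reported through diagnostic testing at any subsequent time. We also consider the incidence, $I(t)$, defined as the number of newly acquired infections on day $t$.
The number of ascertained cases that were exposed at time $s$ can be expressed in two ways: first, by multiplying the incidence by the ascertainment rate, and second, by expressing the number of reported cases that were exposed on day $s$ as a function of the daily case counts $C_\tau(t)$ from each test type $\tau$ and the exposure-to-test distribution $f_\tau$ of Eq. (4). Equating the two gives
$$ \theta_s\, I(s) = \sum_{\tau} \sum_{t \ge s} C_\tau(t)\, f_\tau(t - s). \tag{5} $$
We estimate the number of individuals in the population who would test positive (by PCR) on day $t$, if tested, by summing over all infection times $s$ and weighting by the probability $S_{\mathrm{PCR}}(t - s)$ that each one is test-positive on day $t$,
$$ P(t) = \sum_{s \le t} I(s)\, S_{\mathrm{PCR}}(t - s). \tag{6} $$
Combining with Eq. (5) we can express this as a function of time and the unknown vector of parameters $\boldsymbol{\theta}$,
$$ P(t; \boldsymbol{\theta}) = \sum_{s \le t} \frac{S_{\mathrm{PCR}}(t - s)}{\theta_s} \sum_{\tau} \sum_{t' \ge s} C_\tau(t')\, f_\tau(t' - s). \tag{7} $$
We estimate ascertainment by finding the vector $\hat{\boldsymbol{\theta}}$ that minimizes the difference between the estimated and observed values,
$$ \hat{\boldsymbol{\theta}} = \underset{\boldsymbol{\theta}}{\arg\min} \sum_{t \in \mathcal{T}} \left| P(t; \boldsymbol{\theta}) - P_{\mathrm{CIS}}(t) \right|, \tag{8} $$
where $\mathcal{T}$ is the set of time points for which we have empirical estimates of prevalence and $P_{\mathrm{CIS}}(t)$ is the number of test-positive people estimated from the surveillance data. Eq. (8) combines the daily diagnostic case counts, the population positivity from surveillance, the incubation period distribution, and the time-dependent test sensitivity of PCR and LFD tests, to provide an estimate of the proportion of infections being reported at time $t$.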
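The estimation procedure can be illustrated end to end on synthetic data. The sketch below is deliberately simplified and is not the authors' implementation: it uses a single test type, fits one constant ascertainment rate rather than a time-varying vector, measures the misfit with a sum of absolute differences, and replaces a general-purpose optimizer with a ternary search (valid here because the misfit is convex in the reciprocal of the rate). All names and toy values are hypothetical.

```python
def ascertained_by_exposure_day(cases, f):
    """Back-project cases to exposure days: A[s] = sum_t cases[t] * f[t - s]."""
    n = len(cases)
    return [
        sum(cases[t] * f[t - s] for t in range(s, min(n, s + len(f))))
        for s in range(n)
    ]


def modelled_prevalence(ascertained, theta, sens):
    """Test-positive count: P[t] = sum_s sens[t - s] * A[s] / theta[s]."""
    n = len(ascertained)
    return [
        sum(sens[t - s] * ascertained[s] / theta[s]
            for s in range(max(0, t - len(sens) + 1), t + 1))
        for t in range(n)
    ]


def fit_constant_theta(cases, f, sens, observed, time_points, iters=200):
    """Minimise sum_t |P(t; theta) - observed[t]| over one constant theta."""
    a = ascertained_by_exposure_day(cases, f)
    base = modelled_prevalence(a, [1.0] * len(a), sens)  # prevalence if theta = 1

    def loss(u):  # u = 1 / theta; the loss is convex in u
        return sum(abs(u * base[t] - observed[t]) for t in time_points)

    lo, hi = 1.0, 100.0  # search theta between 0.01 and 1
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if loss(m1) < loss(m2):
            hi = m2
        else:
            lo = m1
    return 2.0 / (lo + hi)


# Synthetic data: constant incidence and a known ascertainment rate (toy values).
f = [0.0, 0.1, 0.3, 0.4, 0.2]
sens = [0.0, 0.5, 0.8, 0.6, 0.3, 0.1]
true_theta, incidence, n_days = 0.3, 1000.0, 60
cases = [
    sum(true_theta * incidence * f[t - s] for s in range(max(0, t - len(f) + 1), t + 1))
    for t in range(n_days)
]
observed = [
    sum(incidence * sens[t - s] for s in range(max(0, t - len(sens) + 1), t + 1))
    for t in range(n_days)
]
interior = range(10, 45)  # avoid edge effects at the start and end of the series
theta_hat = fit_constant_theta(cases, f, sens, observed, interior)
```

With constant incidence the back-projection is exact away from the ends of the series, so the fit recovers the rate used to generate the data. With changing incidence the flat-prior back-projection is only approximate, which is one reason a check on synthetic data is useful.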
In practice, we estimate $\boldsymbol{\theta}$ at weekly time points and use linear interpolation to create a daily time series. The solution to Eq. (8) is found numerically using the optimize.minimize function from the SciPy library in Python. The optimization is made more efficient by inputting an initial guess based on an approximation to $\boldsymbol{\theta}$ given by
| (9) |
Note that this equation uses the reported prevalence shifted forward by days which is approximately the time since exposure of someone who received a positive test result through random surveillance. We estimate the credible interval for $\boldsymbol{\theta}$ by substituting the upper and lower bounds of the credible interval for the surveillance prevalence into Eq. (8). The validity of this method is demonstrated in a supplementary analysis in which it is tested on synthetic data.
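The weekly parameterisation means the optimizer only sees one value per week, with daily values filled in between. A minimal sketch of that interpolation step, with hypothetical names and values, is:

```python
def interpolate_daily(weekly_days, weekly_values):
    """Linearly interpolate values given at increasing day indices to every day between them."""
    daily = []
    points = list(zip(weekly_days, weekly_values))
    for (d0, v0), (d1, v1) in zip(points, points[1:]):
        for d in range(d0, d1):
            daily.append(v0 + (v1 - v0) * (d - d0) / (d1 - d0))
    daily.append(weekly_values[-1])  # include the final weekly point itself
    return daily


# Hypothetical weekly ascertainment estimates at days 0, 7 and 14.
theta_daily = interpolate_daily([0, 7, 14], [0.20, 0.34, 0.27])
```

The returned list has one entry per day from the first to the last weekly point, so it can be indexed directly by exposure day in the prevalence model.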
2.4. Time-independent ascertainment rate
Motivated by the possibility that variants of concern may have different pathological characteristics to each other or elicit different test-seeking responses, we estimate ascertainment rates for each variant class. Unlike the previous section, these rates are constant for each variant in each region and age band. We let $v = 1, \dots, 5$ denote the wild-type, Alpha, Delta, Omicron BA.1 and Omicron BA.2 variants, respectively, and $\theta_v$ denote the time-independent rate for variant class $v$. Recalling that $\rho_v(t)$ is the proportion of infections caused by variants of class $v$, we use a revised estimate of the modelled prevalence $P(t)$ that weights the contribution of each variant class by its proportion
| (10) |
We can then estimate $\theta_v$ by taking the value that minimizes the absolute error between the modelled and observed prevalence, taking only the time points up to March 1st 2021 when SGTF positive tests were associated with the Delta variant. Specifically, the time-independent ascertainment rates are estimated by numerically solving
| (11) |
2.5. Infection fatality rate
The infection fatality rate (IFR) is defined as the proportion of individuals infected who then die as a direct result of the infection. For a given monthly mortality figure, we count the corresponding number of infections from the incidence $I(t)$ summed over all corresponding exposure dates. We include a fixed lag between exposure and death (the time from exposure to symptom onset plus the time between onset and death) to be consistent with previous studies and the time between peaks in case and death data in England (Hu et al., 2021). For example, the IFR for September is the number of recorded deaths in that month divided by the number of infections that occurred between August 10th and September 10th.
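The September example corresponds to the following calculation. The sketch uses the exposure window given in the text, treats both end dates as inclusive (an assumption), and uses made-up incidence and death counts in a hypothetical year purely for illustration.

```python
from datetime import date, timedelta


def monthly_ifr(deaths_in_month, daily_infections, window_start, window_end):
    """Deaths in a month divided by infections with exposure dates in the window.

    daily_infections: dict mapping exposure date to estimated new infections.
    Assumption: both window_start and window_end are included.
    """
    infections = sum(
        count for day, count in daily_infections.items()
        if window_start <= day <= window_end
    )
    return deaths_in_month / infections


# Hypothetical data: 1,000 new infections every day from 1 August to 30 September.
start = date(2021, 8, 1)
daily_infections = {start + timedelta(days=i): 1000.0 for i in range(61)}

# IFR for September: exposure window 10 August to 10 September, 160 deaths (made up).
ifr_september = monthly_ifr(160, daily_infections, date(2021, 8, 10), date(2021, 9, 10))
```

With these inputs the window contains 32 days of infections, so the denominator is 32,000.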
2.6. Effect of vaccination on ascertainment rate
We apply different test sensitivity functions to the proportion of infections that are in individuals who have received a vaccine and those who have not. Here we first describe how the proportion of infections that are in vaccinated, and unvaccinated, people is estimated. We then describe how this was accommodated into our analysis.
Vaccine effectiveness is defined as
$$ VE = 1 - \frac{r_V(t)}{r_U(t)}, \tag{12} $$
where $r_V(t)$ and $r_U(t)$ are the rates of infection among vaccinated and unvaccinated people, respectively.
It varies depending on the time since vaccination, the number of doses, the specific vaccine given, and the outcome measured, e.g. infection, symptoms, hospitalization. Effectiveness is lowest when the measured outcome is infection of any kind, regardless of symptoms. This is estimated to be $VE = 0.56$ (56%) (Pritchard et al., 2021). We choose to use this low value to avoid underestimating the effect of vaccines; lower estimates of effectiveness result in higher proportions of infections that are subject to the effects of the vaccine.
We want to know the proportion of infections on day $t$ that are in the population of people that have received the vaccine by day $t$, which we denote with $\phi(t)$. We have that $r_V(t) = \phi(t)\, I(t) / V(t)$, where $I(t)$ is the number of new infections on day $t$ and $V(t)$ is the number of people vaccinated by day $t$. Similarly, $r_U(t) = (1 - \phi(t))\, I(t) / (N - V(t))$, where $N$ is the population. Substituting into Eq. (12) gives
$$ \phi(t) = \frac{(1 - VE)\, V(t)}{N - VE \cdot V(t)}. \tag{13} $$
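Solving the definition of effectiveness for the vaccinated share of infections gives a closed form that is easy to check numerically. The sketch below assumes effectiveness is one minus the ratio of per-capita infection rates in vaccinated and unvaccinated people; the function name and the population figures are hypothetical.

```python
def vaccinated_share_of_infections(vaccinated, population, effectiveness=0.56):
    """Share of new infections occurring in vaccinated people.

    Derivation (assumed definition): VE = 1 - r_v / r_u, with
    r_v = share * I / V and r_u = (1 - share) * I / (N - V).
    Solving for share gives (1 - VE) * V / (N - VE * V).
    """
    return (1.0 - effectiveness) * vaccinated / (population - effectiveness * vaccinated)


# Hypothetical population of 1,000 with half vaccinated.
share = vaccinated_share_of_infections(500, 1000)

# Round trip: recover the effectiveness from the implied per-capita rates.
rate_vaccinated = share / 500
rate_unvaccinated = (1.0 - share) / 500
recovered_effectiveness = 1.0 - rate_vaccinated / rate_unvaccinated
```

The share is zero when nobody is vaccinated and one when everybody is, and it is always below the vaccinated fraction of the population in between, as expected for a vaccine that offers some protection.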
Vaccination has been reported to reduce the time until viral clearance of those infected (Kissler et al., 2021b): the time from viral peak to viral clearance was reported to be shorter for vaccinated individuals. To model this we assume that, in vaccinated individuals, no viral shedding is detectable by either PCR or LFD test beyond a fixed cut-off time after exposure, shortening the time to viral clearance considerably more than the reported effect to ensure we do not underestimate the effect of vaccines in this sensitivity analysis. For infections in vaccinated people we denote these modified functions using $S^{V}_\tau$, and use $P^{V}(t)$ for the equivalent of Eq. (6) with $S^{V}$ substituted for $S$. The modified version of Eq. (7) is
| (14) |
and finally the modified Eq. (8) is
| (15) |
3. Results
The percentage of cases ascertained estimated with Eq. (8) varies by time, region, and age band (Fig. 2). These results are sensitive to variation in surveillance data, particularly when infection levels are low and there is less data to inform the estimate. For example, from the time that the infection survey began until September 2020, which is not shown in the figure, the results are highly variable and occasionally produce estimates of ascertainment that are above 100%. In general, when case rates are low we see an increase in variability due to the smaller sample size, whereas when case rates are relatively high the estimated ascertainment rate becomes more reliable.
Fig. 2.
Ascertainment rate in the age bands and regions of the UK expressed as a percentage, from Eq. (8). Presented are the median and confidence intervals from the distribution of solutions to Eq. (8) over repeated samplings of the surveillance data. Incidence, $I(t)$, is shown as a percentage of the population.
Case ascertainment is related to the proportion of infections that lead to symptomatic infection. This is apparent from the low ascertainment rates observed in the lowest age categories, which are known to be less likely to develop symptoms (Poletti et al., 2021). There were notable increases after March 2021 in school age children, possibly indicating that the mass testing in that age category that coincided with school reopening caused a higher detection rate of asymptomatic infections. Similarly, since vaccination is effective at preventing infections from becoming symptomatic, the decreasing ascertainment rate seen in the two highest age bands from January to April 2021 may have resulted from vaccination in those groups.
Increases from April to June 2021 occur in every group and appear to coincide with the rise in cases of the Delta variant. While this could imply that the Delta variant is more likely to cause symptomatic infection, it could also be the result of behavioural factors as restrictions to physical contact were being removed and lateral flow tests were being more widely used. There is similarity between the time series of age bands that are close to each other, whereas changes in the ascertainment rate in any given region appear to be unaffected by neighbouring regions (Fig. S.3).
Fig. S.3.
Correlations between the ascertainment rate time series of (A) all age bands and (B) all regions. Values show the Pearson correlation coefficient, . Correlations where or are not displayed.
To compare regions, ages, and different phases of the pandemic, we consider different variant classes: the wild-type that existed before the emergence of the Alpha variant, the Alpha and Delta variants, and the BA.1 and BA.2 sub-lineages of the Omicron variant, where we have used S-gene target failure to approximate the proportion of cases belonging to each class. Modelling a different time-independent ascertainment rate for each variant provides remarkable agreement between the modelled population prevalence and the value reported by the surveillance study (Fig. S.1). The best-fit ascertainment rates are shown in Fig. 3. Differences between variants may reflect varying symptomatic responses, or they may reflect other behavioural factors that have changed over time.
Fig. S.1.
The percentage of people who would test positive if tested. Comparison of the ONS CIS to the modelled value based on reported cases and the estimated ascertainment rates given in Fig. 3.
Fig. 3.
Estimates of time-independent ascertainment rates $\theta_v$. Presented are the median and confidence intervals from the distribution of solutions to Eq. (11) over repeated samplings of the surveillance data.
The ascertainment rate for the wild-type is lower than the rate for the Alpha, Delta and Omicron BA.1 variants across almost all age bands and regions of the UK. While the difference between Alpha, Delta and Omicron BA.1 is less clear, it is typically the case that ascertainment increased for the Delta variant over the Alpha and increased again for the Omicron BA.1 variant before decreasing substantially for the Omicron BA.2 variant during a time when free access to LFD and PCR tests was no longer available. Ascertainment rates are lowest in the youngest age band and increase with age up to the to band. During times when free access to testing was widely available, around 30% to 40% of infections were ascertained. This is lower than the percentage of infections that are symptomatic, estimated to be around 70% (Buitrago-Garcia et al., 2020, Sah et al., 2021), implying that a considerable number of symptomatic infections do not get ascertained.
We calculate the IFR for each month for the oldest age bands (Fig. 4). We have chosen not to show lower ages as the low numbers of deaths in these groups make the results highly variable and do not provide a reliable estimate of the true IFR. Within the age bands for which data are sufficient, the IFR increases with age. The increasing trend in IFR for the two oldest age bands in November 2020 may be a combination of higher severity of the Alpha variant (Davies et al., 2021), the increased pressure on the healthcare system, or a seasonal effect on immunity. The subsequent reduction is close to what we would expect to see given that the vaccines give some protection against infection; while vaccines reduced the number of deaths considerably, they simultaneously reduced the number of infected people. For instance, using 90% effectiveness of vaccines against death and 56% against infection (Sheikh et al., 2021, Pritchard et al., 2021), one can calculate from the definition of effectiveness that the IFR of the vaccinated population should be 22% of the IFR for the unvaccinated.
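The arithmetic behind this figure can be made explicit. The sketch assumes that both effectiveness values are measured per person (so that vaccination scales deaths per person by one minus the effectiveness against death, and infections per person by one minus the effectiveness against infection); the function name is hypothetical.

```python
def relative_ifr(ve_death, ve_infection):
    """IFR in vaccinated people as a fraction of the IFR in unvaccinated people.

    Assumption: deaths per person scale by (1 - ve_death) and infections per
    person by (1 - ve_infection), so their ratio scales by the quotient.
    """
    return (1.0 - ve_death) / (1.0 - ve_infection)


# Effectiveness values quoted in the text: 90% against death, 56% against infection.
ratio = relative_ifr(0.90, 0.56)
```

The ratio is 0.10 divided by 0.44, a little under a quarter, in line with the figure of roughly 22% quoted above.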
Fig. 4.
Infection fatality rate (IFR). The estimated percentage of infections that cause mortality. The shaded region shows the 95% confidence interval, computed using the incidence obtained from the upper and lower estimates of prevalence given by the surveillance data. Dashed lines show the population in the respective age band that had received at least one dose of a COVID-19 vaccine.
We tested the robustness of these results against reasonable changes in the assumptions of our model. Firstly, viral clearance may occur more rapidly in individuals who have been vaccinated (Kissler et al., 2021b). While we cannot model this effect precisely, making liberal assumptions about vaccine effectiveness and its effect on the test-sensitivity profile (see Section 2.6) gives results that are lower by a few percent (Fig. S.2). The most substantial effect is observed in older age bands. Similarly, the IFR presented in Fig. 4 may be an overestimate during times when vaccine coverage is high. Fig. S.4 shows the range of values that are plausible given the duration of viral shedding in vaccinated individuals.
Fig. S.2.
Ascertainment rates shown in Eq. (8) compared to the equivalent value that incorporates the effect of vaccination. Modelling assumptions have been made to provide the largest reasonable deviation from the original ascertainment estimate with the data available. Therefore it is likely that a precise treatment of vaccination in the model would yield a result within the shaded area between the curves.
Fig. S.4.
Infection fatality rates shown in Fig. 4 compared to the equivalent value that incorporates the effect of vaccination. Modelling assumptions have been made to provide the largest reasonable deviation from the original ascertainment estimate with the data available. Therefore it is likely that a precise treatment of vaccination in the model would yield a result within the shaded area between the curves.
Secondly, the model assumed a fixed delay between symptom onset and receiving a PCR test. We do not have observational evidence to support the chosen value, and a different delay is also reasonable. Repeating the analysis with an alternative delay yields a mean increase (across all age bands and variants) of 0.05 percentage points with a standard deviation of 0.89 relative to the results reported in Fig. 3, suggesting relatively low sensitivity to this modelling decision.
Thirdly, since the time between exposure and receiving a test in our model is based on the time of symptom onset, it does not correctly describe cases ascertained from tests that are not related to symptoms. The main route of case ascertainment in the UK was community testing, which was advised primarily for those who are experiencing symptoms (Pillar 2 constitutes around 80 to 95% of cases depending on the time period). However, if the main route of case ascertainment was instead through contact tracing or asymptomatic screening, then the methods would need to be adapted to accommodate this.
Additionally, we assumed that the proportion of cases reported from LFD tests, as opposed to PCR, for England could be applied across all age bands and regions, whereas in reality they are unlikely to be proportioned equally. Repeating our analysis under the extreme assumption that 100% of community and healthcare reported cases are from LFD tests results in a mean decrease of 0.08 percentage points with a standard deviation of 1.75, again demonstrating low sensitivity to this modelling assumption.
Finally, some empirical estimates of test sensitivity are higher than the maximum of $S_{\mathrm{PCR}}$ and $S_{\mathrm{LFD}}$ (Arevalo-Rodriguez et al., 2020, Brümmer et al., 2021). Repeating our analysis using adjusted versions of $S_{\mathrm{PCR}}$ and $S_{\mathrm{LFD}}$ that are linearly scaled to a higher peak value, we find ascertainment rates increase by a mean of 7.5 percentage points with a standard deviation of 3.4, suggesting that any inaccuracy in our assumption about test sensitivities could substantially affect the outcomes presented here.
4. Discussion
The extensive efforts in the United Kingdom to monitor the COVID-19 epidemic have provided the opportunity to quantify a critically important parameter – the ascertainment rate – defined as the likelihood that an infected individual will get tested and receive a positive diagnosis. Here we compared the daily reported number of cases to an unbiased estimate of population prevalence to estimate the proportion of cases that are ascertained through community testing and healthcare. We also computed the daily number of new infections and from this were able to track the infection fatality ratio across time.
Variation in case ascertainment may result from differences in clinical presentation, public perception, availability of testing, or many other possible reasons. It was revealed to be related to age, with infections in the youngest age bands being the least likely to be diagnosed. Infections related to the Alpha, Delta, and Omicron BA.1 variants were more likely to be ascertained compared to variants that were circulating earlier in the pandemic (the wild type) or during a time when access to free tests was no longer available (BA.2). The IFR showed substantial variation across time, increasing substantially into winter 2020 before declining with the distribution of vaccines.
Ascertainment appears to be dependent on the SARS-CoV-2 variant. It is not possible to determine the extent to which this variation is caused by changes in symptomatic response or by external factors that may alter the propensity of the individual to seek a test. After accounting for the effects of the different variants on the ascertainment rate, we have shown that the two data sources are largely in agreement with each other. This suggests a consistency in test-seeking behaviour over time periods of months, highlighting the reliability of the diagnostic test data as a measure of epidemic severity. In general, when cases are increasing, it is because infections are increasing, not because people have become more likely to receive a test, although changes in test-seeking do occur on longer time scales.
The challenge when comparing the trend seen in random survey data to that seen in reported community cases is that the former is a measure of prevalence and the latter a measure of incidence. Our methodology resolves this by modelling the relationship between the two. Our method is related to the deconvolution approach previously used to estimate the incidence of other infectious diseases (Brookmeyer et al., 1994). Indeed, this approach could be applied directly to the surveillance data to estimate incidence; however, it would not reflect changes in incidence that occur on a sub-weekly time-scale. Because our method utilizes the daily resolution of the case data, it captures the daily variation in incidence while achieving almost perfect consistency between the two data sources.
The estimation of infection incidence performed here offers an alternative to methods that use serological data (Shioda et al., 2021). This allows for more accurate representation of key metrics related to epidemic control such as the reproduction rate, generation time, case doubling rates, hospitalization and fatality rates. Our analysis revealed considerable variability in the IFR that goes beyond that expected from age and vaccination status alone. The three-fold increase in IFR in the age band beginning in November 2020 suggests that multiple factors contribute to the risk of death from infection and therefore there may be multiple ways to minimize mortality in future winter seasons. The subsequent decline adds to the body of evidence showing the effectiveness of vaccinations.
Our results are dependent on a number of simplifying assumptions. We have applied a model that assumes all individuals experience similar viral dynamics once infected, and that the time between exposure and receiving a test follows the same distribution regardless of age or location. We have assumed that testing occurs at the time of symptom onset plus an additional delay. However, since LFD tests are expected to be used for asymptomatic screening, the time between exposure and receiving an LFD test may be shorter than we have assumed. This would particularly affect children during periods when LFD testing was widely used in schools.
We highlight that the methods here may be translated to a variety of current and future epidemiological studies. As the COVID-19 pandemic has expanded the scale and scope of health surveillance data to an unprecedented level, the methods required to parse such data, and to create interpretations that inform decision makers and increase public awareness, also need to adapt. The methods presented here are novel, although built from established mathematical concepts, and this reflects the constant requirement to re-evaluate and refresh the set of mathematical and statistical tools available to analysts as the landscape of public health continues to evolve.
CRediT authorship contribution statement
Ewan Colman: Conceptualization, Methodology, Software, Data curation, Writing – original draft, Visualization. Gavrila A. Puspitarani: Methodology, Data curation, Writing – review & editing. Jessica Enright: Conceptualization, Methodology, Writing – review & editing. Rowland R. Kao: Conceptualization, Methodology, Writing – review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
We thank Chris Banks, Anthony Wood, Paul Bessel, Thomas Doherty, Tijani Sulaimon and Gianluigi Rossi for providing feedback on this research prior to submission.
Funding
This work was supported by the Wellcome Trust [grant number 209818/Z/17/Z].
Footnotes
Supplementary material related to this article can be found online at https://doi.org/10.1016/j.jtbi.2022.111333. Validation of methods on synthetic data.
Data availability
Data and code are available at https://github.com/EwanColman/Estimating_SARS-CoV-2_case_ascertainment.
References
- Antonelli M., Penfold R.S., Merino J., Sudre C.H., Molteni E., Berry S., Canas L.S., Graham M.S., Klaser K., Modat M., Murray B., Kerfoot E., Chen L., Deng J., Österdahl M.F., Cheetham N.J., Drew D.A., Nguyen L.H., Pujol J.C., Hu C., Selvachandran S., Polidori L., May A., Wolf J., Chan A.T., Hammers A., Duncan E.L., Spector T.D., Ourselin S., Steves C.J. Risk factors and disease profile of post-vaccination SARS-CoV-2 infection in UK users of the COVID symptom study app: a prospective, community-based, nested, case-control study. Lancet Infect. Diseases. 2021 doi: 10.1016/S1473-3099(21)00460-6.
- Arevalo-Rodriguez I., Buitrago-Garcia D., Simancas-Racines D., Zambrano-Achig P., Del Campo R., Ciapponi A., Sued O., Martinez-García L., Rutjes A.W., Low N., Bossuyt P.M., Perez-Molina J.A., Zamora J. False-negative results of initial RT-PCR assays for COVID-19: A systematic review. PLoS One. 2020;15(12):1–19. doi: 10.1371/journal.pone.0242958.
- Brookmeyer R., Gail M.H. AIDS Epidemiology: A Quantitative Approach. Oxford University Press; 1994.
- Brümmer L.E., Katzenschlager S., Gaeddert M., Erdmann C., Schmitz S., Bota M., Grilli M., Larmann J., Weigand M.A., Pollock N.R., Macé A., Carmona S., Ongarello S., Sacks J.A., Denkinger C.M. Accuracy of novel antigen rapid diagnostics for SARS-CoV-2: A living systematic review and meta-analysis. PLoS Med. 2021;18(8):1–41. doi: 10.1371/journal.pmed.1003735.
- Buitrago-Garcia D., Egli-Gany D., Counotte M.J., Hossmann S., Imeri H., Ipekci A.M., Salanti G., Low N. Occurrence and transmission potential of asymptomatic and presymptomatic SARS-CoV-2 infections: A living systematic review and meta-analysis. PLoS Med. 2020;17(9) doi: 10.1371/journal.pmed.1003346.
- Cevik M., Tate M., Lloyd O., Maraolo A.E., Schafers J., Ho A. 2020. SARS-CoV-2, SARS-CoV-1 and MERS-CoV viral load dynamics, duration of viral shedding and infectiousness: a living systematic review and meta-analysis.
- Chen P.Z., Bobrovitz N., Premji Z., Koopmans M., Fisman D.N., Gu F.X. SARS-CoV-2 shedding dynamics across the respiratory tract, sex, and disease severity for adult and pediatric COVID-19. eLife. 2021;10 doi: 10.7554/eLife.70458.
- Davies N.G., Jarvis C.I., Edmunds W.J., Jewell N.P., Diaz-Ordaz K., Keogh R.H. Increased mortality in community-tested cases of SARS-CoV-2 lineage B.1.1.7. Nature. 2021;593(7858):270–274. doi: 10.1038/s41586-021-03426-1.
- He X., Lau E.H., Wu P., Deng X., Wang J., Hao X., Lau Y.C., Wong J.Y., Guan Y., Tan X., et al. Temporal dynamics in viral shedding and transmissibility of COVID-19. Nat. Med. 2020;26(5):672–675. doi: 10.1038/s41591-020-0869-5.
- Hellewell J., Russell T., The SAFER Investigators and Field Study Team, et al. Estimating the effectiveness of routine asymptomatic PCR testing at different frequencies for the detection of SARS-CoV-2 infections. BMC Med. 2021;19:106. doi: 10.1186/s12916-021-01982-x.
- Hu B., Guo H., Zhou P., Shi Z.-L. Characteristics of SARS-CoV-2 and COVID-19. Nat. Rev. Microbiol. 2021;19(3):141–154. doi: 10.1038/s41579-020-00459-7.
- Jones T.C., Mühlemann B., Veith T., Biele G., Zuchowski M., Hofmann J., Stein A., Edelmann A., Corman V.M., Drosten C. 2020. An analysis of SARS-CoV-2 viral load by patient age.
- Ke R., Martinez P., Smith R.L., Gibson L., Achenbach C., McFall S., Qi C., Jacob J., Dembele E., Bundy C., et al. 2021. Longitudinal analysis of SARS-CoV-2 vaccine breakthrough infections reveal limited infectious virus shedding and restricted tissue distribution.
- Ke R., Martinez P.P., Smith R.L., Gibson L.L., Mirza A., Conte M., Gallagher N., Luo C.H., Jarrett J., Conte A., et al. 2021. Daily sampling of early SARS-CoV-2 infection reveals substantial heterogeneity in infectiousness.
- Kissler S.M., Fauver J.R., Mack C., Olesen S.W., Tai C., Shiue K.Y., Kalinich C.C., Jednak S., Ott I.M., Vogels C.B., Wohlgemuth J., Weisberger J., DiFiori J., Anderson D.J., Mancell J., Ho D.D., Grubaugh N.D., Grad Y.H. 2020. SARS-CoV-2 viral dynamics in acute infections.
- Kissler S.M., Fauver J.R., Mack C., Tai C., Breban M., Watkins A.E., Samant R., Anderson D., Ho D., Grubaugh N.D., et al. 2021. Densely sampled viral trajectories suggest longer duration of acute infection with B.1.1.7 variant relative to non-B.1.1.7 SARS-CoV-2.
- Kissler S.M., Fauver J.R., Mack C., Tai C.G., Breban M.I., Watkins A.E., Samant R.M., Anderson D.J., Metti J., Khullar G., et al. 2021. Viral dynamics of SARS-CoV-2 variants in vaccinated and unvaccinated individuals.
- Lauer S.A., Grantz K.H., Bi Q., Jones F.K., Zheng Q., Meredith H.R., Azman A.S., Reich N.G., Lessler J. The incubation period of coronavirus disease 2019 (COVID-19) from publicly reported confirmed cases: estimation and application. Ann. Internal Med. 2020;172(9):577–582. doi: 10.7326/M20-0504.
- Long Q.-X., Tang X.-J., Shi Q.-L., Li Q., Deng H.-J., Yuan J., Hu J.-L., Xu W., Zhang Y., Lv F.-J., et al. Clinical and immunological assessment of asymptomatic SARS-CoV-2 infections. Nat. Med. 2020;26(8):1200–1204. doi: 10.1038/s41591-020-0965-6.
- Meyerowitz-Katz G., Merone L. A systematic review and meta-analysis of published research data on COVID-19 infection-fatality rates. Int. J. Infect. Diseases. 2020 doi: 10.1016/j.ijid.2020.09.1464.
- Noh J., Danuser G. Estimation of the fraction of COVID-19 infected people in U.S. states and countries worldwide. PLoS One. 2021;16(2):1–10. doi: 10.1371/journal.pone.0246772.
- Office for National Statistics. COVID-19 infection survey. https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/bulletins/coronaviruscovid19infectionsurveypilot/latest.
- Office for National Statistics. Population estimates. https://www.ons.gov.uk/peoplepopulationandcommunity/populationandmigration/populationestimates.
- Ong S.W.X., Chiew C.J., Ang L.W., Mak T.-M., Cui L., Toh M.P.H., Lim Y.D., Lee P.H., Lee T.H., Chia P.Y., et al. Clinical and virological features of SARS-CoV-2 variants of concern: A retrospective cohort study comparing B.1.1.7 (Alpha), B.1.315 (Beta), and B.1.617.2 (Delta). Clin. Infect. Dis. 2021 doi: 10.1093/cid/ciab721.
- Poletti P., Tirani M., Cereda D., Trentini F., Guzzetta G., Sabatino G., Marziano V., Castrofino A., Grosso F., Del Castillo G., et al. Association of age with likelihood of developing symptoms and critical disease among close contacts exposed to patients with confirmed SARS-CoV-2 infection in Italy. JAMA Netw. Open. 2021;4(3) doi: 10.1001/jamanetworkopen.2021.1085.
- Pritchard E., Matthews P.C., Stoesser N., Eyre D.W., Gethings O., Vihta K.-D., Jones J., House T., VanSteenHouse H., Bell I., et al. Impact of vaccination on new SARS-CoV-2 infections in the United Kingdom. Nat. Med. 2021:1–9. doi: 10.1038/s41591-021-01410-w.
- Public Health England. Investigation of novel SARS-CoV-2 variant. https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/950823/Variant_of_Concern_VOC_202012_01_Technical_Briefing_3_-_England.pdf.
- Reese H., Iuliano A.D., Patel N.N., Garg S., Kim L., Silk B.J., Hall A.J., Fry A., Reed C. Estimated Incidence of Coronavirus Disease 2019 (COVID-19) Illness and Hospitalization—United States, February–September 2020. Clin. Infect. Dis. 2020;72(12):e1010–e1017. doi: 10.1093/cid/ciaa1780.
- Russell T.W., Golding N., Hellewell J., Abbott S., Wright L., Pearson C.A., van Zandvoort K., Jarvis C.I., Gibbs H., Liu Y., et al. Reconstructing the early global dynamics of under-ascertained COVID-19 cases and infections. BMC Med. 2020;18(1):1–9. doi: 10.1186/s12916-020-01790-9.
- Sah P., Fitzpatrick M.C., Zimmer C.F., Abdollahi E., Juden-Kelly L., Moghadas S.M., Singer B.H., Galvani A.P. Asymptomatic SARS-CoV-2 infection: A systematic review and meta-analysis. Proc. Natl. Acad. Sci. 2021;118(34) doi: 10.1073/pnas.2109229118.
- Sanderson T. 2021. New-variant compatibility in the ONS infection survey. https://theo.io/post/2021-01-22-ons-data/, accessed on 01/02/2021.
- Sheikh A., Robertson C., Taylor B. BNT162b2 and ChAdOx1 nCoV-19 vaccine effectiveness against death from the delta variant. N. Engl. J. Med. 2021;385(23):2195–2197. doi: 10.1056/NEJMc2113864.
- Shioda K., Lau M.S., Kraay A.N., Nelson K.N., Siegler A.J., Sullivan P.S., Collins M.H., Weitz J.S., Lopman B.A. Estimating the cumulative incidence of SARS-CoV-2 infection and the infection fatality ratio in light of waning antibodies. Epidemiology (Cambridge, Mass.) 2021;32(4):518. doi: 10.1097/EDE.0000000000001361.
- Smith R.L., Gibson L.L., Martinez P.P., Ke R., Mirza A., Conte M., Gallagher N., Conte A., Wang L., Fredrickson R., Edmonson D.C., Baughman M.E., Chiu K.K., Choi H., Jensen T.W., Scardina K.R., Bradley S., Gloss S.L., Reinhart C., Yedetore J., Owens A.N., Broach J., Barton B., Lazar P., Henness D., Young T., Dunnett A., Robinson M.L., Mostafa H.H., Pekosz A., Manabe Y.C., Heetderks W.J., McManus D.D., Brooke C.B. Longitudinal Assessment of Diagnostic Test Performance Over the Course of Acute SARS-CoV-2 Infection. J. Infect. Diseases. 2021;224(6):976–982. doi: 10.1093/infdis/jiab337.
- UK coronavirus dashboard. Cases. https://coronavirus.data.gov.uk/details/cases.
- UK coronavirus dashboard. Tests. https://coronavirus.data.gov.uk/details/tests.
- UK coronavirus dashboard. Deaths. https://coronavirus.data.gov.uk/details/deaths.
- Wang P., Liu L., Iketani S., Luo Y., Guo Y., Wang M., Yu J., Zhang B., Kwong P.D., Graham B.S., Mascola J.R., Chang J.Y., Yin M.T., Sobieszczyk M., Kyratsous C.A., Shapiro L., Sheng Z., Nair M.S., Huang Y., Ho D.D. 2021. Increased resistance of SARS-CoV-2 variants B.1.351 and B.1.1.7 to antibody neutralization. bioRxiv. https://www.biorxiv.org/content/early/2021/01/26/2021.01.25.428137.1.
- Wölfel R., Corman V.M., Guggemos W., Seilmaier M., Zange S., Müller M.A., Niemeyer D., Jones T.C., Vollmar P., Rothe C., et al. Virological assessment of hospitalized patients with COVID-2019. Nature. 2020;581(7809):465–469. doi: 10.1038/s41586-020-2196-x.
- Xin H., Wong J.Y., Murphy C., Yeung A., Taslim Ali S., Wu P., Cowling B.J. The Incubation Period Distribution of Coronavirus Disease 2019: A Systematic Review and Meta-analysis. Clin. Infect. Dis. 2021 doi: 10.1093/cid/ciab501.
- Yonker L.M., Boucau J., Regan J., Choudhary M.C., Burns M.D., Young N., Farkas E.J., Davis J.P., Moschovis P.P., Kinane T.B., Fasano A., Neilan A.M., Li J.Z., Barczak A.K. Virologic features of SARS-CoV-2 infection in children. J. Infect. Diseases. 2021 doi: 10.1093/infdis/jiab509.