Wellcome Open Research. 2022 Sep 22;6:127. Originally published 2021 May 25. [Version 3] doi: 10.12688/wellcomeopenres.16748.3

Revealing the extent of the first wave of the COVID-19 pandemic in Kenya based on serological and PCR-test data

John Ojal 1,2,a, Samuel P C Brand 3,4,b, Vincent Were 5, Emelda A Okiro 6, Ivy K Kombe 1, Caroline Mburu 1, Rabia Aziza 3,4, Morris Ogero 5, Ambrose Agweyu 1, George M Warimwe 1, Sophie Uyoga 1, Ifedayo M O Adetifa 1,2, J Anthony G Scott 1,2, Edward Otieno 5, Lynette I Ochola-Oyier 1, Charles N Agoti 1,7, Kadondi Kasera 8, Patrick Amoth 8, Mercy Mwangangi 8, Rashid Aman 8, Wangari Ng’ang’a 9, Benjamin Tsofa 1, Philip Bejon 1,10, Edwine Barasa 5,10, Matt J Keeling 3, D James Nokes 1,3,4
PMCID: PMC9511207  PMID: 36187498

Version Changes

Revised. Amendments from Version 2

In this version we respond to two comments from the reviewer. First, our assertion in the abstract that a 30-50% attack rate in Kenya after the first wave would not be sufficient to prevent a further wave was not based on the expectation of a new immune-evading variant being introduced, but rather on a presumption of heterogeneity in population structure and mixing rates. We investigate this explanation in a subsequent publication. Second, the reviewer called into question the comparison with a very high seroprevalence estimate from a South American study; we agreed with this comment and hence removed the sentence.

Abstract

Policymakers in Africa need robust estimates of the current and future spread of SARS-CoV-2. We used national surveillance PCR test, serological survey and mobility data to develop and fit a county-specific transmission model for Kenya up to the end of September 2020, which encompasses the first wave of SARS-CoV-2 transmission in the country. We estimate that the first wave of the SARS-CoV-2 pandemic peaked before the end of July 2020 in the major urban counties, with 30-50% of residents infected. Our analysis suggests, first, that the reported low COVID-19 disease burden in Kenya cannot be explained solely by limited spread of the virus, and second, that a 30-50% attack rate was not sufficient to avoid a further wave of transmission.

Keywords: SARS-CoV-2, Kenya, dynamic model, serology, PCR cases

Introduction

The potential risk from severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) to Africa was identified early in the global pandemic 1 . As the epicenter of transmission moved from East Asia to West Asia and Europe and then to North America, there was speculation as to the likely impact of the pandemic on the African continent with its young populations, high infectious disease burden, undernutrition and fragile health infrastructure. However, as the health systems and economies of high-income countries strained, the reported burden of COVID-19 cases and associated deaths in Africa remained low, with the exception of South Africa and Northern Africa 2 . The question is whether this is the result of lower risk due to demographic structure (young age 3 ), either cross-reacting immunity (e.g. pre-existing SARS-CoV-2 cross-reactive T cells 4 ) or dampened immunological over-reaction 5 , a low reproduction number from rapidly imposed interventions (such as school closures and lockdowns 6 ), environmental conditions (e.g. temperature and humidity 7 ), or under-reporting. The reason this remains a conundrum is, at least in part, a paucity of good quality data to reveal the probable extent of SARS-CoV-2 spread in African populations.

Following the first confirmed coronavirus disease 2019 (COVID-19) case in Kenya on 13th March 2020, the Kenyan Government moved rapidly, closing international borders, schools, restaurants, bars and nightclubs, banning meetings and social gatherings, and imposing a dusk-to-dawn curfew and movement restrictions in the two major city counties, Nairobi and Mombasa 8 . The major concerns from unmitigated spread were the limited surge capacity of the Kenyan health system 9 and groups of the Kenyan population identified as potentially highly vulnerable to infection, due to socio-economic factors such as crowded households or lack of access to handwashing, and/or to severe disease, due to epidemiological factors such as higher rates of obesity and hypertension 10 . Throughout the months of April, May and into June 2020 few people in Kenya were reported SARS-CoV-2 test positive by polymerase chain reaction (PCR), or severely diseased or dying with COVID-19 as the established cause 11 . There followed a relaxation of some measures in June and July, including controlled opening of restaurants and places of worship and the removal of travel restrictions into and out of Mombasa and Nairobi counties. As of 30th September 2020, there were 45,795 laboratory-confirmed positive swab tests out of over 340,000 tests (about 13.5%), and 749 deaths with a positive test result in Kenya 11 . This should be compared with the 200,000–250,000 cases and 30,000–40,000 deaths attributable to SARS-CoV-2 for similar sized countries in Europe (France, Italy, UK) by the end of September 12 .

The reason for this apparently low level of COVID-19 disease in Kenya is unknown; one possible explanation is that SARS-CoV-2 had not widely spread among the Kenyan population by the end of September. However, two pieces of information suggest that SARS-CoV-2 had already spread extensively by the end of September. First, a regionally-stratified seroprevalence study of 3098 Kenyan blood donors sampled between May and June reported a national estimate of 4.3% (adjusted to reflect the population distribution by age, sex and region) 13 . Sero-prevalence was higher in Nairobi (7.6%) and Mombasa (8.3%). These levels of seropositivity are comparable to those reported in May in the United Kingdom (UK) 14 , April/May in Spain 15 , and March/April in some United States (US) cities 16 , where high numbers of PCR-positive cases, hospitalizations and deaths have also been reported, in contrast to Kenya. Second, we noticed that test-positive PCR cases, and daily reported test-positive deaths, were declining first in Mombasa (from early July 2020) and then Nairobi (from early August 2020); respectively Kenya’s second and first largest cities. In Europe, declining case and mortality rates have been closely associated with non-pharmaceutical interventions (NPIs) 17 . However, in Kenya this went counter to evidence of increased mixing, and hence reproduction potential, arising from Google Mobility data for these cities which showed a steady reversion in mobility towards pre-COVID-19 intervention levels since early April (Fig. S1). These observations, in turn, lead to the conclusion that either a smaller than expected proportion of infected individuals have had severe disease, and/or, that there has been under-reporting of severe disease.

To investigate these findings, we developed a simple SEIR (susceptible-exposed-infectious-recovered) compartmental transmission model for Kenya, both mechanistic and data-driven, which integrates three sources of longitudinal data: national time series of polymerase chain reaction (PCR) tests, the Kenyan serological survey and Google mobility behavioural data. The overall modelling approach is similar to that of Flaxman et al. 17 ; that is, we use time-to-event lag distributions together with the daily incidence time series, and both models generate the daily incidence time series using a simple deterministic transmission model, with the key unknowns being the initial numbers of infected individuals and R(t). Where we differ in approach from Flaxman et al. 17 is that, instead of using reported test-positive deaths as the most reliable data for inferring underlying transmission patterns, we use a combination of PCR test-positive and serological data. The PCR test-positive data informs the model on the epidemic trajectory but does not account for likely under-detection of cases. This under-detection of cases is inferred from the proportion exposed to SARS-CoV-2 as evidenced by the seroprevalence estimates, which therefore scale the incidence estimates. Finally, the mobility data, as a proxy for the contact rate, determines the contribution of the intervention (which acts to alter contact patterns) relative to other factors that alter incidence and the effective reproduction number, the most important of which is the susceptible proportion of the population. Our aim is to derive a coherent picture of the SARS-CoV-2 epidemiology in Kenya in the first wave and reveal the historic and future patterns of spread across the country and by county. Reported deaths are not used as primary data for inference; rather, the trend in changing rates of reported deaths is used as a validation data set for model predictive accuracy (see supporting information for description of model validation). Reported deaths may be subject to substantial under-reporting, and we assume that the bias in under-reporting is consistent over time.

Results

Underlying transmission rates in Mombasa and Nairobi during the first wave

As of 30th September, a substantial proportion of PCR-positive tests came from samples collected in the capital, Nairobi (25,182 positive tests), while Kenya’s second largest city, Mombasa, reported the next highest number of PCR-positive tests (2,056). We infer that the underlying rate of new infections peaked on May 18th 2020 (CI May 16th - May 21st) in Mombasa and July 9th 2020 (CI July 7th - July 10th) in Nairobi, and subsequently declined from peak transmission ( Figure 1 H, G). The model suggests that the PCR test and serology data can be explained by the initial presence of <200 infected individuals in both Mombasa and Nairobi on 21st February, three weeks before the first reported case in Kenya. Thereafter, growth of transmission was rapid in both counties. In early March, the reproductive ratio was estimated to be 1.94 (CI 1.89-1.98) and 2.00 (CI 1.97-2.02) in Mombasa and Nairobi, respectively, with associated doubling times of 4.84 and 4.59 days, respectively. After March, the transmission curves flattened substantially. This change is consistent with the introduction of containment measures by the Kenyan government, and with evidence of a substantial reduction in mobility (see Google Mobility data, Fig. S1). However, we should note that there was very limited PCR testing available in Kenya before April 2020, and our estimates of R(t) pre-April 2020 rely on the assumption that R(t) dropped by ~45% in late March, in parallel to the drop in mobility data (see Methods and supporting data).

Figure 1. SARS-CoV-2 PCR positive swab tests, seroprevalence and deaths in Nairobi and Mombasa, Kenya, with model forecasting.

( A) and ( B) Weekly reported PCR-positive swab tests (green dots) for Nairobi ( A) and Mombasa ( B), with model prediction of mean weekly detection during the sampling periods when negative PCR test data was unavailable (blue curve) and available (orange curve). ( C) and ( D) Monthly seropositivity of Kenya National Blood Transfusion Service (KNBTS) blood donors in Nairobi ( C) and Mombasa ( D) (green dots), model predictions for population percentage of seropositivity (green curve), exposure to SARS-CoV-2 (red curve), and uninfected (blue curve). ( E) and ( F) Daily deaths with a positive SARS-CoV-2 test in Nairobi ( E) and Mombasa ( F) by date of death (black dots), and model prediction for daily deaths (black curve). Inset plots in ( E) and ( F) indicate cumulative reported deaths and model prediction. ( G) and ( H) Model estimates for rate of new infections per day in Nairobi ( G) and Mombasa ( H). Background shading indicates 95% central credible intervals. Dates for all graphs mark the 1st of each month.

From late April, through May and June, and into July, the evidence suggests that movement restrictions became steadily less effective. The waning effectiveness of movement restrictions results in an inferred increase in R(t) across Kenyan counties and an increased rate of epidemic growth ( Figure 2). The increasing R(t) estimates are broadly in line with predicted trends from Google mobility data (supporting information), although it should be noted that the R(t) estimates exhibit secondary fluctuations around the increasing mobility trend ( Figure 2). In Nairobi and Mombasa we predict that reduction in susceptibility of the population ( Figure 1C,D) caused the effective reproductive ratio (R eff ; the mean number of secondary cases accounting for reduced susceptibility) to drop significantly below the basic R value from June onwards ( Figure 2). However, in other counties, where the epidemic did not establish itself as early as in Mombasa and Nairobi and where a substantial majority of the population is likely to still be susceptible, we estimate that R(t) has rebounded to the levels estimated before the Kenyan public health measures ( Figure 2).

Figure 2. Estimated basic and effective reproductive numbers in Kenya since Feb 21st 2020.

The posterior mean reproductive number for Nairobi (red curves), Mombasa (green curves), and the inter-quartile range (IQR) over mean reproductive number estimates for all other Kenyan counties (blue curve and shading). Shown are both the basic reproductive numbers (expected secondary infections in a susceptible population adjusted for mobility changes since the epidemic start; solid curves), and effective reproductive numbers (expected secondary infections accounting for depletion of susceptible prevalence in the population; dotted curves). The effective reproductive number varied significantly from county to county and is not shown except for Mombasa and Nairobi. Restrictions aimed at reducing mobility in risky transmission settings (black dotted lines) are labelled in groups. The chronologically ordered restrictions in each group are: 1) first PCR-confirmed case in Kenya, suspension of all public gatherings, closure of all schools and universities, and retroactive quarantine measures for recent returnees from foreign travel; 2) suspension of all inbound flights for foreign nationals, imposition of a national curfew, and regional lockdowns of Kilifi, Kwale, Mombasa and Nairobi counties; and 3) additional no-movement restriction of worst affected areas within Mombasa and Nairobi, and closure of the border with Somalia and Tanzania. There were two relaxations of measures in this time frame: the end of the no-movement restriction in Mombasa and Nairobi, and the resumption of international air travel.

By accounting for the average delay of 19 days between infection and death (see supporting information for details on the infection-to-death distribution), we find that the transmission curve, estimated from PCR tests and serology, generates a good prediction of the observed trend in daily deaths in Nairobi and Mombasa ( Figure 1 E, F). We did not use mortality data in transmission model inference; therefore, the good fit to the observed trend in deaths with a PCR-confirmed test result represents an out-of-sample validation of the modelling 18 . Note that it is the distribution of deaths over time, rather than the absolute numbers, that we consider to be a good fit. In accord with observations, we estimate that a peak of positive PCR test samples occurred at the end of July or early August in Nairobi and earlier, in mid-June, in Mombasa. The lag between the transmission peak and the positive swab testing peak is explained by both the delay between infection and becoming detectable by PCR, and the period after an infected individual has ceased being actively infectious but remains detectable by PCR 19 ( Figure 1 G,H and A,B). As of the end of September 2020 we estimate that about 35.4% (CI 29.0%-40.4%) of the Nairobi population and 30.3% (CI 23.6%-36.7%) of the Mombasa population were seropositive for SARS-CoV-2 ( Figure 1 C,D). This estimated level of seropositivity is substantially higher than has been estimated in some countries that have been hit hard by the pandemic 14–16 . However, these estimates are in broad agreement with a study in Niger state, Nigeria, from June 2020 20 , as well as seropositivity rates reported from the hard-hit city of Manaus, Brazil, in May 2020 21 . Note that these estimates of seropositivity at the end of September assume both that waning seropositivity would not have had a significant effect on serological observations by late September, and that waning immunity leading to re-infection remained insignificant by late September.

SARS-CoV-2 attack rates in the first wave in Kenyan counties and the estimated crude infection-to-fatality ratio

Accounting for the sensitivity of the serological assay, and the delay between infection and seroconversion, we estimate that the actual exposure of the population to SARS-CoV-2 by September 30th was 43.3% (CI 35.3%-49.5%) in Nairobi and 37.6% (CI 29.2%-45.7%) in Mombasa ( Figure 1 C,D). Such levels of population exposure are predicted to be associated with decreased rates of new cases due to reduced numbers of susceptible individuals in these urban populations, although this is also influenced by the estimated reproductive number and effective population size at risk of exposure ( P eff ). The effective population size accounts for the impact of heterogeneity in the susceptibility, transmissibility and social interactivity in the population (supporting information for more details on effective population size in transmission modelling); for Nairobi it was inferred as 81.8% of actual population size (CI 66.7%-93.2%), for Mombasa 71.9% (CI 56.3%-86.5%). The effective population size estimates rest upon inferred variation in risk across the population. There remains a possibility of future increase in transmission if population mobility continues to rise, if population mixing patterns alter leading to changed risk heterogeneity or if immunity is short lived, leading to a rebound in reported cases. One or more of these factors could lead either to lengthening the tail after the first peak in cases/deaths, or even to a secondary increase in cases and/or deaths.

The inferred IFR crude values for both Nairobi ( IFR crude = 0.019% (CI 0.014%-0.024%)) and Mombasa ( IFR crude = 0.022% (CI 0.016%-0.027%)) are substantially lower than the age-adjusted IFR expected for Kenya under full ascertainment, derived from the age-specific IFR estimates given by Verity et al. ( IFR verity = 0.26% 22 ; and supporting information). Ours is a crude observational value for the infection-to-fatality ratio, since we do not currently have an estimate of the reporting bias of deaths of individuals infected with SARS-CoV-2. Therefore, our estimate of IFR crude potentially reflects lower detection in Kenya compared to China, as well as any lower mortality risk due to fewer comorbidities.

We extended our model-based inference to each of the 47 counties in Kenya (see dataset S1 for parameter estimates, peak time estimates and IFR crude estimates for each county). We find that, in addition to the two main Kenyan city counties, more than 25–30% of the population in each of the semi-urban counties neighbouring Nairobi (Kiambu, Kajiado, and Machakos) had been infected. However, the infection rate is predicted to be either lower than 25% and/or subject to high uncertainty in other counties (with high uncertainty defined as a prediction standard error of > 10% of county population size; Figure 3).

Figure 3. Predicting peak timing of transmission rate by Kenyan county, and forecasting of Kenya-wide PCR positive swab tests and reported deaths.

( A) Posterior mean estimates for the attack rate (% of population) in each county. Solid shaded counties have a posterior standard deviation in their attack rate estimate of less than 10%, candy-stripe shaded counties have greater uncertainty associated with their attack rate estimate. ( B) Kenya total positive swab tests collected by day of sample (blue dots) with model prediction of daily positive swab test trend (red curve). ( C) Kenya total reported deaths with a positive swab test (black dots), with model prediction of reported death rates (black curve). Inset plot indicates cumulative reported deaths with model prediction of cumulative deaths. Dates on ( B) and ( C) mark 1st of the month.

Due to the lag between infection and the observability of the infected person (whether by swab PCR test, serology test, or death), we estimate that both daily PCR positive test detections and daily observed deaths attributed to COVID-19 across the two main cities, and semi-urban counties neighbouring Nairobi had a peak in early August 2020 ( Figure 3 B,C). Hospitalisation rates are not available for all Kenyan hospitals. However, sentinel clinical surveillance of severe acute respiratory infection (SARI), with or without a PCR test for SARS-CoV-2, at 14 county hospitals suggests an increasing rate of adult admissions in June and July 2020 23 . However, SARI admissions were lower in the early phase of the Kenyan epidemic than observed counts from the same months in 2018 and 2019 23 and the apparent rise in SARI admissions could represent a reversion towards pre-COVID numbers; this observation underlines the difficulties in using hospital data to understand the penetration of SARS-CoV-2 in Kenya.

Conclusions and discussion

Our modelling analysis provides a coherent account of the SARS-CoV-2 pandemic in Kenya up to the end of September 2020. Limitations include the lack of information on PCR testing denominators for the full time frame, the limited serological survey, and our use of a simple dynamic model. In mitigation, similar results were obtained when excluding all negative tests, and the dynamic model is transparently a fit to the data, the availability of which is a key strength of our study.

Our analysis suggests that 30–50% of the urban population had already been exposed by the end of September, and that the first wave of the Kenyan epidemic peaked in the urban and semi-urban counties during a period of relatively few restrictions and little physical distancing. This level of exposure, however, was not sufficient to prevent a second wave, which came shortly after the first (October to December 2020) and which we assume to have resulted from heterogeneous spread of the virus, perhaps due to variation in population susceptibility, transmissibility or social interactivity.

Whilst the full picture of the epidemiology in Kenya will not be established until cause-specific mortality data become available (e.g. from resumption of Demographic Surveillance System and verbal autopsy activities), our model, fitted to three sources of nationwide longitudinal data, suggests that the number of symptomatic COVID-19 cases reported and the mortality attributed to the SARS-CoV-2 epidemic are substantially lower in Kenya than in Europe and the USA at a similar stage of the epidemic. This would remain the case even if reported deaths accounted for just 1/10th of the true value. However, there is insufficient data for speculating on the degree of under-reporting and previous estimates of 1 in 4 deaths occurring in hospital may not be generalizable to the hospital access during the COVID-19 pandemic 24 .

Late 2020 saw the spread of COVID-19 to more rural areas of Kenya, with less infrastructure and access to public health facilities and a second wave of SARS-CoV-2. This second wave needs to be dissected and understood. Policy makers need to balance the direct and indirect health and socio-economic consequences of any control measures; a balance that becomes more precise as we develop a better understanding of SARS-CoV-2 dynamics in Kenya.

Methods

Transmission model definition

The dynamics of transmission in each Kenyan county were assumed to follow a SEIR transmission model with an effective population size parameter ( P eff ) 25 . The SEIR model with effective population size is an extension of the homogeneous SEIR model 26 with the additional flexibility that only P eff N out of a total population size N in each county is at risk of contracting SARS-CoV-2. P eff = 1 recovers the homogeneous SEIR model, whereas P eff < 1 captures the effect of underlying heterogeneity in the transmission potential and risk in the population of the county on the aggregate dynamics of the epidemic. This aspect of heterogeneous models of transmission has been widely investigated, for example, in the context of comparing vaccination coverage thresholds for elimination between uniform and targeted vaccination policies 27 . In the context of the SARS-CoV-2 pandemic modelling literature, the possible role of population heterogeneity in decoupling estimates of R 0 from predictions of the "herd-immunity" threshold and final attack rate has again been identified 28, 29 . In this study, rather than make strong assumptions about the mechanism of population heterogeneity (e.g. differential susceptibility or differential rates of social mobility), we have taken a phenomenological approach; the effect of heterogeneity in the population was encoded in the effective population parameter P eff , and this parameter was inferred jointly with R 0. Our a priori belief was that the most probable value was P eff = 1. We assumed that P eff was constant over the period of inference.

The model dynamics for each Kenyan county were represented as a system of ordinary differential equations,

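$$
\begin{aligned}
\dot{S}(t) &= -\gamma R_t \frac{S(t) I(t)}{P_{\mathrm{eff}} N}, \\
\dot{E}(t) &= \gamma R_t \frac{S(t) I(t)}{P_{\mathrm{eff}} N} - \sigma E(t), \\
\dot{I}(t) &= \sigma E(t) - \gamma I(t), \\
\dot{R}(t) &= \gamma I(t), \\
\dot{C}(t) &= \gamma R_t \frac{S(t) I(t)}{P_{\mathrm{eff}} N}.
\end{aligned}
\tag{1}
$$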

With initial conditions (time 0 is the calendar date 21st Feb 2020 and all rates are per day),

$$
S(0) = P_{\mathrm{eff}} N - E_0 - I_0, \quad E(0) = E_0, \quad I(0) = I_0, \quad R(0) = 0, \quad C(0) = 0. \tag{2}
$$

Where the dynamic variables S( t), E( t), I( t), R( t) were the numbers of susceptibles-at-risk, exposed (but not yet infectious), infectious, and, recovered individuals in the county. The full number of susceptibles in the county at any time was (1 − P eff ) N + S( t). C( t) was the cumulative numbers of infected individuals in the county.
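The system (1) with initial conditions (2) can be sketched numerically as follows. This is a minimal illustration and not the authors' implementation: it holds R t constant (the study infers a time-varying R t), uses a fixed-step fourth-order Runge-Kutta integrator chosen here for self-containment, and the county size, P eff and seed numbers are hypothetical values.

```python
def seir_rhs(state, r_t, p_eff, n_pop, sigma, gamma):
    """Right-hand side of equation (1) for state = (S, E, I, R, C)."""
    s, e, i, _r, _c = state
    new_infections = gamma * r_t * s * i / (p_eff * n_pop)
    return (
        -new_infections,              # dS/dt
        new_infections - sigma * e,   # dE/dt
        sigma * e - gamma * i,        # dI/dt
        gamma * i,                    # dR/dt
        new_infections,               # dC/dt, cumulative infections
    )


def simulate_seir(n_pop, p_eff, r_t, e0, i0, days,
                  sigma=1 / 3.1, gamma=1 / 2.4, steps_per_day=20):
    """Integrate equation (1) from initial conditions (2) with classical RK4.

    Returns a list of (S, E, I, R, C) tuples, one per day from day 0 to `days`.
    R_t is held constant here, which is an illustrative simplification.
    """
    state = (p_eff * n_pop - e0 - i0, float(e0), float(i0), 0.0, 0.0)
    h = 1.0 / steps_per_day
    args = (r_t, p_eff, n_pop, sigma, gamma)

    def shifted(base, slope, scale):
        return tuple(x + scale * k for x, k in zip(base, slope))

    trajectory = [state]
    for _day in range(days):
        for _step in range(steps_per_day):
            k1 = seir_rhs(state, *args)
            k2 = seir_rhs(shifted(state, k1, h / 2), *args)
            k3 = seir_rhs(shifted(state, k2, h / 2), *args)
            k4 = seir_rhs(shifted(state, k3, h), *args)
            state = tuple(
                x + h / 6 * (a + 2 * b + 2 * c + d)
                for x, a, b, c, d in zip(state, k1, k2, k3, k4)
            )
        trajectory.append(state)
    return trajectory


# Hypothetical county: 1,000,000 residents, 80% effectively at risk, R_t = 2.
trajectory = simulate_seir(n_pop=1_000_000, p_eff=0.8, r_t=2.0,
                           e0=50, i0=50, days=300)
final_attack_rate = trajectory[-1][4] / 1_000_000  # C(t) / N
```

In this sketch the final attack rate as a share of the whole county stays below P eff, because the (1 − P eff ) N individuals outside the at-risk pool are never infected; this is the sense in which P eff < 1 decouples R 0 from the final attack rate.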

The incubation-to-infectious rate was σ = 1 /3.1 per day, and the recovery rate was γ = 1 /2.4 per day, implying a mean generation time of 5.5 days (see Supporting information for a comparison to the generation distribution inferred by Ferretti et al. 30 ). The instantaneous reproductive ratio R t = R 0 β t decomposed into a basic reproductive ratio R 0 and an effective contact rate β t, where β t = 1 represents a pre-pandemic baseline contact rate in the population.
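The relationship between these rates, the reproductive ratio and the epidemic doubling time can be checked with a short calculation. The sketch below assumes that the doubling time follows from the early exponential growth rate r of the linearised SEIR model, which solves (r + σ)(r + γ) = σγR; the source does not state which formula was used, so this is an assumption, although it is consistent with the doubling times reported in the Results for R = 1.94 and R = 2.00.

```python
import math

SIGMA = 1 / 3.1  # incubation-to-infectious rate per day (from the text)
GAMMA = 1 / 2.4  # recovery rate per day (from the text)


def mean_generation_time(sigma=SIGMA, gamma=GAMMA):
    """Mean latent period plus mean infectious period, in days."""
    return 1 / sigma + 1 / gamma


def growth_rate(r_ratio, sigma=SIGMA, gamma=GAMMA):
    """Positive root r of (r + sigma)(r + gamma) = sigma * gamma * R.

    This is the early growth rate of the linearised SEIR model when almost
    all of the at-risk population is still susceptible (an assumption).
    """
    b = sigma + gamma
    c = sigma * gamma * (1 - r_ratio)
    return (-b + math.sqrt(b * b - 4 * c)) / 2


def doubling_time(r_ratio, sigma=SIGMA, gamma=GAMMA):
    """Doubling time in days implied by the early growth rate."""
    return math.log(2) / growth_rate(r_ratio, sigma, gamma)


print(round(mean_generation_time(), 2))  # 5.5
print(round(doubling_time(1.94), 2))     # 4.84 (Mombasa estimate of R)
print(round(doubling_time(2.00), 2))     # 4.59 (Nairobi estimate of R)
```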

Transmission model inference

We used a mixed Bayesian and maximum a-posteriori (MAP) approach to parameter inference for each of the 47 Kenyan counties, based on daily observations of positive and negative PCR and serology tests in each county. The likelihood of individuals being detectable on any given day was based on whether they had been infected before that day, and, the number of days since their infection. The number of new infections on each day n, was denoted ι n . For a given set of model parameters ι n was generated by solving the ODE system ( 1), giving,

$$
\iota_n = C(n+1) - C(n), \tag{3}
$$

for each day n. Given the daily numbers of new infections, the number of people in the county on each day n who are detectable by PCR testing, denoted ( P +) n , and serological testing, ( S +) n , were given by convolving the new infection time series with the probability of (respectively) being detectable by a PCR or serological test τ days after infection, Q PCR ( τ) and Q sero ( τ):

$$
(P+)_n = [\iota \ast Q_{PCR}](n), \qquad (S+)_n = [\iota \ast Q_{sero}](n). \tag{4}
$$
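The convolution in equation (4) can be illustrated with a few lines of code. This is a sketch under stated assumptions: the kernel is indexed so that `q[tau]` is the probability of being detectable `tau` days after infection, starting at `tau = 0`, and the kernel values below are invented for illustration; the functional forms of Q PCR and Q sero used in the study are given in its supporting information.

```python
def detectable_series(incidence, q):
    """Discrete convolution of daily new infections with a detection kernel.

    incidence[n] is the number of new infections on day n, and q[tau] is the
    probability of being detectable tau days after infection (assumed
    indexing). Returns the expected number of detectable people on each day.
    """
    result = []
    for n in range(len(incidence)):
        max_lag = min(n, len(q) - 1)
        total = 0.0
        for tau in range(max_lag + 1):
            total += incidence[n - tau] * q[tau]
        result.append(total)
    return result


# Hypothetical kernel: detectability rises, peaks, then wanes.
q_example = [0.0, 0.3, 0.8, 0.9, 0.6, 0.2]

# A single burst of 10 infections on day 0 reproduces the kernel shape.
impulse = [10.0] + [0.0] * 9
pcr_detectable = detectable_series(impulse, q_example)
```

Because the kernel extends over several days after infection, the peak in the detectable series lags the peak in new infections, which is the lag between the transmission peak and the positive swab testing peak described in the Results.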

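The log-likelihood function for each county has the form,

$$
\ln L = \sum_{n=1}^{T} \Big[ \ln f_{PCR}\big((ObsP+)_n \mid (P+)_n, \theta_{OM}\big) + \ln f_{sero}\big((ObsS+)_n \mid (S+)_n, \theta_{OM}\big) \Big],
$$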

Where, ln f PCR (( ObsP +) n |( P +) n , θ OM ), and, ln f sero (( ObsS +) n |( S +) n , θ OM ), were, respectively, the log-probability of observing ( ObsP +) n PCR test-positives and ( ObsS +) n serological test positives on days n = 1,..., T given the model prediction of numbers of PCR and serological detectable people in the population, and the observation model parameters θ OM . Day n = 1 corresponded to the calendar date 21st February 2020, and, day n = T = 223 corresponded to 30th September 2020.
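The day indexing can be verified with simple date arithmetic. The helper names below are hypothetical and only illustrate the convention stated above, in which day n = 1 is 21st February 2020.

```python
from datetime import date, timedelta

FIRST_DAY = date(2020, 2, 21)  # day n = 1 in the text


def day_index(calendar_day):
    """Day number n for a calendar date, with 21 Feb 2020 as n = 1."""
    return (calendar_day - FIRST_DAY).days + 1


def calendar_date(n):
    """Calendar date corresponding to day number n."""
    return FIRST_DAY + timedelta(days=n - 1)


print(day_index(date(2020, 9, 30)))  # 223, matching T in the text
print(day_index(date(2020, 3, 13)))  # 22, the first confirmed case in Kenya
```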

The underlying transmission prediction depended only on parameters relevant to infection (e.g. basic pre-measures reproductive ratio etc), however, the statistical modelling of the observation of evidence of these infections varied by type of test and availability of negative PCR test data. Together these form a likelihood function, which integrates the different data sources, since they are all, ultimately, generated by the same underlying infection process. The three statistical models of observation data were:

  • Serological tests : On each day that serological samples were collected, the log-probability of the observed number of positive tests (ln f sero (( ObsS +) n |( S +) n , θ OM )) was assumed to be that of a Beta-Binomial distribution with unbiased sampling of the underlying proportion of serologically detectable people in the county (( S +) n / N). The extra dispersion compared to a Binomial sample is due to uncertainty in the underlying sensitivity of the serological assay (see supporting information in supporting data).

  • PCR swab positive tests when no data on negative PCR tests was available : Negative PCR swab tests were not available in every county on every day of simulation. When negative swab tests were not available we assumed that the log-probability of the daily observed PCR test positives was from a Negative-binomial distribution:

$$
\mu_n = p_{\mathrm{test}}\, TR(n)\, (P+)_n, \qquad (ObsP+)_n \sim \mathrm{NegBin}(\hat{\mu} = \mu_n,\ \hat{\alpha} = \alpha). \tag{5}
$$

    Where the mean number of daily observed test positives, conditional on the model prediction of PCR-detectable people in the population, is based on sampling a fraction p test TR( n). p test was an observation parameter that was jointly inferred during inference, and TR( n) was a normalized testing rate based on nationally reported data (see supporting information in supporting data). α was a clustering factor for negative-binomial sampling, jointly inferred with other model parameters.

  • PCR swab positive tests when data on negative PCR tests was available : When both positive and negative PCR test data was available, we assumed that the fraction of positive samples reflected a biased observation of the underlying true fraction of PCR-detectable individuals in the population, e.g. being infected with SARS-CoV-2 could be expected to influence the odds of someone seeking a PCR test. We assumed that the daily detection of PCR test positives could be modelled as samples from a Beta-Binomial distribution with two parameters to infer: 1) the bias of a PCR-detectable individual being PCR tested compared to a PCR-undetectable individual ( χ), and 2) the effective sample size parameter ( M PCR ). M PCR → ∞ recovered a Binomial distribution for the number of positive PCR tests observed among the tests conducted that day, whereas M PCR < ∞ allowed the model to infer much greater variance in the daily proportion of test positives than would be expected from a Binomial distribution. On days where negative swab tests were available, we connected the observable status of the epidemic to the data thus,

$$
p_n = \frac{\chi (P+)_n}{(\chi - 1)(P+)_n + N}, \qquad (ObsP+)_n \sim \mathrm{BetaBin}(\hat{N}_s = N_{PCR,n},\ \hat{p} = p_n,\ \hat{M} = M_{PCR}). \tag{6}
$$

    Where N PCR,n is the total number of PCR swab samples collected on day n and p n is the proportion of tests performed returning positive expected by the model, accounting for bias in the sampling regime. The bias parameter χ = 1 recovers an unbiased sample of PCR positives from the underlying population.
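The PCR observation models above can be sketched as follows. The mean in equation (5) and the biased positive fraction in equation (6) follow the text directly. The Beta-Binomial parameterisation, however, is an assumption made here for illustration: shape parameters a = pM and b = (1 − p)M, so that p is the mean fraction and M acts as an effective sample size. The parameterisation actually used in the study is described in its supporting information, and all numerical values below are hypothetical.

```python
import math


def negbin_mean(p_test, testing_rate, pcr_detectable):
    """Mean daily positives mu_n in equation (5)."""
    return p_test * testing_rate * pcr_detectable


def biased_positive_fraction(chi, pcr_detectable, n_pop):
    """Expected fraction of positive tests p_n in equation (6).

    chi is the relative odds that a PCR-detectable person is tested;
    chi = 1 gives the unbiased fraction pcr_detectable / n_pop.
    """
    return chi * pcr_detectable / ((chi - 1) * pcr_detectable + n_pop)


def log_beta(a, b):
    """Logarithm of the Beta function."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def betabinom_logpmf(k, n, p, m):
    """Log-probability of k positives out of n tests.

    ASSUMED parameterisation: a = p * m and b = (1 - p) * m, so the mean
    fraction is p and larger m means less over-dispersion.
    """
    a, b = p * m, (1 - p) * m
    log_choose = (math.lgamma(n + 1) - math.lgamma(k + 1)
                  - math.lgamma(n - k + 1))
    return log_choose + log_beta(k + a, n - k + b) - log_beta(a, b)


# Hypothetical day: 500 PCR-detectable people in a county of 10,000,
# detectable people three times as likely to be tested, 80 tests, 12 positive.
p_n = biased_positive_fraction(chi=3.0, pcr_detectable=500.0, n_pop=10_000.0)
log_lik = betabinom_logpmf(k=12, n=80, p=p_n, m=30.0)
```

Under this assumed parameterisation, letting `m` grow large makes the log-probability approach that of a Binomial distribution, which mirrors the M PCR → ∞ limit described above.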

Supporting information gives further details on the data sources and the log-likelihood calculation including a full description of all observation model parameters and the functional forms and underlying evidence for Q PCR and Q sero . The data sources used were: The Kenya Ministry of Health National linelist, the Kenya Medical Research Institute Wellcome Trust Research Programme (KEMRI-WTRP) serological surveillance programme and Google mobility data 31 . The full Kenyan SARS-CoV-2 line list contains sensitive personal information that could potentially allow the identification of individual cases. The analysis performed in this study only required an aggregated dataset derived from the Kenyan linelist. Other data used in this paper was openly available. All data is available in the main text or as underlying data 32 .

We assumed that β t was piece-wise constant on days, and, therefore, could be reconstructed from daily effective contact rates ( β n ) n=1,..., T . For any fixed estimate of the effective contact rate β t, we used Hamiltonian Markov-chain Monte Carlo (HMC) 33 to estimate the posterior distribution for the transmission model parameters; that is, the initial condition values ( E 0, I 0) and fixed parameters ( P eff , R 0) jointly with the observation model parameters θ OM . Prior distributions for parameters were chosen for groups of counties (e.g. largely rural counties had different priors to major urban conurbations like Nairobi and Mombasa; see supporting information for further details). Starting from an initial estimate that β t followed daily Google mobility trends 31 for the whole period, we sequentially improved our β t estimate using the expectation-maximisation (EM) algorithm 34 . The E-step corresponded to posterior distribution estimation using HMC, and the M-step corresponded to optimising the daily effective contact rate estimates ( β n ) n=41,..., T using the stochastic gradient descent algorithm ADAM 35 . The first 40 days of effective contact rate estimates ( β n ) n=1,...,40 were assumed to be fixed to their Google estimate; this improved identifiability jointly with R 0 and captured the observed sharp drop in mobility in response to Kenyan public health measures following the first identified case on 13th March 2020. See supporting information for further details on the use of Google mobility data and the EM algorithm method used in this study.
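
The structure of the M-step can be sketched in Python as follows. This is a minimal illustration of the ADAM update rule applied to a vector of daily contact rates with the first 40 entries held fixed; it is not the study's Julia implementation. The quadratic toy objective, the values in `mobility_beta` and `target`, and the learning rate are hypothetical stand-ins: in the actual analysis the M-step objective would be derived from the posterior obtained in the E-step.

```python
import math


def adam_m_step(beta, grad_fn, n_fixed=40, steps=1000, lr=0.01,
                b1=0.9, b2=0.999, eps=1e-8):
    """Minimise an objective over beta[n_fixed:] using ADAM updates.

    Entries beta[:n_fixed] are never modified, mirroring the assumption
    that the first 40 daily contact rates are fixed to mobility data.
    """
    beta = list(beta)
    m = [0.0] * len(beta)  # running mean of gradients
    v = [0.0] * len(beta)  # running mean of squared gradients
    for t in range(1, steps + 1):
        g = grad_fn(beta)
        for i in range(n_fixed, len(beta)):
            m[i] = b1 * m[i] + (1 - b1) * g[i]
            v[i] = b2 * v[i] + (1 - b2) * g[i] ** 2
            m_hat = m[i] / (1 - b1 ** t)  # bias-corrected first moment
            v_hat = v[i] / (1 - b2 ** t)  # bias-corrected second moment
            beta[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return beta


# HYPOTHETICAL toy problem: 60 days, mobility-based starting values of 1.0,
# and a quadratic objective sum((beta_n - target_n)^2) standing in for the
# negative expected log-likelihood.
T = 60
mobility_beta = [1.0] * T
target = [0.6] * T


def toy_grad(beta):
    return [2.0 * (b - tgt) for b, tgt in zip(beta, target)]


beta_fit = adam_m_step(mobility_beta, toy_grad)
```

In a full EM loop this step would alternate with re-estimation of the posterior by HMC until the contact rate estimates stabilise.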

After inference of transmission parameters, the model implied a prediction of the expected number of daily deaths due to COVID, ( X +) n , based on an overall population infection-to-fatality ratio (IFR), and, the delay distribution between infection and death, p ID ,

$$\mathbb{E}\big[(X^+)_n\big] = \mathrm{IFR}\,[\iota * p_{ID}](n). \tag{6}$$

In this study, we assume that the IFR is constant for each county over the period of inference, which allows us to construct a Bayesian estimator of the crude IFR, IFR crude , by fitting to the observed daily numbers of test-positive deaths, ( ObsX +) n (see supporting information for details and background data informing p ID ). Because the observed test-positive deaths were not used in inferring model parameters, we treat the log-predictive density of deaths from the model as an out-of-sample validation metric for the model. However, we emphasise that the out-of-sample comparison is to the trend of daily deaths, because this is invariant to the IFR crude estimator, which is itself sensitive to under-reporting of COVID deaths. Supporting information gives full details on the Bayesian model validation used in this study.
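
The expected-deaths calculation can be illustrated with a short Python sketch, which multiplies the IFR by the discrete convolution of daily incidence with the infection-to-death delay distribution. The function name, the incidence series, the delay distribution and the IFR value below are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
def expected_daily_deaths(ifr, incidence, p_id):
    """Return E[X_n] = IFR * sum_k incidence[n - k] * p_id[k] for each day n.

    incidence[n] is the number of new infections on day n and p_id[k] is the
    probability that a fatal infection results in death k days after infection.
    """
    expected = []
    for n in range(len(incidence)):
        total = 0.0
        for k, prob in enumerate(p_id):
            if n - k >= 0:
                total += incidence[n - k] * prob
        expected.append(ifr * total)
    return expected


# HYPOTHETICAL inputs: two days of infections, deaths occurring one or two
# days after infection with equal probability, and an IFR of 1%.
incidence = [100.0, 200.0, 0.0, 0.0]
p_id = [0.0, 0.5, 0.5]
deaths = expected_daily_deaths(0.01, incidence, p_id)  # approx. [0.0, 0.5, 1.5, 1.0]
```

Changing the IFR rescales every entry by the same factor, so the shape of the predicted death curve is unchanged; this is why the comparison to the trend of daily deaths does not depend on the IFR crude estimate.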

This study was approved by the Kenya Medical Research Institute Scientific and Ethics Review Unit (KEMRI-SERU) with approval numbers KEMRI/SERU/CGMR-C/203/4085 and KEMRI/SERU/CGMR-C/203/3426 for the modelling and serosurvey studies respectively.

Acknowledgements

We thank all members of Kenya’s county rapid response teams (who collected swab samples), the testing centres (who conducted the laboratory PCR assays) and the Kenya National Blood Transfusion Service Centres. This study is published with the permission of the Director of the Kenya Medical Research Institute.

An earlier version of this article can be found on medRxiv (doi: https://doi.org/10.1101/2020.09.02.20186817)

Funding Statement

This work was supported by the Foreign, Commonwealth and Development Office and Wellcome Trust [220985/Z/20/Z]; National Institute for Health Research (NIHR) [17/63/82] using UK aid from the UK Government to support global health research; Wellcome Trust Intermediate Fellowship awards [201866, 107568] EAO, LIOO; MRC/DFID African Research Leader Fellowship [MR/S005293/1] IMOA, CM; NIHR Global Health Research Unit on Mucosal Pathogens [16/136/46] JO; DFID/MRC/NIHR/Wellcome Trust Joint Global Health Trials Award [MR/R006083/1] AA; Wellcome Trust Senior Research Fellowship [214320] and the NIHR Health Protection Research Unit in Immunisation JAGS. The views expressed in this publication are those of the author(s) and not necessarily those of any of the funding agencies.

The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

[version 3; peer review: 2 approved, 1 approved with reservations]

Data availability

Underlying data

Zenodo: Revealing the extent of the first wave of the COVID-19 pandemic in Kenya based on serological and PCR-test data. https://doi.org/10.5281/zenodo.4705244 32 This project contains the following underlying data:

  • Data S4. (The number of positive, and negative where available, PCR-confirmed swab tests for each county by date of sample collection (21st Feb to 30th September)).

  • Data S5. (The number of positive and negative serological results for each county by date of sample collection (21st Feb to 6th August)). This is from the Kenyan Ministry of Health National linelist.

  • Data S6. (The number of deaths with a PCR-confirmed swab test for each county by recorded date of death (21st Feb to 30th September)).

  • Data S7. (Summary data of the Kenyan epidemic, including the reported total number of tests performed in Kenya).

  • supp material.docx (A more detailed description of the data)

Software availability

The analysis code was written in Julia language version 1.4.

References

  • 1. Gilbert M, Pullano G, Pinotti F, et al.: Preparedness and vulnerability of African countries against importations of COVID-19: a modelling study. Lancet. 2020;395(10227):871–877. doi: 10.1016/S0140-6736(20)30411-6
  • 2. Cabore JW, Karamagi HC, Kipruto H, et al.: The potential effects of widespread community transmission of SARS-CoV-2 infection in the World Health Organization African Region: a predictive model. BMJ Glob Health. 2020;5(5):e002647. doi: 10.1136/bmjgh-2020-002647
  • 3. Diop BZ, Ngom M, Biyong CP, et al.: The relatively young and rural population may limit the spread and severity of COVID-19 in Africa: a modelling study. BMJ Glob Health. 2020;5(5):e002699. doi: 10.1136/bmjgh-2020-002699
  • 4. Braun J, Loyal L, Frentsch M, et al.: SARS-CoV-2-reactive T cells in healthy donors and patients with COVID-19. Nature. 2020;587(7833):270–274. doi: 10.1038/s41586-020-2598-9
  • 5. Mbow M, Lell B, Jochems SP, et al.: COVID-19 in Africa: Dampening the storm? Science. 2020;369(6504):624–626. doi: 10.1126/science.abd3902
  • 6. Hale T, Webster S, Petherick A, et al.: Oxford covid-19 government response tracker. Blavatnik School of Government. 2020;25.
  • 7. Ma Y, Zhao Y, Liu J, et al.: Effects of temperature variation and humidity on the death of COVID-19 in Wuhan, China. Sci Total Environ. 2020;724:138226. doi: 10.1016/j.scitotenv.2020.138226
  • 8. Kenyan Ministry of Health: COVID-19 situation reports. 2021.
  • 9. Barasa E, Ouma P, Okiro E: Assessing the Hospital Surge Capacity of the Kenyan Health System in the Face of the COVID-19 Pandemic. medRxiv. 2020;1–24. doi: 10.1101/2020.04.08.20057984
  • 10. Macharia PM, Joseph NK, Okiro EA: A vulnerability index for COVID-19: spatial analysis to inform equitable response in Kenya. medRxiv. 2020;1–26. doi: 10.1101/2020.05.27.20113803
  • 11. Kenyan Ministry of Health: Press statement on the update of the coronvirus in the country and response measure. 2020;1–3.
  • 12. Roser M, Ritchie H, Ortiz-Ospina E, et al.: Coronavirus pandemic (COVID-19). Our World in Data. 2020.
  • 13. Uyoga S, Adetifa IMO, Karanja HK, et al.: Seroprevalence of anti-SARS-CoV-2 IgG antibodies in Kenyan blood donors. Science. 2021;371(6524):79–82. doi: 10.1126/science.abe1916
  • 14. Ward H, Atchison C, Whitaker M, et al.: Antibody prevalence for SARS-CoV-2 following the peak of the pandemic in England: REACT2 study in 100,000 adults. medRxiv. 2020;1–20. doi: 10.1101/2020.08.12.20173690
  • 15. Pollán MM, Pérez-Gómez B, Pastor-Barriuso R, et al.: Prevalence of SARS-CoV-2 in Spain (ENE-COVID): a nationwide, population-based seroepidemiological study. Lancet. 2020;396(10250):535–544. doi: 10.1016/S0140-6736(20)31483-5
  • 16. Havers FP, Reed C, Lim T, et al.: Seroprevalence of Antibodies to SARS-CoV-2 in 10 Sites in the United States, March 23-May 12, 2020. JAMA Intern Med. 2020;1–11. doi: 10.1001/jamainternmed.2020.4130
  • 17. Flaxman S, Mishra S, Gandy A, et al.: Estimating the effects of non-pharmaceutical interventions on COVID-19 in Europe. Nature. 2020;584(7820):257–261. doi: 10.1038/s41586-020-2405-7
  • 18. Gelman A, Hwang J, Vehtari A: Understanding predictive information criteria for Bayesian models. Stat Comput. 2014;24(6):997–1016. doi: 10.1007/s11222-013-9416-2
  • 19. Zhou F, Yu T, Du R, et al.: Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study. Lancet. 2020;395(10229):1054–1062. doi: 10.1016/S0140-6736(20)30566-3
  • 20. Majiya H, Aliyu-Paiko M, Balogu VT, et al.: Seroprevalence of COVID-19 in Niger State. medRxiv. 2020;1–24.
  • 21. Buss LF, Prete CA, Jr, Abrahim CMM, et al.: Three-quarters attack rate of SARS-CoV-2 in the Brazilian Amazon during a largely unmitigated epidemic. Science. 2021;371(6526):288–292. doi: 10.1126/science.abe9728
  • 22. Verity R, Okell LC, Dorigatti I, et al.: Estimates of the severity of coronavirus disease 2019: a model-based analysis. Lancet Infect Dis. 2020;20(6):669–677. doi: 10.1016/S1473-3099(20)30243-7
  • 23. KEMRI-Wellcome Trust Research Programme: Status of the COVID-19 Pandemic in Kenya: Evidence from serological and clinical surveillance, and predictive modelling. Technical report, 2020.
  • 24. Ong’ayo G, Ooko M, Wang’ondu R, et al.: Effect of strikes by health workers on mortality between 2010 and 2016 in Kilifi, Kenya: a population-based cohort analysis. Lancet Glob Health. 2019;7(7):e961–e967. doi: 10.1016/S2214-109X(19)30188-3
  • 25. Li M, Dushoff J, Bolker BM: Fitting mechanistic epidemic models to data: A comparison of simple Markov chain Monte Carlo approaches. Stat Methods Med Res. 2018;27(7):1956–1967. doi: 10.1177/0962280217747054
  • 26. Keeling MJ, Rohani P: Modeling Infectious Diseases in Humans and Animals. Princeton University Press, 2008. doi: 10.2307/j.ctvcm4gk0
  • 27. Anderson RM, May RM, Anderson B: Infectious Diseases of Humans: Dynamics and Control. Oxford University Press, Oxford, 1991.
  • 28. Aguas R, Gonçalves G, Ferreira MU, et al.: Herd immunity thresholds for SARS-CoV-2 estimated from unfolding epidemics. medRxiv. 2020;1–42. doi: 10.1101/2020.07.23.20160762
  • 29. Tkachenko AV, Maslov S, Elbanna A, et al.: Persistent heterogeneity not short-term overdispersion determines herd immunity to COVID-19. medRxiv. 2020;1–10.
  • 30. Ferretti L, Wymant C, Kendall M, et al.: Quantifying SARS-CoV-2 transmission suggests epidemic control with digital contact tracing. Science. 2020;368(6491):eabb6936. doi: 10.1126/science.abb6936
  • 31. Google LLC: Google COVID-19 Community Mobility Reports.
  • 32. Brand S, Ojal: ojal/KenyaSerology: First release (Version v1.0.0). Zenodo. 2021. doi: 10.5281/zenodo.4705244
  • 33. Betancourt M: A Conceptual Introduction to Hamiltonian Monte Carlo. arXiv.org. 2017.
  • 34. Dempster AP, Laird NM, Rubin DB: Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Series B. 1977;39(1):1–38.
  • 35. Kingma DP, Ba J: Adam: A method for stochastic optimization. CoRR abs/1412.6980. 2014. doi: 10.48550/arXiv.1412.6980
Wellcome Open Res. 2022 Sep 23. doi: 10.21956/wellcomeopenres.20370.r52507

Reviewer response for version 3

Benjamin J Cowling 1

No further comments.

Is the work clearly and accurately presented and does it cite the current literature?

Yes

If applicable, is the statistical analysis and its interpretation appropriate?

Yes

Are all the source data underlying the results available to ensure full reproducibility?

Yes

Is the study design appropriate and is the work technically sound?

Yes

Are the conclusions drawn adequately supported by the results?

Yes

Are sufficient details of methods and analysis provided to allow replication by others?

Yes

Reviewer Expertise:

Infectious disease epidemiology

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Wellcome Open Res. 2022 Jun 27. doi: 10.21956/wellcomeopenres.19588.r51227

Reviewer response for version 2

Benjamin J Cowling 1

This is a nice study. The authors used national surveillance PCR test data, serological data and mobility data to develop and fit a county-specific transmission model for Kenya up to the end of September 2020, which encompasses the first wave of SARS-CoV-2 transmission in the country. Authors estimated that the first wave of the SARS-CoV-2 pandemic peaked before the end of July 2020 in the major urban counties, with 30-50% of residents infected.

This is an important study and likely has implications for other neighbouring countries in Africa as well. Data on COVID-19 from the African continent are very limited. I encourage indexing of this revised submission and I just had two minor comments:

  • Abstract final phrase - "further wave of transmission" do you mean specifically a further wave of transmission with an antigenically-different strain such as a new variant which can escape the population immunity that has built up? Or you mean that the first wave was controlled before population immunity reached a herd immunity threshold? Even so, if the same strain circulates again, one wouldn't expect a large wave because of the existing immunity from the first wave. Waning immunity in medium-term could also play a role in allowing subsequent epidemics.

  • Conclusions - I don't find the Manaus estimate of 75% particularly compelling due to methodological issues in that study. There should be other locations with less extreme first-year serological data?

Is the work clearly and accurately presented and does it cite the current literature?

Yes

If applicable, is the statistical analysis and its interpretation appropriate?

Yes

Are all the source data underlying the results available to ensure full reproducibility?

Yes

Is the study design appropriate and is the work technically sound?

Yes

Are the conclusions drawn adequately supported by the results?

Yes

Are sufficient details of methods and analysis provided to allow replication by others?

Yes

Reviewer Expertise:

Infectious disease epidemiology

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Wellcome Open Res. 2022 Sep 5.
James Nokes 1

We thank the reviewer for useful review and comments.

Point 1 “Abstract final phrase - "further wave of transmission" do you mean specifically a further wave of transmission with an antigenically-different strain such as a new variant which can escape the population immunity that has built up? Or you mean that the first wave was controlled before population immunity reached a herd immunity threshold? Even so, if the same strain circulates again, one wouldn't expect a large wave because of the existing immunity from the first wave. Waning immunity in medium-term could also play a role in allowing subsequent epidemics.”

We were not, at this early stage in the pandemic, suggesting a further wave from a new variant. Instead, we inferred some heterogeneity in population susceptibility, transmissibility or social interactivity, encapsulated by the phenomenological term, the effective population size at risk of exposure (Peff). This population heterogeneity put a brake on virus spread in the first wave but made possible a second wave that moved into less infected sections of the population. This was unmeasured and not well understood at the time. However, in our subsequent paper (DOI: 10.1126/science.abk0414) we were able to explicitly account for this heterogeneity as differences in mobility of lower (high transmission in wave 1) and higher (low transmission in wave 1) socio-economic classes, particularly in the urban setting.

No change has been made to the manuscript.

Point  2. "Conclusions - I don't find the Manaus estimate of 75% particularly compelling due to methodological issues in that study. There should be other locations with less extreme first-year serological data?"

We agree with the reviewer and revise the section of text referring to the estimate of 75% (see appended). We do already make other comparisons from serosurveys from that period from other settings including Spain, England, United States and Niger. 

The revised text is 'Our analysis suggests that 30–50% of the urban population were already exposed by the end of September, and that the first wave of the Kenya epidemic peaked in the urban and semi-urban counties during a period of relatively little restrictions or physical distancing. This level of exposure however was not sufficient to prevent a second wave which came shortly after the first (October to December 2020), which we assume to have resulted from heterogeneous spread of the virus, perhaps due to variation in population susceptibility, transmissibility or social interactivity.'

Wellcome Open Res. 2021 Nov 1. doi: 10.21956/wellcomeopenres.18470.r46411

Reviewer response for version 1

Amy Wesolowski 1

In ‘Revealing the extent of the first wave of the COVID-19 pandemic in Kenya based on serological and PCR-test data’ the authors use a range of statistical and mechanistic approaches to investigate the first wave of the pandemic in Kenya using both serological and PCR-test data. Overall, the authors have done an excellent job at integrating appropriate methods with available data sets to provide a useful investigation into transmission patterns in Kenya. Given the dearth of analyses from Sub-Saharan Africa countries, this manuscript is a welcome addition. Overall, I have few comments, the majority of which are minor and outlined below.

The authors have analyzed transmission patterns across all counties in Kenya, yet only show those for the two most populous (Nairobi and Mombasa) and a country-level result. While this is reasonable for the main text, I believe that adding in results by county to the supplementary information would help the interpretation of the country-level results. Particularly given likely differences in testing and reporting by county, a better understanding of the overall estimates of model predictions and Rt estimates by county would be incredibly valuable. For example, the authors provide the percentage infected by county in Figure 3, however these values are difficult to interpret in context, particularly without seeing the data and estimates by county.

In Figure 1 (C,D) the model prediction CIs are incredibly narrow, which seems surprising. Does this occur across the country? Or is this mainly due to the higher quality data in both Nairobi and Mombasa?

In Figure 2, it would be helpful (perhaps in the supplement or in this main figure) to provide context on when restrictions were lifted in addition to when they were put in place. Further, it is interesting that the IQR for the Rt estimates early in the pandemic seem exceptionally narrow. Additional elaborations on these points (is it likely due to overfitting? Some factors associated with the model fitting? Etc.) should be included. In addition, the Rt estimates for Mombasa in August/September seem substantially different than the rest of the country. Can the authors provide additional context? And do they see similar patterns across the coastal counties during this time?

The authors do an excellent job appropriately combining different data sets, which is well explained in the supplementary information. Some of these details would be incredibly helpful to move to the main text, in particular additional detail on how the authors treat the serological versus PCR data (and when there were both negative and positive PCR results) and how these two pieces of evidence are integrated.

Finally, in the supplement the authors use the phrase ‘Chinese epidemic’, but a small point that it may be more appropriate to say ‘epidemic in China’.

Is the work clearly and accurately presented and does it cite the current literature?

Yes

If applicable, is the statistical analysis and its interpretation appropriate?

Yes

Are all the source data underlying the results available to ensure full reproducibility?

Yes

Is the study design appropriate and is the work technically sound?

Yes

Are the conclusions drawn adequately supported by the results?

Yes

Are sufficient details of methods and analysis provided to allow replication by others?

Yes

Reviewer Expertise:

I am an infectious disease epidemiologist focused on transmission modeling.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Wellcome Open Res. 2022 Feb 3.
James Nokes 1

In ‘Revealing the extent of the first wave of the COVID-19 pandemic in Kenya based on serological and PCR-test data’ the authors use a range of statistical and mechanistic approaches to investigate the first wave of the pandemic in Kenya using both serological and PCR-test data. Overall, the authors have done an excellent job at integrating appropriate methods with available data sets to provide a useful investigation into transmission patterns in Kenya. Given the dearth of analyses from Sub-Saharan Africa countries, this manuscript is a welcome addition. Overall, I have few comments, the majority of which are minor and outlined below.

Thank you for your kind comments on this piece of research.

The authors have analyzed transmission patterns across all counties in Kenya, yet only show those for the two most populous (Nairobi and Mombasa) and a country-level result. While this is reasonable for the main text, I believe that adding in results by county to the supplementary information would help the interpretation of the country-level results. Particularly given likely differences in testing and reporting by county, a better understanding of the overall estimates of model predictions and Rt estimates by county would be incredibly valuable. For example, the authors provide the percentage infected by county in Figure 3, however these values are difficult to interpret in context, particularly without seeing the data and estimates by county.

This is a very good point. In our original analysis we plotted model prediction intervals against actual data by county as part of model diagnostic. We have now improved our diagnostic visualisations to match the format of the main manuscript plots for Nairobi and Mombasa, and generated county-specific plots for model-based prediction/credible intervals for (i) PCR swab test positives, (ii) population exposure, (iii) deaths, (iv) R(t) against data (where available). All 188 plots (4 x 47 counties) are available in the data and code repository associated with this paper https://github.com/ojal/KenyaSerology .

In Figure 1 (C,D) the model prediction CIs are incredibly narrow, which seems surprising. Does this occur across the country? Or is this mainly due to the higher quality data in both Nairobi and Mombasa?

This is correct: the reasonably high model certainty about seroprevalence in Nairobi and Mombasa was because of higher data quality in the main cities in Kenya. Other counties had much wider CIs for model-predicted seroprevalence. In Figure 3 we attempted to visualise this by candy-striping the county shading for counties with a posterior standard deviation in model prediction of population exposure of greater than 10%; that is, the counties where a >10% deviation from the posterior mean estimate of population exposure would not be highly unexpected. We have now added population exposure plots for every county, including credible intervals for the population seropositivity.

In Figure 2, it would be helpful (perhaps in the supplement or in this main figure) to provide context on when restrictions were lifted in addition to when they were put in place.

Over the time scale this paper is concerned with (February – October 2020) there were only two significant relaxations. However, these included lifting the movement restrictions on travel out of Nairobi and Mombasa (6th July 2020), and it was therefore an oversight not to include them in Figure 2. We have now revised Figure 2 to include the timing of the relaxation of targeted movement restrictions and the reopening of international flights into Kenya (1st August 2020); both are also mentioned in the caption of Figure 2.

Further, it is interesting that the IQR for the Rt estimates early in the pandemic seem exceptionally narrow. Additional elaborations on these points (is it likely due to overfitting? Some factors associated with the model fitting? Etc.) should be included.

The early tight estimate for Rt reflected (i) fairly tight estimates that Rt ~ 1.1-1.2 in most counties in late April 2020, and (ii) our assumption that for the first 40 days of the simulation (20th Feb 2020 – 31st March 2020) Rt was proportional to Google data derived estimates of mixing in indoor settings outside the home. In the early stages of the epidemic in Kenya there was very limited testing capacity for detection of SARS-CoV-2 transmission in the community (first positive test result was on 12th March 2020). This means that we were forced to make an assumption about the effective relative contact rates, because we could not infer them from epidemiological data. The Google data suggested a fairly uniform 40-45% decrease in mixing in inside settings (e.g., the workplace, etc) outside the home by mid-April 2020, by which time Rt ~ 1.1 - 1.2 in most counties. Because of our modelling assumption, this confidence in Rt in April-May 2020 was translated into confidence in Rt ~ 1.8 – 2.0 during an unobserved epidemic in February 2020.

We have added this sentence to the opening paragraph of the Underlying transmission rates in Mombasa and Nairobi during the first wave section:

“However, we should note that there was very limited PCR testing available in Kenya before April 2020, and our estimates of R(t) pre-April 2020 rely on the assumption that R(t) dropped by ~45% in late March, in parallel to the drop in mobility data (see Methods and supporting data).”

In addition, the Rt estimates for Mombasa in August/September seem substantially different than the rest of the country. Can the authors provide additional context? And do they see similar patterns across the coastal counties during this time?

This is an interesting observation. Upon inspection, other counties in coastal province (Kilifi, Lamu, Tana River and Taita Taveta, but not the Tanzanian border county Kwale) also have a spike in Rt estimates in late August/early September 2020 (see county specific plots). A possibility is that this is connected to the relaxation of movement into Mombasa, however, there is a delay of greater than 4 weeks between that relaxation and the Rt increase.

The authors do an excellent job appropriately combining different data sets, which is well explained in the supplementary information. Some of these details would be incredibly helpful to move to the main text, in particular additional detail on how the authors treat the serological versus PCR data (and when there were both negative and positive PCR results) and how these two pieces of evidence are integrated.

Thank you very much, the aim was not to overwhelm a non-specialist audience, whilst providing full details within the supporting information. We have now added a further paragraph in the Methods section which we hope makes our methodology clearer to the reader.

“The underlying transmission prediction depended only on parameters relevant to infection (e.g. basic pre-measures reproductive ratio etc), however, the statistical modelling of the observation of evidence of these infections varied by type of test and availability of negative PCR test data. Together these form a likelihood function, which integrates the different data sources, since they are all, ultimately, generated by the same underlying infection process. The three statistical models of observation data were:

  • Serological tests : On each day that serological samples were collected, the log-probability of the observed number of positive tests (ln f sero (( ObsS +) n |( S +) n , θ OM )) was assumed to be that of a Beta-Binomial distribution with unbiased sampling of the underlying proportion of serologically detectable people in the county ( ( S +) n /N). The extra dispersion compared to a Binomial sample being due to uncertainty in the underlying sensitivity of the serological assay (see supporting information in supporting data).

  • PCR swab positive tests when no data on negative PCR tests was available : Negative PCR swab tests were not available in every county on every day of simulation. When negative swab tests were not available we assumed that the log-probability of the daily observed PCR test positives was from a Negative-binomial distribution:

    $$\mu_n = p_{test}\,TR(n)\,(P^+)_n, \qquad (ObsP^+)_n \sim \mathrm{NegBin}\big(\hat{\mu} = \mu_n,\ \hat{\alpha} = \alpha\big). \tag{5}$$

Where the mean number of daily observed test positives, conditional on the model prediction of PCR-detectable people in the population, is based on sampling a fraction p test TR(n). p test was an observation parameter that was jointly inferred during inference, and TR(n) was a normalized testing rate based on nationally reported data (see supporting information in supporting data). α was a clustering factor for negative-binomial sampling, jointly inferred with other model parameters.

  • PCR swab positive tests when data on negative PCR tests was available : When both positive and negative PCR test data was available, we assumed that the fraction of positive samples reflected a biased observation of the underlying true fraction of PCR-detectable individuals in the population, e.g. being infected with SARS-CoV-2 could be expected to influence the odds of someone seeking a PCR test. We assumed that the daily detection of PCR test positives could be modelled as samples from a Beta-Binomial distribution with two parameters to infer: 1) The bias of a PCR-detectable individual being PCR tested compared to a PCR-undetectable individual ( χ), and, 2) the effective sample size parameter ( M PCR ). M PCR → ∞ recovered a Binomial distribution for the number of positive PCR tests were observed among the tests conducted that day, M PCR  < ∞ allowed the model to infer much greater variance in daily proportion of test positives than would be expected from a Binomial distribution. On days where negative swab tests were available, we connect the observable status of epidemic to the data thus,

    p_n = \frac{\chi (P_+)_n}{(\chi - 1)(P_+)_n + N}

    (ObsP_+) \sim \mathrm{BetaBin}(N = N_{PCR,n}, \ p = p_n, \ M = M_{PCR})    (6)

Here N_PCR,n is the total number of PCR swab samples collected on day n and p_n is the proportion of tests that the model expects to return positive, accounting for bias in the sampling regime. The bias parameter χ=1 recovers an unbiased sample of PCR positives from the underlying population.”
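The two observation models quoted above (equations 5 and 6) can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation. It makes the following assumptions: the negative binomial is parameterised so that its variance is μ + αμ², the Beta-Binomial uses shape parameters p·M and (1 − p)·M, and N in equation 6 is the population size. All function and argument names are hypothetical.

```python
from math import exp, lgamma, log


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def negbin_logpmf(k, mu, alpha):
    """Negative binomial log-pmf with mean mu.

    Assumed parameterisation: variance = mu + alpha * mu**2,
    i.e. size r = 1 / alpha. Requires mu > 0 and alpha > 0.
    """
    r = 1.0 / alpha
    return (lgamma(k + r) - lgamma(r) - lgamma(k + 1)
            + r * log(r / (r + mu)) + k * log(mu / (r + mu)))


def betabin_logpmf(k, n, p, m):
    """Beta-Binomial log-pmf for k positives out of n tests.

    Assumed parameterisation: shape parameters a = p * m and
    b = (1 - p) * m, so that m -> infinity approaches Binomial(n, p).
    """
    a, b = p * m, (1.0 - p) * m
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_choose + log_beta(k + a, n - k + b) - log_beta(a, b)


def expected_positive_fraction(pcr_detectable, population, chi):
    """Equation 6, first line: biased expected proportion of positive tests."""
    return chi * pcr_detectable / ((chi - 1.0) * pcr_detectable + population)


def loglik_positives_only(obs_pos, pcr_detectable, p_test, testing_rate, alpha):
    """Equation 5: used on days when only positive tests were reported."""
    mu = p_test * testing_rate * pcr_detectable
    return negbin_logpmf(obs_pos, mu, alpha)


def loglik_with_negatives(obs_pos, n_tests, pcr_detectable, population,
                          chi, m_pcr):
    """Equation 6: used on days when negative tests were also reported."""
    p_n = expected_positive_fraction(pcr_detectable, population, chi)
    return betabin_logpmf(obs_pos, n_tests, p_n, m_pcr)


# Hypothetical numbers, for illustration only (not taken from the study data).
ll_day_a = loglik_positives_only(obs_pos=12, pcr_detectable=4000.0,
                                 p_test=0.002, testing_rate=1.5, alpha=0.5)
ll_day_b = loglik_with_negatives(obs_pos=12, n_tests=150,
                                 pcr_detectable=4000.0, population=200000.0,
                                 chi=3.0, m_pcr=30.0)
total_loglik = ll_day_a + ll_day_b
```

In this sketch, setting chi to 1 makes expected_positive_fraction return the plain population fraction (P+)_n / N, which matches the statement that χ=1 recovers an unbiased sample.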

Finally, a small point: in the supplement the authors use the phrase ‘Chinese epidemic’, but it may be more appropriate to say ‘epidemic in China’.

Noted, and we have changed our language in the supporting information.

Wellcome Open Res. 2021 Jul 14. doi: 10.21956/wellcomeopenres.18470.r44753

Reviewer response for version 1

Mark Kimathi 1

The ideas presented in the paper are plausible, resulting in a model that, although a simple SEIR model, is robust and adequately reproduces the observed trajectory of infections.

It was captivating to see the authors express the force of infection in terms of the case data time series (which results from "successful" contact between susceptible and infected individuals). This is arguably the only way a compartmental model can reliably reproduce the waves of infections observed in data. Also, the notion that only a proportion of the population is at risk is quite realistic.

There are a few typos, as indicated below:

  • On page 3, column 2, in line 10 write as...declining, first in...

  • On page 3, column 1, in line 20 write as ...SARS-CoV-2.

  • On page 3, column 1, in third paragraph at line 5 write as ...already...

Is the work clearly and accurately presented and does it cite the current literature?

Yes

If applicable, is the statistical analysis and its interpretation appropriate?

Yes

Are all the source data underlying the results available to ensure full reproducibility?

Yes

Is the study design appropriate and is the work technically sound?

Yes

Are the conclusions drawn adequately supported by the results?

Yes

Are sufficient details of methods and analysis provided to allow replication by others?

Yes

Reviewer Expertise:

Mathematical modeling of infectious disease dynamics, fluid flows and other natural phenomena using differential equations.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Wellcome Open Res. 2022 Feb 3.
James Nokes 1

We thank the reviewer for these comments. The typographical errors noted have been corrected in the revised manuscript.

Associated Data


    Data Availability Statement

    Underlying data

    Zenodo: Revealing the extent of the first wave of the COVID-19 pandemic in Kenya based on serological and PCR-test data. https://doi.org/10.5281/zenodo.4705244 [32]. This project contains the following underlying data:

    • Data S4. (The number of positive and, where available, negative PCR-confirmed swab tests for each county by date of sample collection (21st Feb to 30th September)).

    • Data S5. (The number of positive and negative serological results for each county by date of sample collection (21st Feb to 6th August)). This is from the Kenyan Ministry of Health National linelist.

    • Data S6. (The number of deaths with a PCR-confirmed swab test for each county by recorded date of death (21st Feb to 30th September)).

    • Data S7. (Summary data of the Kenyan epidemic, including the reported total number of tests performed in Kenya).

    • supp material.docx (A more detailed description of the data)


    Articles from Wellcome Open Research are provided here courtesy of The Wellcome Trust
