PLOS Computational Biology. 2021 Mar 3;17(3):e1008811. doi: 10.1371/journal.pcbi.1008811

Bridging the gap: Using reservoir ecology and human serosurveys to estimate Lassa virus spillover in West Africa

Andrew J Basinski 1,*, Elisabeth Fichet-Calvet 2, Anna R Sjodin 3, Tanner J Varrelman 4, Christopher H Remien 1, Nathan C Layman 3, Brian H Bird 5, David J Wolking 5, Corina Monagin 5, Bruno M Ghersi 5, Peter A Barry 6, Michael A Jarvis 7, Paul E Gessler 8, Scott L Nuismer 3
Editor: Amy Wesolowski 9
PMCID: PMC7959400  PMID: 33657095

Abstract

Forecasting the risk of pathogen spillover from reservoir populations of wild or domestic animals is essential for the effective deployment of interventions such as wildlife vaccination or culling. Due to the sporadic nature of spillover events and limited availability of data, developing and validating robust, spatially explicit, predictions is challenging. Recent efforts have begun to make progress in this direction by capitalizing on machine learning methodologies. An important weakness of existing approaches, however, is that they generally rely on combining human and reservoir infection data during the training process and thus conflate risk attributable to the prevalence of the pathogen in the reservoir population with the risk attributed to the realized rate of spillover into the human population. Because effective planning of interventions requires that these components of risk be disentangled, we developed a multi-layer machine learning framework that separates these processes. Our approach begins by training models to predict the geographic range of the primary reservoir and the subset of this range in which the pathogen occurs. The spillover risk predicted by the product of these reservoir specific models is then fit to data on realized patterns of historical spillover into the human population. The result is a geographically specific spillover risk forecast that can be easily decomposed and used to guide effective intervention. Applying our method to Lassa virus, a zoonotic pathogen that regularly spills over into the human population across West Africa, results in a model that explains a modest but statistically significant portion of geographic variation in historical patterns of spillover. When combined with a mechanistic mathematical model of infection dynamics, our spillover risk model predicts that 897,700 humans are infected by Lassa virus each year across West Africa, with Nigeria accounting for more than half of these human infections.

Author summary

The 2019 emergence of SARS-CoV-2 is a grim reminder of the threat animal-borne pathogens pose to human health. Even prior to SARS-CoV-2, the spillover of pathogens from animal reservoirs was a persistent problem, with pathogens such as Ebola, Nipah, and Lassa regularly but unpredictably causing outbreaks. Machine-learning models that anticipate when and where pathogen transmission from animals to humans is likely to occur would help guide surveillance efforts and preemptive countermeasures like information campaigns or vaccination programs. We develop a novel machine learning framework that uses datasets describing the distribution of a virus within its host and the range of its animal host, along with data on spatial patterns of human immunity, to infer rates of animal-to-human transmission across a region. By training the model on data from the animal host alone, our framework allows rigorous validation of spillover predictions using human data. We apply our framework to Lassa fever, a viral disease of West Africa that is spread to humans by rodents, and use the predictions to update estimates of Lassa virus infections in humans. Our results suggest that Nigeria is most at risk for the emergence of Lassa virus, and should be prioritized for outbreak-surveillance.

Introduction

Emerging infectious diseases (EIDs) pose a persistent threat to public health. Approximately 60% of EIDs are caused by pathogens that normally circulate in wild or domestic animal reservoirs (i.e., zoonotic pathogens) [1]. Prior to full scale emergence, interaction between humans and wildlife creates opportunities for the occasional transfer, or spillover, of the zoonotic pathogen into human populations [2]. These initial spillover infections, in turn, represent newly established pathogen populations in human hosts that are subject to evolutionary pressures and may potentially lead to increased transmission among humans [2, 3]. Consequently, a key step in preempting the threat of EIDs is careful monitoring of when and where spillover into the human population occurs. However, because the majority of EIDs from wildlife originate in low and middle income regions with limited disease surveillance, accurately estimating the rate and geographical range of pathogen spillover, and therefore the risk of new EIDs, is a major challenge [1].

Machine learning techniques have shown promise at predicting the geographical range of spillover risk for several zoonotic diseases including Lassa fever [4–6], Ebola [7, 8], and Leishmaniases [9]. Generally, these models are trained to associate environmental features with the presence or absence of case reports in humans or the associated reservoir. Once inferred from the training process, the learned relationships between disease presence and the environment can be extended across a region of interest. Using these techniques, previous studies of Lassa fever (LF) have derived risk maps that assess the likelihood of human LF cases being present in different regions of West Africa [4, 5]. However, because these forecasts combine case-reports from both rodents and humans in the training process, they conflate attributes of the human and reservoir populations that increase spillover risk. Consequently, these approaches shed little light on aspects of reservoir or human populations that determine the magnitude of spillover at a location and thus miss opportunities to identify effective interventions.

We develop a multi-layer machine learning framework that accounts for the different ways in which data from a wildlife reservoir and data from human serosurveys inform spillover risk in people, and that rigorously assesses whether predicted risk quantifies the rate of new infections in humans. Our approach uses machine learning algorithms that, when trained on data from the wildlife reservoir alone, estimate the likelihood that the reservoir and the zoonotic pathogen are present in an area. These predictions are then combined into a composite estimate of spillover risk to humans. Next, our framework uses estimates of human pathogen seroprevalence, as well as estimates of human population density, to translate the composite risk estimate into a prediction of the realized rate of zoonotic spillover into humans. Omitting human seroprevalence data from the training process of the risk-layer has several advantages. First, in the case of LF, due to modern transportation and the longevity of Lassa virus antibodies in humans, a general concern is that the reported location of individual cases of human disease or Lassa virus antibody detection is not the site at which the infection occurred [10–12]. If the dispersal ability of the reservoir is small, training the risk layer on reservoir infections alone helps the model avoid these biases when learning the spatial variation of spillover risk. Second, in our framework, human seroprevalence estimates provide an ultimate test of the risk layer’s ability to correlate with spatial variation in the cumulative human exposure to the pathogen. The seroprevalence data, in turn, stem from population-based surveys at a site and are therefore much less likely to be influenced by the movement of individuals.

We apply our framework to Lassa virus (formally Lassa mammarenavirus [LASV]), a bi-segmented, single-stranded ambisense RNA virus in the Arenaviridae family and the causative agent of LF in West Africa [11, 13]. Though LASV can transmit directly between humans and often does so in hospital settings [14], rodent-to-human transmission accounts for the majority of new LASV infections [11, 15]. Specifically, the multimammate rat Mastomys natalensis is believed to be responsible for most of the transmission into the human population, either through consumption of food contaminated by rodent feces and urine or through hunting and consumption of the rodent reservoir itself [16]. What remains largely unknown, however, is the extent to which spatial patterns of spillover are driven by spatial variation in the abundance of M. natalensis and viral prevalence within M. natalensis relative to spatial variation in other contributing factors such as human behavior, housing materials, or other rodent reservoirs. An additional unknown is the true magnitude of spillover into the human population outside of the few areas in Sierra Leone and Nigeria where hospitals with Lassa diagnostic capacity exist. As a consequence, most estimates for the magnitude of Lassa virus spillover rely on longitudinal serosurveys conducted in the 1980s in Sierra Leone [17], yielding estimates of between 100,000 and 300,000 LASV infections each year across West Africa. Here, we use our framework to fill these important gaps in our current understanding of Lassa virus spillover within West Africa.

Data and study region

We used online data repositories and literature sources to collect three types of data in West Africa spanning the time-range 1970–2017: 1) capture-locations of M. natalensis, as well as occurrence locations of non-Mastomys murids; 2) locations and outcomes of LASV surveys conducted in M. natalensis; and 3) locations and measured seroprevalence of human LASV serosurveys. The focal region from which our data originate, shown in Fig 1, was chosen as the intersection of West Africa and the International Union for Conservation of Nature (IUCN) range map for Mastomys natalensis [18, 19]. Though M. natalensis is widely distributed across all of Africa, the species consists of multiple clades that likely differ in their ability to serve as hosts to LASV [20, 21]. By limiting the spatial extent of the study region to West Africa we focus on the region occupied by the A-I clade of M. natalensis that is believed to transmit LASV [22]. Our M. natalensis capture data, as well as all of the LASV survey data, originate from within this region, thus providing a discrete bound on the area of Africa in which the learned relationships of the model apply. For these analyses, this study region was divided into 0.05° × 0.05° pixels (approximately 5 km by 5 km at the equator).

Fig 1. Map of the study region.

The dashed blue line indicates the study region from which rodent and human data originate. Dots indicate locations at which Lassa virus or arenavirus antibodies have been sampled in rodents or humans. Each rodent point shows the outcome of a serological or PCR test. Each human population point shows the location of a serosurvey.

The first two datasets generate response variables for the model layers that predict LASV risk. The human seroprevalence data are used to evaluate the combined LASV risk layer for its ability to predict LASV spillover in humans and are also used to calibrate the stage of the model that predicts human LASV spillover. Our full dataset and the script files used to fit the models are available in a github repository [23].

Mastomys natalensis presence data and background

We collected data on historical captures of M. natalensis from various sources. First, several sources were used to identify all countries of West Africa that contain M. natalensis [24–26]. Next, rodent and mammal databases, as well as literature sources, were cross-referenced to fill in details regarding the year of capture, latitude/longitude coordinates, and the method of identification for each location at which M. natalensis was documented [17, 20, 27–42]. Because M. natalensis is morphologically similar to other rodents in the study region (e.g., Mastomys erythroleucus), we only include those presences that have been confirmed with genetic methods or skull morphology. We found 167 locations with confirmed M. natalensis captures. All M. natalensis captures occurred in the time-range 1977–2017.

Fitting the model requires supplementing the presence-only data with background points, also called pseudo-absences [43, 44]. Background points serve as an estimate of the distribution of sampling effort for the organism being modeled [45]. We used background points chosen from locations where rodents in the family Muridae had been captured in West Africa from the Global Biodiversity Information Facility (GBIF) website [46]. We filtered the original dataset to reduce the likelihood of including M. natalensis rodents that were misidentified as M. erythroleucus and vice versa. Namely, we omit from the collection all murid occurrences that are within the genus Mastomys. In addition, to ensure that the GBIF captures are concurrent with captures of M. natalensis, we only retained captures that occurred in the time-frame of the M. natalensis captures. Finally, we only included records that are within the study region depicted in Fig 1 and that fall outside of any pixel that contains a documented M. natalensis. The resulting GBIF dataset spans the years 1977–2015.

These data were used to categorize the subset of the pixels that contained one or more captures into two exclusive categories: those in which at least one M. natalensis had been captured (termed presences), and those with only non-Mastomys rodent occurrences (termed pseudo-absences). In total, our dataset classified 155 unique pixels as capture-positive for M. natalensis, and 252 pixels as background (Table 1).
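The pixel classification described above can be sketched as follows. This is a minimal illustration, not the authors' script: the function names, the tuple layout of a capture record, and the choice of indexing pixels by flooring coordinates onto a 0.05° grid are all assumptions made for the example.

```python
import math
from collections import defaultdict

PIXEL_DEG = 0.05  # pixel width in degrees, matching the study grid


def pixel_index(lat, lon, size=PIXEL_DEG):
    """Return the (row, col) index of the grid pixel containing a coordinate.

    Flooring onto a regular grid is an assumed convention for this sketch.
    """
    return (math.floor(lat / size), math.floor(lon / size))


def classify_pixels(captures):
    """Label each pixel that contains at least one capture.

    captures: iterable of (lat, lon, is_m_natalensis) tuples (hypothetical layout).
    A pixel with any M. natalensis capture is a presence; a pixel with only
    non-Mastomys captures is a pseudo-absence.
    """
    has_mnat = defaultdict(bool)
    for lat, lon, is_mnat in captures:
        has_mnat[pixel_index(lat, lon)] |= bool(is_mnat)
    return {
        px: ("presence" if flag else "pseudo-absence")
        for px, flag in has_mnat.items()
    }


# Hypothetical coordinates: two captures share a pixel, one lies elsewhere.
example = [
    (8.12, -11.03, True),
    (8.13, -11.02, False),
    (7.51, -10.26, False),
]
labels = classify_pixels(example)
```

Because a single M. natalensis capture is enough to mark a pixel as a presence, non-Mastomys records in the same pixel never produce a pseudo-absence, which mirrors the filtering rule applied to the GBIF records.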

Table 1. Summary of rodent captures used in the reservoir layer.

| Country | Year | # Pseudoabsences | # Presences |
|---|---|---|---|
| Benin | 2001–2017 | 12 | 7 |
| Burkina Faso | 1977–2008 | 3 | 15 |
| Ghana | 1999–2011 | 13 | 9 |
| Guinea | 1996–2012 | 71 | 12 |
| Guinea-Bissau | 2013 | 1 | 0 |
| Ivory Coast | 1978–2010 | 21 | 8 |
| Liberia | 1980–2013 | 18 | 0 |
| Mali | 1979–2012 | 58 | 47 |
| Niger | 1977–2007 | 16 | 14 |
| Nigeria | 1977–2015 | 7 | 13 |
| Senegal | 1990–2005 | 0 | 13 |
| Sierra Leone | 1977–2014 | 31 | 17 |
| Togo | 1982 | 1 | 0 |
| Aggregate | 1977–2017 | 252 | 155 |

# Pseudoabsences shows the number of unique 0.05 × 0.05° pixels in the GBIF dataset for which only non-Mastomys rodents were captured. # Presences indicates the number of pixels in which one or more M. natalensis was captured.

Surveys of Mastomys natalensis for Lassa virus

We compiled a dataset that contains occurrences of LASV in rodents or humans. The dataset was established by an extensive review of LASV literature. Primary sources were found by PubMed and GenBank searches of the terms “Lassa”, “Lassa fever”, “Lassa virus”, “Lassa arenavirus”, and “Lassa mammarenavirus” [47]. Data from these primary sources was organized into an Excel workbook.

From the full LASV dataset, we collected published studies that sampled M. natalensis rodents for indicators of LASV. For each study, we found the sampling location for each tested rodent (either latitude/longitude or a locality name for which coordinates could be obtained). In total, we compiled thirteen rodent studies [17, 30, 34, 36, 39, 41, 42, 48–53] that tested M. natalensis for LASV and contained latitude/longitude coordinates. The resulting test locations originate from six countries and span the years 1972–2014.

Because the prevalence of LASV in rodents varies seasonally [54], and because of the sparsity of time-series data that might otherwise allow the average LASV prevalence in rodents to be estimated, we used the collected data to broadly classify pixels into the categories “Lassa positive” or “Lassa negative”. Specifically, a pixel was defined as Lassa positive if, at some point, a M. natalensis rodent was captured within the pixel, and the rodent tested positive for LASV using a RT-PCR assay or viral isolation. Because arenavirus antibodies cross-react, a positive LASV antibody test in an individual rodent only indicates past infection with an arenavirus, and not necessarily LASV. In an effort to reduce the frequency of false positives in the training data, pixels that only contain LASV seropositive tests of rodents, and no positive LASV viral detection, were not used as training data. These criteria led to the omission of eight pixels from the training data. Fitting the model with these eight pixels included as presences is an option in the code on the github repository, but does not substantially affect the overall fit of the model [23].

Although serosurveys of rodents cannot specifically show that LASV is present, they can indicate the absence of LASV (along with all other arenaviruses). Pixels were classified as Lassa negative if five or more M. natalensis rodents in total were tested for infection with LASV by RT-PCR, or tested for any previous arenavirus exposure using a serological assay, and all rodents tested were negative. We chose a threshold of five to help reduce the chance of including false negatives (i.e., sites that have LASV but in which only non-exposed rodents were captured). This procedure allowed us to classify 62 unique pixels in total: 27 were classified as Lassa negative, and 35 were classified as Lassa positive (Table 2 and Fig 1).
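The classification rules for the pathogen layer can be summarized in a short sketch. The function name, the method labels, and the representation of each tested rodent as a (method, result) pair are assumptions made for illustration; the rules themselves follow the text above.

```python
def classify_lassa_pixel(tests, min_negative=5):
    """Classify one pixel from the tests performed on its M. natalensis captures.

    tests: list of (method, positive) pairs, one per tested rodent, where
    method is "pcr", "isolation", or "serology" (hypothetical labels).
    Returns "positive", "negative", or None when the pixel is left out of
    the training data.
    """
    # A positive RT-PCR assay or viral isolation marks the pixel Lassa positive.
    if any(pos for method, pos in tests if method in ("pcr", "isolation")):
        return "positive"
    # Seropositive rodents without viral detection may reflect another
    # arenavirus, so such pixels are omitted rather than labeled.
    if any(pos for _, pos in tests):
        return None
    # All tests negative: require at least five tested rodents.
    if len(tests) >= min_negative:
        return "negative"
    return None


pcr_hit = classify_lassa_pixel([("pcr", True), ("serology", False)])
sero_only = classify_lassa_pixel([("serology", True)] + [("pcr", False)] * 5)
all_negative = classify_lassa_pixel([("serology", False)] * 5)
too_few = classify_lassa_pixel([("pcr", False)] * 4)
```

Raising `min_negative` trades a smaller set of negative pixels for a lower chance of labeling an undersampled LASV site as negative.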

Table 2. Summary of LASV positive and LASV negative pixels used in the pathogen layer.

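| Country | Year | # Pixels | # Neg. Pixels | # Pos. Pixels |
|---|---|---|---|---|
| Ghana | 2010–2011 | 7 | 7 | 0 |
| Guinea | 2003–2014 | 19 | 6 | 13 |
| Ivory Coast | 2003–2013 | 4 | 3 | 1 |
| Mali | 2004–2012 | 11 | 7 | 4 |
| Nigeria | 1972–2012 | 6 | 3 | 3 |
| Sierra Leone | 1972–2009 | 15 | 1 | 14 |
| Aggregate | 1972–2014 | 62 | 27 | 35 |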

Each row aggregates literature and GenBank data sources over a country. # Pos. Pixels indicates the number of unique pixels that had one or more LASV-infected rodents. # Neg. Pixels is the number of pixels in which five or more rodents were tested and found negative for LASV infection or antibody.

Human seroprevalence data

From our full LASV dataset described in the previous section, we collected literature sources that describe the prevalence of arenavirus antibodies in human populations of West Africa. As with the rodent LASV infection data, arenavirus antibodies are not specific to LASV. However, because human serosurveys were often conducted in LASV endemic areas or near documented locations of LASV-infected rodents, these serosurveys likely measured the fraction of humans with previous LASV infection, rather than exposure to another arenavirus. We required that each literature source include information on the diagnostic method that was used to test individuals (e.g., ELISA, IFA) and broad details of the survey design. We only included survey studies that were designed to estimate the seroprevalence in the local community population. This criterion excluded surveys of hospitals, for example, as well as surveys of missionaries.

Each datum contains latitude and longitude of the serosurvey, the number of individuals tested, and the number of individuals determined to have arenavirus antibodies. In total, we collected 94 serosurveys from seven studies (Fig 1) [17, 55–60]. These serosurveys were conducted between 1970 and 2015 and are located in five countries in West Africa (Table 3 and Fig 1).

Table 3. Summary of human arenavirus serosurveys used in the model.

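| Country | Year | # Sites | Method | # Tested | % Seropositive | Reference |
|---|---|---|---|---|---|---|
| Ghana | 2010–2011 | 10 | ELISA | 657 | 5 | [57] |
| Guinea | 2000 | 30 | IFA | 977 | 11 | [55] |
| Guinea | 1990–1993 | 28 | ELISA | 3276 | 23 | [56] |
| Liberia | 1980–1982 | 7 | IFA | 1848 | 5 | [59] |
| Mali | 2015 | 3 | ELISA | 600 | 33 | [58] |
| Sierra Leone | 1977–1983 | 14 | IFA | 5098 | 23 | [17] |
| Sierra Leone | 1970–1972 | 2 | CF | 255 | 6 | [60] |
| Aggregate | 1970–2015 | 94 | | 12,711 | 19 | |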

Each row is an individual literature source. For each study, # Sites shows the number of locations at which arenavirus surveys were performed, # Tested indicates the total number of individuals tested across sites, and % Seropositive shows the percentage of individuals that tested positive across all sites.

Predictors

We include predictors that are broadly hypothesized to influence the distributions of M. natalensis and LASV. M. natalensis is widely distributed across sub-Saharan Africa in savanna and shrubland environments. Within such environments, M. natalensis is commonly associated with small rural communities and is considered a serious agricultural pest [19, 54]. To allow the model to learn these relationships, we include MODIS land cover features and human population density within each pixel as predictors. We also include elevation in meters. Because climate seasonality and crop maturation affect the breeding season of M. natalensis, we include various measures of the seasonality of the vegetative index (NDVI), precipitation, and temperature [61]. See S1 Appendix for a complete list of environmental variables. LASV is often associated with M. natalensis, so we use the same set of predictors for the pathogen layer.

Methods

We developed a model that predicts the rate of LASV infection in humans within individual 0.05°x0.05° pixels across West Africa. An overview of the model framework is depicted in Fig 2. Outputs from the model are generated in two stages. The first stage uses environmental features to estimate different layers of LASV spillover risk. The layers of risk, in turn, are described by: 1) DM, a classification score indicating the likelihood that a pixel contains the primary rodent reservoir, M. natalensis, and 2) DL, a score indicating the likelihood that LASV circulates within the M. natalensis population, conditioned on the rodent being present. Depending on the layer, the response variable for this stage is generated from documented occurrences of M. natalensis (DM layer), or evidence of past LASV infection in M. natalensis (DL layer). These layers are used to define a composite layer of spillover risk DX, the product of DM and DL, that describes the likelihood that a pixel simultaneously contains M. natalensis and LASV. The second stage of our framework uses a generalized linear model to regress the estimates of human arenavirus seroprevalence onto the DX layer. Lastly, we used an epidemiological model to estimate human incidence from the predictions of seroprevalence.

Fig 2. Overview of the model.

Ellipses represent datasets, circles represent models, and rectangles represent model predictions.

LASV risk layers

Each risk layer of the first stage is generated by a separate boosted classification tree (BCT). The BCT, in turn, uses environmental features within a pixel to infer a classification score, between zero and one, that indicates how likely it is that the pixel is positive for M. natalensis (DM layer) or LASV in M. natalensis (DL layer). BCTs use a stage-wise learning algorithm that, at each stage, trains a new tree model to the residuals of the current model iteration. Each newly fitted tree is added to the ensemble model, thereby reducing the residual deviance between the model predictions and a training set [62]. Boosted trees are commonly used in species and disease distribution models because they are simultaneously resistant to over-fitting in scenarios where many feature variables are implemented and are also able to model complex interactions among features [63].

Prior to inclusion in the model-fitting procedure, each feature variable was vetted for its ability to distinguish between presences and absences in each of the layers. Specifically, for each risk layer’s binary response variable, we performed a Mann-Whitney U-test on each candidate feature. In doing so, we test the null hypothesis that the distribution of a feature is the same between pixels that are classified as a presence or (pseudo) absence. We only include predictors for which the null hypothesis is rejected at the α = 0.05 level.
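As an illustration of this screening step, the sketch below computes a two-sided Mann-Whitney U-test with the large-sample normal approximation (tie-corrected, without a continuity correction) and keeps only the features that pass at α = 0.05. It is a simplified stand-in written with the Python standard library; the function names and data layout are hypothetical, and the exact test variant used in the original analysis is not specified here.

```python
import math


def mann_whitney_p(x, y):
    """Two-sided p-value from the normal approximation to the U statistic."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the tied ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return 1.0  # all values tied: no evidence of a difference
    z = (u - mean_u) / math.sqrt(var_u)
    return math.erfc(abs(z) / math.sqrt(2.0))


def screen_features(features, labels, alpha=0.05):
    """Keep features whose distributions differ between presences and absences.

    features: dict mapping a feature name to one value per pixel.
    labels: list of 1 (presence) or 0 (absence or pseudo-absence) per pixel.
    """
    kept = []
    for name, values in features.items():
        pres = [v for v, lab in zip(values, labels) if lab == 1]
        absn = [v for v, lab in zip(values, labels) if lab == 0]
        if mann_whitney_p(pres, absn) < alpha:
            kept.append(name)
    return kept


# Hypothetical data: "elev" separates the classes, "noise" does not.
labels = [1] * 8 + [0] * 8
features = {
    "elev": [11, 12, 13, 14, 15, 16, 17, 18, 1, 2, 3, 4, 5, 6, 7, 8],
    "noise": [1, 4, 5, 8, 9, 12, 13, 16, 2, 3, 6, 7, 10, 11, 14, 15],
}
kept = screen_features(features, labels)
```

Note that screening each feature separately does not account for multiple comparisons; it only removes predictors that show no marginal association with the response.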

For a given training set, we fit the BCT model using the gbm.step function of the “dismo” package in the statistical language R [64]. This specific function uses 10-fold cross-validation to determine the number of successive trees that best model the relationship between response and features without over-fitting the data [64]. The learning rate parameter, which determines the weight given to each successive tree, was set to small values (DM: 10⁻², DL: 10⁻³) that encourage a final model that is composed of many small incremental improvements. A smaller learning rate was used in the DL layer because the corresponding dataset was smaller. The parameter that describes the maximum number of allowable trees was set to a large value (10⁷) to ensure that the cross-validation fitting process was able to add trees until no further improvement occurred [62].

For the DM layer, we trained 25 boosted classification trees to learn how environmental predictors influence the suitability of a habitat for M. natalensis. Each model was fit by selecting 155 presence pixels and pairing these with 155 background pixels in which only non-Mastomys murids were found. Both presences and background pixels were chosen with replacement. By choosing equal numbers of presences and background pixels for each training set, we encourage each model to learn patterns in features that allow presences to be discriminated from background pixels, rather than having the model learn the (likely biased) distribution of presences and background pixels that are available in the overall dataset [44].
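The balanced resampling scheme can be sketched as below. Pixel identifiers are represented by integers, and the function name and return layout are assumptions for the example; fitting the boosted trees themselves is outside the scope of the sketch.

```python
import random


def balanced_bootstrap(presences, background, rng):
    """Draw one balanced training set and return the unused (out-of-bag) pixels.

    Both classes are sampled with replacement, and the number of background
    draws is matched to the number of presence pixels.
    """
    n = len(presences)
    train_pres = [rng.choice(presences) for _ in range(n)]
    train_back = [rng.choice(background) for _ in range(n)]
    oob_pres = sorted(set(presences) - set(train_pres))
    oob_back = sorted(set(background) - set(train_back))
    return train_pres, train_back, oob_pres, oob_back


# Hypothetical pixel ids: 155 presences and 252 background pixels.
presence_ids = list(range(155))
background_ids = list(range(1000, 1252))
rng = random.Random(1)  # fixed seed so the example is reproducible

# One resample per fitted model; 25 models were trained for the layer.
resamples = [balanced_bootstrap(presence_ids, background_ids, rng) for _ in range(25)]
train_pres, train_back, oob_pres, oob_back = resamples[0]
```

Because sampling is with replacement, some presence pixels are drawn more than once and others not at all; the latter form the out-of-bag set that is available for testing.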

For each model fit for the DM layer, presence and pseudo-absence pixels that were not used to train the model (i.e., out-of-bag data) were used to test the model using the area under the receiver operating characteristic curve (AUC). The AUC measures a classifier’s ability to assign a high classification score to presences, and a low score to background pixels. A score of one indicates a perfect classifier, and a score of 0.5 indicates a classifier that is no better than chance. A pairwise-distance sampling scheme was used to pair an equal number of test-background pixels to the out-of-bag presences that together comprise the test set. Specifically, for each test presence point, the pairwise distance sampling method chooses a test background point so that the minimum spatial distance between the training presences and test presence is similar to the minimum distance between the test background point and training presences [65]. Compared to random selection of test background points, pairwise distance sampling often results in a lower AUC score that more accurately measures the model’s ability to generalize to new regions [65].
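The AUC has a direct probabilistic reading that a few lines of code make concrete: it is the fraction of (presence, background) pairs in which the presence receives the higher score, with ties counted as one half. The sketch below uses that pairwise definition; the function name is illustrative and the pairwise-distance selection of test background points is not reproduced here.

```python
def auc(presence_scores, background_scores):
    """Probability that a random presence outscores a random background pixel."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


perfect = auc([0.9, 0.8], [0.1, 0.2])   # every presence outscores every background
chance = auc([0.5], [0.5])              # a tie counts as half a win
partial = auc([0.9, 0.3], [0.6, 0.1])   # three of the four pairs are ordered correctly
```

The pairwise loop is quadratic in the number of test pixels, which is unproblematic for test sets of the size used here.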

The DL layer is generated by the averaged predictions of 25 boosted classification tree models, each of which is trained to discriminate between pixels that are Lassa positive or Lassa negative. We trained each model on a dataset comprised of 27 absence locations and 27 presence locations, sampled from the full dataset with replacement. The estimation of error in the DL layer is similar to that described in the DM layer. Specifically, we calculate the AUC of the fitted model on an equal number of out-of-bag presences and absences.

Next, we combined the DM and DL layers into a composite feature, denoted by DX, that is indicative of whether a pixel simultaneously has environmental features that are suitable for M. natalensis, as well as LASV in M. natalensis. The combined feature is defined as DX = DM × DL and summarizes the realized risk of LASV spillover to humans within the local environment.
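A minimal sketch of the composite layer follows. It assumes that each layer is summarized by averaging the per-pixel scores of its fitted models before the product is taken; the text states this averaging for the DL layer, and applying it to the DM layer as well is an assumption of the example. Function names and the nested-list layout are hypothetical.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def composite_risk(dm_predictions, dl_predictions):
    """Combine the reservoir and pathogen layers pixel by pixel.

    dm_predictions, dl_predictions: one list of per-pixel scores per fitted
    model, with pixels in the same order in every list.
    Returns the per-pixel product DX = DM * DL of the model-averaged scores.
    """
    d_m = [mean(scores) for scores in zip(*dm_predictions)]
    d_l = [mean(scores) for scores in zip(*dl_predictions)]
    return [m * l for m, l in zip(d_m, d_l)]


# Two hypothetical models per layer, scored on two pixels.
d_x = composite_risk(
    [[0.8, 0.2], [0.6, 0.4]],
    [[0.5, 0.5], [0.5, 0.1]],
)
```

Since both factors lie between zero and one, DX is high only where both the reservoir and the pathogen are predicted to be likely, which is the intended meaning of the composite layer.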

Connection to human seroprevalence and incidence

To connect the new risk parameter DX to human arenavirus seroprevalence, and to evaluate the ability of the DX layer to explain historical LASV spillover in humans, we regressed seroprevalence from human arenavirus serosurveys on the DX layer and an intercept. In doing so, we test whether human seroprevalence is significantly associated with the probability DX that a pixel contains LASV-infected M. natalensis. We used quasi-binomial regression to account for overdispersion in seroprevalence measurements that could otherwise contaminate hypothesis tests on model coefficients [66]. More details on the motivation behind the quasi-binomial regression can be found in the S1 Appendix. In the regression, each seroprevalence estimate is weighted by the number of individuals tested in the serosurvey.
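To make the regression step concrete, the sketch below fits a logistic model with an intercept and one predictor by iteratively reweighted least squares, weights each survey by the number tested, and estimates the quasi-binomial dispersion from the Pearson statistic. It is a standard-library illustration under stated assumptions (a logit link, a single predictor, hypothetical function names), not the code used in the original analysis.

```python
import math


def _irls_sums(x, k, n, b0, b1):
    """Weighted sums for one IRLS step, plus the Pearson chi-square statistic."""
    s = {"w": 0.0, "wx": 0.0, "wxx": 0.0, "wz": 0.0, "wxz": 0.0, "pearson": 0.0}
    for xi, ki, ni in zip(x, k, n):
        eta = b0 + b1 * xi
        p = 1.0 / (1.0 + math.exp(-eta))
        var = p * (1.0 - p)
        w = ni * var                       # binomial working weight
        z = eta + (ki / ni - p) / var      # working response
        s["w"] += w
        s["wx"] += w * xi
        s["wxx"] += w * xi * xi
        s["wz"] += w * z
        s["wxz"] += w * xi * z
        s["pearson"] += ni * (ki / ni - p) ** 2 / var
    return s


def fit_quasibinomial(x, k, n, iters=50):
    """Return (intercept, slope, dispersion, se_intercept, se_slope).

    x: predictor per survey (e.g., the DX score of the survey pixel);
    k: number seropositive; n: number tested.
    """
    b0 = b1 = 0.0
    for _ in range(iters):
        s = _irls_sums(x, k, n, b0, b1)
        det = s["w"] * s["wxx"] - s["wx"] ** 2
        b0, b1 = (
            (s["wxx"] * s["wz"] - s["wx"] * s["wxz"]) / det,
            (s["w"] * s["wxz"] - s["wx"] * s["wz"]) / det,
        )
    s = _irls_sums(x, k, n, b0, b1)
    det = s["w"] * s["wxx"] - s["wx"] ** 2
    phi = s["pearson"] / (len(x) - 2)      # dispersion; 1 for a pure binomial
    se0 = math.sqrt(phi * s["wxx"] / det)  # standard errors scaled by sqrt(phi)
    se1 = math.sqrt(phi * s["w"] / det)
    return b0, b1, phi, se0, se1


# Synthetic check: counts generated exactly from logit(p) = -2 + 3 x.
xs = [0.05, 0.2, 0.4, 0.6, 0.8]
ns = [50, 120, 80, 200, 60]
ks = [ni / (1.0 + math.exp(2.0 - 3.0 * xi)) for xi, ni in zip(xs, ns)]
b0_hat, b1_hat, phi_hat, se0_hat, se1_hat = fit_quasibinomial(xs, ks, ns)
```

The point estimates are identical to those of an ordinary binomial fit; the quasi-binomial treatment changes only the standard errors, which are inflated by the square root of the estimated dispersion when the data are overdispersed.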

Next, we used an epidemiological model, based on the classic susceptible-infected-recovered framework, to derive an equation that relates a given LASV spillover rate into humans and the resulting seroprevalence in a human population. Throughout, we assume that the seroprevalence measures that were obtained from historical serosurveys describe LASV infection at steady state (i.e., are unchanging in time). This derivation, in turn, is used to translate the regression model’s predictions of LASV seroprevalence into spillover infections per year in humans. For the model, we employ several assumptions:

1) Humans within each 0.05° × 0.05° pixel constitute a closed population with constant per-capita death rate d. To facilitate steady-state analysis, we assume that new individuals are born in the pixel at a density-independent birth rate b. Within each pixel, humans are compartmentalized into three non-overlapping classes: susceptible (S), infected with LASV (I), and recovered from LASV infection (R). The size of the human population is assumed to be large enough so that stochastic events (LASV extinction) do not occur.

2) All LASV infections in humans are caused by contact with infectious rodents. Though human-to-human transmission of LASV is common in nosocomial outbreaks, rodent-to-human transmission is believed to be the primary pathway by which the virus is spread outside of hospital environments [15].

3) Susceptible humans become infected with LASV at a constant rate FS, where F denotes the rate of infectious contact between a human and infected M. natalensis (i.e., the force of infection). Any seasonal fluctuation in the contact rate between humans and rodents, as well as fluctuation in the prevalence of LASV infection in rodents, is assumed to average out over the decades-long timescales we consider. LASV-infected humans transition out of the infected class at per-capita rate γ; a fraction μ die from illness associated with Lassa fever.

4) The remaining fraction 1 − μ of individuals recover from infection and gain immunity from LASV.

The duration of LASV immunity in humans is not fully understood. Studies suggest that LASV immunity is the result of a combination of antibodies and a cell-mediated immune response [17, 67]. Anecdotal cases have shown that LASV IgG antibodies can remain in the blood of individuals for decades [10]. However, other studies have indicated that the level of LASV antibodies, as well as the extent to which an individual is protected against subsequent LASV infection, can wane with time [17, 67]. Preliminary analyses indicated that the possibility of waning immunity substantially influenced our model’s estimates of LASV infections per year. Because of this uncertainty, we model the general scenario in which recovered individuals lose immunity to LASV at per-capita rate λ and transition back into the susceptible class. This more general model structure includes the scenario of lifelong immunity in the case that λ = 0.

Within each pixel across West Africa, the assumptions above lead to a system of equations that describes the number of humans in each of the classes:

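$$
\begin{aligned}
\frac{dS}{dt} &= b - dS - FS + \lambda R, \\
\frac{dI}{dt} &= FS - dI - \gamma I, \\
\frac{dR}{dt} &= \gamma(1-\mu) I - dR - \lambda R.
\end{aligned}
\tag{1}
$$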

We assume that, within each pixel, the dynamical system given by System (1) is at steady state. Consequently, the net rate of mortality is equal to the constant birth rate b, and none of the classes S, I, and R changes with time. The corresponding steady-state values are found by setting the left-hand side of Eq (1) to zero and solving the resulting algebraic equations for each state variable. This yields the steady-state values

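$$
\begin{aligned}
S^* &= \frac{b(\gamma+d)(d+\lambda)}{d\lambda(\gamma+d+F)+d(\gamma+d)(d+F)+\gamma F\lambda\mu},\\
I^* &= \frac{bF(d+\lambda)}{d\lambda(\gamma+d+F)+d(\gamma+d)(d+F)+\gamma F\lambda\mu},\\
R^* &= \frac{b\gamma F(1-\mu)}{d\lambda(\gamma+d+F)+d(\gamma+d)(d+F)+\gamma F\lambda\mu}.
\end{aligned}
\tag{2}
$$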
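As a numerical sanity check on Eq (2), the sketch below integrates System (1) forward in time and compares the long-run state with the closed-form steady state. It is not taken from the authors' Mathematica notebook. The birth rate `b`, the force of infection `F`, the 60-year lifespan behind `d`, and the initial condition are made-up illustrative values, and the fixed-step Runge-Kutta integrator is simply a convenient choice.

```python
def steady_state(b, d, gamma, mu, lam, F):
    """Closed-form steady state (S*, I*, R*) from Eq (2)."""
    denom = d * lam * (gamma + d + F) + d * (gamma + d) * (d + F) + gamma * F * lam * mu
    return (b * (gamma + d) * (d + lam) / denom,
            b * F * (d + lam) / denom,
            b * gamma * F * (1 - mu) / denom)


def rhs(state, b, d, gamma, mu, lam, F):
    """Right-hand side of System (1)."""
    S, I, R = state
    return (b - d * S - F * S + lam * R,
            F * S - d * I - gamma * I,
            gamma * (1 - mu) * I - d * R - lam * R)


def integrate(state, params, dt=0.05, years=2000.0):
    """Advance System (1) with fixed-step classical Runge-Kutta (RK4)."""
    def shifted(base, slope, h):
        return tuple(x + h * k for x, k in zip(base, slope))

    for _ in range(int(round(years / dt))):
        k1 = rhs(state, *params)
        k2 = rhs(shifted(state, k1, dt / 2), *params)
        k3 = rhs(shifted(state, k2, dt / 2), *params)
        k4 = rhs(shifted(state, k3, dt), *params)
        state = tuple(x + dt / 6 * (a + 2 * m1 + 2 * m2 + e)
                      for x, a, m1, m2, e in zip(state, k1, k2, k3, k4))
    return state


# Order: (b, d, gamma, mu, lam, F).
# Hypothetical: b = 100 births/yr, d = 1/60 per yr, F = 0.003 per yr.
# From the text: gamma = 12 per yr, mu = 0.02, lam = 0.064 per yr.
params = (100.0, 1 / 60, 12.0, 0.02, 0.064, 0.003)
exact = steady_state(*params)
numeric = integrate((6000.0, 0.0, 0.0), params)  # start disease-free at b/d
print("closed form:", exact)
print("integrated :", numeric)
```

Because F is treated as a constant, System (1) is linear in S, I, and R, so the trajectory relaxes to a single steady state regardless of the starting point.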

At steady state, the total population size in a pixel is P* = S* + I* + R*. We can write P* in terms of the model parameters by plugging in the steady-state values given by Eq (2):

$$
P^* = b\,\frac{\gamma\lambda+d^2+d(\gamma+F+\lambda)+F(\gamma+\lambda-\gamma\mu)}{d\lambda(\gamma+d+F)+d(\gamma+d)(d+F)+\gamma F\lambda\mu}.
\tag{3}
$$

By dividing R* by the total population size at steady state, P*, we derive an equation for the steady-state seroprevalence, denoted Ω*:

$$
\Omega^* = \frac{\gamma F(1-\mu)}{\gamma\lambda+d^2+d(\gamma+F+\lambda)+F(\gamma+\lambda-\gamma\mu)}.
\tag{4}
$$

Now we solve for the total LASV spillover rate FS, given that the steady-state LASV seroprevalence is Ω*. Solving Eq (4) for F in terms of Ω* yields:

$$
F = \frac{-\Omega^*(\gamma+d)(d+\lambda)}{\Omega^* d+\gamma(-\Omega^*\mu+\Omega^*+\mu-1)+\Omega^*\lambda}.
\tag{5}
$$
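A quick way to confirm that Eq (5) inverts Eq (4) is a numerical round trip: pick a force of infection, compute the implied seroprevalence, and recover the original value. The sketch below does this with hypothetical function names; the value of F and the 60-year lifespan are illustrative assumptions, while γ, μ, and λ follow the values given in the text.

```python
def seroprevalence(F, d, gamma, mu, lam):
    """Steady-state seroprevalence, Eq (4)."""
    numerator = gamma * F * (1 - mu)
    denominator = (gamma * lam + d ** 2 + d * (gamma + F + lam)
                   + F * (gamma + lam - gamma * mu))
    return numerator / denominator


def force_of_infection(omega, d, gamma, mu, lam):
    """Force of infection implied by a steady-state seroprevalence, Eq (5)."""
    numerator = -omega * (gamma + d) * (d + lam)
    denominator = (omega * d + gamma * (-omega * mu + omega + mu - 1)
                   + omega * lam)
    return numerator / denominator


# Hypothetical inputs: F = 0.002 per yr and a 60-year mean lifespan.
d, gamma, mu, lam = 1 / 60, 12.0, 0.02, 0.064
F_in = 0.002
omega = seroprevalence(F_in, d, gamma, mu, lam)
F_out = force_of_infection(omega, d, gamma, mu, lam)
print(omega, F_out)
```

Eq (5) only returns a positive F when the seroprevalence is below the ceiling reached as F grows without bound, so predicted seroprevalences close to 1 would need separate handling.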

The rate of new infections is given by

$$
FS^* = \frac{P^*\Omega^*(d+\gamma)(d+\lambda)}{\gamma(1-\mu)}.
\tag{6}
$$

These analyses were derived using Mathematica. The notebook file is available in the GitHub repository [23].
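For readers who want to check Eq (6) by hand, it also follows directly from the steady-state conditions of System (1), without passing through Eq (5). The steps below are a reconstruction from the stated model, not a transcription of the authors' notebook. Setting dI/dt = 0 and dR/dt = 0 and using R* = Ω*P* gives

$$
\begin{aligned}
FS^* &= (d+\gamma)\,I^*, \qquad \gamma(1-\mu)\,I^* = (d+\lambda)\,R^*,\\
\Rightarrow\quad FS^* &= \frac{(d+\gamma)(d+\lambda)}{\gamma(1-\mu)}\,R^* = \frac{P^*\Omega^*(d+\gamma)(d+\lambda)}{\gamma(1-\mu)}.
\end{aligned}
$$

One consequence is that the infection rate can be computed from seroprevalence and population size alone, so F and b never need to be estimated explicitly.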

By substituting our prediction of human LASV seroprevalence for Ω*, we can estimate the total human infection rate using Eq (6). Calculating these estimates requires values for d, γ, μ, λ, and P*. We chose parameters that are broadly in line with the epidemiology of LASV and the demography of humans in West Africa.

We use values of the death rate d derived from country-specific life expectancy estimates obtained from the World Bank [68]. For a pixel within a given country, d is set to the reciprocal of that country’s 2018 life expectancy at birth. Studies indicate that the duration of LASV infection is typically about one month, so we set γ = 12 yr−1 across all pixels [11]. LASV infection causes mortality in a fraction μ = 0.02 of non-nosocomial infections [17].

The rate of seroreversion is difficult to estimate empirically. McCormick et al. (1987) estimated that λ = 0.064 yr−1 using a longitudinal study of IgG immune markers in individuals. However, it is unclear whether their results indicated true seroreversion, or whether the reduction of LASV immune markers below detectable levels made it appear as though seroreversion occurred. To better understand the potential consequences of seroreversion in our infection-rate estimates, we focus on two scenarios. In the first, any individual that has recovered from LASV infection remains seropositive for the remainder of their life (λ = 0 yr−1). In the second scenario, seroreversion occurs at the rate estimated by McCormick et al. (1987) (λ = 0.064 yr−1). In this case, an individual recovered from LASV is assumed to produce antibodies and maintain LASV immunity for an average duration of 15.6 years. We use the unprocessed WorldPop 2020 population data (described in S1 Appendix) as an estimate of the steady-state population size, P*, within each pixel of the original 0.0083° resolution.
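To make the parameter choices concrete, the following sketch evaluates Eq (6) for a single pixel under both seroreversion scenarios. The function name and the pixel values (1000 people, a predicted seroprevalence of 0.1, and a 60-year life expectancy) are hypothetical, while γ, μ, and the two values of λ are the ones stated above.

```python
GAMMA = 12.0   # per year: roughly one-month infection duration
MU = 0.02      # fraction of infections that are fatal


def infections_per_year(omega, population, life_expectancy, lam=0.0,
                        gamma=GAMMA, mu=MU):
    """New LASV infections per year in one pixel, following Eq (6).

    omega           -- predicted steady-state seroprevalence (0..1)
    population      -- steady-state population size P* of the pixel
    life_expectancy -- years; the death rate is d = 1 / life_expectancy
    lam             -- seroreversion rate per year (0 means lifelong)
    """
    d = 1.0 / life_expectancy
    return population * omega * (d + gamma) * (d + lam) / (gamma * (1 - mu))


# Hypothetical pixel: 1000 people, seroprevalence 0.1, 60-year life expectancy.
lifelong = infections_per_year(0.1, 1000, 60.0, lam=0.0)
waning = infections_per_year(0.1, 1000, 60.0, lam=0.064)
print(f"lifelong antibodies: {lifelong:.2f} infections per year")
print(f"seroreversion:       {waning:.2f} infections per year")
```

For this made-up pixel the two scenarios give roughly 1.7 and 8.2 infections per year, which illustrates how strongly the choice of λ affects the estimate.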

Results

LASV risk layers

The DM layer is constructed by averaging the predictions of 25 boosted classification tree models. Across all 25 bootstrap fits, the average out-of-bag AUC was 0.68, with a standard deviation of 0.05. This AUC indicates that the model has a modest ability to correctly discriminate pixels in which M. natalensis has been captured from background pixels, and is similar to out-of-bag AUC scores obtained in another study with a similar assessment criterion [5]. The algorithm assigned a high likelihood of occurrence to regions with a strong seasonal pattern of vegetation as well as specific levels of rainfall (S1 Appendix). Across 25 fitted models that made up the DL layer, the average AUC was 0.85, with a standard deviation of 0.08. This indicates a model that is good at discriminating between Lassa presences and absences. The algorithm primarily used precipitation contingency to determine whether or not a pixel is suitable for endemic LASV in M. natalensis (S1 Appendix).
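The AUC values quoted above can be read as the probability that a randomly chosen presence pixel receives a higher score than a randomly chosen background (or absence) pixel. The sketch below computes that quantity directly and also shows the per-pixel averaging of an ensemble. Both function names and the example scores are hypothetical, and this is not the authors' fitting code.

```python
def auc(scores_presence, scores_background):
    """AUC as the fraction of presence/background pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    wins = 0.0
    for p in scores_presence:
        for q in scores_background:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_presence) * len(scores_background))


def ensemble_mean(predictions):
    """Average per-pixel predictions across models.

    predictions -- list of per-model lists, one score per pixel.
    """
    return [sum(column) / len(column) for column in zip(*predictions)]


# Hypothetical scores for three presence pixels and two background pixels.
example_auc = auc([0.9, 0.8, 0.4], [0.5, 0.3])
# Hypothetical predictions from two models over two pixels.
layer = ensemble_mean([[0.2, 0.4], [0.4, 0.8]])
print(example_auc, layer)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why 0.68 is described as modest and 0.85 as good.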

Fig 3A–3C show maps of each of the fitted risk layers, as well as the combined layer of realized risk, DX. As indicated by the IUCN range map for M. natalensis [19], most countries of West Africa are predicted to harbor this primary rodent reservoir of LASV (Fig 3A). However, the rodent is predicted to be less prevalent along coastal areas of West Africa and southern Nigeria. Similar to other Lassa risk maps [4, 5], our DL layer predictions indicate that the risk of LASV in rodents is primarily concentrated in the eastern and western extremes of West Africa (Fig 3B). The combined risk, shown in Fig 3C, indicates that environmental features suitable for rodent-to-human LASV transmission are primarily located in Sierra Leone, Guinea, and Nigeria.

Fig 3. Calculating the combined risk layer.

(A) Map shows the likelihood that each 0.05° pixel in West Africa contains the primary reservoir of Lassa virus, M. natalensis. Pink dots indicate locations of captures that were used to train the model. Black line indicates the IUCN M. natalensis range map. (B) Predicted distribution of Lassa virus in M. natalensis. Dots indicate locations in which M. natalensis were surveyed for the virus. (C) Combined risk, defined as the product of the above two layers.
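The combined layer in panel (C) is a pixel-by-pixel product of the two fitted layers. A minimal sketch, assuming both layers are stored as equally shaped grids of values between 0 and 1 (the function name and grid values are hypothetical):

```python
def combine_layers(reservoir_layer, pathogen_layer):
    """Multiply two equally shaped grids (lists of rows) pixel by pixel."""
    if len(reservoir_layer) != len(pathogen_layer):
        raise ValueError("layers must have the same number of rows")
    combined = []
    for row_m, row_l in zip(reservoir_layer, pathogen_layer):
        if len(row_m) != len(row_l):
            raise ValueError("layers must have the same number of columns")
        combined.append([m * l for m, l in zip(row_m, row_l)])
    return combined


# Hypothetical 1 x 2 grids.
dx = combine_layers([[0.5, 1.0]], [[0.5, 0.2]])
print(dx)
```

Because both inputs lie between 0 and 1, the product is high only where the reservoir and the pathogen are both predicted to be likely.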

Connection to human seroprevalence and spillover

A quasi-binomial regression indicated a significant, positive association between the combined LASV risk predictor DX and the human arenavirus seroprevalence measured in serosurveys (coefficient: 1.50, p = 0.000123, Fig 4). The model also indicated substantial overdispersion in the human seroprevalence dataset (φ = 15.1). More information on the GLM output can be found in S1 Appendix. By applying the generalized linear model to the combined LASV risk layer, we extrapolate the human LASV seroprevalence across West Africa (Fig 5). Our results indicate that human LASV seroprevalence is greatest in the eastern and western regions of West Africa, with especially high seroprevalence in Central Guinea, Sierra Leone, and Nigeria.
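To illustrate what a quasi-binomial fit involves, the sketch below fits a logit-link binomial model with one predictor by iteratively reweighted least squares and then estimates the dispersion φ from the Pearson statistic. It is a self-contained stand-in, not the authors' script; the function name and the survey data are invented for illustration.

```python
import math


def fit_quasibinomial(x, positives, tested, iterations=25):
    """Fit logit(p) = b0 + b1 * x to grouped binomial data.

    Returns (b0, b1, phi), where phi is the Pearson dispersion estimate.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for xi, yi, ni in zip(x, positives, tested):
            eta = b0 + b1 * xi
            p = 1.0 / (1.0 + math.exp(-eta))
            w = ni * p * (1.0 - p)                        # working weight
            z = eta + (yi / ni - p) / (p * (1.0 - p))     # working response
            s_w += w
            s_wx += w * xi
            s_wxx += w * xi * xi
            s_wz += w * z
            s_wxz += w * xi * z
        # Solve the 2 x 2 weighted least-squares normal equations.
        det = s_w * s_wxx - s_wx ** 2
        b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        b1 = (s_w * s_wxz - s_wx * s_wz) / det
    pearson = 0.0
    for xi, yi, ni in zip(x, positives, tested):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        pearson += (yi - ni * p) ** 2 / (ni * p * (1.0 - p))
    phi = pearson / (len(x) - 2)   # residual degrees of freedom
    return b0, b1, phi


# Hypothetical serosurveys: risk score, seropositive count, number tested.
risk = [0.05, 0.2, 0.4, 0.6, 0.8]
seropositive = [2, 5, 30, 20, 60]
n_tested = [100, 80, 200, 90, 150]
print(fit_quasibinomial(risk, seropositive, n_tested))
```

The coefficient estimates are identical to those of an ordinary binomial fit; the quasi-binomial part only inflates standard errors by the square root of φ, so a large φ widens the uncertainty without moving the fitted curve.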

Fig 4. Human arenavirus seroprevalence vs the combined risk layer.

Each circle represents a different serosurvey. The size of the circle indicates the number of humans that were tested. Solid black line shows the quasi-binomial prediction of seroprevalence, and the red dashed lines show the 95% confidence intervals. Confidence intervals were obtained by fitting the model 1000 times on random samples taken from the dataset with replacement.
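The resampling procedure described in the caption is a percentile bootstrap. The sketch below shows the general pattern, assuming each survey is a (seropositive, tested) pair. To keep the example short, the refitted quantity is the pooled seroprevalence rather than the full regression curve, and all names and data are hypothetical.

```python
import random


def bootstrap_interval(surveys, statistic, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for statistic(surveys).

    Surveys are resampled with replacement, the statistic is recomputed on
    each resample, and the central `level` share of the results is returned.
    """
    rng = random.Random(seed)
    n = len(surveys)
    estimates = []
    for _ in range(n_boot):
        resample = [surveys[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    estimates.sort()
    lower = estimates[round((1 - level) / 2 * n_boot)]
    upper = estimates[round((1 + level) / 2 * n_boot) - 1]
    return lower, upper


def pooled_prevalence(surveys):
    """Stand-in statistic: total seropositive divided by total tested."""
    return sum(pos for pos, _ in surveys) / sum(n for _, n in surveys)


# Hypothetical serosurveys as (seropositive, tested) pairs.
surveys = [(12, 150), (3, 80), (40, 200), (0, 60), (9, 120)]
interval = bootstrap_interval(surveys, pooled_prevalence)
print(interval)
```

For a regression curve, the same loop would refit the model on each resample and take percentiles of the predicted seroprevalence at each value of the risk layer.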

Fig 5. Predicted human seroprevalence of Lassa virus in West Africa.

Dots show locations of human serosurveys, and dot color indicates the residual of the predicted seroprevalence. White dots indicate locations for which measured seroprevalence fell within 0.1 of the prediction. Measured seroprevalence at red dots was 0.1 or more greater than that predicted, and seroprevalence at blue dots was 0.1 or more below the prediction.

Furthermore, by assuming that our predictions are representative of LASV infection at steady state, we can derive the number of LASV infections per year in humans. If the DX layer accurately describes the spatial heterogeneity of LASV seroprevalence in humans, and if LASV antibody production upon recovery is lifelong, our framework estimates that 897,700 new human infections occur each year. Between 664,300 and 843,800 (i.e., 74–94%) of these infections are expected to be sub-clinical or asymptomatic, leaving 53,900–233,400 infections that might require hospitalization [17]. Given that 2% of all infections result in fatality, our estimates imply that approximately 18,000 individuals die of Lassa Fever in West Africa each year. Though our model does not account for differences in LASV risk by sex or age, research suggests that hospitalizations may be skewed towards females, and that fatalities are biased towards individuals under 29 years of age but not skewed by sex [69].
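The ranges quoted above follow from simple arithmetic on the headline estimate. The short check below reproduces them using only figures stated in the text; the rounding to the nearest hundred (or thousand for deaths) is an assumption about how the reported values were rounded.

```python
total_infections = 897_700          # lifelong-antibody estimate from the text
subclinical_percent = (74, 94)      # share of infections, from the text
fatality_percent = 2                # share of infections, from the text

# Sub-clinical or asymptomatic infections, rounded to the nearest hundred.
subclinical = [round(total_infections * pct // 100, -2)
               for pct in subclinical_percent]

# The remainder might require hospitalization.
clinical = [total_infections - value for value in subclinical]

# Deaths, rounded to the nearest thousand.
deaths = round(total_infections * fatality_percent // 100, -3)

print(subclinical, clinical, deaths)
```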

Table 4 shows the number of LASV infections per year by country, ordered by number of infections. Our predictions indicate that more than half of new human LASV infections (531,700) in West Africa will occur in Nigeria (Fig 6). This distribution of LASV infection is largely due to the greater population size within Nigeria, as the per person spillover rates do not differ dramatically between countries (Table 4). After Nigeria, Ghana (60,200 infections per year) and the Ivory Coast (57,700 infections per year), respectively, are predicted to have the highest incidence of human LASV infections. Sierra Leone, Nigeria, and Guinea are predicted to have the highest per-capita rates of LASV infection (Table 4).

Table 4. Predicted annual number of Lassa virus infections and infection rate.

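| Country | Infections per year (thousands) | Infection rate (per 1000 people per year) |
|---|---|---|
| Nigeria | 531.7 | 2.6 |
| Ghana | 60.2 | 2.0 |
| Ivory Coast | 57.7 | 2.3 |
| Niger | 46.9 | 2.0 |
| Burkina Faso | 44.4 | 2.1 |
| Mali | 44.3 | 2.2 |
| Guinea | 35.0 | 2.5 |
| Benin | 27.0 | 2.2 |
| Sierra Leone | 20.7 | 2.9 |
| Togo | 17.9 | 2.2 |
| Liberia | 9.9 | 2.0 |
| Mauritania | 1.0 | 1.9 |
| Senegal | 0.8 | 2.0 |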

Infection rate is in units of number of infections per year per 1000 people. Estimates in the table are derived assuming seroreversion and reinfection do not occur.
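The statements drawn from Table 4 can be checked with a few lines of arithmetic. The sketch below re-enters the table values and recomputes the regional total, Nigeria's share, the three highest per-capita rates, and the population size implied by each row. The implied populations are a back-calculation for illustration, not figures reported in the paper.

```python
# (infections per year in thousands, infections per 1000 people per year)
table4 = {
    "Nigeria": (531.7, 2.6), "Ghana": (60.2, 2.0), "Ivory Coast": (57.7, 2.3),
    "Niger": (46.9, 2.0), "Burkina Faso": (44.4, 2.1), "Mali": (44.3, 2.2),
    "Guinea": (35.0, 2.5), "Benin": (27.0, 2.2), "Sierra Leone": (20.7, 2.9),
    "Togo": (17.9, 2.2), "Liberia": (9.9, 2.0), "Mauritania": (1.0, 1.9),
    "Senegal": (0.8, 2.0),
}

total_thousands = sum(infections for infections, _ in table4.values())
nigeria_share = table4["Nigeria"][0] / total_thousands
top_rates = sorted(table4, key=lambda c: table4[c][1], reverse=True)[:3]

# Thousands of infections divided by infections per 1000 people
# gives the implied population in millions.
implied_population_millions = {
    country: infections / rate for country, (infections, rate) in table4.items()
}

print(round(total_thousands, 1), round(nigeria_share, 3), top_rates)
```

The rows sum to 897.5 thousand, slightly below the 897,700 reported in the text, which is consistent with rounding of the individual entries.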

Fig 6. Predicted spatial density of Lassa virus infections in humans.

Map shows the predicted infections per km². Yellow colors, representing a high number of infections, tend to occur in areas with high human population density and a high predicted seroprevalence.

The above estimates are based on the premise that, upon recovery from LASV infection, an individual produces antibodies for the remainder of their life. If, instead, LASV antibody production ceases after an average of 15.6 years as suggested by some longitudinal serosurveys [17], then a given level of seroprevalence implies almost five times as many infections compared to the scenario with lifelong antibody production. Specifically, allowing for seroreversion and subsequent LASV reinfection in the model implies 4,383,600 infections occur each year. Inclusion of reinfection does not change the ranking of countries in Table 4.
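The size of this increase can be read off Eq (6). For a fixed seroprevalence, population size, γ, and μ, the ratio of the infection rate with seroreversion to the rate with lifelong antibodies is

$$
\frac{FS^*\big|_{\lambda}}{FS^*\big|_{\lambda=0}} = \frac{d+\lambda}{d} = 1 + \frac{\lambda}{d}.
$$

As a rough back-of-envelope reading (d varies by country, so this is only an effective average), the reported totals give 4,383,600 / 897,700 ≈ 4.88, which with λ = 0.064 yr−1 corresponds to d ≈ 0.0165 yr−1, or a mean lifespan of roughly 61 years.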

Discussion

Machine learning approaches that forecast the spatial risk of emerging infectious diseases such as Lassa virus are often not designed to explain how aspects of the environment translate into realized pathogen spillover into human populations [4, 5]. Models that specifically predict attributes of the reservoir from the environment, and use these predictions to quantify spillover into humans, offer a more mechanistic understanding of the current and future spatial variation in human disease [70]. Our forecasting framework advances these approaches by generating predictions of spillover risk based only on data from the primary rodent reservoir of LASV, and rigorously assessing our risk predictions on realized human spillover as measured by human arenavirus serosurveys. As indicated by a generalized linear regression, our reservoir-based model of spillover risk explains a modest and statistically significant amount of the spatial variation in human arenavirus seroprevalence.

Using this framework, we are able to generate predictions of the number of new LASV infections within different regions of West Africa. Our results indicate that Nigeria contributes the greatest number of new human infections each year, and that the magnitude of new infections in Nigeria is driven primarily by its greater human population density, rather than an increased per-capita risk. An assumption that drives this result is the density-dependent form of spillover in the model (i.e., FS), in which rodent-human interactions increase with human population density. This form is appropriate if rodent interactions are well-mixed in the human population. For example, if increases in human density were reflected in an increased number of humans per dwelling, then the LASV risk posed by a single rodent in a household would increase with human population size. If these assumptions are correct, Nigeria is likely to represent the greatest risk of LASV emergence because the large number of annual spillover events allows for extensive sampling of viral strain diversity and repeated opportunities for viral adaptation to the human population [71].

Our approach allows us to highlight the regions that contribute most to pathogen spillover, and suggest locations for further surveillance. Our model indicates that the highest per-capita risk to humans occurs in Sierra Leone, Guinea, and Nigeria. Given the data that are currently available, our model suggests that these countries should be prioritized for surveillance of LASV emergence in rodents and at-risk human populations. Human serosurveys of the general population are notably lacking in Nigeria, but have the potential to clarify the true magnitude of LASV spillover in West Africa. Although it is known that certain broad regions of West Africa have a long history of LASV spillover (e.g., Sierra Leone, Guinea, Nigeria), relatively little is known about the prevalence of LASV in rodents or humans in other regions (e.g., Togo, Benin, Mali, Burkina Faso). Our model suggests that Lassa virus infections occur regularly in these under-sampled areas. Human serosurveys and rodent LASV testing from these regions could help modeling approaches clarify the spatial distribution of Lassa fever across West Africa.

In addition to identifying the regions most at risk for viral emergence, our model framework provides updated estimates for the rate of LASV spillover across West Africa. Previous estimates of 100,000–300,000 infections per year were based on longitudinal studies from communities in Sierra Leone conducted in the 1980s [17]. Using seroprevalence data from studies across West Africa, our model predicts that between 897,700 and 4,383,600 LASV infections in humans occur each year. As demonstrated by past research focused on estimating LASV infection in humans, where the true value lies within this range depends on whether seroreversion and subsequent LASV reinfection are regular features of human LASV epidemiology, which reinforces the need to better understand the scope for LASV reinfection [72]. It is important to realize that our predictions include both symptomatic and asymptomatic infections. Thus, because many human LASV infections result in mild flu-like symptoms or are asymptomatic, it is unsurprising that our predicted values exceed the reported number of confirmed LF cases in Nigeria [73, 74]. Several factors may contribute to the discrepancy between previous estimates of LASV spillover and our revised estimates. McCormick et al. (1987) used seroconversion data from a 15-month period to infer a rate of LASV infection across West Africa. However, the population of West Africa has increased by a factor of 2.4 since that time, making these estimates outdated [75]. Furthermore, our estimates are based on human seroprevalence data that come from five countries in West Africa and span a 45-year time period. Because our dataset was obtained from a broader spatial and temporal range, our estimates are less likely to be biased by sporadic extremes in LASV spillover.

Accurate risk predictions could help guide risk-reduction and behavior-change communication campaigns, the distribution of future human LASV vaccines, and countermeasures directed at the rodent reservoir. In addition to vaccines that prevent infection in humans, new vaccine designs are currently being investigated for various wildlife pathogens as well, including pathogens in rodents [76, 77]. Wildlife vaccination campaigns that use vaccine baits have proven to be effective in the control of rabies in red fox (Vulpes vulpes) over large land areas, but require substantial planning and surveillance of the reservoir population [78]. Rodent population management could be another method of attenuating the risk of LASV in an area. Pinpointing areas that are most in need of spillover intervention will help overcome the logistical challenges that are associated with vaccine distribution to humans or wildlife on large scales. In addition to guiding intervention to specific regions, mechanistic forecasts similar to ours could help plan the logistics of such operations.

Our framework sheds light on the connection between LASV spillover in humans and the environmental conditions favorable to pathogen and reservoir. The reservoir layer of our model identified strong seasonal trends in vegetation (NDVI) as the primary explanatory variable that determines where the rodent M. natalensis occurs. This builds on other work that identified properties of vegetation as important predictors of the range of M. natalensis [5]. In conjunction with a strong seasonality of vegetation, our model identified a range of mean and maximum rainfall values that limit the distribution of the LASV reservoir. This is in line with previous ecological studies showing that seasonal patterns of precipitation and vegetation are important drivers of seasonal breeding in M. natalensis [79]. Our model indicates that M. natalensis does not occur in areas associated with too much rainfall or areas without a clear wet/dry seasonality, resulting in a lower risk of LASV spillover in coastal areas of West Africa and southern Nigeria. The pathogen layer of our model also indicates that strong seasonal precipitation patterns are the leading environmental feature associated with LASV in M. natalensis and the main driver of LASV’s occurrence in only western and eastern West Africa. Though the mechanism by which rainfall affects viral prevalence is unclear, it has been hypothesized, for example, that wetter conditions might facilitate the virus’s ability to survive outside the host [4].

Our model of spillover risk predicts a significant, but small amount of the spatial variation in arenavirus seroprevalence studies in humans. The modest relationship between human LASV spillover and predicted risk might be due to the binary classifiers’ coarse description of the magnitude of LASV risk. As more longitudinal data become available, these binary models can be upgraded with more nuanced models that predict the time-varying density of M. natalensis and the prevalence of LASV among the rodent population. Alternatively, the low correlation could indicate that other predictors like human factors have a large influence on LASV spillover. Geographic differences in housing, cultural practices, and diet likely influence the extent of LASV spillover but are not included in our model. For instance, the use of rodent-proof housing materials (e.g., concrete vs mud) and abstaining from rodent hunting and consumption are known to affect the extent to which LASV is able to transmit between rodents and humans [16, 80]. The residuals of seroprevalence predictions from our model could help guide understanding of which human factors mitigate or facilitate LASV spillover. If human factors like housing type can be readily identified from serosurvey locations within West Africa, they could be incorporated in the human stage of the model that connects spillover risk to human seroprevalence.

Geographic variation in LASV and its primary reservoir may also be responsible for the modest fit of our model. For instance, across West Africa LASV consists of several clades [22]. If certain clades are better at infecting humans, then our model will tend to underestimate the rate of human infections in regions where such highly infectious clades occur. Similarly, the M. natalensis reservoir is also divided into multiple clades [20]. Different M. natalensis clades may differ in their contact rates with humans or in their suitability as a reservoir, further reducing our model’s ability to predict spillover into humans. Some evidence for this latter possibility comes from arenaviruses that preferentially infect certain clades of M. natalensis [21]. Because our study region only includes West Africa, it is likely that the M. natalensis occurrences that our model is trained on are only from the A-I clade [20]. However, our forecast should be interpreted with caution in eastern Nigeria, where the transition zone into the A-II clade occurs. Future work integrating these factors may help improve our understanding of the spatial variation in human seroprevalence that is due to the spatial patterning of LASV and reservoir clades.

Another factor that could influence our model fit is the possibility that rodent species other than M. natalensis serve as reservoirs or interact with the primary reservoir in ways that decrease or increase risk. Though M. natalensis is believed to be the primary reservoir that contributes to human infection, several species of rodents are known to be capable of harboring the virus [48]. Understanding the relationship between the habitat suitability of different rodent reservoirs and human LF burden may help determine whether M. natalensis is the host at which intervention strategies should always be directed. Furthermore, other species of rodent may displace M. natalensis and therefore lower the overall spillover risk of LASV into humans. The layered framework we have developed can be easily adapted to include additional reservoir species and systematically investigate these possibilities.

Our model is constructed to learn and explain spatial variation in the average historical spillover of LASV, and does not include temporal trends of spillover risk. Due to the sparsity of available longitudinal data, our model assumes that the human population in West Africa, human LASV seroprevalence, and the rate of LASV spillover, are all constant in time. Over decades-long timescales, the rate of LASV spillover is likely increasing due to increasing rates of human-rodent interaction that come with urban growth, deforestation, or climate change [11, 70]. Estimating the combined temporal and spatial variation of infection will require long-term longitudinal studies in both rodents and humans across West Africa. With this data, for example, more advanced models could mechanistically associate an increasing rate of spillover with changes to land cover.

Another important temporal simplification of our current modeling work is the absence of seasonality in LASV spillover. In Sierra Leone, Guinea, and Nigeria, hospital admissions attributable to LASV infection generally peak late in the dry season [54, 69, 81]. In these regions, the mechanism of seasonal spillover likely involves a combination of seasonal rainfall and land use practices, such as crop-harvesting and subsequent burning of agricultural fields, that drive rodents into domestic dwellings in search of food-stuffs [54, 82]. It is not understood whether these factors operate uniformly across all of West Africa. Temporal fluctuations in the density of the reservoir population, due to seasonal cycles of reproduction, are another potentially important factor that could drive a seasonal spike of human LF cases. However, it is unclear whether the density fluctuations that have been observed outside of the LASV geographic range (e.g., Tanzania [79]) also occur within West Africa. At least in Guinea and Sierra Leone, research on the population dynamics of M. natalensis indicates that density fluctuations are much weaker than those in East Africa [83]. In the case of rodent vaccination, understanding population dynamics is particularly important because distributing vaccines at seasonal population lows in wildlife demographic cycles can, in theory, substantially increase the probability of pathogen elimination [83, 84].

Although the methods we have used here make efficient use of available data, the accuracy of our risk forecasts remains difficult to rigorously evaluate due to the limited availability of current data from human populations across West Africa. The sparseness of modern human data arises for two reasons: 1) the lack of robust surveillance and testing across much of the region where LASV is endemic and 2) the absence of publicly available databases reporting human cases in those countries that do have relatively robust surveillance in place (i.e., Nigeria). Improving surveillance for LASV across West Africa and developing publicly available resources for sharing the resulting data would allow more robust risk predictions to be developed and facilitate risk reducing interventions. Despite these limitations of existing data, the structured machine-learning models we develop here provide insight into what aspects of environment, reservoir, and virus, contribute to spillover, and the potential risk of subsequent emergence into the human population. By understanding these connections, we can design and deploy more effective intervention and surveillance strategies that work in tandem to reduce disease burden and enhance global health security.

Supporting information

S1 Appendix. Details on the predictors used in the model and model fits.

(PDF)

Data Availability

Our full data-set and the script files used to fit the models are available in the github repository: https://github.com/54481andrew/pathogen-spillover-forecast.

Funding Statement

Funding was provided by DARPA grant no. D18AC00028 (to BHB, PAB, MAJ, SLN) and NIH grant no. R01GM122079 (to SLN). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Jones KE, Patel NG, Levy MA, Storeygard A, Balk D, Gittleman JL, et al. Global trends in emerging infectious diseases. Nature. 2008;451(7181):990–993. doi:10.1038/nature06536
  • 2. Plowright RK, Parrish CR, McCallum H, Hudson PJ, Ko AI, Graham AL, et al. Pathways to zoonotic spillover. Nat Rev Microbiol. 2017;15(8):502–510. doi:10.1038/nrmicro.2017.45
  • 3. Hughes JM, Wilson ME, Pike BL, Saylors KE, Fair JN, LeBreton M, et al. The origin and prevention of pandemics. Clin Infect Dis. 2010;50(12):1636–1640. doi:10.1086/652860
  • 4. Fichet-Calvet E, Rogers DJ. Risk maps of Lassa fever in West Africa. PLoS Negl Trop Dis. 2009;3(3):e388. doi:10.1371/journal.pntd.0000388
  • 5. Mylne AQ, Pigott DM, Longbottom J, Shearer F, Duda KA, Messina JP, et al. Mapping the zoonotic niche of Lassa fever in Africa. Trans R Soc Trop Med Hyg. 2015;109(8):483–492. doi:10.1093/trstmh/trv047
  • 6. Pigott DM, Deshpande A, Letourneau I, Morozoff C, Reiner RC Jr, Kraemer MU, et al. Local, national, and regional viral haemorrhagic fever pandemic potential in Africa: a multistage analysis. Lancet. 2017;390(10113):2662–2672. doi:10.1016/S0140-6736(17)32092-5
  • 7. Pigott DM, Golding N, Mylne A, Huang Z, Henry AJ, Weiss DJ, et al. Mapping the zoonotic niche of Ebola virus disease in Africa. Elife. 2014;3:e04395. doi:10.7554/eLife.04395
  • 8. Pigott DM, Millear AI, Earl L, Morozoff C, Han BA, Shearer FM, et al. Updates to the zoonotic niche map of Ebola virus disease in Africa. Elife. 2016;5:e16412. doi:10.7554/eLife.16412
  • 9. Pigott DM, Bhatt S, Golding N, Duda KA, Battle KE, Brady OJ, et al. Global distribution maps of the leishmaniases. Elife. 2014;3:e02851. doi:10.7554/eLife.02851
  • 10. Bond N, Schieffelin JS, Moses LM, Bennett AJ, Bausch DG. A historical look at the first reported cases of Lassa fever: IgG antibodies 40 years after acute infection. Am J Trop Med Hyg. 2013;88(2):241–244. doi:10.4269/ajtmh.2012.12-0466
  • 11. Gibb R, Moses LM, Redding DW, Jones KE. Understanding the cryptic nature of Lassa fever in West Africa. Pathog Glob Health. 2017;111(6):276–288. doi:10.1080/20477724.2017.1369643
  • 12. Peterson TA, Moses LM, Bausch DG. Mapping transmission risk of Lassa fever in West Africa: the importance of quality control, sampling bias, and error weighting. PLoS One. 2014;9(8):e100711. doi:10.1371/journal.pone.0100711
  • 13. Maes P, Alkhovsky SV, Bào Y, Beer M, Birkhead M, Briese T, et al. Taxonomy of the family Arenaviridae and the order Bunyavirales: update 2018. Arch Virol. 2018;163(8):2295–2310. doi:10.1007/s00705-018-3843-5
  • 14. Fisher-Hoch S, Tomori O, Nasidi A, Perez-Oronoz G, Fakile Y, Hutwagner L, et al. Review of cases of nosocomial Lassa fever in Nigeria: the high price of poor medical practice. BMJ. 1995;311(7009):857–859. doi:10.1136/bmj.311.7009.857
  • 15. Iacono GL, Cunningham AA, Fichet-Calvet E, Garry RF, Grant DS, Khan SH, et al. Using modelling to disentangle the relative contributions of zoonotic and anthroponotic transmission: the case of Lassa fever. PLoS Negl Trop Dis. 2015;9(1):e3398. doi:10.1371/journal.pntd.0003398
  • 16. Ter Meulen J, Lukashevich I, Sidibe K, Inapogui A, Marx M, Dorlemann A, et al. Hunting of peridomestic rodents and consumption of their meat as possible risk factors for rodent-to-human transmission of Lassa virus in the Republic of Guinea. Am J Trop Med Hyg. 1996;55(6):661–666. doi:10.4269/ajtmh.1996.55.661
  • 17. McCormick JB, Webb PA, Krebs JW, Johnson KM, Smith ES. A prospective study of the epidemiology and ecology of Lassa fever. J Infect Dis. 1987;155(3):437–444. doi:10.1093/infdis/155.3.437
  • 18. United Nations. Geographical Regions; 2020. https://unstats.un.org/unsd/methodology/m49/#geo-regions.
  • 19. Granjon L. The IUCN red list of threatened species 2016: e.T12868A115107375. 2016.
  • 20. Colangelo P, Verheyen E, Leirs H, Tatard C, Denys C, Dobigny G, et al. A mitochondrial phylogeographic scenario for the most widespread African rodent, Mastomys natalensis. Biol J Linn Soc. 2013;108(4):901–916. doi:10.1111/bij.12013
  • 21. Gryseels S, Baird SJ, Borremans B, Makundi R, Leirs H, Goüy de Bellocq J. When viruses don’t go viral: the importance of host phylogeographic structure in the spatial spread of arenaviruses. PLoS Pathog. 2017;13(1):e1006073. doi:10.1371/journal.ppat.1006073
  • 22. Olayemi A, Fichet-Calvet E. Systematics, Ecology, and Host Switching: Attributes Affecting Emergence of the Lassa Virus in Rodents across Western Africa. Viruses. 2020;12(3):312. doi:10.3390/v12030312
  • 23. Basinski AJ. Pathogen Spillover Forecast; 2020. Github repository https://github.com/54481andrew/pathogen-spillover-forecast.git.
  • 24. Wilson DE, Reeder DM. Mammal species of the world: a taxonomic and geographic reference. vol. 1. JHU Press; 2005.
  • 25. Happold D, Happold D. Mammals of Africa. Volume III: Rodents, hares and rabbits. 2013.
  • 26. Granjon L, Duplantier JM, Catalan J, Britton-Davidian J. Systematics of the genus Mastomys (Thomas, 1915)(Rodentia: Muridae). A review. Belgian Journal of Zoology (Belgium). 1997.
  • 27. Van de Perre F, Adriaensen F, Terryn L, Pauwels O, Leirs H, Gilissen E, et al. African mammalia; 2019. http://projects.biodiversity.be/africanmammalia.
  • 28. Centre de Biologie pour la Gestion des Populations. Database on Sahelo-Sudanian rodents; 2017. http://vminfotron-dev.mpl.ird.fr/bdrss/bdrsspub_form.php.
  • 29. Brouat C, Loiseau A, Kane M, Bâ K, Duplantier JM. Population genetic structure of two ecologically distinct multimammate rats: the commensal Mastomys natalensis and the wild Mastomys erythroleucus in southeastern Senegal. Mol Ecol. 2007;16(14):2985–2997. doi:10.1111/j.1365-294X.2007.03353.x
  • 30. Coulibaly-N’Golo D, Allali B, Kouassi SK, Fichet-Calvet E, Becker-Ziaja B, Rieger T, et al. Novel arenavirus sequences in Hylomyscus sp. and Mus (Nannomys) setulosus from Côte d’Ivoire: implications for evolution of arenaviruses in Africa. PloS One. 2011;6(6):e20893. doi:10.1371/journal.pone.0020893
  • 31. Dobigny G, Nomao A, Gautun J. A cytotaxonomic survey of rodents from Niger: implications for systematics, biodiversity and biogeography. Mammalia. 2002;66(4):495–524. doi:10.1515/mamm.2002.66.4.495
  • 32. Duplantier JM, Britton-Davidian J, Granjon L. Chromosomal characterization of three species of the genus Mastomys in Senegal. J Zool Syst Evol Res. 1990;28(4):289–298. doi:10.1111/j.1439-0469.1990.tb00383.x
  • 33. Granjon L, Duplantier JM. Les rongeurs de l’Afrique sahélo-soudanienne; 2009.
  • 34. Kronmann KC, Nimo-Paintsil S, Guirguis F, Kronmann LC, Bonney K, Obiri-Danso K, et al. Two novel arenaviruses detected in pygmy mice, Ghana. Emerg Infect Dis. 2013;19(11):1832. doi:10.3201/eid1911.121491
  • 35. Lecompte E, Brouat C, Duplantier JM, Galan M, Granjon L, Loiseau A, et al. Molecular identification of four cryptic species of Mastomys (Rodentia, Murinae). Biochemical Systematics and Ecology. 2005;33(7):681–689. doi:10.1016/j.bse.2004.12.015
  • 36. Lecompte E, Fichet-Calvet E, Daffis S, Koulémou K, Sylla O, Kourouma F, et al. Mastomys natalensis and lassa fever, West Africa. Emerg Infect Dis. 2006;12(12):1971. doi:10.3201/eid1212.060812
  • 37. Olayemi A, Obadare A, Oyeyiola A, Fasogbon S, Igbokwe J, Igbahenah F, et al. Small mammal diversity and dynamics within Nigeria, with emphasis on reservoirs of the Lassa virus. System Biodivers. 2018;16(2):118–127. doi:10.1080/14772000.2017.1358220
  • 38. Robbins CB, Krebs JW Jr, Johnson KM. Mastomys (Rodentia: Muridae) species distinguished by hemoglobin pattern differences. Am J Trop Med. 1983;32(3):624–630. doi:10.4269/ajtmh.1983.32.624
  • 39. Yadouleton A, Agolinou A, Kourouma F, Saizonou R, Pahlmann M, Bedié SK, et al. Lassa Virus in Pygmy Mice, Benin, 2016–2017. Emerg Infect Dis. 2019;25(10):1977. doi:10.3201/eid2510.180523
  • 40. Olayemi A, Akinpelu A. Diversity and distribution of murid rodent populations between forest and derived savanna sites within south western Nigeria. Biodivers Conserv. 2008;17(10):2411. 10.1007/s10531-008-9389-1 [DOI] [Google Scholar]
  • 41. Safronetz D, Lopez JE, Sogoba N, Traore SF, Raffel SJ, Fischer ER, et al. Detection of Lassa virus, Mali. Emerg Infect Dis. 2010;16(7):1123. 10.3201/eid1607.100146 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42. Safronetz D, Sogoba N, Lopez JE, Maiga O, Dahlstrom E, Zivcec M, et al. Geographic distribution and genetic characterization of Lassa virus in sub-Saharan Mali. PLoS Negl Trop Dis. 2013;7(12). 10.1371/journal.pntd.0002582 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43. Elith J, Graham CH, Anderson RP, Dudík M, Ferrier S, Guisan A, et al. Novel methods improve prediction of species’ distributions from occurrence data. Ecography. 2006;29(2):129–151. doi:10.1111/j.2006.0906-7590.04596.x
  • 44. Barbet-Massin M, Jiguet F, Albert CH, Thuiller W. Selecting pseudo-absences for species distribution models: how, where and how many? Methods Ecol Evol. 2012;3(2):327–338. doi:10.1111/j.2041-210X.2011.00172.x
  • 45. Phillips SJ, Dudík M, Elith J, Graham CH, Lehmann A, Leathwick J, et al. Sample selection bias and presence-only distribution models: implications for background and pseudo-absence data. Ecol Appl. 2009;19(1):181–197. doi:10.1890/07-2153.1
  • 46. GBIF occurrence download; 2020-10-30. doi:10.15468/dl.tbe47y
  • 47. Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, et al. GenBank. Nucleic Acids Res. 2017;45(D1):D37–D42. doi:10.1093/nar/gkw1070
  • 48. Fichet-Calvet E, Becker-Ziaja B, Koivogui L, Günther S. Lassa serology in natural populations of rodents and horizontal transmission. Vector Borne Zoonotic Dis. 2014;14(9):665–674. doi:10.1089/vbz.2013.1484
  • 49. Fichet-Calvet E, Ölschläger S, Strecker T, Koivogui L, Becker-Ziaja B, Camara AB, et al. Spatial and temporal evolution of Lassa virus in the natural host population in Upper Guinea. Scientific reports. 2016;6(1):1–6. doi:10.1038/srep21977
  • 50. Leski TA, Stockelman MG, Moses LM, Park M, Stenger DA, Ansumana R, et al. Sequence variability and geographic distribution of Lassa virus, Sierra Leone. Emerg Infect Dis. 2015;21(4):609. doi:10.3201/eid2104.141469
  • 51. Monath TP, Newhouse VF, Kemp GE, Setzer HW, Cacciapuoti A. Lassa virus isolation from Mastomys natalensis rodents during an epidemic in Sierra Leone. Science. 1974;185(4147):263–265. doi:10.1126/science.185.4147.263
  • 52. Olayemi A, Oyeyiola A, Obadare A, Igbokwe J, Adesina AS, Onwe F, et al. Widespread arenavirus occurrence and seroprevalence in small mammals, Nigeria. Parasites Vectors. 2018;11(1):416. doi:10.1186/s13071-018-2991-5
  • 53. Wulff H, Fabiyi A, Monath T. Recent isolations of Lassa virus from Nigerian rodents. Bull World Health Organ. 1975;52(4-6):609.
  • 54. Fichet-Calvet E, Lecompte E, Koivogui L, Soropogui B, Doré A, Kourouma F, et al. Fluctuation of abundance and Lassa virus prevalence in Mastomys natalensis in Guinea, West Africa. Vector Borne Zoonotic Dis. 2007;7(2):119–128. doi:10.1089/vbz.2006.0520
  • 55. Kernéis S, Koivogui L, Magassouba N, Koulemou K, Lewis R, Aplogan A, et al. Prevalence and risk factors of Lassa seropositivity in inhabitants of the forest region of Guinea: a cross-sectional study. PLoS Negl Trop Dis. 2009;3(11). doi:10.1371/journal.pntd.0000548
  • 56. Lukashevich I, Clegg J, Sidibe K. Lassa virus activity in Guinea: Distribution of human antiviral antibody defined using enzyme-linked immunosorbent assay with recombinant antigen. J Med Virol. 1993;40(3):210–217. doi:10.1002/jmv.1890400308
  • 57. Nimo-Paintsil SC, Fichet-Calvet E, Borremans B, Letizia AG, Mohareb E, Bonney JH, et al. Rodent-borne infections in rural Ghanaian farming communities. PloS One. 2019;14(4). doi:10.1371/journal.pone.0215224
  • 58. Sogoba N, Rosenke K, Adjemian J, Diawara SI, Maiga O, Keita M, et al. Lassa virus seroprevalence in sibirilia commune, Bougouni District, Southern Mali. Emerg Infect Dis. 2016;22(4):657. doi:10.3201/eid2204.151814
  • 59. Yalley-Ogunro J, Frame J, Hanson A. Endemic Lassa fever in Liberia. VI. Village serological surveys for evidence of Lassa virus activity in Lofa County, Liberia. Trans R Soc Trop Med Hyg. 1984;78(6):764–770. doi:10.1016/0035-9203(84)90013-0
  • 60. Fraser DW, Campbell CC, Monath TP, Goff PA, Gregg MB. Lassa fever in the eastern province of Sierra Leone, 1970–1972. Am J Trop Med Hyg. 1974;23(6):1131–1139. doi:10.4269/ajtmh.1974.23.1131
  • 61. Leirs H, Verhagen R, Verheyen W. Productivity of different generations in a population of Mastomys natalensis rats in Tanzania. Oikos. 1993; p. 53–60. doi:10.2307/3545308
  • 62. Hastie T, Tibshirani R, Friedman J. The elements of statistical learning: data mining, inference, and prediction. Springer Science & Business Media; 2009.
  • 63. Elith J, Leathwick JR, Hastie T. A working guide to boosted regression trees. J Anim Ecol. 2008;77(4):802–813. doi:10.1111/j.1365-2656.2008.01390.x
  • 64. Hijmans RJ, Phillips S, Leathwick J, Elith J. dismo: Species Distribution Modeling; 2017. Available from: https://CRAN.R-project.org/package=dismo.
  • 65. Hijmans RJ. Cross-validation of species distribution models: removing spatial sorting bias and calibration with a null model. Ecology. 2012;93(3):679–688. doi:10.1890/11-0826.1
  • 66. McCullagh P, Nelder JA. Generalized linear models. 2nd ed. CRC Press; 1989.
  • 67. Fisher-Hoch S, Hutwagner L, Brown B, McCormick J. Effective vaccine for Lassa fever. Journal of virology. 2000;74(15):6777–6783. doi:10.1128/jvi.74.15.6777-6783.2000
  • 68. WorldBank. Life expectancy at birth, total (years); 2020. https://data.worldbank.org/indicator/SP.DYN.LE00.IN.
  • 69. Shaffer JG, Grant DS, Schieffelin JS, Boisen ML, Goba A, Hartnett JN, et al. Lassa fever in post-conflict Sierra Leone. PLoS Negl Trop Dis. 2014;8(3):e2748. doi:10.1371/journal.pntd.0002748
  • 70. Redding DW, Moses LM, Cunningham AA, Wood J, Jones KE. Environmental-mechanistic modelling of the impact of global change on human zoonotic disease emergence: a case study of Lassa fever. Methods Ecol Evol. 2016;7(6):646–655. doi:10.1111/2041-210X.12549
  • 71. Antia R, Regoes RR, Koella JC, Bergstrom CT. The role of evolution in the emergence of infectious diseases. Nature. 2003;426(6967):658–661. doi:10.1038/nature02104
  • 72. Richmond JK, Baglole DJ. Lassa fever: epidemiology, clinical features, and social consequences. BMJ. 2003;327(7426):1271–1275. doi:10.1136/bmj.327.7426.1271
  • 73. CDC. NCDC Lassa cases; 2020. https://ncdc.gov.ng/data.
  • 74. Yun NE, Walker DH. Pathogenesis of Lassa fever. Viruses. 2012;4(10):2031–2048. doi:10.3390/v4102031
  • 75. Worldometers. Western Africa Population. https://www.worldometers.info/world-population/western-africa-population/.
  • 76. Cross M, Buddle B, Aldwell F. The potential of oral vaccines for disease control in wildlife species. Vet J. 2007;174(3):472–480. doi:10.1016/j.tvjl.2006.10.005
  • 77. Murphy AA, Redwood AJ, Jarvis MA. Self-disseminating vaccines for emerging infectious diseases. Expert Rev Vaccines. 2016;15(1):31–39. doi:10.1586/14760584.2016.1106942
  • 78. Freuling CM, Hampson K, Selhorst T, Schröder R, Meslin FX, Mettenleiter TC, et al. The elimination of fox rabies from Europe: determinants of success and lessons for the future. Philos Trans R Soc Lond, B, Biol Sci. 2013;368(1623):20120142. doi:10.1098/rstb.2012.0142
  • 79. Leirs H, Verheyen W, Michiels M, Verhagen R, Stuyck J. The relation between rainfall and the breeding season of Mastomys natalensis (Smith, 1834) in Morogoro, Tanzania. In: Annales de la société Royale Zoologique de Belgique. vol. 119; 1989. p. 59–64.
  • 80. Bonner PC, Schmidt WP, Belmain SR, Oshin B, Baglole D, Borchert M. Poor housing quality increases risk of rodent infestation and Lassa fever in refugee camps of Sierra Leone. Am J Trop Med Hyg. 2007;77(1):169–175.
  • 81. Bausch DG, Demby AH, Coulibaly M, Kanu J, Goba A, Bah A, et al. Lassa fever in Guinea: I. Epidemiology of human disease and clinical observations. Vector Borne Zoonotic Dis. 2001;1(4):269–281. doi:10.1089/15303660160025903
  • 82. Akhmetzhanov AR, Asai Y, Nishiura H. Quantifying the seasonal drivers of transmission for Lassa fever in Nigeria. Philos Trans R Soc Lond, B, Biol Sci. 2019;374(1775):20180268. doi:10.1098/rstb.2018.0268
  • 83. Nuismer SL, Remien CH, Basinski AJ, Varrelman T, Layman N, Rosenke K, et al. Bayesian estimation of Lassa virus epidemiological parameters: Implications for spillover prevention using wildlife vaccination. PLoS Negl Trop Dis. 2020;14(9):e0007920. doi:10.1371/journal.pntd.0007920
  • 84. Schreiner CL, Nuismer SL, Basinski AJ. When to vaccinate a fluctuating wildlife population: is timing everything? J Appl Ecol. 2020. doi:10.1111/1365-2664.13539
PLoS Comput Biol. doi: 10.1371/journal.pcbi.1008811.r001

Decision Letter 0

Nina H Fefferman, Amy Wesolowski

27 Sep 2020

Dear Dr. Basinski,

Thank you very much for submitting your manuscript "Bridging the gap: Using reservoir ecology and human serosurveys to estimate Lassa virus incidence in West Africa" for consideration at PLOS Computational Biology.

As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments. Several of the reviewers have raised very important and justified concerns that would need to be addressed.

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Amy Wesolowski

Associate Editor

PLOS Computational Biology

Nina Fefferman

Deputy Editor

PLOS Computational Biology

***********************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: In this paper, the authors take a machine learning approach to predicting rates of Lassa virus (LASV) spillover from its reservoir host, Mastomys natalensis, to human populations in West Africa. The authors use an approach that is distinct from previous studies by training their model exclusively on the reservoir data to derive a metric of regional spillover risk from (a) presence/absence studies of the reservoir rodent species, Mastomys natalensis, combined with (b) field studies testing for the presence/absence of LASV infection in M. natalensis in the same locality. The authors then use the pixel-by-pixel output layer from these combined predictions to derive a measure of the force of infection that drives zoonotic spillover of rodent-borne LASV to the human population. They then solve an SIRS model at endemic equilibrium for human populations at each corresponding locality, assuming infection to be due to spillover alone and compare the estimated “Recovered” population proportion at endemic equilibrium to previous measures of human seroprevalence in the region. The authors find a weak correlation between their predictions for human steady state seroprevalence by region and the actual data.
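The steady-state step summarized above can be made concrete with a minimal sketch. It assumes a constant human population in which births balance a per-capita death rate, a spillover-only force of infection, and no disease-induced mortality; the function names, parameter names, and rate values are hypothetical illustrations and are not taken from the manuscript or its repository.

```python
def sirs_equilibrium(foi, gamma, waning, death):
    """Equilibrium (S, I, R) proportions of a spillover-only SIRS model.

    Assumed equations, with S + I + R = 1:
        dS/dt = death - foi*S - death*S + waning*R
        dI/dt = foi*S - (gamma + death)*I
        dR/dt = gamma*I - (waning + death)*R
    """
    i_per_s = foi / (gamma + death)               # from dI/dt = 0
    r_per_s = gamma * i_per_s / (waning + death)  # from dR/dt = 0
    s = 1.0 / (1.0 + i_per_s + r_per_s)           # proportions sum to one
    return s, i_per_s * s, r_per_s * s


def incidence_from_seroprevalence(r, gamma, waning, death):
    """New infections per person per unit time implied by seroprevalence r.

    At equilibrium foi*S = (gamma + death)*I and gamma*I = (waning + death)*R,
    so foi*S = r * (gamma + death) * (waning + death) / gamma.
    """
    return r * (gamma + death) * (waning + death) / gamma


# Illustrative (hypothetical) rates per year, not estimates from the paper.
s, i, r = sirs_equilibrium(foi=0.005, gamma=12.0, waning=0.05, death=0.02)
sirs_rate = incidence_from_seroprevalence(r, gamma=12.0, waning=0.05, death=0.02)
sir_rate = incidence_from_seroprevalence(r, gamma=12.0, waning=0.0, death=0.02)
```

Under these assumptions, the same observed seroprevalence implies a higher infection rate when immunity wanes (`sirs_rate`) than when it is lifelong (`sir_rate`), which is why the choice between SIRS and SIR matters for the case estimates.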

The paper does an admirable job attempting to actually test some of these machine learning predictions about “spillover risk” with actual data in the human population, a practice that has not been previously attempted. However, I am not convinced that their test measure (seroprevalence in the human population) has any real biological significance, and I am concerned about the steady state assumptions of the ODE, as well as the extremely high case predictions generated under assumptions of waning immunity when much of the LASV literature (e.g. Bausch et al 2013) suggests that human immunity to LASV infection is long-term. My main concerns can be summarized as follows:

1. On line 436 of the Discussion, the authors raise the issue of contrasting dynamics of multiple genotypes of LASV. This is a critically important consideration for LASV and also deserves consideration from the perspective of the reservoir host, since previous work has shown that different sub-species of M. natalensis carry genetically distinct arenaviruses, all closely related to LASV but with varying capacity for infecting humans (see Gryseels et al. 2017 PLoS Pathogens). I would imagine that human serological responses to a number of these arenaviruses may be impossible to differentiate, but modeling their dynamics collectively amongst a variety of hosts is not correct. At the very least, the authors need to include some discussion of these considerations. Better, I would like to see the sub-species of M. natalensis reported or at least summarized so that we can better understand the accuracy of these risk layers that treat all M. natalensis as equal.

2. The authors highlight that their spatial ‘risk’ model is likely to be more accurate than previous models because it is limited to reservoir data only, and rodents are much more restricted in their movements than humans. That obviously could explain why their risk predictions only show a weak association with regional human seroprevalence (e.g. the humans are moving around), but it is important to note too that there is vast uncertainty in the regions where essentially no surveillance has taken place, as rodents are likely to be sampled (and in particular tested for LASV) in regions where humans are known to have been previously infected. In particular, the authors include Togo and Benin in their risk map, but it appears that neither country has reported any serosurveillance of rodents for LASV, which does not mean that these regions are low risk but rather that we don’t know anything about them.

3. The second layer pixel map for whether a region is deemed to be LASV positive vs. negative is derived from PCR tests of M. natalensis captures for LASV. Intriguingly, the authors choose to represent the region as LASV negative if either serology or PCR-based results in rodents show no evidence of LASV infection. This is reasonable, given that a seronegative result shows no back history of exposure. However, the authors only designate a region as LASV positive if there is PCR-based detection of LASV infection in M. natalensis, and no mention is made of what is done if there is not PCR-based evidence but if the rodent population showed seropositivity. I am concerned that PCR-based detection is unlikely to pick up on seasonal shedding of virus (which we know is important for LASV – see Akhmetzhanov et al. 2019 Phil Trans Roy Soc), but the metadata posted to github appear to not show the seasonality of infection. In fact, some of the diagnostic methods in the data are listed as “various” which provides no information at all. I would like to understand if there were positive serological data in the rodents that were excluded and why.

4. Point #3 leads me to also highlight the absence of incorporation of any signature of seasonality in infection risk in this paper – risk is measured exclusively as a spatial effect under equilibrium assumptions which just is not accurate. We know full well that there is a risk season for Lassa which takes place in the Dec-Jan months that correspond to the dry season and the rodent breeding season and this is barely even discussed, let alone explored in detail.

5. In addition, building on point 4, the authors use a somewhat strange approach to predicting seroprevalence spatially, assuming the absence of spatial fadeouts of human infection which are almost a certainty for a rare zoonosis with a distinctive seasonality. It would be much more compelling if they could make this a catalytic model predicting age-seroprevalence (e.g. Muench 1959) and a constant hazard of seroconverting across an individual’s lifespan. I did not go through all the reported human data in detail, but at least a subset of it appears to have age data associated. This paper would be much stronger if the authors made an attempt at this.

6. Additionally, the authors report estimated seroprevalence and incidence for assumptions of both SIRS and SIR. There are certainly compelling studies out there suggesting that human Ab immunity to LASV can be lifelong (e.g. Bausch et al. 2013 ASTMH) which contradicts the SIRS assumptions and provides a more reasonable (i.e. lower) estimate of cases. I would like to at least see the emphasis on SIRS justified and it would be useful to see some comparison of a boosting model where low Ab titers wane but can be boosted upon re-exposure. Given that humans are not modeled as infectious, it should not change the dynamics but it might fit the data better… Which leads me to ask—how well do these models fit the data? I would love to see some statistical output of that line’s fit to the data in Figure 4.
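The catalytic model suggested in point 5, and the effect of antibody waning raised in point 6, can be sketched as follows. This is an illustration only: a constant hazard of seroconversion gives seroprevalence 1 - exp(-hazard × age), and an optional seroreversion rate adds waning. The function names, the grid search, and the age-group counts are assumptions; the counts are synthetic, generated from the model itself, and are not survey data.

```python
import math


def seroprevalence(age, hazard, seroreversion=0.0):
    """Expected seropositive fraction at a given age under a constant hazard."""
    total = hazard + seroreversion
    if total == 0.0:
        return 0.0
    return hazard / total * (1.0 - math.exp(-total * age))


def log_likelihood(hazard, ages, n_tested, n_positive, seroreversion=0.0):
    """Binomial log-likelihood (dropping terms that do not depend on hazard)."""
    ll = 0.0
    for age, n, k in zip(ages, n_tested, n_positive):
        p = seroprevalence(age, hazard, seroreversion)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        ll += k * math.log(p) + (n - k) * math.log(1.0 - p)
    return ll


def fit_hazard(ages, n_tested, n_positive, seroreversion=0.0):
    """Grid-search maximum-likelihood hazard between 0.001 and 0.5 per year."""
    grid = [step / 1000.0 for step in range(1, 501)]
    return max(
        grid,
        key=lambda h: log_likelihood(h, ages, n_tested, n_positive, seroreversion),
    )


# Synthetic check: counts generated from a known hazard of 0.02 per year.
ages = [5, 15, 25, 35, 45, 55]
n_tested = [1000] * len(ages)
n_positive = [round(n * seroprevalence(a, 0.02)) for a, n in zip(ages, n_tested)]
estimated_hazard = fit_hazard(ages, n_tested, n_positive)
```

Fitting the same age-stratified counts once with `seroreversion=0.0` and once with a positive value would give a direct comparison of the lifelong-immunity and waning-immunity assumptions.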

A few additional line-by-line comments are listed below:

Author Summary:

- SARS-2 coronavirus is incorrect terminology. It is SARS-CoV-2 and coronavirus is already represented in the middle of the term

Main text

- Line 44: It is important to caution that these assumptions probably don’t hold for all reservoir species (e.g. bats are highly mobile)

- Line 49: I know that ambisense viruses are still classed within the negative sense viruses for taxonomic convenience but it is a bit strange here to see that you class Lassa as both negative sense and ambisense. Suggestion to just stick to ambisense here.

- Line 91-92: What about sub-species? This is super important to Lassa virus dynamics – see comments above

- Line 114: PCR tests really need a seasonal component to pull off this Lassa positive/negative risk map. Why are serological studies not included here but are included in evaluation of Lassa negative regions? See comments above.

- Line 254: Spillover risk is modeled as FS, where F is the regional FOI from the reservoir and S is the human population density, which assumes this to be a density-dependent transmission process. Can you justify this assumption? This is what leads you to conclude that Nigeria is at the highest risk (they have the most people), but that does not necessarily mean that there will be a higher number of human-rodent contacts. Unless rodent density increases with human density (which it may), it seems like a single rodent would have a fairly limited movement range and likely a discrete number of likely human contacts.

- Line 258: This lack of mortality effect is a strong assumption in a disease that has known mortality. I accept the need to model it this way, but how might consideration of this change results?

- Figure 1: It seems that you can’t say anything really about the risk of LASV spillover in Mauritania, Burkina Faso, Senegal, Niger, Benin, or Togo since there are no rodent infection data to actually evaluate.

- Figure 3: As in above, this figure highlights that the “risk map” of potential spillover for Lassa is really just an absence of surveillance. The regions where infected Mastomys have been previously tested pop up and those regions where they have not do not.

- Figure 4: Can you provide some statistical output on the findings in this figure?

- Line 345: This is extrapolating way too far – I would advocate restricting the projections to the more conservative spatial estimates. It seems like this is reported here just to have a dramatic number.

- Great job with the SI and the github repo. Code and data are nicely organized and easily accessible.
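The concern raised for Line 254, that a spillover rate of the form F × S scales linearly with the number of people, can be illustrated with a small comparison. The saturating alternative below is a hypothetical functional form chosen only to show what limited rodent-human contact would imply; it does not come from the manuscript, and all names and values are assumptions.

```python
def linear_spillover(foi, susceptibles):
    """Density-dependent form: infections scale linearly with susceptibles."""
    return foi * susceptibles


def saturating_spillover(foi, susceptibles, half_saturation):
    """Hypothetical alternative: contacts saturate as the population grows.

    Approximately foi * susceptibles when susceptibles << half_saturation,
    and approaches foi * half_saturation for very large populations.
    """
    return foi * half_saturation * susceptibles / (half_saturation + susceptibles)


# Illustrative values only: a tenfold difference in population size.
foi = 0.01
small, large = 1_000, 10_000
linear_ratio = linear_spillover(foi, large) / linear_spillover(foi, small)
saturating_ratio = (
    saturating_spillover(foi, large, half_saturation=5_000)
    / saturating_spillover(foi, small, half_saturation=5_000)
)
```

Under the linear form a tenfold larger population yields tenfold more infections, whereas the saturating form yields a smaller increase, so country-level totals would be less dominated by population size.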

Reviewer #2: Note: review comments also uploaded as a PDF, which is formatted better so may be preferable for the authors to work through.

Reviewer summary

This paper presents a method to predict zoonotic spillover risk from environmental data. Boosted classification trees were used to model the relationship between environmental variables, the reservoir population and animal Lassa virus prevalence to ultimately predict the incidence of human Lassa virus infection. The authors propose that the methodological novelty lies in the distinction between the vector distribution and pathogen reservoir layers (used to generate predictions) and the human seroprevalence layer (used to model the link between predictions and observations). The model predicts a far greater number of Lassa virus infections than current, outdated estimates and predicts their distribution across countries in sub-Saharan Africa. These results imply that existing surveillance likely misses the majority of symptomatic and asymptomatic infections and highlights key geographical areas for enhanced virus surveillance. Much of the discussion and implications pertain to future work and routes to understand the (often substantial) remaining variance between predicted and observed human seroprevalence. To me, the main hypothesis being tested is “does the presence of Lassa infected M. natalensis predict magnitude of human seroprevalence?”, and the results suggest “not very well on its own”.

I am not an expert on zoonosis nor Lassa fever, so I am reviewing this from the perspective of an interested infectious disease epidemiologist with a computational background. I apologize if my questions or comments on the disease natural history seem obvious or silly!

The paper is very well written; the flow between sections is great, the introduction sets the scene nicely, and I left the methods feeling qualified to understand the remaining sections. The code sharing and accompanying readme are clear, and I was able to re-run their analyses with no problems (though I do not have a Mathematica license).

This paper appears to be an important next step in a fairly sparse literature (at least from my cursory searching). However, right now it is predominantly an eloquent presentation of a method (with some novel ideas) but with only a cursory consideration for the biological implications. I am also a little confused by the methods: combining the two layers by multiplying them, then using the binary classifier score as a predictor for a continuous outcome in the GLM seems like a weird pipeline. Though this may be my misunderstanding. Given that the binary classification score (from the first two layers) appears to have little explanatory power in the GLM, I am not sure what the results add over the referenced Mylne et al. paper. I think these methodological concerns need to be clarified and the implications of the results (rather than just the unrealized potential of the method) made clearer if this is to warrant publication in PLoS Comp Biol.

Major comments

- The authors claim to be modelling spillover risk, but I am not convinced that is what their model is predicting. I am probably missing something crucial, but it seems odd to train the model on binary data (presence/absence of LASV or M. natalensis) and to then use the predictions as a quantitative estimate of spillover risk. The model may predict a high classification score for presence of LASV infected rodents in a pixel, but does that necessarily mean that it is predicting higher LASV prevalence in the reservoir population? Is it distinguishing between a small rodent population with low prevalence and a large population with high prevalence? It seems like those data points are weighted the same here. Therefore, from what I understand, the method is only predicting the binary presence or absence of at least 1 LASV infected rodent which is what Mylne et al. did (they call it suitability index rather than spillover risk). If so, what do the authors think the GLM would look like if Mylne et al’s suitability index were used instead?

- The GLM clearly does work given that it explains some variance and the slope is significant. However, if the predictor is only binary, then isn’t the comparison to magnitude of human seroprevalence somewhat flawed? For example, a classifier for “absence, low LASV prevalence in rodents, high LASV prevalence in rodents” would make more sense if the aim is to predict magnitude of risk. Is a lot of quantitative information being thrown away converting to a binary prediction, and is this why the GLM appears to explain such little variance? Maybe I am trying to be too mechanistic, but I think this would be helped if the methods and results clarified why this approach makes sense.

- Figure 2 and associated methods: I have been trying to intuit why simply multiplying the likelihoods makes sense, but something feels off. The D_M layer is predicting the likelihood of M. natalensis presence given the environmental variables, whereas the D_L layer is predicting the likelihood of LASV circulation given the environmental variables and the presence of M. natalensis. I would recommend adding some text to explain the intuition behind this. A bit of a half-baked thought, is it related to the conditional probability P(LASV | rodents) * P(rodents)? If so, is multiplying the likelihoods the right approach?

- Discussion of the implications: does this new denominator change our understanding of the Lassa fever clinical spectrum and epidemiology? For example, if the IFR is 2%, then 3 million infections implies 60,000 deaths. Is this consistent with underreporting of LF-associated deaths, or does it imply that the IFR is set too high?

- A positive comment – combining a SIRS model with predicted seroprevalence and immune waning estimates to get incidence is really neat!

- Figure 4 is pretty underwhelming: it leaves the reader feeling unconvinced that the modelled relationship is real or useful. The plot should at least have confidence intervals on the model line. Also is this from the weighted or unweighted version? And relating to Figure 5 (see minor comments), is this actually doing much better than assigning each pixel the mean seroprevalence? The authors derive a null model for Lassa seroprevalence but don’t really use it in this way.

- The discussion is well written and raises a lot of interesting points. However, it does not discuss the results from the D_M and D_L layers at all. For example, it is not clear how to interpret the result that maximum precipitation and precipitation contingency are the 2 most important predictors in both layers. Are the key environmental predictors the same or different to previous work, and are there any novel biological insights there? L467 hints at this, but I didn’t feel like the discussion clarified what these insights are. There is some text in the supplement, so this should at least be referred to.
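The question raised above about multiplying the D_M and D_L layers can be framed with the chain rule of probability: if D_M is read as P(rodent present) and D_L as P(LASV present | rodent present), their product is P(rodent and LASV both present). The sketch below assumes that reading and treats classifier scores as probabilities, which may not hold if the scores are uncalibrated; the pixel values are hypothetical.

```python
def combined_risk(d_m, d_l):
    """Pixel-wise product of two layers, read as a joint probability.

    d_m: per-pixel P(reservoir present)            (assumed interpretation)
    d_l: per-pixel P(pathogen present | reservoir) (assumed interpretation)
    """
    if len(d_m) != len(d_l):
        raise ValueError("layers must cover the same pixels")
    return [p_m * p_l for p_m, p_l in zip(d_m, d_l)]


# Hypothetical scores for three pixels.
d_m = [0.9, 0.2, 0.6]
d_l = [0.5, 0.5, 0.0]
risk = combined_risk(d_m, d_l)
```

Because each factor lies between 0 and 1, the product never exceeds either layer, so a pixel scores highly only when both the reservoir and the virus are predicted to be present. The product does not distinguish a small rodent population with low prevalence from a large one with high prevalence, which is the limitation the reviewer points out.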

Minor comments

- The omission of a time component in the model seems important. The reader doesn’t know when e.g., the rodents were captured relative to e.g., when seroprevalence was measured. Is there some evidence to support the assumption of steady-state seroprevalence rather than increasing seroprevalence over time? Are there any longitudinal or repeated cross-sectional studies?

- I think there is an opportunity to further test the sensitivity of the D_M layer. You have an independent dataset for true positive M. natalensis presence: the data used to train the D_L layer. Does the D_M layer accurately predict the presence of M. natalensis in those locations? This might be a useful small supplementary analysis to support the accuracy of the D_M layer.

- The AUC for the same predictors (absence/presence of M. natalensis or Lassa) looks almost identical to those in Table 2 of Mylne et al. Does this mean anything, given that the top environmental predictors are different?

- L41: It isn’t clear to me how the issue of human seroprevalence not coinciding spatially with location of infection is resolved here. When comparing the model predictions to observed seroprevalence in the GLM, is this issue not being re-introduced anyway? Consider clarifying.

- L70+ The date range for the animal and LASV data is not mentioned, but it is for the human seroprevalence data. Would be good to see this to understand how comparable the dates are.

- L70+: Similarly, the survey data are not described in much detail. I understand that there are so many sources that this would take up too much text, but a supplementary table, particularly for the human seroprevalence data, describing sample sizes, demography, protocol etc might be useful.

- L121: The text suggests that the human seroprevalence data are surveys of all arenaviruses antibodies, not specifically Lassa. Perhaps this is obvious to someone with experience with these pathogens, but is there a reason why the included studies were not LASV specific? Are the assays highly cross-reactive or are there no other arenaviruses that infect humans? This should be clarified.

- L130: I understand the decision to only include land types that have been consistent for 20 years from a methodological perspective, but does this not ignore a major source of spillover risk – increased activity at the animal-human interface? It’s not my area so I may be wrong, but if this is important, its implications and limitations should be discussed in the main text. From the abstract of the cited Gibb et al. paper: “Although the recent increase in LF case reports is likely due to improved surveillance, recent studies suggest that future socio-ecological changes in West Africa may drive increases in LF burden”.

- Related, L444 mentions time but not in the context of land type. Perhaps mention this here.

- L177: Is a Bonferroni correction for multiple comparisons appropriate here? Probably not given that this is just an arbitrary pre-filtering step, but something to consider.

- L207: Maybe I’m being slow, but I don’t follow the logic for how false negatives make the classifier conservative. Please clarify.

- L209: Dumb question, but is this spatial distance or minimum distance across environmental variables?

- L258: Is it not a fairly easy addition to the model to assume some deaths? 2% mortality is a lot of deaths when you’re predicting millions of infections! How would this relate to how we understand LF epidemiology now? Though I understand if this would mess up eq 2-4.

- L266: is there evidence that arenavirus seroprevalence is at steady state or is this an assumption?

- L280: Refers to the data section, but there is no information on the WorldPop data there. Is this text missing or is the reference erroneous? (I think it should refer to S1 Appendix).

- L282: Assumed a mean lifespan of 50 years for all locations. Why not use country specific estimates?

- L281: It took me an embarrassingly long time to intuit these equations. The equations are correct, but I couldn’t get my head around the idea of there being a constant “steady-state population size” if the real population size is growing (i.e. when b > dN). I think I see now that this is a possible equilibrium because i) the incidence rate does not depend on the I compartment (as it would in a textbook example) and ii) crucially, the birth rate b is assumed not to be density dependent. I couldn’t understand how the incidence of new infections could be constant while the population is growing and the proportion in each compartment stays the same. But I think it works because b just feeds in a constant number of new susceptibles (rather than an increasing number if the term were bN, which I think would be more standard). Nothing to be done if this is a trivial realization, but some text to explain the intuition for why this steady state exists and the rationale for a flat birth rate might help a confused and pedantic reader like me!

o I found page 16 of this: https://arxiv.org/pdf/2004.04675.pdf and this paper which might be helpful: https://doi.org/10.3390/math5010007. Not suggesting to cite.

o Also page 88 of the “an introduction to infectious disease modelling” by Vynnycky and White was useful.
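The intuition in the L281 comment can be checked numerically. The sketch below assumes a generic SIRS model with a constant inflow of births b, per-capita death rate d, a constant force of infection F that does not depend on I, recovery rate gamma, and seroreversion rate lam. This form is reconstructed from the reviewer's description and is an assumption, not the manuscript's equations 2-4; all parameter values in the usage note are placeholders. Because dN/dt = b - dN under these assumptions, the population settles at N* = b/d.

```python
def sirs_derivatives(S, I, R, b, d, F, gamma, lam):
    """Right-hand side of an assumed SIRS model with flat birth rate b."""
    dS = b - F * S - d * S + lam * R   # constant births, infection, death, waning
    dI = F * S - (gamma + d) * I       # F is constant: it does not depend on I
    dR = gamma * I - (lam + d) * R
    return dS, dI, dR

def sirs_steady_state(b, d, F, gamma, lam):
    """Closed-form equilibrium (S*, I*, R*) of the model above."""
    N = b / d                              # from dN/dt = b - d*N = 0
    i_per_s = F / (gamma + d)              # I*/S* from dI/dt = 0
    r_per_s = gamma * i_per_s / (lam + d)  # R*/S* from dR/dt = 0
    S = N / (1 + i_per_s + r_per_s)
    return S, i_per_s * S, r_per_s * S
```

With placeholder values such as `sirs_steady_state(b=1000, d=0.02, F=0.01, gamma=12, lam=0.064)`, all three derivatives evaluate to approximately zero and the compartments sum to b/d, so the annual incidence F·S* is constant.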

- L292: At the moment, the authors present the immune waning estimate from McCormick et al. 1987 as an upper bound. Is there any evidence to suggest that waning might be even faster than the point estimate from McCormick 1987 (e.g., were there confidence intervals on lambda in that paper)? If so, this would be a useful scenario to present.

- L319: missing space before the bracket.

- Figure 5: a histogram of residuals might be a useful subplot here. It’s hard to assess how well the model is doing from this plot.

- L376: This is a really key paragraph, but a bit more information on the surveys, for example, predominant housing type in particular outliers above/below the prediction line, could be useful. Was this information not available to include in the GLM here?

- I don’t know this literature, but it looks like there may be a few relevant studies tackling a similar problem that are not discussed, e.g., https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/2041-210X.12549 which also provides quantitative spillover estimates
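The minor comment above on testing the D_M layer against the independent M. natalensis presence locations from the D_L training data amounts to a sensitivity calculation. A minimal sketch follows, assuming the layer's predicted scores have already been extracted at those locations; the function name and the 0.5 threshold are hypothetical choices, not taken from the authors' code.

```python
def sensitivity_at_points(predicted_scores, threshold=0.5):
    """Fraction of known-presence locations scored at or above the threshold."""
    if not predicted_scores:
        raise ValueError("need at least one presence location")
    hits = sum(1 for score in predicted_scores if score >= threshold)
    return hits / len(predicted_scores)
```

For instance, `sensitivity_at_points([0.9, 0.7, 0.4, 0.55])` gives 0.75, meaning three of four known-presence locations would be classified as suitable at that threshold.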

Comments on code

Really nicely documented, well written and clean code. Great job. I ran everything line by line and found no obvious bugs! Only small comment is to not use reserved words (eg. data, grid) for user defined objects.

Generate_Reservoir_Layer.r

L79: in the object `classi.dat.rod`, none of the pseudo-absence variables have countries or confidence. Just checking that this doesn’t impact anything downstream.

Reviewer #3: Improved methods to track and investigate determinants of zoonotic spillovers will enhance our ability to mitigate their risk to public health. I appreciate the authors’ efforts to improve the methods we use to understand these phenomena, and to unravel remaining questions about the distribution of risk for human Lassa fever infections. Their inclusion of serosurveys into estimates is very reasonable and the approach could be of interest to readers.

The paper would benefit from a framing that includes more about what we do and do not know about the distribution of human risk for Lassa, and what we would do differently if we did know these things to motivate the study. Currently, the introduction includes elements from the methods, results, and discussion and the paper would be easier to follow if the role of each section of the paper were more clearly delineated.

More specific comments on each section are below:

Abstract: The abstract is very slim on results and offers few conclusions about what the next steps should be, given the new findings from this study. Consider cutting back on the background and rationale to make room for results and some discussion.

Introduction

Lines 3-4: Not all zoonotic pathogens circulate in a wildlife reservoir.

Lines 28-29: “As a result, the extent to which predicted risk explains the realized variation in human exposure to the pathogen is unclear.” Might one way to evaluate this also be surveys to better understand exposures?

The introduction would be more compelling if it focused on what is and is not known about risk to humans, a summary of what we already know about the frequency and distribution of human infections, and the gaps in our understanding of risk. Limits of the published models to estimate this risk are important to mention, but should also be placed within the context of measured seroprevalence. A clear statement of the study objective(s) would also be helpful for the reader.

Data

It would be useful to say something about when data were collected and the geographical extent they represent. It becomes clear as the paper goes along that the data are obtained through literature review, but it would be useful to have this stated upfront, perhaps with some of the methods used to identify the papers with data applicable to the analysis. I’m not clear on why the description of where the data come from is not included in the section called methods.

The secondary data used in this study span many decades – it would be useful for the reader to have a table outlining the location and year(s) for the data used in the models.

Line 94: The study area is unclear.

How are pixels defined?

Lines 116-118: Can the authors provide some rationale for the definition of positive and negative pixels? Are there any limitations to the approach authors used to define positive and negative pixels?

Lines 122-123: Regarding the requirement for individuals to be sampled randomly, would it not be sufficient for households within a village to be sampled at random, and individuals from those households included? Additional details about the literature reviews, including the papers that were excluded from the study, would be useful for the reader to understand the process. (See comment about summary table above, which could also possibly include details about sampling methods.)

Authors have restricted the serosurveys they included in the models, for good reason, to those that were conducted in some kind of random sample of the population. However, would be good to compare the known occurrences of infections with the resulting model predictions to see how they correspond.

What are the limitations of serosurveys in humans? For example, how good are the assays that measure previous infection and how long do antibodies last? Some information about the antibody response would be useful to inform interpretations of the yearly incidence estimates. If the authors’ primary objective is to estimate the number of infections, I would assume that limitations in the measurement of past infection and assumptions about interpretations of these results should be discussed in detail – both in the methods and discussion section. Since the resulting estimates of the numbers of cases each year from this analysis have a very wide range, might it be useful to conduct some sensitivity analyses around some of the key assumptions about what a serologic response means? The authors correctly note that greater insight into the natural history of infection would be useful, but which points in particular would be most important to understand better for this kind of model?

Results

Figure 5: There is little variation in the predicted seroprevalence across West Africa – although the ‘hotspots’ have a predicted 18-20% seroprevalence, 12% prevalence in other areas also seems high. Most of the serosurveys used to inform this model have estimates that are outside of the entire predicted range.

Lines 340-341: It’s unclear why the authors calculate estimated yearly infections using the assumption of uniform risk.

Lines 345-346: What assumptions were made about duration of antibody protection to estimate reinfections? Additionally, information about the natural history of Lassa infections in humans would also be useful. How many infections are expected to be symptomatic? What proportion of people infected will die?

Are there age or gender differences in risk or in seroprevalence? How were these accounted for in the model?

Predicted seroprevalence is 12% even in areas without the known reservoir host. This seems like a limitation in the ability of the model to predict human risk. Indeed, the fit of the model isn’t great – would be good to include more about this in the discussion section.

Discussion

This section could be improved by reducing the overlap with the results.

Lines 383-385: Demographic and Health Surveys are routinely collected from West Africa and these include home building materials. These data are publicly available and could be incorporated into the model, as the authors suggest.

Lines 394-396: While the authors identified areas of relatively higher risk, or higher numbers of infections because of larger population sizes, it’s unclear why other areas that still have significant risk shouldn’t be targeted for surveillance and risk mitigation strategies. Why are relative differences in risk more important than absolute risk?

I would be interested in hearing more from the authors about the modest correlations with the human seroprevalence data. Indeed, many of those studies are very old and it is likely that the conditions that drive Lassa transmission have changed in the intervening years. What are the pros and cons of the data used for the models and why is the correlation so modest? What else are we missing about this disease system?

Lines 392-394: The authors mention multiple times the use of vaccines for rodents. As far as I know, there are no rodent vaccines for Lassa. There are efforts underway to develop human vaccines. It seems that part of the rationale for this study is to inform prevention; if true, it would be useful to provide a more informed discussion of the possible strategies and how their deployment might be facilitated by this research.

Line 444: The authors bring up the idea of Lassa elimination which seems to be an unrealistic goal for a pathogen endemic in a wildlife host. Are there examples of elimination from other disease systems that the authors can reference to support this idea?

The authors state (467-469): “…the structured machine-learning models we develop here provide insight into what aspects of environment, reservoir, and virus, contribute to spillover, and the potential risk of subsequent emergence into the human population.” It would be good to more completely discuss each of these points in terms of what was previously known and what we learned new from this study for each of these points to highlight the added value of this work.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: James Alexander Hay

Reviewer #3: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms, etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, PLOS recommends that you deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions, please see http://journals.plos.org/compbiol/s/submission-guidelines#loc-materials-and-methods

Attachment

Submitted filename: PCB_Lassa.pdf

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1008811.r003

Decision Letter 1

Nina H Fefferman, Amy Wesolowski

31 Dec 2020

Dear Dr. Basinski,

Thank you very much for submitting your manuscript "Bridging the gap: Using reservoir ecology and human serosurveys to estimate Lassa virus incidence in West Africa" for consideration at PLOS Computational Biology.

As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we still have concerns about whether key points in the requested resubmission were adequately addressed. In particular, there are three key points: 1) how the staggering number of estimated LASV human infections per year can be validated or justified as realistic, 2) a human infection model using an age-seroprevalence approach, and 3) the use of SIRS dynamics over SIR. If these points can be addressed adequately, we would like to invite the resubmission of a significantly revised version that takes into account the reviewers' comments.

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Amy Wesolowski

Associate Editor

PLOS Computational Biology

Nina Fefferman

Deputy Editor

PLOS Computational Biology

***********************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: I reviewed this paper the first time some months back and raised the following main concerns:

1. Inability of the authors to distinguish data from different genotypes of LASV and/or sub-species of M. natalensis

2. Vast uncertainty in the accuracy of predictions from regions that have never been sampled for rodents, in particular Mastomys, at all

3. Confusion over whether sites that were LASV Ab+ for rodents but PCR- were considered LASV+ in the pathogen layer

4. Lack of seasonality included in either the reservoir or pathogen layer.

5. A desire to see human infections modeled using an age-seroprevalence approach.

6. Concern over the emphasis on SIRS dynamics over SIR with very little support for this decision.

On the whole, the authors have done a fairly thorough job addressing these concerns: they have directly adapted their analyses in response to points #1-3, discussed but not addressed #4 and 5, and taken a haphazard approach to their response to point #6. I can accept that #5 (age-seroprevalence) will not be explored in this paper but I still have questions over the two outstanding points:

1. Lack of seasonality included in either the reservoir or pathogen layer:

Point taken that dynamics are assessed over long timescales and that spatial, environmental factors are accounted for. However, given the sparseness of the data for some of these localities, as well as the short lifespan of the rodent hosts (meaning Ab+ data may be difficult to acquire), I still think it is possible that a few rodents are sampled at the wrong time of year to get a false negative LASV pixel result for a given region. Along these lines, do you really think 5 Ab negative rodents is enough to conclude that a site is LASV(-)? My understanding is that the average lifespan of M. natalensis is only ~6 months, so I would not be surprised at all to find 5 seronegative individuals in a site where Lassa really does occur. Can you provide a baseline seroprevalence in rodents of a LASV(+) site for comparison?

I like the Excel workbooks with the raw data added to the Github repo, but neither the M. natalensis nor the Lassa tables report month or season of each data point. If these data exist, they should be reported and, ideally, included in the regression model for the reservoir and pathogen layers. At a very minimum, I would like the authors to summarize the seasonality of input data in some way to show that there is not some glaring inconsistency whereby a rodent was never sampled at the time of year relevant for the disease in question in a particular area.
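The question of whether five seronegative rodents are enough to call a site LASV(-) can be made concrete with a simple binomial calculation: if rodents are sampled independently from a site with true seroprevalence p, the chance that all n are seronegative is (1 - p)^n. The sketch below is a back-of-the-envelope illustration; the function names and the example seroprevalence are assumptions, not values from the manuscript, and independence of samples is itself a simplification.

```python
import math

def prob_all_negative(seroprevalence, n_sampled=5):
    """Chance that every sampled rodent is seronegative at a truly positive site."""
    if not 0.0 <= seroprevalence <= 1.0:
        raise ValueError("seroprevalence must lie in [0, 1]")
    return (1.0 - seroprevalence) ** n_sampled

def min_sample_size(seroprevalence, max_miss_prob=0.05):
    """Smallest n for which (1 - p)^n <= max_miss_prob."""
    if not 0.0 < seroprevalence < 1.0:
        raise ValueError("seroprevalence must lie strictly between 0 and 1")
    return math.ceil(math.log(max_miss_prob) / math.log(1.0 - seroprevalence))
```

Under a hypothetical rodent seroprevalence of 20%, `prob_all_negative(0.2)` is about 0.33, so roughly one in three truly positive sites would be misclassified from five rodents, and `min_sample_size(0.2)` indicates 14 rodents would be needed to push that below 5%.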

2. Concern over the emphasis on SIRS dynamics over SIR with very little support for this decision.

The authors attempt to address these concerns, and in fact, they do a decent job of emphasizing that support is fairly weak for SIRS assumptions in the results and discussion; however, the methods are still entirely focused on an SIRS approach and now inconsistent with the rest of the paper. This discrepancy needs to be addressed – see specific line by line comments below. Additionally, the authors note in their response that cell-mediated immunity is known to play a role in LASV response in humans and that Abs wane with time—this may be true, but those dynamics are still not SIRS. If humans remain immune but seronegative, they should move into a different class that is Ab negative but certainly not susceptible; these dynamics could be modeled, and I’ll emphasize again that information on the age structure of the serological response would be helpful in assessing this.

3. One other point that I noted on my re-read was that the authors spend a lot of time at the end of the paper predicting annual “cases” (e.g. Fig 6). Perhaps I’ve become too steeped in this distinction because of COVID, but I would advocate for changing the terminology to “infections” rather than “cases”, which to me implies symptoms. If LASV infections in West Africa are anywhere near as frequent as suggested in this paper, then my guess is that the vast majority are asymptomatic or mildly symptomatic and go unnoticed. Given this, it would be better to describe them as infections instead of cases.

Line-by-line comments:

Abstract:

No need to address this in the abstract necessarily, but you mention ‘West Africa’ throughout the manuscript and show a map of the UN-defined region in all of your figures. It would be helpful to formally define this region (or cite a UN source) somewhere in the text so that the geographic extent of analysis does not seem arbitrary.

Additionally, I think the 4 million (+) annual infections from the SIRS model is a fairly unreasonable projection, and I would suggest leaving this finding out of the abstract.

Author Summary:

Is Nigeria truly at risk for emergence of “new strains” of Lassa virus or just at risk for ‘emergence’? The authors do not report any evidence as to what genotypes to expect in one region vs. another.

Main paper

Lines 148-153: given the short lifespan of M. natalensis and the seasonal dynamics of Lassa, it seems that 5 seronegative rodents might be easy to acquire. What is the comparative seroprevalence in Lassa-positive regions?

Line 163-167: It would be helpful to see a PRISMA diagram in the supplement that explains how you compiled your data for each layer: what terms were searched and which surveys were excluded at each point in the analysis. You are very clear about the search terms used for the rodent infections (and you include the helpful Workbooks on Github), but less so here for the humans, and I can’t find the raw human data in the repo. What terms were searched and what serosurveys were excluded at each point in the analysis?

Table 3: Edit % Pos. to % Seropositive. Edit table title to “serosurvey” instead of “survey”

Line 201/Figure 2/Line 278: Why do we assume SIRS as default? In the Results, you report under SIR assumptions but no mention is made of this in the Methods.

Line 307: Again, what about the alternative version in which immunity is maintained? Also, as mentioned above, transition to a cell-mediated immune class should not give an SIRS-like dynamic, as individuals who wane from the R class will not move back to S.

Line 340: Again, you emphasize this rate of seroreversion extensively. I would suggest presenting the uncertainty in this rate, and the two different models derived from that uncertainty, at the beginning of this methods section to make it clear that two possible outcomes (and a range in between them) are present.
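To illustrate how strongly the seroreversion rate drives the range between the SIR and SIRS outcomes, consider a simplified steady-state balance: if seropositive individuals leave the seropositive class by death (rate d) or seroreversion (rate lam), then new infections per year must equal seroprevalence × population × (d + lam). The sketch below rests on that simplification, which neglects the infectious period and disease-induced mortality and is not claimed to be the manuscript's exact equation. The 50-year lifespan follows the assumption noted in the earlier review; the other values in the usage note are placeholders.

```python
def annual_infections(seroprevalence, population,
                      lifespan_years=50.0, seroreversion_rate=0.0):
    """Steady-state yearly infections implied by an observed seroprevalence.

    seroreversion_rate = 0 corresponds to lifelong seropositivity (SIR-like);
    a positive rate corresponds to waning antibodies (SIRS-like).
    """
    death_rate = 1.0 / lifespan_years
    return seroprevalence * population * (death_rate + seroreversion_rate)
```

With a placeholder seroprevalence of 15% in a population of one million, `annual_infections(0.15, 1_000_000)` gives about 3,000 infections per year, while a hypothetical seroreversion rate of 0.064 per year raises this to about 12,600, roughly a fourfold difference arising from this single parameter.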

Line 372: As mentioned above, I would like to know if season of sampling for rodents influences the Lassa risk map. Why do you think Lassa is restricted in the west and east if not as a result of human density? Can you discuss this in the Discussion?

Fig 4. What is shown on the x-axis? Probability of a given pixel being LASV (+)? If so, label as such.

Fig 5. You only show serosurveys with pop sizes greater than 50 – why is this? According to Table 3, it looks like some of these surveys must have had very few individuals tested – how did you account for this in your model? This is not clear from the supplement or the main text.

Line 388: As mentioned above, I would suggest replacing “cases” (implying symptomatic cases) with “infections” that might go unnoticed and explain some of these results. This terminology persists throughout the following paragraph.

Table 4: Suddenly, all assumptions here switch to SIR when previously the paper emphasized SIRS. This distinction needs to be clarified and consistent throughout. I would suggest presenting results for SIR assumptions only and then including SIRS results in the supplement.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: No: Human data for seroprevalence assays appears to be missing from the github repo, or at least difficult to find, as compared with rodent and LASV data.

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1008811.r005

Decision Letter 2

Nina H Fefferman, Amy Wesolowski

17 Feb 2021

Dear Basinski,

We are pleased to inform you that your manuscript 'Bridging the gap: Using reservoir ecology and human serosurveys to estimate Lassa virus spillover in West Africa' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. 

Best regards,

Amy Wesolowski

Associate Editor

PLOS Computational Biology

Nina Fefferman

Deputy Editor

PLOS Computational Biology

***********************************************************

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1008811.r006

Acceptance letter

Nina H Fefferman, Amy Wesolowski

26 Feb 2021

PCOMPBIOL-D-20-01255R2

Bridging the gap: Using reservoir ecology and human serosurveys to estimate Lassa virus spillover in West Africa

Dear Dr Basinski,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Alice Ellingham

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Appendix. Details on the predictors used in the model and model fits.

    (PDF)

    Attachment

    Submitted filename: PCB_Lassa.pdf

    Attachment

    Submitted filename: Response_To_Reviewers.pdf

    Attachment

    Submitted filename: ResponseToReviewers.pdf

    Data Availability Statement

    Our full data-set and the script files used to fit the models are available in the github repository: https://github.com/54481andrew/pathogen-spillover-forecast.


    Articles from PLoS Computational Biology are provided here courtesy of PLOS
