Skip to main content
UKPMC Funders Author Manuscripts logoLink to UKPMC Funders Author Manuscripts
. Author manuscript; available in PMC: 2022 Oct 1.
Published in final edited form as: Spat Stat. 2022 Jun 29;51:100679. doi: 10.1016/j.spasta.2022.100679

Combining school-catchment area models with geostatistical models for analysing school survey data from low-resource settings: Inferential benefits and limitations

Peter M Macharia a,b,*, Nicolas Ray c,d, Caroline W Gitonga b, Robert W Snow b,e, Emanuele Giorgi a
PMCID: PMC7613137  EMSID: EMS147098  PMID: 35880005

Abstract

School-based sampling has been used to inform targeted responses for malaria and neglected tropical diseases. Standard geostatistical methods for mapping disease prevalence use the school location to model spatial correlation, which is questionable since exposure to the disease is more likely to occur in the residential location. In this paper, we propose to overcome the limitations of standard geostatistical methods by introducing a modelling framework that accounts for the uncertainty in the location of the residence of the students. By using cost distance and cost allocation models to define spatial accessibility and in absence of any information on the travel mode of students to school, we consider three school catchment area models that assume walking only, walking and bicycling and, walking and motorized transport. We illustrate the use of this approach using two case studies of malaria in Kenya and compare it with the standard approach that uses the school locations to build geostatistical models. We argue that the proposed modelling framework presents several inferential benefits, such as the ability to combine data from multiple surveys some of which may also record the residence location, and to deal with ecological bias when estimating the effects of malaria risk factors. However, our results show that invalid assumptions on the modes of travel to school can worsen the predictive performance of geostatistical models. Future research in this area should focus on collecting information on the modes of transportation to school which can then be used to better parametrize the catchment area models.

Keywords: Catchment area models, Disease mapping, School survey, Missing locations, Model-based geostatistics, Prevalence

1. Background

In low resource settings, the prevalence of parasitic infections are important indicators to guide control. Surveys are traditionally undertaken among sampled residents in communities or from fixed locations that serve these communities, such as schools. School-based sampling has been used for decades in Sub Saharan Africa (SSA) to inform the targeted responses for helminth (Hodges et al., 2011; Tchuem Tchuenté et al., 2012; Soares Magalhães et al., 2011), schistosomiasis (Hodges et al., 2011; Tchuem Tchuenté et al., 2012; Soares Magalhães et al., 2011; Knowles et al., 2017; Fornace et al., 2020; Clements et al., 2006) and malaria (Gitonga et al., 2010; Brooker et al., 2009; Ashton et al., 2015; Mathanga et al., 2015) control. School-based surveys for parasitic diseases represent convenient and cost-effective sampling strategies to provide local disease information, where school attendance is high, and infections can be asymptomatic (Ashton et al., 2011; Brooker et al., 2009; Drake et al., 2011; Mathanga et al., 2015; Stevenson et al., 2013; Takem et al., 2013). Sample school surveys can be powered to provide estimates of malaria prevalence at geographical units (e.g., districts) used for decision making. For example, targeting certain districts with high malaria prevalence. More commonly, the lack of statistical power from minimally sampled schools has involved the applications of model-based geostatistical (MBG) methods (Diggle et al., 1998), using aggregated information at sampled school locations to provide information at unsampled locations (Soares Magalhães et al., 2011; Fornace et al., 2020; Clements et al., 2006; Ashton et al., 2011).

MBG methods for disease mapping have become an established set of modern and robust statistical tools (Diggle et al., 1998) that are used to inform disease control strategies (Macharia et al., 2018; Biggeri and Catelan, 2012), especially in low-resource settings where disease registries are non existent or incomplete (Alegana et al., 2020; Pop et al., 2019; Stefan et al., 2014). Typically, MBG for disease prevalence mapping aim to predict a disease risk surface using data consisting of a finite set of locations xi, for i = 1,…, N, where a number of ni individuals are tested for a disease of interest and of which yi test positive. Ideally, the xi would correspond to the locations where individuals contracted the disease but, in practice, this is often difficult, if not impossible, to assess and access. In most geostatistical analyses of epidemiological data, the location of the school or village is used as the main location of exposure to the disease of interest.

However, in most school malaria surveys, due to resource constraints, precise spatial information on the residence of the children is rarely collected. Only the geographical location of the school is collected, hence the uncertainty in the household location of the children within a school catchment area. Consequently, when mapping malaria prevalence using school survey data, the school location is often used as the exposure location due to missing spatial information on the residential location. For example, in Ashton et al. (2015), Binomial geostatistical models are fitted to serological indicators collected from school surveys, while using the location of the school to define the spatial correlation between observations. The same approach has been used in other studies that have combined school-based data with community-based data (Macharia et al., 2018; Runge et al., 2020). In Stensgaard et al. (2011), the problem of the use of the school locations to estimate covariates effects on prevalence is alleviated by averaging the covariates within 1 km around each school location. However, this approach is questionable, since it implicitly assumes a circular school catchment area with a radius of 1 km while still making use of the school location in the computation of the spatial correlation.

A common problem to these approaches is that, by allocating individuals to locations that are less representative of their actual exposure to malaria, it can potentially bias the spatial structure of the MBG model and thus invalidate the predictive inferences on prevalence. This is especially important for school going children since a bite from an infected Anopheles mosquito is more likely to occur during night, when children are usually at home (Maxwell et al., 1998).

The statistical problem addressed in this paper is related to the problem that arises in passive surveillance when dealing with spatially aggregated disease counts; see for example Cameron et al. (2021). In Sturrock et al. (2014), aggregated case data reported at health facility are modelled by defining the probability of observing a case at any given location as the product of being a case and the probability that individual would seek treatment. However, in this work the random effects that modulate the probability of being a case are modelled as a spatially discrete process that is tied to the specific definition of the hospital catchment areas. The fine scale predictions of prevalence in this case, are thus a product of spatially continuous covariates and spatially discrete random effects. In this paper, we overcome this limitation by adopting a spatially continuous random effect whose properties are independent of the catchment areas. However, one key difference between the problem addressed in this paper and previous work on aggregated counts at health facilities, is that the latter problem is more naturally addressed using spatial point patterns methods. In other words, if locations were fully observed, a Log-Gaussian Cox process (LGCP) would be a natural modelling option to model reported cases at health facilities, whilst in our case, the location is not the object of interest, but rather the prevalence associated with that. Diggle et al. (2013), using LGCPs, provide a principled solution for fine-scale mapping by modelling the spatially aggregated disease counts as the realization of an aggregated spatially continuous stochastic process. As in Cameron et al. (2021), the methods that have been developed in the context of spatially aggregated passive surveillance data are based on log-linear models do not provide a solution to the problem addressed in this paper. Nelli et al. (2020) develop methods to combine passive and active surveillance data, however the sampled locations of the latter are assumed to be observed.

To the best of our knowledge, no statistically rigorous solution has been proposed to handle the problem of missing residence locations from malaria school survey data in a geostatistical model. In this paper, we provide a first solution to this problem and compare the predictive inferences on prevalence resulting from this novel approach with standard statistical approaches that use the school location to model the spatial correlation.

2. Methods: combining school catchment area models with geostatistical models

In this study, the set of residence locations X, unlike in standard geostatistical analyses, must be treated as a random variable and a suitable distribution for this must first be defined. Once we have defined the statistical model for X, we can use this to impute likely values for the unobserved residence locations of children and incorporate these into a geostatistical model for disease prevalence. This process allows us to rigorously acknowledge the uncertainty arising from the missing residence locations in the predictive inferences of disease prevalence. However, this begs the two following questions: (1) what is a suitable model for X? (2) How to combine the model for X with a geostatistical model for disease prevalence?

To answer these questions, our approach is to develop a marked Poisson process for the unobserved residence locations, X, with marks corresponding to each of the schools, and whose domain is restricted by a school catchment area (SCA) model informed by several factors that affect travel (road network, land use, protected areas, water bodies and travel speed). We then use the resulting model for X to generate samples of locations and feed these into a MBG model for disease prevalence. This approach presents several computational issues which we address using an stochastic partial differential equation (SPDE) approximation (Lindgren et al., 2011) for spatial Gaussian processes.

We first introduce the framework for accounting for missing residence locations in the context of disease prevalence mapping which entails creating SCAs and generating samples of the most likely residential locations X. This is then followed by the estimation of the model parameters and spatial prediction of disease prevalence within a predefined geographical area of interest. The application of the proposed modelling framework is illustrated through two case studies of malaria mapping in Western Kenya using school survey data.

2.1. Accounting for missing residence locations in a geostatistical model for disease prevalence mapping

In this section, we formalize and provide a solution to the problem of how to propagate the uncertainty in geostatistical models for prevalence mapping, arising from the lack of residence locations in school survey data.

Let [·] be a shorthand notation for “the density function of the random variable ·”. We then use X = {X1,…, Xn} to denote the random variables representing the set of unobserved residence locations, and Y = (Y1,…, Yn) for the observed individual-level binary outcomes indicating a positive (Yi = 1) or negative (Yi = 0) test. Finally, let S = {S(x) : x ∈ ℝ2} be an isotropic and stationary Gaussian process with mean zero and covariance function

cov{S(x),S(x)}=σ2ρ(u),u=xx.

Assuming that X and S are independent of each other, or in other words assuming a non-preferential sampling design, the joint distribution of X, S and Y can be written as

[X,S,Y]=[X][S][YS,X]=[X][S][YS(X)] (1)

where S(X) = {S(x) : x ∈ X} and [Y|S(X)] is a set of mutually independent Bernoulli variables, i.e.

[YS(X)]=i=1n[YiS(Xi)].

When residence locations X are observed, then [X] is irrelevant for drawing inferences on S, and thus can be ignored. In our case, instead, because of the missingness of X, the distribution [X] must be integrated out from the likelihood function for the unknown vector of parameters θ, i.e.

L(θ)=[Y;θ]=A2nn[X][S][YS(X)]dSdX. (2)

where A ⊂ ℝ2 is our geographical region of interest.

To estimate θ from (2) and draw predictive inferences on S, we then first need to specify a model for the location process [X].

2.2. Modelling [X] using school catchment area models

To model the location process [X], we propose to generate a school catchment area (SCA) based on travel time from the residence of children. The SCA is then used as the boundary of the area from which we shall draw samples of the residence locations using population density information. More specifically, an SCA is defined as the geographical area or zone around a school that draws majority of the students (Macharia et al., 2021b). One approach is to use so-called gravity models to express the decreasing likelihood of geographically accessing a school, as the distance or travel time to that school increase (Guagliardo, 2004) and they have been used to model catchment areas for healthcare planning (Macharia et al., 2017; Alegana et al., 2012; Guagliardo, 2004). Hence, spatial accessibility of a residence location x is defined as

a(x)=jcjf(x,xj)γ, (3)

where: cj is a constant that expresses the capacity of a school placed at location xj, and can, for example, be quantified by the number of teachers in that school; f (x, xj) is the impedance (traveltime) between locations x and xj; γ is a gravity decay coefficient, also referred to as the travel friction coefficient.

Real SCAs vary in size according to the underlying population distribution, number of schools in the surrounding area, school capacity and other school attractiveness factors. School attendance and geocoded residential location X, when available, can be used to estimate the parameters in (3). However, in the absence of such data, as in the scenario considered in this paper, a feasible and useful alternative is to use cost distance to define the spatial accessibility function a(x) and use optimization algorithms that allow to identify an optimal route of travel to school. Based on this definition of spatial accessibility, an SCA is then defined as the geographical area encompassing all locations closest, in terms of travel time, to that school than any other school. The travel time to school is dependent on the speed of the mode of travel to school which, in turn, is affected by several spatial layers, including the road network that students travel on, land cover layer to represent the travel impedance in spaces between the roads, and barriers to movement comprising water bodies, flooded areas, and protected areas. Barriers are considered impassable, except in the presence of a bridge where a road intersects a barrier such as a river. To model the speed of the mode of transport and identify the optimal route to school, we propose to use terrain-based least-cost path distance calculation (Ray and Ebener, 2008) and geospatial layers representing factors that affect travel to generate a travel time grid indicating the time taken by a student to travel from their residential location to the nearest school.

Each road class and land cover are assigned a mode of transport and travel speed, which ideally are informed by observational data of schools attendance behaviour in a specific region. However, in low resource settings, such data are rarely available (Macharia et al., 2021b) and a common alternative is the review of literature in similar settings to assemble road speeds and modes of transport in one or more travel scenarios. We provide details on the parametrization of the speed functions for our study regions in Kenya, in Section 3.2.

After generating a SCA, we spatially link this with a population density raster (Stevens et al., 2015) which is then used to sample residential locations within the SCA. The population density rasters are constructed through dasymetric techniques that redistributes national census population counts from administrative units to high spatial resolution (e.g. 100 metres m × 100 metres) (Mennis, 2009; Stevens et al., 2015). Let C denote the area encompassed by the boundaries of a given SCA; we then generate samples X(j) = {x1(j),…, xn(j)}, for j = 1,…, B for the unobserved residence locations, X, by using a fine regular grid covering C to approximate the probability of sampling a location xc, given by

λ(x)Cλ(u)du,

where λ(x) denotes the population density at x.

2.3. Approximating [S] and the likelihood function: from parameter estimation to spatial prediction

The resulting Monte Carlo samples X(j), j = 1,…, B, obtained as described in the previous section, are now used to approximate (2) as

L(θ)n1Bj=1B[S][YS(X(j))]dS. (4)

To avoid the computation of B covariance matrices for each of the stimulated samples X(j), we approximate S using a piece-wise linear approximation for S(x). More specifically, we partition the study region A into a set of non-intersecting triangles that share at most a common edge. For the generation of the triangles we follow the approach outline in Section 2.2.2 of Krainski et al. (2018).

We then approximate the spatial Gaussian process as

S(x)=k=1mbk(x)Wk (5)

where bk(x) are basis functions and the Wk, for k = 1,… m are Gaussian random variables. Based on the created mesh, we then define the basis functions bk(x) using barycentric coordinates, in which the location of a point is specified by reference to vertices of a triangle. It then follows that bk(x) takes a non-zero value whenever a point x falls inside a triangle identified by the vertex associated with bk(x) such that k=1mbk(x)=1; for more details see Section 2.2.2 of Krainski et al. (2018). Following Lindgren et al. (2011), we assume that W = (W1,…, Wm) follows a zero-mean multivariate Gaussian distribution with precision matrix Q, which is chosen so as to approximate a Matérn spatial field with smoothness parameter κ = 1 and scale parameter ϕ. Hence, we write

Q=(ϕ2κΓ(κ)4πσ2Γ(κ+2))2(ϕ4C+2ϕ2G1+G2)

where C, G1 and G2 are sparse matrices whose entries are non-zero only for pairs of vertices that share the same triangles; for more details on how the entries of C, G1 and G2 are defined, we refer the reader to Lindgren et al. (2011).

Let xi(j) denote the ith element of X(j) for i = · · ·, n; following from (5), we then define [Y|W, X(j)] as a set of mutually independent Bernoulli variables with linear predictor

log{p(xi(j))1p(xi(j))}=d(xi(j))β+k=1mbk(xi(j))Wk (6)

where d(x(j)) is a vector of spatial covariates, recorded at location x(j), with associated regression coefficients β.

Let us split the vector of unknown parameters θ, into covariance parameters ψ = (σ2, ϕ) of the spatial process and regression coefficients β, and reparametrize (4) based on W to give

L(θ)m[W;ψ](1Bj=1B[YW,X(j);β])   dW. (7)

Since the above integral is intractable, we use Monte Carlo methods to approximate the likelihood function as follows. Let ψ0 and β0 be our initial guesses for the parameters ψ and β, respectively. To simplify the notation let g(Y,W;β)=j=1B[YW,X(j)]/B; we then rewrite (7) as

L(θ)   m[W;ψ]g(Y,W;β)dW=m[W;ψ]g(Y,W;β)[Y,W;ψ0,β0][Y,W;ψ0,β0]dWm[W;ψ]g(Y,W;β)[W;ψ0]g(Y,W;β0)[WY;ψ0,β0]dW=E0[[W;ψ]g(Y,W;β)[W;ψ0]g(Y,W;β0)] (8)

where E0 is the expectation taken with respect to the distribution of W, conditional on Y, with parameters ψ0 and β0 or, using the shorthand notation, [W|Y; β0, ψ0]. Hence, we approximate (7), by sampling from [W|Y; β0, ψ0] using a Metropolis Hastings independence sampler with proposal distribution given by a multivariate Gaussian distribution with mean and covariance matrix corresponding to the mode and inverse of the negative Hessian of [W; ψ0]g(Y, W; β0), respectively.

When the goal of the analysis is primarily focused on spatial prediction and not on drawing inferences on β, the computational burden can be alleviated by first estimating β using a simpler model that ignores spatial correlation, which is obtained by setting S(x) = 0 for all x to give log{p˜(x(j))/(1p˜(x(j)))}=dT(x(j))β. The likelihood function of this non-spatial model is given by

L˜(β)=1Bj=1Bi=1np˜(xi(j))[1p˜(xi(j))]

.

After maximizing the above function with respect to β, we obtain β˜ which we now plug-in into the linear predictor (6). This then leads to a simplified likelihood function for the covariance parameters ψ, expressed by

Lβ˜(ψ)=E0[[W;ψ]g(Y,W;β˜)[W;ψ0]g(Y,W;β˜)]=E0[[W;ψ][W;ψ0]] (9)

By maximizing the above function, we finally obtain ϕ˜ as our point estimate of ψ. Since the primary objective of our case study in predicting malaria prevalence, we adopt this approach in our two applications.

Spatial prediction of prevalence is carried by plugging the point estimates β˜ and ϕ˜ into (6).

3. Application 1: large scale mapping of malaria prevalence across eight counties of Western Kenya

3.1. Data

We analyse data from a national school-based survey of malaria prevalence conducted in 2009 in Kenya; full details of the survey are provided elsewhere (Gitonga et al., 2010). Here, we consider 84 sampled public day primary schools located in a high malaria transmission region of Western Kenya, covering eight counties and 62 sub-counties close to Lake Victoria. All eight counties had at least a school surveyed while 50 sub-counties (81%) had at least a surveyed school (Fig. 1). At each school approximately 100 children aged 4–22 years were randomly sampled from classes 2–6 for a total of 9103 children. The majority of the children (91%) were aged between 8 and 14 years while only 0.5% were aged at least 17 years. Each sampled child provided a finger-prick blood sample that was used to detect Histadine Rich Protein (HRP) as evidence of recent Plasmodium falciparum infection using a rapid diagnostic test (RDT) (Paracheck-Pf device). Slides from all RDT positive samples were examined using light microscopy and 10% of all RDT negatives. A child was deemed positive for malaria when parasites were detected on microscopy. The location of the school was recorded using a hand-held Global Positioning System (GPS) device (Gitonga et al., 2010).

Fig. 1.

Fig. 1

Study areas of the applications of Sections 3 and 4, both located in Western Kenya. Solid black (counties) and light grey (sub-counties) lines represent administrative units. The map in the lower right corner shows the study area and sampled households from the application of Section 4.

In addition to the malaria school survey data, a set of spatial covariates were used to assist the spatial prediction of prevalence at unsampled locations. These are listed and described in Table 1. Before including these covariates in our model, we explored the association of each of the covariates listed in Table 1 by taking the empirical logit transformation (Stanton and Diggle, 2013) of the total number of cases recorded at each school and plotted this against each of the covariates, whose value on the x-axis was obtained by taking its average within the SCAs.

Table 1. Summary of the covariates in the analysis of Section 3.

Covariate Description Source and resolution
1. Annual mean Temperature Temperature affects the survival and development of P. falciparum from larvae into viable adults. Temperatures greater than 34 °C lead to almost 100% larval mortality, while temperatures less than 16 °C, the larvae are unable to produce viable adults (Bayoh and Lindsay, 2004). Mortality of the anopheles mosquitoes increases at ambient temperatures approaching 40 °C while temperatures between 25 °C and 30 °C are considered optimum for Plasimodium falciparum sporogony (Molineaux, 1988). MODIS, 5.6 km grids (Busetto and Ranghetti, 2016)
2. Annual mean Precipitation Rainfall combined with suitable ambient temperatures provides potential breeding environments for Anopheles (Noor et al., 2014; Dutta and Dutt, 1978). CHIRPS, 5 km grids (Funk et al., 2015).
3. Urbanization Malaria infection is usually lower in urban compared to rural areas due to reduced malaria vector density and biting rate (Kabaria et al., 2017). Urbanization was proxied by nighttime lights (NTL) (Savory et al., 2017). NTL are also associated with human activity, population distribution, reduced poverty rates and increased access to health facilities (Kabaria et al., 2017). NTL 1 km square grids (Savory et al., 2017)
4. Enhanced vegetation index Vegetation acts as a proxy for the presence of suitable mosquitoes breeding sites (Noor et al., 2014; Dutta and Dutt, 1978). MODIS, 0.25 km grids (Busetto and Ranghetti, 2016)

3.2. Model specification for [X]

We first assembled a list of all public day primary schools in 2009 (Mulaku and Nyadimo, 2011) (Figure S1.1 in Supplementary material 1) and variables representing factors that are known to affect travel (Table 2 and Figures S1.2 to S1.4 in Supplementary material 1) in order to define the speed of a specific mode of travel at any location x of the study area (Fig. 1). To compute travel time, we used AccessMod (version: 5.7.3-alpha) (Ray and Ebener, 2008), an open-source package that models geographical access using terrain-based least-cost path distance calculation (Ray and Ebener, 2008). We used the ‘merge land cover’ module of AccessMod to merge the land cover, road network, water bodies, national parks and reserves, and obtain a ‘merged landcover’ raster at 100-metre resolution. Travel speeds were then assigned to each land cover type and road class.

Table 2. Summary of factors that affect travel time to schools in Western Kenya.

Factor Description Type, Year and Resolution
1. Road Network The road network was based on data from the Ministry of Transport that used the gold GPS technique to map coverage of roads in 2016. The layer was updated via OpenStreetMap and Google Map Maker and was cleaned by deleting duplicates and correcting digitization errors such as overshoots and undershoots at connection points or junctions and those that extended into water bodies. Further details in Macharia et al. (2017, 2021a) and Joseph et al. (2020). Vector layer, 2016
2. Land use The land use/cover information was obtained from 2016 Copernicus Sentinel-2 satellite containing five classes (bare areas,built-up areas, water bodies, cultivated and vegetation cover areas available at http://geoportal.rcmrd.org/ Raster, 20 by 20 m. 2016.
3. Digital elevation model (DEM) The slope of the terrain was derived from a DEM based on Shuttle Radar Topographic Mission (SRTM) available at http://geoportal.rcmrd.org/ Raster, 30 by 30 m
4. Barriers to movement Barriers to movement included major rivers, lakes, flooded areas and protected areas (Joseph et al., 2020). Vector layers.

School going children can use different transport modes to reach their school. Since data on modes of travel were not available, we considered three different models corresponding to travel scenarios based on different assumptions for the means of transportation used by the children to reach their school.

  • Model W. It assumes that all students walk to school with speeds ranging between 5 km/h to 0 km/h (Samimi and Ermagun, 2013; Macharia et al., 2017; Mehdizadeh et al., 2017; Alegana et al., 2021) as detailed in Table 3.

  • Model WB. Travel to school is assumed to be a combination of walking and bicycling with a maximum speed of 10 km/h for bicycles.

  • Model WM. Travel to school is assumed to be combination of walking and motorized transport (motorcycles, private and public vehicles). The maximum motorized speed was 50 km/h.

Table 3.

Speeds assigned to generate catchments for Models W (Walk only), WB (Walk and Bicycle) and WM (Walk and motorized). All speeds are in kilometer/hour and take a walking mode of transport unless where otherwise stated.

Land and road type Model W Model WB Model WM
Tree cover 3.5 3.5 3.5
Shrub cover 4.5 4.5 4.5
Grassland 4 4 4
Cropland 3.5 3.5 3.5
Regularly flooded 0 0 0
Sparse vegetation 4.5 4.5 4.5
Bare areas 5 10 (Bicycling) 5
Built up areas 5 10 (Bicycling) 5
Open water 0 0 0
Primary road 5 5 50 (Motorized)
Secondary road 5 5 30 (Motorized)
County road 5 10 (Bicycling) 25 (Motorized)
Rural road 5 10 (Bicycling) 5

Table 3 shows how velocities vary in each transport scenario for different types of terrains.

To generate SCAs for all schools in the region, we used the “accessibility” module of AccessMod to run an anisotropic travel time computation for each of three aforementioned travel models. The anisotropic option considers the slopes derived from the Digital Elevation Model (DEM) (Table 2) and travelling towards the school (direction of travel) to correct for walking (10) and bicycling speeds (Ray and Ebener, 2008; Tobler, 1993). Hence, we express the adjusted walking speed Wv as

wv=wf×eexp{3.5|k+0.05|} (10)

where wf is the speed on flat surface on the landcover considered and k is the slope derived from the elevation.

The adjustment for the walking mode of travel decreases walking speed as the slope increases, while increasing the walking speed for a negative slope, according to the Tobler’s hiking function (Tobler, 1993). Bicycling speed are adjusted assuming that the increased speed on negative slopes does not exceed twice the speed on flat ground (Ray and Ebener, 2008). The motorized speeds were not adjusted for the slope as vehicles are powered by an engine. Fig. 2 provides three example routes that help to illustrate how the chosen parametrizations for the different speed affect travel time. Using a regular grid with spatial resolution of 20-m, we estimate that a student riding a bicycle on Route 3 (blue-low class road), would travel at an average speed of 10 km per hour taking a total of 14 min to reach their school. On Route 2, a student will first walk through the area without a road at 5 km per hour (6 min), take a vehicle at the bus stop travelling at a speed of 50 km/h (2 min), taking a total of 8 min to reach their school. Finally, a student taking Route 1 (purple — high road class) would take 4 min using a motorized vehicle at 50 km per hour. The routes are least cost paths determined by the cost distance algorithm with speed adjusted as described above.

Fig. 2.

Fig. 2

A map showing examples of optimal routes taken by students, under different mode of travel. For a detailed explanation, we refer the reader to the main text.

We used the ‘cost allocation’ option in AccessMod to compute the cost allocation grid delineating all SCAs. The cost allocation algorithm is similar to that used for a Voronoi diagram in Euclidean distance analysis, or to a location–allocation model in a network analysis (Ouma et al., 2021). The generated SCAs for the 84 sampled schools were then spatially linked to the 2009 population density maps (Stevens et al., 2015) and 10,000 samples for the residence locations, X, of the children were generated. The choice of least cost distance and cost allocation algorithms to model travel time and SCAs, respectively was appropriate given they account for transport factors (Table 2) and corresponding speeds (Table 3). This is unlike the use of straight line distances, Voronoi diagram and location–allocation model that do not account for these travel factors. See Ouma et al. (2021) and Macharia et al. (2021b) for further details.

Finally, three geostatistical models under the different travel scenarios described above were fitted; the mesh used to define the piece-wise linear approximation of the spatial Gaussian process is shown in Figure S1.12 of Supplementary material 1. In addition to these models, we also fit two geostatistical models that use the school location to model the spatial correlation in the data but make different use of the covariates: the SL model uses the value of the covariate at the location of the school; the SCLA model uses the averaged covariate within SCAs.

3.3. Simulation study

We carry out a simulation study to pursue two objectives:

  1. to quantify the inferential benefit of accounting for the uncertainty in the location of residence;

  2. to understand how the mis-specification of the mode of travel may affect the predictive inferences for prevalence.

To this end, we then proceed through the following iterative steps.

Step 1. Generate a data-set of binary outcomes indicating the malaria status of children, under the geostatistical model that assumes that children use “walking” only (Model W) as a mode of travel to school.

Step 2. Fit five statistical models to the simulated data-set from the previous step: model W (the true model), model WB, model WM, model SL and model SLCA.

Step 3. Generate prevalence predictions p^ and exceedance probabilities e(x) for a 30% prevalence threshold, over a regular grid covering the study area, for each of the five models.

Step 4. Using the exceedance probabilities e(x) from the previous step, classify each pixel as being above 30% if e(x) > l and below 30% if e(x) < l, where l is obtained by maximizing the specificity and sensitivity of the classification.

Step 5. Repeat Step 1 to Step 4, 1000 times.

Let p(x) denote the true prevalence. Using a regular grid (x1,,xq) at 10-metre resolution, we then summarize the predictive performance using the following indices.

  • Average bias: j=1q(p^(xj)p(xj))/q

  • Root-mean-square prediction error: [j=1q(p^(xj)p(xj))2/q]1/2

  • Sensitivity: the average proportion across all simulations of pixels that are correctly classified as exceeding 30% prevalence based on the classification in Step 4.

  • Specificity: the average proportion across all simulations of pixels that are correctly classified as non-exceeding 30% prevalence based on the classification in Step 4.

3.4. Results

Geographic accessibility (travel time in minutes) to the nearest public day primary school in Western Kenya was highly heterogeneous in 2009. The combination of walking and motorized transport (Model WM) provided the fastest option of reaching the nearest school (ranging 0 to 128 min) (see Figure S1.7 in Supplementary material 1) while walking-only scenario (Model W) ranged between 0 and 233 min (Fig. 3). Walking and bicycling combined (Model WB) ranged between 0 and 203 min (see Figure S1.6 in Supplementary material 1). The higher travel times were common in areas adjacent to the mountains and at county border regions. Overall, the majority of the school going children had good geographic access to their nearest primary schools in 2009. Across the eight counties, 68.4%, 74.1% and 78.3% of all the school going children (2.2 million) in the region in 2009 were within 30 min of the nearest primary school for models W, WB and WM, respectively.

Fig. 3.

Fig. 3

Travel time (in minutes) to the nearest public day primary school for all 2170 public primary schools in Western Kenya in 2009 ranging from 0 min (light green) to 233 min (red) Model W. Results of Model WB and Model WM are shown in Supplementary material 1.

A universal gold standard (threshold) of travel time/distance to the nearest primary school does exist due to differences in population distribution, context, geography, infrastructure, and resources between countries. In Kenya, The Ministry of Education aims to have a school within 2 km walking distance of every household which is about 24 min based on an average walking speed of 5 km/h. The average travel time (distance) in the region was 28 min (2.33 km) while 65% of the school-going children were within 24 min. The western Kenya average is fairly comparable to Rwanda (1.7 km), higher than Gutemala (1.1 km), South Africa (1.1 km) and Peru (1.4 km) but lower than Tanzania (5.9 km) (Rodriguez-Segura and Kim, 2021). However, these estimates should be interpreted with caution given differences in methods, input data and context.

The travel time was used as the basis for generating SCAs for each of the three likely travel scenarios to a school. The catchment areas were generated for all the 2170 schools in Western Kenya in 2009 to account for competition of the neighbouring schools for the sampled schools. A subset comprising the 84 sampled schools was then retained for the subsequent analysis. The results of the catchment areas are shown in Figures S1.8 to S1.10 of Supplementary material 1. Each school catchment area covered all areas nearer to it (based on travel time) than any other school based on the cost allocation algorithm. This meant that all areas and school going children were covered by their nearest school. The size and shape of the catchment areas were variable within and between models. However, neither of the models provided a systematically smaller or bigger sized catchment area relative to the other catchments from alternative travel scenarios. Fig. 4A shows school catchment areas generated using the three travel scenarios (Models W, WB and WM) for a Nasianda Primary School where 108 students were surveyed. Fig. 4B shows generated residential locations, X, for one iteration for each child sampled from Nasianda Primary school, based on population density as the intensity of a inhomogeneous Poisson process.

Fig. 4.

Fig. 4

A: Sample of school catchment areas generated from each of transport scenarios; walking (Model W), walking and bicycling (Model WB) and walking and motorized (Model WM). The school is shown as a black dot; B sampled locations for a single iteration from one of the catchment areas (Model WB) overlaid on a population distribution map (Stevens et al., 2015) (green to blue shades). In B, the school is shown as an orange triangle. All the school catchment areas are shown in Supplementary material 1.

From the explanatory analysis, enhanced vegetation index was excluded from the analysis, since this was found to be highly correlated with both the precipitation and the temperature. The three remaining covariates showed an approximately linear relationship with the empirical logit (see Figure S1.11 in Supplementary material 1). Consequently, temperature, precipitation and night time lights were used as spatial predictors for all the five models considered (Models SL, SLCA, W, WB and WM) for school going children in Western Kenya.

All Five models considered provide a similar spatial pattern and identify the Western region as an area of high malaria prevalence in 2009. The north-Western region had the highest predicted prevalence (over 50%), while in the north-east the values range between 30% and 50%. In the Southern region, we find the lowest values of prevalence, ranging approximately between 10% and 30%. Only few small areas showed a predicted prevalence below 10%. Overall across the models, the highest predicted values of prevalence is 77.5% according to model W (X sampled within SCAs based on walking scenario) followed by Models WM and WB with slightly lower values of 75.6% and 75.1%, respectively. The SLCA (school’s location — SCA averaged covariates) model had a maximum predicted prevalence of 72.6% while model SL (school’s location) had 69.15% (Fig. 5).

Fig. 5.

Fig. 5

Annual predicted mean malaria prevalence at 1 × 1 km spatial resolution ranging from 0% (blue) to 77.5% (dark red) in Western Kenya in 2009 for 2 standard statistical models (SL and SLCA) and three models accounting for locations uncertainty W, WB, WM. Ellipses show areas with differences. The protected areas are shown in green.

Based on each of the five models, we also estimated the population of school-going children living in areas having a prevalence of at least 30% in 2009. The SL and SLCA models would have the smallest proportion of about 42.1% and 42.6% respectively. On the other hand, the W, WB and WM models yield similar estimates of 44.3%, 44.6% and 43.5% respectively.

These results suggest that difference between the five models considered can be found in localized areas of the study region. A close inspection of Fig. 5 confirms this, where we used purple ovals to highlight those areas. For example, in Busia county, corresponding to the leftmost ring of Fig. 5), the SL and SLCA models classified areas in the ring as over 50% while the other models predicted the prevalence to be between 30% and 50%

In addition to the maps of the predicted prevalence, we also compared the exceedance probabilities (EPs) that malaria prevalence lies above 30%, as shown in Fig. 6. The differences between the five models are more stark in these maps, where we highlighted areas that are at least 90% likely to exceed 30% in red. The extents where malaria prevalence was greater than 30% with over 90% probability was dominant in the north west region. The maps clearly show that the SL and SLCA models identifies a large contiguous areas of EPs larger than 90%, whilst the other three models yield a more heterogeneous pattern in the Northern part of the study region. Unlike the SL and SLCA models, in the Southern part, a relatively large area is shown to have an EP larger than 90% based on W, WB and WM. Approximately 23.1% and 22.6% of the school going children were within the areas where prevalence was classified as over 30% with a probability of at least 90% based on SL and SLCA models respectively. On the other hand, models W, WB, WM have approximately 19.2%, 16.7% and 14.3%, respectively school going children with the same margins.

Fig. 6.

Fig. 6

Maps of the predicted exceedance probability for a 30% prevalence threshold with a 90% probability on a 1 by 1 km regular grid in Western Kenya in 2009 for 2 standard statistical models (SL and SLCA) and three models accounting for locations uncertainty W, WB, WM. Ellipses show areas with differences. The protected areas are shown in green.

The results of the simulation study are presented in Section 3.3. Except for WM model which provides the worst performance, the differences between the other models in terms of the four metrics used, are rather small. These small differences can be explained by the fact that the catchment areas are relatively small to the scale parameter ϕ, hence school locations can be used to approximate the residual spatial correlation structure reasonably well in this instance (see Table 4).

Table 4. Summaries of the simulation study of the application presented in Section 3; for more details see Section 3.3.

Model Bias RMSE Sensitivity Specificity
W 0.002 0.176 0.741 0.757
WB −0.005 0.190 0.716 0.729
WM −0.008 0.257 0.569 0.581
SL −0.001 0.169 0.764 0.756
SLCA −0.001 0.169 0.765 0.756

4. Application 2: Small scale mapping of malaria prevalence in Western Highlands of Kenya

4.1. Data

In this section, we analyse data from a school-based survey of malaria prevalence conducted in 2010 in a small area within the region encompassed by the first application as shown in Fig. 1. The full details of the survey are provided elsewhere (Stevenson et al., 2013). The data consists of 46 sampled primary schools, randomly selected from a census of all public primary schools in the area. At each school, 11 boys and 11 girls per class from classes 2 to 6 were selected randomly for a total of 4852 children. Those aged 8–14 and over 17 years were 90% and 1.2% of the total children, respectively. Each sampled child provided a finger-prick blood sample that was used to detect HRP as evidence of recent Plasmodium falciparum infection using RDT (Paracheck, Orchid Biomedical Systems, India). The compound of each child sampled at school was located and mapped using a personal digital assistant (PDA) with GPS receiver (Stevenson et al., 2013) and as illustrated in Fig. 1. Important distinction between the first and the second application, is that the latter mapped the compound/households of the school children, the gold standard, while the former did not.

The set of spatial covariates in the first application listed in Table 1 were used. However, because of the low variation of these across the study site of this second application, we carry out the analysis without using any covariates. Factors that affect travel to school are shown in Figure S2.1 (Supplementary material 2).

4.2. Model specification for [X]

We used the same approaches to estimate travel time and model SCAs as defined in the first application in terms of factors that affect travel (Table 2) and corresponding travel speeds (Table 3) for the three apriori defined travel scenarios (W, WB and WM). Likewise, 10,000 sample residence locations, X, of the children were generated constrained by the SCAs from the 46 sampled schools.

Five geostatistical models were then fitted. The first three were based on sampled residence locations, X using the mesh shown in Figure S2.6 of Supplementary material 2. The other two models were based on school location (model SL) and household location, the gold standard model denoted as model HL.

In this application, the HL model, which was not available in the first application, represents our gold-standard reference which we use to discriminate which model delivers the best predictive performance. Hence, we used the following indicators to quantify which model yields a predictive surface for prevalence that follows more closely the HL model.

  • Average bias: j=1q(p^(xj)p^HL(xj))/q

  • Root-mean-square prediction error: [j=1q(p^(xj)p^HL(xj))2/q]1/2

In the above expressions, p^HL(xj) denotes the predicted prevalence a grid location x1 from the HL model.

4.3. Results

Travel time (in minutes) to the nearest primary school and the corresponding SCAs are shown in Figure S2.2 of the Supplementary material 2. Travel time to the nearest school was high and heterogeneous across the three models. Almost all the children were within half hour of the nearest primary school in the three travel scenarios. Specifically, 97.6%, 98.3% and 97.8% of all school going children in the area (347,013) were within half-hour threshold for models W, WB and WM, respectively. Those outside the threshold were adjacent to a river and on the edge of the study area.

All the five geostatistical models provide a similar spatial pattern of malaria prevalence in 2010 (Fig. 7). Overall, the malaria prevalence in the area is highly variable. The north-western region had the highest predicted prevalence of over 30% with a contiguous area of over 50%. The rest of the areas had prevalence of between 10% and 30% and a few areas in the southern and eastern border with less than 10% (Fig. 7). The gold standard model (model SL), had a maximum prevalence of 83.2%, the highest recorded among the five models in the area. This prevalence was closer to the maximum values recorded by the models accounting for uncertainty (75.3%, 77.4%, 76.0% for models W, WB and WM, respectively) relative to model (SL) estimate of 70.8%.

Fig. 7.

Fig. 7

Predicted mean malaria prevalence at 0.3 × 0.3 km spatial resolution ranging from 0% (blue) to 83.2% (dark red) in 2010 for 5 geostatistical models based on school location (SL), gold standard model (HL) based on household location and three models accounting for locations uncertainty W, WB, WM. Circles show areas with differences.

When we spatially overlaid population distribution maps with the predicated prevalence for each model, there were differences in the proportion of school-going children living in high malarious areas (at least 30% prevalence) in 2010. A third (33.1%) of the school-going children lived in areas with a prevalence of at least 30% based on model HL. The WB model had the closest proportion (27%) to model HL while, models SL W, and WM yields the lowest proportion of children within similar margins, 24.1%, 24.4% and 23.2%, respectively.

Similarly, we also compared EPs that malaria prevalence lies above 30%. In Fig. 8, we highlight areas (in red) that are at least 90% likely to exceed 30% which were predominant in the north-west region. All the models identify a large contiguous area meeting the criterion in north-west region. However, the contiguous area has smaller geographical extents for model HL relative to the other 4 models. In addition to the large area, the models HL, W, WB and WM identify other smaller patches meeting the criteria which are not identified by the school location model. Approximately 10.3% of the school going children resided in areas where prevalence was over 30% with a probability of at least 90% based on models HL and SL. The models accounting for location uncertainty, W, WB and WB had slightly higher proportion, 16.7%, 16.8%, and 15.8% of the school going children, within the same margins, respectively.

Fig. 8.

Fig. 8

Maps of exceedance probability for a 30% prevalence threshold with a 90% probability on a 0.3 km regular grid for 5 geostatistical models based on school location (SL), gold standard model (HL) based on household location and three models accounting for locations uncertainty, models W, WB and WM.

Table 5 shows the average root-mean-square and bias metrics (see Section 4.2) which are used to compare how well each of the four considered models can yield a predictive risk surface for prevalence that more closely follows most the one generated by the geostatistical model based on the actual residence locations of the students. The results indicate that, albeit marginally, the model based on the school locations generates a predictive prevalence surface that better approximate the predictions generated by the HL model than the other three models that use school catchment areas. This can be explained in this case by the fact that the accuracy with which estimated catchment areas can approximate the actual catchment areas varies greatly across the study site. Figure S2.3 to S2.5 in Supplementary material 2 shows an overlay of the modelled SCAs and the actual household location. Overall, the models fairly approximated the actual SCAs represented by household locations. Model W had the highest number of household within modelled SCAs (74.4%) while model WM had the least (68.8%) as shown in Table 1 of Supplementary material 2. Per modelled SCA, the number of households within modelled SCA was highly variable ranging from only 22.8% to 100% of the households. The performance was especially poor where there were several schools in close proximity.

Table 5. Average root-mean-square-error (RMSE) and bias as specified in Section 4.2.

Model Average RMSE Average bias
SL 0.071 0.005
W 0.082 0.007
WB 0.085 0.007
WM 0.082 0.007

5. Discussion

In this paper, we have introduced a geospatial framework for the geostatistical analysis of school malaria surveys data with incomplete spatial information on the residential addresses where disease exposure occurs. The solution that we have introduced in the paper can be summarized in three main steps. The first step requires the formulation of a suitable statistical model for the unobserved residence locations of the children attending school. This was achieved by generating school catchment areas based on factors that affect travel to primary schools and sampling possible residential locations within the catchments. In the second step, we used the proposed statistical model for the unobserved residence locations to generate samples of residence locations to average the likelihood function and carry out parameter estimation. The third step consisted of carrying out prediction for disease prevalence by plugging-in the parameter estimates obtained in the second step and generating predictive samples of prevalence at selected prediction locations. This framework provides a statistically rigorous approach to propagate the uncertainty arising from the missingness of residence locations, which in standard analyses of school malaria data is reduced to the location of the school (Macharia et al., 2018; Runge et al., 2020; Stensgaard et al., 2011).

One of the main benefits of the proposed modelling framework is it provides a solution to several other statistical problems that have been covered in this paper: combining data from multiple surveys, some of which may have missing information on the residence of sampled individuals; reducing the bias in the estimation of regression relationships between prevalence and disease risk factors, induced by the aggregation of spatial information to a single location. However, as suggested by the results of the applications, the extent to which those issue can be successfully tackled is largely dependent on how well the school-catchment-area models allow for reliable inferences on the residence locations.

In our first case study, whilst areal-level summaries, such as the total population falling in areas with an exceedance probability over 90% for a 30% prevalence thresholds, were moderately similar (ranging between 23% and 14%) across the models considered, we found substantial localized differences in both the predicted prevalence (Fig. 5) and the exceedance probabilities (Fig. 6) between models that accounted for location-uncertainty (models W, WB and WM) and those that did not, based on school locations (models SL and SLCA. However small these areas may be, this aspect is especially important to consider when targeting and prioritizing specific areas with suitable malaria control activities. Among the models that accounted for location-uncertainty, the differences were less strong, suggesting that the assumptions made by different geographical access models may not strongly affect the inferences on disease prevalence.

In the second application, we assessed how similar are the inference between geostatistical models that incorporate school-catchment area models and geostatistical models that use the actual residence locations. The results suggested that inaccurate school catchment areas can have a material impact on geostatistical inferences and a simpler model that only uses the school locations can deliver better predictive performances. Future research should thus focus on improving the catchment area models which provide a way of developing more realistic geostatistical models than those simply use school locations. For example, in the first application, if more accurate information were available, such as the ward (the smallest administrative unit in Kenya) or enumeration area in which the students resided, the boundaries of these administrative areas could be used as an alternative to the modelled school catchment areas. However, the improvement accrued by exploiting this information will vary according to the size of those administrative areas relative to the school catchment area.

In both applications, among the three models considered for the school catchment area, the one assuming “walking only” (i.e. Model W) as a mode of travel to school is the one more strongly supported by previous studies. It is in fact more plausible that majority of the children walk to reach their school, since this mobility pattern has been observed in relation to other service points (e.g., healthcare providers) in Western Kenya (Salon and Gulyani, 2010; Dixit et al., 2016), as a result of low levels of ownership of motor vehicles (Macharia et al., 2021a). However, more recently, it has also been observed that the number of motorcycles, locally known also known as “boda boda”, have been increasing especially in Western Kenya (Macharia et al., 2021a), with a smaller proportion of children more likely to use the public transport or private vehicles. For these reasons, more accurate information on the mode of travel of children could help to significantly improve the methods presented in this paper, which can be applied to other infectious diseases that are monitored using school data, as in the case of helminth and schistosomiasis infections (Hodges et al., 2011; Tchuem Tchuenté et al., 2012; Soares Magalhães et al., 2011; Gitonga et al., 2010; Brooker et al., 2009; Ashton et al., 2015; Mathanga et al., 2015).

The framework developed is not only applicable to school survey data but also to other data-scenarios where there is missing information on the location where most of the exposure to the disease is likely to occur. For example, in some households surveys and routinely collected data, due to confidentiality reasons, the residence location cannot be made available for analysis. An example of this is given by the Demographic and Health Survey (DHS). DHS are nationally representative household surveys that have been conducted in more than 85 countries since 1984 to collect demographic and health data (Corsi et al., 2012). However, to reduce disclosure risk in DHS, a cluster is assigned the coordinates of the centre of the sampled enumeration area and further randomly displaced. The framework developed can be adapted to partially account for this geomasking. This is vital because in low resource settings, such household surveys are the only source of development indicators. An important aspect of the proposed framework is that it can also be used to combine data from multiple sources with varying accuracy for the information of the residence of the sampled individuals.

As shown in this study, catchment areas models are an integral component of disease mapping and are essential in order to yield reliable inferences and summaries of uncertainty for the health outcome under investigation (Macharia et al., 2021b). While the three types of catchment areas we generated did not yield substantially different results, further research is required to improve these models. Here, each of the models assumed non overlapping catchments but, in reality, students from the same location can attend different schools and thus create overlapping catchments. To accommodate this, the attractiveness factor should not only be a function of distance or travel time, as in our case, but should account for school capacity, classroom size, number of teachers, perceived teaching quality, and previous examinations results. In our analysis, every student was assumed to attend their nearest school. This could be further improved if school attendance data were available, which would allow us to empirically identify a threshold for the distance or travel time below which students do in fact go to the nearest school. Such an approach has been implemented in the construction of health facility catchment areas by using health seeking behaviour information from DHS (Alegana et al., 2012).

In addition to the limitations outlined in the previous section, there are other limitations in the illustrated applications. In our analysis, we excluded boarding schools, which may lead to the underestimation of the travel time and thus overestimate the size of SCAs. However, there were only nine schools that were purely boarding out of 2170 in Western Kenya in 2009. The access models were parametrized using speeds and modes of transport from other studies (Samimi and Ermagun, 2013; Macharia et al., 2017; Mehdizadeh et al., 2017; Alegana et al., 2021) since we did not have data for Western Kenya. Future studies should consider collecting data on mode of transport, speeds and the utilized school in this region for improved modelling of accessing metrics and SCAs. The chosen travel route to school is a complex process influenced by socio-economic factors and facilitators of movement such as roads and obstacles. We assumed that students do not bypass their nearest school, but a small proportion likely bypass their nearest school due to poverty, affordability, or parents’ past education experience among other factors.

Supplementary Material

Supplementary material 1
Supplementary material 2

Acknowledgements

The authors wish to thank members of the KEMRI — Eastern and Southern Africa Centre of International Parasite Control (ESACIPAC), Drs Charles Mwandawiro, Jimmy Kihara and Sammy Njenga for their assistance during the 2009 school surveys and Gillian H. Stresman for assistance with data used in the second application.

Funding

This work was supported by Newton International Fellowship (number NIF/R1/201418) of the Royal Society, UK awarded to PMM and EG. EG was also supported by a Springboard grant (number SBF004/1009) of the Academy of Medical Sciences, UK. RWS is supported as a Wellcome Trust Principal Fellow (number 212176). CWG was supported by a Commonwealth PhD scholarship from the UK’s Department for International Development. Field work for the 2009 school survey was provided by the Division of Malaria Control, Ministry of Public Health and Sanitation through a grant from the Department for International Development (DfID), UK through the WHO Kenya Country Office and the Wellcome Trust UK as part of fellowship support to AM Noor (number 081829) and RW Snow (number 079080). PMM, CWG and RWS are grateful to the support of the Wellcome Trust to the Kenya Major Overseas Programme (number 203077).

Footnotes

CRediT authorship contribution statement

Peter M. Macharia: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Software, Validation, Visualization, Writing – original draft, Review and editing. Nicolas Ray: Software, Methodology, Writing – review & editing. Caroline W. Gitonga: Data curation, Funding acquisition, Writing – review & editing. Robert W. Snow: Conceptualization, Data curation, Funding acquisition, Investigation, Writing – review & editing. Emanuele Giorgi: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Software, Validation, Visualization, Writing – original draft, Review and editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethics approval and consent to participate

Study protocol for application one (Gitonga et al., 2010) received ethical approval from the Kenya Medical Research Institute and National Ethics Review Committee (numbers 1407 and 1596). Additional approval was provided by the Permanent Secretary’s office of the Ministry of Education (MoE) and the Division of Malaria Control, Ministry of Public Health and Sanitation. All national, provincial and district-level health and education authorities were briefed about the survey purpose and selected schools. Official letters of support were prepared by Provincial MoE officers. Study protocol for application two (Stevenson et al., 2013) was approved by the ethical committees of the London School of Hygiene & Tropical Medicine and the Kenya Medical Research Institute and was part of a larger government-lead, national school survey programme (Gitonga et al., 2010).

Availability of data and materials

Data that support the findings of this study are available at Population Health Dataverse, http://dx.doi.org/10.7910/DVN/UQLTO5, generated gridded surfaces are located at https://doi.org/10.6084/m9.figshare.19804207.v1. The codes used in this analysis are available at https://github.com/giorgilancs/mbgmissinglocations.

References

  1. Alegana VA, Okiro EA, Snow RW. Routine data for malaria morbidity estimation in africa: challenges and prospects. BMC Med. 2020;18:121. doi: 10.1186/s12916-020-01593-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Alegana V, Pezzulo C, Tatem A, Omar B, Christensen A. Mapping out-of-school adolescents and youths in low-and middle-income countries. Humanit Soc Sci Commun. 2021;8:1–10. [Google Scholar]
  3. Alegana VA, Wright J, Petrina U, Noor AM, Snow RW, Atkinson PM. Spatial modelling of healthcare utilisation for treatment of fever in Namibia. Int J Health Geogr. 2012;11:6. doi: 10.1186/1476-072X-11-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Ashton RA, Kefyalew T, Rand A, Sime H, Assefa A, Mekasha A, Edosa W, Tesfaye G, Cano J, Teka H, Reithinger R, et al. Geostatistical modeling of malaria endemicity using serological indicators of exposure collected through school surveys. Am J Trop Med Hyg. 2015;93:168–177. doi: 10.4269/ajtmh.14-0620. [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Ashton RA, Kefyalew T, Tesfaye G, Pullan RL, Yadeta D, Reithinger R, Kolaczinski JH, Brooker S. School-based surveys of malaria in Oromia Regional State, Ethiopia: A rapid survey method for malaria in low transmission settings. Malar J. 2011;10:1–13. doi: 10.1186/1475-2875-10-25. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Bayoh MN, Lindsay SW. Temperature-related duration of aquatic stages of the Afrotropical malaria vector mosquito Anopheles gambiae in the laboratory. Med Vet Entomol. 2004;18:174–179. doi: 10.1111/j.0269-283X.2004.00495.x. [DOI] [PubMed] [Google Scholar]
  7. Biggeri A, Catelan D. Disease mapping. 2012:1979. doi: 10.1002/9781119940012.ch9. [DOI] [Google Scholar]
  8. Brooker S, Kolaczinski JH, Gitonga CW, Noor AM, Snow RW. The use of schools for malaria surveillance and programme evaluation in Africa. Malar J. 2009;8:1–9. doi: 10.1186/1475-2875-8-231. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Busetto L, Ranghetti L. MODIStsp: AN r package for automatic preprocessing of MODIS land products time series. Comput Geosci. 2016;97:40–48. doi: 10.1016/j.cageo.2016.08.020. [DOI] [Google Scholar]
  10. Cameron E, Young AJ, Twohig KA, Pothin E, Bhavnani D, Dismer A, Merilien JB, Hamre K, Meyer P, Le Menach A, Cohen JM, et al. Mapping the endemicity and seasonality of clinical malaria for intervention targeting in haiti using routine case data. ELife. 2021;10:e62122. doi: 10.7554/eLife.62122. [DOI] [PMC free article] [PubMed] [Google Scholar]
  11. Clements AC, Lwambo NJ, Blair L, Nyandindi U, Kaatano G, Kinung’hi S, Webster JP, Fenwick A, Brooker S. BayesIan spatial analysis and disease mapping: tools to enhance planning and implementation of a schistosomiasis control programme in tanzania. Trop Med Int Health. 2006;11:490–503. doi: 10.1111/j.1365-3156.2006.01594.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Corsi DJ, Neuman M, Finlay JE, Subramanian S. Demographic and health surveys: a profile. Int J Epidemiol. 2012;41:1602–1613. doi: 10.1093/ije/dys184. [DOI] [PubMed] [Google Scholar]
  13. Diggle PJ, Moraga P, Rowlingson B, Taylor BM. Spatial and spatio-temporal Log-Gaussian cox processes: Extending the geostatistical paradigm. Statist Sci. 2013;28:542–563. doi: 10.1214/13-STS441. [DOI] [Google Scholar]
  14. Diggle PJ, Tawn J, Moyeed R. Model-based geostatistics. Appl StatisticsStatistics. 1998;47:299–350.:1011.1669v3 doi: 10.1007/978-0-387-98135-2. arXiv:arXiv. [DOI] [Google Scholar]
  15. Dixit A, Lee MC, Goettsch B, Afrane Y, Githeko AK, Yan G. Discovering the cost of care: consumer, provider, and retailer surveys shed light on the determinants of malaria health-seeking behaviours. Malar J. 2016;15:179. doi: 10.1186/s12936-016-1232-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Drake TL, Okello G, Njagi K, Halliday KE, Jukes MC, Mangham L, Brooker S. Cost analysis of school-based intermittent screening and treatment of malaria in Kenya. Malar J. 2011;10:1–11. doi: 10.1186/1475-2875-10-273. [DOI] [PMC free article] [PubMed] [Google Scholar]
  17. Dutta HM, Dutt AK. Malarial ecology: a global perspective. Soc Sci Med. 1978;12:69–84. [PubMed] [Google Scholar]
  18. Fornace KM, Fronterrè C, Fleming FM, Simpson H, Zoure H, Rebollo M, Mwinzi P, Vounatsou P, Pullan RL. Evaluating survey designs for targeting preventive chemotherapy against schistosoma haematobium and schistosoma mansoni across sub-saharan africa: a geostatistical analysis and modelling study. Parasites Vectors. 2020;13:1–13. doi: 10.1186/s13071-020-04413-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Funk C, Peterson P, Landsfeld M, Pedreros D, Verdin J, Shukla S, Husak G, Rowland J, Harrison L, Hoell A, Michaelsen J. The climate hazards infrared precipitation with stations - A new environmental record for monitoring extremes. Sci Data. 2015;2:1–21. doi: 10.1038/sdata.2015.66. [DOI] [PMC free article] [PubMed] [Google Scholar]
  20. Gitonga CW, Karanja PN, Kihara J, Mwanje M, Juma E, Snow RW, Noor AM, Brooker S. Implementing school malaria surveys in Kenya: Towards a national surveillance system. Malar J. 2010;9:1–13. doi: 10.1186/1475-2875-9-306. [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Guagliardo MF. Spatial accessibility of primary care: concepts, methods and challenges. Int J Health Geogr. 2004;3(0):3. doi: 10.1186/1476-072X-3-3. URL: http://www.ij-healthgeographics.com/content/3/1/3 . [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. Hodges M, Dada N, Wamsley A, Paye J, Nyorkor E, Sonnie M, Barnish G, Bockarie M, Zhang Y. Improved mapping strategy to better inform policy on the control of schistosomiasis and soil-transmitted helminthiasis in sierra leone. Parasites Vectors. 2011;4:1–7. doi: 10.1186/1756-3305-4-97. [DOI] [PMC free article] [PubMed] [Google Scholar]
  23. Joseph NK, Macharia PM, Ouma PO, Mumo J, Jalang’o R, Wagacha PW, Achieng VO, Ndung’u E, Okoth P, Muñiz Y, Panciera R, et al. Spatial access inequities and childhood immunisation uptake in Kenya. BMC Public Health. 2020;20:1407. doi: 10.1186/s12889-020-09486-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  24. Kabaria CW, Gilbert M, Noor AM, Snow RW, Linard C. The impact of urbanization and population density on childhood Plasmodium falciparum parasite prevalence rates in Africa. Malar J. 2017;16:1–10. doi: 10.1186/s12936-017-1694-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  25. Knowles SC, Sturrock HJ, Turner H, Whitton JM, Gower CM, Jemu S, Phillips AE, Meite A, Thomas B, Kollie K, et al. Optimising cluster survey design for planning schistosomiasis preventive chemotherapy. PLoS Negl Trop Dis. 2017;11:e0005599. doi: 10.1371/journal.pntd.0005599. [DOI] [PMC free article] [PubMed] [Google Scholar]
  26. Krainski E, Gómez Rubio V, Bakka H, Lenzi A, Castro-Camilo D, Simpson D, Lindgren F, Rue H. Advanced spatial modeling with stochastic partial differential equations using R and INLA. 2018. [DOI]
  27. Lindgren F, Rue H, Lindström J. An explicit link between gaussian fields and gaussian markov random fields: the stochastic partial differential equation approach. J R Stat Soc Ser B Stat Methodol. 2011;73:423–498. [Google Scholar]
  28. Macharia PM, Giorgi E, Noor AM, Waqo E, Kiptui R, Okiro EA, Snow RW. Spatio-temporal analysis of Plasmodium falciparum prevalence to understand the past and chart the future of malaria control in Kenya. Malar J. 2018;17:340. doi: 10.1186/s12936-018-2489-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  29. Macharia PM, Mumo E, Okiro EA. Modelling geographical accessibility to urban centres in Kenya in 2019. PLoS One. 2021a;16:e0251624. doi: 10.1371/journal.pone.0251624. [DOI] [PMC free article] [PubMed] [Google Scholar]
  30. Macharia PM, Odera PA, Snow RW, Noor AM. Spatial models for the rational allocation of routinely distributed bed nets to public health facilities in Western Kenya. Malar J. 2017;16:367. doi: 10.1186/s12936-017-2009-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  31. Macharia PM, Ray N, Giorgi E, Okiro EA, Snow RW. Defining service catchment areas in low-resource settings. BMJ Glob Health. 2021b;6:e006381. doi: 10.1136/bmjgh-2021-006381. [DOI] [PMC free article] [PubMed] [Google Scholar]
  32. Mathanga DP, Halliday KE, Jawati M, Verney A, Bauleni A, Sande J, Ali D, Jones R, Witek-McManus S, Roschnik N, Brooker SJ. The high burden of malaria in primary school children in Southern Malawi. Am J Trop Med Hyg. 2015;93:779–789. doi: 10.4269/ajtmh.14-0618. [DOI] [PMC free article] [PubMed] [Google Scholar]
  33. Maxwell CA, Wakibara J, Tho S, Curtis CF. Malaria-infective biting at different hours of the night. Med Vet Entomol. 1998;12:325–327. doi: 10.1046/j.1365-2915.1998.00108.x. [DOI] [PubMed] [Google Scholar]
  34. Mehdizadeh M, Mamdoohi A, Nordfjaern T. Walking time to school, children’s active school travel and their related factors. J Transp Health. 2017;6:313–326. [Google Scholar]
  35. Mennis J. Dasymetric mapping for estimating population in small areas. Geogr Compass. 2009;3:727–745. doi: 10.1111/j.1749-8198.2009.00220.x. [DOI] [Google Scholar]
  36. Molineaux L. In: Malaria: Principles and Practice of Malariology. Wernsdorfer WHMI, editor. Vol. 2. Churchill Livingstone; Edinburgh: 1988. The epidemiology of human malaria as an explanation of its distribution including some implications for its control; pp. 913–998. [Google Scholar]
  37. Mulaku G, Nyadimo E. Gis in education planning: The kenyan school mapping project. Surv Rev. 2011;43:567–578. [Google Scholar]
  38. Nelli L, Ferguson HM, Matthiopoulos J. Achieving explanatory depth and spatial breadth in infectious disease modelling: Integrating active and passive case surveillance. Stat Methods Med Res. 2020;29:1273–1287. doi: 10.1177/0962280219856380. arXiv. [DOI] [PubMed] [Google Scholar]
  39. Noor AM, Kinyoki DK, Mundia CW, Kabaria CW, Mutua JW, Alegana VA, Fall IS, Snow RW. The changing risk of plasmodium falciparum malaria infection in Africa: 2000-10: a spatial and temporal analysis of transmission intensity. Lancet. 2014;383:1739–1747. doi: 10.1016/s0140-6736(13)62566-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  40. Ouma P, Macharia PM, Okiro E, Alegana V. In: Practicing Health Geography: The African Context. Makanga PT, editor. Springer International Publishing; Cham: 2021. Methods of measuring spatial accessibility to health care in uganda; pp. 77–90. [Google Scholar]
  41. Pop B, Fetica B, Blaga ML, Trifa AP, Achimas-Cadariu P, Vlad CI, Achimas-Cadariu A. The role of medical registries, potential applications and limitations. Med Pharm Rep. 2019;92:7. doi: 10.15386/cjmed-1015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  42. Ray N, Ebener S. AccessMod 3.0: COmputing geographic coverage and accessibility to health care services using anisotropic movemen of patients. Int J Health Geogr. 2008;7:63. doi: 10.1186/1476-072X-7-63. [DOI] [PMC free article] [PubMed] [Google Scholar]
  43. Rodriguez-Segura D, Kim BH. The last mile in school access: Mapping education deserts in developing countries. Dev Eng. 2021;6:100064. doi: 10.1016/j.deveng.2021.100064. [DOI] [Google Scholar]
  44. Runge M, Snow RW, Molteni F, Thawer S, Mohamed A, Mandike R, Giorgi E, Macharia PM, Smith TA, Lengeler C, Pothin E. Simulating the council-specific impact of anti-malaria interventions: A tool to support malaria strategic planning in Tanzania. PLoS One. 2020;15 doi: 10.1371/journal.pone.0228469. [DOI] [PMC free article] [PubMed] [Google Scholar]
  45. Salon D, Gulyani S. Mobility, poverty, and gender: travel ‘choices’ of slum residents in Nairobi, Kenya. Transp Rev. 2010;30:641–657. doi: 10.1080/01441640903298998. [DOI] [Google Scholar]
  46. Samimi A, Ermagun A. Students’ tendency to walk to school: case study of tehran. J Urban Plann Dev. 2013;139:144–152. [Google Scholar]
  47. Savory DJ, Andrade-Pacheco R, Gething PW, Midekisa A, Bennett A, Sturrock HJW. Intercalibration and Gaussian process modeling of nighttime lights imagery for measuring urbanization trends in Africa 2000–2013. Remote Sens. 2017;9:713. doi: 10.3390/rs9070713. [DOI] [Google Scholar]
  48. Soares Magalhães NK, Gyapong JO, Brooker S, Zhang Y, Blair L, Fenwick A, Clements AC. Mapping helminth co-infection and co-intensity: geostatistical prediction in ghana. PLoS Negl Trop Dis. 2011;5:e1200. doi: 10.1371/journal.pntd.0001200. [DOI] [PMC free article] [PubMed] [Google Scholar]
  49. Stanton MC, Diggle PJ. Geostatistical analysis of binomial data: generalised linear or transformed Gaussian modelling? Environmetrics. 2013;24:158–171. doi: 10.1002/env.2205. [DOI] [Google Scholar]
  50. Stefan DC, Baadjes B, Kruger M. Incidence of childhood cancer in namibia: the need for registries in africa. Pan Afr Med J. 2014;17:191. doi: 10.11604/pamj.2014.17.191.3830. [DOI] [PMC free article] [PubMed] [Google Scholar]
  51. Stensgaard AS, Vounatsou P, Onapa AW, Simonsen PE, Pedersen EM, Rahbek C, Kristensen TK. BayesIan geostatistical modelling of malaria and lymphatic filariasis infections in Uganda: predictors of risk and geographical patterns of co-endemicity. Malar J. 2011;10:298. doi: 10.1186/1475-2875-10-298. [DOI] [PMC free article] [PubMed] [Google Scholar]
  52. Stevens FR, Gaughan AE, Linard C, Tatem AJ. Disaggregating census data for population mapping using random forests with remotely-sensed and ancillary data. PLoS One. 2015;10:e0107042. doi: 10.1371/journal.pone.0107042. [DOI] [PMC free article] [PubMed] [Google Scholar]
  53. Stevenson JC, Stresman GH, Gitonga CW, Gillig J, Owaga C, Marube E, Odongo W, Okoth A, China P, Oriango R, Brooker SJ, et al. Reliability of school surveys in estimating geographic variation in malaria transmission in the Western Kenyan highlands. PLoS One. 2013;8:e77641. doi: 10.1371/journal.pone.0077641. [DOI] [PMC free article] [PubMed] [Google Scholar]
  54. Sturrock H, Cohen J, Keil P, Tatem AJ, Menach ALe, Ntshalintshali NE, Hsiang MS, Gosling RD. Fine-scale malaria risk mapping from routine aggregated case data. Malar J. 2014;13:421. doi: 10.1186/1475-2875-13-421. [DOI] [PMC free article] [PubMed] [Google Scholar]
  55. Takem EN, Affara M, Amambua-Ngwa A, Okebe J, Ceesay SJ, Jawara M, Oriero E, Nwakanma D, Pinder M, Clifford C, Taal M, et al. Detecting Foci of malaria transmission with school surveys: A pilot study in the Gambia. PLoS One. 2013;8:e67108. doi: 10.1371/journal.pone.0067108. [DOI] [PMC free article] [PubMed] [Google Scholar]
  56. Tchuem Tchuenté LA, Kamwa Ngassam RI, Sumo L, Ngassam P, Dongmo Noumedem C, Nzu DDL, Dankoni E, Kenfack CM, Gipwe NF, Akame J, et al. Mapping of schistosomiasis and soil-transmitted helminthiasis in the regions of centre, east and west cameroon. PLoS Negl Trop Dis. 2012;6:e1553. doi: 10.1371/journal.pntd.0001553. [DOI] [PMC free article] [PubMed] [Google Scholar]
  57. Tobler W. National Center for Geographic Information and Analysis. University of California; Santa Barbara: 1993. Three Presentations on Geographical Analysis and Modeling:Non-Isotropic Geographic Modeling; Speculations on the Geometry of Geography; Global Spatial Analysis. Technical Report 93-1. URL: http://www.geodyssey.com/papers/tobler93.html . [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary material 1
Supplementary material 2

Data Availability Statement

Data that support the findings of this study are available at Population Health Dataverse, http://dx.doi.org/10.7910/DVN/UQLTO5, generated gridded surfaces are located at https://doi.org/10.6084/m9.figshare.19804207.v1. The codes used in this analysis are available at https://github.com/giorgilancs/mbgmissinglocations.

RESOURCES