Bayesian Random Effect Modeling for analyzing spatial clustering of differential time trends of diarrhea incidences

Frank Badu Osei; Alfred Stein

doi:10.1038/s41598-019-49549-4

2019 Sep 13;9:13217. doi: 10.1038/s41598-019-49549-4

PMCID: PMC6744449 PMID: 31519962

Abstract

In 2012, nearly 644,000 people died from diarrhea in sub-Saharan Africa. This is a significant obstacle to achieving Sustainable Development Goal 3 of ensuring healthy lives and promoting well-being at all ages. To enhance evidence-based, site-specific intervention and mitigation strategies, especially in resource-poor countries, we developed differential time trend models for diarrhea. We modeled the logarithm of the unknown risk for each district as a linear function of time with spatially varying effects. We induced correlation between the random intercepts and slopes either by linear functions or by bivariate conditional autoregressive (BiCAR) priors. Models that included correlation between the varying intercepts and slopes outperformed those without. The convolution model with the BiCAR correlation prior was more competitive than the others. Including correlation between the intercepts and slopes provided epidemiological value regarding the response of diarrhea infection dynamics to environmental factors in the past and present. We found that diarrhea risk increased by 23% yearly, a rate far exceeding Ghana’s population growth rate of 2.3%. The district-specific time trends varied widely and clustered spatially, and the majority of districts had at least an 80% chance of their rates exceeding those of previous years. These findings can be useful for active site-specific evidence-based planning and interventions for diarrhea.

Subject terms: Environmental social sciences, Infectious diseases, Statistics

Introduction

Diarrhea remains a public health menace and a global obstacle to attaining Sustainable Development Goal 3 (SDG-3) of ensuring healthy lives and promoting well-being for all at all ages. Globally, over 1.7 billion episodes of diarrhea are recorded every year, with the majority occurring in low- and middle-income countries^1–5. The etiological agents, rotavirus, enteropathogenic E. coli, enterotoxigenic E. coli, calicivirus, and Shigella^6,7, are primarily mediated by environmental, climatic and sociodemographic factors⁸. Children under five years are the most vulnerable⁹. In sub-Saharan Africa, about 644,000 people died from diarrhea in 2012, accounting for 6.7% of deaths. Between the 1980s and 2008, diarrhea mortality declined from an estimated 4.5 million to 1.3 million with the advent of oral rehydration salts, improved sanitation and access to clean water¹. Incidence may also be declining slightly⁴, yet the statistics are still sobering and unacceptable. Provision of treated water and proper sanitary conditions remains the principal approach to reducing diarrhea, but budgetary constraints make this almost unfeasible in resource-poor countries. If, however, the spatial patterns of the temporal dynamics are well understood, then limited resources could be channeled to areas experiencing elevated growth trends.

The study of the space-time variation of diarrhea could give critical etiological clues and help to improve resource allocation and planning of interventions. Several studies have addressed the spatial and temporal variation of diarrhea in relation to environmental and socioeconomic risk factors, as well as the detection of areas with exceptionally high risk^10–14. Most of these studies, however, focus either on the spatial patterns at a particular point in time or on the temporal patterns for an entire geographic area. The reason may be data challenges and/or the lack of easy-to-implement statistical methods. Such studies make one of two implicit assumptions: either the spatial pattern is constant over time, or the temporal pattern is constant over space. Advances in information technology have provided opportunities to manage and store geographically and temporally referenced disease data, at least aggregated over large geographic and temporal windows. The Bayesian hierarchical modeling framework offers the advantage of reliably estimating disease parameters through random effects modeling that can be extended to accommodate variation in space, variation in time, and variation in space-time.

In addition to cluster detection methods like the popular space-time scan statistics^15,16, Bayesian model-based approaches have also found many applications in epidemiology for estimating space-time disease variation due to their flexibility in specifying a variety of spatial, temporal, and space-time interaction structures. Bernardinelli et al.¹⁷ introduced a parametric space-time mapping method to evaluate differential time trends for mapping disease and mortality rates. They assumed a log-linear relationship between the rates and the calendar time within areas and that the time trends vary from area to area. Knorr-Held¹⁸ extended separable space-time models to include nonparametric space-time interaction effects. The space-time interaction effects are commonly specified in one of four ways: unstructured temporal and unstructured spatial effects (Type I), structured temporal and unstructured spatial effects (Type II), unstructured temporal and structured spatial effects (Type III), or structured temporal and structured spatial effects (Type IV). The common specification for the structured temporal trends in any of these specifications has been the random walk prior. For infectious diseases, areas with similar time trends are likely to form local clusters. However, the focus of both Knorr-Held¹⁸ and Bernardinelli et al.¹⁷, and of their various applications, has been on providing estimates of space-time variation, which differs from our focus of evaluating the spatial clustering of the differential time trends.

Our aim is to study the spatial clustering of the differential time trends of small-area diarrhea occurrences in order to detect areas with exceptionally high time trends. If there is evidence of spatial clustering of higher than expected temporal trends, specific intervention programs could be developed to target such areas. Because our data cover only a few time points, a parametric time trend model is the appropriate choice. A critical challenge for Bernardinelli et al.¹⁷ was choosing between heterogeneity and clustering, and how to account for correlation between the varying intercepts and slopes. To address these, we extend their methods to a convolution prior and explore different latent processes to account for correlation between the varying intercepts and slopes. Our data are primarily aggregated counts of yearly diarrhea occurrences for 170 administrative districts for five years. We hypothesize that each district has a diarrhea time trend with spatial random effects, either structured, unstructured or both, that is different from the overall time trend. To test this, we used hierarchical Bayesian random effect methods to model the random intercepts and spatially varying time trends jointly. The models we present are extensions of those presented in ref. ¹⁷. We propose and compare different approaches to account for correlations between the spatially varying time trends and intercepts. We develop the models using district-level diarrhea morbidities in Ghana, where there is limited knowledge of the spatiotemporal trends. Also, despite the declining global trends of diarrhea, morbidities in Ghana continue to increase and remain amongst the top 5 out-patient morbidities. Reported incidences increased from 726,000 cases in 2010 to 1,577,000 cases in 2014. In what follows, we describe the study area and the statistical modeling. Next, we present the results and discuss their implications, and end with some conclusions.

Methods

Study area and data

Directly or indirectly, Ghana continues to undertake development projects to increase access to good water and sanitation; some improvements have been achieved over the last few years¹⁹. However, diarrhea remains amongst the top 5 out-patient morbidities. From 2010 to 2014, incidences increased from 726,000 to 1,577,000 cases. Accordingly, there has been considerable research interest in understanding the etiology, trends, and the characteristics of affected individuals in Ghana^20–27. Studies of the spatio-temporal trends, which are important for making optimal use of limited resources, remain scarce, however. The data used in this study consist of district-level diarrhea morbidities for 170 districts for five years. We obtained the data from the Centre for Health Information and Management (CHIM) of the Ghana Health Services (GHS). We obtained population estimates for the years 2010 to 2014 from the Ghana Statistical Service. The geographical scale of analysis was the 170 administrative districts where data had been recorded.

Statistical modeling

Bayesian hierarchical modeling easily incorporates correlation processes by including an intermediate layer (process layer) between the data likelihood (data layer) and the prior distributions (prior layer). For the data layer, we consider the spatio-temporal diarrhea counts y_it, i = 1, …, m = 170 districts and t = 2010, …, 2014 years as independent random samples from the Poisson distribution, y_it|ς_it ~ Poisson(n_itς_it), where the ς_it is the risk and n_it is the population.

For the process layer, Bernardinelli et al.¹⁷ proposed a monotonic and differentiable log link function to match the risk ς_it and the systematic component, $\log ς_{i t} ~ N (η_{i t}, σ_{ς}^{2})$ , where

$η_{i t} = β_{0} + ({\bar{β}}_{1} + ϑ_{2 i}) t + ϑ_{1 i}$ (1)

Here, the parameter β₀ denotes the overall intercept (risk) on the log scale, and ${\bar{β}}_{1}$ is a fixed-effect parameter for the overall time trend in diarrhea growth, while the parameter ϑ_2i is the district-specific, spatially structured differential time trend. Consequently, $β_{1 i} = {\bar{β}}_{1} + ϑ_{2 i}$ specifies the district-specific rate of diarrhea growth. Thus ϑ_1i and ϑ_2i are varying intercepts and slopes, respectively. The varying intercepts ϑ_1i account for unobserved ecological factors which might give rise to either spatially structured (clustering) or unstructured (heterogeneity) extra-Poisson variation.
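As a minimal sketch of how the linear predictor translates into expected counts, the Python snippet below evaluates η_it for one hypothetical district and converts it into an expected count and a yearly rate ratio. The time coding t = 0, …, 4 for 2010 to 2014, the population size, and the function names are assumptions made for illustration. The noise term σ_ς² is ignored, so exp(η_it) is used directly as the risk.

```python
import math

def expected_count(pop, beta0, beta1_bar, theta1, theta2, t):
    """Expected count n_it * exp(eta_it) for one district at time index t."""
    eta = beta0 + (beta1_bar + theta2) * t + theta1
    return pop * math.exp(eta)

def yearly_rate_ratio(beta1_bar, theta2):
    """Multiplicative yearly change in risk, exp(beta_1i) = exp(beta1_bar + theta2)."""
    return math.exp(beta1_bar + theta2)

# Hypothetical district with 100,000 people and no district-specific deviation.
# Baseline risk 0.045 and trend 1.23 are chosen only to give realistic magnitudes.
beta0 = math.log(0.045)
beta1_bar = math.log(1.23)
counts = [expected_count(100_000, beta0, beta1_bar, 0.0, 0.0, t) for t in range(5)]

# A district whose slope deviation theta2 is positive grows faster than average.
fast_ratio = yearly_rate_ratio(beta1_bar, theta2=0.05)
```

With ϑ_1i = ϑ_2i = 0 the district follows the overall trend, so successive expected counts differ by the factor exp(β̄₁); a positive ϑ_2i raises that factor for the district concerned.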

For a clustering model, the common specification for ϑ_1i or ϑ_2i is a univariate intrinsic conditional autoregressive (iCAR) process, which depends upon an m × m spatial proximity matrix with entries w_ij and an unknown variance $σ_{ϑ_{k}}^{2}$ . The iCAR specification implies that the distribution of ϑ_ki, k = 1, 2, conditional on the set ϑ_k,−i = {ϑ_k1, …, ϑ_k,i−1, ϑ_k,i+1, …, ϑ_km} is normal with mean equal to the weighted average of the values in the J neighboring districts; thus $ϑ_{k i} | ϑ_{k, - i} ~ N ({\bar{ϑ}}_{k i}, σ_{ϑ_{k}}^{2} / \sum_{j \neq i} w_{i j})$ , k = 1, 2. The mean is ${\bar{ϑ}}_{k i} = \sum_{j \neq i} w_{i j} ϑ_{k j} / \sum_{j \neq i} w_{i j}$ , where the sums effectively run over the J neighbors of district i. The weights w_ij are fixed constants that measure the proximity of districts i and j. Let the set of boundary points of district i be denoted by (i). Then we define w_ij as a binary connectivity weight such that w_ij = 1 if $(i) \cap (j) \neq \emptyset$ , and w_ij = 0 otherwise. For brevity, we write the iCAR prior as $ϑ_{k i} ~ i C A R ({\bar{ϑ}}_{k i}, σ_{ϑ_{k}}^{2})$ . Since the iCAR prior is translation invariant, the constraint $\sum_{i} ϑ_{k i} = 0$ is required for identifiability of the mean.
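The conditional structure of the iCAR prior with binary weights can be illustrated with a short Python sketch. The four-district map, the values of ϑ, and the function names are hypothetical; the snippet only evaluates the conditional mean and variance and applies the sum-to-zero constraint, and is not a sampler.

```python
def icar_conditional(i, values, neighbors, sigma2):
    """Conditional mean and variance of values[i] given its neighbours
    under an iCAR prior with binary (0/1) adjacency weights."""
    nbrs = neighbors[i]
    if not nbrs:
        raise ValueError("district has no neighbours; the conditional is undefined")
    mean = sum(values[j] for j in nbrs) / len(nbrs)  # average over neighbours
    var = sigma2 / len(nbrs)                         # shrinks with more neighbours
    return mean, var

def sum_to_zero(values):
    """Centre the effects so that they sum to zero (identifiability constraint)."""
    m = sum(values) / len(values)
    return [v - m for v in values]

# Toy map: four districts in a row, 0 - 1 - 2 - 3 (hypothetical adjacency).
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
theta = sum_to_zero([0.4, 0.1, -0.2, 0.5])
mean1, var1 = icar_conditional(1, theta, neighbors, sigma2=0.2)
```

District 1 has two neighbours, so its conditional mean is the average of the two neighbouring effects and its conditional variance is half the variance parameter.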

Choosing between heterogeneity and clustering of the time trends and/or the intercepts is critical. The choice depends upon the prior belief about the scale of spatial dependency. A scale of spatial dependency larger than the size of the spatial units leads to clustering, whereas a smaller scale leads to heterogeneity. For infectious diseases, both smaller and larger scales of spatial dependency are plausible. To avoid choosing between the two, we propose a convolution model that includes both clustering and heterogeneity components for the random intercepts and slopes:

$η_{i t} = β_{0} + ({\bar{β}}_{1} + ϑ_{2 i} + υ_{2 i}) t + ϑ_{1 i} + υ_{1 i}$ (2)

where ϑ_ki = {ϑ_1i, ϑ_2i} are modeled as iCAR processes $ϑ_{k i} ~ i C A R ({\bar{ϑ}}_{k i}, σ_{ϑ_{k}}^{2})$ with unknown prior variance $σ_{ϑ_{k}}^{2}$ for the clustering components and υ_ki = {υ_1i, υ_2i} are assigned Gaussian processes $υ_{k i} ~ N (0, σ_{υ_{k}}^{2})$ with unknown prior variance $σ_{υ_{k}}^{2}$ for the heterogeneity components. Under this prior structure, the $υ_{k i} ~ N (0, σ_{υ_{k}}^{2})$ do not depend upon the structure of spatial units and are said to be exchangeable. Specifying $ϑ_{2 i} ~ N (0, σ_{ϑ_{2}}^{2})$ implies random exchangeability between the differential time growths, whereas $ϑ_{2 i} ~ i C A R ({\bar{ϑ}}_{2 i}, σ_{ϑ_{2}}^{2})$ implies Markovian interactions between the differential time growths. The expression $ϑ_{2 i} ~ i C A R ({\bar{ϑ}}_{2 i}, σ_{ϑ_{2}}^{2})$ then becomes a model for the district-specific time trends.

Correlation between random intercepts and slopes

As it stands, the latent variables $ϑ_{k i} ~ i C A R ({\bar{ϑ}}_{k i}, σ_{ϑ_{k}}^{2})$ and $υ_{k i} ~ N (0, σ_{υ_{k}}^{2})$ are independent. For epidemiological purposes, allowing for correlation between ϑ_1i and ϑ_2i, and between υ_1i and υ_2i, could answer critical etiological questions such as how the population has reacted to environmental factors introduced at the reference time of the study. Ignoring the correlation between the random intercepts and slopes could cause the area-specific time trends to be pulled towards the overall mean trend¹⁷. For models such as (1), and when heterogeneity modeling is the concern, Bernardinelli et al.¹⁷ proposed that υ_1i be drawn from a univariate normal $υ_{1 i} ~ N (0, σ_{υ_{1}}^{2})$ , while υ_2i is drawn from a univariate normal conditional on υ_1i, $υ_{2 i} | υ_{1 i} ~ N (γ_{υ} υ_{1 i}, σ_{υ_{2}}^{2})$ . Here, γ_υ is the correlation parameter linking the unstructured intercepts υ_1i and slopes υ_2i. With this premise, other variants can be deduced. For instance, when clustering is of concern for both slopes and intercepts, the latent variable ϑ_1i is modeled as drawn from the conditional autoregressive process $ϑ_{1 i} ~ i C A R ({\bar{ϑ}}_{1 i}, σ_{ϑ_{1}}^{2})$ , and ϑ_2i as drawn from a univariate normal conditional on ϑ_1i, $ϑ_{2 i} | ϑ_{1 i} ~ N (γ_{ϑ} ϑ_{1 i}, σ_{ϑ_{2}}^{2})$ . When both clustering and heterogeneity are of concern, correlation can be induced by expressing $ϑ_{1 i} ~ i C A R ({\bar{ϑ}}_{1 i}, σ_{ϑ_{1}}^{2})$ , $υ_{1 i} ~ N (0, σ_{υ_{1}}^{2})$ , and $(ϑ_{2 i} + υ_{2 i}) | ϑ_{1 i}, υ_{1 i} ~ N (γ_{ϑ υ} (ϑ_{1 i} + υ_{1 i}), σ_{ϑ_{2} + υ_{2}}^{2})$ , where $σ_{ϑ_{2} + υ_{2}}^{2}$ is the variance of the sum ϑ_2i + υ_2i. An additional possibility is to account for correlations separately between the structured and unstructured intercepts and slopes by modeling the latent variables $ϑ_{1 i} ~ i C A R ({\bar{ϑ}}_{1 i}, σ_{ϑ_{1}}^{2})$ , $υ_{1 i} ~ N (0, σ_{υ_{1}}^{2})$ , $ϑ_{2 i} | ϑ_{1 i} ~ N (γ_{ϑ} ϑ_{1 i}, σ_{ϑ_{2}}^{2})$ , $υ_{2 i} | υ_{1 i} ~ N (γ_{υ} υ_{1 i}, σ_{υ_{2}}^{2})$ . The correlation parameters γ_υ, γ_ϑ, and γ_ϑυ may be assigned the univariate normal priors $γ_{υ} ~ N (0, σ_{γ_{υ}}^{2})$ , $γ_{ϑ} ~ N (0, σ_{γ_{ϑ}}^{2})$ , and $γ_{ϑ υ} ~ N (0, σ_{γ_{ϑ υ}}^{2})$ , respectively.
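To make the role of γ in the linear conditional specification concrete, the following Python sketch computes the Pearson correlation implied by υ_2i | υ_1i ~ N(γ υ_1i, σ²_υ2) with υ_1i ~ N(0, σ²_υ1), and checks it by simulation. The closed form follows from Cov(υ_1, υ_2) = γ σ²_υ1 and Var(υ_2) = γ² σ²_υ1 + σ²_υ2. The numerical values and function names are illustrative assumptions, not estimates from the data.

```python
import math
import random

def implied_correlation(gamma, var1, var2_cond):
    """Pearson correlation implied by u2 | u1 ~ N(gamma*u1, var2_cond), u1 ~ N(0, var1)."""
    cov = gamma * var1                      # Cov(u1, u2)
    var2 = gamma ** 2 * var1 + var2_cond    # marginal Var(u2)
    return cov / math.sqrt(var1 * var2)

def simulate_correlation(gamma, var1, var2_cond, n, seed=0):
    """Monte Carlo estimate of the same correlation, as a sanity check."""
    rng = random.Random(seed)
    u1 = [rng.gauss(0.0, math.sqrt(var1)) for _ in range(n)]
    u2 = [rng.gauss(gamma * a, math.sqrt(var2_cond)) for a in u1]
    m1, m2 = sum(u1) / n, sum(u2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(u1, u2))
    s11 = sum((a - m1) ** 2 for a in u1)
    s22 = sum((b - m2) ** 2 for b in u2)
    return s12 / math.sqrt(s11 * s22)

# Illustrative values only: a negative scale factor and a small slope variance.
gamma, var1, var2_cond = -0.19, 0.33, 0.04
r_exact = implied_correlation(gamma, var1, var2_cond)
r_sim = simulate_correlation(gamma, var1, var2_cond, n=20000)
```

The sketch shows that γ in this parameterization is a regression-type scale factor: the implied Pearson correlation depends on both variances as well as on γ, so the two quantities generally differ in magnitude.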

We propose to specify the bivariate iCAR (BiCAR) for the clustering components ϑ_ki = {ϑ_1i, ϑ_2i}. The univariate iCAR extends naturally to the BiCAR specification by replacing the univariate normal conditional distribution with a bivariate normal conditional distribution $ϑ_{k i} | (ϑ_{1, - i}, ϑ_{2, - i}) ~ N ({\bar{ϑ}}_{k i}, Σ_{ϑ} / \sum_{j \neq i} w_{i j})$ , where ${\bar{ϑ}}_{k i} = {(\sum_{j \neq i} w_{i j} ϑ_{1 j} / \sum_{j \neq i} w_{i j}, \sum_{j \neq i} w_{i j} ϑ_{2 j} / \sum_{j \neq i} w_{i j})}^{T}$ is the mean vector and Σ_ϑ is a 2 × 2 covariance matrix. The covariance matrix Σ_ϑ has diagonal elements $Σ_{ϑ} [1, 1] = σ_{ϑ_{1}}^{2}$ and $Σ_{ϑ} [2, 2] = σ_{ϑ_{2}}^{2}$ , representing the conditional variances of the structured intercepts ϑ_1i and slopes ϑ_2i, respectively. The off-diagonal elements $Σ_{ϑ} [1, 2] = Σ_{ϑ} [2, 1] = γ_{ϑ} σ_{ϑ_{1}} σ_{ϑ_{2}}$ capture the within-area correlation through the correlation parameter γ_ϑ. For brevity, we write $ϑ_{k i} ~ B i C A R ({\bar{ϑ}}_{k i}, Σ_{ϑ})$ . For the heterogeneity components υ_ki = {υ_1i, υ_2i}, we propose to use a zero-mean bivariate normal distribution υ_ki ~ N₂(0, Σ_υ), where Σ_υ is a 2 × 2 covariance matrix. The specification of Σ_υ follows that of Σ_ϑ described above, with $Σ_{υ} [1, 1] = σ_{υ_{1}}^{2}$ , $Σ_{υ} [2, 2] = σ_{υ_{2}}^{2}$ , and $Σ_{υ} [1, 2] = Σ_{υ} [2, 1] = γ_{υ} σ_{υ_{1}} σ_{υ_{2}}$ . Here, γ_υ is the within-area correlation between the unstructured intercepts and slopes.

For the third layer, the prior layer, we assign prior distributions to all variance parameters and fixed effects. A non-informative flat distribution for the intercept, p(β₀) ∝ 1, is appropriate to ensure that the data drive inference. Non-informative priors result in posterior inference similar to maximum likelihood inference²⁸. For the fixed effects, vague normal priors are specified ${\bar{β}}_{1}$ , ${\bar{β}}_{1} ~ N (0, 10^{5})$ . This is the equivalent to a non-informative Gaussian prior. To the variance parameters $σ^{2} = {σ_{ϑ_{k}}^{2}, σ_{υ_{k}}^{2}, σ_{ς}^{2}}$ , we assigned proper vague gamma priors σ⁻² ~ G(0.5, 0.05) to ensure conjugacy and computational convenience²⁹. We assigned a Wishart prior to the two precision matrices $Σ_{ϑ}^{- 1}$ and $Σ_{υ}^{- 1}$ , as it is a conjugate prior for the inverse of the covariance parameters of a multivariate normal distribution^30,31. More precisely, we assigned $Σ_{ϑ}^{- 1} ~ Wishart (Ω, d f)$ , $Σ_{υ}^{- 1} ~ Wishart (Ω, d f)$ with scale matrix Ω and degrees of freedom df = 2 for a weakly informative distribution. We set Ω as a scaled identity matrix with diagonal entries Ω[1, 1] = Ω[2, 2] = 1 and off-diagonal entries Ω[1, 2] = Ω[2, 1] = 0, a specification Moraga and Lawson³² utilized to run simulation studies of multivariate iCAR modeling.

Model fitting and estimation

We fitted eight different models with different combinations and structures of the process layer (Table 1). Models 1 to 3 include no correlation between the intercepts and slopes, while Models 4 to 8 include different forms of correlation between the intercepts and slopes. Model 1 is a model with spatially varying unstructured time coefficients and unstructured spatial effects. Model 2 includes iCAR-structured spatially varying time coefficients and iCAR-structured intercepts. Model 3 is a convolution model which includes spatially varying unstructured and structured time effects, and unstructured and structured intercepts. Model 4 and Model 5 extend Model 1 and Model 2, respectively, to include correlation between the intercepts and slopes. In Model 6, we induce correlation by expressing the sum of structured and unstructured time coefficients as a linear function of the sum of structured and unstructured intercepts. In Model 7, we induce separate correlations between the structured intercepts and slopes and the unstructured intercepts and slopes. Finally, Model 8 uses a BiCAR specification to account for correlation between the structured intercepts and slopes and a bivariate normal distribution to account for correlation between the unstructured intercepts and slopes. In Table 1, we present details of the models and structures of the latent parameters and their prior distributions.

Table 1.

Latent structures of the different models.

Model	Linear predictor η_it	Structured	Unstructured	Correlation
1	$β_{0} + ({\bar{β}}_{1} + υ_{2 i}) t + υ_{1 i}$		$υ_{k i} ~ N (0, σ_{υ_{k}}^{2})$ k = 1, 2
2	$β_{0} + ({\bar{β}}_{1} + ϑ_{2 i}) t + ϑ_{1 i}$	$ϑ_{k i} ~ i C A R ({\bar{ϑ}}_{k i}, σ_{ϑ_{k}}^{2})$ k = 1, 2
3	$β_{0} + ({\bar{β}}_{1} + ϑ_{2 i} + υ_{2 i}) t + ϑ_{1 i} + υ_{1 i}$	$ϑ_{k i} ~ i C A R ({\bar{ϑ}}_{k i}, σ_{ϑ_{k}}^{2})$ k = 1, 2	$υ_{k i} ~ N (0, σ_{υ_{k}}^{2})$ k = 1, 2
4	$β_{0} + ({\bar{β}}_{1} + υ_{2 i}) t + υ_{1 i}$		$υ_{1 i} ~ N (0, σ_{υ_{1}}^{2})$ $υ_{2 i} | υ_{1 i} ~ N (γ_{υ} υ_{1 i}, σ_{υ_{2}}^{2})$	$γ_{υ} ~ N (0, σ_{γ_{υ}}^{2})$
5	$β_{0} + ({\bar{β}}_{1} + ϑ_{2 i}) t + ϑ_{1 i}$	$ϑ_{1 i} ~ i C A R ({\bar{ϑ}}_{1 i}, σ_{ϑ_{1}}^{2})$ $ϑ_{2 i} | ϑ_{1 i} ~ N (γ_{ϑ} ϑ_{1 i}, σ_{ϑ_{2}}^{2})$		$γ_{ϑ} ~ N (0, σ_{γ_{ϑ}}^{2})$
6	$β_{0} + ({\bar{β}}_{1} + ϑ_{2 i} + υ_{2 i}) t + ϑ_{1 i} + υ_{1 i}$	$ϑ_{1 i} ~ i C A R ({\bar{ϑ}}_{1 i}, σ_{ϑ_{1}}^{2})$ $ϑ_{2 i} + υ_{2 i} | ϑ_{1 i}, υ_{1 i} ~ N (γ_{ϑ υ} (ϑ_{1 i} + υ_{1 i}), σ_{ϑ_{2} + υ_{2}}^{2})$	$υ_{1 i} ~ N (0, σ_{υ_{1}}^{2})$	$γ_{ϑ υ} ~ N (0, σ_{γ_{ϑ υ}}^{2})$
7	$β_{0} + ({\bar{β}}_{1} + ϑ_{2 i} + υ_{2 i}) t + ϑ_{1 i} + υ_{1 i}$	$ϑ_{1 i} ~ i C A R ({\bar{ϑ}}_{1 i}, σ_{ϑ_{1}}^{2})$ $ϑ_{2 i} | ϑ_{1 i} ~ N (γ_{ϑ} ϑ_{1 i}, σ_{ϑ_{2}}^{2})$	$υ_{1 i} ~ N (0, σ_{υ_{1}}^{2})$ $υ_{2 i} | υ_{1 i} ~ N (γ_{υ} υ_{1 i}, σ_{υ_{2}}^{2})$	$γ_{ϑ} ~ N (0, σ_{γ_{ϑ}}^{2})$ $γ_{υ} ~ N (0, σ_{γ_{υ}}^{2})$
8	$β_{0} + ({\bar{β}}_{1} + ϑ_{2 i} + υ_{2 i}) t + ϑ_{1 i} + υ_{1 i}$	$ϑ_{k i} ~ B i C A R ({\bar{ϑ}}_{k i}, Σ_{ϑ})$	υ_ki ~ N₂(0, Σ_υ)	γ_ϑ, γ_υ, γ_ϑυ

We estimated the parameters of the models within the Bayesian hierarchical framework. For each model, let the vector ψ₁ be the full Gaussian latent field that represents the process layer, and let ψ₂ represent the prior layer; then the joint density is p(ψ₁, ψ₂) = p(ψ₁|ψ₂)p(ψ₂). Following the Bayesian paradigm, we factorize the posterior density p(ψ₁, ψ₂|y) as proportional to the product of the data layer p(y|ψ₁, ψ₂) and the joint density p(ψ₁, ψ₂), $p (ψ_{1}, ψ_{2} | y) \propto p (y | ψ_{1}, ψ_{2}) \cdot p (ψ_{1} | ψ_{2}) \cdot p (ψ_{2})$ . We used Markov chain Monte Carlo (MCMC) simulations to draw samples from the full conditional densities of the posterior p(ψ₁, ψ₂|y). Estimation was implemented in the WinBUGS 1.4.3 software package³³. We used 200,000 MCMC iterations and 100,000 burn-in samples, storing only every 20th sample of the Markov chains. We ran three independent chains with dispersed initial values. Convergence was assessed graphically using the autocorrelation plots of the traces. We fitted all models using the R2WinBUGS package³³ together with the R software³⁴. Point estimates of variables of interest and their associated uncertainties were obtained from the marginal posterior distributions.

Model evaluation and comparison

We evaluated the adequacy of the model fits using two different cross-validating predictive checks. First, we used posterior predictive checks and Bayesian p-values, defined as $\Pr (y_{i t} \geq y_{i t}^{p r e d})$ , where the predicted datasets $y_{i t}^{p r e d}$ are generated from the predictive distribution of the models. Bayesian p-values close to 0.5 suggest that the generated data are compatible with the model, whereas values close to 0 or 1 are considered extreme and hence suggest a poor fit. Since the distribution of the Bayesian p-values is not symmetrical, values <0.1 and >0.9 were considered extreme and, hence, an indication of poor fit³⁵. Additionally, we used the chi-square goodness-of-fit statistic based on the discrepancy function $χ_{o b s}^{2} = \sum_{i t} [{(y_{i t} - n_{i t} ς_{i t})}^{2} / (n_{i t} ς_{i t})]$ ³⁶. Similarly, $χ_{p r e d}^{2}$ is calculated for the predicted datasets $y_{i t}^{p r e d}$ . The two quantities are compared using Bayesian p-values, here defined as $\Pr (χ_{o b s}^{2} \geq χ_{p r e d}^{2})$ . Likewise, Bayesian p-values close to 0.5 suggest that the generated data are compatible with the model.
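The chi-square posterior predictive check can be sketched in Python as follows. For each posterior draw of the Poisson means μ_it = n_it ς_it, a replicated dataset is generated and the discrepancy of the observed data is compared with that of the replicate. The toy data, the jittered "posterior draws", and the simple Poisson sampler are assumptions for illustration only; in practice the draws would come from the MCMC output.

```python
import math
import random

def chi2_discrepancy(y, mu):
    """Chi-square discrepancy: sum of (y - mu)^2 / mu over all observations."""
    return sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))

def poisson_draw(rng, lam):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def bayesian_p_value(y, mu_draws, rng):
    """Share of posterior draws for which chi2_obs >= chi2_pred."""
    exceed = 0
    for mu in mu_draws:
        y_pred = [poisson_draw(rng, m) for m in mu]   # replicated dataset
        if chi2_discrepancy(y, mu) >= chi2_discrepancy(y_pred, mu):
            exceed += 1
    return exceed / len(mu_draws)

# Toy example: six observed counts and 500 hypothetical posterior draws of mu.
rng = random.Random(1)
y = [3, 7, 5, 10, 4, 6]
mu_draws = [[max(0.5, yi + rng.gauss(0.0, 1.0)) for yi in y] for _ in range(500)]
p_chi2 = bayesian_p_value(y, mu_draws, rng)
```

A value of `p_chi2` near 0 or 1 would indicate that the observed discrepancy is systematically smaller or larger than what the model generates.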

We compared the models using the deviance information criterion (DIC). The $D I C = \bar{D} + p_{D}$ is the sum of the model fit $\bar{D}$ and the model complexity p_D³⁷. The posterior mean of the deviance (negative twice the log-likelihood) measures the model fit, while the effective number of parameters measures the model complexity. The DIC is a generalization of Akaike’s information criterion (AIC). As with the AIC, the smaller the DIC value, the better the predictive performance of the model.
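A minimal Python sketch of the DIC calculation for a Poisson likelihood is given below. It uses the common definition p_D = D̄ − D(μ̄), where D(μ̄) is the deviance evaluated at the posterior mean of the Poisson means. The toy counts and draws are hypothetical, and evaluating the plug-in deviance at the posterior mean of μ is an assumption; WinBUGS may evaluate it on a different parameterization.

```python
import math

def poisson_deviance(y, mu):
    """Deviance D = -2 * log-likelihood for independent Poisson counts."""
    loglik = sum(yi * math.log(mi) - mi - math.lgamma(yi + 1) for yi, mi in zip(y, mu))
    return -2.0 * loglik

def dic(y, mu_draws):
    """Return (DIC, D_bar, p_D) from posterior draws of the Poisson means."""
    d_bar = sum(poisson_deviance(y, mu) for mu in mu_draws) / len(mu_draws)
    mu_mean = [sum(col) / len(col) for col in zip(*mu_draws)]  # posterior mean of mu
    p_d = d_bar - poisson_deviance(y, mu_mean)                 # effective parameters
    return d_bar + p_d, d_bar, p_d

# Toy example: three counts and three hypothetical posterior draws.
y = [3, 7, 5]
mu_draws = [[2.5, 6.0, 5.5], [3.5, 8.0, 4.5], [3.0, 7.5, 5.0]]
dic_value, d_bar, p_d = dic(y, mu_draws)
```

Because the Poisson deviance is convex in μ, this plug-in version always gives a non-negative p_D; models would then be ranked by `dic_value`, smaller being better.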

Posterior estimates

We estimated the posterior means of the parameters from the posterior samples. For the parameters ${\hat{γ}}_{ϑ}$ , ${\hat{γ}}_{υ}$ , and ${\hat{γ}}_{ϑ υ}$ of Model 8, empirical analogs were used. From the posterior estimates of the covariance matrices ${\hat{Σ}}_{ϑ}$ and ${\hat{Σ}}_{υ}$ , we estimated the within-area correlations between the slopes and intercepts. Let ${\hat{σ}}_{ϑ_{1}}^{2} = {\hat{Σ}}_{ϑ} [1, 1]$ , ${\hat{σ}}_{ϑ_{2}}^{2} = {\hat{Σ}}_{ϑ} [2, 2]$ , ${\hat{σ}}_{υ_{1}}^{2} = {\hat{Σ}}_{υ} [1, 1]$ , ${\hat{σ}}_{υ_{2}}^{2} = {\hat{Σ}}_{υ} [2, 2]$ be the empirical variances of the structured and unstructured intercepts and slopes, respectively. Then ${\hat{γ}}_{ϑ} = {\hat{Σ}}_{ϑ} [1, 2] / ({\hat{σ}}_{ϑ_{1}} {\hat{σ}}_{ϑ_{2}})$ is the empirical estimate of the correlation between the structured slopes and intercepts, and ${\hat{γ}}_{υ} = {\hat{Σ}}_{υ} [1, 2] / ({\hat{σ}}_{υ_{1}} {\hat{σ}}_{υ_{2}})$ is that of the unstructured slopes and intercepts. The total within-area correlation between the slopes and intercepts equals ${\hat{γ}}_{ϑ υ} = ({\hat{Σ}}_{ϑ} [1, 2] + {\hat{Σ}}_{υ} [1, 2]) / (\sqrt{{\hat{σ}}_{ϑ_{1}}^{2} + {\hat{σ}}_{υ_{1}}^{2}} \sqrt{{\hat{σ}}_{ϑ_{2}}^{2} + {\hat{σ}}_{υ_{2}}^{2}})$ .
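The empirical correlations described above can be computed directly from the two estimated 2 × 2 covariance matrices. The Python sketch below does this for hypothetical matrices chosen so that the arithmetic is easy to follow; the function names and the numbers are illustrative assumptions.

```python
import math

def correlation(cov):
    """Within-area correlation from a 2 x 2 covariance matrix [[s11, s12], [s12, s22]]."""
    return cov[0][1] / math.sqrt(cov[0][0] * cov[1][1])

def total_correlation(cov_structured, cov_unstructured):
    """Correlation between total intercepts and total slopes (structured + unstructured)."""
    num = cov_structured[0][1] + cov_unstructured[0][1]
    den = (math.sqrt(cov_structured[0][0] + cov_unstructured[0][0])
           * math.sqrt(cov_structured[1][1] + cov_unstructured[1][1]))
    return num / den

# Hypothetical covariance estimates (index 0 = intercepts, index 1 = slopes).
sigma_structured = [[4.0, -1.0], [-1.0, 1.0]]
sigma_unstructured = [[5.0, -2.0], [-2.0, 3.0]]

gamma_structured = correlation(sigma_structured)
gamma_unstructured = correlation(sigma_unstructured)
gamma_total = total_correlation(sigma_structured, sigma_unstructured)
```

In a full analysis these functions would be applied to each posterior draw of the covariance matrices, or to their posterior means, depending on which summary is wanted.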

Next, we estimated the relative contribution of the structured slopes and intercepts as the fraction of the marginal variability $σ_{ϑ_{k}}^{2}$ over the total marginal variability $σ_{ϑ_{k}}^{2} + σ_{υ_{k}}^{2}$ . Since the marginal variance $σ_{ϑ_{k}}^{2}$ is not directly available, because the iCAR prior is specified through a conditional variance, we used its empirical analog ${\hat{σ}}_{ϑ_{k}}^{2} = \sum_{i} {(ϑ_{k i} - {\bar{ϑ}}_{k})}^{2} / (m - 1)$ , where m is the number of districts. Thus, $f r a c_{ϑ_{k}} = {\hat{σ}}_{ϑ_{k}}^{2} / ({\hat{σ}}_{ϑ_{k}}^{2} + σ_{υ_{k}}^{2})$ is the relative contribution of the spatially structured intercepts (k = 1) and slopes (k = 2).
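The fraction of structured variation can be sketched in a few lines of Python. The snippet assumes that the empirical variance of the structured effects is used in both the numerator and the denominator, and that a value for the unstructured variance is available; the inputs are hypothetical.

```python
def empirical_variance(values):
    """Sample variance with divisor (number of districts - 1)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def structured_fraction(structured_effects, sigma2_unstructured):
    """Share of total variability attributable to the spatially structured effects."""
    s2 = empirical_variance(structured_effects)
    return s2 / (s2 + sigma2_unstructured)

# Hypothetical structured effects for three districts and an unstructured variance.
frac = structured_fraction([1.0, 2.0, 3.0], sigma2_unstructured=3.0)
```

Here the empirical variance of the structured effects is 1.0, so the structured share is 1.0 / (1.0 + 3.0) = 0.25; a value below 0.5 means the unstructured component dominates.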

Spatial clustering of time trends

Our critical interest was in the spatial clustering of the differential time trends. To evaluate this, we relied on the posterior probabilities associated with the district-specific time trends β_1i. We used the spatially smoothed exceedance probabilities of the posterior measures, instead of their raw estimates, to detect and map clustering of these parameters. First, from the posterior samples $\{β_{1 i}^{g}\}_{g = 1, \dots, G}$ of β_1i, we estimated the exceedance probability $\Pr (β_{1 i} > β_{1}^{T h})$ , the probability that β_1i exceeds some threshold level $β_{1}^{T h}$ , as the proportion of posterior samples that exceed the threshold: $\Pr (β_{1 i} > β_{1}^{T h}) = \sum_{g = 1}^{G} I (β_{1 i}^{g} > β_{1}^{T h}) / G$ , where I() is the indicator function. Next, we estimated the spatially smoothed posterior probabilities as $\bar{\Pr} (β_{1 i} > β_{1}^{T h}) = \sum_{j = 0}^{J} \Pr (β_{1, i j} > β_{1}^{T h}) / (J + 1)$ , where j = 1, …, J indexes the neighbors of district i and j = 0 denotes district i itself, so that $\Pr (β_{1, i 0} > β_{1}^{T h}) = \Pr (β_{1 i} > β_{1}^{T h})$ . We set the threshold at $β_{1}^{T h} = {\bar{β}}_{1}$ . We checked the sensitivity of the exceedances by estimating the probabilities for different multiplicative effects $\exp (β_{1}^{T h}) \in \{1.0, 1.10, 1.20, 1.25, 2.0\}$ . For the random effects ϑ_1i, υ_1i, ϑ_2i, υ_2i, we similarly estimated the spatially smoothed exceedance probabilities $\Pr (\exp (ϑ_{1 i}) > 1)$ , $\Pr (\exp (ϑ_{2 i}) > 1)$ , $\Pr (\exp (υ_{1 i}) > 1)$ , and $\Pr (\exp (υ_{2 i}) > 1)$ .
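The two steps, computing exceedance probabilities from posterior samples and then averaging each district with its neighbors, can be sketched in Python as follows. The three-district map, the posterior samples, and the threshold are hypothetical values used only to show the calculation.

```python
def exceedance_probability(samples, threshold):
    """Proportion of posterior samples that exceed the threshold."""
    return sum(1 for s in samples if s > threshold) / len(samples)

def smoothed_exceedance(probs, neighbors):
    """Average each district's probability with those of its J neighbours (J + 1 terms)."""
    smoothed = []
    for i, p in enumerate(probs):
        nbrs = neighbors[i]
        smoothed.append((p + sum(probs[j] for j in nbrs)) / (len(nbrs) + 1))
    return smoothed

# Hypothetical posterior samples of beta_1i for three districts in a row: 0 - 1 - 2.
samples = [
    [0.10, 0.30, 0.25, 0.40],
    [0.05, 0.10, 0.30, 0.15],
    [0.30, 0.35, 0.40, 0.22],
]
neighbors = {0: [1], 1: [0, 2], 2: [1]}
threshold = 0.20

probs = [exceedance_probability(s, threshold) for s in samples]
smoothed = smoothed_exceedance(probs, neighbors)
```

In this toy case the raw probabilities are 0.75, 0.25 and 1.0; smoothing pulls each value toward its neighbors, which is what makes contiguous groups of high-probability districts stand out on a map.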

Results and Analyses

The adequacy of the models was evaluated using Bayesian predictive checks and the chi-square goodness-of-fit test. Table 2 provides estimates of the model fit parameters. An adequate fit is suggested for all the models as the $p_{B a y e s}^{χ}$ for the chi-square goodness-of-fit test lies within the interval [0.1, 0.9]³⁵. The mean of the Bayesian p-values of the predictive checks was ${\bar{p}}_{B a y e s}^{p r e d} = 0.5$ for all models, also supporting adequate fit. Adequate fit is further confirmed by the fact that fewer than 3% of the observations had extreme Bayesian p-values, $\% p_{B a y e s}^{e x t\_p r e d} < 3 \%$ .

Table 2.

Parameters of model fit and comparison.

Parameters	Model 1	Model 2	Model 3	Model 4	Model 5	Model 6	Model 7	Model 8
$χ_{o b s}^{2}$	901.96	905.77	904.13	899.45	885.83	898.62	897.31	896.03
$χ_{p r e d}^{2}$	832.96	831.26	834.25	832.39	833.29	832.70	831.87	832.03
$p_{B a y e s}^{χ}$	0.87	0.89	0.89	0.88	0.82	0.86	0.88	0.86
${\bar{p}}_{B a y e s}^{p r e d}$	0.5	0.5	0.5	0.5	0.5	0.5	0.5	0.5
$\% p_{B a y e s}^{e x t\_p r e d}$	1.76	1.41	1.65	2.11	0.71	1.18	1.76	0.94
Gamma priors (shape, rate)	DIC
(0.5, 0.0005)	10668.2	10658.2	10653.1	10646.5	10647.0	10645.9	10640.8	10626.8
(0.5, 0.005)	10669.4	10659.4	10652.4	10644.7	10644.5	10644.9	10641.6	10628.6
(0.5, 0.05)	10667.2	10656.2	10651.0	10642.8	10644.4	10644.0	10640.1	10627.5
(0.5, 0.5)	10667.5	10657.5	10651.7	10645.8	10648.5	10645.6	10641.7	10627.8

We studied the sensitivity of our results to various gamma priors for the variance parameters. Specifically, we varied the rate parameter of the gamma prior while keeping the shape parameter fixed. These results are shown in Table 2. Under the different gamma priors, the DIC values for each model differed only marginally, suggesting that the results are not overly sensitive to the choice of gamma prior. Comparing the DIC values in Table 2 across models shows the importance of including correlation between the random intercepts and slopes: the DIC values of Models 4 to 8 are lower than those of Models 1 to 3. This indicates that the possible correlation between the random slopes and intercepts should not be overlooked. Models 4 to 8 each have their weaknesses and strengths in accounting for the correlations. Models 4 to 6 account for correlation between the unstructured intercepts and slopes (Model 4), between the structured intercepts and slopes (Model 5), or between their sums (Model 6), whereas Models 7 and 8 jointly evaluate separate correlations between the unstructured intercepts and slopes and between the structured intercepts and slopes. Comparing the DIC values, Model 8 provides an improvement over all the other models, indicating that multivariate structures on the structured effects ϑ_ki and the unstructured effects υ_ki best support our data. The advantage of specifying ϑ_ki and υ_ki as multivariate structures lies in capturing the separate correlations: ${\hat{γ}}_{ϑ_{1, 2}}$ between ϑ_1i and ϑ_2i, ${\hat{γ}}_{υ_{1, 2}}$ between υ_1i and υ_2i, and the total correlation ${\hat{γ}}_{ϑ υ}$ . Yet, Model 7 is also appealing for its simple structure. In our implementation, the data showed no significant correlation between the structured slopes and intercepts for Models 7 and 8.

Table 3 reports the posterior estimates of the parameters of all models. The estimates are consistent across models for most of the parameters, though some differ. The overall incidence rate is $\exp (β_{0}) \approx 0.045$ , that is, about 4.5 per 100 people, for all the models. For the time trends, all models showed $\exp ({\bar{β}}_{1}) \approx 1.23$ . This corresponds to an average yearly increase of 23% in diarrhea risk in Ghana. Unlike the variances of the structured and unstructured intercepts $σ_{ϑ_{1}}^{2}$ , $σ_{υ_{1}}^{2}$ and slopes $σ_{ϑ_{2}}^{2}$ , $σ_{υ_{2}}^{2}$ , the posterior estimates of the between-districts variance of the risks $σ_{ς}^{2}$ differ only marginally across models. This indicates that the district-specific risk estimates are robust to the choice of model and latent structures for smoothing. Thus, little overall smoothing is performed. The variances of the random effects, on the other hand, are sensitive to the choice of latent smoothing structures imposed on the varying intercepts and slopes. We observe, however, similar variances amongst the varying intercepts or slopes for some models where the correlation parameter is the only difference. For instance, the variances of the intercepts differ only marginally between Models 1 and 4, whereas the variances of the slopes differ more. The same applies to Models 2 and 5, suggesting that the intercept parameters are rather robust when the correlation between intercepts and slopes is accounted for in this manner. For the models with both structured and unstructured random effects, the unstructured components always dominate ( $f r a c_{ϑ_{1}} < 50 \%$ and $f r a c_{ϑ_{2}} < 50 \%$ ). This suggests the dominance of heterogeneity over clustering. Model 6 accounts for the correlation between the convolution intercepts and slopes. We observe a correlation parameter estimate similar to those of Models 4 and 5, and a variance for the unstructured intercepts similar to that of its variant without correlation (Model 3), whereas the variance of the structured intercepts is considerably reduced. This is an indication of the robustness of the convolution method of accounting for correlation between the random slopes and intercepts. Results of Models 7 and 8 indicate similar estimates for most variance components since they both account for separate correlations between the unstructured and structured intercepts and slopes.

Table 3.

Posterior estimates of model parameters.

Parameters	Without correlation			With correlation
Parameters	Model 1	Model 2	Model 3	Model 4	Model 5	Model 6	Model 7	Model 8
$\exp (β_{0})$	0.0449	0.0449	0.0452	0.0446	0.0449	0.0452	0.0450	0.0450
$\exp ({\bar{β}}_{1})$	1.2318	1.2332	1.2322	1.2339	1.2338	1.2299	1.2333	1.2353
$σ_{ς}^{2}$	0.1673 (0.0111)	0.1744 (0.0114)	0.1679 (0.0111)	0.1689 (0.0107)	0.1711 (0.0115)	0.1690 (0.0111)	0.1693 (0.0110)	0.1667 (0.0107)
$σ_{ϑ_{1}}^{2}$	*	1.4375 (0.1805)	0.1166 (0.0846)	*	1.4961 (0.1856)	0.0699 (0.0529)	0.0764 (0.0586)	0.2117 (0.0854)
$σ_{ϑ_{2}}^{2}$	*	0.1802 (0.0310)	0.0227 (0.0209)	*	0.0386 (0.0386)	*	0.0104 (0.0122)	0.0717 (0.0186)
$σ_{υ_{1}}^{2}$	0.327 (0.038)	*	0.2633 (0.0464)	0.3253 (0.039)	*	0.2799 (0.0404)	0.2757 (0.0390)	0.2528 (0.0395)
$σ_{υ_{2}}^{2}$	0.049 (0.007)	*	0.0383 (0.0089)	0.0383 (0.0062)	*	*	0.0262 (0.0129)	0.0495 (0.0079)
$σ_{ϑ_{2} + υ_{2}}^{2}$	*	*	*	*	*	0.0381 (0.0065)	*	*
${\hat{γ}}_{ϑ}$	*	*	*	*	−0.1827 (0.0346)	*	−0.0040 (0.0541)	−0.1385 (0.1966)
${\hat{γ}}_{v}$	*	*	*	−0.1899 (0.0346)	*	*	−0.2110 (0.0408)	−0.4723 (0.0903)
${\hat{γ}}_{ϑ v}$	*	*	*	*	*	−0.1866 (0.0333)	*	−0.2870 (0.0973)
$\mathrm{frac}_{ϑ_{1}}$ (%)	*	*	30.98	*	*	26.41	28.01	37.45
$\mathrm{frac}_{ϑ_{2}}$ (%)	*	*	32.17	*	*	*	28.51	41.92


*Not available.

The correlation parameters γ_υ, γ_ϑ, and γ_ϑv for Models 4, 5 and 6, respectively, are not significantly different from one another, probably because these models use similar structures to account for the correlations. Model 6 improves on Models 4 and 5: although its DIC is only slightly lower, it is able to capture the fraction of spatially structured variation within the random intercepts. For Models 4 to 7, the correlation parameters can be interpreted as scale factors applied to the random intercepts. Specifically, for Model 4, γ_υ = −0.1899 is the scale factor for estimating the random slopes from the corresponding random intercepts. The correlation parameter γ_ϑv for Model 8, on the other hand, can be interpreted like Pearson’s correlation coefficient.
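The difference between a scale factor and a correlation coefficient can be illustrated with a short simulation. This is only a sketch under assumptions that are not stated in this section: the unstructured random slope is taken to be υ_2i = γ_υ υ_1i + e_i with independent Gaussian terms, and the Model 4 posterior means from Table 3 are plugged in as if $σ_{υ_{2}}^{2}$ were the variance of e_i. The helper names `pearson` and `implied_correlation` are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def implied_correlation(gamma, var_intercept, var_residual):
    """Correlation implied by slope_i = gamma * intercept_i + e_i."""
    sd_slope = math.sqrt(gamma ** 2 * var_intercept + var_residual)
    return gamma * math.sqrt(var_intercept) / sd_slope


# Model 4 posterior means from Table 3. Treating the slope variance as the
# variance of e_i is an assumption made for this illustration only.
GAMMA_V = -0.1899
VAR_INTERCEPT = 0.3253
VAR_RESIDUAL = 0.0383

rho_analytic = implied_correlation(GAMMA_V, VAR_INTERCEPT, VAR_RESIDUAL)

# Check the closed form against simulated random effects.
rng = random.Random(1)
intercepts = [rng.gauss(0.0, math.sqrt(VAR_INTERCEPT)) for _ in range(100_000)]
slopes = [GAMMA_V * a + rng.gauss(0.0, math.sqrt(VAR_RESIDUAL)) for a in intercepts]
rho_simulated = pearson(intercepts, slopes)

print(f"implied correlation: {rho_analytic:.3f} (simulated {rho_simulated:.3f})")
```

Under these assumptions a scale factor of about −0.19 corresponds to a correlation of roughly −0.48, because the slopes vary much less than the intercepts. The two kinds of parameter are therefore not directly comparable in magnitude.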

Model 8 shows an overall increasing time trend of $\exp ({\bar{β}}_{1}) \approx 1.23$ for diarrhea in Ghana. This is the rate ratio between two consecutive years and implies that diarrhea risk is multiplied by 1.23 every year. The estimate of the correlation parameter γ_ϑv = −0.287 implies that lower intercepts are associated with more positive slopes. We further present maps of the posterior estimates of important quantities. Figure 1 shows the distribution of the differential time trends, which is of primary interest. For ease of interpretation, the β_1i are exponentiated so that they can be read as multiplicative effects. Thus, exp(β_1i) > 1 implies a positive time trend, whereas exp(β_1i) < 1 implies a negative time trend. We observe increasing trends of diarrhea throughout, except for a few districts in the south-central parts which show decreasing trends or no temporal change. There are also isolated instances (4 districts) with extreme time trends, exp(β_1i) > 2.0. To assess the statistical significance and the clustering patterns of the time trends, we mapped the spatially smoothed posterior probability that β_1i exceeds the mean time trend ${\bar{β}}_{1}$. The same probabilities were estimated for the thresholds $\exp (β_{1}^{Th}) \in \{1.0, 1.10, 1.20, 1.25, 2.0\}$ (Fig. 2). Districts in darker (lighter) grey have at least 80% probability of exceeding (falling below) the threshold. While fewer districts have at least a 0.8 chance of their trend exceeding the mean time trend $\exp ({\bar{β}}_{1}) \approx 1.23$, the majority of districts have at least a 0.8 chance of their trends exceeding $\exp (β_{1}^{Th}) = 1.0$, and likewise $\exp (β_{1}^{Th}) = 1.10$. This suggests an increasing time trend for the majority of the districts. No district has at least a 0.8 chance of its trend exceeding $\exp (β_{1}^{Th}) = 1.25$ or $\exp (β_{1}^{Th}) = 2.0$. This indicates that the few isolated areas identified with extreme time trends, exp(β_1i) > 2.0, are outliers.
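An exceedance probability of this kind is the share of posterior draws of exp(β_1i) that lie above a threshold. The sketch below shows the computation on synthetic draws. The function names `exceedance_probability` and `classify`, the district labels, and the draws themselves are hypothetical placeholders rather than output from the fitted models; the spatial smoothing enters through the model that generates the draws, not through this step.

```python
import math
import random


def exceedance_probability(log_trend_draws, threshold):
    """Share of posterior draws whose exp(beta_1i) lies above the threshold."""
    above = sum(1 for b in log_trend_draws if math.exp(b) > threshold)
    return above / len(log_trend_draws)


def classify(prob, cutoff=0.8):
    """Label a district with the 80% rule used for the probability maps."""
    if prob >= cutoff:
        return "exceeds"
    if prob <= 1.0 - cutoff:
        return "falls below"
    return "uncertain"


# Synthetic stand-ins for MCMC draws of beta_1i on the log scale (not real output).
rng = random.Random(42)
draws = {
    "district_A": [rng.gauss(math.log(1.40), 0.05) for _ in range(4000)],
    "district_B": [rng.gauss(math.log(1.10), 0.10) for _ in range(4000)],
    "district_C": [rng.gauss(math.log(0.90), 0.05) for _ in range(4000)],
}
thresholds = [1.0, 1.10, 1.20, 1.25, 2.0]

# One exceedance probability per district and threshold.
table = {
    name: {th: exceedance_probability(b, th) for th in thresholds}
    for name, b in draws.items()
}

for name, row in table.items():
    labels = ", ".join(f"{th}: {classify(p)}" for th, p in row.items())
    print(name, "->", labels)
```

Districts labelled "uncertain" are those for which neither exceeding nor falling below the threshold reaches the 80% level.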

Figure 1. Map of the spatial distribution of the differential time trends (left) and its histogram (right).

Figure 2. Exceedance probabilities of the differential time trends for different thresholds.

Figures 3 and 4 show the random effects of the intercepts and slopes and their associated exceedance probabilities, respectively. Referring to Fig. 4, maps of exp(ϑ_1i) and exp(υ_1i) are interpreted as residual spatially structured and unstructured risks after accounting for time trends, respectively. Those of exp(ϑ_2i) and exp(υ_2i), on the other hand, are the residual time trends after accounting for the mean time trend. As expected, we observe no spatial similarity between these maps, except that the unstructured random effects dominated the structured components. Comparing the maps of exp(ϑ_2i) and exp(υ_2i), we infer that the spatially unstructured components of the time trends dominated and accounted for 64.8% of the random slope variation (Table 3). Similarly, comparing exp(ϑ_1i) and exp(υ_1i), the spatially unstructured residual spatial effects dominated, accounting for 71.9%.

Figure 3. Residual random effects of the structured and unstructured time trends and the risk.

Figure 4. Exceedance probabilities of residual random effects of the structured and unstructured time trends and the risk.

Discussion

This study presents an extension of the standard approach to analyzing spatial patterns of time trends in spatial epidemiology developed by Bernardinelli et al.¹⁷. Unlike direct applications of existing methods^17,18, our focus has been not only on estimating the time trends but also on evaluating their spatial clustering. The epidemiological context of the study is the differential time trends of diarrhea occurrences in Ghana. Estimating the time trend for each area separately would be a critical challenge in frequentist statistics, because the small number of time stamps makes the estimates unreliable. In our hierarchical Bayesian space-time approach, we have taken advantage of the ability to borrow information across both space and time to improve the reliability of our estimates. The inclusion of the correlation term between the random intercepts and slopes has both methodological and epidemiological implications. Models with the correlation terms were observed to be superior, in terms of predictive power, to those without.

Here, we discuss both the issues arising from the statistical modeling and the epidemiological implications for the analysis of diarrhea surveillance data in Ghana. Amongst the models without correlation between the intercepts and slopes, the preferred model is Model 3. This model avoids choosing between heterogeneity and clustering for both the time trends and the residual spatial variations. The significance of the correlation parameters in Models 4 to 7 suggests that correlation between the intercepts and slopes should be included; omitting it would cause estimates of the district-specific trends to be pulled towards the mean trend¹⁷. Deriving and comparing variant extensions to incorporate this correlation proved worthwhile, as the strengths and weaknesses of each approach were unveiled. Inducing correlation by expressing the random components of the slopes as a linear function of the random intercepts is straightforward. Yet the model with BiCAR correlation (Model 8) performed better and was preferred over the others in our case. This model has the advantage of estimating the correlation between the unstructured intercepts and slopes, between the structured intercepts and slopes, and the overall correlation. The important differences between our models are the variation in time trends and the way correlation between intercepts and slopes is accommodated. We conclude that the inherent structure of the data should guide the choice of correlation method, through comparison of competing models. Centering the covariate, in this case time, should reduce the correlation between the intercepts and slopes, but not when there is strong inherent correlation. For cancer and other non-infectious diseases centering alone might suffice. In our case, we explicitly accounted for the inherent correlation, which we observed to be significant even though the time variable was centered.
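The effect of centering can be seen with a small simulation. Suppose each district's log-risk follows a line a_i + b_i (t − t̄), with a_i and b_i drawn independently. Writing the same lines with uncentered time gives intercepts a_i − b_i t̄, which are correlated with the slopes even though nothing about the districts has changed. The standard deviations and t̄ below are invented for illustration and are not estimates from our data; the helper `pearson` is hypothetical.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Invented values, for illustration only.
SD_A = 0.5    # sd of intercepts defined at the centered time origin
SD_B = 0.2    # sd of slopes
T_BAR = 3.0   # mean of the uncentered time variable

rng = random.Random(7)
a_centered = [rng.gauss(0.0, SD_A) for _ in range(50_000)]
slopes = [rng.gauss(0.0, SD_B) for _ in range(50_000)]

# Same lines with uncentered time: a + b*(t - T_BAR) = (a - b*T_BAR) + b*t
a_uncentered = [a - b * T_BAR for a, b in zip(a_centered, slopes)]

corr_centered = pearson(a_centered, slopes)
corr_uncentered = pearson(a_uncentered, slopes)

# Closed form for the correlation induced purely by the choice of time origin.
corr_expected = -T_BAR * SD_B / math.sqrt(SD_A ** 2 + (T_BAR * SD_B) ** 2)

print(f"centered: {corr_centered:.3f}")
print(f"uncentered: {corr_uncentered:.3f} (closed form {corr_expected:.3f})")
```

Centering removes only this artefact of the time origin. A correlation that remains after centering reflects a genuine association between baseline risk and trend, which is why it has to be modeled explicitly.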
Robust parameter estimation is not the only advantage of accounting for correlation; the direction of the correlation between the random intercepts and slopes (negative or positive) presents an important etiological clue regarding the response of the population to environmental or climatic changes. For our study, Models 4 to 8 indicated a negative association between the intercepts and slopes, implying that the district-specific risks are converging to the same levels. The etiological clue might be that environmental risk factors which often differed between districts in the past are now often similar. Population growth, unplanned urbanization, and rural-urban migration, which have outpaced the availability of safe drinking water and sanitation, could be the causal factors.
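Why a negative intercept-slope association points to convergence can be made explicit with a short variance calculation. As a simplifying sketch, write the district-specific deviation of the log-risk at centered time $t$ as $a_i + b_i t$, where $a_i$ and $b_i$ stand for the total random intercept and slope with variances $\sigma_a^2$ and $\sigma_b^2$ and correlation $\rho$. These symbols are shorthand introduced here, not the notation of the model equations.

$$\operatorname{Var}(a_i + b_i t) = \sigma_a^{2} + 2 \rho \sigma_a \sigma_b t + \sigma_b^{2} t^{2}, \qquad \left. \frac{d}{dt} \operatorname{Var}(a_i + b_i t) \right|_{t = 0} = 2 \rho \sigma_a \sigma_b .$$

With $\rho < 0$ the between-district spread of the log-risks is shrinking at the centre of the study period, and it keeps shrinking until $t^{*} = -\rho \sigma_a / \sigma_b$. Beyond that point the linear trends would begin to diverge again, so the convergence reading applies to the observed period rather than to long-range extrapolation.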

From an epidemiological point of view, the current study helps to form a clearer picture of the space-time trends of diarrhea in Ghana, calling for an effective control strategy. The average time trend, $\exp ({\bar{β}}_{1}) \approx 1.23$, reflecting a yearly increase of 23%, is striking as it far exceeds the yearly population growth rate of nearly 2.3%. This alone is enough to alert health officials and policymakers to the severity of the diarrhea menace. Returning to the observed spatial patterns of the time trends, which are the primary objective of this study (Fig. 3), there is wide spatial variation, in which the unstructured components dominate over the structured ones. This is where avoiding a choice between heterogeneity and clustering is relevant. It suggests the importance of household-level risk factors (captured by unstructured heterogeneity) in controlling the temporal effects of diarrhea. The spatially varying growth rates could indicate varying responses to time-varying climatic risk factors such as temperature, humidity, and rainfall. These are known to influence the risk of diarrhea infections^10,38. An additional plausible interpretation, especially for districts with high time trends, could be improvements in the disease surveillance systems or reporting strategies. The random effects of the time trends, especially the structured components, may also be concomitant with district-level population growth changes, but further studies are required to substantiate this. For the unstructured component, deterioration of household-level water and sanitation amenities is a plausible factor and likewise requires further study. That said, the probability maps of the structured and unstructured time trends could readily inform policymakers about districts of concern for either household-level targeting or regional (beyond district-level) targeting. Either way, improvements in household-level amenities will have ripple effects on district or regional-level clustering. In a nutshell, these findings deserve further reflection by health officials and policymakers, in line with which prompt actions might be required.
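To put the two growth rates side by side, the snippet below compounds a yearly rate ratio of 1.23 and a population growth factor of 1.023 over a few years. The five-year horizon is an arbitrary choice for illustration, and the calculation assumes both rates stay constant, which the linear trend model supports only within the study period. The function names are hypothetical.

```python
import math


def compound(rate_ratio, years):
    """Cumulative multiplicative change after a number of years."""
    return rate_ratio ** years


def doubling_time(rate_ratio):
    """Years needed to double at a constant yearly rate ratio."""
    return math.log(2.0) / math.log(rate_ratio)


RISK_RATE_RATIO = 1.23      # exp(mean slope), as reported in Table 3
POPULATION_FACTOR = 1.023   # 2.3% yearly population growth

for year in range(1, 6):
    risk = compound(RISK_RATE_RATIO, year)
    population = compound(POPULATION_FACTOR, year)
    print(f"year {year}: risk x{risk:.2f}, population x{population:.3f}")

print(f"risk doubling time: {doubling_time(RISK_RATE_RATIO):.1f} years")
print(f"population doubling time: {doubling_time(POPULATION_FACTOR):.1f} years")
```

At these constant rates the risk would double in a little over three years, whereas the population would need about three decades to do so.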

Most small area health studies focus on small area contrasts of the risk for a single temporal window or temporal contrasts for the whole study area. While we developed our models with a focus on diarrhea in Ghana, an extension to other diseases and/or other developing countries is conceptually straightforward. The novelty of our study is partly based on the variant approaches developed to induce correlation between the intercepts and slopes. Here, for our data with few time stamps, we have used Bayesian random effects models to evaluate the small area contrasts of the linear temporal trends. It can be argued that our approach is similar to the Type IV interaction, since linear trends (structured) with local slopes are specified for each district and are then further smoothed (structured) over space, except that we additionally focused on the spatial clustering of the time trends. Accounting for correlation between the intercepts and slopes is also simpler, since each spatial unit has a single slope, unlike the Type IV case where the number of slopes per spatial unit is equivalent to the length of the random walk. However, for data with many time stamps, our approach has the disadvantage of oversmoothing the temporal trends due to the imposition of linear time trends. Extension of the parametric linear trend model to a nonparametric time trend model like the random walk approach is nevertheless possible when the focus is to detect where and when the time trends cluster.

With an objective similar to ours, Waller et al.³⁹ explored the spatio-temporal patterns in the county-level incidence of Lyme disease in the northeastern United States. A distinction of our study is that we have developed and compared different methods to accommodate the inherent correlation between the random components of the intercepts and slopes. It may be possible to extend the time trend component to a random walk prior instead of imposing linearity if data for a substantial number of temporal points are available. Within our study area, a typical infectious disease extension is a comparison with the spatially varying temporal trends of intestinal parasite morbidities, which have similar risk factors. In the future, we intend to extend our study to a bivariate model, where the spatially varying time trends of two diseases could be determined jointly.

Now we turn to the implications of our study for public health policy and interventions. We reiterate that, in health policy research, the models could be implemented to identify and map areas with extremely increasing time trends, with a view to determining critical areas needing interventions. It is noteworthy that the patterns of infectious diseases are dynamic rather than static. Hence, since our study is retrospective, public health intervention policies should rather be based on results from new data at the time of their development.

Our study also has some limitations. Notably, our study has the implicit assumption of equal within-district time trends, without consideration for household or relatively smaller area-level trends. Although we attempted to capture this effect through the unstructured random components of the time trends, it could instead be spurious. That said, the significance of the study to public health far outweighs these limitations. In the future, we intend to focus on spatial disaggregation of the time trends to account for within-area variation.

Conclusions

In this study, we have demonstrated how space-time random effect modeling is useful to detect district-specific time trends of diarrhea. Models which account for correlation between varying intercepts and slopes are more competitive than those without, with the BiCAR method being the most competitive. The inclusion of correlation between the intercepts and slopes provided additional epidemiological information, illuminating the response of the disease dynamics to environmental changes in the past and present. We found increasing trends of diarrhea risk amongst many districts, but many with trends lower than the overall mean trend. The spatially varying trend maps are useful for guiding interventions and resource allocations geared towards curbing this menace. Extensions of our models to other or multiple infectious diseases are straightforward, and we seek to venture into this in the future. However, flexible use of our approach by public health professionals in Ghana will require further engagement and capacity training, which we aspire to provide in the future.

Acknowledgements

We extend our sincere appreciation to the CHIM of the Ghana Health Services for providing all the necessary data and background information for this research.

Author Contributions

F.B.O. conceived of the study, carried out the analysis, and drafted the manuscript. A.S. conceived of the study, participated in its design and coordination, and helped to draft the manuscript. All authors read and approved the final manuscript.

Data Availability

The datasets used and/or analyzed during the current study are available from the CHIM of the Ghana Health Services upon reasonable request.

Competing Interests

The authors declare no competing interests.

Footnotes

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Black, R. E., Cousens, S., Johnson, H. L., Lawn, J. E. & Rudan, I. Global, regional, and national causes of child mortality in 2008: a systematic analysis. Lancet 375, (2010).
2. Black RE, Morris SS, Bryce J. Where and why are 10 million children dying every year? The Lancet. 2003;361:2226–2234. doi: 10.1016/S0140-6736(03)13779-8.
3. Boschi-Pinto C. Estimating child mortality due to diarrhoea in developing countries. Bull. World Health Organ. 2008;86:710–717. doi: 10.2471/BLT.07.050054.
4. Fischer Walker CL, Perin J, Aryee MJ, Boschi-Pinto C, Black RE. Diarrhea incidence in low- and middle-income countries in 1990 and 2010: a systematic review. BMC Public Health. 2012;12:1–7. doi: 10.1186/1471-2458-12-220.
5. Parashar UD, Hummelman EG, Bresee JS, Miller MA, Glass RI. Global illness and deaths caused by rotavirus disease in children. Emerg. Infect. Dis. 2003;9:565–572. doi: 10.3201/eid0905.020562.
6. Lanata CF, et al. Global Causes of Diarrheal Disease Mortality in Children <5 Years of Age: A Systematic Review. PLOS ONE. 2013;8:e72788. doi: 10.1371/journal.pone.0072788.
7. Kotloff KL, et al. Burden and aetiology of diarrhoeal disease in infants and young children in developing countries (the Global Enteric Multicenter Study, GEMS): a prospective, case-control study. The Lancet. 2013;382:209–222. doi: 10.1016/S0140-6736(13)60844-2.
8. Julian TR. Environmental transmission of diarrheal pathogens in low and middle income countries. Env. Sci Process. Impacts. 2016;18:944–955. doi: 10.1039/C6EM00222F.
9. Liu, L. et al. Global, regional, and national causes of child mortality in 2000–13, with projections to inform post-2015 priorities: an updated systematic analysis. 385, 430–440 (2015).
10. Azage M, Kumie A, Worku A, Bagtzoglou AC. Childhood Diarrhea Exhibits Spatiotemporal Variation in Northwest Ethiopia: A SaTScan Spatial Statistical Analysis. PLOS ONE. 2015;10:e0144690. doi: 10.1371/journal.pone.0144690.
11. Kulinkina AV, et al. Seasonality of water quality and diarrheal disease counts in urban and rural settings in south India. Sci. Rep. 2016;6:srep20521. doi: 10.1038/srep20521.
12. Pande S, Keyzer MA, Arouna A, Sonneveld BG. Addressing diarrhea prevalence in the West African Middle Belt: social and geographic dimensions in a case study for Benin. Int. J. Health Geogr. 2008;7:17. doi: 10.1186/1476-072X-7-17.
13. Thompson CN, et al. The impact of environmental and climatic variation on the spatiotemporal trends of hospitalized pediatric diarrhea in Ho Chi Minh City, Vietnam. Health Place. 2015;35:147–154. doi: 10.1016/j.healthplace.2015.08.001.
14. Xu, Z. et al. Exploration of diarrhoea seasonality and its drivers in China. Sci. Rep. 5, (2015).
15. Takahashi K, Kulldorff M, Tango T, Yih K. A flexibly shaped space-time scan statistic for disease outbreak detection and monitoring. Int. J. Health Geogr. 2008;7:14. doi: 10.1186/1476-072X-7-14.
16. Kulldorff M, Heffernan R, Hartman J, Assunção R, Mostashari F. A Space–Time Permutation Scan Statistic for Disease Outbreak Detection. PLOS Med. 2005;2:e59. doi: 10.1371/journal.pmed.0020059.
17. Bernardinelli L, et al. Bayesian analysis of space-time variation in disease risk. Stat. Med. 1995;14:2433–2443. doi: 10.1002/sim.4780142112.
18. Knorr-Held L. Bayesian modelling of inseparable space-time variation in disease risk. Stat. Med. 2000;19:2555–2567. doi: 10.1002/1097-0258(20000915/30)19:17/18<2555::AID-SIM587>3.0.CO;2-#.
19. Awuah E, Nyarko KB, Owusu PA. Water and sanitation in Ghana. Desalination. 2009;248:460–467. doi: 10.1016/j.desal.2008.05.088.
20. Adjei A, et al. Cryptosporidium oocysts in Ghanaian AIDS patients with diarrhoea. East Afr. Med. J. 2003;80:369–372. doi: 10.4314/eamj.v80i7.8721.
21. Adjei AA, et al. Cryptosporidium Spp., a frequent cause of diarrhea among children at the Korle-Bu Teaching Hospital, Accra, Ghana. Jpn. J. Infect. Dis. 2004;57:216–219.
22. Ahiadeke C. Breast-feeding, diarrhoea and sanitation as components of infant and child health: a study of large scale survey data from Ghana and Nigeria. J. Biosoc. Sci. 2000;32:47–61. doi: 10.1017/S002193200000047X.
23. Eibach D, et al. Molecular Epidemiology and Antibiotic Susceptibility of Vibrio cholerae Associated with a Large Cholera Outbreak in Ghana in 2014. PLoS Negl. Trop. Dis. 2016;10:e0004751. doi: 10.1371/journal.pntd.0004751.
24. Gyimah, S. O. Interaction Effects of Maternal Education and Household Facilities on Childhood Diarrhea in Sub-Saharan Africa: The Case of Ghana. World Health & Population. Available at: http://www.longwoods.com/content/17628. (Accessed: 30th May 2017) (2003).
25. Krumkamp, R. et al. Gastrointestinal Infections and Diarrheal Disease in Ghanaian Infants and Children: An Outpatient Case-Control Study. 9, e0003568 (2015).
26. Kumi-Kyereme A, Amo-Adjei J. Household wealth, residential status and the incidence of diarrhoea among children under-five years in Ghana. J. Epidemiol. Glob. Health. 2016;6:131–140. doi: 10.1016/j.jegh.2015.05.001.
27. Nkrumah B, Nguah SB. Giardia lamblia: a major parasitic cause of childhood diarrhoea in patients attending a district hospital in Ghana. Parasit. Vectors. 2011;4:163. doi: 10.1186/1756-3305-4-163.
28. Waller, L. A. & Gotway, C. A. Applied Spatial Statistics for Public Health Data. (John Wiley & Sons, 2004).
29. Wakefield JC, Morris SE. The Bayesian Modeling of Disease Risk in Relation to a Point Source. J. Am. Stat. Assoc. 2001;96:77–91. doi: 10.1198/016214501750332992.
30. Press, S. J. Applied Multivariate Analysis: Using Bayesian and Frequentist Methods of Inference, Second Edition. (Dover Publications, 2005).
31. Gelman, A. et al. Bayesian Data Analysis, Third Edition. (Chapman and Hall/CRC, 2013).
32. Moraga P, Lawson AB. Gaussian component mixtures and CAR models in Bayesian disease mapping. Comput. Stat. Data Anal. 2012;56:1417–1433. doi: 10.1016/j.csda.2011.11.011.
33. Spiegelhalter, D. J., Thomas, A. & Best, N. G. WinBUGS Version 1.4.3. (2008).
34. R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing (2016).
35. Tzala E, Best N. Bayesian latent variable modelling of multivariate spatio-temporal variation in cancer mortality. Stat. Methods Med. Res. 2008;17:97–118. doi: 10.1177/0962280207081243.
36. Marshall EC, Spiegelhalter DJ. Approximate cross-validatory predictive checks in disease mapping models. Stat. Med. 2003;22:1649–1660. doi: 10.1002/sim.1403.
37. Spiegelhalter DJ, Best NG, Carlin BP, Van Der Linde A. Bayesian measures of model complexity and fit. J. R. Stat. Soc. Ser. B Stat. Methodol. 2002;64:583–639. doi: 10.1111/1467-9868.00353.
38. Azage M, Kumie A, Worku A, Bagtzoglou AC, Anagnostou E. Effect of climatic variability on childhood diarrhea and its high risk periods in northwestern parts of Ethiopia. PLOS ONE. 2017;12:e0186933. doi: 10.1371/journal.pone.0186933.
39. Waller LA, et al. Spatio-temporal patterns in county-level incidence and reporting of Lyme disease in the northeastern United States, 1990–2000. Environ. Ecol. Stat. 2007;14:83–100. doi: 10.1007/s10651-006-0002-z.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Availability Statement

The datasets used and/or analyzed during the current study are available from the CHIM of the Ghana Health Services upon reasonable request

[CR1] 1.Black, R. E., Cousens, S., Johnson, H. L., Lawn, J. E. & Rudan, I. Global, regional, and national causes of child mortality in 2008: a systematic analysis. Lancet375, (2010). [DOI] [PubMed]

[CR2] 2.Black RE, Morris SS, Bryce J. Where and why are 10 million children dying every year? The Lancet. 2003;361:2226–2234. doi: 10.1016/S0140-6736(03)13779-8. [DOI] [PubMed] [Google Scholar]

[CR3] 3.Boschi-Pinto C. Estimating child mortality due to diarrhoea in developing countries. Bull. World Health Organ. 2008;86:710–717. doi: 10.2471/BLT.07.050054. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR4] 4.Fischer Walker CL, Perin J, Aryee MJ, Boschi-Pinto C, Black RE. Diarrhea incidence in low- and middle-income countries in 1990 and 2010: a systematic review. BMC Public Health. 2012;12:1–7. doi: 10.1186/1471-2458-12-220. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR5] 5.Parashar UD, Hummelman EG, Bresee JS, Miller MA, Glass RI. Global illness and deaths caused by rotavirus disease in children. Emerg. Infect. Dis. 2003;9:565–572. doi: 10.3201/eid0905.020562. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR6] 6.Lanata CF, et al. Global Causes of Diarrheal Disease Mortality in Children <5 Years of Age: A Systematic Review. PLOS ONE. 2013;8:e72788. doi: 10.1371/journal.pone.0072788. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR7] 7.Kotloff KL, et al. Burden and aetiology of diarrhoeal disease in infants and young children in developing countries (the Global Enteric Multicenter Study, GEMS): a prospective, case-control study. The Lancet. 2013;382:209–222. doi: 10.1016/S0140-6736(13)60844-2. [DOI] [PubMed] [Google Scholar]

[CR8] 8.Julian TR. Environmental transmission of diarrheal pathogens in low and middle income countries. Env. Sci Process. Impacts. 2016;18:944–955. doi: 10.1039/C6EM00222F. [DOI] [PubMed] [Google Scholar]

[CR9] 9.Liu, L. et al. Global, regional, and national causes of child mortality in 2000–13, with projections to inform post-2015 priorities: an updated systematic analysis. 385, 430–440 (2015). [DOI] [PubMed]

[CR10] 10.Azage M, Kumie A, Worku A, Bagtzoglou AC. Childhood Diarrhea Exhibits Spatiotemporal Variation in Northwest Ethiopia: A SaTScan Spatial Statistical Analysis. PLOS ONE. 2015;10:e0144690. doi: 10.1371/journal.pone.0144690. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR11] 11.Kulinkina AV, et al. Seasonality of water quality and diarrheal disease counts in urban and rural settings in south India. Sci. Rep. 2016;6:srep20521. doi: 10.1038/srep20521. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR12] 12.Pande S, Keyzer MA, Arouna A, Sonneveld BG. Addressing diarrhea prevalence in the West African Middle Belt: social and geographic dimensions in a case study for Benin. Int. J. Health Geogr. 2008;7:17. doi: 10.1186/1476-072X-7-17. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR13] 13.Thompson CN, et al. The impact of environmental and climatic variation on the spatiotemporal trends of hospitalized pediatric diarrhea in Ho Chi Minh City, Vietnam. Health Place. 2015;35:147–154. doi: 10.1016/j.healthplace.2015.08.001. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR14] 14.Xu, Z. et al. Exploration of diarrhoea seasonality and its drivers in China. Sci. Rep. 5, (2015). [DOI] [PMC free article] [PubMed]

[CR15] 15.Takahashi K, Kulldorff M, Tango T, Yih K. A flexibly shaped space-time scan statistic for disease outbreak detection and monitoring. Int. J. Health Geogr. 2008;7:14. doi: 10.1186/1476-072X-7-14. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR16] 16.Kulldorff M, Heffernan R, Hartman J, Assunção R, Mostashari F. A Space–Time Permutation Scan Statistic for Disease Outbreak Detection. PLOS Med. 2005;2:e59. doi: 10.1371/journal.pmed.0020059. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR17] 17.Bernardinelli L, et al. Bayesian analysis of space-time variation in disease risk. Stat. Med. 1995;14:2433–2443. doi: 10.1002/sim.4780142112. [DOI] [PubMed] [Google Scholar]

[CR18] 18.Knorr-Held L. Bayesian modelling of inseparable space-time variation in disease risk. Stat. Med. 2000;19:2555–2567. doi: 10.1002/1097-0258(20000915/30)19:17/18<2555::AID-SIM587>3.0.CO;2-#. [DOI] [PubMed] [Google Scholar]

[CR19] 19.Awuah E, Nyarko KB, Owusu PA. Water and sanitation in Ghana. Desalination. 2009;248:460–467. doi: 10.1016/j.desal.2008.05.088. [DOI] [Google Scholar]

[CR20] 20.Adjei A, et al. Cryptosporidium oocysts in Ghanaian AIDS patients with diarrhoea. East Afr. Med. J. 2003;80:369–372. doi: 10.4314/eamj.v80i7.8721. [DOI] [PubMed] [Google Scholar]

[CR21] 21.Adjei AA, et al. Cryptosporidium Spp., a frequent cause of diarrhea among children at the Korle-Bu Teaching Hospital, Accra, Ghana. Jpn. J. Infect. Dis. 2004;57:216–219. [PubMed] [Google Scholar]

[CR22] 22.Ahiadeke C. Breast-feeding, diarrhoea and sanitation as components of infant and child health: a study of large scale survey data from Ghana and Nigeria. J. Biosoc. Sci. 2000;32:47–61. doi: 10.1017/S002193200000047X. [DOI] [PubMed] [Google Scholar]

[CR23] 23.Eibach D, et al. Molecular Epidemiology and Antibiotic Susceptibility of Vibrio cholerae Associated with a Large Cholera Outbreak in Ghana in 2014. PLoS Negl. Trop. Dis. 2016;10:e0004751. doi: 10.1371/journal.pntd.0004751. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR24] 24.Gyimah, S. O. Interaction Effects of Maternal Education and Household Facilities on Childhood Diarrhea in Sub-Saharan Africa: The Case of Ghana. World Health & Population Available at: http://www.longwoods.com/content/17628. (Accessed: 30th May 2017) (2003).

[CR25] 25.Krumkamp, R. et al. Gastrointestinal Infections and Diarrheal Disease in Ghanaian Infants and Children: An Outpatient Case-Control Study. 9, e0003568 (2015). [DOI] [PMC free article] [PubMed]

[CR26] 26.Kumi-Kyereme A, Amo-Adjei J. Household wealth, residential status and the incidence of diarrhoea among children under-five years in Ghana. J. Epidemiol. Glob. Health. 2016;6:131–140. doi: 10.1016/j.jegh.2015.05.001. [DOI] [PMC free article] [PubMed] [Google Scholar]

27. Nkrumah B, Nguah SB. Giardia lamblia: a major parasitic cause of childhood diarrhoea in patients attending a district hospital in Ghana. Parasit. Vectors. 2011;4:163. doi: 10.1186/1756-3305-4-163.

28. Waller, L. A. & Gotway, C. A. Applied Spatial Statistics for Public Health Data. (John Wiley & Sons, 2004).

29. Wakefield JC, Morris SE. The Bayesian Modeling of Disease Risk in Relation to a Point Source. J. Am. Stat. Assoc. 2001;96:77–91. doi: 10.1198/016214501750332992.

30. Press, S. J. Applied Multivariate Analysis: Using Bayesian and Frequentist Methods of Inference, Second Edition. (Dover Publications, 2005).

31. Gelman, A. et al. Bayesian Data Analysis, Third Edition. (Chapman and Hall/CRC, 2013).

32. Moraga P, Lawson AB. Gaussian component mixtures and CAR models in Bayesian disease mapping. Comput. Stat. Data Anal. 2012;56:1417–1433. doi: 10.1016/j.csda.2011.11.011.

33. Spiegelhalter, D. J., Thomas, A. & Best, N. G. WinBUGS Version 1.4.3. (2008).

34. R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing (2016).

35. Tzala E, Best N. Bayesian latent variable modelling of multivariate spatio-temporal variation in cancer mortality. Stat. Methods Med. Res. 2008;17:97–118. doi: 10.1177/0962280207081243.

36. Marshall EC, Spiegelhalter DJ. Approximate cross-validatory predictive checks in disease mapping models. Stat. Med. 2003;22:1649–1660. doi: 10.1002/sim.1403.

37. Spiegelhalter DJ, Best NG, Carlin BP, Van Der Linde A. Bayesian measures of model complexity and fit. J. R. Stat. Soc. Ser. B Stat. Methodol. 2002;64:583–639. doi: 10.1111/1467-9868.00353.

38. Azage M, Kumie A, Worku A, Bagtzoglou AC, Anagnostou E. Effect of climatic variability on childhood diarrhea and its high risk periods in northwestern parts of Ethiopia. PLOS ONE. 2017;12:e0186933. doi: 10.1371/journal.pone.0186933.

39. Waller LA, et al. Spatio-temporal patterns in county-level incidence and reporting of Lyme disease in the northeastern United States, 1990–2000. Environ. Ecol. Stat. 2007;14:83–100. doi: 10.1007/s10651-006-0002-z.
