Quasi-Likelihood Techniques in a Logistic Regression Equation for Identifying Simulium damnosum s.l. Larval Habitats Intra-cluster Covariates in Togo

BENJAMIN G JACOB; ROBERT J NOVAK; LAURENT TOE; MOUSSA S SANFO; ABENA N AFRIYIE; MOHAMMED A IBRAHIM; DANIEL A GRIFFITH; THOMAS R UNNASCH

doi:10.1080/10095020.2012.714663

Author manuscript; available in PMC: 2013 Sep 24.

Published in final edited form as: Geo Spat Inf Sci. 2012 Sep 24;15(2):117–133. doi: 10.1080/10095020.2012.714663


PMCID: PMC3595116 NIHMSID: NIHMS418175 PMID: 23504576

Abstract

The standard methods for regression analyses of clustered riverine larval habitat data of Simulium damnosum s.l., a major black-fly vector of onchocerciasis, postulate models relating observational ecological-sampled parameter estimators to prolific habitats without accounting for residual intra-cluster error correlation effects. Generally, this correlation comes from two sources: (1) the design of the random effects and their assumed covariance from the multiple levels within the regression model; and (2) the correlation structure of the residuals. Unfortunately, inconspicuous errors in residual intra-cluster correlation estimates can overstate precision in forecasted S. damnosum s.l. riverine larval habitat explanatory attributes regardless of how they are treated (e.g., independent, autoregressive, Toeplitz). In this research, the geographical locations for multiple riverine-based S. damnosum s.l. larval ecosystem habitats sampled from 2 pre-established epidemiological sites in Togo were identified and recorded from July 2009 to June 2010. Initially, the data was aggregated in PROC GENMOD. An agglomerative hierarchical residual cluster-based analysis was then performed. The sampled clustered study site data was then analyzed for statistical correlations using Monthly Biting Rates (MBR). Euclidean distance measurements and terrain-related geomorphological statistics were then generated in ArcGIS. A digital overlay was then performed, also in ArcGIS, using the georeferenced ground coordinates of high and low density clusters stratified by Annual Biting Rates (ABR). This data was overlain onto multitemporal sub-meter pixel resolution satellite data (i.e., QuickBird 0.61 m wavebands). Orthogonal spatial filter eigenvectors were then generated in SAS/GIS. Univariate and non-linear regression-based models (i.e., Logistic, Poisson and Negative Binomial) were also employed to determine probability distributions and to identify statistically significant parameter estimators from the sampled data. Thereafter, Durbin-Watson test statistics were used to test the null hypothesis that the regression residuals were not autocorrelated against the alternative that the residuals followed an autoregressive process in PROC AUTOREG. Bayesian uncertainty matrices were also constructed employing normal priors for each of the sampled estimators in PROC MCMC. The residuals revealed both spatially structured and unstructured error effects in the high and low ABR-stratified clusters. The analyses also revealed that the estimators levels of turbidity and presence of rocks were statistically significant for the high-ABR-stratified clusters, while the estimators distance between habitats and floating vegetation were important for the low-ABR-stratified cluster. Varying and constant coefficient regression models, ABR-stratified GIS-generated clusters, sub-meter resolution satellite imagery, a robust residual intra-cluster diagnostic test, MBR-based histograms, eigendecomposition spatial filter algorithms and Bayesian matrices can enable accurate autoregressive estimation of latent uncertainty effects and other residual error probabilities (i.e., heteroskedasticity) for testing correlations between georeferenced S. damnosum s.l. riverine larval habitat estimators. The asymptotic distribution of the resulting residual adjusted intra-cluster predictor error autocovariate coefficients can thereafter be established, while estimates of the asymptotic variance can lead to the construction of approximate confidence intervals for accurately targeting productive S. damnosum s.l. habitats based on spatiotemporal field-sampled count data.

Keywords: Simulium damnosum s.l., cluster covariates, QuickBird, onchocerciasis, annual biting rates, Bayesian, Togo

Introduction

Onchocerciasis, or river blindness, is a human filarial infection that causes blindness in infected people and is transmitted by Simulium species, or black flies. Traditionally, the designation of at-risk communities for onchocerciasis is accomplished through intensive ground-based epidemiological surveys of communities located in rural riverine areas in which the disease is endemic. In 1974, the World Health Organization (WHO) began the Onchocerciasis Control Programme (OCP) in 11 West African countries. Thereafter, intensive country-wide field-sampled surveys began in 1975.^[1] These surveys revealed that onchocerciasis was hyperendemic, with prevalence rates around 70% and some areas showing human blindness rates up to 9%. The frequency and level of evolution of ocular lesions and of onchocercal blindness were among the highest in the world. This data was employed by the OCP to justify and deploy aerial applications of larval insecticides to reduce the populations of Simulium damnosum s.l., the primary vector of onchocerciasis, thereby reducing transmission to humans.^[2] Simuliidae, or black flies, in the Simulium damnosum Theobald complex are the only insect vectors of human onchocerciasis in West African countries.^[3]

The original OCP vector control efforts resulted in a dramatic reduction in the transmission of onchocerciasis, an accomplishment that was maintained for over 12 years.^[4] The effectiveness of the Program is illustrated by the reduction in Annual Biting Rates (ABRs) of Simulium species at Pont frontiere on the Leraba River, which declined from 26 in 1975 to 3 in 1989, when the control program ended. Similarly, at Loabe on the Nakambi River and at Ziou Zabre (both in Burkina Faso), the ABR fell from 6,090 and 11,879 in 1975 to 238 and 1,465 in 1989, respectively. Significant downward trends as a result of the black fly control program were observed at Bagre on the Nazinon River and at Bitou on the Nauhao River, a tributary of the Nakambi River. In contrast, in certain areas within the 11 countries covered by the OCP, such as the Oti River tributaries in Togo, the Leraba/Comoe Rivers in Burkina Faso, the Baoulé and Sankarani Rivers in Mali, the White Bandama in Côte d'Ivoire and the Black Volta River region in Ghana, vector control measures were not as successful.

Within the 11-country OCP area, the movement of black flies was found to represent a significant impediment to the success of vector control efforts, particularly in areas on the eastern and western borders. Dry season reinvasion of black flies travelling on Harmattan winds or West African trade winds from the north was blamed for the rapid dispersion of insecticide-resistant black flies to other river basins.^[5] As a result of this re-population, the boundaries of the program area were extended to include additional riverine breeding sites that were serving as the reservoir for these migrating flies.^[6]

Macrogeographic factors such as the complexity of the landscape, human population movements and the suspension of larviciding of neighboring black fly producing rivers may also have had an effect on eliminating parasite reservoirs. Additionally, marked differences between spatiotemporal-sampled habitats with respect to microgeographical distribution, such as larval habitat production, host preferences, vectorial capacity, susceptibility to larvicides, biting cycle and population age structure, may also have complicated larvicidal operations. It has been suggested that spatiotemporal patterns of vector insect larval habitat production are driven by 2 mechanisms, namely, (i) variation in intrinsic properties of breeding habitats, which affect growth and survival of immature populations, and (ii) the spatial locations of focal habitats in relation to human habitation.^[2] These habitat variabilities require rigorous quantitative statistical analyses for implementing control programs. With such analyses, varying linear outputs and discrete-time state-space models could be used to remotely target productive S. damnosum s.l. larval habitats based on spatiotemporal field-sampled count data. Treatments or habitat perturbations should be based on surveillance of larvae in the most productive areas of an ecosystem.^[7]

In this paper we constructed multiple spatiotemporal cluster-based autoregressive residual error matrices employing Durbin-Watson (DW) first-order autocorrelation statistics, spatial filter orthogonal eigenvectors and Bayesian hierarchical generalized linear mixed models to spatially target productive S. damnosum s.l. larval habitats based on field-sampled count data in two riverine epidemiological breeding study sites in Togo. Contagious processes, such as conspecific attraction, can generate time series-dependent error patterns in S. damnosum s.l. riverine larval habitat species abundance that cannot be explained by simple residual hierarchical cluster-based regression models. We therefore assumed that, by combining linear and non-linear residual predictive-error estimation algorithms, we could qualitatively assess and quantify varying and constant intra-cluster regression-based disturbances (e.g., conditional heteroskedasticity, serial error correlation).

The importance of this research may also be expressed in the GIScience literature regarding representations of geographic space as well as various literatures concerned with time series-dependent vector insect larval habitat modeling. Presently there is not a steady current of literature on representation issues in GIS for remotely and quantitatively assessing spatiotemporal-sampled vector insect data. Thus, the potential synergy of developments in GIS, spatial statistics and entomology may not be apparent to researchers in their respective disciplines. The fusing of these independent research trajectories into a common cohesive agenda (e.g., GIS/remote integrated vector aquatic larval habitat control-based cyberenvironment) could generate geostatistical tools that might reveal new insights into the role of physical and human geography in vector borne disease transmission. Results from the entomological and epidemiological surveillance activities have indicated fly infectivity levels and infection in humans that require an improved programme.^[1] Therefore, the objectives of this research were to: (1) perform a hierarchical regression-based cluster analysis using multiple georeferenced S. damnosum s.l. larval habitat parameter estimators; (2) construct multiple stepwise linear models using the sampled explanatory variables; (3) filter all latent serial autocorrelation error coefficients using an eigenfunction decomposition algorithm; (4) formulate a customized uncertainty diagnostic test using the random effects from an iterative Bayesian analysis; and (5) validate all forecasted estimates using a cumulative residual analysis for qualitatively assessing and accurately quantifying intra-cluster regression-based error coefficients associated with prolific riverine larval habitat clusters based on spatiotemporal field-sampled count data.

1 Material and methodology

1.1 Study site

Five onchocerciasis/black fly sites located in Togo were used in this study: Mo, Landa Pozada, Bagan, Titra and Sarakawa-Kpleou. The riverine epidemiological study sites are located approximately 100 km from Kara, a city in northern Togo situated in Kara Region, 413 km north of the capital Lomé. The Haugeau River flows a little way south of Kara and is the main source of water for the region. North of Kara is the Oti River, which runs through a sandstone plateau. The vegetation of this area is savanna, and the area is characterized by granite and gneiss outcrops. The Oti River drains the plateau and is a main tributary of the Volta River. The river and its tributaries are characterized by a period of flooding from July to November, with a peak in September, and a lengthy low water period from January to June. Land uses include arable land, permanent crops, permanent pastures, forests and woodland. The climate is generally tropical, with average temperatures of about 30°C (86°F) in the region during the black fly season, which lasts up to seven months, while the dry desert winds of the Harmattan blow south from November to March, bringing cooler weather.

1.2 Remote sensing data

In this research, we used Landsat Enhanced Thematic Mapper Plus (ETM+) data for remotely determining geographic locations of the S. damnosum s.l. larval habitat clusters based on multiple regression-based parameter estimators. The Landsat ETM+ image data consisted of eight spectral bands with a spatial resolution of 30 meters for bands 1 to 5 and band 7. Resolution for band 6 (thermal infra-red) was 60 meters and resolution for band 8 (panchromatic) was 15 meters. The total scene size we used was 170×183 km. QuickBird (www.digitalglobe.com) satellite images were also acquired for the riverine epidemiological breeding study site areas. We acquired 11-bit data in five spectral bands covering panchromatic (525–924 nm), blue (447–512 nm), green (499–594 nm), red (620–688 nm), and near-infrared (NIR) (755–874 nm) wavelengths for the study sites. At nadir, the nominal ground sample distance was 0.61 m (panchromatic) with a nominal swath width of 16.5 km. The basic products were delivered at the native sensor resolution and swath width of the image acquisition. The products were then resampled to a panchromatic ground resolution of 0.61 m and cropped to define geographic polygons of the riverine epidemiological study sites.

The satellite imagery was classified using the Iterative Self-Organizing Data Analysis Technique (ISODATA) unsupervised routine in ERDAS Imagine V.8.7™. The QuickBird scene size per sampled S. damnosum s.l. riverine study site image was 25 km².

1.3 Annual Biting Rate (ABR) cluster-based classification

Initially, a 5 km buffer was placed around the riverine epidemiological breeding sites using the Landsat Thematic data in ArcGIS. The field and remote explanatory predictor covariate coefficient estimates encompassing the georeferenced breeding sites were also entered into SAS 9.2^® (Cary, North Carolina). FLEXIBLE|FLE in SAS was then used to request the flexible-beta method. The PROC CLUSTER statement started the procedure, which specified a residual robust hierarchical clustering algorithm employing METHOD=FLEXIBLE. Stratified random sampling was then performed using PROC FREQ and PROC SURVEYSELECT. A routine was developed to select the stratified samples by ABR (Figure 1). The final model revealed that the highest density ABR-based cluster was the Bagan riverine breeding study site, while the lowest ABR cluster was the Sarakawa-Kpleou study site.

Figure 1. Geographical clusters of *S. damnosum s.l.* riverine habitats stratified by annual biting rates (ABR) in the Togo study site

1.5 Habitat mapping

Field-sampling was then conducted in the Bagan and Sarakawa-Kpleou riverine epidemiological breeding study sites from July 2009 to July 2010. The study sites were mapped and classified using a CSI-Wireless Differentially Corrected Global Positioning Systems (DGPS) Max receiver. This remote technology employed an OmniStar L-Band satellite signal yielding a positional error of 0.179 m (+/− 0.392 m). We then overlaid the georeferenced spatiotemporal-sampled S. damnosum s.l. larval habitat regression-based parameter estimators onto the QuickBird data in ArcGIS. We placed a robust digitized grid-based algorithm onto the satellite data to generate efficient spatial sampling units. Once overlaid, the ArcGIS grid-based data files consisted of columns and rows of uniform cells coded according to the parameter estimator ground coordinates sampled in each epidemiological study site. Each grid cell within the matrix contained an environmental-sampled attribute value as well as sampled black-fly habitat location-based geocoordinates (Figure 2).

Figure 2. Digitized grid-based matrix overlaid onto QuickBird visible and near infra-red (NIR) data of the Bagan breeding study site

The Bagan and Sarakawa-Kpleou epidemiological riverine study sites were then examined extensively using longitude, latitude and altitude data. This examination involved obtaining the centrographic measures of spatial mean, the distance between the sampled georeferenced larval habitats and the distance from each sampled site to the nearest human habitation, for qualitatively assessing the sampled items within the selected clusters in ArcGIS. In this research the sampled habitat data comprised individual georeferenced observations together with a battery of categorical explanatory variables, which were expanded into multiple attribute measures using histograms (Figure 3).

Figure 3. Histogram of Monthly Biting Rates (MBR) and distance to the nearest human habitation in the Togo study site

1.6 Environmental parameters

Distance measures were also recorded in ArcGIS Spatial Analyst as Euclidean distances. The nearest source was determined by the Euclidean Distance function in ArcGIS^®. The Euclidean direction output raster contained the azimuth direction. The Euclidean Allocation function identified the nearest human habitation center closest to each digitized grid cell. This weighted function assigned space between the sampled larval habitats. These geometric distances in multi-dimensional space were computed as: distance(x,y) = {∑_i (x_i − y_i)²}^½. All observations sampled in this research are listed in Table 1.
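The distance formula above can be illustrated with a minimal Python sketch. It is not the ArcGIS implementation; the function names and the coordinates are hypothetical and serve only to show how the nearest source to a grid cell follows from distance(x,y) = {∑_i (x_i − y_i)²}^½.

```python
import math

def euclidean_distance(x, y):
    """Euclidean distance between two points given as equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("points must have the same number of dimensions")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def nearest_source(cell, sources):
    """Return (index, distance) of the source closest to a grid cell."""
    distances = [euclidean_distance(cell, s) for s in sources]
    best = min(range(len(sources)), key=distances.__getitem__)
    return best, distances[best]

# Hypothetical projected coordinates in meters (not taken from the study data).
habitat_cell = (0.0, 0.0)
settlements = [(300.0, 400.0), (60.0, 80.0), (1000.0, 0.0)]
index, dist = nearest_source(habitat_cell, settlements)
print(index, dist)
```

In this illustration the second settlement is the nearest one, which is the role the allocation step plays for each digitized grid cell.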

Table 1.

Environmental-sampled cluster-based S. damnosum s.l. parameter estimators sampled in the Bagan and Sarakawa-Kpleou riverine breeding sites as entered in SAS^®

Variable	Description	Units
GCP	Ground control points	Decimal-degrees
FlOW	flowing water	Presence or absence
HGHT	Height of water
TURB	Turbidity of water	Formazin Turbidity Unit
AQVEG	Aquatic vegetation	Percentage
HGVEG	Hanging vegetation	Percentage
DDVEG	Dead vegetation	Percentage
RCKS	Rocks	Percentage
MMB	Man-made barriers	Type (e.g., dams, bridges)
DISHAB	Distance between habitats	meters


1.7 Spatial ecohydrological model

Three-dimensional models of the riverine epidemiological breeding study sites were then constructed based on Digital Elevation Model (DEM) statistics (Figure 4). The latest version of PCI Geomatics Orthoengine^® software was used to construct the models from the S. damnosum s.l. estimators. We generated the DEMs from stereo data, which required the use of geometric models and the DGPS ground coordinates of the sampled riverine larval habitats.

Figure 4. A Digital Elevation Model (DEM) based on georeferenced *S. damnosum s.l.* riverine habitats for the Sarakawa-Kpleou breeding study site

1.8 Regression analyses

Logistic regression models were then constructed using a 95% confidence level to ascertain whether the proportions of the sampled estimators in each riverine epidemiological study site differed by sampled larval habitat geolocations. In this research, the SAS procedure PROC MIXED was used to fit the linear larval habitat models. The regression models assumed independent Bernoulli outcomes denoted by Y_i = 0 or 1, taken at the sampled larval habitat sites (i.e., i = 1, 2, ⋯, n). In probability and statistics, a Bernoulli process is a finite or infinite sequence of binary random variables, so theoretically it is a discrete-time stochastic process that takes only two values, canonically 0 and 1.^[11] The indicator values were then described by X_i, a 1-by-(K+1) vector of K covariate values and a 1 for the intercept term, which represented a sampled larval habitat site geolocation i in each study site. The probability of a 1 being realized for the binary outcome data was then given by P(Y_i = 1 | X_i) = exp(X_iβ) / [1 + exp(X_iβ)] (equation 2.1), where β was the (K+1)-by-1 vector of non-redundant parameter estimators and P(Y_i = 0 | X_i) = 1 − P(Y_i = 1 | X_i).
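As a brief illustration of the logistic probability described above, the following Python sketch evaluates P(Y_i = 1 | X_i) for a single habitat. The coefficient values and covariates are hypothetical and are not estimates from the study; the sketch only shows how the 1-by-(K+1) vector X_i and β combine.

```python
import math

def logistic_probability(x, beta):
    """P(Y = 1 | X) = exp(X beta) / (1 + exp(X beta)); x starts with 1 for the intercept."""
    eta = sum(xk * bk for xk, bk in zip(x, beta))
    # Use the algebraically equivalent form that avoids overflow for large |eta|.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)

# Hypothetical coefficients: intercept, turbidity, rock cover (assumed values).
beta = [-1.0, 0.8, 0.5]
x_i = [1.0, 2.0, 1.0]  # leading 1 is the intercept term

p_one = logistic_probability(x_i, beta)   # P(Y_i = 1 | X_i)
p_zero = 1.0 - p_one                      # P(Y_i = 0 | X_i)
print(round(p_one, 4), round(p_zero, 4))
```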

A Poisson regression was also fitted in SAS, with statistical significance assessed at a 95% confidence level. We used PROC GLIMMIX in SAS to fit the sampled S. damnosum s.l. estimators in each study site. In our Poisson model it was assumed that the dependent variable Y had a Poisson distribution given the independent variables X₁, X₂, ⋯, X_m, that is, P(Y = k | x₁, x₂, ⋯, x_m) = e^−µ µ^k / k! for k = 0, 1, 2, ⋯, where the log of the mean µ was assumed to be a linear function of the independent variables. That is, log(µ) = intercept + b₁*X₁ + b₂*X₂ + ⋯ + b_m*X_m, which implied that µ was the exponential of that linear function, µ = exp(intercept + b₁*X₁ + b₂*X₂ + ⋯ + b_m*X_m). The regression model was then rewritten in the following form: log(µ) = log(n) + intercept + b₁*X₁ + b₂*X₂ + ⋯ + b_m*X_m, where n was the total number of georeferenced larval habitats sampled in each study site. The logarithm of the variable n was then used as an offset. By doing so, a regression term with a constant coefficient of 1 was incorporated into the models. The log of the incidence, log(µ/n), was then quantified as a linear function of the independent variables in each model. The maximum likelihood method was then used to estimate the observational coefficients derived from the Poisson residuals.
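The role of the offset can be sketched in a few lines of Python. This is an illustration only, with hypothetical coefficients and a hypothetical n; it is not the PROC GLIMMIX fit. It shows that adding log(n) to the linear predictor makes µ/n, the incidence, equal to the exponential of the remaining terms.

```python
import math

def poisson_pmf(k, mu):
    """P(Y = k) = exp(-mu) * mu**k / k!, computed on the log scale for stability."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))

def expected_count(x, coeffs, intercept, n):
    """mu = exp(log(n) + intercept + sum_j b_j * x_j); log(n) is the offset."""
    eta = intercept + sum(b * xj for b, xj in zip(coeffs, x))
    return math.exp(math.log(n) + eta)

# Hypothetical values (not study estimates): two covariates, 50 sampled habitats.
n_habitats = 50
mu = expected_count(x=[1.0, 0.5], coeffs=[0.3, -0.2], intercept=-2.0, n=n_habitats)
incidence = mu / n_habitats  # exp(intercept + b1*x1 + b2*x2)

# The Poisson probabilities over k = 0, 1, 2, ... sum to one.
total_probability = sum(poisson_pmf(k, mu) for k in range(200))
print(round(mu, 3), round(incidence, 4), round(total_probability, 6))
```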

In this research the parameter estimator λ_i(X_i) was both the mean and the variance of the Poisson distribution based on the regressed predictor covariate coefficients sampled at each riverine study site. The analyses assumed independent counts (i.e., n_i), taken at the sampled larval habitat locations i = 1, 2, ⋯, n, in each study site. That is, our Poisson regression assumed the response variable Y had a Poisson distribution and also assumed the logarithm of its expected value was modeled by a linear combination of the sampled parameter estimators in each study site. This expression was then written more compactly as log(E(Y | x)) = θ′x, where x was an (n+1)-dimensional vector consisting of n independent variables concatenated to 1, and θ was simply the coefficient vector a concatenated with the intercept b. Thus, given θ and an input vector x, the predicted mean of the associated Poisson distribution rendered from the sampled riverine larval habitat data was expressed in our models as E(Y | x) = e^{θ′x}.

Thereafter, the sampled estimators were denoted by X_i, a 1×p vector of the coefficient measurement indicator values for a specific sampled riverine habitat location i in each study site. A variance-stabilizing transformation was also performed to allow the application of analysis of variance techniques to the models. The aim behind our variance-stabilizing transformation was to find a simple function f to apply to the sampled estimator values (i.e., x) in the ecological datasets to create new larval habitat values y = f(x) such that the variability of the transformed values (i.e., y) was not related to their mean value. While variance-stabilizing transformations are well known for certain parametric families of distributions, such as the Poisson and the binomial distribution, some types of data analysis proceed more empirically by searching among power transformations to find a suitable fixed transformation.^[9] Alternatively, if a time series-dependent data analysis suggests a functional form for the relation between variance and mean, this can be used to deduce a variance-stabilizing transformation.^[8]
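The idea of variance stabilization can be made concrete with a small Python sketch. The text does not state which transformation was applied, so the square-root transformation used here, a standard choice for Poisson counts, is an assumption for illustration. The sketch computes the exact variance of Y and of √Y for several Poisson means by summing over the probability mass function.

```python
import math

def poisson_pmf(k, mu):
    """Poisson probability P(Y = k) for mean mu > 0, computed on the log scale."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))

def variance_after_transform(mu, f):
    """Variance of f(Y) for Y ~ Poisson(mu), summing far into the upper tail."""
    k_max = int(mu + 12 * math.sqrt(mu) + 30)
    first = second = 0.0
    for k in range(k_max + 1):
        p = poisson_pmf(k, mu)
        value = f(k)
        first += p * value
        second += p * value * value
    return second - first ** 2

means = (20.0, 50.0, 100.0)
raw_variances = [variance_after_transform(mu, float) for mu in means]
sqrt_variances = [variance_after_transform(mu, math.sqrt) for mu in means]

for mu, raw, stabilized in zip(means, raw_variances, sqrt_variances):
    print(mu, round(raw, 3), round(stabilized, 4))
```

The untransformed variance grows with the mean, whereas the variance of the square-root values stays close to 1/4 across these means, which is the property the transformation is meant to deliver.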

Further, in the regression models the expected value of these data was provided by µ_i(X_i) = n_i(X_i) exp(X_iβ), where β was the vector of the non-redundant Poisson estimators and the rate was provided by λ_i(X_i) = µ_i(X_i)/n_i(X_i) (equation 2.2). Thereafter, the models took the form log(E(Y | x)) = a′x + b, where a ∈ Rⁿ and b ∈ R. By positing salient estimators using Poisson-derived residuals, the maximization of an auto-Gaussian log-likelihood function and a set of eigenvectors, where lambda is the sub-space of Rⁿ, can optimize time-series dependent predictor covariate coefficient estimates associated with georeferenced vector insect larval habitat observations.^[8]

In this research, the Poisson models were generalized by introducing an unobserved heterogeneity term for the sampled observations i. Thus, the data was assumed to differ randomly in a manner that was not fully accounted for by the estimates rendered. These distributions were then re-formulated as $E (y_{i} | x_{i}, τ_{i}) = μ_{i} τ_{i} = e^{x_{i}^{'} β + ε_{i}}$ where the unobserved heterogeneity term $τ_{i} = e^{ε_{i}}$ was independent of the vector of regressors x_i; thus, the distribution of y_i conditional on x_i and τ_i was Poisson with a conditional mean and a conditional variance of $μ_{i} τ_{i}$: $f (y_{i} | x_{i}, τ_{i}) = \frac{exp (- μ_{i} τ_{i}) {(μ_{i} τ_{i})}^{y_{i}}}{y_{i}!}$ . We then let g(τ_i) be the probability density function of τ_i. The distribution f(y_i | x_i), which was no longer conditional on τ_i, was then obtained in both models by integrating f(y_i | x_i, τ_i) with respect to τ_i: $f (y_{i} | x_{i}) = \int_{0}^{\infty} f (y_{i} | x_{i}, τ_{i}) g (τ_{i}) d τ_{i}$ .

The regression residuals also revealed that the S. damnosum s.l. riverine larval habitat data attributes contained a constant term. As such, it was necessary to assume that $E (e^{ε_{i}}) = E (τ_{i}) = 1$ in order to identify the mean of the distributions. We had assumed that τ_i followed a gamma(θ, θ) distribution in the models with $E (τ_{i}) = 1$ and $ν (τ_{i}) = 1 / θ$: $g (τ_{i}) = \frac{θ^{θ}}{Γ (θ)} τ_{i}^{θ - 1} exp (- θ τ_{i})$ where $Γ (x) = \int_{0}^{\infty} z^{x - 1} exp (- z) d z$ was the gamma function and θ was a positive parameter. Thus, the density of y_i given x_i in the models was defined using $f (y_{i} | x_{i}) = \int_{0}^{\infty} f (y_{i} | x_{i}, τ_{i}) g (τ_{i}) d τ_{i} = \frac{θ^{θ} μ_{i}^{y_{i}}}{y_{i}! Γ (θ)} \int_{0}^{\infty} e^{- (μ_{i} + θ) τ_{i}} τ_{i}^{θ + y_{i} - 1} d τ_{i} = \frac{θ^{θ} μ_{i}^{y_{i}} Γ (y_{i} + θ)}{y_{i}! Γ (θ) {(θ + μ_{i})}^{θ + y_{i}}} = \frac{Γ (y_{i} + θ)}{y_{i}! Γ (θ)} {(\frac{θ}{θ + μ_{i}})}^{θ} {(\frac{μ_{i}}{θ + μ_{i}})}^{y_{i}}$ . Unfortunately, extra-Poisson variation was detected in the variance estimates of the larval habitat models. Evidence of overdispersion indicates inadequate fit of the Poisson model.^[9] A common way to deal with overdispersion for count data is to use a generalized linear model (GLM) framework, where the most common approach is a “quasi-likelihood” formulation with Poisson-like assumptions (i.e., quasi-Poisson) or a negative binomial model.^[10] As such, we constructed robust negative binomial regression models in PROC GLIMMIX with non-homogeneous gamma distributed means by incorporating $α = \frac{1}{θ}$ $(α > 0)$ in equation 2.1 as in Jacob et al.^[8] The distribution was then rewritten as $f (y_{i} | x_{i}) = \frac{Γ (y_{i} + α^{- 1})}{y_{i}! Γ (α^{- 1})} {(\frac{α^{- 1}}{α^{- 1} + μ_{i}})}^{α^{- 1}} {(\frac{μ_{i}}{α^{- 1} + μ_{i}})}^{y_{i}}, y_{i} = 0, 1, 2, \dots$ The negative binomial distribution was thus derived as a gamma mixture of the Poisson random variables. In both models the conditional mean was $E (y_{i} | x_{i}) = μ_{i} = e^{x_{i}^{'} β}$ and the conditional variance was $ν (y_{i} | x_{i}) = μ_{i} [1 + \frac{1}{θ} μ_{i}] = μ_{i} [1 + α μ_{i}] > E (y_{i} | x_{i})$ .
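The gamma-mixture construction can be checked numerically. The Python sketch below is illustrative only, with assumed values µ = 3 and θ = 2; it integrates the Poisson density against the gamma(θ, θ) density with a simple midpoint rule and compares the result with the closed-form negative binomial probability, then confirms the stated mean µ and variance µ(1 + αµ).

```python
import math

def poisson_pmf(y, mu):
    """Poisson probability of count y for mean mu > 0."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

def gamma_density(tau, theta):
    """Gamma(theta, theta) density: mean 1 and variance 1/theta."""
    return math.exp(theta * math.log(theta) - math.lgamma(theta)
                    + (theta - 1) * math.log(tau) - theta * tau)

def negbin_pmf(y, mu, theta):
    """Closed-form negative binomial probability obtained from the mixture."""
    log_p = (math.lgamma(y + theta) - math.lgamma(y + 1) - math.lgamma(theta)
             + theta * math.log(theta / (theta + mu))
             + y * math.log(mu / (theta + mu)))
    return math.exp(log_p)

def mixture_pmf(y, mu, theta, upper=40.0, steps=40000):
    """Midpoint-rule approximation of the integral of f(y | tau) g(tau) over tau."""
    h = upper / steps
    total = 0.0
    for s in range(steps):
        tau = (s + 0.5) * h
        total += poisson_pmf(y, mu * tau) * gamma_density(tau, theta)
    return total * h

mu, theta = 3.0, 2.0          # assumed illustrative values
alpha = 1.0 / theta
checks = [(y, mixture_pmf(y, mu, theta), negbin_pmf(y, mu, theta)) for y in (0, 2, 5)]

probs = [negbin_pmf(y, mu, theta) for y in range(400)]
mean = sum(y * p for y, p in enumerate(probs))
variance = sum(y * y * p for y, p in enumerate(probs)) - mean ** 2
print(checks, round(mean, 6), round(variance, 6))
```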

To further quantify the regression residuals we specified DIST=NEGBIN(p=1) in the MODEL statement in PROC REG. The negative binomial model NEGBIN1, which set p = 1, then revealed that the variance function $ν (y_{i} | x_{i}) = μ_{i} + α μ_{i}$ was linear in the mean in both model residuals. The log-likelihood function of the NEGBIN1 models was then provided by $φ = \sum_{i = 1}^{N} \{ \sum_{j = 0}^{y_{i} - 1} ln (j + α^{- 1} exp (x_{i}^{'} β)) - ln (y_{i}!) - (y_{i} + α^{- 1} exp (x_{i}^{'} β)) ln (1 + α) + y_{i} ln (α) \}$ . Thereafter, the gradient for the model error was computed using $\frac{\partial φ}{\partial β} = \sum_{i = 1}^{N} \{ (\sum_{j = 0}^{y_{i} - 1} \frac{μ_{i}}{(j α + μ_{i})}) x_{i} - α^{- 1} ln (1 + α) μ_{i} x_{i} \}$ and $\frac{\partial φ}{\partial α} = \sum_{i = 1}^{N} \{ - (\sum_{j = 0}^{y_{i} - 1} \frac{α^{- 1} μ_{i}}{(j α + μ_{i})}) + α^{- 2} μ_{i} ln (1 + α) - \frac{(y_{i} + α^{- 1} μ_{i})}{1 + α} + \frac{y_{i}}{α} \}$ .

In this research, the negative binomial regression model with variance function $ν (y_{i} | x_{i}) = μ_{i} + α μ_{i}^{2}$ was referred to as the NEGBIN2 model. To estimate these models, we specified DIST=NEGBIN(p=2) in the MODEL statements. A test of the Poisson distribution was then performed by examining the hypothesis that $α = \frac{1}{θ} = 0$ . A Wald test of this hypothesis was also provided, which we used to report t statistics for the regression residuals. The log-likelihood function of the NEGBIN2 models was then rendered by the equation $φ = \sum_{i = 1}^{N} \{ \sum_{j = 0}^{y_{i} - 1} ln (j + α^{- 1}) - ln (y_{i}!) - (y_{i} + α^{- 1}) ln (1 + α exp (x_{i}^{'} β)) + y_{i} ln (α) + y_{i} x_{i}^{'} β \}$ where y_i was a non-negative integer, and the gradient was $\frac{\partial φ}{\partial β} = \sum_{i = 1}^{N} \frac{y_{i} - μ_{i}}{1 + α μ_{i}} x_{i}$ and $\frac{\partial φ}{\partial α} = \sum_{i = 1}^{N} \{ - α^{- 2} \sum_{j = 0}^{y_{i} - 1} \frac{1}{(j + α^{- 1})} + α^{- 2} ln (1 + α μ_{i}) + \frac{y_{i} - μ_{i}}{α (1 + α μ_{i})} \}$ .
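The NEGBIN2 log-likelihood and its gradient can be cross-checked numerically. The following Python sketch is not the SAS estimation routine; the design matrix, counts and parameter values are hypothetical. It evaluates the log-likelihood, the analytic gradient with respect to β and α, and a central finite-difference approximation for comparison.

```python
import math

def mean_of(x, beta):
    """mu_i = exp(x_i' beta)."""
    return math.exp(sum(xk * bk for xk, bk in zip(x, beta)))

def negbin2_loglik(beta, alpha, X, y):
    """NEGBIN2 log-likelihood summed over observations (y_i x_i' beta = y_i ln mu_i)."""
    total = 0.0
    for xi, yi in zip(X, y):
        mu = mean_of(xi, beta)
        total += (sum(math.log(j + 1.0 / alpha) for j in range(yi))
                  - math.lgamma(yi + 1)
                  - (yi + 1.0 / alpha) * math.log(1.0 + alpha * mu)
                  + yi * math.log(alpha) + yi * math.log(mu))
    return total

def negbin2_gradient(beta, alpha, X, y):
    """Analytic gradient: (list of d/d beta_k, d/d alpha)."""
    g_beta, g_alpha = [0.0] * len(beta), 0.0
    for xi, yi in zip(X, y):
        mu = mean_of(xi, beta)
        weight = (yi - mu) / (1.0 + alpha * mu)
        for k, xk in enumerate(xi):
            g_beta[k] += weight * xk
        g_alpha += (-alpha ** -2 * sum(1.0 / (j + 1.0 / alpha) for j in range(yi))
                    + alpha ** -2 * math.log(1.0 + alpha * mu)
                    + (yi - mu) / (alpha * (1.0 + alpha * mu)))
    return g_beta, g_alpha

# Hypothetical data: intercept plus one covariate, four habitat counts.
X = [[1.0, 0.2], [1.0, 1.5], [1.0, -0.7], [1.0, 0.9]]
y = [0, 3, 1, 7]
beta, alpha, eps = [0.4, 0.6], 0.5, 1e-6

g_beta, g_alpha = negbin2_gradient(beta, alpha, X, y)
num_beta = []
for k in range(len(beta)):
    up = beta[:k] + [beta[k] + eps] + beta[k + 1:]
    down = beta[:k] + [beta[k] - eps] + beta[k + 1:]
    num_beta.append((negbin2_loglik(up, alpha, X, y) - negbin2_loglik(down, alpha, X, y)) / (2 * eps))
num_alpha = (negbin2_loglik(beta, alpha + eps, X, y) - negbin2_loglik(beta, alpha - eps, X, y)) / (2 * eps)
print(g_beta, num_beta, g_alpha, num_alpha)
```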

1.9 DW statistics

Thereafter, we constructed a first-order autoregressive AR(1) error framework. The AR(1) models were constructed using the georeferenced estimators sampled from the Bagan and Sarakawa-Kpleou riverine breeding study sites. Each model was defined as a special case of the general autoregressive form $X_{t} = c + \sum_{i = 1}^{p} φ_{i} X_{t - i} + ε_{t}$ where φ₁, ⋯, φ_p were the autoregressive parameters estimated from the sampled observations at a particular study site, c was a constant and ε_t was white noise.

In this research the AR(1) process was provided by X_t = c + φX_{t−1} + ε_t in the models, where ε_t was a white noise process with zero mean and variance $σ_{ε}^{2}$ . Thereafter, the autoregressive processes were classified as wide-sense stationary if |φ| < 1, since such a process was obtained as the output of a stable filter whose input was white noise. The mean of the predictive autoregressive riverine larval habitat models was then denoted by µ; thus, E(X_t) = E(c) + φE(X_{t−1}) + E(ε_t), such that µ = c + φµ + 0 and hence µ = c / (1 − φ). Additionally, if c was equal to 0, then the mean was 0. The variance was then delineated as $var (X_{t}) = E (X_{t}^{2}) - μ^{2} = \frac{σ_{ε}^{2}}{1 - φ^{2}}$ , where σ_ε was the standard deviation of ε_t. This was revealed by noting that var(X_t) = φ² var(X_{t−1}) + $σ_{ε}^{2}$ and that, under stationarity, var(X_t) = var(X_{t−1}). Further, we noted that the autocovariance function $B_{n} = E (X_{t + n} X_{t}) - μ^{2} = \frac{σ_{ε}^{2}}{1 - φ^{2}} φ^{| n |}$ decayed with a time constant defined by τ = −1 / ln(φ) in the models. To see this, we wrote B_n = Kφ^{|n|}, where K was independent of n, noted that φ^{|n|} = e^{|n| ln φ}, and matched this value to the exponential decay law e^{−n / τ}. We then noticed that the spectral density function rendered was the Fourier transform of the autocovariance function in both models. The Fourier transform is a mathematical operation that decomposes a signal into its constituent frequencies.^[9] In this research, the discrete-time Fourier transform in the models was

$Φ (ω) = \frac{1}{\sqrt{2 π}} \sum_{n = - \infty}^{\infty} B_{n} e^{- i ω n} = \frac{1}{\sqrt{2 π}} (\frac{σ_{ε}^{2}}{1 + φ^{2} - 2 φ cos (ω)})$

This expression was periodic due to the discrete nature of the X_t, which was manifested as the cosine term in the denominator. We assumed that the sampling time (i.e., Δt = 1) was much smaller than the decay time (i.e., τ) in the models. By doing so, we were then able to use a continuum approximation to B_n, $B (t) \approx \frac{σ_{ε}^{2}}{1 - φ^{2}} φ^{| t |}$ , which yielded a Lorentzian profile for the spectral density, delineated using $Φ (ω) = \frac{1}{\sqrt{2 π}} \frac{σ_{ε}^{2}}{1 - φ^{2}} \frac{γ}{π (γ^{2} + ω^{2})}$ where γ = 1/τ was the angular frequency associated with the decay time τ. In vector insect larval habitat modeling, the Lorentz distribution is closely related to the Poisson kernel, which is the fundamental solution for the Laplace equation in the upper half-plane.^[8] In this research the Lorentz distribution had the probability density function:

f (x; x_{0}, γ) = \frac{1}{π γ [1 + {(\frac{x - x_{0}}{γ})}^{2}]} = \frac{1}{π} [\frac{γ}{{(x - x_{0})}^{2} + γ^{2}}]

where x₀ was the location parameter, derived from the sampled S. damnosum s.l. predictor covariate coefficients, specifying the geolocation of the peak of the distribution, and γ was the scale estimator which specified the half-width at half-maximum. In our Lorentz distribution γ was also equal to half the interquartile range (i.e., the probable error in the model residuals).
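The AR(1) moments and the closed-form spectral density stated above can be checked numerically. The following is a minimal sketch, assuming 0 < φ < 1 so that the decay time τ = −1/ln(φ) is defined; the function names and the truncation length `n_max` are illustrative choices, not part of the original analysis.

```python
import math


def ar1_moments(c, phi, sigma_eps):
    """Stationary mean, variance and decay time of X_t = c + phi*X_{t-1} + eps_t."""
    if not 0.0 < phi < 1.0:
        raise ValueError("this sketch assumes 0 < phi < 1")
    mean = c / (1.0 - phi)
    var = sigma_eps ** 2 / (1.0 - phi ** 2)
    tau = -1.0 / math.log(phi)
    return mean, var, tau


def autocov(n, phi, sigma_eps):
    """Autocovariance B_n = sigma^2 / (1 - phi^2) * phi^|n|."""
    return sigma_eps ** 2 / (1.0 - phi ** 2) * phi ** abs(n)


def spectral_density_closed(omega, phi, sigma_eps):
    """Closed form of the discrete-time Fourier transform of B_n."""
    denom = 1.0 + phi ** 2 - 2.0 * phi * math.cos(omega)
    return sigma_eps ** 2 / (math.sqrt(2.0 * math.pi) * denom)


def spectral_density_sum(omega, phi, sigma_eps, n_max=400):
    """Truncated sum over n; B_n is even in n, so only cosine terms remain."""
    total = autocov(0, phi, sigma_eps)
    for n in range(1, n_max + 1):
        total += 2.0 * autocov(n, phi, sigma_eps) * math.cos(omega * n)
    return total / math.sqrt(2.0 * math.pi)
```

Comparing `spectral_density_sum` with `spectral_density_closed` at a few frequencies is a quick way to confirm the sign pattern in the denominator, 1 + φ² − 2φ cos(ω).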

Further, an alternative expression for X_t was derived from the epidemiological riverine larval habitat models by first substituting c + φX_{t−2} + ε_{t−1} for X_{t − 1} in the defining equation. Continuing this process N times yielded: $X_{t} = c \sum_{k = 0}^{N - 1} φ^{k} + φ^{N} X_{t - N} + \sum_{k = 0}^{N - 1} φ^{k} ε_{t - k}$ . We noticed that for N approaching infinity, φ^N approached zero (given |φ| < 1) and $X_{t} = \frac{c}{1 - φ} + \sum_{k = 0}^{\infty} φ^{k} ε_{t - k}$ . This revealed that X_t was white noise convolved with the φ^k kernel plus the constant mean. If the white noise ε_t is a Gaussian process then X_t is also a Gaussian process in a robust predictive vector insect larval habitat cluster-based regression model. ^[8] The model residual estimates also revealed that X_t was normally distributed when φ was close to one.
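The N-step expansion can be verified against the defining recursion on a short, fixed noise sequence. This is a hedged sketch; the helper names and the example values are hypothetical and serve only to illustrate the algebra.

```python
def ar1_recursive(c, phi, x_start, eps):
    """Apply X_t = c + phi*X_{t-1} + eps_t once per entry of eps, oldest first."""
    x = x_start
    for e in eps:
        x = c + phi * x + e
    return x


def ar1_expanded(c, phi, x_start, eps):
    """Closed N-step form: c*sum(phi^k) + phi^N * X_{t-N} + sum(phi^k * eps_{t-k})."""
    n = len(eps)
    const = c * sum(phi ** k for k in range(n))
    carry = phi ** n * x_start
    # eps[n - 1 - k] is eps_{t-k}, because eps is stored oldest first
    noise = sum(phi ** k * eps[n - 1 - k] for k in range(n))
    return const + carry + noise
```

Both functions return the same value for any starting point, which is the identity the substitution argument relies on.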

The DW statistic was then spatially derived for each model to detect the relationships between the sampled riverine larval habitat values separated from each other by a given time lag, based on the forecasted uncertainty estimates from the regression analysis. We used the DWPROB option in SAS to print the significance level (i.e., p-values) for the DW tests. The DW statistic then tested the null hypothesis H₀: φ₁ = 0 against H₁: −φ₁ > 0 in both models.

In this research, the generalized DW statistic was written as: ${DW}_{j} = \frac{{\hat{u}}^{'} A_{j}^{'} A_{j} \hat{u}}{{\hat{u}}^{'} \hat{u}}$ where û was a vector of OLS residuals and A_j was a (T − j)×T matrix. The generalized DW statistic DW_j was then rewritten as: ${DW}_{j} = \frac{Y^{'} M A_{j}^{'} A_{j} M Y}{Y^{'} M Y} = \frac{η^{'} (Q_{1}^{'} A_{j}^{'} A_{j} Q_{1}) η}{η^{'} η}$ where $Q_{1}^{'} Q_{1} = I_{T - k}$ , $Q_{1}^{'} X = 0$ , and $η = Q_{1}^{'} u$ . The marginal probability for the DW statistic in the models was then Pr(DW_j < c) = Pr(h < 0) where $h = η^{'} (Q_{1}^{'} A_{j}^{'} A_{j} Q_{1} - c I) η$ . Thereafter, the p-value (i.e., the marginal probability for the generalized DW statistic) was computed by numerical inversion of the characteristic function ϕ(u) of the quadratic form h. The trapezoidal rule approximation to the marginal probability Pr(h < 0) was then specified by

Pr (h < 0) = \frac{1}{2} - \sum_{k = 0}^{K} \frac{Im [ϕ ((k + \frac{1}{2}) Δ)]}{π (k + \frac{1}{2})} + E_{I} (Δ) + E_{T} (K)

where Im[ϕ(·)] was the imaginary part of the characteristic function and E_I(Δ) and E_T(K) were the integration and truncation errors, respectively. The trapezoidal rule is a way to approximate a definite integral.^[9]

A numerically efficient algorithm was then used to quantify the error components in the first-order autocorrelation models. The algorithm required O(N) operations for each evaluation of the characteristic function ϕ(u) in each model. In this research the characteristic function in each autoregressive S. damnosum s.l. larval habitat model was denoted by

ϕ (u) = {| I - 2 i u (Q_{1}^{'} A_{j}^{'} A_{j} Q_{1} - c I_{N - k}) |}^{- \frac{1}{2}} = | V |^{- \frac{1}{2}} | X^{'} V^{- 1} X |^{- \frac{1}{2}} | X^{'} X |^{\frac{1}{2}}

where $V = (1 + 2 i u c) I - 2 i u A_{j}^{'} A_{j}$ and $i = \sqrt{- 1}$ . Thereafter, by applying the Cholesky decomposition to the complex matrix V, we obtained the lower triangular matrix G which satisfied V = GG′ in both models. Cholesky decomposition is a decomposition of a symmetric, positive-definite matrix into the product of a lower triangular matrix and its conjugate transpose^[10]. The characteristic function was then evaluated in O(N) operations in the models by using: $ϕ (u) = | G |^{- 1} | X^{+'} X^{+} |^{- \frac{1}{2}} | X^{'} X |^{\frac{1}{2}}$ where X⁺ = G⁻¹X.

In AUTOREG, two alternative, asymptotically equivalent statistics (i.e., Durbin h and Durbin t) were also employed to test for first-order autocorrelation in the model residuals. The Durbin-Watson tests are not valid when the lagged dependent variable is used in the regression model; thus, the Durbin h test or Durbin t test can be used to test for first-order autocorrelation.^[10] For the Durbin h test, we specified the name of the lagged dependent variable in the LAGDEP= option. The h statistic was then written as: $h = \hat{ρ} \sqrt{\frac{N}{1 - N \hat{V}}}$ where $\hat{ρ} = \frac{\sum_{l = 2}^{N} {\hat{ν}}_{l} {\hat{ν}}_{l - 1}}{\sum_{l = 1}^{N} {\hat{ν}}_{l}^{2}}$ , ${\hat{ν}}_{l}$ were the OLS residuals, and $\hat{V}$ was the least squares variance estimate for the coefficient of the lagged dependent variable. Durbin’s t test consists of regressing the OLS residuals ${\hat{ν}}_{l}$ on the explanatory observational variables and ${\hat{ν}}_{l - 1}$ and testing the significance of the estimate for the coefficient of ${\hat{ν}}_{l - 1}$ .^[10]
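The Durbin h statistic can be illustrated with a few lines of code. The sketch below assumes the OLS residuals and the estimated variance of the lagged-dependent-variable coefficient are already available; `durbin_h` is a hypothetical helper, not the PROC AUTOREG implementation.

```python
import math


def durbin_h(residuals, var_lag_coef):
    """Return (rho_hat, h), where h = rho_hat * sqrt(N / (1 - N * V_hat)).

    h is returned as None when 1 - N * V_hat <= 0, in which case the
    statistic is undefined and Durbin's t test is the usual alternative.
    """
    n = len(residuals)
    num = sum(residuals[t] * residuals[t - 1] for t in range(1, n))
    den = sum(v * v for v in residuals)
    rho_hat = num / den
    scale = 1.0 - n * var_lag_coef
    if scale <= 0.0:
        return rho_hat, None
    return rho_hat, rho_hat * math.sqrt(n / scale)
```

Returning `None` rather than raising keeps the undefined case explicit for the caller.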

In PROC AUTOREG, the Yule-Walker (YW) estimation method was then used to construct multiple first-order autoregressive error matrices. The equation defining the AR processes in the models was $X_{t} = \sum_{i = 1}^{p} φ_{i} X_{t - i} + ε_{t}$ . The YW equations we used were $γ_{m} = \sum_{k = 1}^{p} φ_{k} γ_{m - k} + σ_{ε}^{2} δ_{m, 0}$ , where m = 0,⋯,p, which yielded p + 1 equations; γ_m was the autocovariance function of X_t, σ_ε was the standard deviation of the input noise process, and δ_{m,0} was the Kronecker delta function. The Kronecker delta, in a robust vector insect larval habitat regression distribution model, is a function of two variables, usually integers, which is equal to 1 if the two are equal and 0 otherwise.^[8]

Because the last part of our equations was non-zero only when m = 0, the first-order error estimation equations for m > 0 were solved by representing them in matrix form, thus, $[\begin{matrix} γ_{1} \\ γ_{2} \\ γ_{3} \\ ⋮ \end{matrix}] = [\begin{matrix} γ_{0} & γ_{- 1} & γ_{- 2} & \dots \\ γ_{1} & γ_{0} & γ_{- 1} & \dots \\ γ_{2} & γ_{1} & γ_{0} & \dots \\ ⋮ & ⋮ & ⋮ & ⋱ \end{matrix}] [\begin{matrix} φ_{1} \\ φ_{2} \\ φ_{3} \\ ⋮ \end{matrix}]$ , which was solved for all φ. Further, for m = 0 the model rendered $γ_{0} = \sum_{k = 1}^{p} φ_{k} γ_{- k} + σ_{ε}^{2}$ which subsequently allowed us to solve for $σ_{ε}^{2}$ . The full autocorrelation function was then derived by recursively calculating $ρ (τ) = \sum_{k = 1}^{p} φ_{k} ρ (k - τ)$ in the estimators. In this research, for p = 2, the YW equations were γ₁ = φ₁γ₀ + φ₂γ₋₁ and γ₂ = φ₁γ₁ + φ₂γ₀, with γ_{−k} = γ_k. The first equation then yielded $ρ_{1} = γ_{1} / γ_{0} = \frac{φ_{1}}{1 - φ_{2}}$ and the recursion formula rendered $ρ_{2} = γ_{2} / γ_{0} = \frac{φ_{1}^{2} - φ_{2}^{2} + φ_{2}}{1 - φ_{2}}$ in the residual variance.

The equations defining the AR processes in the models were then defined using $X_{t} = \sum_{i = 1}^{p} φ_{i} X_{t - i} + ε_{t}$ . Thereafter, we multiplied both sides by X_{t − m} and took expected values, which yielded $E [X_{t} X_{t - m}] = E [\sum_{i = 1}^{p} φ_{i} X_{t - i} X_{t - m}] + E [ε_{t} X_{t - m}]$ in the models. We then noted that E[X_tX_{t−m}] = γ_m by definition of the autocovariance function in both models. The values of the noise function in the models were independent of each other, and X_{t − m} was independent of ε_t where m was greater than zero. Thus, for m > 0, E[ε_tX_{t − m}] = 0 in the model residuals. Further, we noted that when m was 0, $E [ε_{t} X_{t}] = E [ε_{t} (\sum_{i = 1}^{p} φ_{i} X_{t - i} + ε_{t})] = \sum_{i = 1}^{p} φ_{i} E [ε_{t} X_{t - i}] + E [ε_{t}^{2}] = 0 + σ_{ε}^{2}$ . This equation also rendered $γ_{m} = E [\sum_{i = 1}^{p} φ_{i} X_{t - i} X_{t - m}] + σ_{ε}^{2} δ_{m}$ when m ≥ 0. Thereafter, we employed $E [\sum_{i = 1}^{p} φ_{i} X_{t - i} X_{t - m}] = \sum_{i = 1}^{p} φ_{i} E [X_{t} X_{t - m + i}] = \sum_{i = 1}^{p} φ_{i} γ_{m - i}$ , which yielded $γ_{m} = \sum_{i = 1}^{p} φ_{i} γ_{m - i} + σ_{ε}^{2} δ_{m}$ for m ≥ 0 and $γ_{m} = γ_{- m} = \sum_{i = 1}^{p} φ_{i} γ_{| m | - i} + σ_{ε}^{2} δ_{m}$ for m < 0.
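For the two-parameter case, the Yule-Walker system reduces to a 2 × 2 linear solve. The following sketch is an illustration under the assumption γ₋ₖ = γₖ; the function names are hypothetical, and PROC AUTOREG's own estimation routine is not reproduced here.

```python
def ar2_autocorr(phi1, phi2):
    """First two autocorrelations implied by the AR(2) Yule-Walker equations."""
    rho1 = phi1 / (1.0 - phi2)
    rho2 = (phi1 ** 2 - phi2 ** 2 + phi2) / (1.0 - phi2)
    return rho1, rho2


def yule_walker_ar2(gamma0, gamma1, gamma2):
    """Solve gamma1 = phi1*gamma0 + phi2*gamma1 and gamma2 = phi1*gamma1 + phi2*gamma0."""
    det = gamma0 ** 2 - gamma1 ** 2
    phi1 = gamma1 * (gamma0 - gamma2) / det
    phi2 = (gamma0 * gamma2 - gamma1 ** 2) / det
    # The m = 0 equation gives the innovation variance
    sigma2 = gamma0 - phi1 * gamma1 - phi2 * gamma2
    return phi1, phi2, sigma2
```

Feeding the output of `ar2_autocorr` (with γ₀ = 1) back into `yule_walker_ar2` recovers the original parameters, which is a convenient round-trip check of the closed forms for ρ₁ and ρ₂.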

We then let φ represent the vector of the autoregressive parameters, φ = (φ₁,φ₂,⋯,φ_m)′, and we let the variance matrix of the error vector ν = (ν₁,⋯,ν_N)′ be Σ, with E(νν′) = Σ = σ²V. If the vector of autoregressive parameters φ is known, the matrix V can be computed from the autoregressive parameters, and Σ is then σ²V.^[10] Given Σ, the efficient estimates of the S. damnosum s.l. larval habitat regression parameters, β, were computed using generalized least squares (GLS) for both models. The GLS estimates yielded the unbiased estimate of the variance σ² in the model residuals.

The calculation of V from φ for the general S. damnosum s.l. larval habitat AR analyses was complicated, as the size of V depended on the number of sampled observations in both models. Instead of actually calculating V and performing GLS in the usual way, we used a Kalman filter algorithm to transform the sampled data and compute the GLS results through a recursive process. The Kalman filter, also known as linear quadratic estimation, is an algorithm which uses a series of measurements observed over time, containing noise (i.e., random variations) and other inaccuracies, and produces estimates of unknown variables that tend to be more precise than those that would be based on a single measurement alone. ^[10] More formally, in this research, the Kalman filter operated recursively on streams of the noisy input riverine larval habitat data to produce statistically optimal estimators. The Shapiro-Wilk test was then used to test the null hypothesis that the sampled riverine larval habitat estimators x₁,⋯,x_n came from a normally distributed population.

1.11 Bayesian analyses

We then used PROC MCMC for generating the multivariate density functions in a Bayesian estimation matrix. In PROC MCMC we used the LOGMPDFWISHART function to obtain the logarithm of the Wishart density and, thereafter, the LOGMPDFIWISHART function to obtain the logarithm of a robust inverted-Wishart density. We let x be an n-dimensional random vector with mean vector µ and covariance matrix Σ. The multivariate normal density in the models was then $p d f (x, μ, Σ) = \frac{exp (- \frac{1}{2} {(x - μ)}^{T} Σ^{- 1} (x - μ))}{\sqrt{{(2 π)}^{n} | Σ |}}$ where | Σ | was the determinant of the covariance matrix Σ. The density function from the Wishart distribution in the models was then

p d f (x, μ, Σ) = \frac{1}{C_{n} (μ)} | Σ |^{- \frac{μ}{2}} | x |^{\frac{μ - n - 1}{2}} exp (- \frac{1}{2} t r (Σ^{- 1} x))

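with µ > n, where x was now an n × n symmetric positive-definite matrix and µ was the degrees of freedom. The trace of a square matrix A was written as $t r (A) = \sum_{i} a_{i i}$ , the normalizing constant was $C_{n} (μ) = 2^{\frac{μ n}{2}} Γ_{n} (\frac{μ}{2})$ , and the multivariate gamma function was $Γ_{n} (z) = π^{\frac{n (n - 1)}{4}} \prod_{i = 1}^{n} Γ (z - \frac{i - 1}{2})$ . The density function from the inverse-Wishart distribution was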

p d f (x, μ, Σ) = \frac{1}{D_{n} (μ)} | Σ |^{\frac{μ - n - 1}{2}} | x |^{- \frac{μ}{2}} exp (- \frac{1}{2} t r (Σ x^{- 1}))

for µ > 2n, and $D_{n} (μ) = 2^{\frac{(μ - n - 1) n}{2}} Γ_{n} (\frac{μ - n - 1}{2})$ in both models.

The marginal and conditional distributions from the inverse Wishart-distributed matrices were then further evaluated using A ~ W⁻¹(ψ,m). We partitioned the matrices A and ψ conformably with each other using: $A = [\begin{matrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{matrix}]$ and $ψ = [\begin{matrix} ψ_{11} & ψ_{12} \\ ψ_{21} & ψ_{22} \end{matrix}]$ where A_ij and ψ_ij were p_i × p_j matrices. We then determined if: i) A₁₁ was independent of $A_{11}^{- 1} A_{12}$ and A_22•1, when $A_{22 • 1} = A_{22} - A_{21} A_{11}^{- 1} A_{12}$ was the Schur complement of A₁₁ in A; ii) A₁₁ ~ W⁻¹(ψ₁₁,m − p₂); iii) $A_{11}^{- 1} A_{12} | A_{22 • 1} ~ {MN}_{p_{1} \times p_{2}} (ψ_{11}^{- 1} ψ_{12}, A_{22 • 1} \otimes ψ_{11}^{- 1})$ where MN_p×q was a matrix normal distribution rendered from the sampled estimators in each epidemiological riverine breeding study site; and, iv) A_22•1 ~ W⁻¹(ψ_22•1,m).

We used the conjugate distribution to make inference about a covariance matrix Σ for each model whose prior had a W⁻¹(ψ,m) distribution. We assumed that if the sampled riverine larval habitat observations X = [x₁,⋯,x_n] were independent p-variate Gaussian variables drawn from a N(0,Σ) distribution, then the conditional distribution of Σ given X was W⁻¹(A + ψ, n + m), where A = XX^T was n times the sample covariance matrix. Because in this research the prior and posterior distributions were from the same family, the inverse Wishart distribution was the conjugate prior to the multivariate Gaussian derived from the sampled estimators in both models.

In this research the probability density function was $p (B | Ψ, m) = \frac{| Ψ |^{m / 2} | B |^{- (m + p + 1) / 2} exp (- tr (Ψ B^{- 1}) / 2)}{2^{m p / 2} Γ_{p} (m / 2)}$ where Ψ = Σ⁻¹ and Γ_p(•) was the multivariate gamma function. The multivariate gamma function (i.e., Γ_p) is a generalization of the gamma function which is useful in multivariate statistics and which commonly appears in the probability density function of the inverse Wishart distributions.^[10] We also noticed that the multivariate gamma function had two equivalent expressions in both model estimates. One was $Γ_{p} (a) = \int_{S > 0} exp (- tr (S)) | S |^{a - (p + 1) / 2} d S$ where S > 0 meant that S was positive-definite. The other one was $Γ_{p} (a) = π^{p (p - 1) / 4} \prod_{j = 1}^{p} Γ [a + (1 - j) / 2]$ from which we determined the recursive relationships in the S. damnosum s.l. riverine larval habitat parameter estimators sampled at the epidemiological riverine study sites: $Γ_{p} (a) = π^{(p - 1) / 2} Γ (a) Γ_{p - 1} (a - \frac{1}{2}) = π^{(p - 1) / 2} Γ_{p - 1} (a) Γ [a + (1 - p) / 2]$ . Thus, in this research Γ₁(a) = Γ(a), Γ₂(a) = π^{1 / 2}Γ(a)Γ(a − 1 / 2) and Γ₃(a) = π^{3 / 2}Γ(a)Γ(a − 1/2)Γ(a − 1).
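The product form and the two recursions for the multivariate gamma function can be confirmed numerically. This is a small sketch using only the standard library; `multigamma` is an illustrative name, and the argument check reflects the requirement that every ordinary gamma factor has a positive argument.

```python
import math


def multigamma(a, p):
    """Gamma_p(a) = pi^{p(p-1)/4} * prod_{j=1..p} Gamma(a + (1 - j)/2)."""
    if a <= (p - 1) / 2.0:
        raise ValueError("need a > (p - 1)/2 so every gamma argument is positive")
    value = math.pi ** (p * (p - 1) / 4.0)
    for j in range(1, p + 1):
        value *= math.gamma(a + (1 - j) / 2.0)
    return value
```

For example, `multigamma(3.0, 1)` equals Γ(3) = 2, and `multigamma(a, 3)` agrees with π^{3/2}Γ(a)Γ(a − 1/2)Γ(a − 1).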

1.12 Model validation

The residual riverine larval habitat parameter estimators were then validated using weighted cumulative models. The approach was implemented following the line of goodness of fit testing. Initially, a test statistic using a cumulative residual formulation was developed which was generalized to binary/discrete data with proper link functions. We aimed for parsimony and plausibility of the predicted auto-regressive residual intra-cluster-based error estimates. Under the null hypothesis, the outcome was independent of the sampled riverine larval habitat locations (s_i, r_i), conditional on our spatiotemporal-sampled data attributes (X_i). We assumed that the outcome could be verified using $Y_{i} = X_{i} β + e_{i}, e_{i} \overset{ind}{~} (0, σ^{2} / w_{i})$ where β was a p × 1 vector of the regression-based parameters, σ² was an unknown variance parameter, and w_i > 0 (i = 1, ⋯, n) were the known weights assigned in the validation models. The error terms, e_i, were then independent with mean 0 and variance σ²/w_i in the cluster-based regression residuals.

In the validation models, the weights, w_i, represented the extra regional variability in Y_i. We estimated β by β̂ and σ² by σ̂² by simultaneously solving the estimating equations $U_{β} (β) = \sum_{i = 1}^{n} U_{i β} = \sum_{i = 1}^{n} w_{i} X_{i}^{T} (Y_{i} - X_{i} β) / σ^{2} = 0$ and $U_{σ} (σ) = σ^{2} - \frac{1}{n} \sum_{i = 1}^{n} w_{i} {(Y_{i} - X_{i} β)}^{2} = 0$ . These estimating equations were derived assuming $e_{i} \overset{ind}{~} N (0, σ^{2} / w_{i})$ . From these estimating equations the residuals, ê_i = Y_i − X_iβ̂, were used to test the ABR-stratified cluster-based larval habitat predictor covariate coefficient patterns in the Bagan and Sarakawa-Kpelou riverine epidemiological study sites for quantifying higher than expected sums of residuals. The sum of residuals is a natural test statistic to use for regression validation as it has a defined distribution, and it has monotonic properties such that areas with a higher sum of residuals can indicate areas with higher than expected outcomes.^[9] Thereafter, a two-dimensional moving block process, [(x₁, x₂), Z_loc(x₁, x₂|b)], was used over the forecasted locations of the prolific larval habitats in each riverine study site, which rendered

Z_{l o c} (x_{1}, x_{2} | b) = \frac{1}{\sqrt{n}} \sum_{i = 1}^{n} W_{i} (x_{1}, x_{2} | b) {\hat{e}}_{i}

where W_i(x₁, x₂|b) = I (x₁ − b < s_i ≤ x₁ + b, x₂ − b < r_i ≤ x₂ + b) was an indicator function selecting the sampled habitats located within a square block of half-width b centered at (x₁, x₂).
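The moving block statistic is simple enough to sketch directly. The code below is an illustration only; the coordinate pairs and residuals are made-up values, and `z_loc` is a hypothetical helper name.

```python
import math


def z_loc(x1, x2, b, sites, residuals):
    """Z_loc(x1, x2 | b): scaled sum of residuals inside a square block.

    sites is a list of (s_i, r_i) coordinates, residuals the matching e_i.
    A site counts when x1 - b < s_i <= x1 + b and x2 - b < r_i <= x2 + b.
    """
    n = len(residuals)
    total = 0.0
    for (s, r), e in zip(sites, residuals):
        if (x1 - b < s <= x1 + b) and (x2 - b < r <= x2 + b):
            total += e
    return total / math.sqrt(n)


# Illustrative data only (not from the study sites)
example_sites = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.5, 0.2)]
example_residuals = [1.0, 2.0, -3.0, 0.5]
```

Sliding (x₁, x₂) over a grid and recording `z_loc` at each position would trace out the block process; large positive values flag blocks with a higher than expected sum of residuals.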

2. Results

Our MBR-related histograms revealed that September had the highest MBR (1,800) at approximately 5.5 km, while June, July and October had MBRs of 1,100–1,200 measured at 3.5 km. The lowest MBR was in April, at 100, at a distance of less than 1 km. Low to zero levels of transmission were measured inside villages less than 1 km distant from the riverine-based variations in the vectorial efficiency of the fly populations ^[2] (Figure 3).

In this research, ANOVA provided a test of whether or not the means of the high and low density ABR within-cluster based regression residual estimates were equal. Our F-test's p-values did not approximate the permutation test's p-values, indicating that the within residual cluster-based observational data did not have the same effect in the high and low ABR-clusters. A power analysis was then performed. The test provided the probability of rejecting a false null hypothesis (i.e., of avoiding a Type II error) in both models. As power increases, the chances of a Type II error decrease. ^[10] In this research, the probability of a Type II error was the false negative rate (β); therefore, power in the larval habitat models was equal to 1−β.^[10] The ANOVA contained repeated measures factors. The repeated measure design controlled for subject heterogeneity between the individual sampled larval habitat differences in the high and low ABR-stratified within residual cluster-based varying and constant predictor covariate coefficients.

We then tested for serial error correlation with lagged dependent variables in the models using the AUTOREG procedure for generating the generalized DW tests from the sampled estimators. Initially, we used the equation Y = Xβ + ν, where X was an N × k data matrix, β was a k × 1 covariate coefficient vector and ν was an N × 1 disturbance vector. The error term ν was assumed to be generated by the jth-order autoregressive process ν_t = ε_t − φ_jν_{t − j}, where |φ_j| < 1 and ε_t was a sequence of independent normal error terms with a mean of 0 and variance σ². We then used the DW statistic to test the null hypothesis H₀: φ_j = 0 against H₁: −φ_j > 0. The generalized DW statistic was: $d_{j} = \frac{\sum_{t = j + 1}^{N} {({\hat{ν}}_{t} - {\hat{ν}}_{t - j})}^{2}}{\sum_{t = 1}^{N} {\hat{ν}}_{t}^{2}}$ where ν̂ were the OLS residual estimates in both models. We then used the matrix notation, $d_{j} = \frac{Y^{'} M A_{j}^{'} A_{j} M Y}{Y^{'} M Y}$ where M = I_N − X(X′X)⁻¹ X′ and A_j constituted a (N − j)× N matrix:

A_{j} = [\begin{matrix} - 1 & 0 & \dots & 0 & 1 & 0 & \dots & 0 \\ 0 & - 1 & 0 & \dots & 0 & 1 & 0 & \dots \\ ⋮ & ⋮ & ⋮ & ⋮ & ⋮ & ⋮ & ⋮ & ⋮ \\ 0 & \dots & 0 & - 1 & 0 & \dots & 0 & 1 \end{matrix}]

in which j − 1 zeros lay between −1 and 1 in each row of matrix A_j. The QR factorization of the design matrix X yielded an N × N orthogonal matrix Q such that X = QR, where R was an N × k upper triangular matrix. There existed an N × (N − k) sub-matrix Q₁ of Q such that $Q_{1} Q_{1}^{'} = M$ and $Q_{1}^{'} Q_{1} = I_{N - k}$ in the residual estimates. Consequently, the generalized DW statistic was stated as a ratio of two quadratic forms: $d_{j} = \frac{\sum_{l = 1}^{n} λ_{j l} ξ_{l}^{2}}{\sum_{l = 1}^{n} ξ_{l}^{2}}$ where λ_j1, ⋯, λ_jn were the upper n eigenvalues of $M A_{j}^{'} A_{j} M$ , ξ_l was a standard normal variate, and n = min(N − k, N − j). These eigenvalues were obtained by a singular value decomposition of $Q_{1}^{'} A_{j}^{'}$ . The singular value decomposition of an m×n real or complex matrix M is a factorization of the form M = UΣV^* where U is an m×m real or complex unitary matrix, Σ is an m×n diagonal matrix with non-negative real numbers on the diagonal, and V* (i.e., the conjugate transpose of V) is an n×n real or complex unitary matrix.^[10] In this research, the diagonal entries Σ_i,i of Σ in the model estimates were the singular values of M. The m columns of U and the n columns of V were then the left singular vectors and right singular vectors of M, respectively, in each riverine epidemiological larval habitat model.
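The role of the differencing matrix A_j can be made concrete with a short sketch that builds A_j and evaluates the generalized DW statistic from a vector of OLS residuals, both through the matrix product and through the direct sum of squared lag-j differences. The helper names are hypothetical, and the residuals are assumed to have been computed beforehand.

```python
def difference_matrix(n_obs, j):
    """(N - j) x N matrix with -1 in column r and +1 in column r + j of row r."""
    rows = []
    for r in range(n_obs - j):
        row = [0.0] * n_obs
        row[r] = -1.0
        row[r + j] = 1.0
        rows.append(row)
    return rows


def generalized_dw(residuals, j=1):
    """d_j = (A_j v)'(A_j v) / v'v, computed through the matrix A_j."""
    a_j = difference_matrix(len(residuals), j)
    diffs = [sum(a * v for a, v in zip(row, residuals)) for row in a_j]
    return sum(d * d for d in diffs) / sum(v * v for v in residuals)


def generalized_dw_direct(residuals, j=1):
    """Same statistic from the sum of squared lag-j differences."""
    n = len(residuals)
    num = sum((residuals[t] - residuals[t - j]) ** 2 for t in range(j, n))
    return num / sum(v * v for v in residuals)
```

For a perfectly alternating residual series such as [1, −1, 1, −1], the first-order statistic is 3.0 and the second-order statistic is 0.0, which illustrates how values well above 2 point toward negative first-order autocorrelation.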

In this research, the marginal probability (i.e., p-value) for d_j given c_o in both model residual variance estimation procedures was quantified using

Prob (\frac{\sum_{l = 1}^{n} λ_{j l} ξ_{l}^{2}}{\sum_{l = 1}^{n} ξ_{l}^{2}} < c_{o}) = Prob (q_{j} < 0)

where $q_{j} = \sum_{l = 1}^{n} (λ_{j l} - c_{o}) ξ_{l}^{2}$ . Additionally, when the null hypothesis H₀: φ_j = 0 held, the quadratic form q_j had the characteristic function $ϕ_{j} (t) = \prod_{l = 1}^{n} {(1 - 2 (λ_{j l} - c_{o}) i t)}^{- \frac{1}{2}}$ . The distribution function was then uniquely determined by this characteristic function: $F (x) = \frac{1}{2} + \frac{1}{2 π} \int_{0}^{\infty} \frac{e^{i t x} ϕ_{j} (- t) - e^{- i t x} ϕ_{j} (t)}{i t} d t$ . We then tested H₀: φ₄ = 0 given φ₁ = φ₂ = φ₃ = 0 against H₁: −φ₄ > 0 in each autoregressive S. damnosum s.l. riverine larval habitat model using the marginal probability (p-value) $F (0) = \frac{1}{2} + \frac{1}{2 π} \int_{0}^{\infty} \frac{ϕ_{4} (- t) - ϕ_{4} (t)}{i t} d t$ where $ϕ_{4} (t) = \prod_{l = 1}^{n} {(1 - 2 (λ_{4 l} - {\hat{d}}_{4}) i t)}^{- \frac{1}{2}}$ and d̂₄ was the calculated value of the fourth-order DW statistic.

The DW statistics were then used to determine whether the OLS regression estimates indicated significant serial uncertainty correlation with an estimated order of a lagged covariance of 1 in the larval habitat models. The AUTOREG procedure corrected for the serial correlation using the YW method. The DW statistics indicated that uncertainty correlation was only slightly significant in the YW corrected models. The YW estimates for the first-order serial autocorrelation Bagan model indicated an R² of 0.574, an F statistic of 37.159, and a Durbin-Watson score of 3.877, while the YW estimates for the Sarakawa-Kpelou model indicated an R² of 0.4911, an F statistic of 38.541 and a Durbin-Watson score of 3.713.

The distribution of the error residuals in the second-order autocovariance matrices was then assessed. Initially, the parameter estimators sampled in the riverine epidemiological study sites were qualitatively assessed using their cluster-specific repeated predictor covariate coefficient indicator measurement values (Table 2). Variance decomposition based upon pseudo-R² values and model diagnostics was then obtained for each model by regressing the observed on the predicted standardized rates (Table 3). The maximum value of I was then obtained when all of the variation of z was explained by the eigenvector u₁, which corresponded to the highest eigenvalue λ₁ in the matrices in both models. In that case, cor²(u₁,z) = 1 (and cor²(u_i,z) = 0 for i ≠ 1), and the maximum value of I was deduced from

I (x) = \sum_{i = 1}^{n - r} I (u_{i}) c o r^{2} (u_{i}, z)

which was equal to $I_{max} = λ_{1} (\frac{n}{1^{T} W 1})$ in the model residuals. The minimum value of I in the autocovariate parameter error matrices was then obtained when all of the variation of z was explained by the eigenvector u_{n−r} corresponding to the lowest eigenvalue λ_{n−r}. This minimum value was then determined to be equal to $I_{min} = λ_{n - r} (\frac{n}{1^{T} W 1})$ . The autoregressive error matrices then revealed that if the sampled predictor covariate coefficient estimates were not spatialized, the part of the variance explained by each eigenvector was equal to $c o r^{2} (u_{i}, z) = \frac{1}{n - 1}$ . Because the data in z were randomly permuted in the riverine larval habitat models, it was assumed that we would obtain this result. The set of n! random permutations in the models then revealed that:

E_{R} (I) = \frac{n}{1^{T} W 1 (n - 1)} \sum_{i = 1}^{n} λ_{i} = \frac{n}{1^{T} W 1 (n - 1)} t r a c e (Ω)

Additionally, the riverine larval habitat model residuals demonstrated that $t r a c e (Ω) = - \frac{1^{T} W 1}{n}$ and, consequently, that $E_{R} (I) = - \frac{1}{n - 1}$ .

Table 2.

Simulium damnosum s.l. parameter estimates with cluster-specific repeated within-cluster-based covariate measures

Breeding site and sampled covariates	With no random effects		With random effects
Breeding site and sampled covariates	Parameter estimate	Standard error	Parameter estimate	Standard error
Bagan
intercept	−1.5555	0.1417	−1.5487	0.1228
TURB	0.0126	0.0032	0.0124	0.0028
RCKS	0.01459	0.0127	0.1509	0.0108
AQVEG	0.2222	0.0347	0.2449	0.0303
HGVEG	0.6090	0.0774	0.5230	0.0675
MMBR	0.3334	0.0693	0.2743	0.0600
random effects	0		1
scale	3.4338		2.9112
Sarakawa-Kpelou
intercept	−1.4404	0.1507	−1.4119	0.1463
DISHAB	0.0133	0.0032	0.0121	0.0031
DDVEG	0.0142	0.0128	0.1415	0.0123
HGVEG	0.2230	0.0347	0.2170	0.0338
AQVEG	−0.1852	0.0864	−0.1678	0.0842
random effects	0		1
Scale	3.4259		3.3137


Table 3.

Variance decomposition based upon pseudo-R² values and model error diagnostics obtained by regressing the observed on the predicted standardized rates for the Bagan and the Sarakawa-Kpelou study sites

Clustering for repeated measures	Bagan site	Sarakawa-Kpelou site
Common covariates	0.3102	0.3098
Clustering-specific covariates	0.0264	0.0308
SURE	0.1981	0.0282
Negative spatial autocorrelation	0.0172	0.0221
Positive spatial autocorrelation	0.0099	0.0041
Pseudo-R²	0.5618	0.3950
Spatial filter R²	0.1469	0.4375
Spatial filter MC	−0.03267	−0.32985
P(S-W) for random effects	< 0.0001	0.4609
P(S-W) for SURE	0.0005	0.5475


The diagonalization of the matrices generated from the sampled immature S. damnosum s.l. riverine habitat data also consisted of finding the normalized vectors u_i, which were stored as columns in the matrix U = [u₁ ⋯ u_n], satisfying $Ω = H W H = U Λ U^{T} = \sum_{i = 1}^{n} λ_{i} u_{i} u_{i}^{T}$ where Λ = diag(λ₁⋯λ_n), $u_{i}^{T} u_{i} = {‖ u_{i} ‖}^{2} = 1$ and $u_{i}^{T} u_{j} = 0$ for i ≠ j. The double centering of Ω implied that the eigenvectors u_i rendered from the sampled residual predictor covariate coefficient estimates from the riverine larval habitat models were centered and that at least one eigenvalue was equal to zero. Introducing these eigenvectors in the original formulation of the Moran’s coefficient generated from the sampled data led to

I (x) = \frac{n}{1^{T} W 1} \frac{x^{T} H W H x}{x^{T} H x} = \frac{n}{1^{T} W 1} \frac{x^{T} U Λ U^{T} x}{x^{T} H x} = \frac{n}{1^{T} W 1} \frac{\sum_{i = 1}^{n} λ_{i} x^{T} u_{i} u_{i}^{T} x}{x^{T} H x}

in both models.
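The Moran's coefficient in its quadratic-form version, and its expected value of −1/(n − 1) under random permutation, can be illustrated as follows. This is a sketch assuming a small binary, symmetric weights matrix with a zero diagonal; the function names and the four-site chain example are hypothetical.

```python
import itertools


def morans_i(x, w):
    """I(x) = n / (1'W1) * (x'HWHx) / (x'Hx), where H centers x on its mean."""
    n = len(x)
    mean = sum(x) / n
    z = [xi - mean for xi in x]  # z = Hx
    w_sum = sum(sum(row) for row in w)  # 1'W1
    num = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(zi * zi for zi in z)
    return (n / w_sum) * (num / den)


def permutation_mean_i(x, w):
    """Average of I over all n! permutations of x (only feasible for tiny n)."""
    values = [morans_i(list(perm), w) for perm in itertools.permutations(x)]
    return sum(values) / len(values)


# Four sites on a line, each linked to its immediate neighbours (illustrative)
chain_w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
```

With `chain_w`, a smooth gradient such as [1, 2, 3, 4] gives a positive coefficient, an alternating pattern gives −1, and the permutation average equals −1/3, matching −1/(n − 1) for n = 4.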

Marginal and conditional distributions from inverse Wishart-distributed matrices were then determined for spatially summarizing Bayesian inferences from the sampled riverine larval habitat data. In this research A ~ W⁻¹(Ψ,m) had an inverse Wishart distribution. We partitioned the matrices A and Ψ conformably with each other using $A = [\begin{matrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{matrix}], Ψ = [\begin{matrix} Ψ_{11} & Ψ_{12} \\ Ψ_{21} & Ψ_{22} \end{matrix}]$ where A_ij and Ψ_ij were p_i × p_j matrices. We then obtained that A₁₁ was independent of $A_{11}^{- 1} A_{12}$ and A_22•1, where $A_{22 • 1} = A_{22} - A_{21} A_{11}^{- 1} A_{12}$ was the Schur complement of A₁₁ in A. Commonly, a finite element problem is split into non-overlapping sub-domains in autoregressive cluster-based regression distribution models and the unknowns in the interiors of the sub-domains are eliminated. ^[10] In this research, the remaining Schur complement system on the unknowns associated with sub-domain interfaces was solved by the conjugate gradient method.

The model initially revealed that the conjugate gradient method was unstable with respect to the perturbations in the models. However, the conjugate gradient method we employed was an iterative method which provided monotonically improving approximations (i.e., x_k) to the solution, which in this research reached the required tolerance after a relatively small number of iterations in both models. Our improvement was linear and its speed was quantified by the condition number κ(A) of the system matrix A, where the larger the κ(A), the slower the improvement in the residual forecasted uncertainty estimates. Since some of our κ(A) were large, preconditioning was used to replace the original system Ax − b = 0 with M⁻¹(Ax − b) = 0 so that κ(M⁻¹ A) was smaller than κ(A). In most cases, preconditioning is necessary to ensure fast convergence of the conjugate gradient method. ^[11]

In this research, the preconditioned conjugate gradient method in the riverine larval habitat cluster-based autoregressive models took the form: r₀ := b − Ax₀; z₀ := M⁻¹r₀; p₀ := z₀; k := 0; then repeat: $α_{k} ≔ \frac{r_{k}^{T} z_{k}}{p_{k}^{T} A p_{k}}$ ; x_{k+1} := x_k + α_kp_k; r_{k+1} := r_k − α_kAp_k; if r_{k+1} was sufficiently small then exit the loop; otherwise $z_{k + 1} ≔ M^{- 1} r_{k + 1}, β_{k} ≔ \frac{z_{k + 1}^{T} r_{k + 1}}{z_{k}^{T} r_{k}}, p_{k + 1} ≔ z_{k + 1} + β_{k} p_{k}$ and k := k + 1. In both models the above formulation was equivalent to applying the conjugate gradient method without preconditioning to the system E⁻¹ A(E⁻¹)^T x̂ = E⁻¹b, where EE^T = M and x̂ = E^Tx. The preconditioner matrix M was symmetric positive-definite and fixed (i.e., stationary from iteration to iteration) in the models. We then compared the number of iterations with empirical distribution functions using the autoregressive residual variance estimates. In this research, the empirical distribution function was the cumulative distribution function associated with the empirical measure of the sampled data in each riverine epidemiological breeding study site. We then obtained $A_{11} ~ W^{- 1} (Ψ_{11}, m - p_{2})$ and $A_{11}^{- 1} A_{12} | A_{22 • 1} ~ {MN}_{p_{1} \times p_{2}} (Ψ_{11}^{- 1} Ψ_{12}, A_{22 • 1} \otimes Ψ_{11}^{- 1})$ , where MN_p×q was a matrix normal distribution, and A_22•1 ~ W⁻¹(Ψ_22•1,m).
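The preconditioned conjugate gradient iteration described above can be sketched in pure Python. The source does not state which preconditioner was used, so the diagonal (Jacobi) preconditioner M = diag(A) below is an assumption made purely for illustration, as are the function names, the tolerance and the small test system.

```python
import math


def matvec(a, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def pcg(a, b, x0=None, tol=1e-10, max_iter=1000):
    """Solve Ax = b for symmetric positive-definite A.

    Assumes a Jacobi preconditioner, M = diag(A), as an illustrative choice.
    Returns the approximate solution and the number of iterations used.
    """
    n = len(b)
    x = list(x0) if x0 is not None else [0.0] * n
    m_inv = [1.0 / a[i][i] for i in range(n)]
    r = [bi - axi for bi, axi in zip(b, matvec(a, x))]
    if math.sqrt(dot(r, r)) < tol:
        return x, 0
    z = [mi * ri for mi, ri in zip(m_inv, r)]
    p = list(z)
    for k in range(max_iter):
        ap = matvec(a, p)
        rz = dot(r, z)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        if math.sqrt(dot(r, r)) < tol:
            return x, k + 1
        z = [mi * ri for mi, ri in zip(m_inv, r)]
        beta = dot(r, z) / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
    return x, max_iter
```

As a usage example, `pcg([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])` returns a solution close to (1/11, 7/11). A better-conditioned M⁻¹A would be expected to lower the iteration count on larger systems.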

A conjugate distribution was then determined to make inferences about the covariance error matrices in the larval habitat models (i.e., Σ), whose prior p(Σ) had a W⁻¹(Ψ,m) distribution. The residuals revealed that the sampled observations were independent p-variate Gaussian variables drawn from a N(0,Σ) distribution in both models. The conditional distribution p(Σ | X) of the sampled predictor covariate coefficient estimates had a W⁻¹(A + Ψ,n + m) distribution, where A = XX^T was n times the sample covariance matrix of the sampled S. damnosum s.l. larval habitats in each riverine epidemiological breeding study site. Due to its conjugacy to the multivariate Gaussian, it was possible to integrate out the Gaussian-based georeferenced parameter estimators using: $P (X | Ψ, m) = \int P (X | Σ) P (Σ | Ψ, m) d Σ = \frac{| Ψ |^{\frac{m}{2}} Γ_{p} (\frac{m + n}{2})}{π^{\frac{np}{2}} | Ψ + A |^{\frac{m + n}{2}} Γ_{p} (\frac{m}{2})} .$ The mean was then $E (B) = \frac{Ψ}{m - p - 1}$ . Thereafter, we calculated the variance of each element of B in the models, which revealed $var (b_{ij}) = \frac{(m - p + 1) ψ_{ij}^{2} + (m - p - 1) ψ_{ii} ψ_{jj}}{(m - p) {(m - p - 1)}^{2} (m - p - 3)} .$ The variance of the diagonal used the same formula in both models with i = j, which then simplified to: $var (b_{ii}) = \frac{2 ψ_{ii}^{2}}{{(m - p - 1)}^{2} (m - p - 3)} .$
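The inverse Wishart mean and element-wise variance formulas lend themselves to a quick numerical check that the diagonal simplification agrees with the general expression. The sketch below uses hypothetical function names and an illustrative 2 × 2 scale matrix.

```python
def inv_wishart_mean(psi, m):
    """E(B) = Psi / (m - p - 1), defined for m > p + 1."""
    p = len(psi)
    if m <= p + 1:
        raise ValueError("mean requires m > p + 1")
    return [[v / (m - p - 1) for v in row] for row in psi]


def inv_wishart_var(psi, m, i, j):
    """var(b_ij) from the general element-wise formula, defined for m > p + 3."""
    p = len(psi)
    d = m - p
    if d <= 3:
        raise ValueError("variance requires m > p + 3")
    num = (d + 1) * psi[i][j] ** 2 + (d - 1) * psi[i][i] * psi[j][j]
    return num / (d * (d - 1) ** 2 * (d - 3))


def inv_wishart_var_diag(psi, m, i):
    """Simplified diagonal case: var(b_ii) = 2*psi_ii^2 / ((m-p-1)^2 (m-p-3))."""
    d = m - len(psi)
    if d <= 3:
        raise ValueError("variance requires m > p + 3")
    return 2.0 * psi[i][i] ** 2 / ((d - 1) ** 2 * (d - 3))
```

With Ψ = [[2, 0.5], [0.5, 1]] and m = 7, both variance routes give 0.25 for the first diagonal element, and the mean of that element is 0.5.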

Other Bayesian specifications were then generated employing normal priors for each of the logistic regression-based coefficients. This solution had posterior mean regression coefficients and standard errors that were almost identical to those for a frequentist solution. In this research 160,000 Markov chain Monte Carlo (MCMC) replications were executed using the sampled data from each riverine epidemiological breeding study site. The first 10,000 were discarded as a burn-in set and the remaining 150,000 were thinned such that only every third replication result was retained.
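
The burn-in and thinning arithmetic can be illustrated with a short sketch. The list of integers below is only a stand-in for the sampled draws, and keeping the first draw of every three is an assumption, since the text does not say which draw in each triple was retained.

```python
def burn_and_thin(chain, burn_in, thin):
    """Drop the first `burn_in` draws, then keep every `thin`-th draw."""
    return chain[burn_in::thin]


# Stand-in for 160,000 MCMC draws: the draw index itself.
draws = list(range(160_000))
kept = burn_and_thin(draws, burn_in=10_000, thin=3)

print(len(kept))  # number of retained replications
```

With these settings, (160,000 − 10,000) / 3 = 50,000 draws remain.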

The final MCMC dataset contained 50,000 replications. For quantifying ergodicity of the MCMC algorithm, assuming that all the samplers simultaneously satisfied the nested polynomial drift conditions, we determined that the adaptive algorithm was ergodic both when the number of nested drift conditions was greater than or equal to two and when the number of drift conditions was one. Ergodicity of an adaptive MCMC algorithm in a spatiotemporal vector insect larval habitat cluster-based regression model refers to the positive recurrent aperiodic state of a stochastic system tending in probability to a limiting form that is independent of the initial conditions.^[8] For the Bagan epidemiological study site the diagnostic Shapiro-Wilk statistic had a null hypothesis probability of P(S-W) = 0.0025, and the random effects increased the pseudo-R² value to 0.8652, while the spatially structured random effects (SSRE) in the sampled data accounted for about 52% of the random effects in the model. For the Sarakawa-Kpelou study site the diagnostic Shapiro-Wilk statistic had a null hypothesis probability of P(S-W) = 0.1572, and the random effects increased the pseudo-R² value to 0.9973. The SSRE accounted for about 34% of the random effects in the model. The larval habitat map patterns in both study sites were characterized by overall negligible negative spatial autocorrelation for the high density ABR-stratified cluster (the Moran's coefficient was −0.0531 and −0.2240 for the Bagan and Sarakawa-Kpelou sites, respectively).
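
For readers unfamiliar with the Moran's coefficient reported above, the following sketch shows how it is commonly computed. The four-site chain and the binary neighbour weights are toy assumptions, not the study's spatial weights matrix; the example only illustrates that similar neighbouring values give a positive coefficient and alternating values give a negative one.

```python
def morans_i(values, weights):
    """Moran's coefficient for values at n sites and an n x n weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)          # sum of all weights
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)


def chain_weights(n):
    """Toy weights: sites on a line, adjacent sites are neighbours."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
            for i in range(n)]


W = chain_weights(4)
smooth = morans_i([1.0, 2.0, 3.0, 4.0], W)        # neighbours are similar
alternating = morans_i([1.0, -1.0, 1.0, -1.0], W)  # neighbours are dissimilar
```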

The final model output for the Bagan epidemiological study site detailing the sequential decomposition of the variance is shown in Table 4. The logistic regression model mean response was then estimated with quasi-likelihood techniques because of the presence of severe underdispersion (i.e., 0.0227); the sampled predictor covariate coefficient estimates are shown in Table 5. The final model output for the Sarakawa-Kpelou epidemiological study site, also detailing the sequential decomposition of variance, is shown in Table 6. As for the Bagan study site, the logistic regression mean response for the Sarakawa-Kpelou model was estimated using quasi-likelihood techniques because of the presence of severe underdispersion (i.e., deviance = 0.0017); the predictor covariate coefficient estimates are shown in Table 7.

Table 4.

The final detailed sequential decomposition of variance for the Bagan study site

Variance component	Partial pseudo-R²
four covariates	0.3066
latitude	0.0183
SURE	0.5414
negative spatial autocorrelation (− SA)	0.0781
positive spatial autocorrelation (+ SA)	0.0433
TOTAL	0.9876


Table 5.

The logistic regression model mean response, which was estimated with quasi-likelihood techniques for the Bagan riverine study site

Parameter	Estimate	Standard Error	Chi-square
Intercept	−1.1448	0.0236	2351.41
TURB	−0.0854	0.0020	1836.79
RCKS	0.1521	0.0071	459.18
latitude	0.0230	0.0015	249.57
AQVEG	0.4141	0.0150	762.26
HGVEG	−0.2247	0.0201	125.05
FLVEG	0.2477	0.0202	149.79
SURE	4.3325	0.0521	6924.57
E₂₉ (− SA)	0.6417	0.0448	241.59
E₃₇ (− SA)	−0.7354	0.0428	294.91
E₄₁ (− SA)	−0.4152	0.0414	100.01
E₄₇ (− SA)	0.8233	0.0419	387.29
E₄₉ (− SA)	−0.7189	0.0424	288.19
E₅₃ (+ SA)	0.4589	0.0425	117.54
Scale	0.1563
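
The chi-square column in Table 5 is consistent with a Wald statistic, that is, the squared ratio of an estimate to its standard error. The sketch below checks this for three rows of the table; treating the column as a Wald chi-square is an assumption, and small differences are expected because the table reports rounded standard errors.

```python
def wald_chi_square(estimate, std_error):
    """Wald chi-square with 1 degree of freedom: (estimate / SE)^2."""
    return (estimate / std_error) ** 2


# (estimate, standard error, reported chi-square) copied from Table 5.
table5_rows = {
    "Intercept": (-1.1448, 0.0236, 2351.41),
    "AQVEG": (0.4141, 0.0150, 762.26),
    "HGVEG": (-0.2247, 0.0201, 125.05),
}

relative_gap = {}
for name, (est, se, reported) in table5_rows.items():
    computed = wald_chi_square(est, se)
    relative_gap[name] = abs(computed - reported) / reported
    print(f"{name}: computed {computed:.2f}, reported {reported:.2f}")
```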


Table 6.

The final detailed sequential decomposition of variance for the Sarakawa-Kpelou study site

Variance component	Partial pseudo-R²
Two covariates	0.2971
SURE	0.4324
negative spatial autocorrelation (− SA)	0.1818
positive spatial autocorrelation (+ SA)	0.0415
TOTAL	0.9173


Table 7.

The logistic regression model mean response, which was estimated with quasi-likelihood techniques for the Sarakawa-Kpelou study site

Parameter	Estimate	Standard Error	Chi-square
Intercept	−1.3475	0.0074	25652.9
I_adobe	3.9688	0.0871	2082.24
I_single	−0.5725	0.0174	1082.14
SURE	18.7079	0.2777	4539.44
E₁₃ (− SA)	0.2559	0.0110	539.36
E₁₇ (− SA)	0.3560	0.0109	1071.41
E₁₃ (+ SA)	−0.2333	0.0112	437.77
Scale	0.0412


We then constructed multiple simulation outputs from the validation model residuals. The simulation of each of our high and low ABR-stratified riverine larval habitat cluster-based regression parameter estimators was a two-step process. In Step 1, we simulated $X_{i} \sim N (γ Z_{i}^{*}, 1)$ and in Step 2, we simulated $Y_{i} \sim N (c \sqrt{2} Z_{i}^{*} + β X_{i}, 1 / w_{i})$ independently for each i. We displayed the results when c = 1 and, thus, the outcome was Y_i, using the residual explanatory data from the initial autoregressive riverine larval habitat model outputs. We found that the power increased based on the positive relationship (i.e., γ > 0, β > 0) between the sampled georeferenced predictor covariate coefficient estimates. This was expected in the models since $E (Y_{i} | Z_{i}^{*}) = (c \sqrt{2} + β γ) Z_{i}^{*}$ depended directly on the values of γ and β. We noticed that when we adjusted for X_i, the power of the validation models decreased, revealing a stronger association between the predicted residual autoregressive uncertainty estimates based on X_i and $Z_{i}^{*}$ (γ → ∞). Our first simulation assessed the unadjusted analyses in the models when there was clustering of the residuals within varying and constant predictor covariate coefficient effects, which in this research was indirectly induced by $X_{i}$ (i.e., $E (Y_{i} | Z_{i}^{*}) = β γ Z_{i}^{*}$). The individual-sampled riverine larval habitat level explanatory variable outcome values in each riverine epidemiological breeding study site were then quantified by $V_{i} = {\hat{U}}_{i} + {\hat{β}}_{I} {\bar{X}}_{ij}^{I} .$ There was a normal distribution in the estimating equations for e_ij, and X_ij was a vector in the models when the sampled predictor covariate coefficient measurement indicator values of the georeferenced indicator variable were quantified from the initial model outputs.
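
The two-step simulation described above can be sketched as follows. This is an illustration under stated assumptions: $Z_{i}^{*}$ is drawn from a standard normal distribution (the text does not specify its distribution), all weights are set to 1, and the function names and parameter values are hypothetical. The example checks that the slope of Y on Z* is close to $c \sqrt{2} + β γ$.

```python
import math
import random


def simulate(n, gamma, beta, c=1.0, w=1.0, seed=0):
    """Two-step simulation: X | Z* first, then Y | Z*, X."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0.0, 1.0)                 # assumed: Z* ~ N(0, 1)
        xi = rng.gauss(gamma * zi, 1.0)          # Step 1: X ~ N(gamma Z*, 1)
        mean_y = c * math.sqrt(2.0) * zi + beta * xi
        yi = rng.gauss(mean_y, math.sqrt(1.0 / w))  # Step 2: variance 1/w
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


def ols_slope(u, v):
    """Least-squares slope of v regressed on u."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var = sum((a - mu) ** 2 for a in u)
    return cov / var


gamma, beta, c = 1.0, 1.0, 1.0
z, x, y = simulate(20_000, gamma, beta, c)
unadjusted_slope = ols_slope(z, y)               # estimates E(Y | Z*) slope
expected_slope = c * math.sqrt(2.0) + beta * gamma
```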

We then used the distributions of U_i and e_ij, based on $U_{i} \overset{ind}{\sim} (β_{R}, σ_{R}^{2})$ and $e_{ij} \overset{ind}{\sim} (0, σ^{2})$, for verifying the empirical estimates for U_i. Thereafter, we incorporated the individual-sampled residual intra-cluster predictive serial error correlation values using $B_{i} = \sum_{j = 1}^{n_{i}} [Y_{ij} - {\hat{β}}_{I} X_{ij}^{I} - {\hat{U}}_{i}] / n_{i} .$ The residual estimates were quantified based on the relationship between the sampled georeferenced riverine larval habitat locations in each epidemiological riverine study site and Y_ij, given X_ij, with $E (B_{i}) = 0$ and $Var (B_{i}) = (σ^{2} + σ_{R}^{2}) / n_{i}$, which, in turn, was used to verify all estimated weights from the autoregressive models using the inverse of the variance of B_i, $w_{i} = n_{i} / (σ^{2} + σ_{R}^{2}) .$ In the models, for a given sampled georeferenced riverine larval habitat [i.e., (x₁,x₂)], $\sqrt{n} Z_{loc}$ was the weighted sum of residuals. In this research, if a cluster-based predictive autoregressive estimate occurred in areas with a higher intensity of an outcome, this implied a larger value of Z_loc(x₁,x₂|b).
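
The cluster-level residual mean B_i and its inverse-variance weight w_i can be computed as in the sketch below. It assumes a single scalar covariate per observation for brevity (the text treats X_ij as a vector), and all numeric values are made up for illustration.

```python
def cluster_residual_mean(y, x, beta_hat, u_hat):
    """B_i: mean of Y_ij - beta_hat * X_ij - U_hat_i over one cluster."""
    return sum(yij - beta_hat * xij - u_hat
               for yij, xij in zip(y, x)) / len(y)


def cluster_weight(n_i, sigma2, sigma2_r):
    """w_i = n_i / (sigma^2 + sigma_R^2), the inverse of Var(B_i)."""
    return n_i / (sigma2 + sigma2_r)


# Made-up cluster of three habitats (not data from the study).
y_i = [2.0, 3.0, 4.0]
x_i = [1.0, 2.0, 3.0]
b_i = cluster_residual_mean(y_i, x_i, beta_hat=1.0, u_hat=0.5)
w_i = cluster_weight(len(y_i), sigma2=1.0, sigma2_r=0.5)
```

Larger clusters receive larger weights because the variance of B_i shrinks in proportion to 1 / n_i.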

Unfortunately, the exact distribution of Z_loc(x₁,x₂|b) could not be solved analytically in the models, so an asymptotically equivalent distribution was used to approximate the true distribution of the sampled parameter estimators. We then considered the following expression for validating the estimates at (x₁, x₂) using Ẑ_loc(x₁,x₂ | b), where $v (x_{1}, x_{2} | b) = - \sum_{i = 1}^{n} W_{i} (x_{1}, x_{2} | b) \partial μ / \partial β = - \sum_{i = 1}^{n} W_{i} (x_{1}, x_{2} | b) X_{i} .$ In our validation models, I(β) = −∂U_β/∂β, and the G_i (i = 1,⋯,n) were independent with a mean of 0 and a variance of 1. It thus followed that the asymptotic conditional distribution of Ẑ_loc(x₁,x₂ | b), given the observed riverine larval habitat residual outputs (Y_i, X_i, s_i, r_i) (i = 1,⋯,n), was equivalent to the distribution of Z_loc(x₁, x₂|b), assuming that the georeferenced larval habitat geolocation (s_i, r_i) was independent of the outcome (i.e., Y). These results were obtained by qualitatively assessing and then quantifying the independence between the forecasted residual predictor covariate error coefficient estimates under the null hypothesis. The asymptotic results from both riverine larval habitat models allowed us to approximate the null distribution of Ẑ_loc(x₁, x₂ | b) employing N realizations of Ẑ_loc(x₁,x₂ | b), (Ẑ_1,loc(x₁,x₂ | b),⋯, Ẑ_N,loc(x₁,x₂ | b)), obtained by repeatedly simulating independent samples of (G₁,⋯,G_n) while adjusting the autoregressive residual estimates using (Y_i, X_i, s_i, r_i) (i = 1,⋯,n). A finite vector of length M of the explanatory estimates was also denoted by b = (b₁,⋯,b_M) in each validation model, where each b_m represented the size of the ABR-stratified clusters. Accordingly, we defined the validation test statistics using

$S_{loc} = \sup [\sup_{x_{1}, x_{2}} Z_{loc} (x_{1}, x_{2} | b_{1}), \dots, \sup_{x_{1}, x_{2}} Z_{loc} (x_{1}, x_{2} | b_{M})],$

which was conditional on the sampled estimates using

${\hat{S}}_{loc} = \sup [\sup_{x_{1}, x_{2}} {\hat{Z}}_{loc} (x_{1}, x_{2} | b_{1}), \dots, \sup_{x_{1}, x_{2}} {\hat{Z}}_{loc} (x_{1}, x_{2} | b_{M})] .$

Thereafter, the empirical p-values were computed as $p\text{-value} = \frac{\sum_{j = 1}^{N} I [{\hat{S}}_{loc} \leq {\hat{S}}_{j,loc}]}{N}$ for each riverine larval habitat cluster-based regression model, where Ŝ_j,loc was the Ŝ_loc at the j^th realization of Ẑ_j,loc. The residual model outputs were [(x₁, x₂, b): Z_loc(x₁, x₂|b) ≥ Ŝ_(.95N)], where Ŝ_(.95N) was the 95^th percentile of all Ŝ_j,loc rendered from the forecasted estimates. In the riverine larval habitat models we noted that when $E (Y_{i} | Z_{i}^{*}, X_{i}) = c \sqrt{2} Z_{i}^{*} + β X_{i}$ and Var(Y_i) = 1, w_i was 1 for all i (i = 1,⋯,n). Additionally, $Z_{i}^{*}$ was an important indicator value of the sampled residual predictor error covariate coefficients in both models if i was within Z^* and X_i and had a varying β (i.e., dependence of Y_i on X_i) and γ (i.e., dependence of X_i on $Z_{i}^{*}$ ). We varied β from −2 to 2 in steps of 1 and set γ = 0, 0.5, 1. Our models revealed that when β ≤ 0 and γ = 0.5 or 1.0 there was no power to detect any sampled explanatory estimates in either model. However, when we allowed β > 0, the power increased as β increased, with a maximum power of approximately 0.35 when β = 2 and γ = 1. Thus, our power to detect a prolific riverine larval habitat based on spatiotemporal field-sampled autoregressive forecasted count data in each riverine epidemiological study site was within normal statistical thresholds, but it did increase as expected with more positive dependence between $Z_{i}^{*}$ and X_i (γ > 0, γ → ∞) and stronger positive association between X_i and Y_i (β > 0, β → ∞). Our models were conditional on X_i as there was independence between Y_i and $Z_{i}^{*}$ (i.e., $Y_{i} | X_{i} ⊥ Z_{i}^{*}$). The autoregressive residual uncertainty outputs in both S. damnosum s.l. riverine models revealed that the predictive power was equal to the Type I error rate of 0.05.
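
The supremum statistic and the empirical p-value defined above reduce to simple maxima and counts once the realizations are available. The sketch below uses made-up numbers in place of the simulated Ẑ_loc surfaces; the function names and the percentile convention (the ⌈0.95N⌉-th order statistic) are assumptions.

```python
import math


def sup_statistic(z_by_cluster_size):
    """S_loc: supremum over cluster sizes b_m of the supremum over locations."""
    return max(max(z_values) for z_values in z_by_cluster_size)


def empirical_p_value(s_obs, s_sim):
    """Share of simulated statistics at least as large as the observed one."""
    return sum(1 for s in s_sim if s_obs <= s) / len(s_sim)


def upper_threshold(s_sim, level=0.95):
    """Order statistic used as the critical value at the given level."""
    ordered = sorted(s_sim)
    index = min(len(ordered) - 1, math.ceil(level * len(ordered)) - 1)
    return ordered[index]


# Made-up Z_loc values at three locations for M = 2 cluster sizes.
observed_surface = [[0.2, 1.1, 0.7], [0.9, 1.6, 0.3]]
s_obs = sup_statistic(observed_surface)

# Made-up simulated null statistics (N = 5 realizations).
s_sim = [0.5, 1.0, 1.5, 2.0, 2.5]
p_value = empirical_p_value(s_obs, s_sim)
critical = upper_threshold(s_sim)
```

In practice N would be much larger than 5 so that the 95th percentile is estimated with reasonable precision.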

3. Discussion

In conclusion, varying coefficient cluster-based regression residuals, diagnostic Shapiro-Wilk statistics, respecified Bayesian priors and QuickBird visible and NIR data determined latent negative spatial error autocorrelation components in the residual predictive autoregressive intra-cluster S. damnosum s.l. larval habitat predictor covariate correlation analyses for both the Bagan and Sarakawa-Kpelou riverine epidemiological study sites. Designing and developing S. damnosum s.l. riverine larval habitat management strategies in ArcGIS based on spatial statistical algorithms in SAS/GIS and PROC MCMC using sub-meter resolution satellite data and robust diagnostic residual intra-cluster predictor covariate error correlation estimates can provide an effective entomological tool to reduce prolific S. damnosum s.l. larval habitats based on spatiotemporal field-sampled count data in riverine ecosystems.

Footnotes

Notes on contributors

BENJAMIN G. JACOB received a Ph.D. in environmental epidemiology from the School of Medicine at the University of Miami, FL, USA. Thereafter, Dr. Jacob completed a National Institutes of Health (NIH) post-doctoral fellowship in medical entomology at the University of Illinois at Urbana-Champaign, IL, USA. Currently, Dr. Jacob is a research assistant professor at the College of Public Health, Department of Global Health, University of South Florida, Tampa, FL, USA.

References

1. Toe L, Merriweather A, Unnasch TR. DNA probe-based classification of Simulium damnosum s.l.-borne and human-derived filarial parasites in the onchocerciasis control program area. American Journal of Tropical Medicine and Hygiene. 1994;51(3):676–683.
2. Crosskey RW. A taxonomic study of the larvae of West African Simuliidae (Diptera: Nematocera) with comments on the morphology of the larval blackfly head. Bull. British Museum (Natural History) Entomology. 1960;10(6):1–74.
3. Paugy D, Fermon Y, Abban KE, Diop E, Traoré K. Onchocerciasis Control Programme in West Africa: a 20-year monitoring of fish assemblages. Aquatic Living Resources. 1999;12(7):363–378.
4. Toe L, Tang J, Back C, Katholi CR, Unnasch TR. Vector-parasite transmission complexes for onchocerciasis in West Africa. Lancet. 1997:163–166. doi: 10.1016/S0140-6736(96)05265-8.
5. Garms R. The reinvasion of the onchocerciasis control programme area in the Volta River Basin by Simulium damnosum s.l. the involvement of the different cytospecies and epidemiological implications. Annals Society Belgium Medicine Tropical. 61(2):193–198.
6. Boatin B, Molyneux DH, Hougard JM, Christensen OW, Alley ES, Yaméogo L, et al. Patterns of epidemiology and control of onchocerciasis in West Africa. Journal of Helminthology. 1997;71(7):91–101. doi: 10.1017/s0022149x00015741.
7. Gu W, Novak RJ. Habitat-based modeling of impacts of mosquito larval interventions on entomological inoculation rates, incidence, and prevalence of malaria. American Journal of Tropical Medicine Hygiene. 2005;73(17):5460–5552.
8. Jacob BG, Griffith DA, Muturi EJ, Caamano, et al. A heteroskedastic error covariance matrix estimator using a first-order conditional autoregressive Markov simulation for deriving asympotical efficient estimates from ecological sampled Anopheles arabiensis aquatic habitat covariates. Mal Journal. 2009;28(2):216–225. doi: 10.1186/1475-2875-8-216.
9. McCulloch CE, Searle SR. Generalized, Linear and Mixed Models. New York: Wiley-Interscience; 2005.
10. Cressie N. Aggregation in geostatistical problems. Geostatistics troia. 1993.
11. Griffith DA. Spatial autocorrelation on spatial filtering. Springer; 2003.

