PLOS Neglected Tropical Diseases. 2020 Sep 14;14(9):e0008620. doi: 10.1371/journal.pntd.0008620

Identifying correlates of Guinea worm (Dracunculus medinensis) infection in domestic dog populations

Robert L Richards 1,2,*, Christopher A Cleveland 3,4, Richard J Hall 1,2,5, Philip Tchindebet Ouakou 6, Andrew W Park 1,2,5, Ernesto Ruiz-Tiben 7, Adam Weiss 7, Michael J Yabsley 3,4, Vanessa O Ezenwa 1,2,5
Editor: Jeremiah M Ngondi 8
PMCID: PMC7515199  PMID: 32925916

Abstract

Few human infectious diseases have been driven as close to eradication as dracunculiasis, caused by the Guinea worm parasite (Dracunculus medinensis). The number of human cases of Guinea worm decreased from an estimated 3.5 million in 1986 to mere hundreds by the 2010s. In Chad, domestic dogs were diagnosed with Guinea worm for the first time in 2012, and the numbers of infected dogs have increased annually. The presence of the parasite in a non-human host now challenges efforts to eradicate D. medinensis, making it critical to understand the factors that correlate with infection in dogs. In this study, we evaluated anthropogenic and environmental factors most predictive of detection of D. medinensis infection in domestic dog populations in Chad. Using boosted regression tree models to identify covariates of importance for predicting D. medinensis infection at the village and spatial hotspot levels, while controlling for surveillance intensity, we found that the presence of infection in a village was predicted by a combination of demographic (e.g. fishing village identity, dog population size), geographic (e.g. local variation in elevation), and climatic (e.g. precipitation and temperature) factors, which differed between northern and southern villages. In contrast, the presence of a village in a spatial infection hotspot was primarily predicted by geography and climate. Our findings suggest that factors intrinsic to individual villages are highly predictive of the detection of Guinea worm parasite presence, whereas village membership in a spatial infection hotspot is largely determined by location and climate. This study provides new insight into the landscape-scale epidemiology of a debilitating parasite and can be used to more effectively target ongoing research and possibly eradication and control efforts.

Author summary

The eradication of human infectious diseases has proven remarkably difficult. The world has only succeeded once, in the case of the smallpox virus. However, international efforts have driven the debilitating Guinea worm parasite closer to the brink of eradication than nearly any other parasite. Coordinated efforts by the Ministries of Health in endemic countries, the U.S. Centers for Disease Control, The Carter Center, and the World Health Organization have reduced the number of annual Guinea worm cases from millions in the 1980s to hundreds in the early 2010s, but recently a new threat has emerged. Guinea worm infections have been diagnosed in domestic dogs, particularly in the Republic of Chad, and numbers of infections have continued to increase. As in many countries where dracunculiasis is endemic, the campaign for eradication in Chad has focused intervention measures on interrupting transmission among humans, so infection in dogs jeopardizes eradication efforts. In this study, we used machine learning methods to identify demographic, geographic, and climatic factors associated with the presence of Guinea worm-infected dogs at the village level, and spatial clustering of dog cases regionally. A combination of demographic, geographic, and climatic factors were important correlates of infection at the village level, but the importance of these factors varied between northern and southern populations of the parasite. At the larger village cluster level, the geographic position and climate of a village were most important. Some of our findings, including the importance of fishing villages and the difference in correlates between northern and southern villages, can be used by researchers to guide additional data collection and by public health workers to better target eradication efforts. More generally, this work contributes to a broader understanding of the spatial patterning of multi-host infectious diseases of humans and animals.

Introduction

Few human parasites and pathogens have been driven as near to the brink of eradication as the Guinea worm (Dracunculus medinensis). Although rarely fatal, Guinea worm disease (dracunculiasis) can be extremely painful and debilitating [1]. Guinea worm, which had a historical distribution spanning 21 countries across Asia and Africa, was first targeted for eradication by the World Health Assembly in the 1980s [2]. As a result, the number of Guinea worm disease cases per year decreased from 3.5 million to hundreds between the mid-1980s and the early 2010s [3]. However, this consistent downward trajectory slowed in a handful of endemic countries by the late 2000s. For example, in Chad, after an eradication campaign from 1993 to 2000, there were no Guinea worm cases reported for 10 consecutive years (2000–2009), but then there was a small, unexpected outbreak in 2010 [4]. Since then, disease cases have persisted and all available evidence links the persistence of human cases to the emergence of Guinea worm in domestic dogs (Canis lupus familiaris) [2,3,5,6].

D. medinensis infection classically occurs through the ingestion of infected cyclopoid copepods [7]. The transmission cycle begins when, after an approximately 10–14 month incubation period, a female worm emerges, often on the feet or legs of a host, and releases first-stage larvae into a water source. Guinea worm larvae are ingested by copepods, where they undergo development to an infectious third-stage larva [4]. Typically, ingestion of infectious copepods occurs when mammalian hosts drink unfiltered water, but recent work suggests that transmission may also occur when humans or other mammalian hosts (including dogs, cats, and baboons) eat undercooked or raw fish or frogs [3,4,8,9]. These aquatic animals are known predators of copepods, potentially allowing them to act as paratenic or transport hosts in transmission to mammals. Given the Guinea worm transmission cycle, classical eradication efforts have primarily involved strategies to prevent human ingestion of infectious copepods (e.g. use of safe water sources such as borehole wells, filtering potentially-contaminated water, chemical treatment of surface water), or to limit larval shedding by emerged female worms (e.g. incentives for reporting cases, containment of infected individuals) [7]. Many of the same control strategies applied to human disease are also now used to control Guinea worm transmission in dogs, along with new approaches such as recommendations to bury or burn fish entrails to prevent dog ingestion of the parasite [2]. However, while these efforts have been largely effective at interrupting human transmission, they have not influenced numbers of dog infections in the countries where the parasite remains endemic [10]. Thus, additional insight into the correlates of Guinea worm transmission is needed to help inform future elimination strategies for dogs.

The dependence of the Guinea worm life cycle on the environmental availability of water provides an intuitive starting point for exploring potential correlates of Guinea worm infections in dogs. In Chad, the endemic region for Guinea worm occurs in a riparian floodplain along the Chari River where fishing is an essential form of subsistence and commerce. Insight into Guinea worm transmission in dogs might therefore be gained by examining associations between disease patterns and demographic (e.g. fishing practices of a village), climatic (e.g. rainfall), and geographic (e.g. proximity to water sources) features that are relevant to the transmission process. Indeed, such approaches have been used to inform control efforts for other environmentally-dependent diseases [11–13]. For example, spatial modeling of schistosomiasis risk has been used to identify correlates of risk and to target control efforts such as mass drug administration, alleviating the need for costly long-term monitoring [14,15].

In this study, we investigated potential factors associated with detection of Guinea worm infection in domestic dogs in Chad and the implications for disease elimination. We used a machine learning approach to identify the demographic, geographic, and climatic factors that explain variation in Guinea worm infection in over 2000 villages at two different spatial scales: the village scale and the multi-village spatial hotspot scale. The hotspots represent areas with significantly more dog infections than expected given the underlying size of the dog population. Overall, by providing information on factors that predispose villages for canine Guinea worm infection, our study can help guide ongoing research, surveillance, and potentially elimination efforts.

Methods

Infection presence and predictor variables

We obtained data on D. medinensis infections in domestic dogs from the Guinea worm eradication program led by the Chad Ministry of Health and supported by The Carter Center and the World Health Organization [10]. The surveillance data used in this study were collected between 2013 and 2017 from 2125 villages located along the Chari River. Surveillance involves regular searches of households for humans or animals showing signs of Guinea worm infection as well as a system of incentives for case reporting by community members. Infections are only recorded when a D. medinensis worm emerges from the skin, becoming visible to Guinea Worm Eradication Program (GWEP) staff; thus, an infection report reflects the definitive occurrence of a D. medinensis infection by a single worm. Over the five-year study period, the village-level prevalence (proportion of villages with an infection) of D. medinensis in dogs was 19.1% (406/2125), but the number of infections per village varied widely (e.g. 2016 range: 0–71), and generally increased over time. In 2013, there were 57 infections in 39 villages, but this number increased to 1287 infections in 238 villages by 2016. This increase in infections is likely the result of both the spread of the parasite and increased intensity of surveillance [10]. For our analyses, we considered a village to be positive for Guinea worm if there had been at least one dog infection reported at any time during our five-year study window. This approach allowed us to minimize the effect of increased surveillance effort over time and to better align infection data with the temporal resolution of environmental predictor variables. For example, climatological data (see below) are based on a time-averaged interpolation from 1970–2000. Our approach also better matches the needs of the GWEP to identify currently uninfected villages likely to become infected.
Finally, given recent work on the genetic structure of Guinea worm in Chad identifying Northwestern and Southeastern parasite sub-populations based on a clear spatial division of genetic relatedness ([6], Fig 1), we also accounted for spatial structure of the parasite population in our analytical approach (see Data Analysis).

Fig 1. Map of spatial hotspots of D. medinensis infection in dogs.

Fig 1

Orange points represent villages where an infection was present (n = 312), while green points are villages without a history of dog infection (n = 1280). Circles represent spatial hotspots identified by hotspot analysis. The horizontal line marks the rough geographic delineation between Northern and Southern subpopulations of the parasite.

To understand potential correlates of D. medinensis infection patterns in dogs, we collated data on 36 variables that might contribute to transmission and 2 variables estimating surveillance intensity. All predictor variables were collated at the village level and fell into one of four general categories: demographic, climatic, geographic, and surveillance (see S1 Table). Demographic variables, such as dog and human population size estimates, were collected by GWEP staff during house-to-house surveys conducted each January, and a mean value for the five-year study period was calculated for each village. These population-based variables were included in our model because of well-described positive associations between host population size and parasite transmission in other systems [16–18], and as a method to control for the increased probability of parasite detection in larger, more populated areas. The identity of a village as a ‘fishing village’ was of particular interest because of the possible connection between unattended fish scraps and D. medinensis transmission to dogs [8]. Villages were identified by GWEP staff as “fishing” if greater than 50% of families in that village fish.

Climatological variables, including variables related to temperature and precipitation, were included as predictors in our model because of the relevance of ephemeral and seasonal water sources to the life cycle of D. medinensis [1]. We conducted all analyses with both high spatial resolution climate data from the WorldClim dataset [19] (primary analysis) and with lower spatial resolution data collected via remote sensing over the same time period as the parasite observations (alternative analysis). Remotely sensed temperature data were derived from the Modern-Era Retrospective Analysis for Research and Applications, version 2 (MERRA-2) dataset [20,21], while precipitation data were obtained from the African Rainfall Climatology, version 2 dataset (ARC2) [22] (see S1 Table). Model results based on the primary analysis with WorldClim data are presented in the main text, but differences between results of the WorldClim and alternative remote sensing analyses are noted where relevant.

Geographic variables such as land-cover type (e.g. vegetation type and proportion of cover), remotely sensed surface water, and elevation were included in our model because they might dictate the location, size, and permanence of water sources that are crucial to the Guinea worm parasite life cycle. Both climatic and geographic covariates were extracted for villages according to latitude and longitude coordinates. GWEP workers collected these coordinates at the center of each village using portable GPS devices.

Finally, to control for spatial variation in surveillance, we included two measures of surveillance effort in our models: the mean number of healthcare workers present in a village annually and the total number of healthcare supervisor visits to a village from 2013–2017. These two metrics capture different components of surveillance intensity: the first variable summarizes spatial variation in average surveillance intensity over the study period, while the second variable encompasses some degree of the spatiotemporal variation in surveillance, since villages have no visits for years during which they were unsurveilled.

Data analysis

Spatial hotspot analysis

We used spatial scan analyses to identify whether ‘hotspots’ of D. medinensis infections in dogs exist (i.e. whether villages with dog cases are spatially clustered). Prior to identifying hotspots, all villages with missing records for dog population size (N = 135) were removed, leaving 1990 villages in the hotspot analysis, including 1592 in the training set and 398 in the held-out evaluation set. The spatial analysis specified a discrete Poisson incidence model, which assumes that the mean number of detected dog infections in a village is proportional to the population of dogs in that village, and conducts a spatial scan to identify areas with significantly more infections than expected. Spatial cluster analyses were performed using the rsatscan package [23] in R version 3.5.1. Because the spatial scan is performed at varying radii centered on all villages, overlapping or concentric clusters are possible [24]. We classified villages based on whether or not they were members of a spatial infection hotspot.
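As an illustration of the discrete Poisson model described above, the following minimal Python sketch computes the expected number of infections for one candidate zone and the corresponding log-likelihood ratio in the form commonly attributed to Kulldorff's spatial scan statistic. It is not the rsatscan implementation: the function names, the toy village data, and the choice of candidate zone are hypothetical, and the Monte Carlo step that a scan statistic uses to assign p-values is omitted.

```python
import math

def expected_cases(zone_pop, total_pop, total_cases):
    """Expected infections in a zone under the null hypothesis,
    i.e. proportional to the zone's share of the dog population."""
    return total_cases * zone_pop / total_pop

def poisson_llr(observed, expected, total_cases):
    """Log-likelihood ratio for one candidate zone (high-rate clusters only).

    Returns 0 when the zone has no more cases than expected.
    Assumes expected > 0 (the zone contains at least one dog).
    """
    if observed <= expected:
        return 0.0
    inside = observed * math.log(observed / expected)
    outside_obs = total_cases - observed
    outside_exp = total_cases - expected
    # Convention: 0 * log(0) = 0 when every case falls inside the zone.
    outside = outside_obs * math.log(outside_obs / outside_exp) if outside_obs > 0 else 0.0
    return inside + outside

# Hypothetical toy data: (dog population, detected infections) per village.
villages = [(100, 12), (50, 6), (200, 2), (150, 0)]
total_pop = sum(pop for pop, _ in villages)
total_cases = sum(cases for _, cases in villages)

# Hypothetical candidate zone: a circle covering the first two villages.
zone = villages[:2]
zone_pop = sum(pop for pop, _ in zone)
zone_cases = sum(cases for _, cases in zone)

e = expected_cases(zone_pop, total_pop, total_cases)
llr = poisson_llr(zone_cases, e, total_cases)
print(f"expected={e:.1f}, observed={zone_cases}, LLR={llr:.3f}")
```

In a full scan, this ratio would be evaluated for circles of varying radius centered on every village, and the zones with the largest values would be tested against data simulated under the null hypothesis.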

Boosted regression tree analysis

We used boosted regression trees (BRTs) to identify variables (see S1 Table) of importance for predicting D. medinensis infection in dogs at the local (village) and regional (hotspot) scales. This method has been widely used both for the predictive modeling of the spatial extent of infectious disease risk [25–27] and for the identification of important predictors of risk [11–13].

Prior to including all variables in our boosted regression tree models, we diagnosed significant co-linearity in all types of predictors using Ward clustering based on the Spearman correlation matrix [28,29]. Co-linear clusters were reduced to the single most central variable in the cluster (the variable with highest mean correlation to all others) according to [28]. After co-linearity reduction, villages were randomly sub-sampled into training (80%, 1700 villages) and testing (20%, 425 villages) sets for the purpose of final model validation. Due to constraints of the hotspot analysis (villages without records of dog population could not be included in the model), we only included 1592 (80%) and 398 (20%) villages in the training and testing sets, respectively, for the boosted regression tree trained to hotspot identity.
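The reduction of a co-linear cluster to its most central variable can be sketched as follows. This minimal Python example assumes that cluster membership has already been determined (the Ward clustering step is not shown) and uses the mean absolute Spearman correlation as the centrality score; the use of absolute values, the variable names, and the toy data are assumptions made for illustration and are not taken from the study.

```python
def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def most_central(cluster):
    """Variable with the highest mean |rho| to the other cluster members.

    `cluster` maps variable name -> list of village-level values and
    must contain at least two variables.
    """
    def mean_abs_rho(name):
        others = [o for o in cluster if o != name]
        return sum(abs(spearman(cluster[name], cluster[o])) for o in others) / len(others)
    return max(cluster, key=mean_abs_rho)

# Hypothetical cluster of three co-linear covariates over six villages.
cluster = {
    "precip": [1, 2, 3, 4, 5, 6],
    "latitude": [1, 2, 3, 4, 6, 5],
    "elevation": [2, 1, 3, 4, 5, 6],
}
print(most_central(cluster))
```

Only the selected variable would then be passed to the model as the representative of its cluster.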

Our BRT models identified variables predicting: (i) whether D. medinensis was detected in a village over the five-year study period and (ii) whether a village was a member of an infection hotspot (note that membership in an infection hotspot was not limited to villages with previous infections, since uninfected villages in close proximity to villages with large numbers of cases can be identified as members of a hotspot). To evaluate any potential effects of parasite genetic structure, we also re-fit models using data exclusively from the northern/western and southern/eastern parasite sub-populations, respectively. All models were fit with the gbm.step function from the dismo package [30] in R version 3.5.1, which iteratively fits boosted regression tree models with increasing numbers of trees and assesses the fit of the model through 10-fold cross validation. The tree-number that minimizes the residual deviance in the response variable was chosen as the best model. Models were assessed for their fit in cross-validation and on held-out evaluation data using the area under the receiver operating characteristic curve (AUC), a measure of a model’s ability to discriminate between positive and negative responses [31]. Model parameters (bag fraction, learning rate, and tree-complexity) were tuned to maximize AUC through cross validation while minimizing the difference between training and cross-validation AUC. This served as a control on overfitting to training data. Bag fraction denotes the size of the random sample on which each tree is trained, learning rate sets the shrinkage of the contribution of a given tree to the overall model, and tree-complexity defines the number of branch splits allowed in the tree. Tuning for tree complexity requires fitting interactions between covariates. 
The strength of the interactions and a partial dependence plot of the strongest interaction for the presence-absence (S3 Table, S4 Fig) and hotspot (S4 Table, S5 Fig) models are reported in the supplementary material since they do not provide information that qualitatively changes the interpretation of the non-interaction based results. The relative importance of each covariate was determined by calculating the relative improvement in model fit when a covariate is included in the model, weighted by how often the covariate appeared in the collection of trees [32]. Values were then rescaled such that they summed to 100, with larger values representing a larger relative influence on model fit [33,34]. We visualized the effect of each covariate on the response variable using partial dependence plots [33,34].
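Two of the quantities described above lend themselves to a short worked example: the AUC, computed here as the probability that a randomly chosen positive village receives a higher predicted score than a randomly chosen negative one, and the rescaling of raw covariate contributions so that they sum to 100. The Python sketch below is illustrative only; the scores, labels, and raw importance values are hypothetical, and it does not reproduce how the dismo or gbm packages compute these quantities internally.

```python
def auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    Assumes at least one positive (1) and one negative (0) label.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def rescale_importance(raw):
    """Rescale raw (non-negative) contributions so they sum to 100."""
    total = sum(raw.values())
    return {name: 100.0 * value / total for name, value in raw.items()}

# Hypothetical predicted probabilities and observed infection status.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
model_auc = auc(scores, labels)

# Hypothetical raw improvements in model fit attributed to each covariate.
raw = {"dog_population": 4.0, "fishing_village": 3.0, "elevation_sd": 1.0}
importance = rescale_importance(raw)

print(f"AUC = {model_auc:.3f}")
print(importance)
```

An AUC of 0.5 corresponds to a model that discriminates no better than chance, and 1.0 to perfect separation of positive and negative villages.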

Results

Over the five-year study period, the village-level prevalence (proportion of villages with an infection) of D. medinensis in dogs was 19.1% (406/2125), but the number of infections per village varied widely (e.g. 2016 range: 0–71), and generally increased over time.

Identifying spatial hotspots of infection

Seventeen spatial hotspots of D. medinensis infection were identified with a p-value < 0.05. Overall, 43.1% (857 out of 1990) of villages were identified as belonging to a spatial hotspot (Fig 1). Of these hotspot villages, 29.5% (253/857) had a detected case during the five-year study period. Hotspot villages were located primarily along the northern/western reaches of the Chari River (88.1% of hotspot villages are in the North compared to 74.9% of all villages), with smaller clusters in the southern/eastern reaches (11.9% of hotspot villages are in the South compared to 25.1% of all villages). The total number of infections was, however, relatively evenly split between northern/western (1361) and southern/eastern (1195) villages, suggesting that cases are more spatially concentrated in the southern/eastern region.

Evaluating predictors of infection at different scales

Two clusters of predictor variables were identified from our co-linearity analysis (S2 Table). The first co-linear cluster contained longitude, latitude, mean elevation, and the majority of the bioclimatic variables, and this cluster was reduced to the one central variable, mean annual precipitation (Bioclim 12). The second co-linear cluster contained only human population and number of households in a village and was reduced to human population size. The BRT model fit data on D. medinensis presence across all villages with a mean AUC of 0.887 in cross-validation and an evaluation AUC of 0.889. Demographic and geographic variables were the most important predictors of D. medinensis presence across all villages, with dog population size (22.2% Importance; Fig 2A, Fig 3A), number of healthcare supervisor visits (17.8% Importance; Fig 2A, Fig 3B), identity as a fishing village (14.6% Importance; Fig 2A, Fig 3C), standard deviation in elevation (11.1% Importance; Fig 2A, Fig 3D), and mean annual precipitation (Bioclim 12; representing covariate cluster 1, 9.4% Importance; Fig 2A) emerging as key predictor variables. Splitting villages into northern (CV AUC: 0.905, evaluation AUC: 0.915) and southern (CV AUC: 0.826, evaluation AUC: 0.810) parasite sub-populations revealed strong distinctions between the two regions. Fishing village identity and standard deviation in elevation were important in northern villages, while the importance of climate-related variables, represented by mean annual precipitation (Bioclim 12; cluster 1) and mean temperature of the driest quarter (Bioclim 9), only emerged in southern villages (Fig 2B and 2C, S1 Fig, S2 Fig).

Fig 2. Relative importance estimates.

Fig 2

Relative importance of covariates in predicting (a) D. medinensis infection presence in a village, (b) infection presence in Northern villages, (c) infection presence in Southern villages, and (d) hotspot identity across all villages. See S1 Table for notes on variable abbreviations (e.g. ASV Visits corresponds to the total number of healthcare supervisor visits to a village from 2013 to 2017).

Fig 3. Village-level partial dependence plots.

Fig 3

Partial dependence plots showing the effect of (a) dog population, (b) number of healthcare supervisor visits, (c) identity as a fishing village, and (d) standard deviation in elevation on probability of dog infection. Histograms represent the distribution of values for these covariates amongst all training villages.

When fit to spatial hotspot identity, the BRT model achieved a mean AUC in cross validation of 0.988 with an evaluation AUC of 0.989. The most important variables included cluster 1, represented by mean annual precipitation (Bioclim 12, 36.0% Importance; Fig 2D, Fig 4A), number of healthcare supervisor visits (26.0% Importance; Fig 2D, Fig 4B), mean temperature of the coldest quarter (Bioclim 11, 10.2% Importance; Fig 2D, Fig 4C) and mean temperature of the driest quarter (Bioclim 9, 9.6% Importance; Fig 2D, Fig 4D). These findings were not qualitatively different when analyses were performed on northern and southern villages separately (S3 Fig).

Fig 4. Hotspot-level partial dependence plot.

Fig 4

Partial dependence plots showing the effect of (a) cluster 1 (represented by annual precipitation [Bioclim 12]), (b) number of healthcare supervisor visits, (c) temperature of the coldest quarter [Bioclim 11], and (d) temperature of the driest quarter [Bioclim 9] on probability of dog infection. Histograms represent the distribution of values for these covariates amongst all training villages.

Finally, results at the village and hotspot scales were fairly robust to variation between our two climate data sources, the WorldClim data used in our primary analysis and the contemporary, lower spatial resolution, remotely sensed data used in an alternative analysis. Climate-related variables were relatively less influential in models in the alternative analysis, possibly because of the coarser spatial resolution of these data. A lack of explanatory power of climate variables in the alternative analysis emerged in the hotspot analysis as well. Remotely sensed population density (12.9% Importance; S6 Fig, S8 Fig) and standard deviation in elevation (7.6% Importance; S6 Fig, S8 Fig) increased in importance, while the only climatic variable retained among the top variables was precipitation of the warmest quarter (Bioclim 18, 11.7% Importance; S6 Fig, S8 Fig). Precipitation of the warmest quarter was one of the variables contained in the highly important cluster 1, which emerged as important in our primary analysis with WorldClim data (S2 Table).

Discussion

We found that a combination of demographic, geographic, climatic, and surveillance variables were important predictors of detection of Guinea worm infection in dogs in the villages of Chad. The surveillance variable of healthcare supervisor visits was an important predictor of Guinea worm infection at both the individual village and spatial hotspot scales. The size of the dog population and the status of a village as a fishing village were demographic predictors of Guinea worm infection at the village scale, while standard deviation in elevation emerged as a geographic predictor at this scale. At the hotspot scale, climate and geography were paramount. Overall, our results suggest that some demographic and geographic features of villages (e.g. fishing village identity and variation in elevation) can be used to inform local Guinea worm control efforts. In contrast, broad scale hotspots of infection are largely determined by climate or geographical location suggesting that spatial epidemiology may be most appropriate for identifying infection risks at this broader scale.

At the smaller scale of our analysis (the village), our models identified dog population size, fishing village identity, and number of visits from healthcare supervisors as key variables predicting presence of D. medinensis infection in dogs in a village. The importance of dog population size to parasite infection at the village scale is unsurprising, but the mechanisms behind this relationship are uncertain. Larger dog populations represent more chances that an infected copepod in the environment will encounter a viable host but not necessarily a larger risk to an individual dog or human. A larger dog population may also simply increase the probability that a Guinea worm infection is detected. In this case, dog population may be an indicator of surveillance effort, similar to healthcare worker surveillance. Unfortunately, the temporal scale of our analysis prevents us from drawing strong conclusions about the mechanisms underlying this pattern. Future work based on finer-scale temporal data is needed to better understand the role of dog population size as a potential driver of Guinea worm infection risk.

The importance of fishing village identity in driving infection risk at the village scale was also notable. This finding corroborates the evolving understanding of the possible role of fish in the transmission of Guinea worm. Because of the predominance of fishing activity in fishing villages, these are the locations where dogs are most likely to have access to raw fish and fish entrails, which may serve as transport hosts of the parasite [8,35]. Fishing villages may also have other features in common besides the presence of fish remains; however, we account for many such features in our models by including variables such as distance to permanent water, surface water area, longitude, latitude, human and dog population sizes, and surveillance by healthcare workers. Intriguingly, increased infection risk among fishermen or fishing cultures has been found in other freshwater-associated parasitic diseases, like schistosomiasis [36]. However, the elevated risk for schistosomiasis is likely due to increases in high-risk behaviors, like wading, rather than to proximity to water per se [37,38]. Our parallel finding that a fishing culture is associated with higher risk of Guinea worm infection in the dogs of a village suggests that addressing human behaviors that facilitate transmission through fishing-related activities may enhance disease control. However, future work is needed to understand the specific human behaviors captured by the ‘fishing village’ designation that translate to increased infection risk in dogs.

The standard deviation of elevation also emerged as a highly important variable at the village level. Variation in elevation may contribute to a variety of factors influencing parasite transmission, but the most likely of these is that increasing variation in elevation increases the number and size of areas where ephemeral water sources can form, either after rainfall events or as flood waters recede. This idea is supported by the fact that the elevation data used in our models (NASA Shuttle Radar Topography Mission based Digital Elevation Model data) are commonly employed in disaster management and flood projection with the assumption that low elevation areas on a landscape are most likely to flood first and recede last [39–41]. These data have also been regularly used to explain infection risk for other water-associated diseases such as West Nile virus and malaria [42,43]. Importantly, ephemeral pools may serve as water sources for dogs, possibly concentrating infected copepods and increasing the likelihood of Guinea worm transmission [4].

Intriguingly, the relative importance of fishing village identity and standard deviation in elevation for predicting Guinea worm infection risk varied regionally. The Guinea worm population in Chad forms two distinct genetic sub-populations, one northwest of Manda National Park and one southeast of the park [6]. Our work suggests that dog exposure risk in these two sub-populations is driven by different factors. While dog population size and the number of visits from healthcare supervisors were important in all regions, the difference between fishing villages and non-fishing villages was smaller in southern/eastern villages. The downgrading of fishing village identity as a key predictor variable in the south may suggest differences in the predominant mode of parasite transmission between the two regions. In particular, the two regions likely vary in the relative importance of consumption of fish scraps as a mechanism of transmission. Additionally, the emergence of key climate-related variables for southern villages in our analysis of southern vs. northern villages, and the downgrading of standard deviation in elevation (Fig 2, S1 Fig, S2 Fig), suggests that precipitation may be more important than topography for facilitating the type of dog-water contacts that promote parasite transmission in the southern region. Indeed, the tendency for the drivers of human and animal parasites to differ regionally due to social and environmental differences has been described in other systems [44,45]. These differences can pose significant challenges for disease control, in part, because they generate cryptic heterogeneity in the efficacy of control measures [46–48]. For Guinea worm in Chad, our findings suggest that control strategies might need to be tailored by region to maximize efficacy.

In addition to our village-level analyses, the identification of spatial hotspots allowed us to investigate the correlates of larger scale patterns of infection. At this broader scale, covariates which vary significantly between nearby villages (e.g. demographic variables like dog population size) provided minimal discriminatory power, with the notable exception of healthcare supervisor visits. Instead, climatic and geographic factors accounted for 55.8% of total variable importance. The fact that the cluster 1 variable, represented by mean annual precipitation and reflecting key geographic and climatic factors, was a more important predictor of infection presence at the hotspot compared to village scale may, in part, be the result of some important village-level variables (e.g. dog population size, fishing village identity) becoming obscured at the hotspot scale. The inclusion of uninfected villages into spatial hotspots may explain this effect. At the hotspot scale, climatic and geographic variables, which vary gradually across space, are likely to show more consistency across a hotspot than demographic variables, thereby serving as more reliable predictors of hotspots. However, the fact that mean annual precipitation clusters with so many climatic and geographic variables makes it difficult to ascertain exactly what factors drive hotspot risk beyond the strong predictive importance of climate and geography. Alternatively, the location of these large, high-infection hotspots may be driven by spatial infection processes including the flow of infected humans and dogs between adjacent villages. In this case, climatic and geographic variables may simply serve as proxies for the importance of proximity to certain high-risk areas. This effect may be particularly relevant in our model due to the very large infection hotspot identified in the north-west Chari region (Fig 1), which likely dominated the boosted regression tree analysis. Alternative modeling strategies which explicitly consider transmission paths between villages and hotspots should be used to generate more nuanced predictions about the spatial drivers of infection risk [18,49].

There are a few important caveats that should be considered alongside our findings. First, we aggregated our data across a five-year period, assuming no change in the correlates of presence or absence of Guinea worm across the entire study period. This assumption was necessary given the structure of our data but may prove misleading if changes in predictors over the study period were associated with a rapid rise in numbers of cases. Second, we explicitly included seasonality only for the climatic and surface water variables in our models. However, it is likely that other variables, such as dog and human population sizes, also vary seasonally in ways that could influence the incidence of D. medinensis [50]. In this case, the value reported for these predictors in our models may not represent the period for which the effect of these predictors is most pronounced. This issue could have obscured the importance of demographic covariates in our models. Finally, Guinea worm surveillance effort in Chad varied both spatially and temporally during the study period, which may have complicated our analysis. We limited the effect of temporal variation in surveillance in our presence-absence analysis by analyzing data aggregated across multiple surveillance years. We also controlled for spatial variation by including multiple measures of surveillance effort in our models. Importantly, these surveillance covariates accounted for a substantial amount of variation in parasite presence and hotspot identity, suggesting that the remaining variation between villages may be largely independent of differences in surveillance intensity. Despite these limitations, the results of our study align with the general biological understanding of Guinea worm and contribute new information on the correlates of transmission of this parasite.

Current efforts to eradicate Guinea worm disease as a human health concern will need to deal with the presence of a new animal host: the domestic dog. This includes efforts already underway to elucidate new pathways involved in the parasite life cycle (e.g. [4,8,9,35]) and to clarify the modes of indirect transmission between dogs and humans (e.g. [6]). In this study, we approach the problem from a different, but synergistic, perspective. We used prior knowledge of the natural history and epidemiology of the system to generate a set of variables that might influence transmission and asked which are most important for predicting the detection of D. medinensis infection in villages across Chad. Our results support the epidemiological importance of demographic and geographic factors such as human fishing and elevation, at the local scale in northern villages, while emphasizing the importance of climatic correlates in southern villages. In aggregate, our study suggests that localized control measures at the village level, such as the treatment of water sources, should be targeted based on demographic and geographic factors in the north and geographic and climatic risk factors in the south. In contrast, regional control measures, like public information campaigns, should be targeted based on known geographic risk areas. In particular, future work should aim to validate our models with additional surveillance and experimental data to uncover the mechanisms linking Guinea worm transmission to key predictor variables identified in this study.

Supporting information

S1 Table. Model covariates.

A list of the source, resolution, and means (+/- SE), where appropriate, for all variables included in the boosted regression tree models. Climatic, surface water, and floodwater summaries were calculated from monthly minimum, maximum, mean, and total values using the protocol of [28]. Climatic variables in the main text use WorldClim 2.0 data while additional present climate analyses were trained on present climate data with a coarser spatial grain. Variables included after co-linearity reduction are indicated in bold.

(PDF)

S2 Table. Covariate clusters in primary analyses.

This table reports the identity of covariates in each of two identified co-linear clusters. The central variable in each cluster, with the highest mean correlation with all other variables in the cluster, used to represent the cluster in analyses, appears in bold; the Pearson correlation between each covariate and that central variable is also reported.

(PDF)
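
The selection rule described for S2 Table, choosing as cluster representative the covariate with the highest mean correlation with all other covariates in its cluster, can be sketched as follows. This is a minimal illustration only: the function names, the toy covariate values, and the use of the absolute value of the Pearson correlation (so that strongly negative correlations also count as co-linear) are assumptions, not details taken from the analysis code.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def central_variable(cluster):
    """Return (name, score) for the covariate with the highest mean
    absolute correlation with every other covariate in the cluster.

    cluster : dict mapping covariate name -> list of values per village
    """
    best_name, best_score = None, -1.0
    for name, values in cluster.items():
        others = [
            abs(pearson(values, other_values))
            for other_name, other_values in cluster.items()
            if other_name != name
        ]
        score = sum(others) / len(others)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Hypothetical values for five villages (not data from the study).
cluster = {
    "annual_precip": [400, 600, 800, 1000, 1200],
    "wettest_quarter_precip": [250, 340, 470, 560, 700],
    "latitude": [13.0, 11.8, 10.9, 9.7, 8.9],
}
name, score = central_variable(cluster)
```

The returned covariate would then stand in for the whole cluster in the model, and the per-covariate correlations with it are what a table of this kind reports.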

S3 Table. Interaction strength in village-level model.

This table reports the interaction strength of top ranked pair-wise interactions in the boosted regression tree model for infection presence at the village scale.

(PDF)

S4 Table. Interaction strength in hotspot-level model.

This table reports the interaction strength of top ranked pair-wise interactions in the boosted regression tree model for hotspot identity.

(PDF)

S5 Table. Covariate clusters in present climate analyses.

This table reports the identity of covariates in each of two identified co-linear clusters. The central variable in each cluster, with the highest mean correlation with all other variables in the cluster, used to represent the cluster in analyses, appears in bold; the Pearson correlation between each covariate and that central variable is also reported.

(PDF)

S1 Fig. Northern partial dependence plots.

This figure depicts partial dependence plots showing the effect of (a) identity as a fishing village, (b) dog population, (c) number of healthcare supervisor visits, and (d) standard deviation in elevation on probability of parasite presence in northern villages. Histograms represent the distribution of values for these covariates amongst all training villages.

(PDF)

S2 Fig. Southern partial dependence plots.

Partial dependence plots showing the effect of (a) Cluster 1 (represented by annual precipitation [Bioclim12]), (b) number of healthcare supervisor visits, (c) dog population, and (d) temperature of the driest quarter on probability of parasite presence in southern villages. Histograms represent the distribution of values for these covariates amongst all training villages.

(PDF)

S3 Fig. Hotspot relative importance estimates in northern and southern villages.

This figure depicts relative importance of covariates in predicting hotspot identity in (a) northern and (b) southern villages.

(PDF)

S4 Fig. Village-level strongest interaction partial dependence plot.

This figure depicts the strongest pair-wise interaction in the boosted regression tree model for parasite presence, between remotely sensed human population (x-axis) and fishing village identity. The dotted line represents villages designated as fishing villages and the solid line those not designated as fishing villages.

(PDF)

S5 Fig. Hotspot-level strongest interaction partial dependence plot.

This figure depicts the strongest pair-wise interaction in the boosted regression tree model for village hotspot identity, between cluster 1, represented by annual precipitation (Bioclim 12), and mean temperature of the coldest quarter (Bioclim 11). In this three-dimensional plot the vertical z-axis represents probability of hotspot identity.

(PDF)

S6 Fig. Present climate relative importance estimates.

This figure depicts relative importance of covariates in predicting (a) D. medinensis presence and (b) hotspot identity. Unlike the main analysis, here models are trained on present climate data with a coarser spatial grain than WorldClim data.

(PDF)

S7 Fig. Present climate presence-absence partial dependence plot.

This figure depicts partial dependence plots showing the effect of (a) dog population, (b) number of healthcare supervisor visits, (c) fishing village identity, and (d) standard deviation in elevation on probability of parasite presence in all villages when present climate estimates are used rather than WorldClim variables. Histograms represent the distribution of values for these covariates amongst all training villages.

(PDF)

S8 Fig. Present climate hotspot partial dependence plot.

Partial dependence plots showing the effect of (a) number of healthcare supervisor visits, (b) gridded population density, (c) precipitation of the warmest quarter, and (d) standard deviation in elevation on probability of a village being part of a spatial infection hotspot. Histograms represent the distribution of values for these covariates amongst all training villages.

(PDF)

S1 Code. Sample analysis and visualization code.

This code displays data-sourcing and processing and conducts analyses and visualizations for this paper. Code is an .Rmd file for use with the knitr and rmarkdown packages in the RStudio software.

(RMD)

S1 Data. Training village presence-absence data.

Dataset of villages under surveillance. Village data include the five-year case count, presence over the five years, and all covariates detailed in S1 Table. Data are split between S15 and S16 to include the training/testing split for BRT modeling.

(CSV)

S2 Data. Testing Village presence-absence data.

Dataset of villages under surveillance. Village data include the five-year case count, presence over the five years, and all covariates detailed in S1 Table. Data are split between S15 and S16 to include the training/testing split for BRT modeling.

(CSV)

S3 Data. Training village hotspot data.

Dataset of villages under surveillance which were able to be scored as within or outside of a spatial hotspot. These data are included independently of S16 and S17 to allow replication of hotspot analyses both with and without employing the SatScan software. Village data include the five-year case count, presence over the five years, presence in a hotspot, and all covariates detailed in S1 Table. Data are split between S17 and S18 to include the training/testing split for BRT modeling.

(CSV)

S4 Data. Testing village hotspot data.

Dataset of villages under surveillance which were able to be scored as within or outside of a spatial hotspot. These data are included independently of S16 and S17 to allow replication of hotspot analyses both with and without employing the SatScan software. Village data include the five-year case count, presence over the five years, presence in a hotspot, and all covariates detailed in S1 Table. Data are split between S17 and S18 to include the training/testing split for BRT modeling.

(CSV)

Acknowledgments

We thank the Chadian Ministry of Health as well as all Technical Advisors for collecting and providing surveillance data. We would also like to thank Mark Eberhard, Hubert Zirimwabagabo, Elisabeth Chop, Karmen Unterwegner, and The Carter Center for insight into ongoing eradication efforts. http://www.cartercenter.org/donate/corporate-government-foundation-partners/index.html Finally, we thank Sarah Guagliardo and Ashton Griffin for consultation on data management and analysis.

Data Availability

Data and code required to reproduce analyses are included in the Supporting Information files.

Funding Statement

This study was financially supported by The Carter Center (https://www.cartercenter.org) and members of this organization were involved in the study. Ernesto Ruiz-Tiben from The Carter Center helped to conceive the study to serve the needs of the Guinea Worm Eradication Program. The Carter Center curated data which were collected by the Guinea Worm Eradication Program in coordination with the Chadian Ministry of Health. Adam Weiss and Ernesto Ruiz-Tiben from The Carter Center provided substantive comments on the manuscript prior to submission. No representatives from The Carter Center were involved in the design of the statistical methodology, statistical analysis, or preparation of the original manuscript draft. RLR was supported by a National Science Foundation Graduate Research Fellowship (https://www.nsfgrfp.org/) [grant number 2017203341]. Additional support for CAC was received from the ARCS Atlanta chapter (https://atlanta.arcsfoundation.org/). NSF and ARCS had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Muller R. The possible mode of action of some chemotherapeutic agents in guinea worm disease. Trans R Soc Trop Med Hyg. 1971;65(6):843–4. 10.1016/0035-9203(71)90107-6
  • 2. Molyneux D, Sankara DP. Guinea worm eradication: Progress and challenges—should we beware of the dog? PLoS Negl Trop Dis. 2017;11(4):e0005495 10.1371/journal.pntd.0005495
  • 3. Eberhard ML, Ruiz-Tiben E, Hopkins DR. Dogs and Guinea worm eradication. Lancet Infect Dis. 2016;16:1225–6. 10.1016/S1473-3099(16)30380-2
  • 4. Eberhard ML, Ruiz-Tiben E, Hopkins DR, Farrell C, Toe F, Weiss A, et al. The peculiar epidemiology of dracunculiasis in Chad. Am J Trop Med Hyg. 2014;90(1):61–70. 10.4269/ajtmh.13-0554
  • 5. Williams BM, Cleveland CA, Verocai GG, Swanepoel L, Niedringhaus KD, Paras KL, et al. Dracunculus infections in domestic dogs and cats in North America; an under-recognized parasite? Vet Parasitol Reg Stud Rep. 2018;13:148–155.
  • 6. Thiele EA, Eberhard ML, Cotton JA, Durrant C, Berg J, Hamm K, et al. Population genetic analysis of Chadian Guinea worms reveals that human and non-human hosts share common parasite populations. PLoS Negl Trop Dis. 2018;12(10):e0006747 10.1371/journal.pntd.0006747
  • 7. Biswas G, Sankara DP, Agua-Agum J, Maiga A. Dracunculiasis (guinea worm disease): eradication without a drug or a vaccine. Philos Trans R Soc B Biol Sci. 2013;368(1623):20120146.
  • 8. Cleveland CA, Eberhard ML, Thompson AT, Smith SJ, Zirimwabagabo H, Bringolf R, et al. Possible role of fish as transport hosts for Dracunculus spp. larvae. Emerg Infect Dis. 2017;23(9):1590 10.3201/eid2309.161931
  • 9. Cleveland CA, Eberhard ML, Thompson AT, Garrett KB, Swanepoel L, Zirimwabagabo H, et al. A search for tiny dragons (Dracunculus medinensis third-stage larvae) in aquatic animals in Chad, Africa. Sci Rep. 2019;9(1):375 10.1038/s41598-018-37567-7
  • 10. Hopkins DR. Progress Toward Global Eradication of Dracunculiasis—January 2017–June 2018. MMWR Morb Mortal Wkly Rep. 2018;67.
  • 11. Randall CJ, van Woesik R. Contemporary white-band disease in Caribbean corals driven by climate change. Nat Clim Change. 2015;5(4):375.
  • 12. Mansiaux Y, Carrat F. Detection of independent associations in a large epidemiologic dataset: a comparison of random forests, boosted regression trees, conventional and penalized logistic regression for identifying independent factors associated with H1N1pdm influenza infections. BMC Med Res Methodol. 2014;14(1):99.
  • 13. Zhang W, Du Z, Zhang D, Yu S, Hao Y. Boosted regression tree model-based assessment of the impacts of meteorological drivers of hand, foot and mouth disease in Guangdong, China. Sci Total Environ. 2016;553:366–371. 10.1016/j.scitotenv.2016.02.023
  • 14. Clements AC, Lwambo NJ, Blair L, Nyandindi U, Kaatano G, Kinung’hi S, et al. Bayesian spatial analysis and disease mapping: tools to enhance planning and implementation of a schistosomiasis control programme in Tanzania. Trop Med Int Health. 2006;11(4):490–503. 10.1111/j.1365-3156.2006.01594.x
  • 15. Woodhall DM, Wiegand RE, Wellman M, Matey E, Abudho B, Karanja DMS, et al. Use of Geospatial Modeling to Predict Schistosoma mansoni Prevalence in Nyanza Province, Kenya. PLOS ONE. 2013. August 14;8(8):e71635 10.1371/journal.pone.0071635
  • 16. Dobson AP. Models for multi-species parasite-host communities. In: Parasite communities: patterns and processes. Springer; 1990. p. 261–288.
  • 17. Arneberg P, Skorping A, Grenfell B, Read AF. Host densities as determinants of abundance in parasite communities. Proc R Soc Lond B Biol Sci. 1998;265(1403):1283–1289.
  • 18. Kramer AM, Pulliam JT, Alexander LW, Park AW, Rohani P, Drake JM. Spatial spread of the West Africa Ebola epidemic. R Soc Open Sci. 2016;3(8):160294 10.1098/rsos.160294
  • 19. Fick SE, Hijmans RJ. WorldClim 2: new 1-km spatial resolution climate surfaces for global land areas. Int J Climatol. 2017;37(12):4302–4315.
  • 20. Gelaro R, McCarty W, Suárez MJ, Todling R, Molod A, Takacs L, et al. The modern-era retrospective analysis for research and applications, version 2 (MERRA-2). J Clim. 2017;30(14):5419–5454.
  • 21. Rienecker MM, Suarez MJ, Gelaro R, Todling R, Bacmeister J, Liu E, et al. MERRA: NASA’s modern-era retrospective analysis for research and applications. J Clim. 2011;24(14):3624–3648.
  • 22. Novella NS, Thiaw WM. African rainfall climatology version 2 for famine early warning systems. J Appl Meteorol Climatol. 2013;52(3):588–606.
  • 23. Kleinman K. rsatscan: Tools, classes, and methods for interfacing with SaTScan stand-alone software. CRAN R-Proj Orgpackage Rsatscan R Package Version 03. 2015;9200.
  • 24. Kulldorff M. SaTScan user guide for version 9.4. 2017.
  • 25. Zeimes CB, Olsson GE, Ahlm C, Vanwambeke SO. Modelling zoonotic diseases in humans: comparison of methods for hantavirus in Sweden. Int J Health Geogr. 2012;11(1):39.
  • 26. Hay Simon I., Battle Katherine E., Pigott David M., Smith David L., Moyes Catherine L., Bhatt Samir, et al. Global mapping of infectious disease. Philos Trans R Soc B Biol Sci. 2013. March 19;368(1614):20120250.
  • 27. Bhatt S, Gething PW, Brady OJ, Messina JP, Farlow AW, Moyes CL, et al. The global distribution and burden of dengue. Nature. 2013. April;496(7446):504–7. 10.1038/nature12060
  • 28. Dormann CF, Elith J, Bacher S, Buchmann C, Carl G, Carré G, et al. Collinearity: a review of methods to deal with it and a simulation study evaluating their performance. Ecography. 2013. January 1;36(1):27–46.
  • 29. Kaufman L, Rousseeuw PJ. Finding groups in data: an introduction to cluster analysis. Vol. 344 John Wiley & Sons; 2009.
  • 30. Hijmans RJ, Phillips S, Leathwick J, Elith J, Hijmans MRJ. Package ‘dismo.’ Circles. 2017;9(1):1–68.
  • 31. Phillips SJ, Elith J. POC plots: calibrating species distribution models with presence-only data. Ecology. 2010;91(8):2476–2484. 10.1890/09-0760.1
  • 32. Breiman L. Using iterated bagging to debias regressions. Mach Learn. 2001;45(3):261–277.
  • 33. Elith J, Leathwick JR, Hastie T. A working guide to boosted regression trees. J Anim Ecol. 2008;77(4):802–813. 10.1111/j.1365-2656.2008.01390.x
  • 34. Dallas TA, Han BA, Nunn CL, Park AW, Stephens PR, Drake JM. Host traits associated with species roles in parasite sharing networks. Oikos. 2018; 10.1111/oik.05802
  • 35. Eberhard ML, Yabsley MJ, Zirimwabagabo H, Bishop H, Cleveland CA, Maerz JC, et al. Possible role of fish and frogs as paratenic hosts of Dracunculus medinensis, Chad. Emerg Infect Dis. 2016;22(8):1428 10.3201/eid2208.160043
  • 36. Ross AG, Yuesheng L, Sleigh AS, Yi L, Williams GM, Wu WZ, et al. Epidemiologic features of Schistosoma japonicum among fishermen and other occupational groups in the Dongting Lake region (Hunan Province) of China. Am J Trop Med Hyg. 1997;57(3):302–308. 10.4269/ajtmh.1997.57.302
  • 37. Dalton PR, Pole D. Water-contact patterns in relation to Schistosoma haematobium infection. Bull World Health Organ. 1978;56(3):417
  • 38. Sama MT, Oyono E, Ratard RC. High risk behaviours and schistosomiasis infection in Kumba, South-West Province, Cameroon. Int J Environ Res Public Health. 2007;4(2):101–105. 10.3390/ijerph2007040003
  • 39. Schumann G, Matgen P, Cutler MEJ, Black A, Hoffmann L, Pfister L. Comparison of remotely sensed water stages from LiDAR, topographic contours and SRTM. ISPRS J Photogramm Remote Sens. 2008. May 1;63(3):283–96.
  • 40. Yan K, Di Baldassarre G, Solomatine DP, Schumann GJ-P. A review of low-cost space-borne data for flood modelling: topography, flood extent and water level. Hydrol Process. 2015. July 15;29(15):3368–87.
  • 41. Winsemius H, Donchyts G, Eilander D, Chen J, Leskens A, Coughlan E, et al. Urban topography for flood modeling by fusion of OpenStreetMap, SRTM and local knowledge. In 2016. p. EPSC2016-14027.
  • 42. Kabaria CW, Molteni F, Mandike R, Chacky F, Noor AM, Snow RW, et al. Mapping intra-urban malaria risk using high resolution satellite imagery: a case study of Dar es Salaam. Int J Health Geogr. 2016. July 30;15.
  • 43. Epp TY, Waldner C, Berke O. Predictive risk mapping of West Nile virus (WNV) infection in Saskatchewan horses. Can J Vet Res. 2011. July;75(3):161–70.
  • 44. Medlock JM, Hansford KM, Bormane A, Derdakova M, Estrada-Peña A, George J-C, et al. Driving forces for changes in geographical distribution of Ixodes ricinus ticks in Europe. Parasit Vectors. 2013;6(1):1.
  • 45. Rudge JW, Webster JP, Lu D-B, Wang T-P, Fang G-R, Basáñez M-G. Identifying host species driving transmission of schistosomiasis japonica, a multihost parasite system, in China. Proc Natl Acad Sci. 2013;110(28):11457–11462. 10.1073/pnas.1221509110
  • 46. Real LA, Biek R. Spatial dynamics and genetics of infectious diseases on heterogeneous landscapes. J R Soc Interface. 2007;4(16):935–948. 10.1098/rsif.2007.1041
  • 47. Mathias DK, Ochomo E, Atieli F, Ombok M, Bayoh MN, Olang G, et al. Spatial and temporal variation in the kdr allele L1014S in Anopheles gambiae ss and phenotypic variability in susceptibility to insecticides in Western Kenya. Malar J. 2011;10(1):10.
  • 48. Deming R, Manrique-Saide P, Barreiro AM, Cardeña EUK, Che-Mendoza A, Jones B, et al. Spatial variation of insecticide resistance in the dengue vector Aedes aegypti presents unique vector control challenges. Parasit Vectors. 2016;9(1):67.
  • 49. Maher SP, Kramer AM, Pulliam JT, Zokan MA, Bowden SE, Barton HD, et al. Spread of white-nose syndrome on a network regulated by geography and climate. Nat Commun. 2012;3:1306 10.1038/ncomms2301
  • 50. Aagaard-Hansen J, Nombela N, Alvar J. Population movement: a key factor in the epidemiology of neglected tropical diseases. Trop Med Int Health. 2010;15(11):1281–1288. 10.1111/j.1365-3156.2010.02629.x
PLoS Negl Trop Dis. doi: 10.1371/journal.pntd.0008620.r001

Decision Letter 0

Jeremiah M Ngondi, Banchob Sripa

22 Sep 2019

Dear Mr. Richards:

Thank you very much for submitting your manuscript "Identifying drivers of Guinea worm (Dracunculus medinensis) infection in domestic dogs" (#PNTD-D-19-01263) for review by PLOS Neglected Tropical Diseases. Your manuscript was fully evaluated at the editorial level and by independent peer reviewers. The reviewers appreciated the attention to an important problem, but raised some substantial concerns about the manuscript as it currently stands. These issues must be addressed before we would be willing to consider a revised version of your study. We cannot, of course, promise publication at that time.

We therefore ask you to modify the manuscript according to the review recommendations before we can consider your manuscript for acceptance. Your revisions should address the specific points made by each reviewer.

When you are ready to resubmit, please be prepared to upload the following:

(1) A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript.

(2) Two versions of the manuscript: one with either highlights or tracked changes denoting where the text has been changed (uploaded as a "Revised Article with Changes Highlighted" file); the other a clean version (uploaded as the article file).

(3) If available, a striking still image (a new image if one is available or an existing one from within your manuscript). If your manuscript is accepted for publication, this image may be featured on our website. Images should ideally be high resolution, eye-catching, single panel images; where one is available, please use 'add file' at the time of resubmission and select 'striking image' as the file type.

Please provide a short caption, including credits, uploaded as a separate "Other" file. If your image is from someone other than yourself, please ensure that the artist has read and agreed to the terms and conditions of the Creative Commons Attribution License at http://journals.plos.org/plosntds/s/content-license (NOTE: we cannot publish copyrighted images).

(4) If applicable, we encourage you to add a list of accession numbers/ID numbers for genes and proteins mentioned in the text (these should be listed as a paragraph at the end of the manuscript). You can supply accession numbers for any database, so long as the database is publicly accessible and stable. Examples include LocusLink and SwissProt.

(5) To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see http://journals.plos.org/plosntds/s/submission-guidelines#loc-methods

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/ PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

We hope to receive your revised manuscript by Nov 21 2019 11:59PM. If you anticipate any delay in its return, we ask that you let us know the expected resubmission date by replying to this email.

To submit a revision, go to https://www.editorialmanager.com/pntd/ and log in as an Author. You will see a menu item called Submission Needing Revision. You will find your submission record there.

Sincerely,

Jeremiah M. Ngondi, MB.ChB, MPhil, MFPH, Ph.D

Associate Editor

PLOS Neglected Tropical Diseases

Banchob Sripa

Deputy Editor

PLOS Neglected Tropical Diseases

***********************

Reviewer's Responses to Questions

Key Review Criteria Required for Acceptance?

As you describe the new analyses required for acceptance, please consider the following:

Methods

-Are the objectives of the study clearly articulated with a clear testable hypothesis stated?

-Is the study design appropriate to address the stated objectives?

-Is the population clearly described and appropriate for the hypothesis being tested?

-Is the sample size sufficient to ensure adequate power to address the hypothesis being tested?

-Were correct statistical analyses used to support conclusions?

-Are there concerns about ethical or regulatory requirements being met?

Reviewer #1: L132 ‘We obtained data on D. medinensis incidence in domestic dogs…’

Should the term ‘incidence’ be replaced with ‘cases’?

If incidence is the correct term, it would be helpful to know more information on how it was calculated given incidence requires knowledge on the population at risk e.g. was the population defined as dogs from the villages studied in this investigation? Given dogs less than 10-14 months old can’t be GW positive, does the population include puppies? Also, is a dog that has multiple worms counted twice?

L141-142 ‘…but the number of infections per village varied widely (e.g. 2016 range: 0-71), and generally increased over time (Supplemental Fig 1).’

Given dog infections were only realised in ~2012, is this increase due to the increased surveillance effort for dog cases over time?

What supplementary figure is this referring to? S1 is a table and I can’t work out which figure it would be, perhaps it wasn’t uploaded?

L147-150 ‘Finally, given recent work on the genetic structure of Guinea worm in Chad identifying Northern and Southern parasite sub populations…’

This is a neat idea. However, I can’t find any mention in the methods as to how this division was made? Were clear criteria used to determine this variable for villages? Some clarity on how this was done would be useful.

L151-152 ‘…we collated data on 32 environmental and demographic variables…’

Has there been any assessment for the models being overfitted or for collinearity?

I would want to see a correlation matrix for the predictor variables, or at least some report of the absence/presence of collinearity. I suspect that there is strong multicollinearity. If so, caution should be taken as even BRT models are not safe from the effects of strong collinearity (see Dormann et al. "Collinearity: a review of methods to deal with it and a simulation study evaluating their performance." Ecography 36.1 (2013): 27-46).
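A minimal sketch of the kind of check requested here, assuming predictors are held as equal-length lists keyed by name. The variable names, the values, and the 0.7 cut-off are illustrative assumptions, not the authors' data or threshold. It computes pairwise Pearson correlations and lists the pairs whose absolute correlation exceeds the cut-off.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def collinear_pairs(columns, threshold=0.7):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    flagged = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) > threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged


# Hypothetical predictor values for six villages (illustrative only).
villages = {
    "annual_precip": [400, 520, 610, 700, 830, 900],
    "wettest_quarter_precip": [250, 330, 380, 450, 520, 580],
    "dog_population": [35, 12, 60, 8, 40, 22],
}

flagged = collinear_pairs(villages)
for name_a, name_b, r in flagged:
    print(name_a, name_b, r)
```

In this toy data only the two precipitation variables are flagged, which is the pattern one would expect among bioclimatic covariates derived from the same underlying rasters.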

L 156-157 ‘…a single summary value, for the 6-year study period, was included for each village.’

What is the summary value (max, mean, median)?

L165-166 ‘…lower spatial resolution data collected via remote sensing over the same time period as the parasite observations.’

What is the source of this remote sensing data?

L174 ‘Spatial hotspot analysis…’

How was the location of the villages collected, via the GWEP staff?

More importantly, did the spatial analysis require a projection and, if so, which projection was used? In the R code in the supplementary material, EPSG:26978 is used. However, this projection is for Kansas South and not Chad. This could lead to distortions that could impact any spatial analysis.
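One way to choose a locally appropriate projected coordinate system is to look up the WGS 84 / UTM zone that contains the study area. The sketch below is an assumption about what a suitable alternative could look like, not a statement of what the authors used or should use. The helper name `utm_epsg` is hypothetical, the N'Djamena coordinates are approximate, and the function ignores the special-case zones around Norway and Svalbard.

```python
def utm_epsg(lon, lat):
    """EPSG code of the WGS 84 / UTM zone containing a point (decimal degrees).

    Northern-hemisphere zones are EPSG:326xx and southern-hemisphere zones are
    EPSG:327xx, where xx is the zone number (1 to 60).
    """
    if not (-180 <= lon < 180 and -80 <= lat <= 84):
        raise ValueError("coordinates outside the range covered by UTM")
    zone = int((lon + 180) // 6) + 1
    base = 32600 if lat >= 0 else 32700
    return base + zone


# Approximate coordinates of N'Djamena, in western Chad.
print(utm_epsg(15.05, 12.13))

# A point further east in Chad falls in the neighbouring zone.
print(utm_epsg(20.0, 10.0))
```

Because Chad spans more than one UTM zone, a study covering the whole country would still need to pick a single zone or another projection suited to the region before computing distances.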

L187-191 ‘…(i) predicting whether D. medinensis was detected in a village over the 6-year study period and (ii) predicting whether…’

This sentence is very helpful as it clearly states the purpose of the BRTs and mentions an important consideration of villages in hotspots. In fact, this whole paragraph for BRT methods is very well written.

My only criticism is that AUC is not mentioned here and yet it is a key part of the reporting in the results (L224).
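For readers unfamiliar with the metric, AUC can be read as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. The sketch below computes it directly from that definition. The labels and scores are hypothetical and the function is an illustration, not the authors' cross-validation code.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical villages: 1 = infection detected, 0 = not detected.
detected = [1, 1, 1, 0, 0, 0, 0]
predicted = [0.9, 0.7, 0.4, 0.5, 0.3, 0.2, 0.1]

print(round(auc(detected, predicted), 3))
```

A value of 0.5 corresponds to a model that ranks villages no better than chance, and 1.0 to a model that ranks every positive village above every negative one.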

L191-194 ‘…we also (i) allowed interactions between longitude/latitude and other covariates…’

(1) Were interactions allowed between collinear predictors?

(2) If surveillance effort changed over the course of the observation period, could this be accounted for in the BRT models?

(3) There is also the effect of control measures that could influence the status of a village. Again, can the BRT models account for a measure of variation in control strategies used in villages e.g. abate might be more aggressively/frequently applied in some areas than others.

I realise that 2 & 3 might be hard to include in models, but they could be very important. It would be good to at least acknowledge them as caveats in the discussion.

Reviewer #2: Yes the study desing is appropriate to unearth factors predicting endemicity of Gunea worm infection. The population of villages and corresponding number of dogs was also appropriate. Appropriate discussion is presented as to the limitations that may arise due to sample size. The statistical tests performed on the data was also satisfactory as was the conclusions drawn from them. No ethical concenrs are required for these studies.

Reviewer #3: See below. There are issues with the confounding effects of changes in disease surveillance that are likely happening at the same time as the animal epidemic is increasing. It will be hard, if not impossible, to disentangle these. There are other smaller issues, e.g. how fishing villages are classified.

--------------------

Results

-Does the analysis presented match the analysis plan?

-Are the results clearly and completely presented?

-Are the figures (Tables, Images) of sufficient quality for clarity?

Reviewer #1: L209-217 ‘3.1 Identifying spatial hotspots of infection…’

I find this entire section very confusing. Some of the numbers do not add up (see my comments below) or, if they do, require some explaining.

In addition, I think some small adjustments could help to better guide the reader through the results. For example, rather than reporting the number of ‘GW positive villages / total villages’ in the methods (L140), it would flow better if put in the results. I would then want to know how many villages were identified in hotspots and, within this, how many were GW positive and GW negative villages. This would help to identify how many GW positive villages have not been identified in hotspots and how many GW negative villages are included in hotspots. It would then be intuitive to provide the same break down for the north and south area.

If possible I would also like to see the sample size/means ± SE for the data on villages that is used in the BRT models. My suggestion would be to either provide a new table or to add this data to table S1 (with a column for all villages, a second for northern villages and a third for southern villages). Maybe just for the WorldClim data?

L211-212 ‘Overall, 44.2% (703 out of 1592) of villages were identified as belonging to a spatial hotspot (Fig 1)…’

Weren’t there 2125 villages (reported in the methods L135)? Where has 1592 come from?

L212-214 ‘… 53.9% (637 out of 1182) of villages belonged to a hotspot, with fewer clusters in the southern/eastern reaches where only 16.1% (66 out of 410) of villages belonged to a hotspot.’

The totals in the brackets sum to the total mentioned in L211 (1592), but again I’m not sure where this total has come from. Am I right in thinking that 1182 and 410 are the total numbers of villages studied in the north and south respectively?
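The arithmetic behind these queries can be checked with a short sketch that uses only the counts quoted above. The "north" and "south" labels follow the reviewer's reading of the two subtotals and are an assumption.

```python
# Counts quoted from the manuscript: (villages in a hotspot, villages considered).
reported = {
    "all": (703, 1592),
    "north": (637, 1182),
    "south": (66, 410),
}

# Percentage of villages in a hotspot, rounded to one decimal place.
percent = {
    region: round(100 * in_hotspot / total, 1)
    for region, (in_hotspot, total) in reported.items()
}
print(percent)

# Do the regional subtotals add up to the overall figures?
hotspot_sum = reported["north"][0] + reported["south"][0]
village_sum = reported["north"][1] + reported["south"][1]
print(hotspot_sum == reported["all"][0], village_sum == reported["all"][1])

# Villages reported in the methods (2125) that are missing from the results total.
unaccounted = 2125 - reported["all"][1]
print(unaccounted)
```

The percentages and subtotals are internally consistent, so the open question is only why 533 of the 2125 villages do not appear in the hotspot analysis.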

L224-225 ‘The BRT model fit data on D. medinensis presence across all villages with a mean AUC of 0.895 in cross-validation and an evaluation AUC of 0.897.’

As mentioned in a previous comment, it would be helpful to introduce the use of AUC in the methods.

L232-235 ‘Fishing village identity was only important in northern villages (as suggested in Fig 3c)…’

Mentioning Fig 3 here is misleading as it could be interpreted as this figure showing data for the analysis on just the northern villages (when it is for the analysis on all villages).

L246-249 ‘…although there were many important interactions, the largest of which was between latitude and mean diurnal temperature range…’

It isn’t clear to me what the interaction is between latitude and mean diurnal temperature? I don’t think this is mentioned in the discussion and so it isn’t clear why this is important? Also, why is no comment made on these other important interactions? There are some further interacting variables with longitude, do they not hint as to what is going on in the southern villages?

L254-257 ‘Precipitation of the coldest quarter emerged as important in predicting dog infection at the village scale (10.2% Importance). This change…’

This point would be clearer if the actual change was mentioned.

Figure 1

This might produce an ugly figure, but if possible it would be great if some/all of the following could be identified in the map:

(1) The 14 spatial hotspots. It would help the reader gauge the appropriateness of the spatial scan and provide visual information on the spatial clusters (i.e. their distribution and variation in size).

(2) The north/south divide

(3) GW positive & negative villages

Figure 2

Nice plots, just need to add the panel labels (A, B, C, D).

Figure 3

Need to add the panel labels. I would also suggest changing the y axis for the ’probability of infection’ so that they are the same in each plot.

Are the histograms for the frequency of GW positive villages or for all villages? This should be clearer in the figure description. I assume (given the numbers presented) that this is for all villages. If true, would it not be more informative in the context of these plots to show the frequency of GW positive and negative villages? Currently it is hard to see if the model is a good fit to the data.

S1 Table

Make sure that the names of variables are consistent with the plots for relative importance e.g. in the table there is 'Distance to nearest permanent water', but this does not appear in the plots for relative importance. I assume that 'RiverDist' is the equivalent variable?

Reviewer #2: Yes, the analysis matches the plan and the results are clearly presented. I found the figures to be reasonably presented. Figures 3 and 4 could use slightly clearer labels on the y axes.

Reviewer #3: See below. Some of the results are interesting and informative, e.g. the interaction between fishing status and longitude and the variation in elevation. I have reservations about the other results.

--------------------

Conclusions

-Are the conclusions supported by the data presented?

-Are the limitations of analysis clearly described?

-Do the authors discuss how these data can be helpful to advance our understanding of the topic under study?

-Is public health relevance addressed?

Reviewer #1: (No Response)

Reviewer #2: Yes to all these points.

Reviewer #3: See below. The conclusions in relation to targeting and control are poorly supported by the analyses.

--------------------

Editorial and Data Presentation Modifications?

Use this section for editorial suggestions as well as relatively minor modifications of existing data that would enhance clarity. If the only modifications needed are minor and/or editorial, you may wish to recommend “Minor Revision” or “Accept”.

Reviewer #1: (No Response)

Reviewer #2: (No Response)

Reviewer #3: L94 – in relation to the classical (typical) life cycle, presumably, the ingestion of copepods in water while drinking applies to humans and to other mammalian hosts, as much as the non-classical cycle mentioned in the next line.

L97 – there is a “potentially” missing here, to read “potentially allowing them” since it is not clear that they do act in this way to humans or other hosts.

L106 – yes the number of dog infections recorded has increased, though it is not clear that this is due to a genuine increase in incidence or to an improvement in recording.

L135-136. Is surveillance really a ‘daily search’ across >2000 villages? Or rather an elicitation of case reporting, via rewards schemes and a reporting network? This description suggests something much more labour intensive, comprehensive and systematic than the latter.

L142 – Figure S1 and Figure S2 were not included in the review manuscript and were not available for download.

--------------------

Summary and General Comments

Use this section to provide overall comments, discuss strengths/weaknesses of the study, novelty, significance, general execution and scholarship. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. If requesting major revision, please articulate the new experiments that are needed.

Reviewer #1: The authors present a village/regional level analysis of the environmental and anthropogenic predictors of Guinea worm infection in dogs in Chad. The most striking result suggests there are different drivers of infection in northern and southern villages. In the north, ‘fishing’ villages have a higher probability of infection, while environmental variables seem more predictive of cases in the south. These results are important for informing control efforts of an eradication program that has entered its final stages.

Overall, this is a well written manuscript on a very topical and interesting subject. However, I think there are several areas for improvement/clarity that need to be made before the manuscript can be accepted for publication:

(1) The analytical methods are appropriate and articulated clearly, but there is no mention of any checks for model overfitting or collinearity of the predictors. Collinearity could cause problems in the interpretation of the BRT model output. If strong collinearity is discovered and the models need to be altered, the results could be substantially changed given that BRT models are sensitive to even small perturbations of the data input.

(2) There needs to be some presentation of the sample sizes for the variables being used in the BRT models. This would help the reader to get a better understanding of the data being analysed.

(3) Some of the results do not align with what is stated in the methods and need either to be corrected or to be explained. There is also scope to provide more summary information that would be useful to the reader.

Reviewer #2: (No Response)

Reviewer #3: The eradication of Guinea worm has progressed well and animals are now a barrier to further progress. The topic of this paper is therefore of considerable interest and is ideally suited to this journal. Understanding the biological drivers of Guinea worm infection/incidence in animals and dogs in particular would be valuable to future control efforts. However, I have several reservations about the specifics of this analysis, which, in my view, limit the utility and relevance of this contribution.

In their summary, the authors offer the results as guidance for targeting control and surveillance, and point towards elevated risks associated with whether villages are engaged in fishing, larger dog populations, longitude, variation in elevation. They also analyse village location in hotspots. Of the effects reported, variation in elevation and, possibly fishing activity (if this is independent of survey methods and variation in surveillance intensity - see below), are potentially the most biologically informative. However, the other effects might alternatively be framed as recommendations to look for dog infections where there are lots of dogs and, if surveillance effort follows infections, to look for more infections where surveillance has been focused previously. Similarly, the effects of longitude are hard to translate into guidance, other than in relation to fishing, and presence in a hotspot appears compromised by the overwhelming effect of apparently categorising the NW Chari into a single hotspot. I detail concerns about each of these areas in the following points.

There is no mention of variation in surveillance, which presumably has varied spatially and temporally over the duration of the study and since the re-emergence of Guinea worm in Chad. It is important to quantify how surveillance might have changed and to take this into account. A glance at the detail of Guinea worm surveillance activities in the “Guinea worm wrap-up” suggests that surveillance has been stepped up across the board in Chad: “Some of the increase in infected dogs reported probably resulted from expansion of the number of villages under active surveillance (VAS) from 1,895 at the end of 2018 to 2,138 as of May 2019.” Villages have also apparently been classified as Level 1 to Level 3 villages, presumably reflecting varying surveillance effort. Which villages were subject to the greatest surveillance and how did this vary over time as both the apparent epidemic and surveillance were growing? Are the authors actually demonstrating the effects of an increase, or spatial intensification, in surveillance effort? For control purposes, this variation in effort might very sensibly be focused on big villages, fishing villages, those in particular regions, or in infection hotspots (as in Levels 1-3). It is important to isolate and account for these obvious and influential logistical, practical changes in surveillance as a driver of variation in the detection of infections, as much as genuine biological risk factors. Either way, these potential effects cannot reliably be described as drivers of infection in an epidemiological sense.

The effect of dog population size on the classification of a village as an “infected village” is not at all surprising, and would be apparent even if the incidence of infection was genuinely uniform across all areas. Indeed, it would be surprising if this were not an effect. This “per village” rate of infection is not to be confused with an effect of dog population size on the “per dog” rate of infection, though the latter has not been quantified or tested. This is likely to be the unavoidable consequence of being more likely to find a dog infection when there are more dogs to look at. The authors acknowledge this at L276. However, it would clearly be a mistake to interpret this as a driver of infection and/or to target surveillance or control on this basis. The distribution of dog population sizes in Figure 3 suggests that almost all villages have small dog populations, perhaps <~100 dogs and so many small villages would still harbor dog infections if surveillance was focused upon the few, larger villages.
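The point about “per village” versus “per dog” rates can be illustrated with a small calculation. Assuming, purely for illustration, that every dog has the same independent probability of a detected infection, the chance that a village records at least one infection is 1 - (1 - p)^n for n dogs. The per-dog risk of 0.01 and the village sizes below are hypothetical.

```python
def p_village_detected(n_dogs, per_dog_risk):
    """Probability that at least one of n_dogs has a detected infection,
    assuming an identical and independent per-dog risk."""
    return 1 - (1 - per_dog_risk) ** n_dogs


# Same hypothetical per-dog risk everywhere; only village size differs.
per_dog_risk = 0.01
sizes = (5, 20, 100, 400)
by_size = {n: p_village_detected(n, per_dog_risk) for n in sizes}

for n, p in by_size.items():
    print(n, round(p, 3))
```

Under these assumptions the village-level probability rises steeply with dog population size even though the per-dog risk never changes, which is why such an effect alone says little about drivers of infection.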

Similarly, the longitude effect, which is the largest single effect across the paper, is not clearly spelled out and so the biology and targeting consequences remain unclear. According to Figure 1, the Chari River runs N-S, but the southern affected area lies to the E and the Northern Area to the W. If we look at Figure 1, this maps out villages in hotspots and those that are not, but not the hotspots themselves. There are no green dots north of about half-way up the country. So are all the orange dots in the North, i.e. all the infected villages in the north, all in one big hotspot, and the others all in very much smaller separate hotspots? The text suggests there are 14, but 14 are not apparent from the figure. The Figure is also cut off at the bottom and would benefit from lines of longitude. Presentational issues aside, the differing magnitude of these hotspots seems likely to introduce a heavy influence of this single large NW hotspot in later analyses. Thus, the analysis reported at L250-257 and in Figure 4 appears largely to distinguish the location of the “NW big hotspot”, i.e. all the villages in the NW, to be in the NW hotspot. Again, this does not appear helpful and would be misleading in targeting.

L159-162 It is not made clear how villages were classified as a “fishing village” or otherwise, or how the “majority of the population” was determined. According to the supplementary information, this was done by the Guinea worm survey field teams; it is therefore possible that this introduced a systematic bias with respect to Guinea worm incidence. This needs to be shown to be otherwise. That said, the interaction between fishing village classification and location in the NW or SE is potentially of interest.

L262 it is not elevation, but standard deviation in elevation that was influential. This is potentially an important distinction, given the variation in relief associated with surface water availability.

Incidentally, the analyses appear not to distinguish between villages in which one infection has been recorded and those in which infection is a more frequent occurrence? It appears as though the analysis distinguishes between a binary classification. This seems like a missed opportunity, given the availability of data and the very wide variation in reported cases over the study.

--------------------

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Jared K Wilson-Aggarwal

Reviewer #2: No

Reviewer #3: No

PLoS Negl Trop Dis. doi: 10.1371/journal.pntd.0008620.r003

Decision Letter 1

Jeremiah M Ngondi, Banchob Sripa

17 Feb 2020

Dear Mr. Richards,

Thank you very much for submitting your manuscript "Identifying drivers of Guinea worm (Dracunculus medinensis) infection in domestic dogs" for consideration at PLOS Neglected Tropical Diseases. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments.

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Jeremiah M. Ngondi, MB.ChB, MPhil, MFPH, Ph.D

Associate Editor

PLOS Neglected Tropical Diseases

Banchob Sripa

Deputy Editor

PLOS Neglected Tropical Diseases

***********************

Reviewer's Responses to Questions

Methods

Reviewer #1: L161 – 164 ‘…we collated data on 35 variables that might contribute to transmission….’

There are still some inconsistencies and areas for clarification in the use of language in the methods, Table S1 and Figure 2. See my specific comments for Table S1 and Figure 2 below.

An example is ‘RemotePop’ which appears in Figure 2. From the results it looks like this is a measure of the population from remote sensing data? However, there is no mention of this variable (or how it is calculated) in the methods, and it is not in Table S1.

Aren’t there more than 35 variables considered when you include the controls for surveillance effort?

L172 ‘Villages were identified by GWEP staff as “fishing” if the majority of families...’

Be more explicit, what is meant by majority, >50%?

L173 - 176 ‘To control for spatial variation in surveillance, we also included two measures of surveillance effort…’

Add a few words to explain the differences in what these control for, it is not immediately clear. Perhaps just add a line to the same effect of that in your reply to reviewer comments ‘The second variable encompasses some degree of spatio-temporal variation in surveillance as villages have no visits for years that they went unsurveilled.’

L211 – 213 ‘…we diagnosed significant co-linearity in predictors…’

Did you include the variables for sampling effort in the co-linearity analysis? It would be interesting to know whether or not measures of sampling effort are correlated with demographic parameters.

L213 – 214 ‘Co-linear clusters were reduced to the single most central variable…’

My understanding is that this method involves omitting variables identified in a cluster (except the central variable) from the model. With this in mind, why do variables in the clusters (that are not the central variable) appear in the relative importance plots and interaction tables? For example BIO9, BIO11 and remotely sensed population. Is this because you only omitted variables in a cluster if they went above a threshold e.g. r > 0.7?
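The reading described here (group collinear predictors, then keep only the most central member of each group) can be sketched as follows. This is an assumption about the general technique, not the authors' procedure or the behaviour of any particular R package. The variable names A to D, the correlations, the 0.7 threshold, and the definition of "central" as the highest mean absolute correlation with the rest of the cluster are all illustrative choices.

```python
def reduce_clusters(names, corr, threshold=0.7):
    """Group variables linked by |r| > threshold and keep one per group.

    corr maps frozenset({a, b}) to the correlation between a and b.
    Returns (clusters, kept), where kept holds one variable per cluster.
    """
    unassigned = list(names)
    clusters = []
    while unassigned:
        stack = [unassigned.pop(0)]
        cluster = []
        while stack:
            v = stack.pop()
            cluster.append(v)
            # Pull in every unassigned variable strongly correlated with v.
            linked = [u for u in unassigned
                      if abs(corr[frozenset((u, v))]) > threshold]
            for u in linked:
                unassigned.remove(u)
            stack.extend(linked)
        clusters.append(cluster)

    kept = []
    for cluster in clusters:
        if len(cluster) == 1:
            kept.append(cluster[0])
            continue

        def centrality(v, members=cluster):
            others = [u for u in members if u != v]
            return sum(abs(corr[frozenset((v, u))]) for u in others) / len(others)

        kept.append(max(cluster, key=centrality))
    return clusters, kept


# Hypothetical pairwise correlations among four predictors.
names = ["A", "B", "C", "D"]
corr = {
    frozenset(("A", "B")): 0.90,
    frozenset(("A", "C")): 0.80,
    frozenset(("B", "C")): 0.75,
    frozenset(("A", "D")): 0.10,
    frozenset(("B", "D")): 0.20,
    frozenset(("C", "D")): 0.05,
}

clusters, kept = reduce_clusters(names, corr)
print(clusters, kept)
```

Under this reading only A and D would enter the model, so B and C could not appear in importance plots or interaction tables; if they do appear, a different reduction rule must have been applied.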

L224 – 227 ‘…we also (i) allowed interactions between longitude/latitude and other covariates…’

Why are interactions added into the models when they are ignored in the results and discussion? Furthermore, it isn’t entirely obvious to me how to interpret the interaction results reported in the supplementary tables and figures. I would expect that these interactions are quite important to our understanding and interpretation of the results and therefore some elaboration is required.

Also, I interpreted this to mean that separate N/S analyses were done for both the village and hotspot scales. However, this does not seem to be the case. I would suggest either adding the N/S analyses for the hotspot scale or to be more explicit and to say that this analysis was only conducted for the village scale.

Reviewer #3: In relation to objectives and design:

I think that in this revision there remains a mismatch between the prominently stated ambition to identify drivers of infection in dogs and the reality, which is identification of correlates of village-level detection of dog infections and a coarse analysis of locations of infection hotspots. I don't think this is being fussy. The authors just need to be a bit more honest and conservative in the headline elements of the paper when it comes to talking about what they are able to do with the data they have. Therefore, the Methods adopted are appropriate to a different aim of understanding correlative patterns in the long-term detection of dog infections in villages, but they are not suitable, and the data underpinning the analyses are not suitable, for the stated aim of an analysis of causative drivers of infection in dogs. I don't think therefore that it is possible to support the title or Abstract with the data used and Methods adopted with any amount of revision of the approach. Rather the stated ambition, title and scope of the paper needs to shift to fit with the data, Methods and analyses.

I have no concerns about ethical or regulatory requirements.

--------------------

Results

Reviewer #1: L292 – 295 ‘The most important variables included cluster 1 (represented by Bioclim 12, 36.0% Importance)… mean temperature of the coldest quarter (10.2% Importance) and mean temperature of the driest quarter (Bioclim 9, 9.6% Importance).’

To help the reader there needs to be some consistency when referring to Bioclim variables. I would suggest using the format you use at the end of this sentence when referring to Bioclim9 i.e. say what it actually is and add the shorthand in brackets.

Figure 1

Can you provide a reason for why the spatial hotspots overlap? This doesn’t seem right to me and I would suggest that something has gone wrong, however, I do not know the intricacies of the spatial scan and whether or not this is normal behaviour. The obvious answer would be due to temporal overlap, but this cannot be the case in this analysis as GW positive villages are identified using the entire observation period. Is there something in the R package guidance or literature that could explain this, or is this an error? Perhaps the scan is using the temporal aspects of the data rather than ignoring it?

Table S1

A few suggestions to help guide the reader:

1) Include the abbreviations used in Figure 2 in brackets or add them to the caption for Figure 2.

2) Add RemotePop

3) Add a section of rows for the variables controlling for sampling effort and report their mean & SE

4) For the last three column headings keep the ‘N=’ but change the rest of the text to something like ‘All villages’, ‘Northern villages’ and ‘Southern villages’. Then, in the legend, put something to the effect of ‘Where appropriate the means (± SE) are reported’.

5) Include the number and/or proportion of villages that were identified as fishing villages, rather than leaving it blank.

6) My understanding of the choice of collinearity analysis is that many of these covariates were omitted from the model. If so, consider highlighting those that were used in the model by putting them in bold.

Table S2

This is not referenced in the main text. Furthermore the relevance of these interactions and how to interpret them should be discussed, currently it is not clear why they were considered in the first place or what they mean.

Captions and legends

Check these carefully as some are not correct e.g. for Figure S7 it mentions panels a-d, but only two graphs are presented.

Reviewer #3: (No Response)

--------------------

Conclusions

Reviewer #1: L320 – 322 ‘Overall, our results suggest that…’

The final part of this sentence is a bit repetitive of the previous and doesn’t really add much. How can correlates with the broad hotspot scale help the GWEP?

L383 – 402 ‘In addition to our village level analyses, the identification of spatial hotspots…’

I think this paragraph discussing the hotspot analysis could be improved. While the points raised are interesting, I think more needs to be said about the variables that have been identified to characterise a hotspot and how these can be used by the GWEP. The model suggests that hotspots might be predicted/characterised by the mean annual precipitation, and the correlation of this variable with a number of other variables is worth discussion. There were also interactions that were identified, however, as I mentioned earlier, many of these are with variables in the same collinear cluster and will not be present once removed from the model.

Why was the hotspot analysis not separated into N/S as with the village level analysis?

L417 – 419 ‘…suggesting that they served as an effective control.’

I initially understood this to mean effective control for GW, rather than biases in the sampling of infections. I would suggest rewording this sentence to make it clear that you are referring to sampling bias.

Reviewer #3: In relation to the take-home messages in the Abstract, please see my comments above and below.

More specifically, given that health-care worker visitation was a key factor in detection of infections, I would not consider this an "intrinsic" (or demographic, as in the previous paragraph) characteristic of the village.

Climate variables are somewhat helpful as is elevation, but human population size and health worker visitation rate are the main contributors to the models' explanatory power. I think that when it comes to the models being "highly predictive" a little more care should be taken to relate the predictability to which factors are epidemiologically informative. There remain some assertive phrases, e.g. line 320-1, about how this might be used in targeting control efforts that I am not sure are strongly supported.

--------------------

Editorial and Data Presentation Modifications?

Reviewer #1: (No Response)

Reviewer #3: Thanks to the authors for having dealt with, or acknowledged and responded positively to, many of the substantive concerns in my initial review.

Nevertheless, I find I have a large number of presentational and editorial points. I think the paper would still be considerably better if these were addressed. I have tried not to be too particular, but some of these are quite important points. These primarily relate to the points raised above about being a bit more conservative in relation to the scope and outcomes of the study.

Title

Drivers: Given all the caveats that are now more thoroughly explored, it is probably not justified to use "Drivers" in the title. It might be better to use "Correlates of" or "Patterns in".

Infection in dogs: The analyses would better be a description of "villages in which dog infections are recorded" or some more accurate way of conveying the subject of the analysis, which is not of dog infections, but of recording detection of infections at a village level over an extended period.

Abstract

Guinea in Guinea worm is usually capitalised, as it derives from the country. (e.g. Line 38).

Overall, I found the wording of the first Background paragraph of the Abstract to be problematic and imprecise. For example: It is not necessarily the case that the epidemiology changed, or that this happened "at the same time". Why is 2012 selected? Domestic dogs were observed to be hosts to Guinea worm multiple times and in multiple countries before 2012 and outside of Chad, and before they were found in Chad. It is not the dogs that challenge efforts exactly, they are indifferent, but the presence of infection in a non-human host creates a challenge for those charged with eradication. See my point above about "driving" and drivers. Although these would be good to know, this study does not, in the end, reveal them. I am sorry to be picky about these finer points, but attention to this sort of detail will help with the tone of the rest of the paper.

Might it be better throughout to talk about the "detection of infection" rather than "infection" per se? (e.g. Line 47)

Similarly, it is not "Guinea worm parasite presence" but "detection of Guinea worm infection in a village" (Line 54).

I would suggest that n healthcare visitors is not "intrinsic to the village", since it is increased or decreased by the eradication program. (Line 54)

I would also suggest that in terms of "landscape scale ecology" the insight is limited, since the relevant factors were not especially ecological. But this is a subjective point. (Line 56)

--------------------

Summary and General Comments

Use this section to provide overall comments, discuss strengths/weaknesses of the study, novelty, significance, general execution and scholarship. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. If requesting major revision, please articulate the new experiments that are needed.

Reviewer #1: The authors have made appropriate responses to comments on the previous version of this manuscript, and many of the issues and concerns have been addressed. Overall, this is a well written manuscript and this body of work has the potential to significantly contribute to our understanding of a zoonotic disease that is on the brink of eradication. However, the methodology and results newly presented by the authors still raise considerable questions, and there are several areas for clarification/improvements that need to be addressed before the manuscript can be accepted for publication:

(1) I have several concerns about the hotspot analysis. While figure 1 is much improved and helps the reader see the results of the spatial scan analysis, it is surprising to see that there is considerable overlap of hotspots. This suggests to me that there is an error as I would only expect overlaps if the data were temporal in nature. This either needs rectifying or requires some explanation in the methods.

(2) The authors have appropriately adapted their analysis to consider collinearity; however, there are some discrepancies that could require a re-run of the analysis. ‘Co-linear clusters were reduced to the single most central variable’ implies that the non-central variables were removed from the model, but this does not seem to be the case. This needs clarification as to whether a different method was used; alternatively, the models will need to be corrected and run again.

(3) Interactions are included in models, but they are overlooked in the results and discussion. The interactions between variables will be important for the interpretation of the results and should be given more attention. Currently it is not clear why they are included in the model or how they should be interpreted.

Reviewer #3: As indicated above, I feel that although the response document acknowledges much of the earlier critique, the revisions have not sufficiently addressed the issue of scope and ambition of the analyses. In essence, this is an analysis of correlates of the village-level detection of infection over an extended period. It is not an analysis of the drivers of dog infections. Some of this change of tone is in the main text, but the Title, Abstract and Author Summary remain too assertive.

--------------------

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Jared Wilson-Aggarwal

Reviewer #3: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms, etc. For an example see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, PLOS recommends that you deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see https://journals.plos.org/plosntds/s/submission-guidelines#loc-methods

PLoS Negl Trop Dis. doi: 10.1371/journal.pntd.0008620.r005

Decision Letter 2

Jeremiah M Ngondi, Banchob Sripa

26 May 2020

Dear Mr. Richards,

Thank you very much for submitting your manuscript "Identifying correlates of Guinea worm (Dracunculus medinensis) infection in domestic dog populations" for consideration at PLOS Neglected Tropical Diseases. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers appreciated the attention to an important topic. Based on the reviews, we are likely to accept this manuscript for publication, providing that you modify the manuscript according to the review recommendations.

Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email.  

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript. 

Please note while forming your response that, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Jeremiah M. Ngondi, MB.ChB, MPhil, MFPH, Ph.D

Associate Editor

PLOS Neglected Tropical Diseases

Banchob Sripa

Deputy Editor

PLOS Neglected Tropical Diseases

***********************

Editorial comments: Data Availability

As per PLOS data policy (https://journals.plos.org/plosntds/s/data-availability) the reviewers have raised concerns with the data availability statement that you have provided. I have noted that there is a recent publication on a similar subject and locations from Chad https://journals.plos.org/plosntds/article?id=10.1371/journal.pntd.0008170 that has made the datasets publicly available as per the PLOS data policy. Notably, Dr. Tchonfienet Moundai is a co-author on this recent paper. I therefore suggest that you consult Dr. Tchonfienet Moundai to cross-check if the minimum data sets for this paper can be made available to the public as per the PLOS data policy.

Reviewer's Comments

Editorial and Data Presentation Modifications?

Use this section for editorial suggestions as well as relatively minor modifications of existing data that would enhance clarity. If the only modifications needed are minor and/or editorial, you may wish to recommend “Minor Revision” or “Accept”.

Reviewer #1: (No Response)

Reviewer #3: For the avoidance of doubt, none of these points is substantive and all are easily dealt with, with no further recourse to the reviewers.

L77 - This suggests that eradication might have been achieved in the 1980s, whereas the target of eradication was set in the 1980s.

L84 - I would say that it is not so much the annual occurrence as the persistence of human cases.

L85 - Should you add the species name for dogs?

L91-3 - Recent work has hypothesised this pathway, and identified some contributing elements, without demonstrating it.

L138-9 and L249-50 - Clarify village-level prevalence, as this could still be misconstrued as the prevalence among dogs.

L139 and L250 - Clarify, this is dog infections and whether this is separate infections, or dogs with worms, as many dogs have >1 worm infection.

L322 - "Canine Guinea worm" implies that it is a dog parasite, better to say Guinea worm infection in dogs

L357 - It is not higher risk of Guinea worm infection in dogs, but greater likelihood of detecting dog infections in villages.

L439 - Clarify that there is no transmission between dogs and humans, as such.

--------------------

Summary and General Comments

Use this section to provide overall comments, discuss strengths/weaknesses of the study, novelty, significance, general execution and scholarship. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. If requesting major revision, please articulate the new experiments that are needed.

Reviewer #1: The authors have made appropriate changes to the manuscript and provided clarification on the methods used. I have no further suggestions or queries and think this manuscript is acceptable for publication. I would like to congratulate the authors on producing a well written manuscript with interesting results that will help guide future research towards the control of Guinea worm disease.

Reviewer #3: The authors have tackled my last set of suggestions and recommendations in good part.

Some minor points that are definitely optional but which might improve understanding and clarity are in the box above. None of these requires any further review or assessment.

One easy point that does require correction, at L326, and which was raised previously is that it is not elevation that was found to be influential, but standard deviation in elevation.

I have no other concerns.

--------------------

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Jared Wilson-Aggarwal

Reviewer #3: No

PLoS Negl Trop Dis. doi: 10.1371/journal.pntd.0008620.r007

Decision Letter 3

Jeremiah M Ngondi, Banchob Sripa

20 Jul 2020

Dear Mr. Richards,

We are pleased to inform you that your manuscript 'Identifying correlates of Guinea worm (Dracunculus medinensis) infection in domestic dog populations' has been provisionally accepted for publication in PLOS Neglected Tropical Diseases.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Neglected Tropical Diseases.

Best regards,

Jeremiah M. Ngondi, MB.ChB, MPhil, MFPH, Ph.D

Associate Editor

PLOS Neglected Tropical Diseases

Banchob Sripa

Deputy Editor

PLOS Neglected Tropical Diseases

***********************************************************

PLoS Negl Trop Dis. doi: 10.1371/journal.pntd.0008620.r008

Acceptance letter

Jeremiah M Ngondi, Banchob Sripa

3 Sep 2020

Dear Mr. Richards,

We are delighted to inform you that your manuscript, "Identifying correlates of Guinea worm (Dracunculus medinensis) infection in domestic dog populations," has been formally accepted for publication in PLOS Neglected Tropical Diseases.

We have now passed your article onto the PLOS Production Department who will complete the rest of the publication process. All authors will receive a confirmation email upon publication.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any scientific or type-setting errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript. Note: Proofs for Front Matter articles (Editorial, Viewpoint, Symposium, Review, etc...) are generated on a different schedule and may not be made available as quickly.

Soon after your final files are uploaded, the early version of your manuscript will be published online unless you opted out of this process. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting open-access publishing; we are looking forward to publishing your work in PLOS Neglected Tropical Diseases.

Best regards,

Shaden Kamhawi

co-Editor-in-Chief

PLOS Neglected Tropical Diseases

Paul Brindley

co-Editor-in-Chief

PLOS Neglected Tropical Diseases

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Table. Model covariates.

    A list of the source, resolution, and means (+/- SE), where appropriate, for all variables included in the boosted regression tree models. Climatic, surface water, and floodwater summaries were calculated from monthly minimum, maximum, mean, and total values using the protocol of [28]. Climatic variables in the main text use WorldClim 2.0 data while additional present climate analyses were trained on present climate data with a coarser spatial grain. Variables included after co-linearity reduction are indicated in bold.

    (PDF)

    S2 Table. Covariate clusters in primary analyses.

    This table reports the identity of covariates in each of two identified co-linear clusters. The central variable in each cluster, with the highest mean correlation with all other variables in the cluster, used to represent the cluster in analyses, appears in bold; the Pearson correlation between each covariate and that central variable is also reported.

    (PDF)

    S3 Table. Interaction strength in village-level model.

    This table reports the interaction strength of top ranked pair-wise interactions in the boosted regression tree model for infection presence at the village scale.

    (PDF)

    S4 Table. Interaction strength in hotspot-level model.

    This table reports the interaction strength of top ranked pair-wise interactions in the boosted regression tree model for hotspot identity.

    (PDF)

    S5 Table. Covariate clusters in present climate analyses.

    This table reports the identity of covariates in each of two identified co-linear clusters. The central variable in each cluster, with the highest mean correlation with all other variables in the cluster, used to represent the cluster in analyses, appears in bold; the Pearson correlation between each covariate and that central variable is also reported.

    (PDF)

    S1 Fig. Northern partial dependence plots.

    This figure depicts partial dependence plots showing the effect of (a) identity as a fishing village, (b) dog population, (c) number of healthcare supervisor visits, and (d) standard deviation in elevation on probability of parasite presence in northern villages. Histograms represent the distribution of values for these covariates amongst all training villages.

    (PDF)

    S2 Fig. Southern partial dependence plots.

    Partial dependence plots showing the effect of (a) Cluster 1 (represented by annual precipitation [Bioclim12]), (b) number of healthcare supervisor visits, (c) dog population, and (d) temperature of the driest quarter on probability of parasite presence in southern villages. Histograms represent the distribution of values for these covariates amongst all training villages.

    (PDF)

    S3 Fig. Hotspot relative importance estimates in northern and southern villages.

    This figure depicts relative importance of covariates in predicting hotspot identity in (a) northern and (b) southern villages.

    (PDF)

    S4 Fig. Village-level strongest interaction partial dependence plot.

    This figure depicts the strongest pair-wise interaction in the boosted regression tree model for parasite presence, between remotely sensed human population (x-axis) and fishing village identity. The dotted line represents villages designated as fishing villages and the solid line those not designated as fishing villages.

    (PDF)

    S5 Fig. Hotspot-level strongest interaction partial dependence plot.

    This figure depicts the strongest pair-wise interaction in the boosted regression tree model for village hotspot identity, between cluster 1, represented by annual precipitation (Bioclim 12), and mean temperature of the coldest quarter (Bioclim 11). In this three-dimensional plot the vertical z-axis represents probability of hotspot identity.

    (PDF)

    S6 Fig. Present climate relative importance estimates.

    This figure depicts relative importance of covariates in predicting (a) D. medinensis presence and (b) hotspot identity. Unlike the main analysis, here models are trained on present climate data with a coarser spatial grain than WorldClim data.

    (PDF)

    S7 Fig. Present climate presence-absence partial dependence plot.

    This figure depicts partial dependence plots showing the effect of (a) dog population, (b) number of healthcare supervisor visits, (c) fishing village identity, and (d) standard deviation in elevation on probability of parasite presence in all villages when present climate estimates are used rather than WorldClim variables. Histograms represent the distribution of values for these covariates amongst all training villages.

    (PDF)

    S8 Fig. Present climate hotspot partial dependence plot.

    Partial dependence plots showing the effect of (a) number of healthcare supervisor visits, (b) gridded population density, (c) precipitation of the warmest quarter, and (d) standard deviation in elevation on probability of a village being part of a spatial infection hotspot. Histograms represent the distribution of values for these covariates amongst all training villages.

    (PDF)

    S1 Code. Sample analysis and visualization code.

    This code displays data-sourcing and processing and conducts analyses and visualizations for this paper. Code is an .Rmd file for use with the knitr and rmarkdown packages in the RStudio software.

    (RMD)

    S1 Data. Training village presence-absence data.

    Dataset of villages under surveillance. Village data include the five-year case count, presence over the five years, and all covariates detailed in S1 Table. Data are split between S15 and S16 to include the training/testing split for BRT modeling.

    (CSV)

    S2 Data. Testing village presence-absence data.

    Dataset of villages under surveillance. Village data include the five-year case count, presence over the five years, and all covariates detailed in S1 Table. Data are split between S15 and S16 to include the training/testing split for BRT modeling.

    (CSV)

    S3 Data. Training village hotspot data.

    Dataset of villages under surveillance that could be scored as within or outside of a spatial hotspot. These data are included independently of S16 and S17 to allow replication of hotspot analyses both with and without employing the SaTScan software. Village data include the five-year case count, presence over the five years, presence in a hotspot, and all covariates detailed in S1 Table. Data are split between S17 and S18 to include the training/testing split for BRT modeling.

    (CSV)

    S4 Data. Testing village hotspot data.

    Dataset of villages under surveillance that could be scored as within or outside of a spatial hotspot. These data are included independently of S16 and S17 to allow replication of hotspot analyses both with and without employing the SaTScan software. Village data include the five-year case count, presence over the five years, presence in a hotspot, and all covariates detailed in S1 Table. Data are split between S17 and S18 to include the training/testing split for BRT modeling.

    (CSV)

    Attachment

    Submitted filename: Response to reviewers_V3.docx

    Attachment

    Submitted filename: GW BRT Second Response To Reviewers.docx

    Attachment

    Submitted filename: GW BRT Third Response to reviewers.docx

    Data Availability Statement

    Data and code required to reproduce analyses are included in the Supporting Information files.


    Articles from PLoS Neglected Tropical Diseases are provided here courtesy of PLOS
