Keywords: Soil-transmitted helminths, Water, Sanitation and hygiene, Risk factors, Logistic regression, Recursive partitioning, Bayesian networks
Highlights
• We compared logistic regression, recursive partitioning and Bayesian networks to identify risk factors for STH infection.
• Logistic regression identified fewest variables associated with STH infections compared with the two alternative methods.
• Recursive partitioning identified more demographic and WASH variables, and Bayesian networks more environmental variables.
• Model performance was similar across all three statistical techniques.
• Recursive partitioning can identify at-risk population subgroups, while Bayesian networks can run real-time scenarios.
Abstract
Soil-transmitted helminths (STHs) are parasitic intestinal worms that infect almost a fifth of the global population. Sustainable control of STHs requires understanding the complex interaction of factors contributing to transmission. Identifying risk factors has mainly relied on logistic regression models where the underlying assumption of independence between variables is not always satisfied. Previously demonstrated risk factors including water, sanitation and hygiene (WASH) access and behaviours, and socioeconomic status are intrinsically linked. Similarly, environmental factors including climate, soil and land attributes are often strongly correlated. Alternative methods such as recursive partitioning and Bayesian networks can handle correlated variables, but there are no published studies comparing these methods with logistic regression in the context of STH risk factor analysis. Baseline cross-sectional data from school-aged children in the (S)WASH-D for Worms study were used to compare risk factors identified from modelling the same data using three different statistical techniques. Outcomes of interest were infection with Ascaris spp. and any hookworm species (Necator americanus, Ancylostoma duodenale, and Ancylostoma ceylanicum). Mixed-effects logistic regression identified the fewest risk factors. Recursive partitioning identified the most WASH and demographic risk factors, while Bayesian networks identified the most environmental risk factors. Recursive partitioning produced classification trees that visualised potentially at-risk population sub-groups. Bayesian networks helped visualise relationships between variables and enabled interactive modelling of outcomes based on different scenarios for the predictor variables of interest. Model performance was similar across all techniques. Risk factors identified across all techniques were vegetation for Ascaris spp., and cleaning oneself with water after defecating for hookworm. 
This study adds to the limited body of evidence exploring alternative data modelling approaches in identifying risk factors for STH infections. Our findings suggest these approaches can provide novel insights for more robust interpretation.
1. Introduction
Soil-transmitted helminths (STHs) are parasitic worms that include hookworms (Necator americanus, Ancylostoma duodenale, and Ancylostoma ceylanicum), roundworm (Ascaris lumbricoides), whipworm (Trichuris trichiura) and threadworm (Strongyloides stercoralis). Almost a fifth of the global population is infected with STHs, resulting in a burden of 3.8 million disability-adjusted life years (Kyu et al., 2018, World Health Organization and UNICEF, 2015). Long-term health consequences of STH infections in children include impaired growth and cognition, as well as malnutrition and iron-deficiency anaemia (Bethony et al., 2006).
STH transmission occurs due to contamination of soil, water sources and fresh produce with STH eggs that are released in faeces of infected individuals (WHO, 2020 https://www.who.int/news-room/fact-sheets/detail/soil-transmitted-helminth-infections).
Since STHs spend part of their life cycle in soil, environmental factors also play an important role in transmission. Previous studies have identified associations between STH infections and water, sanitation and hygiene (WASH), including both access and behaviours (Campbell et al., 2017, Freeman et al., 2017). Environmental factors such as vegetation, precipitation, temperature, elevation, land cover and soil attributes have also been shown to be associated with STH infections (Campbell et al., 2017, Wardell et al., 2017). However, specific WASH and environmental risk factors are not consistently identified across studies, even those conducted within similar contexts, impacting the robustness of evidence (Campbell et al., 2017, Freeman et al., 2017, Wardell et al., 2017).
Attempts to identify risk factors for STHs have been centred around the approach of logistic regression (LR) (Hosmer Jr et al., 2013). However, LR relies on the assumption of independence between predictor variables. This proves difficult in the STH context as many predictors are intrinsically associated, such as socioeconomic status (SES) and access to adequate WASH (Spratt et al., 2013, Ranganathan et al., 2017). Additionally, the assumption of independence between variables means potential causality cannot be explored (Nguefack-Tsague, 2011).
Novel statistical approaches enable data to be handled differently, potentially producing novel insights into risk factors. In health research, methods such as recursive partitioning (RP) and Bayesian networks (BNs) have mostly been applied in genomics and chronic disease contexts (Lemon et al., 2003, Needham et al., 2007). RP is a non-parametric approach that allows consideration of correlated data and is an attractive method for identifying at-risk population sub-groups (Lemon et al., 2003, Spratt et al., 2013, Gass et al., 2014). Only one study has utilised RP to explore the associations between WASH and STHs (Gass et al., 2014). More recently, studies have utilised RP to examine risk factors for other neglected tropical diseases (NTDs) including schistosomiasis and dengue fever (Gazzinelli et al., 2017, Ong et al., 2018). Few studies have compared RP with LR and no studies have done so in the context of STH or NTD risk factors (Lemon et al., 2003). BNs provide the potential to explore causality, while the other approaches do not (Nguefack-Tsague, 2011, Needham et al., 2007). While a small number of studies have explored BNs in the context of infectious diseases such as leptospirosis (Lau et al., 2017, Mayfield et al., 2018), none have focused on STH infections and none have compared findings with LR.
Sustainable control of STHs requires appropriate risk factor identification, so that appropriate community-based interventions can be developed (Montresor, 2012, World Health Organization, 2017). RP and BNs are promising alternatives to LR and the comparison between these techniques in the context of WASH and environmental factors for STH infections warrants investigation.
This study aimed to compare three different statistical approaches (LR, RP and BNs) in determining risk factors for STH infections. The specific objectives of the study were:
(i) Identify risk factors for STH infections using LR, RP and BNs.
(ii) Compare similarities and differences in the types of risk factors identified.
(iii) Qualitatively evaluate each technique to explain relationships between risk factors.
2. Materials and methods
2.1. WASH and STH infection data sources
This study is a secondary analysis of baseline data from the (S)WASH-D for Worms pilot study (Clarke et al., 2016, Clarke et al., 2018). The study took place in six primary schools in Aileu and Manufahi municipalities in Timor-Leste, a country where STHs are endemic (Martins and McMinn, 2012, Campbell et al., 2016), and where access to improved water and sanitation is poor (WHO, 2015; United Nations Development Program, 2018). Participants were children in grades 1–6 who attended those schools and had written informed consent from their parents or guardians. Study participants came from a total of 17 communities including the communities in which the schools were located, and neighbouring communities (Supplementary Table S1).
Data were collected in the form of questionnaires (conducted as interviews) and stool samples between April and June 2015 (Clarke et al., 2016, Clarke et al., 2018). Questionnaires were completed by both participants and their caregivers. Children responded to questions on personal WASH behaviours including handwashing, defecation and shoe-wearing practices. Caregivers answered questions on household water access, storage and treatment, and socio-economic characteristics. Additional questionnaires for the school principals asked questions about the schools’ sanitation facilities. SES was derived from principal component analysis of household income, animal ownership, house construction (wall and floor type), appliance ownership and vehicle ownership (Filmer and Pritchett, 2001, Campbell et al., 2016). Based on the number of eigenvalues above 1, four principal components were identified to produce a wealth score that was categorised into quintiles from 1 (poorest) to 5 (wealthiest) (Campbell et al., 2016). A full list of variables is provided in Supplementary Table S2.
Stool samples were collected from study participants at schools and sent to the QIMR Berghofer Medical Research Institute in Brisbane, Australia, for diagnostic analysis using real-time multiplex quantitative PCR (qPCR) to detect and quantify STH species (Inpankaew et al., 2014, Llewellyn et al., 2016). The primary outcomes assessed in this study were Ascaris spp. infection and any hookworm infection (N. americanus and Ancylostoma spp.).
2.2. Environmental data and processing
Environmental data were obtained at the community level. For the six communities in which study schools were located, we used GPS coordinates of the schools. These were collected during the field study. For the remaining 11 communities, we used coordinates representing the geographic centroid of the community (Statistics Timor-Leste, 2019a, Statistics Timor-Leste, 2019b). To ensure coverage of most households within each community, a 1 km buffer around each community coordinate was used when extracting environmental data. This buffer size was chosen with consideration of the sizes of communities in Timor-Leste and aerial maps were utilised to ensure that this buffer included a majority of households in each community.
Environmental data including climate, soil and land attributes were sourced from publicly available databases summarised in Table 1 (Loveland and Belward, 1997, Garcia et al., 1978, Belward et al., 1999, Land Processes Distributed Active Archive Center (LP DAAC), 2000, Weier and Herring, 2000, Hijmans et al., 2005, Didan et al., 2015, Wardell et al., 2017). Data were processed using ArcMap version 10.7 (Esri, Redlands, CA, USA). Slope was calculated from elevation data using the slope (spatial analyst) tool in ArcMap. Soil categorisations were determined by visual inspection of colour categorised maps. All other environmental data were processed using the zonal statistics (spatial analyst) tool in ArcMap. For land cover, the most common raster cell value within the buffer was used to classify each community. For temperature, precipitation, vegetation indexes (enhanced vegetation index (EVI) and normalised difference vegetation index (NDVI)), elevation and slope, the mean cell raster value was extracted. A range of temperature and precipitation variables were produced by calculating variations of monthly climate data, detailed in Supplementary Table S2 (Wardell et al., 2017).
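The buffer-based extraction described above can be illustrated with a minimal Python sketch. This is not the ArcMap zonal statistics tool: it assumes a toy raster stored as a dictionary of cell-centre coordinates (in metres, in a projected system) mapped to cell values, and all function names and values are hypothetical and invented for illustration.

```python
import math
from collections import Counter

def cells_in_buffer(raster, centre, radius_m=1000.0):
    """Values of raster cells whose centres lie within radius_m of centre."""
    cx, cy = centre
    return [value for (x, y), value in raster.items()
            if math.hypot(x - cx, y - cy) <= radius_m]

def zonal_mean(raster, centre, radius_m=1000.0):
    """Mean cell value in the buffer (used here for continuous layers such as NDVI)."""
    values = cells_in_buffer(raster, centre, radius_m)
    return sum(values) / len(values)

def zonal_majority(raster, centre, radius_m=1000.0):
    """Most common cell value in the buffer (used here for categorical land cover)."""
    values = cells_in_buffer(raster, centre, radius_m)
    return Counter(values).most_common(1)[0][0]

# Toy 250 m grids with invented values (not study data).
grid = [(x, y) for x in range(0, 2001, 250) for y in range(0, 2001, 250)]
ndvi = {(x, y): 0.70 + 0.0001 * x for (x, y) in grid}
land_cover = {(x, y): ("savanna" if x < 1500 else "cropland") for (x, y) in grid}

community = (1000.0, 1000.0)  # hypothetical community coordinate
mean_ndvi = zonal_mean(ndvi, community)
majority_cover = zonal_majority(land_cover, community)
```

The same two summaries (mean for continuous layers, majority for categorical layers) mirror the choices described in the text; a real workflow would read georeferenced rasters rather than a dictionary.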
Table 1.
Environmental data type | Source | Temporal resolution | Spatial resolution |
---|---|---|---|
Temperature (°C): maximum, minimum, mean | WorldClimᵃ | Monthly average temperature from 1970–2000 | 1000 m |
Precipitation (cm): maximum, minimum, mean | WorldClimᵃ | Monthly average precipitation from 1970–2000 | 1000 m |
Elevation per 100 m | ASTER GDEMᵇ | 30 January 2000–30 November 2013 | 30 m |
Vegetation: enhanced vegetation index (EVI), normalised difference vegetation index (NDVI) | Terra and Aqua MODISᵇ | 1 April 2015–30 June 2015 | 250 m |
Soil pH and soil texture | Os Solos de Timorᶜ | 1960s | N/A |
Land cover | MODISᵇ | 2015 | 250 m |
ASTER, Advanced Spaceborne Thermal Emission and Reflection Radiometer; GDEM, Global Digital Elevation Model version 3; MODIS, Moderate Resolution Imaging Spectroradiometer.
ᵃ WorldClim version 2.0 is a database providing monthly averaged climate data for mean, minimum and maximum precipitation and temperature.
ᵇ Elevation, vegetation and land cover data were sourced from the NASA EOSDIS Land Processes Distributed Active Archive Centre database (LP DAAC) using the Application for Extracting and Exploring Analysis Ready Samples (AppEEARS). We used data from the ASTER GDEM and Terra and Aqua MODIS satellites.
ᶜ Soil texture and pH data were derived from ‘Os Solos de Timor’, an extensive soil study of Timor-Leste conducted in the 1960s.
2.3. Statistical analysis
2.3.1. Descriptive statistics
Descriptive analysis was conducted using Stata version 15 (College Station, TX, USA). Point prevalence and 95% confidence intervals (CIs) were calculated for all categorical variables using the proportion function in Stata. For continuous variables, mean values, standard deviations (S.D.s) and their 95% CIs were calculated using the mean function in Stata.
2.3.2. Logistic regression
Generalised linear mixed models were constructed with school and community as nested random effects (communities nested within schools), and age and sex as fixed effects using the meglm function in Stata. Bernoulli logistic regression was used to produce an odds ratio for each variable. First, univariable analysis was conducted and variables retained if P < 0.2 (Aw et al., 2019, Vaz Nery et al., 2019). A two-stage approach was then used to build multivariable models (Aw et al., 2019, Vaz Nery et al., 2019). Retained variables were grouped into “within-domain” multivariable models adjusted for age and sex, for each of the following domains: demographic, individual hygiene, individual sanitation, school sanitation, household sanitation, household water and household socioeconomic variables. Prior to finalising within-domain models, a collinearity check was conducted using the collin function in Stata. Collinear variables were removed individually from within-domain models and tested for Akaike Information Criterion (AIC), with the lowest AIC determining which variable to remove. This process was repeated until all variables had a Variance Inflation Factor (VIF) < 5.0. Variables with P < 0.1 in “within-domain” models were retained for a full multivariable model adjusting for age and sex. Backward stepwise regression was used to produce the final multivariable model including only age, sex and variables with P < 0.05.
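As a sketch of the model structure described above, the mixed-effects logistic regression can be written as follows. The notation is our own assumption (it does not appear in the original analysis): child i in community k nested within school j, random intercepts for school and community, and a vector of retained covariates x.

```latex
\operatorname{logit}\,\Pr(y_{ijk}=1)
  = \beta_0 + \beta_1\,\mathrm{age}_{ijk} + \beta_2\,\mathrm{sex}_{ijk}
  + \mathbf{x}_{ijk}^{\top}\boldsymbol{\gamma} + u_j + v_{jk},
\qquad
u_j \sim N(0,\sigma_u^{2}),\quad v_{jk} \sim N(0,\sigma_v^{2})
```

Under this formulation the adjusted odds ratio for a covariate is the exponential of its coefficient, for example aOR = exp(γ) for a one-unit increase in that covariate.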
To examine model performance, data were first divided into three groups: each of the two largest schools, and all other schools combined. This was done because a much higher infection prevalence was observed in the two largest schools. Using these strata, the data were randomly partitioned 70:30. Coefficients for the variables in the final multivariable models were produced using data from 70% of the participants. Probability of infection was predicted in the remaining 30% of participants using the predict function in Stata. Participants were classified as infected if the predicted probability of infection was greater than 0.5. The area under the curve (AUC) was calculated using the roctab function in Stata.
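The validation steps above (stratified 70:30 partition, classification at a 0.5 threshold, and AUC) can be sketched in Python. This is an illustrative sketch only, not the Stata workflow used in the study: the function names, the toy records and the toy probabilities are all assumptions, and the AUC is computed with a simple rank-based definition.

```python
import random

def stratified_split(records, stratum_of, train_frac=0.7, seed=0):
    """Randomly split records into train/test within each stratum."""
    rng = random.Random(seed)
    strata = {}
    for rec in records:
        strata.setdefault(stratum_of(rec), []).append(rec)
    train, test = [], []
    for members in strata.values():
        members = members[:]          # copy so the caller's list is untouched
        rng.shuffle(members)
        cut = round(train_frac * len(members))
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test

def classify(probabilities, threshold=0.5):
    """Label a participant as infected (1) if predicted probability exceeds threshold."""
    return [1 if p > threshold else 0 for p in probabilities]

def auc(labels, scores):
    """Probability that a random positive scores above a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data: three strata standing in for the two largest schools and all others.
records = [{"id": i,
            "stratum": "school_A" if i < 40 else "school_B" if i < 70 else "other"}
           for i in range(100)]
train, test = stratified_split(records, lambda r: r["stratum"])

# Invented predicted probabilities and true infection labels for five participants.
labels = [1, 0, 1, 0, 0]
scores = [0.80, 0.30, 0.55, 0.60, 0.10]
predicted = classify(scores)
toy_auc = auc(labels, scores)
```

Splitting within strata keeps the proportion of participants from the high-prevalence schools the same in the training and testing sets, which is the motivation given in the text.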
2.3.3. Recursive partitioning
The RP approach repeatedly partitions data into binary groups where subsequent groups are more likely to have the same outcome response (Strobl et al., 2009). The algorithm allows for different patterns after each partition. RP classification trees were produced for each outcome using the rpart package version 4.1.15 in R version 3.6.3.
Data were partitioned in the same way as for LR. Classification tree models were built using data from 70% of participants. The RP models underwent 10 cross-validations (on the 70% of data) which were averaged to produce complexity parameter (CP) values with associated cross-validation errors (x-errors). Over-fitting is a common problem with RP models and to address this, each classification tree was pruned (Gass et al., 2014). Typically, the final number of nodes (independent variables) of a tree is based on the CP value with the smallest x-error, where adding an additional node will no longer provide better classification (Therneau and Atkinson, 1997). In instances where the smallest x-error corresponded to a CP value for one or two nodes, the model was “lightly” pruned by the next smallest x-error that corresponded to a meaningful number of nodes. R does not allow “light” pruning if there is no substantial difference in performance compared with the unpruned model (Therneau and Atkinson, 1997). The remaining 30% of data were used to predict whether participants would be classified as infected or not. The AUC was calculated using the pROC package in R Studio.
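The pruning rule described above can be sketched as a small selection function over a table of CP values and cross-validation errors. This is a hedged illustration in Python rather than the rpart code used in the study: the table values are invented, the number of splits is used as a stand-in for the number of nodes, and the cut-off for a "meaningful" tree (at least three splits) is our assumption.

```python
def choose_cp(cp_table, min_nodes=3):
    """Pick the CP row with the smallest x-error; if that tree is too small,
    fall back to the next-smallest x-error among trees with >= min_nodes splits."""
    ranked = sorted(cp_table, key=lambda row: row["xerror"])
    best = ranked[0]
    if best["nsplit"] >= min_nodes:
        return best
    for row in ranked[1:]:
        if row["nsplit"] >= min_nodes:
            return row          # "light" pruning
    return best                 # no larger tree available

# Invented CP table (not study output): one row per candidate tree size.
cp_table = [
    {"cp": 0.20, "nsplit": 0, "xerror": 1.00},
    {"cp": 0.05, "nsplit": 2, "xerror": 0.80},
    {"cp": 0.02, "nsplit": 7, "xerror": 0.83},
    {"cp": 0.01, "nsplit": 10, "xerror": 0.90},
]
chosen = choose_cp(cp_table)
```

In this toy table the smallest x-error belongs to a two-split tree, so the rule selects the seven-split tree with the next-smallest x-error instead.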
2.3.4. Bayesian networks
BNs consist of two main components: (i) directed acyclic graphs which contain nodes representing variables and arrows between nodes defining dependency; and (ii) node probability tables (NPTs) (Fenton and Neil, 2012). A child node is one that is conditionally dependent on one or more parent nodes. A conditional probability table (CPT) is the NPT of a child node. For feasible interpretation of CPTs, all continuous variables were discretised into three categories where data could be grouped in even intervals with sufficient sample size in each category (Supplementary Table S2) (Netica Software Corp., 2020). Since temperature and precipitation variables were variations of the same averaged monthly data, only mean annual temperature, mean monthly precipitation and one other temperature and precipitation variable (selected based on highest variance reduction) were included. For Ascaris spp., precipitation in the wettest quarter was removed as most of its variance reduction was explained by the school variable. For consistency, data were categorised in the same way described for LR.
Firstly, naïve networks were built to examine the independent relationship between each variable and the outcome. Variance reduction of all variables was ranked to identify the most influential variables, which would be included in the final model. Variables with less than 0.5% variance reduction were automatically removed from the model. Next, the variable contributing the next lowest variance reduction was removed, and the model was retrained using data from 70% of participants and tested with the remaining 30%, with an AUC calculated. Variables were iteratively removed, model re-trained and tested until there was a noticeable decrease in AUC performance, at which point only school, age and sex variables remained for consistency with LR. The final combination of variables was selected based on the iteration with the highest AUC.
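The iterative removal procedure above can be sketched as a backward selection loop. This Python sketch is a simplification under stated assumptions, not the Netica procedure: the stopping rule based on a "noticeable decrease" in AUC is replaced by removing every candidate in turn and keeping the subset with the highest AUC, and the scoring function `toy_auc` is an invented placeholder for retraining and testing a network.

```python
def backward_select(variance_reduction, forced, evaluate_auc, min_reduction=0.5):
    """variance_reduction: variable -> % variance reduction.
    forced: variables always kept (e.g. school, age, sex).
    evaluate_auc: callable taking a list of variables and returning a test AUC."""
    # Drop variables below the threshold, then order the rest weakest first.
    candidates = sorted(
        (v for v, r in variance_reduction.items()
         if r >= min_reduction and v not in forced),
        key=variance_reduction.get,
    )
    current = list(forced) + candidates
    best_vars, best_auc = list(current), evaluate_auc(current)
    for weakest in candidates:
        current = [v for v in current if v != weakest]
        score = evaluate_auc(current)
        if score > best_auc:
            best_vars, best_auc = list(current), score
    return best_vars, best_auc

def toy_auc(variables):
    """Invented stand-in for train/test evaluation: rewards three 'useful' variables."""
    useful = {"school", "ndvi", "elevation"}
    hits = len(useful & set(variables))
    noise = len(set(variables) - useful)
    return 0.5 + 0.08 * hits - 0.01 * noise

# Invented variance reductions (%), not study values.
reductions = {"ndvi": 6.0, "elevation": 3.0, "slope": 1.0, "soil_ph": 0.2}
best_vars, best_auc = backward_select(reductions, ["school", "age", "sex"], toy_auc)
```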
The final variables retained in the respective naïve models for Ascaris spp. and hookworm were used to learn tree augmented naïve (TAN) networks. TAN networks automatically model relationships between variables with each variable having at most one parent node in addition to the target node (Friedman et al., 1997). Scenario analyses were conducted on each TAN model where the probability of a particular response in variables of interest (WASH access, behaviours and SES) was set to 100%. Estimated infection prevalence was then compared with the infection prevalence in the prior state.
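The idea behind a scenario analysis can be illustrated with a deliberately simplified naïve Bayes calculation in Python. This is not a TAN network and not Netica output: it assumes predictors are conditionally independent given infection status, and the prior and conditional probabilities below are invented for illustration.

```python
def posterior_infected(prior, likelihoods, evidence):
    """Posterior probability of infection under a naive Bayes structure.
    prior: P(infected).
    likelihoods: variable -> {state: (P(state | infected), P(state | not infected))}.
    evidence: variable -> observed (or scenario) state."""
    p_inf, p_not = prior, 1.0 - prior
    for variable, state in evidence.items():
        l_inf, l_not = likelihoods[variable][state]
        p_inf *= l_inf
        p_not *= l_not
    return p_inf / (p_inf + p_not)

# Invented probabilities (not study estimates).
prior = 0.15
likelihoods = {
    "cleans_with_water": {"yes": (0.2, 0.5), "no": (0.8, 0.5)},
}

baseline = posterior_infected(prior, likelihoods, {})
scenario = posterior_infected(prior, likelihoods, {"cleans_with_water": "yes"})
change_points = (scenario - baseline) * 100  # change in percentage points
```

Setting a response category to 100% corresponds to entering that state as evidence; the difference between the scenario and baseline probabilities is the quantity compared in the scenario analyses.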
Expert structured models were built based on existing literature on risk factors for Ascaris spp. and hookworm (Strunz et al., 2014, Campbell et al., 2017, Freeman et al., 2017, Wardell et al., 2017). Because the data contained too few examples to represent all combinations of responses, each outcome was limited to four parent nodes. Variables with the lowest variance reduction were removed (e.g. handwashing after food contact and household water source).
To examine model performance, 100 trials were conducted on all networks, training each on a new random sample of 70% of data and testing on a new random sample of 30% data. BN analysis was conducted on Netica (Norsys Software Corp, Vancouver, BC, Canada).
For all models across all techniques, sensitivity and specificity were also calculated. By adding sensitivity and specificity and subtracting the value of 1, a true skill statistic (TSS) was calculated where values greater than zero indicate that model performance was better than random classification (Somodi et al., 2017).
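The TSS calculation described above is simple enough to check directly. The following Python sketch applies it to the sensitivity and specificity values reported in the Results for the LR and RP models; the function and dictionary names are our own.

```python
def true_skill_statistic(sensitivity, specificity):
    """TSS = sensitivity + specificity - 1 (inputs as proportions between 0 and 1)."""
    return sensitivity + specificity - 1.0

# Sensitivity and specificity as reported in the Results (converted to proportions).
reported = {
    "LR, Ascaris spp.": (0.537, 0.833),
    "LR, hookworm": (0.154, 0.982),
    "RP, Ascaris spp.": (0.387, 0.895),
    "RP, hookworm": (0.238, 0.906),
}
tss = {model: round(true_skill_statistic(se, sp), 3)
       for model, (se, sp) in reported.items()}
```

A TSS of zero corresponds to a classifier that performs no better than random classification, so all four values here indicate some skill, with the hookworm models closer to zero.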
3. Results
The study population consisted of 464 participants who completed a questionnaire and provided a stool sample. Participant characteristics are summarised in Supplementary Table S3. Overall Ascaris spp. prevalence was 39.2% (95% CI: 15.1–70.1) and hookworm prevalence was 14.7% (95% CI 0.4–39.2). Nearly all hookworm infections (94%) were caused by N. americanus. The mean participant age was 9 years (S.D. 2.5) and 49.4% (95% CI: 44.3–54.4) were male. Over two-thirds of participants (67.6%, 95% CI: 42.8–85.4) reported having a household latrine; however, 57.8% (95% CI: 48.7–66.3) of participants practised open defecation. A protected household water source using the WHO/UNICEF Joint Monitoring Program definition (WHO, 2015) was reported by 60.9% (95% CI: 40.7–77.9) of participants. Hand hygiene behaviour was variable with 33.8% (95% CI: 25.2–43.8) of participants reporting handwashing before food contact and 56.0% (95% CI: 39.2–71.6) reporting handwashing after defecating.
3.1. Environmental characteristics
Environmental characteristics are summarised in Supplementary Table S4. Average monthly precipitation ranged from 103 to 105 mm while annual mean temperature ranged from 18.3 to 26.3 °C. NDVI and EVI generally indicated tropical vegetation. Savanna was the most predominant land cover type. Elevation between communities varied from sea level to 1515 metres. Acidic soil type (pH 5.5–6.49) was most common.
3.2. Logistic regression
Various WASH, demographic and environmental variables were significant at P < 0.2 in univariable analyses for Ascaris spp. and hookworm infection (Supplementary Tables S5 and S6). Results of the final multivariable models for Ascaris spp. and hookworm are presented in Tables 2 and 3, respectively.
Table 2.
Covariate | n/meanᵃ | % infected/S.D.ᵃ | aOR | 95% CI | P value |
---|---|---|---|---|---|
Age (years)ᵇ | 9 | 2.5 | 1.02 | 0.93–1.11 | 0.714 |
Sex | | | | | |
Female | 235 | 41.3 | Ref | | |
Male | 229 | 37.1 | 0.83 | 0.53–1.31 | 0.432 |
School toilet use | | | | | |
School does not have toilet | 357 | 34.7 | Ref | | |
Does not use school toilet | 85 | 65.9 | 2.35 | 1.27–4.35 | 0.007 |
Uses school toilet | 19 | 5.2 | 0.27 | 0.03–2.52 | 0.248 |
Household water is treated by boiling | | | | | |
Yes | 385 | 40.5 | 0.28 | 0.12–0.64 | 0.003 |
No | 74 | 29.7 | Ref | | |
Mean temperature in coldest month (°C)ᶜ | 14.6 | 3.1 | 0.68 | 0.47–0.83 | <0.001 |
NDVIᵈ | 0.75 | 0.05 | 1.10 | 1.01–1.20 | 0.027 |
Random effects variance (95% CI) | | | | | |
School | 0.61 (0.10–3.62) | | | | |
Village | <0.001 | | | | |
n, sample size; % infected, proportion infected; aOR, adjusted odds ratio; 95% CI, 95% confidence interval; Ref, reference category; NDVI, normalised difference vegetation index. Bold text indicates P < 0.05.
The model includes 456 participants from six schools and 17 communities, with school and community as random effects. Values presented are produced from 100% of the data.
ᵃ Age, mean temperature in coldest month and NDVI are all continuous variables so the relevant mean and S.D. are presented instead of sample size and % infected, respectively.
ᵇ aOR for age refers to a 1 year increase in age.
ᶜ Mean temperature in coldest month refers to the average mean temperature in July from 1970–2000.
ᵈ NDVI is an index quantifying vegetation and is measured from −1 to 1, with values closer to 1 indicating more vegetation. aOR for NDVI refers to a 0.01 unit increase in NDVI as the variable was transformed by a factor of 100.
Table 3.
Covariate | n/meanᵃ | % infected/S.D.ᵃ | aOR | 95% CI | P value |
---|---|---|---|---|---|
Age (years)ᵇ | 9 | 2.5 | 1.24 | 1.10–1.40 | <0.001 |
Sex | | | | | |
Female | 235 | 13.2 | Ref | | |
Male | 229 | 16.2 | 1.23 | 0.68–2.22 | 0.324 |
Cleaning oneself with water after defecating | | | | | |
Yes | 222 | 6.7 | 0.37 | 0.18–0.76 | 0.002 |
No | 235 | 22.1 | Ref | | |
Socioeconomic quintile | | | | | |
Quintile 1 or 2 (poorest) | 119 | 27.7 | Ref | | |
Quintile 3 | 105 | 7.6 | 0.19 | 0.08–0.47 | <0.001 |
Quintile 4 | 114 | 13.2 | 0.32 | 0.14–0.69 | 0.004 |
Quintile 5 | 126 | 9.5 | 0.42 | 0.17–1.02 | 0.056 |
Random effects variance (95% CI) | | | | | |
School | 0.76 (0.19–3.06) | | | | |
Village | <0.0001 | | | | |
n, sample size; % infected/S.D., proportion infected or standard deviation; aOR, adjusted odds ratio; 95% CI, 95% confidence interval; Ref, reference category. Bold text indicates P < 0.05.
The model includes 456 participants from six schools and 17 communities, with school and community as random effects. Values presented are produced from 100% of the data.
ᵃ Age is a continuous variable so mean and S.D. were provided instead of sample size and % infected, respectively.
ᵇ aOR for age refers to a 1 year increase in age.
For Ascaris spp., odds of infection were significantly increased in children who had access to, but did not use, their school toilet compared with children whose school had no toilet (adjusted odds ratio (aOR) 2.35, 95% CI: 1.27–4.35) (Table 2). Children from households where water was treated by boiling had significantly decreased odds of infection (aOR 0.28, 95% CI: 0.12–0.64). Higher temperature in the coldest month (July) was associated with decreased odds of infection (aOR 0.68, 95% CI: 0.47–0.83), while a higher NDVI, indicating denser vegetation, was associated with increased odds of infection (aOR 1.10, 95% CI: 1.01–1.20). In a sensitivity analysis excluding environmental variables, the same non-environmental variables remained significant (Supplementary Table S7).
For hookworm, odds of infection were significantly higher with increasing age. Cleaning oneself with water after defecating was associated with significantly decreased odds of infection (aOR 0.37, 95% CI: 0.18–0.76). Belonging to socioeconomic quintile 3 or 4 was associated with decreased odds of infection compared with quintile 1 (poorest) or 2. No environmental variables were included in the final model for hookworm.
For both Ascaris spp. and hookworm outcomes, random effects variance indicated some variation at the school level but very little variation at the community level (Tables 2 and 3).
Model diagnostics comparing predicted and actual classification of infection revealed the Ascaris spp. model had a sensitivity of 53.7%, specificity of 83.3%, TSS of 0.370 and AUC of 0.738. The hookworm model had a sensitivity of 15.4%, specificity of 98.2%, TSS of 0.136 and AUC of 0.757.
3.3. Recursive partitioning
Classification trees for Ascaris spp. and hookworm identified a range of demographic, WASH and environmental variables as influential (Fig. 1). The final tree for Ascaris spp. was lightly pruned to seven nodes using a CP of 0.17, as complete pruning based on lowest x-error resulted in a tree with two nodes. A total of eight outcomes were identified. The variables in order of conditionality were school, NDVI, household toilet structure, elevation and household water source availability, followed by age and wearing shoes outside the home.
For hookworm, complete pruning resulted in zero nodes so an unpruned final model with five nodes and a CP of 0.01 was used. Six outcomes were identified. The variables in order of conditionality were school, cleaning oneself with water after defecating, household with more than six people, age and handwashing after defecating.
Model diagnostics revealed that the Ascaris spp. model had a sensitivity of 38.7%, specificity of 89.5%, TSS of 0.282 and AUC of 0.760. The hookworm model had a sensitivity of 23.8%, specificity of 90.6%, TSS of 0.144 and AUC of 0.691. A sensitivity analysis on an additional two different partitioned datasets revealed differences in the variables identified but the proportion of environmental, WASH and demographic variables remained relatively consistent (Supplementary Figs. S1, S2).
3.4. Bayesian networks
Naïve, TAN and expert structured networks were constructed for Ascaris spp. and hookworm. In the Ascaris spp. naïve network, a total of 16 independent variables were included, consisting of environmental variables, household water source, school toilet use, school, age and sex (Fig. 2A). A TAN network was constructed based on the same variables, revealing how school was correlated with almost all variables, with the exception of a few environmental variables that were correlated with one another (Fig. 3A).
For hookworm, the initial naïve model with the highest AUC had 0% sensitivity. The variables for annual mean temperature and precipitation in driest month were added back into the model to improve the sensitivity performance, selected based on next highest variance reductions. Additional variables included in the hookworm naïve network were cleaning oneself with water after defecating, SES, school, age and sex (Fig. 2B). The TAN network revealed that all variables were correlated with school (Fig. 3B).
Scenario analyses were conducted on each TAN network to explore how changing the probability of a response category within a variable would impact the predicted outcome (Table 4). In the scenario where all participants had access to and used their school toilet, and all households had an improved water source, a decrease of 36.05 percentage points in Ascaris spp. infection prevalence was predicted. In the scenario where all participants cleaned themselves with water after defecating and all participants had wealth equivalent to those in SES quintile 3, 4, or 5, a decrease of 11.70 percentage points in hookworm infection prevalence was predicted.
Table 4.
Scenario | Predicted % infection | Change in % infection |
---|---|---|
Scenarios for Ascaris spp. infection | | |
A: All schools have toilets that are used by participants | 5.19 | −34.01 |
B: All household water sources are tubewell or piped | 31.30 | −7.90 |
C: Scenario A and B combined | 3.15 | −36.05 |
Scenarios for hookworm infection | | |
A: All participants clean themselves with water after defecating | 5.52 | −9.18 |
B: All participants same wealth as participants in socioeconomic quintile 3, 4 and 5 | 9.29 | −5.41 |
C: Scenario A and B combined | 3.00 | −11.70 |
In each scenario the response category under investigation was set to 100%. “Predicted % infection” is the predicted prevalence of infection (Ascaris spp. or hookworm, as indicated) in each scenario. “Change in % infection” refers to the difference, in percentage points, between predicted prevalence in each scenario and the existing prevalence in the actual data (Ascaris spp. 39.2%; hookworm 14.7%).
Results of expert structured models are shown in Supplementary Fig. S3. Scenario analyses for expert structured BNs are not presented as they were outperformed by the TAN networks.
Model diagnostics indicated little overall variation in median AUC values across the 100 trials, but a greater range of AUC values between trials was observed for the hookworm TAN model (Fig. 4). Only positive values (excluding outliers) for TSS were observed in all models. Hookworm naïve and TAN models had lower median TSSs compared with Ascaris spp. models. Overall, hookworm models tended to have higher specificity compared with both Ascaris spp. models, and the opposite was observed for sensitivity.
3.5. Comparisons among techniques
Table 5 provides a summary of risk factors identified by each modelling approach. For Ascaris spp., only NDVI was identified across all three statistical techniques. For hookworm, cleaning oneself with water after defecating was identified across all techniques and only BNs identified environmental factors. RP classification trees identified the most demographic and WASH variables. Logistic regression identified the fewest variables overall, with no variables uniquely identified for hookworm.
Table 5.
Predictor | Ascaris spp. (techniques that identified predictor as a risk factor) | Hookworm (techniques that identified predictor as a risk factor) |
---|---|---|
School toilet use | LR, BN | |
Household water treated by boiling | LR | |
Household toilet has slab | RP | |
Household water availability | RP | |
Washes hands after defecating | RP | |
Cleans themself with water after defecating | | LR, RP, BN |
Shoe wearing outside home | RP | |
Age | RP | LR, RP |
More than six people in household | RP | |
Socioeconomic status | LR, BN | |
Elevation | RP, BN | |
Household water source | BN | |
Precipitation (a) | BN | BN |
Slope | BN | |
Soil texture | BN | |
Soil pH | BN | |
Land cover | BN | |
EVI | ||
Temperature (b) | LR, BN | BN |
NDVI | LR, RP, BN |
EVI, enhanced vegetation index; NDVI, normalised difference vegetation index.
Risk factors presented for BNs refer to the naïve models which were the same for tree augmented naïve (TAN) models.
(a) Different variations of the precipitation variable were identified by Ascaris spp. BN (mean precipitation in wettest quarter, monthly average precipitation) and for hookworm BN (mean precipitation in driest month).
(b) Different variations of the temperature variable were identified by BN (mean temperature in warmest quarter, annual mean temperature) and LR (mean temperature in coldest month).
4. Discussion
Studies examining WASH and environmental risk factors for STH infections have not consistently identified the same variables associated with infection. Almost all studies have used the standard LR approach. In this study we hypothesised that using alternative statistical techniques may provide further insights and more robust analyses for identifying risk factors. In analysing the same data by LR and the more novel techniques of RP and BNs, our study provided additional insights into risk factors for STH infections.
For Ascaris spp., NDVI (vegetation index) was the only variable consistently identified across all techniques. A previous study from Timor-Leste using LR also found NDVI to be associated with Ascaris spp. infection (Campbell et al., 2017). Higher NDVI suggests denser vegetation, indicative of tropical regions with moist soil that are ideal for survival of STH eggs and larvae (Gordon et al., 2017). For hookworm, only cleaning oneself with water after defecating was consistently identified by all techniques.
For both infection outcomes, fewer influential variables were identified by LR. This may be due to the limitation of LR when handling correlated data (Spratt et al., 2013, Ranganathan et al., 2017). In RP, the model considers which variable is most strongly associated with the outcome, conditional on all the variables selected prior (Gass et al., 2014). While some issues with accuracy of measurement error have been raised in other contexts (Zeger et al., 2000, Tolbert et al., 2007, Strobl et al., 2009), this approach enables all correlated variables in the dataset to be considered. Notably, of the three techniques, RP identified the largest number of WASH variables. BNs can also model correlated variables as seen in our naïve and TAN models that included many environmental variables, whereas in LR, most were removed due to collinearity.
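To make the RP selection step concrete, the sketch below shows how a single partition might be chosen from two correlated binary predictors. It assumes Gini impurity as the splitting criterion, which this section does not state, and uses a small hypothetical dataset; the variable names are invented for illustration and are not the study variables.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 outcome labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def split_gain(rows, outcome, predictor):
    """Reduction in Gini impurity from splitting rows on a binary predictor."""
    parent = [r[outcome] for r in rows]
    left = [r[outcome] for r in rows if r[predictor] == 0]
    right = [r[outcome] for r in rows if r[predictor] == 1]
    n = len(rows)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted


def best_split(rows, outcome, predictors):
    """Predictor giving the largest impurity reduction within this subgroup."""
    return max(predictors, key=lambda v: split_gain(rows, outcome, v))


# Hypothetical participants; the two predictors are correlated with each other.
rows = [
    {"infected": 1, "no_handwash": 1, "no_shoes": 1},
    {"infected": 1, "no_handwash": 1, "no_shoes": 1},
    {"infected": 1, "no_handwash": 1, "no_shoes": 0},
    {"infected": 0, "no_handwash": 0, "no_shoes": 1},
    {"infected": 0, "no_handwash": 0, "no_shoes": 0},
    {"infected": 0, "no_handwash": 0, "no_shoes": 0},
]

chosen = best_split(rows, "infected", ["no_handwash", "no_shoes"])
print("first partition:", chosen)
```

A full tree would repeat `best_split` within each resulting subgroup, so every later variable is chosen conditional on the earlier partitions. Correlated predictors are never dropped beforehand; they simply compete at each step.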
In terms of model performance, median AUC and TSS were similar for all three types of models. Specificity was higher than sensitivity across all models, consistent with an RP analysis of WASH risk factors for STH infections, and likely due to the imbalanced data sample having more non-infected than infected participants (Gass et al., 2014). This may also explain why hookworm models tended to have lower sensitivity compared with the respective Ascaris spp. models, as there was a higher prevalence of Ascaris spp. infection.
One of the major benefits of LR is its use in multi-level modelling where accounting for clustering is possible. Our findings suggest school level variation was important in our final multivariable LR models. One previous study has demonstrated a method of accounting for clustering in BNs; however, this was limited to naïve networks (Fernández et al., 2014). There are no known methods of accounting for clustering in RP (Gass et al., 2014). To attempt to address this limitation, the school variable was included as a fixed effect in BN and RP models. For RP, the school variable was the first partition in both classification trees, indicating that all other variables identified were conditional on school. It is therefore possible that some variables were not identified because they did not have a conditional relationship with school. In BNs, both TAN models revealed that almost all variables had a relationship with the school variable. This limits the causal inferences that can be drawn from our BNs, even though the capacity for causal inference is often an attractive reason for using this method (Joffe et al., 2012).
Both BNs and RP produced graphical representations that provided novel insights beyond identifying risk factors. In RP, the Ascaris spp. and hookworm classification trees depicted eight and six outcome groups respectively. For some outcome groups, much higher prevalence of infection was predicted, which may indicate at-risk population sub-groups (Lemon et al., 2003). BNs have the advantage of visualising relationships between variables, and being able to model scenarios in real-time without further analyses (Lau et al., 2017). While odds ratios have the benefit of providing effect sizes for predictors, which are critical in research processes such as determining drug efficacy (Knol et al., 2012), the novel insights from RP and BNs provide a more informative and holistic evidence base for informing policy.
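The scenario modelling described above can be illustrated with a minimal sketch of inference in a naïve BN, where the infection node is the parent of every predictor. Setting a predictor to one state for all participants corresponds to entering evidence on that node and reading the updated infection probability. Only the prior of 0.392 is taken from this article (the observed Ascaris spp. prevalence); the conditional probabilities and the node name are hypothetical, so the output does not reproduce the study's results, and this is not the software workflow used by the authors.

```python
def posterior_infection(prior, likelihoods, evidence):
    """P(infected | evidence) in a naive BN.

    likelihoods[var][c][state] is P(var = state | infection status c).
    Predictors without evidence marginalise out and need no term.
    """
    joint = {}
    for c in (0, 1):
        p = prior if c == 1 else 1 - prior
        for var, state in evidence.items():
            p *= likelihoods[var][c][state]
        joint[c] = p
    return joint[1] / (joint[0] + joint[1])


# Observed Ascaris spp. prevalence reported in the article.
PRIOR = 0.392

# Hypothetical conditional probability table (illustrative values only).
LIKELIHOODS = {
    "uses_school_toilet": {
        1: {"yes": 0.2, "no": 0.8},   # among infected participants
        0: {"yes": 0.6, "no": 0.4},   # among non-infected participants
    },
}

baseline = posterior_infection(PRIOR, LIKELIHOODS, {})
scenario = posterior_infection(PRIOR, LIKELIHOODS, {"uses_school_toilet": "yes"})
print(f"baseline: {baseline:.3f}, scenario: {scenario:.3f}")
```

Because updating only requires re-evaluating this product of probabilities, different scenarios can be explored interactively without refitting the model, which is the practical advantage noted in the text.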
This study had several important limitations. Given the small sample size, risk factors are not generalisable. Additionally, LR and RP diagnostics were derived from fewer trials than BNs and should therefore be compared cautiously. Directions for future studies include using a larger dataset, increasing trials for model diagnostics and exploring other types of BNs and ensembles of trees produced from RP. A larger sample would enable school-level analysis to overcome the inability to account for clustering in RP and BNs, and may also improve the performance of expert structured BNs by providing more examples to train models.
To the best of our knowledge, this is the first study that used the same dataset to compare LR, BNs and RP in the context of risk factor identification for STH infections. While BNs and RP are used in other areas of public health (Martha Maria et al., 2017, Arora et al., 2019, Wang et al., 2020), such techniques have rarely been utilised in NTD research. An additional strength of this study is the inclusion of a wide range of variables including environmental, demographic, and WASH access and behaviours. This revealed trends in the risk factors identified by each statistical technique. While these techniques have different approaches and capacities, considerations were also made to keep variables as consistent as possible across techniques.
As research methods continue to evolve, alternative analytical approaches must be explored. As shown in our study, such approaches can reveal novel insights to support more robust evidence-based conclusions. Such evidence may contribute to the development and implementation of tailored interventions to help achieve sustainable STH control.
Acknowledgments
We would like to acknowledge the participants involved in the (S)WASH-D for Worms study and their parents, teachers and community leaders; Salvador Amaral and the (S)WASH-D for Worms field team and data clerks for collecting and entering field data; Stacey Llewellyn (QIMR Berghofer Medical Research Institute, Australia) for conducting the qPCR analysis; and Kinley Wangdi and Rebecca Wardell for providing guidance on environmental data processing. Primary data collection that preceded this analysis was funded by a Bill & Melinda Gates Foundation Grand Challenges Explorations Grant, USA (OPP1119041), awarded to SVN (https://www.gatesfoundation.org/). The funders had no involvement in study design, collection, analysis, interpretation of data, preparation of manuscript or decision to submit the article for publication.
Footnotes
Supplementary data to this article can be found online at https://doi.org/10.1016/j.ijpara.2021.01.005.
References
- Arora P., Boyne D., Slater J.J., Gupta A., Brenner D.R., Druzdzel M.J. Bayesian networks for risk prediction using real-world data: a tool for precision medicine. Value Health. 2019;22:439–445. doi: 10.1016/j.jval.2019.01.006.
- Aw J.Y.H., Clarke N.E., McCarthy J.S., Traub R.J., Amaral S., Huque M.H., Andrews R.M., Gray D.J., Clements A.C.A., Vaz Nery S. Giardia duodenalis infection in the context of a community-based deworming and water, sanitation and hygiene trial in Timor-Leste. Parasit. Vectors. 2019;12:491. doi: 10.1186/s13071-019-3752-9.
- Belward A.S., Estes J.E., Kline K.D. The IGBP-DIS global 1-km land-cover data set DISCover: a project overview. Photogrammetric Eng. Remote Sens. 1999;65:1013–1020.
- Bethony J., Brooker S., Albonico M., Geiger S.M., Loukas A., Diemert D., Hotez P.J. Soil-transmitted helminth infections: ascariasis, trichuriasis, and hookworm. Lancet. 2006;367:1521–1532. doi: 10.1016/S0140-6736(06)68653-4.
- Campbell S.J., Nery S.V., D’Este C.A., Gray D.J., McCarthy J.S., Traub R.J., Andrews R.M., Llewellyn S., Vallely A.J., Williams G.M., Amaral S., Clements A.C.A. Water, sanitation and hygiene related risk factors for soil-transmitted helminth and Giardia duodenalis infections in rural communities in Timor-Leste. Int. J. Parasitol. 2016;46:771–779. doi: 10.1016/j.ijpara.2016.07.005.
- Campbell, S.J., Nery, S.V., Wardell, R., D’Este, C.A., Gray, D.J., McCarthy, J.S., Traub, R.J., Andrews, R.M., Llewellyn, S., Vallely, A.J., Williams, G.M., Clements, A.C.A., 2017. Water, Sanitation and Hygiene (WASH) and environmental risk factors for soil-transmitted helminth intensity of infection in Timor-Leste, using real time PCR. PLoS Negl. Trop. Dis. 11, e0005393.
- Clarke, N.E., Clements, A.C.A., Amaral, S., Richardson, A., McCarthy, J.S., McGown, J., Bryan, S., Gray, D.J., Nery, S.V., 2018. (S)WASH-D for Worms: A pilot study investigating the differential impact of school- versus community-based integrated control programs for soil-transmitted helminths. PLoS Negl. Trop. Dis. 12, e0006389.
- Clarke, N.E., Clements, A.C.A., Bryan, S., McGown, J., Gray, D., Nery, S.V., 2016. Investigating the differential impact of school and community-based integrated control programmes for soil-transmitted helminths in Timor-Leste: the (S)WASH-D for Worms pilot study protocol. Pilot Feasibility Stud. 2, 69.
- Didan K., Munoz A.M., Solano R.M., Huete A. University of Arizona; Arizona: 2015. MODIS Vegetation Index User’s Guide.
- Fenton N., Neil M. CRC Press; Florida: 2012. Risk Assessment and Decision Analysis with Bayesian Networks.
- Fernández A., Gámez J.A., Rumí R., Salmerón A. Data clustering using hidden variables in hybrid Bayesian networks. Prog. Artificial Intell. 2014;2:141–152.
- Filmer D., Pritchett L.H. Estimating wealth effects without expenditure data–or tears: an application to educational enrollments in states of India. Demography. 2001;38:115–132. doi: 10.1353/dem.2001.0003.
- Freeman M.C., Garn J.V., Sclar G.D., Boisson S., Medlicott K., Alexander K.T., Penakalapati G., Anderson D., Mahtani A.G., Grimes J.E.T., Rehfuess E.A., Clasen T.F. The impact of sanitation on infectious disease and nutritional status: a systematic review and meta-analysis. Int. J. Hyg. Environ. Health. 2017;220:928–949. doi: 10.1016/j.ijheh.2017.05.007.
- Friedman N., Geiger D., Goldszmidt M. Bayesian network classifiers. Machine Learn. 1997;29:131–163.
- Garcia, J.C., Cardoso, J.C., 1978. Os Solos de Timor. Memórias da Junta de Investigações Científicas do Ultramar. Lisbon, Portugal.
- Gass K., Addiss D.G., Freeman M.C. Exploring the relationship between access to water, sanitation and hygiene and soil-transmitted helminth infection: a demonstration of two recursive partitioning tools. PLoS Negl. Trop. Dis. 2014;8 doi: 10.1371/journal.pntd.0002945.
- Gazzinelli A., Oliveira-Prado R., Matoso L.F., Veloso B.M., Andrade G., Kloos H., Bethony J.M., Assunção R.M., Correa-Oliveira R. Schistosoma mansoni reinfection: Analysis of risk factors by classification and regression tree (CART) modeling. PLoS One. 2017;12 doi: 10.1371/journal.pone.0182197.
- Gordon C.A., Kurscheid J., Jones M.K., Gray D.J., McManus D.P. Soil-transmitted helminths in tropical Australia and Asia. Trop. Med. Infect. Dis. 2017;2:56. doi: 10.3390/tropicalmed2040056.
- Hijmans R.J., Cameron S.E., Parra J.L., Jones P.G., Jarvis A. Very high resolution interpolated climate surfaces for global land areas. Int. J. Climatol. 2005;25:1965–1978.
- Hosmer D.W., Jr, Lemeshow S., Sturdivant R.X. John Wiley & Sons; New York: 2013. Applied Logistic Regression.
- Inpankaew T., Schär F., Dalsgaard A., Khieu V., Chimnoi W., Chhoun C., Sok D., Marti H., Muth S., Odermatt P., Traub R.J. High prevalence of Ancylostoma ceylanicum hookworm infections in humans, Cambodia, 2012. Emerg. Infect. Dis. 2014;20:976–982. doi: 10.3201/eid2006.131770.
- Joffe M., Gambhir M., Chadeau-Hyam M., Vineis P. Causal diagrams in systems epidemiology. Emerg. Themes Epidemiol. 2012;9:1. doi: 10.1186/1742-7622-9-1.
- Knol M.J., Algra A., Groenwold R.H.H. How to deal with measures of association: a short guide for the clinician. Cerebrovasc. Dis. 2012;33:98–103. doi: 10.1159/000334180.
- Kyu H.H., Abate D., Abate K.H., Abay S.M., Abbafati C., Abbasi N. Global, regional, and national disability-adjusted life-years (DALYs) for 359 diseases and injuries and healthy life expectancy (HALE) for 195 countries and territories, 1990–2017: a systematic analysis for the Global Burden of Disease Study 2017. Lancet. 2018;392:1859–1922. doi: 10.1016/S0140-6736(18)32335-3.
- Lau C.L., Mayfield H.J., Lowry J.H., Watson C.H., Kama M., Nilles E.J., Smith C.S. Unravelling infectious disease eco-epidemiology using Bayesian networks and scenario analysis: a case study of leptospirosis in Fiji. Environ. Model. Software. 2017;97:271–286.
- Lemon S.C., Roy J., Clark M.A., Friedmann P.D., Rakowski W. Classification and regression tree analysis in public health: methodological review and comparison with logistic regression. Ann. Behav. Med. 2003;26:172–181. doi: 10.1207/S15324796ABM2603_02.
- Llewellyn S., Inpankaew T., Nery S.V., Gray D.J., Verweij J.J., Clements A.C.A., Gomes S.J., Traub R., McCarthy J.S. Application of a multiplex quantitative PCR to assess prevalence and intensity of intestinal parasite infections in a controlled clinical trial. PLoS Negl. Trop. Dis. 2016;10 doi: 10.1371/journal.pntd.0004380.
- Loveland T.R., Belward A.S. The International Geosphere Biosphere Programme Data and Information System global land cover data set (DISCover) Acta Astronautica. 1997;41:681–689.
- Martha Maria F., Carmen Elena V., Paloma G. Use of recursive partitioning analysis in clinical trials and meta-analysis of randomized clinical trials, 1990–2016. Rev. Recent Clin. Trials. 2017;12:3–7. doi: 10.2174/1574887111666160916144658.
- Martins N., McMinn P. Timor-Leste Ministry of Health; Dili, Timor Leste: 2012. Timor-Leste National Parasite Survey.
- Mayfield H.J., Smith C.S., Lowry J.H., Watson C.H., Baker M.G., Kama M., Nilles E.J., Lau C.L. Predictive risk mapping of an environmentally-driven infectious disease using spatial Bayesian networks: a case study of leptospirosis in Fiji. PLoS Negl. Trop. Dis. 2018;12 doi: 10.1371/journal.pntd.0006857.
- Montresor A. World Health Organization; France: 2012. Eliminating Soil-transmitted Helminthiases as a Public Health Problem in Children.
- NASA Land Processes Distributed Active Archive Center (LP DAAC), 2000. MODIS MOD13Q1. version 5. USGS/Earth Resources Observation and Science (EROS) Center, Sioux Falls, South Dakota.
- Needham C.J., Bradford J.R., Bulpitt A.J., Westhead D.R. A primer on learning in Bayesian networks for computational biology. PLoS Comput. Biol. 2007;3 doi: 10.1371/journal.pcbi.0030129.
- Nguefack-Tsague, G., 2011. Using Bayesian networks to model hierarchical relationships in epidemiological studies. Epidemiol. Health 33, e2011006.
- Norsys Software Corp., 2020. 2.1.3 Discrete vs. Continuous, Basic Netica Operation. Accessed 15 March 2020, https://www.norsys.com/tutorials/netica/secB/tut_B2.htm.
- Ong J., Liu X., Rajarethinam J., Kok S.Y., Liang S., Tang C.S., Cook A.R., Ng L.C., Yap G. Mapping dengue risk in Singapore using Random Forest. PLoS Negl. Trop. Dis. 2018;12 doi: 10.1371/journal.pntd.0006587.
- Ranganathan P., Pramesh C.S., Aggarwal R. Common pitfalls in statistical analysis: logistic regression. Perspect. Clin. Res. 2017;8:148–151. doi: 10.4103/picr.PICR_87_17.
- Somodi I., Lepesi N., Botta-Dukát Z. Prevalence dependence in model goodness measures with special emphasis on true skill statistics. Ecol. Evol. 2017;7:863–872. doi: 10.1002/ece3.2654.
- Spratt H., Ju H., Brasier A.R. A structured approach to predictive modeling of a two-class problem using multidimensional data sets. Methods. 2013;61:73–85. doi: 10.1016/j.ymeth.2013.01.002.
- Statistics Timor-Leste. Direccao Geral Estatistica-Ministerio das Financas Timor-Leste; Dili: 2019. Município Aileu Esboços Mapa Suco No Aldeia.
- Statistics Timor-Leste. Direccao Geral Estatistica-Ministerio das Financas Timor-Leste; Dili: 2019. Município Manufahi Esboços Mapa Suco No Aldeia.
- Strobl C., Malley J., Tutz G. An introduction to recursive partitioning: rationale, application, and characteristics of classification and regression trees, bagging, and random forests. Psychol. Methods. 2009;14:323–348. doi: 10.1037/a0016973.
- Strunz E.C., Addiss D.G., Stocks M.E., Ogden S., Utzinger J., Freeman M.C. Water, sanitation, hygiene, and soil-transmitted helminth infection: a systematic review and meta-analysis. PLoS Med. 2014;11 doi: 10.1371/journal.pmed.1001620.
- Therneau, T.M., Atkinson, E.J., 1997. An introduction to recursive partitioning using the RPART routines. Tech. Rep. 61. URL http://www.mayo.edu/hsr/techrpt/61.pdf.
- Tolbert P.E., Klein M., Peel J.L., Sarnat S.E., Sarnat J.A. Multipollutant modeling issues in a study of ambient air quality and emergency department visits in Atlanta. J. Expo. Sci. Environ. Epidemiol. 2007;17:S29–S35. doi: 10.1038/sj.jes.7500625.
- United Nations Development Program [UNDP] United Nations Development Programme; New York: 2018. Human Development Indices and Indicators: 2018 Statistical Update.
- Vaz Nery S., Clarke N.E., Richardson A., Traub R., McCarthy J.S., Gray D.J., Vallely A.J., Williams G.M., Andrews R.M., Campbell S.J., Clements A.C.A. Risk factors for infection with soil-transmitted helminths during an integrated community level water, sanitation, and hygiene and deworming intervention in Timor-Leste. Int. J. Parasitol. 2019;49:389–396. doi: 10.1016/j.ijpara.2018.12.006.
- Wang Y.-Q., Jia R.-X., Liang J.-H., Li J., Qian S., Li J.-Y., Xu Y. Effects of non-pharmacological therapies for people with mild cognitive impairment. A Bayesian network meta-analysis. Int. J. Geriatr. Psychiatry. 2020;35:591–600. doi: 10.1002/gps.5289.
- Wardell R., Clements A.C.A., Lal A., Summers D., Llewellyn S., Campbell S.J., McCarthy J., Gray D.J., Nery V.S. An environmental assessment and risk map of Ascaris lumbricoides and Necator americanus distributions in Manufahi District, Timor-Leste. PLoS Negl. Trop. Dis. 2017;11 doi: 10.1371/journal.pntd.0005565.
- Weier, J., Herring, D., 2000. Measuring vegetation (EVI & NDVI). Accessed 12 March 2019, https://earthobservatory.nasa.gov/features/MeasuringVegetation.
- World Health Organization and UNICEF, 2015. Progress on sanitation and drinking water: 2015 update and MDG assessment. World Health Organization, New York.
- World Health Organization. WHO; Geneva: 2017. Preventive Chemotherapy to Control Soil-transmitted Helminth Infections in At-risk Population Groups.
- Zeger S.L., Thomas D., Dominici F., Samet J.M., Schwartz J., Dockery D., Cohen A. Exposure measurement error in time-series studies of air pollution: concepts and consequences. Environ. Health Perspect. 2000;108:419–426. doi: 10.1289/ehp.00108419.