Skip to main content
PLOS Computational Biology logoLink to PLOS Computational Biology
. 2022 Dec 5;18(12):e1010748. doi: 10.1371/journal.pcbi.1010748

Leveraging infectious disease models to interpret randomized controlled trials: Controlling enteric pathogen transmission through water, sanitation, and hygiene interventions

Andrew F Brouwer 1,*, Marisa C Eisenberg 1, Kevin M Bakker 1, Savannah N Boerger 1, Mondal H Zahid 1, Matthew C Freeman 2, Joseph N S Eisenberg 1
Editor: Eric HY Lau3
PMCID: PMC9754603  PMID: 36469517

Abstract

Randomized controlled trials (RCTs) evaluate hypotheses in specific contexts and are often considered the gold standard of evidence for infectious disease interventions, but their results cannot immediately generalize to other contexts (e.g., different populations, interventions, or disease burdens). Mechanistic models are one approach to generalizing findings between contexts, but infectious disease transmission models (IDTMs) are not immediately suited for analyzing RCTs, since they often rely on time-series surveillance data. We developed an IDTM framework to explain relative risk outcomes of an infectious disease RCT and applied it to a water, sanitation, and hygiene (WASH) RCT. This model can generalize the RCT results to other contexts and conditions. We developed this compartmental IDTM framework to account for key WASH RCT factors: i) transmission across multiple environmental pathways, ii) multiple interventions applied individually and in combination, iii) adherence to interventions or preexisting conditions, and iv) the impact of individuals not enrolled in the study. We employed a hybrid sampling and estimation framework to obtain posterior estimates of mechanistic parameter sets consistent with empirical outcomes. We illustrated our model using WASH Benefits Bangladesh RCT data (n = 17,187). Our model reproduced reported diarrheal prevalence in this RCT. The baseline estimate of the basic reproduction number R0 for the control arm (1.10, 95% CrI: 1.07, 1.16) corresponded to an endemic prevalence of 9.5% (95% CrI: 7.4, 13.7%) in the absence of interventions or preexisting WASH conditions. No single pathway was likely able to sustain transmission: pathway-specific R0s for water, fomites, and all other pathways were 0.42 (95% CrI: 0.03, 0.97), 0.20 (95% CrI: 0.02, 0.59), and 0.48 (95% CrI: 0.02, 0.94), respectively. An IDTM approach to evaluating RCTs can complement RCT analysis by providing a rigorous framework for generating data-driven hypotheses that explain trial findings, particularly unexpected null results, opening up existing data to deeper epidemiological understanding.

Author summary

A randomized controlled trial (RCT) testing an intervention to reduce infectious disease transmission can provide high-quality scientific evidence about the impact of that intervention in a specific context, but the results are often difficult to generalize to other policy-relevant contexts and conditions. Infectious disease transmission models can be used to explore what might happen to disease dynamics under different conditions, but the standard use of these models is to fit to longitudinal, surveillance data, which is rarely collected by RCTs. We developed a framework to fit an infectious disease model to steady-state diarrheal prevalence data in water, sanitation, and hygiene RCTs, explicitly accounting for completeness, coverage, compliance, and other factors. Although this framework is developed with water, sanitation, and hygiene interventions for enteropathogens in mind, it could be extended to other disease contexts. By leveraging existing large-scale RCT data sets, it will be possible to better understand the underlying disease epidemiology and investigate the likely outcomes of policy-relevant scenarios. Ultimately, this work can be incorporated into decision making for public health policy and programs.

Introduction

Randomized controlled trials (RCTs) are an important source of scientific evidence in the field of epidemiology. They provide high-quality evidence on the causal impact of interventions. However, they cannot tell us how the trial would have performed in another context (i.e., with a different population), meaning that generalization of the trial results is not straightforward. Accordingly, policy makers eager to capitalize on new information do not have a way to predict the impacts of a related policy for their populations. While systematic reviews and meta-analysis are often employed to increase the collective power of a collection of related trials, it is still difficult to account for all relevant factors directly. In the infectious disease context, an intervention’s effectiveness will be impacted by variability in baseline exposures (conditions), differences in background rates of infection or disease burden across study sites, heterogeneous interventions, or differences coverage and adherence of the intervention. Mechanistic modeling approaches may be complementary to RCTs by offering a means to generalize an RCT’s results by directly accounting for these factors.

Infectious disease transmission models are an important tool that allow us to make inferences about disease spread and dynamics within a population. The standard use of transmission models for inference largely relies on time-series incidence data, usually in the form of passive surveillance, where the dynamic signature of time-series data can provide valuable information about an infectious disease transmission system [1]. However, longitudinal surveillance is only one of many types of data that can provide insight into the epidemiology of infectious diseases. RCTs are concerned with comparisons between individuals across intervention arms in the form of relative risks [24] and are generally do not include the collection of longitudinal surveillance data. If mathematical models could be fit directly to the data from large-scale RCTs (rather presupposing generalizability by using RCT results as parameters), it would open up a wealth of data that could be used to better understand epidemiological findings. Mechanistic models can play a pivotal role in developing intervention strategies and estimating what health benefits can be expected under various scenarios.

Decades of RCTs and observational studies have been conducted in the field of water, sanitation, and hygiene (WASH), constituting a wealth of data on the epidemiology of enteric disease [5]. WASH improvements have been responsible for major public health gains by greatly reducing diarrheal disease burden caused by a variety of enteropathogens. Reducing the diarrheal burden can also result in improved nutrition and reduced stunting of child growth [6]. Transmission of the enteropathogens responsible for diarrheal disease can occur through multiple, interconnected pathways by which a susceptible person may come in contact with pathogens shed (in feces) by an infected individual. These pathways are often visualized as an “F-diagram,” illustrating some of the potential transmission routes (e.g., feces, fingers, fomites, fluids, and food) and potential points of intervention [7] (Fig 1). Recent large scale, well-powered intervention trials have not shown the expected health benefits of WASH interventions, namely reduced diarrhea and improved growth [814]. Moreover, while WASH improvements such as latrines have demonstrable efficacy at the household level, they have not yielded the expected health improvements at the community level [15]. This failure of community-level efficacy is likely due to a combination of factors, including that the interventions did not sufficiently reduce transmission across the multiple pathways (completeness), the fraction of the population given the intervention did not reach a sufficient level to induce herd protection (coverage), the adherence to the interventions was not sufficiently great (compliance), the intervention was not a substantial improvement over existing infrastructure (baseline WASH conditions), the disease burden was too great or too small to see an impact (baseline disease conditions), or the interventions did not reduce transmission (efficacy) [16, 17].

Fig 1. F-diagram.

Fig 1

Simplified “F-diagram” illustrating different transmission pathways [7]. The blue bars show how specific types of interventions may interrupt transmission along these pathways. The dotted line (not traditionally included in the F-diagram), highlights that infected individuals continue to contribute to environmental contamination.

The goal of our study is to develop a framework that allows us to generalize the results of WASH RCTs by mechanistically accounting for completeness, coverage, compliance, WASH and disease conditions, and efficacy. Through this analysis, we aim to elucidate the mechanisms underlying how specific interventions designs impact transmission and disease incidence in different contexts, ultimately allowing us to inform the next generation of WASH research and programming and, more broadly, allowing RCT results to be generalized to other populations. To this end, we developed a compartmental infectious disease modeling framework and parameter estimation approach to analyzing RCT data. This analytical framework was designed to incorporate the data underlying relative risk estimates and other contextual data collected by an RCT to calibrate a mechanistic model by determining sets of mechanistic parameters that are consistent with the RCT outcomes. We applied our framework and approach using data from WASH Benefits Bangladesh [11, 18], a large, seven arm, cluster-randomized controlled trial on the impact of WASH and nutrition on diarrhea and child growth outcomes. Specifically, the model generated prevalence estimates from steady-state simulations accounting for multiple environmental pathways, individual adherence to multiple WASH interventions, preexisting WASH conditions, and the contribution of the subpopulation that was not enrolled into the RCT study to transmission. We used sampling-importance resampling to determine parameter sets consistent with the data and quantify the uncertainty in key parameters of interest, such as intervention efficacy and the relative contribution of different environmental pathways. These results will be used in future work to examine counterfactual questions (i.e., “what would have happened if … ?”) by simulating how the results of the calibrated model change under different scenarios, thereby addressing knowledge gaps and generating hypotheses about how best to improve disease outcomes in future.

Methods

Overview

The goal of this paper is to present a generalized approach to using infectious disease models to analyze WASH RCTs. Accordingly, the methods development is one of the primary results. Because this approach is intended to be generalized, we first open with a discussion of the type of data that our approach is intended to leverage. Then, we develop the model for a single intervention to build familiarity and intuition and discuss how the approach is generalized to multi-arm RCTs. Next, we discuss a hybrid sampling and estimation approach that identifies parameter sets consistent with the data, thereby reducing the dimension of the parameter space and providing estimates of the posterior distribution of each parameter. Finally, we introduce the WASH Benefits Bangladesh RCT, which we will use to illustrate the modeling approach.

Randomized controlled trial design

While no two randomized controlled trials (RCTs) are exactly alike, they often have similar characteristics. The objective of an infectious-disease-related RCT is to determine the effectiveness of an intervention, and, if the interventions are effective, the RCT may estimate pathway-specific attributable risks. RCTs are the one true experimental tool in epidemiology and are often considered the gold standard of scientific evidence [24] because the intervention is randomly assigned, thereby addressing confounding in the design phase of the study. Participants are randomly assigned to one or more groups, or arms. An intervention is applied to one arm (e.g., a water treatment device and health promotion visits from study staff), while the comparison group, the control arm, receives either no intervention, a placebo, or an alternative intervention (e.g., standard care, such as health promotion visits but no water treatment device). Typically, specified measurements are taken at a baseline time point prior to intervention and at one or more follow-up time points after the interventions are applied. More complicated intervention designs are possible, including multiple interventions that are compared to the control or each other. In the case of WASH RCTs, different interventions are used to target different environmental pathways (Fig 1). Multiple interventions applied together are used to evaluate the completeness of the interventions in blocking transmission, i.e., what fraction of transmission is along pathways that the interventions act on.

RCT data are often analyzed using one of two strategies. An intention-to-treat (ITT) approach analyzes data by grouping individuals by their intervention assignment. The ITT approach results in a measure of intervention effectiveness, reflecting real-world usage. Effectiveness can differ from the true underlying potential efficacy of the intervention used under ideal conditions because of imperfect intervention fidelity, i.e., delivery of the intervention, or imperfect intervention compliance/adherence, i.e., the use of the intervention by the recipient as intended. A per protocol approach to analysis of the RCT removes participants who did not receive or did not adhere to their intervention assignment. This approach can provide an estimate of intervention effectiveness that is closer to the true efficacy, but the benefits of randomization are lost. The magnitude of risk reduction associated with an intervention depends not only on the its efficacy at blocking transmission, but also on the endemic pathogen prevalence, the susceptibility of the target population, the quality of the preexisting WASH conditions (e.g., an improved latrine may have greater benefits if it replaces open defecation compared to if it replaces an unimproved latrine), and other factors.

Interventions in a WASH RCT are usually delivered in a way that would be practical for subsequent programs to implement at a greater scale, which means that interventions are usually provided at the household, village, school, or district level, rather than the individual level. Moreover, rather than randomizing individuals to treatment arms, RCTs may employ a cluster-randomized design to randomize larger groups, or clusters, because WASH interventions likely affect those beyond the intended target, i.e., they have both direct effects on the participant and indirect effects on the community through reduced infection pressure. The fraction of the population in an area that receives the interventions is the community coverage of the trial; in some trials, all individuals in a cluster are receive the intervention, and in other trials, only a subset or only eligible individuals receive the intervention.

WASH randomized clinical trial data

For the purposes of developing a general analysis framework, we lay out our assumptions about what is included in a WASH RCT data set reporting on diarrheal outcomes.

  • Random allocation occurs at the cluster level, and we know which intervention arm and cluster each individual is in.

  • We know whether each enrolled individual has preexisting WASH conditions substantively equivalent to the intervention. If not, we assume that the preexisting WASH conditions are not equivalent to the intervention.

  • The preexisting WASH conditions of people not enrolled the study are approximated by the conditions of people in the control arm at that point in time.

  • We know whether each individual received and adhered to their intervention (resulting in a per-protocol-like approach). If not, we assume that all individuals received and adhered to their intervention (resulting in an ITT-like approach).

  • We know the fraction of the population enrolled in the study (e.g., through a census). If not, we will estimate it as a model parameter.

  • The measured outcome for each enrolled individual is a self-report (or guardian report) of whether they recently had (all-cause) diarrhea (e.g., in the past seven days). All diarrhea is assumed to have an infectious etiology.

Mathematical modeling framework

Our framework is comprised of a transmission model representing WASH interventions applied to a subset of a population and a parameter estimation approach designed to fit data to steady-state modeling simulations. Our model structure incorporates several important structural design decisions. First, the model reflects a cluster RCT design where each cluster includes both the fraction of the population enrolled in the study and the remainder that is not. Second, the model explicitly accounts for adherence and fidelity (i.e., the extent to which the intervention is implemented as intended), and therefore it uses a per-protocol-like approach where only those who received and are adhering to the intervention are considered in the protected group. Third, the model accounts for both the impacts of the interventions and preexisting WASH conditions that block the same transmission pathway as the intervention. For example, we would account for both preexisting improved latrines and ones provided by the RCT. The efficacies of the intervention and corresponding preexisting condition may be assumed the same or separately estimated. Finally, everyone within a given cluster is modeled as sharing the same environment. These structural design decisions are specific to our WASH RCT analysis and the WASH Benefits study design in particular. Other study designs and infectious disease systems would necessitate an alternative model structure. Below, we describe the model structure and the parameter estimation approach.

Developing the single-intervention SISE–RCT model

Here, we develop the general SISE–RCT (susceptible, infectious, susceptible, environment model for randomized controlled trials) framework for a single intervention. The basis of our modeling framework is the environmental infectious disease transmission model, the SIRE (susceptible, infectious, recovered, environment) model [19, 20]. This model tracks the fraction of individuals that are susceptible (S), infectious (I), or recovered (R), as well as the concentration of pathogens in the environment (E). Pathogen transmission occurs when susceptible people contact the environment. The contact rate, contact volume, and pathogen infectiousness are all combined into a single transmission rate parameter β with units of per unit pathogen concentration per time [21]. Infectious individuals recover at rate γ and shed pathogen into the environment at rate α. Pathogens decay in the environment at rate ξ. The differential equations governing this model are

dSdt=-βSE,dIdt=βSE-γI,dRdt=γI,dEdt=αI-ξE. (1)

The basic reproduction number of the SIRE model, which is a measure of the epidemic potential of the system, is R0=βαγξ.

For the RCT framework, we modified the above model in several ways. First, because the WASH RCT outcome measure is all-cause and not pathogen-specific diarrhea, we used an SIS (susceptible–infectious–susceptible) framework. Individuals return to the susceptible compartment after their infectious period rather than progressing to a recovered compartment, as they remain susceptible to other enteropathogens. An SIS model can have an endemic equilibrium, where the disease remains prevalent at some level in the population. For many SIS models, the value of this endemic equilibrium is 1-1/R0, so that the larger the R0, the more prevalent the disease at equilibrium.

Second, we explicitly modeled two populations that share a single environment, i.e., a single cluster (Fig 2). The first population, denoted by the subscript +, received and is adhering to an intervention or has a substantively equivalent preexisting WASH condition (e.g., already has an improved latrine in a trial providing improved latrines). For brevity going forward, “preexisting WASH condition” will refer to a preexisting WASH condition that is substantively equivalent to the corresponding intervention. Accordingly, this population has attenuated exposure to or shedding into the environment. The second population, denoted by the subscript −, is not enrolled in the study, or has not received or is not adhering to the intervention and has no preexisting WASH condition. This population has regular exposure and shedding, although it may receive indirect benefits of the intervention through reduced environmental contamination. We use this framework for clusters both in the intervention arm and in the control arm. In the intervention arm, all study households receive the intervention, though not all adhere to the intervention, while local, non-study households do not receive the intervention but may have preexisting WASH conditions that are equivalent to the intervention. In the control arm, no one receives the intervention, and both study households and non-study households may or may not have preexisting WASH conditions.

Fig 2. Environmental transmission in an intervention study.

Fig 2

The shared environment has multiple environmental pathways that correspond to transmission routes depicted on a traditional F-diagram. The first population (subscript +) adheres to an intervention or has preexisting WASH conditions that provide the same protection as the intervention; this population has attenuated exposure to and attenuated shedding into the environment (dashed lines). The second population (subscript -) is not covered by the study or does not adhere to the intervention and has no preexisting WASH conditions; this population has regular exposure and shedding (solid lines).

Let ρ be the fraction of those in a cluster (i.e., with a shared environment) that are adhering to the intervention or an equivalent preexisting WASH condition. Let ρ0 be the fraction of those not in the study that have a preexisting WASH condition equivalent to the intervention. Let ω be the fraction of the population in the study (community coverage). The fraction of the population adhering to intervention or with a preexisting condition is ωρ + (1 − ω)ρ0, and the fraction of the population not adhering to the intervention or preexisting condition population is ω(1 − ρ) + (1 − ω)(1 − ρ0).

Third, we extend the model to account for multiple modes of environmental transmission, E1, E2, …, En. The modes represent different pathways on the F-diagram (Fig 1), such as fluids and fomites. We always include an “other” pathway to account for transmission pathways not intervened on. We define which part of the transmission process is impacted by each of the modeled interventions. An intervention may reduce the transmission rate by reducing the number of pathogens contacted or the susceptibility of the individual. We denote the relative transmission by ϕβi, where subscript βi denotes an impact on transmission from environment i, and thus the modifies the intervention efficacy by a factor of 1-ϕβi. For example, if an intervention has 80% efficacy in reducing transmission, the magnitude of the remaining transmission is 20% of the original. The intervention may instead reduce the shedding of pathogen to the shared environment, with efficacy 1-ϕαi, where αi denotes the impact on shedding into environment i.

The SISE–RCT model of a single intervention has four equations representing the human population and n equations representing the environmental pathways (Fig 2).

dS+dt=-(ϕβ1β1E1+ϕβ2β2E2++ϕβnβnEn)S++γI+,dI+dt=(ϕβ1β1E1+ϕβ2β2E2++ϕβnβnEn)S+-γI+,dS-dt=-(β1E1+β2E2++βnEn)S-+γI-,dI-dt=(β1E1+β2E2++βnEn)S--γI-,dE1dt=α1(ϕα1I++I-)-ξ1E1,dE2dt=α2(ϕα2I++I-)-ξ2E2,dEndt=α2(ϕαnI++I-)-ξnEn. (2)

Identifiability and reparameterization

The initial goal of this work is to determine what model parameter values are consistent with the observed RCT data. As we discuss in a later section in more detail, we connect the model to the data through the modeled steady-state diarrheal prevalence. To estimate the value of a parameter from the data, it must be identifiable from steady-state prevalence, that is, the parameter must have a unique value associated with a given steady state.

To solve for the steady-state values of the model, we first set each dEi/dt equation in the SISE–RCT model (Eq (2)) to 0 (a quasi-steady-state assumption), solve for each Ei, and substitute those expressions into the remaining equations (Eq. (S3) in S1 Appendix). When these resulting equations are at steady state, for each pathway i the model parameters βi, αi, ξi, and γi only appear together in a certain parameter combination and thus are not separately identifiable from steady-state data. We define these identifiable combinations of those four parameters as the pathway-specific reproduction numbers,

R0,i=βiαiγiξi. (3)

The magnitude of each R0,i relates to the strength of transmission along pathway i. Note, too, that R0=iR0,i is the basic reproduction number of the system in Eq (2) in the absence of any efficacious intervention, i.e., each ϕβi=1 and ϕαi=1. For parameter estimation, it is convenient to express the strength of the transmission pathways as relative to the total R0, that is R0,i/R0.

The equations Eq. (S3) in S1 Appendix resulting from the quasi-steady-state assumption on the environmental states and reparameterization (in terms of βα/ξ and γ) still include dynamics for the susceptible and infectious states, and there is not a closed-form solution for the steady state values. However, we can use numerical simulation to calculate the steady state solutions as a function of the R0,i only and not their constituent parameters using the following observation. If we divide the right hand side of each equation in Eq. (S3) in S1 Appendix by γ, the equations are only parameterized by the R0,i, and this set of equations has same steady state values as the original equations (Eq (2)), which is our goal. (Note that the transient dynamics of these equations are no longer biologically meaningful).

dS+dt=-(ϕβ1R0,1(ϕα1I++I-)+ϕβ2R0,2(ϕα2I++I-)++ϕβnR0,n(ϕαnI++I-))S++I+,dI+dt=(ϕβ1R0,1(ϕα1I++I-)+ϕβ2R0,2(ϕα2I++I-)++ϕβnR0,n(ϕαnI++I-))S+-I+,dS-dt=-(R0,1(ϕα1I++I-)+R0,2(ϕα2I++I-)++R0,n(ϕαnI++I-))S-+I-,dI-dt=(R0,1(ϕα1I++I-)+R0,2(ϕα2I++I-)++R0,n(ϕαnI++I-))S--I-. (4)

Adjusting for arm- and time-point-specific variation in force of infection

If an RCT takes multiple measurements (even just a pre-intervention and post-intervention measurement), it may be necessary to adjust the model for systematic differences in the force of infection across those time points. For example, external stressors, such as weather, may be vary across time points, as may important demographic factors such as the age distribution.

While successful randomization should render the baseline disease prevalence across arms to be essentially the same (therefore negating the need to adjust for covariates when estimating relative risks), it may be beneficial to account for actual differences across arms in potential confounders or external factors when fitting the data in mechanistic models to improve our model fits.

We account for any systematic differences in R0 across time points t = 1, …, T and between the control (a = 1) and intervention arms (a = 2), by defining time- and arm- specific relative reproduction number parameters ηt and ηa, relative to the basic reproduction number baseline and the control arm, respectively. Thus, we can represent the diarrheal disease pressure at time t in arm a as R0a,t=ηa·ηt·R0, where ηt=1 = 1 at baseline and ηa=1 = 1 for the control arm. Note that because we are constraining the systematic differences between R0 in the control arm at baseline and R0 at another time or in another arm as being the product of a time effect and an arm effect (ηtηa), the effect of the interventions can still be discerned from the systematic differences in R0. We are, for example, accounting for a higher baseline disease burden in an intervention arm compared to the control at baseline when assessing the intervention impact at endline. Note that the intervention effects would not be distinguishable from the systematic differences in R0 if the systematic effects were allowed to vary jointly across time and arm (ηt,a); i.e., we would not be able to determine whether the intervention worked if the disease burden changed arbitrarily for each arm at each time point.

Summary of model parameters

We have now defined all parameters needed to specify the SISE-RCT model, which will be used to calculate cluster- and intervention vs non-intervention population-specific steady-state prevalence values. These parameters consist of i) the overall basic reproduction number R0; ii) pathway-specific basic reproduction numbers relative to the total basic reproduction R0,i/R0, one for each environmental pathway i; iii) efficacy parameters indicating the effect of the intervention on transmission ϕβ,i and shedding ϕα,i for each pathway i; iv) the arm- and time-specific relative basic reproduction numbers ηt and ηa; and v) the coverage ω (if unknown). Collectively, we denote this set of model parameters {R0,{R0,i/R0},{ϕβ,i},{ϕα,i},{ηa},{ηt},ω} as θ, where internal brackets denote a set of multiple parameters.

Expanding the model framework to an RCT with multiple intervention arms

This single intervention framework can be expanded to account for multiple interventions within a single RCT. In the context of WASH, an RCT may have water, sanitation, and hygiene interventions arms, as well as arms evaluating some combination of these interventions. The RCT may also include interventions indirectly associated with WASH, such as nutrition. For an RCT with multiple interventions, the number of modeled populations is 2 raised to the power of distinct interventions. For example, to model an RCT testing 3 interventions in some number of combinations, we model each cluster as partitioned into 8 populations denoted by whether individuals are independently adhering to each of the three interventions (or preexisting conditions): I000, I100, I010, I001, I110, I101, I011, I111, where a 1 in the subscript represent that individuals in the group receive and adhere to the respective intervention or preexisting WASH condition and a 0 represents that they do not. We refer to these populations as adherence groups. Note that the number of adherence groups does not depend on the number of arms in the RCT. The RCT may or may not be investigating any given combination of interventions, but because individuals may or may not be adhering to each intervention or the associated preexisting condition, all adherence groups are modeled in all clusters in all arms. We replace the single intervention adherence fraction ρ by a vector denoting the fraction of the population in each of adherence groups ρ. We denote the specific distribution of adherence groups in cluster j as ρj. In a cluster, the population not enrolled in the study also has a distribution of adherence groups, denoted ρ0, which only includes adherence to preexisting conditions. Because we do not have a measure of ρ0, as detailed earlier, we assume that it follows the mean baseline distribution of preexisting conditions and is the same in all clusters.

The differential equation model defines a set of steady state prevalence values πj among the adherence groups in cluster j as a function of the model parameters θ and the distribution of adherence groups among people in the study ρj and not in the study ρ0. We denote the steady state prevalence values in this cluster πj(θ, ρj, ρ0) as a function of the model parameters. We are interested in prevalence estimates for each observation k, i.e., for a given individual at a given time point. Because ρj is known from the data for any observation k in cluster j and as ρ0 is assumed to be equal to that of the control group, we can drop explicit dependence on these quantities when we denote the modeled prevalence for observation k as a function of the model parameters, πk(θ). Note that all individuals in the same adherence group in the same cluster at a given time will have the same associated modeled prevalence.

Parameter estimation

Statistical likelihood

We connect the model parameters to the data through a goodness-of-fit function, the likelihood L. Because self-reported diarrhea is a binary outcome, we use a Bernoulli likelihood. As defined above, let θ be the vector of model parameters, πk be the modeled prevalence corresponding to observation k, and xk be the indicator of diarrhea for observation k. Then, the likelihood is given as

L(θ)=Πk((πk(θ))xk(1-πk(θ))1-xk). (5)

Sampling-importance resampling

We expect that in most RCT contexts, most of the model parameters θ will not be strongly determined by the available data, i.e., will have large uncertainty around their values. For example, a 30% water intervention efficacy may explain the data as well as a 70% efficacy, for certain values of the other parameters. In this situation, the likelihood space may be flat or multi-modal. Accordingly, it may not be possible to determine maximum-likelihood estimates, and asymptotic confidence intervals may not be representative of the true uncertainty. Instead, we take a sampling approach to understand the distribution and uncertainty of the parameter values that explain the RCT data. The goal of this approach is akin to parameter space dimension reduction—that is, identifying the low-dimensional manifold in parameter space that corresponds to high goodness-of-fit to the data—rather than individual parameter estimation. We do not expect to recover most, if any, of the parameters with certainty. Instead, these parameter sets will collectively be used in future work to simulate what trial outcomes would have been in counterfactual scenarios to explore the full range of possible outcomes consistent with the RCT.

First, we sample a large number of parameter sets from a prior distribution (such as a uniform distribution, as we use here). Then, we evaluate the importance of these samples by calculating their likelihood (Eq 5). Finally, we resample the parameter sets, weighting the parameter sets by their likelihoods, to generate posterior distributions of our parameters. This Bayesian approach is known as sampling-importance resampling [22, 23].

Hybrid sampling–estimation approach

Instead of sampling all parameters in the sampling step of the sampling-importance resampling procedure, we sample a subset of the parameters, treat them as fixed, and estimate the remaining parameters. This approach creates a hybrid sampled–estimated set of initial parameters that we subsequently calculate the likelihoods for and resample from. The advantage of this approach is that it reduces the number of parameter samples needed by preventing a parameter set that could otherwise fit the data well from being discarded because an identifiable parameter was sampled poorly. This hybrid approach requires that the parameters to be estimated be practically identifiable, given fixed values of the sampled subset of parameters. Because R0 is closely tied to steady state prevalence, there will be a best-fit value of R0 for a given set of coverage, efficacy, and pathway-specific relative R0 parameters. Thus, for each sample of the coverage, efficacy, and relative pathway parameters, we find the maximum-likelihood value of R0 and arm- and time-specific relative basic reproduction numbers ηa and ηt. This maximum likelihood is then associated with the sampled parameter set. Once a likelihood is established for each sampled–estimated parameter set, the posterior distribution can be estimated using sampling-importance resampling. More formally, we take the following steps.

  1. Define sets of parameters to be sampled θsamp={{R0,i/R0}i=1n-1,{ϕiβ}i=1n,{ϕiα}i=1n,ω} and parameters to be estimated θest={R0,{ηa}a=2A,{ηt}t=2T}, taking advantage of any degeneracies (e.g., we do not need to estimate R0n/R0 if we know R0 and {R0,i/R0}i=1n-1).

  2. Define parameter sample sets of values of θsampm for mM using a multivariate uniform distribution or a more efficient algorithm such as Latin hypercube sampling [24] or a Sobol sequence [25].

  3. For each θsampm, use an optimization algorithm to minimize the negative log-likelihood as a function of θest, and set θestm=argminθest(-logL(θest;θsampm)).

  4. Define unnormalized weights νm for each θm={θsampm,θestm} as the corresponding likelihood value divided by the probability of the sample in the originating (uniform) distribution νm=L(θm)/(1/M). Define normalized weights ν¯m=νm/mMνm. It may be preferable to directly compute the normalized weights as
    ν¯m=exp(logL(θm)+ψ)/mM(exp(logL(θm)+ψ)) (6)
    where ψ=minmM(-logL(θm)) is the minimum negative log-likelihood among the parameter samples.
  5. Sample N parameter sets, with replacement, from {θm} using the normalized weights ν¯m.

  6. These N parameter sets approximate the posterior distribution of θ and thus describe our knowledge of about the uncertain parameter. Summarize these distributions using histograms.

We summarize the hybrid sampling–estimation approach in Fig 3.

Fig 3. Schematic.

Fig 3

Schematic of the hybrid sampling–estimation approach to estimating model parameters from Water, Sanitation, and Hygiene (WASH) randomized controlled trials (RCT) data.

WASH Benefits Bangladesh

We demonstrate the modeling framework using data from the WASH Benefits Bangladesh RCT [11, 18]. WASH Benefits Bangladesh measured diarrheal prevalence in children (as well as multiple child growth measures, although we do not consider those outcomes here) at each of three time points (baseline, midline, endline). Households in the study area are typically organized into compounds in which a patrilineal family shares a common space and resources, such as a water source and latrine. A total of 5551 compounds were enrolled, contingent on having a pregnant woman in their second trimester during the enrollment period. The study followed one or more target children born after baseline in the index household, as well as any siblings or children in other households of the compound who were under age 3 at baseline. These compounds were grouped into 720 clusters. Each cluster was assigned to one of 7 arms testing combinations of 4 interventions: water chlorination (W), a double-pit, pour flush improved latrine (S), handwashing with soap and water (H), and supplementary nutrition sachets (N). Of the 720 clusters, 180 were assigned to the control arm (C), while 90 were assigned to each of the water (W), sanitation (S), handwashing (H), nutrition (N), combined water, sanitation, and handwashing (WSH), and all interventions (WSH-N) arms. Specific details on trial design, intervention specifics, and results may be found elsewhere [11, 18]. Because our analysis was a secondary analysis of deidentified data, it is not regulated as human subjects research.

For the purposes of assigning individuals to intervention adherence groups for the model, we classified individuals at each time point in each arm as using or not using each intervention or preexisting WASH condition using the following indicators.

  • W: Free chlorine was detected in stored water.

  • S: Latrine was present and had a functional water seal.

  • H: Primary handwashing location was present with available water and soap.

  • N: At least 50% of expected nutrition sachets were reported as being consumed.

Each of these indicators denote WASH or nutrition conditions that impact susceptibility, exposure, or shedding, as we describe in more detail below. These four indicators collectively describe 24 = 16 adherence groups. In WASH Benefits Bangladesh, chlorine was only measured in arms with the water intervention, and we assumed that there was no use of chlorination in arms not receiving the intervention. Children not in nutrition intervention arms and non-target children in nutrition intervention arms were assumed to have not received supplemental nutrition. Chlorination, handwashing, and latrine interventions were assessed at the compound level, but the chlorination and handwashing interventions were targeted to the index household. We were not able to determine whether non-target children were members of the index household or a non-index household in the same compound. For this analysis, we assumed that all non-target children were covered by the intervention if the target child was; any misspecification may attenuate the estimates of water and hygiene intervention efficacy. We removed individuals with negative reported ages (n = 2), missing reported diarrhea (n = 2,745), or missing in any of the four use indicators (n = 2,660), which left 17,187 individual observations (76% of the original sample) over the three time points. For the remaining data, we plot the arm-specific prevalence of each of the 4 indicators (Fig 4a–4d). Among target children in arms receiving the nutrition intervention, 93% reported consumption of at least 50%, the vast majority of whom (83% of target children) reported 100% consumption.

Fig 4. Prevalence of intervention and preexisting WASH conditions.

Fig 4

Prevalence of (a) free chlorine, (b) latrine with water seal, (c) handwashing station with soap and water, and (d) reported 50% nutrition sachet consumption (only provided to target children). The arms are denoted by combinations of interventions, C: control, W: water, S: sanitation, H: hygiene, N: nutrition.

We connect the data to the model by making assumptions about which transmission pathways the interventions/conditions effect. We model three transmission pathways, namely water w, fomites and hands f, and other o. We assume chlorination (W) reduces exposure from the water pathway ϕβw,W, a latrine water seal (S) reduces shedding into the water pathway ϕαw,S, handwashing with soap and water (H) reduces exposure from the fomites and hands pathway ϕβf,H, and supplemental nutrition (N) reduces susceptibility from all pathways ϕβ,N=ϕβw,N=ϕβf,N=ϕβo,N (where the first subscript, e.g., βw, represents the pathway-specific parameter attenuated by the intervention, and the second subscript, e.g., W, represents the intervention.) We assume transmission and shedding in all other arms and pathways are not attenuated, i.e., ϕβ and ϕα are 1. The S and H interventions have comparable preexisting conditions, but the intervention efficacy may be higher than the preexisting condition efficacy. Accordingly, we estimated separate preexisting condition efficacy parameters, ϕ¯αw,S and ϕ¯βf,H, applied at baseline and in the arms without the corresponding intervention.

In total, we considered 18 model parameters, namely the overall R0, 3 pathway-specific relative R0 parameters, 4 intervention efficacy parameters, 2 preexisting condition efficacy parameters, a coverage parameter, 6 arm-specific relative R0 parameters, and 2 time-point-specific R0 parameters: θ={R0,R0,w/R0,R0,f/R0,ϕβw,W,ϕαw,S,ϕβf,H,ϕβ,N,ϕ¯αw,S,ϕ¯βf,Hω,ηW,ηS,ηH,ηN,ηWSH,ηWSH-N,ηmid,ηend}. Full and steady-state model equations are given in Eq. (S4) and (S5). We used the hybrid sampling–estimation approach (Fig 3) to estimate posterior distributions of each of the 18 parameters. We began with prior set of M = 50,000 parameter sets determined by a Sobol sequence to uniformly cover the parameter space, and we present posterior distributions taking N = 50,000 samples (with replacement) from the prior distribution weighted by the importance (likelihood) of the prior samples.

Results

The hybrid sampling–estimation procedure applied model of the WASH Benefits Bangladesh RCT trial resulted in a set of 50,000 parameter sets in which 3,692 unique samples with repetition dependent on goodness-of-fit. (A comparison of the distribution of model fits in the prior and posterior samples is provided in Fig A in S1 Appendix). These parameter sets reproduced the observed prevalence across the arms and time points (Fig 5a; the analogous figure with midline and endline separately plotted is given in Fig B in S1 Appendix). The observed prevalence values are analogous and comparable to the results shown previously in Luby et al. [11], although the prevalence estimates in Fig 5a are for the subset of the population with full intervention adherence data. As previously reported [11], midline/endline prevalence in the W arm was comparable to the control arm, and the prevalence in the remaining arms were lower than control and similar to each other. There were no statistically significant differences in the observed baseline prevalence across the arms.

Fig 5. Prevalence and prevalence ratio.

Fig 5

a) Prevalence of self-reported diarrhea (7-day recall) in WASH Benefits comparing the baseline (red) to the combined midline/endline (blue) surveys (comparable to result given in [11]), as well as posterior distributions of simulated prevalence (violin plots). b) Prevalence ratios (data and simulated) for each arm for midline/endline relative to base. In both figures, the violin plots indicate the distribution of values in the parameter posterior sample for each arm, and their areas are scaled to the number of observations.

When comparing the relative risk of midline/endline to baseline across arms, arms with combined interventions had lower point estimate prevalence ratios than arms with constituent interventions, e.g., the point estimate for the WSH-N prevalence ratio was below than each of the W, S, H, WSH, and N arms. Moreover, the midline/endline to baseline prevalence ratios for all arms, including the control, were below 1. Together, these results support the inclusion of both arm- and time-point-specific estimates of the variation in R0, allowing us to maximize the amount of information about the pathway and efficacy parameters.

The median estimate of the basic reproduction number R0 corresponding to the control arm at baseline was 1.10 (95% CrI: 1.08, 1.16) (Fig 6a), which corresponds to an endemic prevalence of 9.5% (95% CrI: 7.4, 13.7%) in the absence of any preexisting WASH conditions. Estimates of timepoint- and arm-specific relative R0s are given in the supporting information (Fig C in S1 Appendix). The posterior distributions of the pathway-specific reproduction numbers were wide, indicating uncertainty in the estimates. The mean R0 value for the water pathway accounted for 42% of transmission (R0,w=0.38, 95% CrI: 0.03, 0.97), while the fomite and other pathways accounted for 22% (R0,f=0.17, 95% CrI: 0.02, 0.59) and 35% (R0,o=0.48, 95% CrI: 0.02, 0.94), respectively. However, there is substantial uncertainty around the specific values. Nevertheless, the results indicate that no single pathway would be able to sustain diarrhea transmission alone but that elimination of any single pathway path may not be sufficient to eliminate disease, either.

Fig 6. Reproduction numbers.

Fig 6

Posterior distributions (dark grey) of the (a) overall basic reproduction number R0, (b) the water pathway basic reproduction number, (c) the fomite pathway reproduction number, and (d) the other pathway reproduction number. Prior distributions (light grey) are given for the three sampled parameters; prior distributions are not uniform because they are products of underlying uniformly distributed parameters. Vertical lines give median values.

The observed prevalence values are consistent with a wide range of values for the efficacy of a) water chlorination (ϕβw,W=0.40, 95% CrI: 0.04, 0.98), b) having a latrine with a water seal (ϕαw,S=0.15, 95% CrI: 0.00, 0.86), and c) handwashing with soap and water (ϕβf,H=0.52, 95% CrI: 0.04, 0.97), but are consistent with a comparably narrower range of efficacy of d) consumption of nutrition sachets (ϕβ,N = 0.12, 95% CrI: 0.01, 0.33) (Fig 7). The efficacy of the preexisting condition versions of having a latrine with a water seal and handwashing with soap and water were skewed slightly lower (ϕ¯αw,S=0.08, 95% CrI: 0.00, 0.67) and ϕβf,H=0.34, 95% CrI: 0.01, 0.88)) The median estimate of the community coverage ω was 7.2% (95% CrI: 0.004, 18.6%) of the population was enrolled in the study (Fig D in S1 Appendix); this result is an underestimate because the initial sampling range (up to 20%) did not capture the full tail of the distribution.

Fig 7. Efficacy.

Fig 7

Posterior distributions (dark grey) of the efficacy of (a) water chlorination, (b) latrine water seal, (c) handwashing, and (d) nutrition interventions and posterior distributions (blue) of the efficacy of (b) latrine water seal and (c) handwashing preexisting conditions. Prior distributions (light grey) are given for all four sampled parameters. Vertical lines give median values.

Discussion

Mechanistic models are valuable hypothesis-generating tools, complementing epidemiological analyses often used for hypothesis testing. As we showed here, they can be particularly useful alongside randomized controlled trials (RCTs), which provide rigorous assessments of a specific hypothesis but do not generalize easily to other contexts. Here, we developed a steady-state analytical framework and Bayesian parameter estimation approach that exploits both relative risk estimates and other contextual information collected in RCTs, taking advantage of the rich individual-level data in the trial. In particular, sampling-importance resampling is an ideal algorithm for developing inference from relative risk estimates or other non-epidemic disease data [2628]. Applying this framework to a WASH RCT, we found that while no single pathway was likely sufficient to sustain transmission, reduction of one pathway alone may not be sufficient to eliminate disease, suggesting the potential need for multiple interventions, including some not included in traditional WASH RCTs.

One important feature of our approach is the focus on steady state analysis. Although there is an extensive mathematical biology literature studying steady state properties of dynamics transmission models that focus both on stability analysis and the estimation of R0 [29], little work has been done to build an inferential framework around steady state solutions to take advantage of epidemiological data collected in RCTs and observational studies. Instead, most inferential frameworks built around analyzing infectious disease transmission models are designed to incorporate time-series data often from passive surveillance. For instance, increasingly, studies are using transmission models with epidemic data to estimate R0 [30, 31]. The framework that we developed here takes advantage of relative risk estimates comparing prevalence among different sub-populations that have different pathway-specific exposures [32]. RCTs provide the most rigorous data to exploit since the experimental design reduces confounding biases.

Pathogens that can exploit multiple environmental transmission pathways, like enteric pathogens, require a modeling framework that explicitly and distinctly incorporates these pathways [33]. Modeling multiple pathways allows us to ascertain the combination of interventions that optimally reduces diarrhea, subject to programmatic constraints. In our analysis of the WASH Benefits data, we found that there was not a single dominant environmental pathway that was likely to sustain transmission on its own, suggesting that multiple interventions would be needed to eliminate transmission. The perspective that traditional WASH interventions are not blocking all the important pathways causing infection is actively being discussed in the literature [16, 34, 35]. Our finding that no single pathway was sufficient provides empirical backing to this perspective. We also estimated that the fraction of the population covered by the study was small, likely under 10%. Although the direct effects of interventions can be estimated in RCTs with low community coverage, some WASH interventions, particularly sanitation, act primarily through indirect effects. Previous work has shown that indirect effects are unlikely to be apparent until greater intervention coverage is achieved [36, 37]. Our future work will explore the potential disease reduction outcomes if the WASH Benefits interventions were implemented on a larger scale.

We also found that the reported diarrheal prevalence was consistent with a wide range of values for many of the environmental pathway strength and efficacy parameters. The practical unidentifiability of these parameters is due in part to trade-offs between the pathway strength and efficacy parameters. For example, the prevalence data may be explained by a strong water transmission pathway and weak efficacy of chlorination. Alternatively, the data may be explained by a weak water transmission pathway, regardless of the efficacy of chlorination. While these wide uncertainties may appear a limitation of our approach, they are in fact expected and consistent with our goal of determining parameter sets consistent with the data rather than individual parameter values. In future work, we will use these parameter sets to assess counterfactual questions and generalize the trial results; capturing reasonable fits to the data across the uncertainty in individual parameters is a strength of our approach when it comes exploring the real range of potential outcomes in other scenarios.

Additional data may be able to reduce individual parameter uncertainty, allowing for more detailed insight into the multiple, complicated factors underlying diarrheal disease transmission. New data could better constrain the outcomes of counterfactual and generalizing simulations. Additional data could come in the form of more information about mechanistic parameters, e.g., tests that confirm that the chlorination efficacy should be at least 75%, or a community census indicating the study covered 5–10% of households. More broadly, our approach could be extended to incorporate environmental data, as there are an increasing number of studies collecting environmental samples from different media to inform pathways-specific exposure [3841]. Environmental sampling may be able to add more specificity to exposure variables, which have traditionally been based on the presence of infrastructure (e.g., piped water or the presence of sanitation structures). They may also help to clarify important transmission pathways that are not always covered by WASH interventions, e.g., food, animals, included here as the other transmission pathway. Multiplex molecular technology now allows for more efficient testing of environmental samples for a wide array of pathogens. Information about pathogen-specific disease burden could be incorporated into pathogen-specific models that, for instance, account for the fact that chlorination may be more effective at reducing bacterial transmission than protozoal transmission [42]. Given new, more affordable technologies, it is important to develop and standardize environmental sampling strategies for different transmission pathways in a way that complements standard epidemiological data sets. More work is needed, both theoretical and practical, to ensure that the environmental sampling data can inform transmission pathways and, ultimately, case data. This new molecular technology can also affordably identify pathogens in stool samples, which can improve parameter estimation by replacing self-reported diarrhea outcomes, which are highly variable [43] and of uncertain etiology, with laboratory-confirmed infection. An enhanced understanding of infection, and not just disease, would also more directly inform our disease transmission modeling framework.

Our work is subject to several limitations. We assumed that individuals in a cluster all interact in a shared environment distinct from other environments. A more detailed characterization of the environment could allow for the relaxation of this assumption. We also assumed that the impacts of interventions could be identified from a proxy indicators. Additionally, the data assessed intervention adherence at the compound level, resulting in potential misspecification of the water and hygiene interventions for non-target children not in index household; the likely impact of this misspecification is an attenuation of the estimated efficacy of the water and hygiene interventions. Also, we adjusted for differences in diarrheal prevalence in the different survey years, but did not account specifically for seasonality within the survey years. Finally, we assumed a particular specification of transmission pathways and how the interventions impact them. Although we chose this specification based on our best mechanistic understanding of the interventions and the environment, other specifications—and the sensitivity of the results to the choice of specification—should be examined more closely in future work.

Conclusion

The strength of this work is in the integration of an advanced mathematical framework, a computational approach leading to a robust understanding of uncertainty, and the large, well-executed trial that supplied the data. Our work highlights the benefits of underutilized, interdisciplinary collaborations between mathematical epidemiologists and infectious disease trialists. This framework lays the groundwork for further analysis to better explain WASH RCT results, asking questions about completeness, i.e., the degree to which interventions block most or all transmission pathways, and how the effectiveness of the interventions may increase with increasing community coverage, compliance, conditions, and efficacy. This framework can be used in future work to examine policy-relevant questions about how these factors collectively impact intervention effectiveness, and it can be adapted to a variety of other infectious disease RCTs, e.g., vectorborne disease interventions. The application of mathematical modeling to estimate the impact of WASH interventions across different contexts—community coverage, adherence, background infection prevalence, intervention efficacy—could improve external validity and deliver policy-relevant findings to better inform public infrastructure investment.

Supporting information

S1 Appendix. Supporting information.

The supporting information includes the generic SISE–RCT model equations, the WASH Benefits Bangladesh SISE–RCT model equations, and supplemental results, including distributions of likelihoods, timepoint- and arm-specific relative basic reproduction numbers, and fraction of the population enrolled in the study. Fig A. Distribution of negative log-likelihood fits to the data for parameter samples in the prior and posterior sample sets. Fig B. Prevalence of self-reported diarrhea (7-day recall) in WASH Benefits comparing the baseline (red) to the combined midline (orange) and endline (yellow) surveys, as well as posterior distributions of simulated prevalence (violin plots). Fig C. Distribution of timepoint- and arm-specific relative basic reproduction numbers (R0). The dotted line corresponds to 1.00, or no difference from the control arm at baseline. The solid lines give the mean values of the distributions. Fig D. Posterior (grey) and prior (white) distributions for the estimated fraction of the population enrolled in the study.

(PDF)

Acknowledgments

We thank the WASH Benefits investigators who supported data acquisition and initial interpretation of our work, including Benjamin Arnold, John Colford Jr, Clair Null, Amy Pickering, Andrew Mertens, Ayse Ercumen, Jade Benjamin-Chung, Mahbubur Rahman, and Sania Ashraf.

Data Availability

The WASH Benefits Bangladesh data is publicly available at https://osf.io/tprw2/. The data and code underlying the results of this paper are available on a GitHub repository at https://github.com/afbrouwer/WASH_RCT_transmission_modeling.

Funding Statement

This work was funded by the Bill & Melinda Gates Foundation (www.gatesfoundation.org; grant INV-005081; supported AFB, MCE, SNB, MHZ, MCF, and JNSE) and the National Science Foundation (www.nsf.gov; grant DMS-1853032; supported AFB and MCE). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Keeling MJ, Rohani P, Rohani P. Modeling Infectious Diseases in Humans and Animals. Princeton University Press,(Princeton, NJ:); 2011. [Google Scholar]
  • 2. Akobeng AK. Understanding randomised controlled trials. Archives of disease in childhood. 2005;90(8):840–844. doi: 10.1136/adc.2004.058222 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3. Hariton E, Locascio JJ. Randomised controlled trials—the gold standard for effectiveness research. BJOG. 2018;125(13):1716. doi: 10.1111/1471-0528.15199 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4. Sibbald B, Roland M. Understanding controlled trials. Why are randomised controlled trials important? BMJ: British Medical Journal. 1998;316(7126):201. doi: 10.1136/bmj.316.7126.201 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5. Taing L, Dang N. Water, Sanitation, and Hygiene in Global Health. In: Handbook of Global Health. Springer; 2021. p. 2211–2226. [Google Scholar]
  • 6. Zavala E, King SE, Sawadogo-Lewis T, Roberton T. Leveraging water, sanitation and hygiene for nutrition in low-and middle-income countries: A conceptual framework. Maternal & child nutrition. 2021; p. e13202. doi: 10.1111/mcn.13202 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Wagner EG, Lanoix JN, Organization WH, et al. Excreta disposal for rural areas and small communities. World Health Organization; 1958. [PubMed]
  • 8. Clasen T, Boisson S, Routray P, Torondel B, Bell M, Cumming O, et al. Effectiveness of a rural sanitation programme on diarrhoea, soil-transmitted helminth infection, and child malnutrition in Odisha, India: A cluster-randomised trial. The Lancet Global Health. 2014;2(11):e645–e653. doi: 10.1016/S2214-109X(14)70307-9 [DOI] [PubMed] [Google Scholar]
  • 9. Patil SR, Arnold BF, Salvatore AL, Briceno B, Ganguly S, Colford JM, et al. The effect of India’s total sanitation campaign on defecation behaviors and child health in rural Madhya Pradesh: A cluster randomized controlled trial. PLoS Medicine. 2015;11(8). doi: 10.1371/journal.pmed.1001709 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10. Pickering AJ, Djebbari H, Lopez C, Coulibaly M, Alzua ML. Effect of a community-led sanitation intervention on child diarrhoea and child growth in rural Mali: A cluster-randomised controlled trial. The Lancet Global Health. 2015;3(11):e701–e711. doi: 10.1016/S2214-109X(15)00144-8 [DOI] [PubMed] [Google Scholar]
  • 11. Luby SP, Rahman M, Arnold BF, Unicomb L, Ashraf S, Winch PJ, et al. Effects of water quality, sanitation, handwashing, and nutritional interventions on diarrhoea and child growth in rural Bangladesh: A cluster randomised controlled trial. The Lancet Global Health. 2018;6(3):e302–e315. doi: 10.1016/S2214-109X(17)30490-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12. Null C, Stewart CP, Pickering AJ, Dentz HN, Arnold BF, Arnold CD, et al. Effects of water quality, sanitation, handwashing, and nutritional interventions on diarrhoea and child growth in rural Kenya: a cluster-randomised controlled trial. The Lancet Global Health. 2018;6(3):e316–e329. doi: 10.1016/S2214-109X(18)30005-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13. Rogawski McQuade ET, Platts-Mills JA, Gratz J, Zhang J, Moulton LH, Mutasa K, et al. Impact of Water Quality, Sanitation, Handwashing, and Nutritional Interventions on Enteric Infections in Rural Zimbabwe: The Sanitation Hygiene Infant Nutrition Efficacy (SHINE) Trial. The Journal of Infectious Diseases. 2020;221(8):1379–1386. doi: 10.1093/infdis/jiz179 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14. Knee J, Sumner T, Adriano Z, Anderson C, Bush F, Capone D, et al. Effects of an urban sanitation intervention on childhood enteric infection and diarrhea in Maputo, Mozambique: A controlled before-and-after trial. eLife. 2021;10. doi: 10.7554/eLife.62278 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15. Levy K, Eisenberg JNS. Moving towards transformational WASH. The Lancet Global Health. 2019;7(11):e1492. doi: 10.1016/S2214-109X(19)30396-1 [DOI] [PubMed] [Google Scholar]
  • 16. Cumming O, Arnold BF, Ban R, Clasen T, Esteves Mills J, Freeman MC, et al. The implications of three major new trials for the effect of water, sanitation and hygiene on childhood diarrhea and stunting: a consensus statement. BMC medicine. 2019;17(1):1–9. doi: 10.1186/s12916-019-1410-x [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17. Haque SS, Freeman MC. The Applications of Implementation Science in Water, Sanitation, and Hygiene (WASH) Research and Practice. Environmental Health Perspectives. 2021;129(6):65002. doi: 10.1289/EHP7762 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18. Arnold BF, Null C, Luby SP, Unicomb L, Stewart CP, Dewey KG, et al. Cluster-randomised controlled trials of individual and combined water, sanitation, hygiene and nutritional interventions in rural bangladesh and Kenya: The WASH benefits study design and rationale. BMJ Open. 2013;3(8):1–17. doi: 10.1136/bmjopen-2013-003476 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19. Li S, Eisenberg JNS, Spicknall IH, Koopman JS. Dynamics and Control of Infections Transmitted From Person to Person Through the Environment. American Journal of Epidemiology. 2009;170(2):257–265. doi: 10.1093/aje/kwp116 [DOI] [PubMed] [Google Scholar]
  • 20. Tien JH, Earn DJD. Multiple transmission pathways and disease dynamics in a waterborne pathogen model. Bulletin of mathematical biology. 2010;72(6):1506–33. doi: 10.1007/s11538-010-9507-6 [DOI] [PubMed] [Google Scholar]
  • 21. Brouwer AF, Weir MH, Eisenberg MC, Meza R, Eisenberg JNS. Dose-response relationships for environmentally mediated infectious disease transmission models. PLOS Computational Biology. 2017;13(4):e1005481. doi: 10.1371/journal.pcbi.1005481 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22. Rubin DB. Using the SIR algorithm to simulate posterior distributions. Bayesian statistics. 1988;3:395–402. [Google Scholar]
  • 23. Smith AFM, Gelfand AE. Bayesian statistics without tears: A sampling-resampling perspective. American Statistician. 1992;46(2):84–88. doi: 10.2307/2684170 [DOI] [Google Scholar]
  • 24. Stein M. Large sample properties of simulations using Latin hypercube sampling. Technometrics. 1987;29(2):143–151. doi: 10.1080/00401706.1987.10488205 [DOI] [Google Scholar]
  • 25. Sobol’ IM. On the distribution of points in a cube and the approximate evaluation of integrals. Zhurnal Vychislitel’noi Matematiki i Matematicheskoi Fiziki. 1967;7(4):784–802. [Google Scholar]
  • 26. Havumaki J, Eisenberg JN, Mattison CP, Lopman BA, Ortega-Sanchez IR, Hall AJ, et al. Immunologic and epidemiologic drivers of norovirus transmission in daycare and school outbreaks. Epidemiology. 2021;32(3):351–359. doi: 10.1097/EDE.0000000000001322 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27. Ryckman T, Luby S, Owens DK, Bendavid E, Goldhaber-Fiebert JD. Methods for nodel calibration under high uncertainty: modeling cholera in Bangladesh. Medical Decision Making. 2020;40(5):693–709. doi: 10.1177/0272989X20938683 [DOI] [PubMed] [Google Scholar]
  • 28. Gambhir M, Bockarie M, Tisch D, Kazura J, Remais J, Spear R, et al. Geographic and ecologic heterogeneity in elimination thresholds for the major vector-borne helminthic disease, lymphatic filariasis. BMC Biology. 2010;8(1):22. doi: 10.1186/1741-7007-8-22 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29. Vynnycky E, White R. An introduction to infectious disease modelling. OUP oxford; 2010. [Google Scholar]
  • 30. Dietz K. The estimation of the basic reproduction number for infectious diseases. Statistical methods in medical research. 1993;2(1):23–41. doi: 10.1177/096228029300200103 [DOI] [PubMed] [Google Scholar]
  • 31. Ke R, Romero-Severson E, Sanche S, Hengartner N. Estimating the reproductive number R0 of SARS-CoV-2 in the United States and eight European countries and implications for vaccination. Journal of theoretical biology. 2021;517:110621. doi: 10.1016/j.jtbi.2021.110621 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32. Enger KS, Nelson KL, Clasen T, Rose JB, Eisenberg JNS. Linking quantitative microbial risk assessment and epidemiological data: Informing safe drinking water trials in developing countries. Environmental Science and Technology. 2012;46(9):5160–5167. doi: 10.1021/es204381e [DOI] [PubMed] [Google Scholar]
  • 33. Eisenberg JNS, Scott JC, Porco T. Integrating disease control strategies: Balancing water sanitation and hygiene interventions to reduce diarrheal disease burden. American Journal of Public Health. 2007;97(5):846–852. doi: 10.2105/AJPH.2006.086207 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34. Pickering AJ, Null C, Winch PJ, Mangwadu G, Arnold BF, Prendergast AJ, et al. The WASH Benefits and SHINE trials: interpretation of WASH intervention effects on linear growth and diarrhoea. The Lancet Global Health. 2019;7(8):e1139–e1146. doi: 10.1016/S2214-109X(19)30268-2 [DOI] [PubMed] [Google Scholar]
  • 35. Goddard FG, Ban R, Barr DB, Brown J, Cannon J, Colford JM Jr, et al. Measuring environmental exposure to enteric pathogens in low-income settings: review and recommendations of an interdisciplinary working group. Environmental science & technology. 2020;54(19):11673–11691. doi: 10.1021/acs.est.0c02421 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36. Fuller JA, Eisenberg JN. Herd protection from drinking water, sanitation, and hygiene interventions. The American journal of tropical medicine and hygiene. 2016;95(5):1201. doi: 10.4269/ajtmh.15-0677 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37. Garn JV, Boisson S, Willis R, Bakhtiari A, Al-Khatib T, Amer K, et al. Sanitation and water supply coverage thresholds associated with active trachoma: modeling cross-sectional data from 13 countries. PLoS neglected tropical diseases. 2018;12(1):e0006110. doi: 10.1371/journal.pntd.0006110 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38. George CM, Cirhuza LB, Birindwa A, Williams C, Beck S, Julian T, et al. Child hand contamination is associated with subsequent pediatric diarrhea in rural Democratic Republic of the Congo (REDUCE Program). Tropical Medicine & International Health. 2021;26(1):102–110. doi: 10.1111/tmi.13510 [DOI] [PubMed] [Google Scholar]
  • 39. Kwong LH, Ercumen A, Pickering AJ, Arsenault JE, Islam M, Parvez SM, et al. Ingestion of fecal Bacteria along multiple pathways by young children in rural Bangladesh participating in a Cluster-Randomized trial of water, sanitation, and hygiene interventions (WASH benefits). Environmental science & technology. 2020;54(21):13828–13838. doi: 10.1021/acs.est.0c02606 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40. Gwimbi P, George M, Ramphalile M. Bacterial contamination of drinking water sources in rural villages of Mohale Basin, Lesotho: exposures through neighbourhood sanitation and hygiene practices. Environmental health and preventive medicine. 2019;24(1):1–7. doi: 10.1186/s12199-019-0790-z [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41. Navab-Daneshmand T, Friedrich MN, Gächter M, Montealegre MC, Mlambo LS, Nhiwatiwa T, et al. Escherichia coli contamination across multiple environmental compartments (soil, hands, drinking water, and handwashing water) in urban Harare: correlations and risk factors. The American journal of tropical medicine and hygiene. 2018;98(3):803. doi: 10.4269/ajtmh.17-0521 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.World Health Organization. Guidelines for drinking-water quality: first addendum to the fourth edition. Geneva; 2017. [PubMed]
  • 43. Schmidt WP, Arnold BF, Boisson S, Genser B, Luby SP, Barreto ML, et al. Epidemiological methods in diarrhoea studies-An update. International Journal of Epidemiology. 2011;40(6):1678–1692. doi: 10.1093/ije/dyr152 [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

S1 Appendix. Supporting information.

The supporting information includes the generic SISE–RCT model equations, the WASH Benefits Bangladesh SISE–RCT model equations, and supplemental results, including distributions of likelihoods, timepoint- and arm-specific relative basic reproduction numbers, and fraction of the population enrolled in the study. Fig A. Distribution of negative log-likelihood fits to the data for parameter samples in the prior and posterior sample sets. Fig B. Prevalence of self-reported diarrhea (7-day recall) in WASH Benefits comparing the baseline (red) to the combined midline (orange) and endline (yellow) surveys, as well as posterior distributions of simulated prevalence (violin plots). Fig C. Distribution of timepoint- and arm-specific relative basic reproduction numbers (R0). The dotted line corresponds to 1.00, or no difference from the control arm at baseline. The solid lines give the mean values of the distributions. Fig D. Posterior (grey) and prior (white) distributions for the estimated fraction of the population enrolled in the study.

(PDF)

Data Availability Statement

The WASH Benefits Bangladesh data is publicly available at https://osf.io/tprw2/. The data and code underlying the results of this paper are available on a GitHub repository at https://github.com/afbrouwer/WASH_RCT_transmission_modeling.


Articles from PLOS Computational Biology are provided here courtesy of PLOS

RESOURCES