2023 Jul 29;8(1):e10382. doi: 10.1002/lrh2.10382

TABLE 3.

Comprehensive list of included studies in predictive modeling for diarrheal disease.

Authors	Year of publication	Time period of the study	Country	Aspect of diarrheal disease	Technique used for predictive modeling	Data used (studied variables)	Performance of model
Pascual et al.	2008	1966–2005	Bangladesh	Cholera incidence/outbreaks	TSIR or TSIRS model	Laboratory‐confirmed cholera infection data, year and month of infection, climate data	The authors observed that lack of extreme events between 2001 and 2005 would have been anticipated with 75% confidence half a year ahead with a model fitted to data up to 2000.
Pasetto et al.	2017	2010	Haiti	Cholera incidence/outbreaks	Data assimilation: ensemble Kalman filter	Laboratory‐confirmed cholera infection data, year and month of infection, rainfall data	The authors showed that the assimilation procedure with the sequential update of the parameters outperformed calibration schemes based on Markov chain Monte Carlo. Moreover, in a forecasting mode, the model predicted the spatial incidence of cholera at least 1 month ahead.
Bertuzzo et al.	2016	2010–2017	Haiti	Cholera incidence/outbreaks	Individual‐based spatially‐explicit stochastic model	Epidemiological dynamics and health‐care practice data	The model captured the timing and the magnitude of the peaks correctly (Nash–Sutcliffe index = 0.79). The authors showed that the probability that the epidemic would go extinct before the end of 2016 was of the order of 1%.
Jutla et al.	2015	2002–2010	Bangladesh	Cholera incidence/outbreaks	Logistic regression	River discharge data, terrestrial water storage (TWS) data, cholera prevalence data	The authors observed that TWS had an asymmetrical, strong association with cholera prevalence in the spring (τ = −0.53; P < 0.001) and autumn (τ = 0.45; P < 0.001) up to 6 months in advance.
Daisy et al.	2020	2000–2013	Bangladesh	Cholera incidence/outbreaks	Seasonal‐auto‐regressive‐integrated‐moving‐average (SARIMA) model	Cholera incidence and climatic variables	Root Mean Square Error (RMSE) = 14.7; mean absolute error (MAE) = 11.
Bengtsson et al.	2015	2010	Haiti	Cholera incidence/outbreaks	Gravity models	Case data, mobility data	Area under the Curve (AUC) = 0.79 for mobile phone‐based model
Matsuda et al.	2008	1983–2002	Bangladesh	Cholera incidence/outbreaks	Auto‐regression model	Cholera patient data, climate data	The authors reported a Pearson's correlation coefficient of 0.95 between the monthly number of patients predicted by the model and the actual monthly number of patients.
Koepke et al.	2016	2000–2007; 2010–2013	Bangladesh	Cholera incidence/outbreaks	SIRS model	Cholera case data, environmental variables	The authors showed that their model successfully predicted an increase in the number of infected individuals in the population weeks before the observed number of cholera cases increased.
Jutla et al.	2013	1998–2010	Bangladesh	Cholera incidence/outbreaks	Multiple regression models	Cholera incidence, sewage discharge data, satellite environmental determinants	Accuracy = 75%
Levine et al.	2015	2014	Bangladesh	Dehydration	Logistic regression/recursive partitioning model	Historical, demographic, clinical, and nutritional data	The authors reported an AUC of 0.79 (95% confidence interval [CI] = 0.74–0.84) for severe dehydration and 0.78 (95% CI = 0.74–0.81) for some (any) dehydration for the new DHAKA Dehydration Score. Additionally, their score had a 90% agreement between independent raters, with a Cohen's Kappa of 0.75 (95% CI = 0.66–0.85) among children with a repeat clinical exam.
Levine et al.	2013	2010–2012	Rwanda	Dehydration	Logistic regression	Demographic and clinical data	The authors reported AUCs of 0.72 (95% CI = 0.60–0.85), 0.73 (95% CI = 0.62–0.84), and 0.80 (95% CI = 0.71–0.89) for the WHO severe dehydration scale, CDC scale, and Clinical Dehydration Scale, respectively, in the full cohort. They also showed that only the Clinical Dehydration Scale was a significant predictor of severe disease when used in infants, with an AUC of 0.77 (95% CI = 0.61–0.93).
Zodpey et al.	1999	1996–1997	India	Dehydration	Logistic regression	Demographic and clinical data	The authors reported sensitivity, specificity, positive predictive value, Cohen's kappa, and overall predictive accuracy of 0.81, 0.81, 0.81, 0.61, and 0.86, respectively.
Alexander and Blackburn	2013	2006–2009	Botswana	Determinants of diarrheal disease burden	Cluster analysis/classification and regression trees	Hospital surveillance data
Green et al.	2009	200–2007	Global	Determinants of diarrheal disease burden	Classification and Regression Trees (CART)	WASH, government spending, literacy levels	Mean squared prediction error (MSE) of 0.225
Fang et al.	2020	2012–2016	China	Diarrhea incidence	Random Forest, autoregressive integrated moving average (ARIMA/X)	Morbidity and meteorological data	20% mean absolute percentage error (MAPE) with actual values; 30% MAPE between ARIMAX and ARIMA Model
Pangestu et al.	2020	2010–2019	Indonesia	Diarrhea incidence	Seasonal‐auto‐regressive‐integrated‐moving‐average (SARIMA/X)	Burden of disease estimates and climate data	Accuracy = 78.6%
Wang et al.	2020	2012–2016	China	Diarrhea incidence	Parsimony Model (PM)/Multiple Linear Regression/Random Forest Regression/Support Vector Regression/Gradient Boosting Regression/Extreme Gradient Boosting Regression/Convolutional Neural Network/Neural Network Regression	Historical outpatient visit counts, meteorological factors (MF) and Baidu search indices (BSI)	The authors observed that the PM, which benefited from the MF and BSI data, obtained the best performance on three metrics.
Medina et al.	2007	1996–2004	Mali	Diarrhea incidence/seasonality	Multiplicative Holt‐Winters method	Clinical data and climate data	MAPE circa 25%.
Heaney et al.	2020	2007–2017	Botswana	Diarrheal incidence/outbreaks	Compartmental susceptible‐infected‐recovered‐susceptible (SIRS) model	Hospital surveillance data	The authors reported that the average RMSE and correlation between the observations and simulations across all wet season outbreaks were 0.79 and 0.99, respectively. Similarly, they reported an average RMSE and correlation across dry season outbreaks of 1.33 and 0.99, respectively.
Maniruzzaman et al.	2020	2014	Bangladesh	Diarrheal infection	Naïve Bayes/linear discriminant analysis/quadratic discriminant analysis/support vector machine	Demographic health survey	Support Vector Machine (SVM) with radial basis kernel yielded 65.61% accuracy, 66.27% sensitivity, and 52.28% specificity.
Abubakar and Olatunji	2019	2013	Nigeria	Diarrheal infection	Artificial Neural Network	Demographic and health survey data	Accuracy of 95.78% and 95.63% during the training and testing phases, respectively
Brander et al.	2019	2007–2011	The Gambia, Mali, Mozambique, Kenya, Pakistan, India, Bangladesh	Malnutrition	Linear regression	Clinical, historical, anthropometric	AUC of 0.67 (95% CI = 0.64–0.69)
Suzuki et al.	2016	2006–2015	Japan	Norovirus strain dynamics	Fitness models	Sequence data, year and month of isolation	The authors showed that their model predicted GII.3 and GII.4 would contract, whereas GII.17 would expand and predominate in the 2015–2016 season.
Garbern et al.	2021		Mali and Bangladesh	Pathogen detection/clinical profiles/viral etiology	Random Forest/logistic regression	Clinical, historical, anthropometric and microbiologic data	AUC of 0.754 (0.665–0.843)
Brintz et al.	2020	2007–2011	The Gambia, Mali, Mozambique, Kenya, Pakistan, India, Bangladesh	Pathogen detection/clinical profiles/viral etiology/Bacterial etiology	Random Forest/logistic regression	Clinical, historical, anthropometric and microbiologic data	AUC = 0.825; specificity = 0.85; sensitivity = 0.59, negative predictive value (NPV) = 0.82; positive predictive value (PPV) = 0.64
Ayers et al.	2016	2007–2011	Kenya	Pathogen detection/clinical profiles/Rotavirus	Classification trees	Clinical, historical, anthropometric and microbiologic data	AUC = 0.816 on training data; AUC = 0.6125 on test data
Pitzer et al.	2011	1985–2009*	Italy, Hungary, Spain, Japan, United States, Australia	Rotavirus strain dynamics	Fourier analysis	Laboratory‐confirmed rotavirus infection data, sequence data, vaccination	The authors showed that their model explained the coexistence and cyclical pattern in the distribution of genotypes observed in most developed countries: predominant rotavirus strains cycle with periods (T) ranging from 3 to 11 years
Chao et al.	2019	2007–2011	The Gambia, Mali, Mozambique, Kenya, Pakistan, India, Bangladesh	Seasonality	Principal‐Component Analysis/K‐means clustering	Microbiological and weather data	The authors observed that rotavirus was most prevalent during the drier “winter” months and out of phase with bacterial pathogens, which peaked during hotter and rainier times of year corresponding to “monsoon,” “rainy,” or “summer” seasons.
Adamker et al.	2018	2002–2015	Israel	Shigella species/Outcomes	Logistic Regression (LR), Neural Network (NN), and Support Vector Machines (SVM)	National Shigella data as collated by the Ministry of Health (MoH) Division of Epidemiology	Accuracy of 93.2% (Shigella species) and 94.9% (hospitalization)
Freiesleben de Blasio et al.	2014		Kazakhstan	Vaccine cost‐effectiveness	Dynamic model		The authors reported that a vaccination program with 90% coverage would prevent ≈880 rotavirus deaths and save an average of 54,784 life‐years for children <5 years of age. They also showed that indirect protection accounted for 40% and 60% reductions in severe and mild rotavirus gastroenteritis, respectively.
Bar‐Lev et al.	2021	2014–2018	Israel	Vaccine hesitancy	Logistic regression, Random Forest and Neural Networks	Demographic, clinical, socio‐economic data, vaccination, social media traffic	The authors observed that the performance of the models for rotavirus, hepatitis A, and hepatitis B was close to random (accuracy <0.63 and F1 < 0.65). Additionally, they reported a negative association between online discussions and vaccination.
de Blasio et al.	2010	2005–2008	Kyrgyzstan	Vaccine impact	Deterministic age‐structured dynamic model	Key features of rotavirus epidemiology, rotavirus associated events (death, hospitalization, outpatient visits), vaccination	The authors reported that a routine rotavirus vaccination program at 95% coverage and 54% effectiveness against severe infection was estimated to lead to a 56% reduction in rotavirus‐associated deaths and a 50% reduction in hospital admissions, while outpatient visits and homecare episodes would decrease by 52% compared to baseline levels after 5 years of intervention.
Atchison et al.	2010	1998–2007	England and Wales	Vaccine impact/seasonality	Deterministic age‐structured dynamic model	Key features of rotavirus epidemiology, vaccination	The authors showed that their model reproduced the strong seasonal pattern and age distribution of rotavirus disease observed in England and Wales. Furthermore, they observed that their model predicted that vaccination would provide both direct and indirect protection within the population resulting in 61% reduction of rotavirus disease incidence.
Park et al.	2017	2009–2012	Niger	Vaccine impact/transmission dynamics	Susceptible‐infected‐recovered (SIR)‐like compartmental models/Ensemble models	Clinic admissions data and healthcare seeking data	The authors reported that their model predicted the current burden of severe rotavirus disease to be 2.6%–3.7% of the population each year and that a two‐dose vaccine schedule achieving 70% coverage could reduce burden by 39%–42%.
Pitzer et al.	2012	1999–2009	England and Wales	Vaccine impact/transmission dynamics	SIS‐ (susceptible‐infectious‐susceptible)/SIRS‐like (susceptible‐infectious‐recovered‐susceptible) compartmental models	Laboratory‐confirmed rotavirus infection data	The authors showed that their models predicted that during the initial year after vaccine introduction, incidence of severe Rotavirus gastroenteritis (RVGE) would be reduced 1.8–2.9 times more than expected from the direct effects of the vaccine alone (28%–50% at 90% coverage), but over a 5‐year period following vaccine introduction severe RVGE would be reduced only by 1.1–1.7 times more than expected from the direct effects (54%–90% at 90% coverage). They also reported that projections for the long‐term reduction of severe RVGE ranged from a 55% reduction at full coverage to elimination with at least 80% coverage.
Effelterre et al.	2009		France, Germany, Italy, Spain and the United Kingdom	Vaccine impact/transmission dynamics	Dynamic, deterministic compartmental model	Burden of disease estimates: hospitalizations, emergency‐room visits and primary‐care visits	The authors reported that with vaccination coverage rates of 70%, 90%, and 95% their model predicted that, in addition to the direct effect of vaccination, herd protection induced a reduction in RV‐related gastroenteritis (GE) incidence of 25%, 22%, and 20%, respectively, for RV‐GE of any severity, and of 19%, 15%, and 13%, respectively, for moderate‐to‐severe RV‐GE, 5 years after implementation of a vaccination program.
Asare et al.	2020	2007–2015	Ghana	Vaccine impact/transmission dynamics	SIRS‐like model	Epidemiological data and vaccination data	The authors showed that their model captured the spatio‐temporal variations in rotavirus incidence across the three sites and showed good agreement with the age distribution of observed cases
Olson et al.	2020	2002–2016	United States	Vaccine impact/transmission dynamics	Periodic regression models/age‐structured compartmental model	Case data, emergency department (ED) visits data, hospitalization data, vaccination data	The authors reported that their published mechanistic model qualitatively predicted patterns more than 2 years in advance.
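
Several rows in Table 3 report forecast accuracy as RMSE, MAE, MAPE, or a Nash–Sutcliffe index. As a reading aid, the minimal sketch below shows how these four quantities are conventionally computed from paired observed and predicted values. It is not taken from any of the included studies: the function names and the `observed` and `predicted` series are hypothetical illustrations, and individual studies may have used variant definitions.

```python
import math

def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def mape(obs, pred):
    """Mean absolute percentage error, in percent (undefined if any observation is 0)."""
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(obs, pred)) / len(obs)

def nash_sutcliffe(obs, pred):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches predicting the observed mean."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - p) ** 2 for o, p in zip(obs, pred))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance

# Hypothetical monthly case counts, invented purely for illustration.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]

print(f"RMSE = {rmse(observed, predicted):.3f}")
print(f"MAE  = {mae(observed, predicted):.3f}")
print(f"MAPE = {mape(observed, predicted):.3f}%")
print(f"NSE  = {nash_sutcliffe(observed, predicted):.3f}")
```

RMSE and MAE are expressed in the units of the outcome, so values from different rows (for example, Daisy et al. and Heaney et al.) are not directly comparable, whereas MAPE and the Nash–Sutcliffe index are scale-free.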