2023 Jul 29;8(1):e10382. doi: 10.1002/lrh2.10382

TABLE 3.

Comprehensive list of included studies in predictive modeling for diarrheal disease.

| Authors | Year of publication | Time period of the study | Country | Aspect of diarrheal disease | Technique used for predictive modeling | Data used (studied variables) | Performance of model |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Pascual et al. | 2008 | 1966–2005 | Bangladesh | Cholera incidence/outbreaks | TSIR or TSIRS model | Laboratory‐confirmed cholera infection data, year and month of infection, climate data | The authors observed that the lack of extreme events between 2001 and 2005 would have been anticipated with 75% confidence half a year ahead by a model fitted to data up to 2000. |
| Pasetto et al. | 2017 | 2010 | Haiti | Cholera incidence/outbreaks | Data assimilation: ensemble Kalman filter | Laboratory‐confirmed cholera infection data, year and month of infection, rainfall data | The authors showed that the assimilation procedure with sequential updating of the parameters outperformed calibration schemes based on Markov chain Monte Carlo. Moreover, in forecasting mode, the model predicted the spatial incidence of cholera at least 1 month ahead. |
| Bertuzzo et al. | 2016 | 2010–2017 | Haiti | Cholera incidence/outbreaks | Individual‐based spatially‐explicit stochastic model | Epidemiological dynamics and health‐care practice data | The model captured the timing and the magnitude of the peaks correctly (Nash–Sutcliffe index = 0.79). The authors showed that the probability that the epidemic would go extinct before the end of 2016 was of the order of 1%. |
| Jutla et al. | 2015 | 2002–2010 | Bangladesh | Cholera incidence/outbreaks | Logistic regression | River discharge data, terrestrial water storage (TWS) data, cholera prevalence data | The authors observed that TWS had an asymmetrical, strong association with cholera prevalence in the spring (τ = −0.53; P < 0.001) and autumn (τ = 0.45; P < 0.001) up to 6 months in advance. |
| Daisy et al. | 2020 | 2000–2013 | Bangladesh | Cholera incidence/outbreaks | Seasonal‐auto‐regressive‐integrated‐moving‐average (SARIMA) model | Cholera incidence and climatic variables | Root mean square error (RMSE) = 14.7; mean absolute error (MAE) = 11. |
| Bengtsson et al. | 2015 | 2010 | Haiti | Cholera incidence/outbreaks | Gravity models | Case data, mobility data | Area under the curve (AUC) = 0.79 for the mobile phone‐based model |
| Matsuda et al. | 2008 | 1983–2002 | Bangladesh | Cholera incidence/outbreaks | Auto‐regression model | Cholera patient data, climate data | The authors reported a Pearson's correlation coefficient of 0.95 between the monthly number of patients predicted by the model and the actual monthly number of patients. |
| Koepke et al. | 2016 | 2000–2007; 2010–2013 | Bangladesh | Cholera incidence/outbreaks | SIRS model | Cholera case data, environmental variables | The authors showed that their model successfully predicted an increase in the number of infected individuals in the population weeks before the observed number of cholera cases increased. |
| Jutla et al. | 2013 | 1998–2010 | Bangladesh | Cholera incidence/outbreaks | Multiple regression models | Cholera incidence, sewage discharge data, satellite environmental determinants | Accuracy = 75% |
| Levine et al. | 2015 | 2014 | Bangladesh | Dehydration | Logistic regression/recursive partitioning model | Historical, demographic, clinical, and nutritional data | The authors reported an AUC of 0.79 (95% confidence interval [CI] = 0.74–0.84) for severe dehydration and 0.78 (95% CI = 0.74–0.81) for some (any) dehydration for the new DHAKA Dehydration Score. Additionally, their score had 90% agreement between independent raters, with a Cohen's kappa of 0.75 (95% CI = 0.66–0.85), among children with a repeat clinical exam. |
| Levine et al. | 2013 | 2010–2012 | Rwanda | Dehydration | Logistic regression | Demographic and clinical data | The authors reported AUCs of 0.72 (95% CI = 0.60–0.85), 0.73 (95% CI = 0.62–0.84), and 0.80 (95% CI = 0.71–0.89) for the WHO severe dehydration scale, CDC scale, and Clinical Dehydration Scale, respectively, in the full cohort. They also showed that only the Clinical Dehydration Scale was a significant predictor of severe disease when used in infants, with an AUC of 0.77 (95% CI = 0.61–0.93). |
| Zodpey et al. | 1999 | 1996–1997 | India | Dehydration | Logistic regression | Demographic and clinical data | The authors reported sensitivity, specificity, positive predictive value, Cohen's kappa, and overall predictive accuracy of 0.81, 0.81, 0.81, 0.61, and 0.86, respectively. |
| Alexander and Blackburn | 2013 | 2006–2009 | Botswana | Determinants of diarrheal disease burden | Cluster analysis/classification and regression trees | Hospital surveillance data | |
| Green et al. | 2009 | 200–2007 | Global | Determinants of diarrheal disease burden | Classification and Regression Trees (CART) | WASH, government spending, literacy levels | Mean squared prediction error (MSE) of 0.225 |
| Fang et al. | 2020 | 2012–2016 | China | Diarrhea incidence | Random Forest, autoregressive integrated moving average (ARIMA/X) | Morbidity and meteorological data | 20% mean absolute percentage error (MAPE) with actual values; 30% MAPE between ARIMAX and ARIMA model |
| Pangestu et al. | 2020 | 2010–2019 | Indonesia | Diarrhea incidence | Seasonal‐auto‐regressive‐integrated‐moving‐average (SARIMA/X) | Burden of disease estimates and climate data | Accuracy = 78.6% |
| Wang et al. | 2020 | 2012–2016 | China | Diarrhea incidence | Parsimony Model (PM)/Multiple Linear Regression/Random Forest Regression/Support Vector Regression/Gradient Boosting Regression/Extreme Gradient Boosting Regression/Convolutional Neural Network/Neural Network Regression | Historical outpatient visit counts, meteorological factors (MF) and Baidu search indices (BSI) | The authors observed that the PM, benefiting from MF and BSI data, obtained the best performance on three metrics. |
| Medina et al. | 2007 | 1996–2004 | Mali | Diarrhea incidence/seasonality | Multiplicative Holt‐Winters method | Clinical data and climate data | MAPE circa 25%. |
| Heaney et al. | 2020 | 2007–2017 | Botswana | Diarrheal incidence/outbreaks | Compartmental susceptible‐infected‐recovered‐susceptible (SIRS) model | Hospital surveillance data | The authors reported that the average RMSE and correlation between the observations and simulations across all wet season outbreaks were 0.79 and 0.99, respectively. Similarly, they reported an average RMSE and correlation across dry season outbreaks of 1.33 and 0.99, respectively. |
| Maniruzzaman et al. | 2020 | 2014 | Bangladesh | Diarrheal infection | Naïve Bayes/linear discriminant analysis/quadratic discriminant analysis/support vector machine | Demographic health survey | Support vector machine (SVM) with radial basis kernel yielded 65.61% accuracy, 66.27% sensitivity, and 52.28% specificity. |
| Abubakar and Olatunji | 2019 | 2013 | Nigeria | Diarrheal infection | Artificial Neural Network | Demographic and health survey data | High accuracy of 95.78% and 95.63% during the training and testing phases, respectively |
| Brander et al. | 2019 | 2007–2011 | The Gambia, Mali, Mozambique, Kenya, Pakistan, India, Bangladesh | Malnutrition | Linear regression | Clinical, historical, and anthropometric data | AUC of 0.67 (95% CI = 0.64–0.69) |
| Suzuki et al. | 2016 | 2006–2015 | Japan | Norovirus strain dynamics | Fitness models | Sequence data, year and month of isolation | The authors showed that their model predicted GII.3 and GII.4 would contract, whereas GII.17 would expand and predominate in the 2015–2016 season. |
| Garbern et al. | 2021 | | Mali and Bangladesh | Pathogen detection/clinical profiles/viral etiology | Random Forest/logistic regression | Clinical, historical, anthropometric and microbiologic data | AUC of 0.754 (0.665–0.843) |
| Brintz et al. | 2020 | 2007–2011 | The Gambia, Mali, Mozambique, Kenya, Pakistan, India, Bangladesh | Pathogen detection/clinical profiles/viral etiology/bacterial etiology | Random Forest/logistic regression | Clinical, historical, anthropometric and microbiologic data | AUC = 0.825; specificity = 0.85; sensitivity = 0.59; negative predictive value (NPV) = 0.82; positive predictive value (PPV) = 0.64 |
| Ayers et al. | 2016 | 2007–2011 | Kenya | Pathogen detection/clinical profiles/rotavirus | Classification trees | Clinical, historical, anthropometric and microbiologic data | AUC = 0.816 on training data; AUC = 0.6125 on test data |
| Pitzer et al. | 2011 | 1985–2009* | Italy, Hungary, Spain, Japan, United States, Australia | Rotavirus strain dynamics | Fourier analysis | Laboratory‐confirmed rotavirus infection data, sequence data, vaccination | The authors showed that their model explained the coexistence and cyclical pattern in the distribution of genotypes observed in most developed countries: predominant rotavirus strains cycle with periods (T) ranging from 3 to 11 years |
| Chao et al. | 2019 | 2007–2011 | The Gambia, Mali, Mozambique, Kenya, Pakistan, India, Bangladesh | Seasonality | Principal‐Component Analysis/K‐means clustering | Microbiological and weather data | The authors observed that rotavirus was most prevalent during the drier “winter” months and out of phase with bacterial pathogens, which peaked during hotter and rainier times of year corresponding to “monsoon,” “rainy,” or “summer” seasons. |
| Adamker et al. | 2018 | 2002–2015 | Israel | Shigella species/outcomes | Logistic Regression (LR), Neural Network (NN), and Support Vector Machines (SVM) | National Shigella data as collated by the Ministry of Health (MoH) Division of Epidemiology | Accuracy of 93.2% (Shigella species) and 94.9% (hospitalization) |
| Freiesleben de Blasio et al. | 2014 | | Kazakhstan | Vaccine cost‐effectiveness | Dynamic model | | The authors reported that a vaccination program with 90% coverage would prevent ≈880 rotavirus deaths and save an average of 54,784 life‐years for children <5 years of age. They also showed that indirect protection accounted for 40% and 60% reductions in severe and mild rotavirus gastroenteritis, respectively |
| Bar‐Lev et al. | 2021 | 2014–2018 | Israel | Vaccine hesitancy | Logistic regression, Random Forest and Neural Networks | Demographic, clinical, socio‐economic data, vaccination, social media traffic | The authors observed that the performance of models for rotavirus, hepatitis A, and hepatitis B was close to random (accuracy < 0.63 and F1 < 0.65). Additionally, they reported a negative association between on‐line discussions and vaccination. |
| de Blasio et al. | 2010 | 2005–2008 | Kyrgyzstan | Vaccine impact | Deterministic age‐structured dynamic model | Key features of rotavirus epidemiology, rotavirus associated events (death, hospitalization, outpatient visits), vaccination | The authors reported that a routine rotavirus vaccination program at 95% coverage and 54% effectiveness against severe infection was estimated to lead to a 56% reduction in rotavirus‐associated deaths and a 50% reduction in hospital admissions, while outpatient visits and homecare episodes would decrease by 52% compared to baseline levels after 5 years of intervention. |
| Atchison et al. | 2010 | 1998–2007 | England and Wales | Vaccine impact/seasonality | Deterministic age‐structured dynamic model | Key features of rotavirus epidemiology, vaccination | The authors showed that their model reproduced the strong seasonal pattern and age distribution of rotavirus disease observed in England and Wales. Furthermore, their model predicted that vaccination would provide both direct and indirect protection within the population, resulting in a 61% reduction in rotavirus disease incidence. |
| Park et al. | 2017 | 2009–2012 | Niger | Vaccine impact/transmission dynamics | Susceptible‐infected‐recovered (SIR)‐like compartmental models/ensemble models | Clinic admissions data and healthcare seeking data | The authors reported that their model predicted the current burden of severe rotavirus disease to be 2.6%–3.7% of the population each year and that a two‐dose vaccine schedule achieving 70% coverage could reduce burden by 39%–42%. |
| Pitzer et al. | 2012 | 1999–2009 | England and Wales | Vaccine impact/transmission dynamics | SIS‐ (susceptible‐infectious‐susceptible)/SIRS‐like (susceptible‐infectious‐recovered‐susceptible) compartmental models | Laboratory‐confirmed rotavirus infection data | The authors showed that their models predicted that during the initial year after vaccine introduction, incidence of severe rotavirus gastroenteritis (RVGE) would be reduced 1.8–2.9 times more than expected from the direct effects of the vaccine alone (28%–50% at 90% coverage), but over a 5‐year period following vaccine introduction severe RVGE would be reduced only by 1.1–1.7 times more than expected from the direct effects (54%–90% at 90% coverage). They also reported that projections for the long‐term reduction of severe RVGE ranged from a 55% reduction at full coverage to elimination with at least 80% coverage. |
| Effelterre et al. | 2009 | | France, Germany, Italy, Spain and the United Kingdom | Vaccine impact/transmission dynamics | Dynamic, deterministic compartmental model | Burden of disease estimates: hospitalizations, emergency‐room visits and primary‐care visits | The authors reported that with vaccination coverage rates of 70%, 90%, and 95% their model predicted that, in addition to the direct effect of vaccination, herd protection induced a reduction in RV‐related gastroenteritis (GE) incidence of 25%, 22%, and 20%, respectively, for RV‐GE of any severity, and of 19%, 15%, and 13%, respectively, for moderate‐to‐severe RV‐GE, 5 years after implementation of a vaccination program. |
| Asare et al. | 2020 | 2007–2015 | Ghana | Vaccine impact/transmission dynamics | SIRS‐like model | Epidemiological data and vaccination data | The authors showed that their model captured the spatio‐temporal variations in rotavirus incidence across the three sites and showed good agreement with the age distribution of observed cases |
| Olson et al. | 2020 | 2002–2016 | United States | Vaccine impact/transmission dynamics | Periodic regression models/age‐structured compartmental model | Case data, emergency department (ED) visits data, hospitalization data, vaccination data | The authors reported that their published mechanistic model qualitatively predicted patterns more than 2 years in advance. |
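
Several entries in the "Performance of model" column report forecast error metrics (RMSE, MAE, MAPE, and the Nash–Sutcliffe index) without defining them. The sketch below uses the standard textbook definitions, which are an assumption here: individual studies may have computed these metrics differently (for example, on transformed counts). The function names and the `observed`/`predicted` values are hypothetical and are not taken from any study in the table.

```python
import math


def rmse(observed, predicted):
    """Root mean square error: square root of the mean squared difference."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(observed, predicted):
    """Mean absolute error: mean of the absolute differences."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def mape(observed, predicted):
    """Mean absolute percentage error, in percent (observed values must be nonzero)."""
    ratios = [abs((o - p) / o) for o, p in zip(observed, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


def nash_sutcliffe(observed, predicted):
    """Nash-Sutcliffe index: 1 minus residual sum of squares over the
    sum of squared deviations of the observations from their mean.
    A value of 1 is a perfect fit; 0 matches predicting the observed mean."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / spread


# Hypothetical monthly case counts, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]

print("RMSE:", rmse(observed, predicted))
print("MAE:", mae(observed, predicted))
print("MAPE (%):", mape(observed, predicted))
print("Nash-Sutcliffe:", nash_sutcliffe(observed, predicted))
```

RMSE and MAE are in the units of the outcome (for example, cases per month), so they are not comparable across studies with different case volumes, whereas MAPE and the Nash–Sutcliffe index are scale-free.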