PLOS One. 2025 Apr 22;20(4):e0320921. doi: 10.1371/journal.pone.0320921

Ecological niche modeling for surveillance of foot-and-mouth disease in South Asia

Umanga Gunasekera 1,*, Moh A Alkhamis 2, Sumathy Puvanendiran 3, Moumita Das 1, Pradeep L Kumarawadu 4, Munawar Sultana 5, M Anwar Hossain 5, Jonathan Arzt 6, Andres Perez 1
Editor: Nussieba A Osman 7
PMCID: PMC12013921  PMID: 40261938

Abstract

Control of transboundary diseases is recommended at the regional level rather than the country level because of the inherent complexities involved. The World Organization for Animal Health (WOAH) has established different zones worldwide to control contagious diseases such as foot-and-mouth disease (FMD). Controlling FMD is difficult because of the complicated connections among FMD risk factors and the deficits of surveillance activities in countries. We used an ecological niche model (ENM) that accounts for the under-reporting of outbreaks to determine FMD risk and risk factors in the South Asian countries India, Bangladesh, and Sri Lanka. Based on known outbreak information, we predicted high-risk areas using similar regional ecological features. We used a multi-algorithm machine-learning ensemble that includes random forest, support vector, and gradient boosting; 15 predictive variables (i.e., livestock densities, land cover, and climate); and 660 FMD outbreaks reported over 13 years (2009–2022) in India, Bangladesh, and Sri Lanka. Sri Lanka and Bangladesh appeared to have low to medium outbreak risk, in the range of 0.04 to 0.55. India was used to fit the model. The machine learning models demonstrated high predictive performance (accuracy >0.87) through cross-validation. Production systems, isothermality, cattle density (per km²), and mean diurnal range were identified as the most important predictors of FMD outbreaks. These models help to identify low-risk areas where FMD surveillance activities can be minimized and high-risk areas where additional confirmatory testing should be focused, and to improve surveillance in a regional context.

Introduction

Foot-and-mouth disease (FMD) is a transboundary animal disease caused by an RNA virus in the family Picornaviridae. FMD is transmitted between animals via contact, animal products, and contaminated fomites, including humans [1]. The disease is a threat to global food security because of trade restrictions on affected countries. In the South Asia region, FMD affects livestock farmers who raise only a few animals as their livelihood. At the South Asia transboundary animal diseases coordination meeting in 2023, both the Food and Agriculture Organization (FAO) and the World Organization for Animal Health (WOAH) indicated the importance of controlling FMD at regional levels, such as in South Asia, South America, Africa, the Middle East, or South East Asia, through collective action and discussions among countries. The recommendations are to share information related to strategic control plans, border movement monitoring, and value chain analysis among the countries, and to have a regional vaccine calendar to synchronize vaccination. Except for India, which is in stage 4 of the Progressive Control Pathway (PCP), countries in South Asia, including Sri Lanka and Bangladesh, are in PCP stages 1 and 2 or progressing towards stage 2 (WOAH Report). Activities in stage 1 of the PCP include identifying the livestock-marketing network, the importation of animals and animal products, and animal movement, among other aspects. Countries in stage 1 are expected to demonstrate commitment to regional FMD control.

Predicting FMD risk based on reported outbreak numbers is unreliable because of drawbacks in passive surveillance activities, such as underreporting and late reporting. Historically, this limitation has been compensated for with different analytical methods such as standardized incidence ratios, spatial methods such as kriging and kernel density [2], and spatial Bayesian methods [3,4]. Compared to spatial methods such as kriging [5] and SaTScan analysis, which use only outbreak location and time [6] to determine FMD risk areas, ecological niche models (ENM) can accommodate and explore many highly correlated and complex risk factors to predict the spatial risk of FMD robustly. ENM use variables such as temperature and precipitation, along with reported incidences, to detect species abundance and predict outbreak occurrence [7]. One of the most commonly used ENM is the maximum entropy species distribution model. However, this model requires several assumptions, such as representative sampling, and considers presence-only data. For the algorithm component, supervised machine learning methods based on decision trees and kernel functions, which use both presence and absence data, are recommended over this method [8]. Interpretable multi-algorithm machine learning models require fewer statistical assumptions, are less sensitive to highly correlated variables and, therefore, to overfitting, and can explore complex nonlinear relationships between variables [9]. Machine learning (ML) methods have also been widely used to predict veterinary infectious diseases [10–14].

For FMD and other infectious diseases, there is no simple linear pattern that differentiates the infected from the non-infected [15]. Machine learning approaches are capable of exploring these nonlinear interactions [16]. Different environmental risk factors, such as relative humidity and temperature, are associated with FMD persistence in endemic countries [17]. To predict FMD risk, outbreak occurrence locations are combined with the spatial distribution of different environmental predictors [16]. For FMD, ML has been used in studies in Thailand [18], South Africa [19], and China [20].

We hypothesize that various environmental, epidemiological, and demographic factors could serve as proxies to predict FMD risk. The objectives of this study were to a) use interpretable supervised machine learning algorithms based on decision trees and support vector methods to develop risk prediction models based on environmental data for FMD in South Asia, using empirical outbreak data from India, Sri Lanka, and Bangladesh; b) identify the major risk factors shaping FMD risk; and c) compare predictive performance among the different models used. These results will help Bangladesh and Sri Lanka to move forward in their PCPs for FMD, ultimately contributing to the control of the disease in the region.

Methods

For this analysis, reported outbreak data were obtained from India, Bangladesh, and Sri Lanka from 2009 to 2022. An outbreak was defined as a group of epidemiologically linked cases (WOAH, Terrestrial code) during the considered period.

In Bangladesh, a systematic FMD outbreak surveillance system is not available. Therefore, it is possible that not all outbreaks were reported during the period. The outbreak data used in this study are from a field study conducted during 2012–2021 covering 32 different districts. Outbreak information was collected based on farmers’ notifications for serology testing. A total of 481 samples were collected from different outbreaks. Sample collection was affected by the 2020–2021 COVID-19 outbreak [21]. In this study, we used the district affiliated with each laboratory-confirmed sample collection location as the outbreak location. We did not consider the temporal aspect of the data. Compared to the data from Sri Lanka and India, the data from Bangladesh are incomplete.

For Sri Lanka, officially reported outbreak data were available from the Department of Animal Health, Sri Lanka. Veterinarians from nearly 256 veterinary ranges report outbreaks as part of passive surveillance activities. A district in Sri Lanka consists of multiple veterinary ranges. The outbreak reporting system is a paper-based monthly report sent to the head office. A total of 369 outbreaks were reported during 2014–2022 from different areas of the country. A reported outbreak may involve one to many infected animals at an identified farm location. If one or more outbreaks were reported in a VS range, a point location of the VS range was considered positive and recorded for this study. These outbreaks are initially identified clinically, with later confirmatory serotype diagnosis.

In India, veterinary authorities conduct both active and passive surveillance for FMD, focusing mainly on passive surveillance, in nearly 65,000 administrative levels. Confirmatory diagnosis for serotyping is carried out in 27 FMD network laboratories and 2 national laboratories. A reported outbreak may involve one to many infected animals at an identified location. The reporting system is paper-based, and a monthly report is submitted to the Department of Animal Husbandry. The outbreak location was considered at the district level (i.e., if one or more outbreaks were reported at village level, for the purpose of this analysis, the district was considered positive and a point location of the district was recorded). FMD outbreak data for India (n = 429) were available at the district level from a previous study for the years 2009–2020 [3].

For FMD diagnostic testing, the countries follow the guidelines of the ‘Manual of Diagnostic Tests and Vaccines for Terrestrial Animals, 13th edition 2024’, WOAH Terrestrial Code. The number of outbreaks from each country in each year is shown in S1 Fig. Following the spatial analysis ML algorithm [10], duplicate occurrences of the same geographical outbreak location during the considered period were removed to lower the training error of the model. From a total of 829 outbreaks, the final data set comprised 660 outbreaks from the three countries. Although the model is trained on reported outbreak data, it predicts locations suitable for FMD outbreaks, including places where outbreaks were not reported or were underreported, by also considering other features.

For the predictors, we used FMD risk factors such as climate data [22,23], production systems [24–26], and livestock densities [23,27,28]. Because animal movement records are not available, the road network was incorporated as a proxy indicator of animal movement for trade and market [4,29]. Extensive farming practices are common in the region [30], and land cover includes land use (different types of forests, cropland, land that can be used for pastoralism, and wetlands) that is associated with extensive grazing. Therefore, the Normalized Difference Vegetation Index (NDVI) was considered. A summary of the different predictors is shown in S1 Table.

The selected historical bioclimatic data are derived from monthly temperature and rainfall values and represent annual trends and seasonality [31]. The dataset consists of 19 bioclimatic features related to temperature and precipitation. However, the mean temperature of the wettest quarter, the mean temperature of the driest quarter, the precipitation of the warmest quarter, and the precipitation of the coldest quarter were not included because of spatial artifacts. The data are averages for the years 1970–2000, one for each month, at 5 arc-minute resolution.
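As an illustration of how bioclimatic features are derived from monthly values, the following Python sketch computes the two temperature features that feature most prominently in the results, mean diurnal range and isothermality, using the standard WorldClim-style definitions (mean of the monthly maximum minus minimum temperature, and that mean expressed as a percentage of the annual temperature range). The monthly values are hypothetical, and the code is not taken from the study, whose analysis was carried out in R.

```python
def mean_diurnal_range(tmax, tmin):
    """Mean over the 12 months of (monthly max temp - monthly min temp)."""
    return sum(hi - lo for hi, lo in zip(tmax, tmin)) / len(tmax)


def isothermality(tmax, tmin):
    """Mean diurnal range as a percentage of the annual temperature range.

    Annual range = max temp of the warmest month - min temp of the coldest month.
    """
    annual_range = max(tmax) - min(tmin)
    return 100.0 * mean_diurnal_range(tmax, tmin) / annual_range


# Hypothetical monthly temperatures (deg C) for a single grid cell
tmax = [28, 30, 33, 35, 36, 34, 32, 31, 31, 31, 30, 28]
tmin = [16, 18, 22, 25, 27, 27, 26, 26, 25, 23, 19, 16]

mdr = mean_diurnal_range(tmax, tmin)
iso = isothermality(tmax, tmin)
print(f"mean diurnal range = {mdr:.2f} deg C, isothermality = {iso:.1f}%")
```

A high isothermality means that day-to-night temperature swings are large relative to the summer-to-winter swing.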

Livestock densities of cattle, buffalo, goats, sheep, and pigs were obtained from the FAO Gridded Livestock of the World (GLW v4) database [32]. The latest data set available was from 2015. Global livestock distribution is expressed as the total number of animals per pixel (5 arc minutes) at a km² census unit. The data are stored in geographic coordinates of decimal degrees based on the World Geodetic System spheroid of 1984 (WGS84).

The Normalized Difference Vegetation Index indicates the density of vegetation using sensor data obtained via satellites. The Copernicus Land Monitoring Service provides NDVI values at 10-day intervals at a global scale, at a spatial resolution of about 300 m, from 2014 to June 2020, in the WGS 1984 latitude-longitude projection. We downloaded 10 raster layers in tiff format representing the years 2014–2020 and combined them into a single raster layer in ArcGIS, using the same cell resolution and coordinate system.

The cropland layer was obtained in tiff format from the FAO Crop Land – Global Land Cover Share Database. The database was created and validated by harmonizing different land cover databases. The cropland cover layer is rated at an accuracy of 94.9%. The dataset is a GeoTIFF raster in WGS 1984, representing the percentage density at a resolution of 30 arc seconds.

Livestock production systems differ across the world. The livestock production system data were obtained from the FAO Global Livestock Production System at a 30 arc-second spatial resolution. In this database, production systems are classified as agro-pastoral systems, crop-livestock systems, and others.

The road network was considered a proxy for animal movement to livestock markets and slaughterhouses. Shapefiles were obtained from the Global Roads Inventory Project (GRIP) global roads database [33]. Roads are projected until the year 2050 using the existing open road databases from 2000–2015. The data are stored as shapefiles in geographic coordinates of decimal degrees based on the World Geodetic System spheroid of 1984 (WGS84). Roads are displayed at 5 arc-minute resolution under the GRIP categories. The shapefiles were converted to a raster file using the Feature to Raster function in ArcGIS. The road raster was then aggregated and resampled in R to obtain the road density.

Data processing

The raster package in R was used to convert the different data files, such as tif and adf, into one geodetic system, WGS84. Each variable was then cropped to the extent of South Asia. The relevant shapefile was downloaded from ‘Natural Earth’ (https://www.naturalearthdata.com/downloads/10m-cultural-vectors/). Because the variables have different spatial resolutions, all were aggregated and resampled to the same spatial resolution of approximately 9 km².
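The aggregation step can be illustrated with a minimal Python sketch that coarsens a fine grid by averaging square blocks of cells. It only conveys the idea; the study used the raster package in R, and the function name, the toy grid, and the handling of NoData cells (represented here as None and ignored) are assumptions made for this example.

```python
def aggregate_mean(grid, factor):
    """Coarsen a 2D grid (list of rows) by averaging factor x factor blocks.

    Cells equal to None are treated as NoData and ignored; a block with no
    valid cells yields None.
    """
    rows, cols = len(grid), len(grid[0])
    coarse = []
    for r0 in range(0, rows, factor):
        coarse_row = []
        for c0 in range(0, cols, factor):
            values = [
                grid[r][c]
                for r in range(r0, min(r0 + factor, rows))
                for c in range(c0, min(c0 + factor, cols))
                if grid[r][c] is not None
            ]
            coarse_row.append(sum(values) / len(values) if values else None)
        coarse.append(coarse_row)
    return coarse


# Hypothetical 4 x 4 fine-resolution layer, coarsened by a factor of 2
fine = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
coarse = aggregate_mean(fine, 2)
print(coarse)  # [[3.5, 5.5], [11.5, 13.5]]
```

Bringing every predictor to a common grid in this way lets each outbreak or absence point be paired with one value per variable.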

As a requirement of the machine-learning pipeline, random absence points were created to produce presence–absence data [9]. A preliminary analysis was run to optimize the variables and the spatial resolution and to determine the case-control ratio. Random absence points were created maintaining a 1:1 case-control ratio. Here, the absence points characterize the environment in the study region [34]. The absence points were merged with the actual outbreak locations into one data frame, and spatial probabilities were generated to determine FMD risk areas.
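A minimal Python sketch of drawing random absence points at a 1:1 ratio is given below. It is illustrative only: the grid, the cell identifiers, and the decision to exclude cells that already contain a reported outbreak are assumptions of this example, not details confirmed by the study.

```python
import random


def sample_absences(presence_cells, candidate_cells, ratio=1.0, seed=42):
    """Draw random absence (background) cells at `ratio` absences per presence.

    Assumption of this sketch: cells with a reported outbreak are excluded
    from the pool of candidate absence cells.
    """
    rng = random.Random(seed)
    occupied = set(presence_cells)
    pool = [cell for cell in candidate_cells if cell not in occupied]
    n_absent = round(len(occupied) * ratio)
    if n_absent > len(pool):
        raise ValueError("not enough candidate cells for the requested ratio")
    return rng.sample(pool, n_absent)


# Hypothetical 5 x 5 study grid with three outbreak cells given as (row, col)
presences = [(0, 0), (1, 3), (4, 2)]
grid_cells = [(r, c) for r in range(5) for c in range(5)]
absences = sample_absences(presences, grid_cells)

# One labelled table: 1 = outbreak reported, 0 = randomly drawn absence
dataset = [(cell, 1) for cell in presences] + [(cell, 0) for cell in absences]
print(len(presences), len(absences))
```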

Collinear variables were removed where the largest mean absolute correlation was greater than 0.9 based on the correlation matrix. The Boruta package in R was used to select statistically significant important features [35]. This package uses a Random Forest classifier. Once shadow features are created to account for random fluctuations, the Z score is used to determine important features. All the considered variables that were not correlated at a threshold of 0.9 were identified as significant when the Boruta package was applied.
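The correlation filter can be sketched in Python as follows. This is a simplified greedy version written for illustration: among variables involved in any pair whose absolute Pearson correlation exceeds the cutoff, it repeatedly drops the one with the largest mean absolute correlation with the remaining variables. It is not the exact routine used in the study, and the variable names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (nonzero variance assumed)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def drop_collinear(columns, cutoff=0.9):
    """Return the names of the variables kept after the correlation filter."""
    names = list(columns)
    corr = {
        (a, b): abs(pearson(columns[a], columns[b]))
        for a in names for b in names if a != b
    }
    kept = list(names)
    while True:
        # Variables that belong to at least one pair above the cutoff
        flagged = {
            v
            for i, a in enumerate(kept)
            for b in kept[i + 1:]
            if corr[(a, b)] > cutoff
            for v in (a, b)
        }
        if not flagged:
            return kept

        def mean_abs(v):
            others = [corr[(v, o)] for o in kept if o != v]
            return sum(others) / len(others)

        # Drop the flagged variable most correlated, on average, with the rest
        kept.remove(max(sorted(flagged), key=mean_abs))


# Hypothetical predictors: the two densities are nearly proportional
predictors = {
    "cattle_density": [1, 2, 3, 4, 5],
    "buffalo_density": [2, 4, 6, 8, 10.5],
    "isothermality": [5, 1, 4, 2, 3],
}
kept = drop_collinear(predictors, cutoff=0.9)
print(kept)
```

A feature-selection step such as Boruta would then be applied only to the variables that survive this filter.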

The whole data set was divided into 80% training and 20% testing data; the training data were used for 10-fold cross-validation and to train the machine-learning algorithms. Cross-validation helps to guard against overfitting. The caret package in R was used for data partitioning [36].

Model training and evaluation

For the selected features, we applied the Random Forest (RF), Gradient Boosting (GB), and Support Vector (SV) machine learning algorithms to create predictive models of FMD, following Fountain-Jones et al., 2019, using the R package caret [36].

Random Forest (RF) and gradient boosting (GB) methods are based on decision trees. Decision trees classify observations by splitting them along separate paths based on selected variables. The way the decision trees are built differs between the two methods. The RF method is suitable when the data are sparse. In the RF method, each tree is grown on a bootstrap sample of the training data, with variables randomly selected as split candidates. Each tree is then tested on the data that were not used to grow it (out-of-bag data) [37,38]. The accuracy of the model is determined using test data. In GB, trees are built sequentially: each new tree is scaled and fitted to the errors of the previous trees, and the size of the trees is restricted [39]. The support vector (SV) machine learning method uses a kernel function to classify data (outbreak vs no outbreak) in a higher-dimensional space based on a threshold value. The threshold accounts for the bias-variance tradeoff [40,41]. Ten-fold cross-validation was used to estimate model performance and to compare the different machine learning methods. After the first run using training data, each model was run 10 times for 10-fold cross-validation.

k-fold cross-validation involves randomly dividing the data set into k folds. One fold is held out as the validation set and the remaining folds are used as training data to test the method; here, this was repeated 10 times on test data. With repeated runs, k-fold cross-validation is expected to give an error estimate closer to the true error. For the classification trees, we considered the number of misclassifications as the error. Performance was assessed for each run, and the average was compared among the models. k-fold cross-validation balances the bias-variance tradeoff by training on a large share of the observations, (k − 1)n/k, in each run [42]. Confusion matrices of the cross-validation average of each model were used to select the best model using the caret package in R. The accuracy, Receiver Operating Characteristic (ROC), Matthews Correlation Coefficient (MCC), sensitivity (Se), and specificity (Sp) of each model were calculated.
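The fold mechanics can be sketched in Python as below. The sketch shows that, with n = 100 and k = 10, each run trains on (k − 1)n/k = 90 observations and validates on the remaining 10. The majority-class learner is a hypothetical stand-in for a real classifier, and none of the names come from the study, which used the caret package in R.

```python
import random


def k_fold_splits(n, k=10, seed=1):
    """Yield (train_idx, valid_idx) pairs; each fold is the validation set once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, valid in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, valid


def cv_error(fit, X, y, k=10):
    """Mean misclassification rate over the k validation folds."""
    errors = []
    for train, valid in k_fold_splits(len(y), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        wrong = sum(model(X[i]) != y[i] for i in valid)
        errors.append(wrong / len(valid))
    return sum(errors) / k


def fit_majority(X_train, y_train):
    """Hypothetical stand-in learner: always predict the majority training class."""
    majority = int(sum(y_train) * 2 >= len(y_train))
    return lambda row: majority


# Hypothetical balanced data set: 50 presences and 50 absences
X = [[i] for i in range(100)]
y = [1] * 50 + [0] * 50
splits = list(k_fold_splits(100, k=10))
error = cv_error(fit_majority, X, y)
print(len(splits[0][0]), len(splits[0][1]))  # 90 10
```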

Accuracy measures the proportion of correctly identified observations. The sensitivity of the model is the proportion of FMD-positive areas correctly identified as positive, and the specificity is the proportion of FMD-negative areas correctly identified as negative. The ROC curve plots the true positive rate (sensitivity) against the false positive rate (1 − specificity) from the confusion matrices that resulted from cross-validation, and helped to determine the optimal threshold. ROC curves were built for each model separately. The area under the ROC curve indicates how well the model discriminates between FMD-negative and FMD-positive areas and helps to check that the outcomes of interest are not imbalanced [43].
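The area under the ROC curve has a simple rank interpretation: it is the probability that a randomly chosen positive location receives a higher predicted risk than a randomly chosen negative one. The Python sketch below computes it that way; the labels and scores are hypothetical, and the function is an illustration rather than the routine used in the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical predicted risks for two outbreak and two absence locations
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)
print(auc)  # 0.75: three of the four positive-negative pairs are ranked correctly
```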

The Matthews Correlation Coefficient (MCC) is a correlation coefficient that accounts for true positives, false positives, true negatives, and false negatives as a balanced measure ranging from −1 to +1. A coefficient closer to 1 indicates higher predictive ability across all four categories of the confusion matrix of each model. MCC is invariant to swapping the class labels and is considered a better measure than the F1 score of the confusion matrix [44].
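The performance measures described above can be computed from the four cells of a confusion matrix, as in the following Python sketch. The counts are hypothetical and were chosen only to be of the same order as the reported results; they are not the study's confusion matrix.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity, and MCC from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # outbreak areas correctly flagged
    specificity = tn / (tn + fp)   # outbreak-free areas correctly cleared
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical counts for 100 presence and 100 absence locations
m = classification_metrics(tp=87, tn=88, fp=12, fn=13)
print({name: round(value, 3) for name, value in m.items()})

# Swapping which class is called "positive" leaves the MCC unchanged
swapped = classification_metrics(tp=88, tn=87, fp=13, fn=12)
```

Unlike accuracy, the MCC stays near zero for a classifier that simply predicts the majority class, which is why it is often preferred as a balanced summary.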

Predicted FMD risk maps of India, Bangladesh, and Sri Lanka were obtained for all three algorithms. The RF predicted risk raster was exported to ArcGIS to obtain mean FMD risk values at higher administrative levels for Sri Lanka and Bangladesh. These values are presented as relative risks.

Model interpretation

The predictive performance was high and comparatively similar across all models. For all the models, we considered feature importance, feature dependence, and the overall interactions for model interpretation.

Different variables are considered important in different models. It is important to determine the amount each predictor contributes to the model’s accuracy. Model class reliance identifies a range of important variables across different well-performing models [45]. This is accomplished by changing each selected variable to understand the impact that change has on model performance. If it is significant, the variable is considered as important [46]. R package FeatureImp was used to create variable importance plots.
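Permutation-based importance can be sketched in Python as follows: one feature column is shuffled at a time, and the resulting drop in accuracy is recorded. The toy classifier, the data, and the use of an accuracy difference as the importance score are assumptions of this example; the study produced its importance plots in R.

```python
import random


def accuracy(model, X, y):
    """Share of rows for which the model's prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical classifier that uses feature 0 only and ignores feature 1
def toy_model(row):
    return int(row[0] > 0.5)


data_rng = random.Random(123)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [toy_model(row) for row in X]

importance = permutation_importance(toy_model, X, y)
print([round(v, 3) for v in importance])
```

Shuffling the ignored feature leaves accuracy unchanged, so its importance is exactly zero, while shuffling the feature the model relies on removes most of its predictive ability.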

Centered Individual Conditional Expectation plots (cICE plots) show the individual effect of each variable on each response. These plots show how the prediction for each observation changes as one variable is varied while all the other variables are kept the same. cICE plots were created to show the top five variables of each model [47]. Partial dependence (PD) plots help to identify the relationships between a predictor’s values and the model predictions. A PD curve is the average prediction obtained when all data points are assigned the same value of the predictor (a global estimate of its effect on the outcome variable) [47].
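The relationship between ICE curves, centered ICE curves, and partial dependence can be illustrated with a short Python sketch. The prediction function, the two observations, and the grid of feature values are hypothetical and serve only to show the mechanics.

```python
def ice_curves(predict, X, feature, grid):
    """One curve per observation: predictions as `feature` is swept over `grid`."""
    curves = []
    for row in X:
        curve = []
        for value in grid:
            modified = list(row)
            modified[feature] = value   # all other features stay as observed
            curve.append(predict(modified))
        curves.append(curve)
    return curves


def center(curves):
    """Centered ICE: subtract each curve's value at the first grid point."""
    return [[p - curve[0] for p in curve] for curve in curves]


def partial_dependence(curves):
    """Partial dependence: pointwise average of the ICE curves."""
    return [sum(column) / len(column) for column in zip(*curves)]


# Hypothetical risk model with an interaction between features 0 and 1
def predict(row):
    return 0.1 + 0.5 * row[0] + 0.2 * row[0] * row[1]


X = [[0.2, 0.0], [0.8, 1.0]]
grid = [0.0, 0.5, 1.0]

ice = ice_curves(predict, X, feature=0, grid=grid)
cice = center(ice)
pd_curve = partial_dependence(ice)
print(pd_curve)
```

Because the two centered curves have different slopes, the effect of feature 0 depends on feature 1; the partial dependence curve averages over this heterogeneity.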

Interactions among variables increase with the increased number of predictor variables. Once important features are recognized, feature interaction strength is measured by the Friedman H statistic [48]. The interaction plots show the marginal impact of a variable on the predicted outcome.
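For two features, Friedman's H statistic compares the joint partial dependence with the sum of the two individual partial dependences: if the features do not interact, the joint function is exactly that sum and the statistic is zero. The Python sketch below computes the squared pairwise statistic by brute force over the data points. The prediction functions and the data are hypothetical, and the code illustrates the definition rather than the routine used in the study.

```python
def _partial_dependence(predict, X, fixed):
    """Average prediction when the features in `fixed` (index -> value) are set for every row."""
    total = 0.0
    for row in X:
        modified = list(row)
        for j, value in fixed.items():
            modified[j] = value
        total += predict(modified)
    return total / len(X)


def _centered(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def h_squared(predict, X, j, k):
    """Squared pairwise Friedman H statistic for features j and k (0 = no interaction)."""
    pd_j = _centered([_partial_dependence(predict, X, {j: r[j]}) for r in X])
    pd_k = _centered([_partial_dependence(predict, X, {k: r[k]}) for r in X])
    pd_jk = _centered(
        [_partial_dependence(predict, X, {j: r[j], k: r[k]}) for r in X]
    )
    numerator = sum((ab - a - b) ** 2 for ab, a, b in zip(pd_jk, pd_j, pd_k))
    denominator = sum(ab ** 2 for ab in pd_jk)
    return numerator / denominator if denominator else 0.0


# Hypothetical data on a balanced two-level design
X = [[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]]

additive = h_squared(lambda r: r[0] + 2 * r[1], X, 0, 1)   # no interaction
product = h_squared(lambda r: r[0] * r[1], X, 0, 1)        # pure interaction
print(additive, product)
```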

Results

Our analytical pipeline identified 15 of 24 variables as important predictors of FMD’s spatial risk. The identified variables are production systems, Normalized Difference Vegetation Index (NDVI), cattle, sheep, buffalo, and goat density, road density, cropland, maximum temperature in the warmest month, isothermality, mean diurnal range, annual precipitation, precipitation in the dry month, precipitation seasonality, and precipitation in the wet months.

Based on the cross-validation approach, both random forest and gradient boosting models performed similarly. Therefore, we selected the RF for the subsequent predictions and interpretations (Table 1).

Table 1. Summary results of cross-validation for different models.

Model | ROC | Accuracy (%) | Specificity (%) | Sensitivity (%) | Matthews Correlation Coefficient
RF | 94.45 ± 0.01 | 87.59 ± 0.36 | 88.28 ± 0.49 | 87.34 ± 0.53 | 0.75 ± 0.007
GB | 93.21 ± 0.01 | 87.87 ± 0.31 | 88.91 ± 0.54 | 86.76 ± 0.43 | 0.75 ± 0.006
SV | 89.51 ± 0.01 | 82.14 ± 0.55 | 82.03 ± 0.52 | 82.26 ± 0.83 | 0.64 ± 0.01

RF, random forest; GB, gradient boosting; SV, support vector.

Predicted risk varied from 0.04 to 0.55 in Sri Lanka and Bangladesh, whereas India was predicted to be at the highest risk (mean value of 0.75), with predicted risk remaining high throughout most of central India and in some border areas of the country (Fig 1).

Fig 1. a) Locations of reported outbreaks in India, Bangladesh, and Sri Lanka. b) Probability distribution of predicted high- and low-risk areas for FMD based on the Random Forest model.

In Sri Lanka, both North and North Central provinces had a higher mean risk compared to the rest of the country (0.072 and 0.067). In Bangladesh, Barisal, Khulna and Chittagong divisions were identified to have a higher mean risk (0.682, 0.603 and 0.573) (Fig 2).

Fig 2. Mean predicted relative risk for a) Sri Lanka and b) Bangladesh with the Random Forest algorithm.

Darker red colors indicate high-risk areas. Note: The sizes of the countries are not shown in proportion to each other.

In each model, production systems and isothermality were identified as the most important features associated with the predicted FMD spatial risk (Fig 3). PD plots showed that the risk increased with production systems 5–10, which include rain-fed arid, humid, temperate, and mixed irrigated arid systems. The spatial risk of FMD increased and then plateaued at an isothermality of approximately 35% (Fig 4). Isothermality indicates temperature variation. Interaction plots showed that the strongest interaction was between production systems and isothermality. Higher risk was predicted at higher values (Fig 5).

Fig 3. Random Forest feature importance plot.

Fig 4. Centered ICE (classification error loss) feature importance plots from Random Forest.

Fig 5. A) Feature interaction plots of all the variables; production systems and mean diurnal range were identified as having the highest interaction. B) Overall interaction strength of production systems with the other features; isothermality and mean diurnal range have the highest interaction with production systems.

In the RF model, FMD risk increased with cattle density (per km²) and with NDVI above 100, and decreased as the mean diurnal range increased. The mean diurnal range is the difference between the daily minimum and maximum temperatures.

Fig 5 shows the interaction strength among the selected features and the interaction of production systems with the other features. These interactions are explored further in heat matrix plots (Fig 6).

Fig 6. Feature interaction heat matrix plots of A) Isothermality, B) MDR-mean diurnal range, C) Dry month precipitation, D) cattle density, E) Normalized vegetation index, with production systems.

Production systems, followed by mean diurnal range, isothermality, cropland, and wet month precipitation, had the highest overall interaction with the other features. At higher isothermality (<60) and a lower mean diurnal range, the magnitude of the spatial risk is high for the production systems ‘mixed irrigated, urban, and other’. The interaction effect for dry month precipitation, cattle density, and NDVI generally remained high for the same production systems.

Discussion

Using machine learning algorithms, outbreak data spanning more than 10 years, and FMD risk factors, we identified the areas at highest risk for FMD in Bangladesh and Sri Lanka, using India to fit the model. Feature importance plots identified livestock production systems and isothermality as the major risk factors shaping FMD risk in the region, followed by NDVI, cattle density, and the mean diurnal range. All of our selected ML algorithms (i.e., RF, GB, SV) showed high predictive performance. Our main finding was that Bangladesh and Sri Lanka had a lower risk of FMD outbreaks (i.e., p = 0.04 to 0.55) compared to India (p = 0.75).

India was predicted to be at high risk when environmental predictors were considered (0.75 mean risk). India is progressing along the PCP to stage 4, with an FMD control program (FMD CP) officially endorsed by WOAH [49]. The country has achieved several milestones in its FMD control program concerning vaccination and surveillance of outbreaks. At stage 3 of the PCP, the objective of India was to establish zonal compartments to enhance vaccination for the progressive reduction of FMDV in the country. These activities are continued in stage 4. Despite control measures, FMD risk areas are still spread across India. In this model, we considered only environment-related variables for predictions and outbreak data from India until 2020. Therefore, the model may not reflect the recent advances the country has made in FMD control through the vaccination program [49].

Our previous spatial temporal Bayesian analysis in India that accounted for vaccination showed that border areas were at high risk compared to the central part of the country [3]. Findings from this study are compatible with that study, as comparatively high-risk areas are identified in the peripheral parts of the country.

Phylogeography studies have shown that the viruses originating from South Asia would move toward Southeast Asia and west towards the Middle East [50,51]. This is assumed because of illegal animal transport through international borders, and low vaccination coverage among other reasons [52].

Both Bangladesh and Sri Lanka were identified as low-risk compared to India. Both countries are in PCP stage 1 (3rd SAARC Roadmap Meeting on the Foot-and-Mouth Disease Progressive Control Pathway (FMD-PCP)). In Sri Lanka, the Northern Province bordering India, and in Bangladesh, the Barisal, Khulna, and Chittagong divisions that border India, were identified as high risk for FMD compared to the other areas. The eastern part of Bangladesh was identified as a high-risk area in a previous publication [22]. In Sri Lanka, the northern and eastern parts were identified as high-risk compared to the rest of the country [53]. These findings are consistent with the high-risk areas identified by our ML models. Collaborating to implement restrictions on illegal animal movement across international borders is beneficial for preventing FMD and other transboundary diseases. A requirement of PCP stage 2 is initiating a legal framework to ensure movement control and surveillance at the borders.

The association between risk factors for a transboundary disease like FMD is complex [54]. Logistic regression does not achieve good accuracy in determining risk factors when there are complex relationships among variables [41]. As identified by the RF algorithm, there are considerable interactions between production systems, a major risk factor, and climate-related features. The best-performing model among the different methods we tested here was Random Forest, which performed better than the regression model. Random Forest and the other ML models we used do not provide a coefficient or establish causality, but they account for the complex associations between risk factors via correlation [55]. In general, these associations are nonlinear. For discussion, we considered the risk factors identified by the best-performing RF algorithm: production systems, isothermality, mean diurnal range, NDVI, and cattle density.

South Asia is densely populated, with 2.04 billion people (25% of the world population) (World Meter). According to the Commission on Global Poverty, World Bank, 34% of the world’s extreme poor are in this region. Livestock are reared to provide emergency income and to utilize crop residues. Under the prevailing livestock management systems, every household may have two to three animals. Productivity is low in this system compared to other management systems [30]. The identified FMD risk factors in the area revolve around this system, which is far from intensive management and therefore lacks biosecurity measures.

The FAO global livestock production systems that we used in this analysis account for human population density and these different types of management systems. Here, it was identified that FMD risk increases with mixed irrigated and rain-fed arid systems, as well as urban and other systems. Mixed irrigated systems are classified as extensive livestock management systems where 10% of production comes from non-livestock land use (irrigated systems) and where crop by-products are fed to animals. Rain-fed farming systems are those where industrial crop activities are higher but livestock are also present. The rain-fed arid system is identified as a dispersed system with very low productivity due to extreme weather conditions. With high human population density, livestock are common in urban areas as well. The Normalized Difference Vegetation Index (NDVI) was another risk factor identified by the different models. A moderate to high vegetation index is associated with higher FMD risk. This could be due to extensive grazing management practices, where animals are sent to graze in forest and pasture lands.

Weather patterns, such as isothermality, mean diurnal range, and precipitation, are also associated with FMD risk. Isothermality quantifies how large the day-to-night temperature variation (the mean diurnal range) is relative to the annual temperature range [56]. The mean diurnal range is the difference between the daily minimum and maximum temperatures. This value has been decreasing with climate change [57]. According to the cICE plot, a decrease in the mean diurnal range is associated with high FMD risk. Precipitation seasonality is directly associated with monthly rainfall and with FMD risk because of the grazing patterns followed in the region, where animals are moved to different grazing areas and the virus survives during the dry and cool weather right after the monsoon rainy season [22,53,58]. Identifying this pattern is important for deciding when to implement control measures, such as vaccination and movement restrictions, before animals are exposed to the virus.

FMD risk increases with cattle density. Cattle are the main livestock species infected by FMD in the region [22,52]. Controlling cattle movement is encouraged but is difficult under existing management practices. In Sri Lanka, animal movement and communal grazing were identified as the major risk factors [53]. In stage 1 of the PCP, it is important to identify the distribution of FMD in the country. Identifying areas where the aforesaid variables interact to predispose to FMD is important for focusing risk-based surveillance activities.

An advantage of the ML approach we followed is the ability to untangle complex interactions. This advantage is lacking in linear models like logistic regression, where attempts to fit interaction terms usually lead to uninterpretable results (i.e., lack of epidemiological plausibility) and overfitting. Here, production systems were the feature with the strongest overall interactions, and they interacted with isothermality, mean diurnal range, and cattle density (Fig 6). Production systems are inherently associated with climatic variables such as seasonality and rainfall, which shape livestock production in the region [59]. No studies are available on this exact production system classification. However, if production systems are taken as a proxy for farm density, there is evidence from simulation studies that high cattle densities result in FMD outbreaks irrespective of farm density, whereas farm density is associated with larger epidemics only in the presence of high cattle density [60].

In this analysis, decision tree-based models performed better than support vector (SV) models. Another advantage of an RF model is that it can reduce the chances of overfitting by using out-of-bag error data for internal validation [37]. Compared to other ML methods, random forest is considered to perform best when the number of variables is larger than the number of observations. Here, we have fewer observations than many machine-learning applications in other fields, e.g., gene editing and gene expression [61], and patient outcome prediction [62]. The caret package was used to select the number of predictors and the optimal branching for RF.
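The out-of-bag idea mentioned above can be sketched in a few lines. Each tree in a random forest is grown on a bootstrap sample (drawn with replacement), and the observations that were never drawn for that tree are "out-of-bag" and can be used to test it. The sketch below is illustrative only and is not the authors' code: the helper names are hypothetical, and the sample size of 660 is borrowed from the outbreak count in the abstract purely as an example.

```python
import math
import random


def bootstrap_split(n, rng):
    """Draw n indices with replacement; return (in-bag draws, out-of-bag indices)."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    drawn = set(in_bag)
    oob = [i for i in range(n) if i not in drawn]
    return in_bag, oob


def expected_oob_fraction(n):
    """Probability that a given observation is never drawn in n draws."""
    return (1.0 - 1.0 / n) ** n


rng = random.Random(0)  # fixed seed so the simulation is repeatable
n = 660                 # illustrative sample size (assumption, see lead-in)

# Simulate 200 "trees" and average the share of observations left out-of-bag.
fractions = [len(bootstrap_split(n, rng)[1]) / n for _ in range(200)]
avg_fraction = sum(fractions) / len(fractions)

print("Simulated OOB fraction:", round(avg_fraction, 3))
print("Theoretical OOB fraction:", round(expected_oob_fraction(n), 3))
print("Limit 1/e:", round(math.exp(-1), 3))
```

Because roughly a third of the observations are held out of every tree, averaging each observation's predictions over the trees that did not see it gives an internal error estimate without a separate validation set.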

With RF, selection bias can arise when the scale of measurement or the number of categories varies across variables [63]. Here, we made the scale of measurement similar across many variables by aggregating and resampling before further analysis, and we did not include any categorical predictor variables. Even if there is selection bias, it does not affect the selection of important variables [37]. For the variable importance measure (VIM), we used the permutation method [46], which is based on prediction accuracy, instead of the Gini impurity index, which is based on the splitting criterion, because the permutation-based VIM is considered better than the Gini impurity index [37].
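The permutation method described above can be sketched as follows: shuffle one predictor at a time, re-score the fitted model, and take the average drop in accuracy as that predictor's importance. This is a minimal standard-library illustration, not the iml-based analysis used in the study. The toy model, the two synthetic predictors, and all function names are hypothetical.

```python
import random


def accuracy(model, X, y):
    """Share of rows for which the model's prediction equals the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling each column, one column at a time."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between column j and the labels
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def threshold_model(row):
    """Toy 'fitted' classifier: predicts an outbreak from the first predictor only."""
    return int(row[0] > 0.5)


# Synthetic data: column 0 drives the label, column 1 is pure noise.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [threshold_model(row) for row in X]

importances = permutation_importance(threshold_model, X, y)
print("Importance of informative predictor:", round(importances[0], 3))
print("Importance of noise predictor:", round(importances[1], 3))
```

Because the score is computed from predictions rather than from the tree splits, the same procedure applies unchanged to random forest, gradient boosting, and support vector models, which makes importances comparable across algorithms.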

There are a number of limitations in our work. The differences in surveillance capacities among the three countries lead to data inconsistencies, which is a limitation when comparing the countries for FMD risk. India has higher diagnostic capabilities than both Sri Lanka and Bangladesh. Since we are predicting the suitability for all types of occurrences (reported outbreaks and laboratory confirmed outbreaks via antibody detection) using an ecological model, ecological fallacy may affect the results. To a degree, this is compensated for because we considered only the spatial aspect of the data. Our study encourages authorities in all three countries to improve the quality of FMD surveillance, particularly in Bangladesh, where data were not available from a central source.

Since feature selection via Boruta was performed before splitting the data into training and test sets, there is a potential for data leakage [64]. Such an impact should be minimal, as our data set is small and the model evaluation results were consistent across different validation methods. In ecological niche modeling, it is important to select a scale that fits the biology of the disease being modeled [7]. We used a resolution of 5 arc-minutes, which corresponds to a pixel size of about 9 km [31]. FMD is transmitted over a range of 50 km to 200 km [65]. There could be limitations to interpolated climatic data [66]. Since this model is not at a fine scale, detailed interpretations cannot be made based on these findings. Here we did not consider the temporal aspect of FMD outbreaks, but FMD has a strong temporal component that results in a specific epidemic curve, which indicates that the identified high-risk areas are not high risk all the time.
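The pixel size quoted above can be checked with simple arithmetic. The sketch below converts a raster resolution in arc-minutes to kilometres using a spherical approximation of about 111.32 km per degree at the equator. That constant, the function name, and the example latitude of 20°N are assumptions made for illustration and do not come from the study.

```python
import math

KM_PER_DEGREE_AT_EQUATOR = 111.32  # approximate value for a spherical Earth (assumption)


def pixel_size_km(arc_minutes, latitude_deg=0.0):
    """Return (north-south, east-west) pixel edge lengths in km at a given latitude."""
    degrees = arc_minutes / 60.0
    north_south = degrees * KM_PER_DEGREE_AT_EQUATOR
    # Meridians converge toward the poles, so east-west width shrinks with cos(latitude).
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


ns_equator, ew_equator = pixel_size_km(5.0)
ns_20n, ew_20n = pixel_size_km(5.0, latitude_deg=20.0)  # hypothetical latitude

print("5 arc-min at the equator (km):", round(ns_equator, 2), "x", round(ew_equator, 2))
print("5 arc-min at 20 deg N (km):", round(ns_20n, 2), "x", round(ew_20n, 2))
```

Under these assumptions a 5 arc-minute pixel is roughly 9 km on a side, several times smaller than the 50 km to 200 km transmission range cited in the text.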

Data from this model can act as training data for the other South Asian countries for which information was not available for this study. This study identifies the complex relationships among the identified FMD risk factors and, importantly, the impact of climate variables on FMD outbreaks. Standardizing surveillance procedures among countries will improve the capture of occurrences in future studies and support regional FMD control.

Further, we emphasize a regional approach to implementing control measures and allocating resources in FMD control. This study helps both Sri Lanka and Bangladesh to target risk-based surveillance and control activities. Identifying FMD risk areas within each country would help both Sri Lanka and Bangladesh to accomplish zonal compartmentalization, establish legal frameworks to regulate international animal movements, and meet the requirements of stage 2 of PCP.

Supporting information

S1 Fig. Outbreak numbers in different years from Bangladesh, India and Sri Lanka used in the analysis.

(TIF)

pone.0320921.s001.tif (175KB, tif)
S1 Table. Source of the different predictors, spatial resolution, and the data collected period considered in this study.

(DOCX)

pone.0320921.s002.docx (14.8KB, docx)
S1 File. Outbreak location data repository.

(DOCX)

pone.0320921.s003.docx (14.8KB, docx)

Data Availability

Outbreak data used in this study are from three countries and were obtained in collaboration with the listed coauthors in each country. Listed below are the links to the Sri Lanka and India government websites where the data are available. Sri Lanka: https://daph.gov.lk/downloads/daph-publications (accessed on 3/13/2025). India: http://www.pdfmd.ernet.in/ (accessed on 3/13/2025). For Bangladesh, no central FMD outbreak data repository is available. The data used here are published in the article cited in the manuscript and linked below; further information should be obtained by contacting the authors of that article. Bangladesh: ‘Epidemiological Surveillance and Mutational Pattern Analysis of Foot-and-Mouth Disease Outbreaks in Bangladesh during 2012–2021’, https://onlinelibrary-wiley-com.ezp3.lib.umn.edu/doi/10.1155/2023/8896572 (accessed on 3/13/2025). Other data source links, including the outbreak locations we used in this study, are included in the supplementary files. Interested researchers can replicate the study findings by obtaining the data from the third party sources and following the information outlined in the Methods section.

Funding Statement

This project was funded in part by a grant from the USDA:ARS. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Paton DJ, Gubbins S, King DP. Understanding the transmission of foot-and-mouth disease virus at different scales. Curr Opin Virol. 2018;28(1):85–91.
  • 2. Chanchaidechachai T, de Jong MCM, Fischer EAJ. Spatial model of foot-and-mouth disease outbreak in an endemic area of Thailand. Prev Vet Med. 2021;195:105468. doi: 10.1016/j.prevetmed.2021.105468
  • 3. Gunasekera U, Biswal J, Machado G, Ranjan R, Subramaniam S, Rout M. Impact of mass vaccination on the spatiotemporal dynamics of FMD outbreaks in India, 2008–2016. Transbound Emerg Dis. 2022;69(5):e1936-50.
  • 4. Chhetri BK, Perez AM, Thurmond MC. Factors associated with spatial clustering of foot-and-mouth disease in Nepal. Tropical Animal Health and Production. 2010 Oct 1. [cited 2024 Jun 12];42(7). Available from: https://escholarship.org/uc/item/723713cz
  • 5. Perez AM, Thurmond MC, Carpenter TE. Spatial distribution of foot-and-mouth disease in Pakistan estimated using imperfect data. Prev Vet Med. 2006;76(3–4):280–9. doi: 10.1016/j.prevetmed.2006.05.013
  • 6. Lee HS, Pham TL, Wieland B. Temporal patterns and space-time cluster analysis of foot-and-mouth disease (FMD) cases from 2007 to 2017 in Vietnam. Transbound Emerg Dis. 2020;67(2):584–91. doi: 10.1111/tbed.13370
  • 7. Escobar LE. Ecological niche modeling: An introduction for veterinarians and epidemiologists. Front Vet Sci. 2020;7:519059. doi: 10.3389/fvets.2020.519059
  • 8. Yackulic CB, Chandler R, Zipkin EF, Royle JA, Nichols JD, Campbell Grant EH, et al. Presence-only modelling using MAXENT: when can we trust the inferences? Methods Ecol Evol. 2012;4(3):236–43. doi: 10.1111/2041-210x.12004
  • 9. Fountain-Jones NM, Machado G, Carver S, Packer C, Recamonde-Mendoza M, Craft ME. How to make more from exposure data? An integrated machine learning pipeline to predict pathogen exposure. J Anim Ecol. 2019;88(10):1447–61. doi: 10.1111/1365-2656.13076
  • 10. Alkhamis MA, Fountain-Jones NM, Aguilar-Vega C, Sánchez-Vizcaíno JM. Environment, vector, or host? Using machine learning to untangle the mechanisms driving arbovirus outbreaks. Ecol Appl. 2021;31(7):e02407. doi: 10.1002/eap.2407
  • 11. Gulyaeva M, Huettmann F, Shestopalov A, Okamatsu M, Matsuno K, Chu D. Data mining and model-predicting a global disease reservoir for low-pathogenic Avian Influenza (AI) in the wider Pacific rim using big data sets. Sci Rep. 2020;10(1):16817.
  • 12. Ito S, Aguilar-Vega C, Bosch J, Isoda N, Sánchez-Vizcaíno JM. Application of machine learning with large-scale data for an effective vaccination against classical swine fever for wild boar in Japan. Sci Rep. 2024;14(1):5312. doi: 10.1038/s41598-024-55828-6
  • 13. Saleh AY, Medang SA, Ibrahim AO. Rabies outbreak prediction using deep learning with long short-term memory. In: Saeed F, Mohammed F, Gazem N, editors. Emerging Trends in Intelligent Computing and Informatics. Cham: Springer International Publishing; 2020. p. 330–40.
  • 14. Reagan KL, Deng S, Sheng J, Sebastian J, Wang Z, Huebner SN, et al. Use of machine-learning algorithms to aid in the early detection of leptospirosis in dogs. J Vet Diagn Invest. 2022;34(4):612–21. doi: 10.1177/10406387221096781
  • 15. Elith J, Leathwick JR, Hastie T. A working guide to boosted regression trees. J Anim Ecol. 2008;77(4):802–13. doi: 10.1111/j.1365-2656.2008.01390.x
  • 16. Fletcher RJ Jr, Hefley TJ, Robertson EP, Zuckerberg B, McCleery RA, Dorazio RM. A practical guide for combining data to model species distributions. Ecology. 2019;100(6):e02710. doi: 10.1002/ecy.2710
  • 17. Mielke SR, Garabed R. Environmental persistence of foot-and-mouth disease virus applied to endemic regions. Transbound Emerg Dis. 2020;67(2):543–54. doi: 10.1111/tbed.13383
  • 18. Punyapornwithaya V, Klaharn K, Arjkumpa O, Sansamur C. Exploring the predictive capability of machine learning models in identifying foot and mouth disease outbreak occurrences in cattle farms in an endemic setting of Thailand. Prev Vet Med. 2022;207:105706. doi: 10.1016/j.prevetmed.2022.105706
  • 19. van Schalkwyk O, Knobel D, De Clercq E, De Pus C, Hendrickx G, Van den Bossche P. Description of events where African buffaloes (Syncerus caffer) strayed from the endemic foot-and-mouth disease zone in South Africa, 1998-2008. Transbound Emerg Dis. 2016;63(3):333–47.
  • 20. Gao H, Ma J. Spatial distribution and risk areas of foot and mouth disease in mainland China. Prev Vet Med. 2021;189:105311. doi: 10.1016/j.prevetmed.2021.105311
  • 21. Hossain K, Anjume H, Akther M, Alam K, Yeamin A, Akter S. Epidemiological surveillance and mutational pattern analysis of foot-and-mouth disease outbreaks in Bangladesh during 2012–2021. Transbound Emerg Dis. 2023;2023(1):8896572.
  • 22. Rahman AKMA, Islam SKS, Sufian MA, Talukder MH, Ward MP, Martínez-López B. Foot-and-mouth disease space-time clusters and risk factors in cattle and buffalo in Bangladesh. Pathogens. 2020;9(6):423. doi: 10.3390/pathogens9060423
  • 23. Haoran W, Jianhua X, Maolin O, Hongyan G, Jia B, Li G, et al. Assessment of foot-and-mouth disease risk areas in mainland China based spatial multi-criteria decision analysis. BMC Vet Res. 2021;17(1):374. doi: 10.1186/s12917-021-03084-5
  • 24. Miller CAJ, Young JR, Nampanya S, Khounsy S, Singanallur NB, Vosloo W, et al. Risk factors for emergence of exotic foot-and-mouth disease O/ME-SA/Ind-2001d on smallholder farms in the Greater Mekong Subregion. Prev Vet Med. 2018;159:115–22. doi: 10.1016/j.prevetmed.2018.09.007
  • 25. Dukpa K, Robertson ID, Edwards JR, Ellis TM, Tshering P, Rinzin K, et al. Risk factors for foot-and-mouth disease in sedentary livestock herds in selected villages in four regions of Bhutan. N Z Vet J. 2011;59(2):51–8. doi: 10.1080/00480169.2011.552852
  • 26. Iriarte MV, Gonzáles JL, de Freitas Costa E, Gil AD, de Jong MCM. Main factors associated with foot-and-mouth disease virus infection during the 2001 FMD epidemic in Uruguay. Front Vet Sci. 2023;10:1070188. doi: 10.3389/fvets.2023.1070188
  • 27. Nyaguthii DM, Armson B, Kitala PM, Sanz-Bernardo B, Di Nardo A, Lyons NA. Knowledge and risk factors for foot-and-mouth disease among small-scale dairy farmers in an endemic setting. Vet Res. 2019;50(1):33. doi: 10.1186/s13567-019-0652-0
  • 28. Sirdar M, Fosgate G, Blignaut B, Heath L, Lazarus D, Mampane R. A comparison of risk factor investigation and experts’ opinion elicitation analysis for identifying foot-and-mouth disease (FMD) high-risk areas within the FMD protection zone of South Africa (2007–2016). Prev Vet Med. 2024;226(1):106192. doi: 10.1016/j.prevetmed.2024.106192
  • 29. Hoogesteyn AL, Rivas AL, Smith SD, Fasina FO, Fair JM, Kosoy M. Assessing complexity and dynamics in epidemics: geographical barriers and facilitators of foot-and-mouth disease dissemination. Front Vet Sci. 2023;10:1149460. doi: 10.3389/fvets.2023.1149460
  • 30. Parthasarathy Rao P, Birthal PS. Livestock in mixed farming systems in South Asia. 2008. [cited 2024 May 18]; Available from: https://hdl.handle.net/10568/400
  • 31. Fick S, Hijmans R. WorldClim 2: new 1-km spatial resolution climate surfaces for global land areas. Int J Climatol. 2017;37(12):4302–15.
  • 32. Gilbert M, Cinardi G, Da Re D, Wint WGR, Wisser D, Robinson TP. Global cattle distribution in 2015 (5 minutes of arc) [Internet]. Harvard Dataverse; 2022. [cited 2024 Jun 13]. Available from: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/LHBICE
  • 33. Meijer JR, Huijbregts MAJ, Schotten KCGJ, Schipper AM. Global patterns of current and future road infrastructure. Environ Res Lett. 2018;13(6):064006. doi: 10.1088/1748-9326/aabd42
  • 34. Phillips SJ, Dudík M, Elith J, Graham CH, Lehmann A, Leathwick J, et al. Sample selection bias and presence-only distribution models: implications for background and pseudo-absence data. Ecol Appl. 2009;19(1):181–97. doi: 10.1890/07-2153.1
  • 35. Kursa M, Rudnicki W. Feature selection with the Boruta package. J Stat Softw. 2010;36(1):1–13.
  • 36. Kuhn M, Wing J, Weston S, Williams A, Keefer C, et al. caret: classification and regression training [Internet]. 2023. [cited 2024 Jun 13]. Available from: https://cran.r-project.org/web/packages/caret/index.html
  • 37. Boulesteix A, Janitza S, Kruppa J, König IR. Overview of random forest methodology and practical guidance with emphasis on computational biology and bioinformatics. WIREs Data Min Knowl. 2012;2(6):493–507. doi: 10.1002/widm.1072
  • 38. Breiman L. Random forests. Mach Learn. 2001;45(1):5–32.
  • 39. Friedman JH. Stochastic gradient boosting. Comput Stat Data Anal. 2002;38(4):367–78. doi: 10.1016/s0167-9473(01)00065-2
  • 40. Schölkopf B, Smola A. Support vector machines and kernel algorithms. Encyclopedia Biostat. 2005:5328–35.
  • 41. Uddin S, Khan A, Hossain ME, Moni MA. Comparing different supervised machine learning algorithms for disease prediction. BMC Med Inform Decis Mak. 2019;19(1):281. doi: 10.1186/s12911-019-1004-8
  • 42. James G, Witten D, Hastie T, Tibshirani R. An introduction to statistical learning: with applications in R [Internet]. New York, NY: Springer US; 2021. [cited 2025 Jan 2]. Available from: https://link.springer.com/10.1007/978-1-0716-1418-1
  • 43. Haibo He, Garcia EA. Learning from imbalanced data. IEEE Trans Knowl Data Eng. 2009;21(9):1263–84. doi: 10.1109/tkde.2008.239
  • 44. Chicco D, Jurman G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genom. 2020;21(1):6. doi: 10.1186/s12864-019-6413-7
  • 45. Fisher A, Rudin C, Dominici F. All models are wrong, but many are useful: learning a variable’s importance by studying an entire class of prediction models simultaneously [Internet]. arXiv; 2019. [cited 2024 Feb 9]. Available from: http://arxiv.org/abs/1801.01489
  • 46. Molnar C, Casalicchio G, Bischl B. iml: An R package for interpretable machine learning. J Open Source Softw. 2018;3(26):786.
  • 47. Goldstein A, Kapelner A, Bleich J, Pitkin E. Peeking inside the black box: visualizing statistical learning with plots of individual conditional expectation. J Comput Graph Stat. 2015;24(1):44–65.
  • 48. Friedman JH, Popescu BE. Predictive learning via rule ensembles. Ann Appl Stat. 2008;2(3):916–54. doi: 10.1214/07-aoas148
  • 49. Das S, Pal S, Rautaray SS, Mohapatra JK, Subramaniam S, Rout M, et al. Estimation of foot-and-mouth disease virus sero-prevalence rates using novel computational approach for the susceptible bovine population in India during the period 2008–2021. Sci Rep. 2023;13(1):22583. doi: 10.1038/s41598-023-48459-w
  • 50. Di Nardo A, Ferretti L, Wadsworth J, Mioulet V, Gelman B, Karniely S, et al. Evolutionary and ecological drivers shape the emergence and extinction of foot-and-mouth disease virus lineages. Mol Biol Evol. 2021;38(10):4346–61. doi: 10.1093/molbev/msab172
  • 51. Brito BP, Mohapatra JK, Subramaniam S, Pattnaik B, Rodriguez LL, Moore BR, et al. Dynamics of widespread foot-and-mouth disease virus serotypes A, O and Asia-1 in southern Asia: A Bayesian phylogenetic perspective. Transbound Emerg Dis. 2018;65(3):696–710. doi: 10.1111/tbed.12791
  • 52. Subramaniam S, Mohapatra JK, Sahoo NR, Sahoo AP, Dahiya SS, Rout M, et al. Foot-and-mouth disease status in India during the second decade of the twenty-first century (2011–2020). Vet Res Commun. 2022 Dec 1;46(4):1011–22.
  • 53. Gunasekera UC, Sivasothy A, Wedasingha N, Thayaparan S, Rotewewa B, Muralithas M. Analyzing the foot and mouth disease outbreak as from 2008 to 2014 in cattle and buffaloes in Sri Lanka. Prev Vet Med. 2017;148:78–8.
  • 54. González Gordon L, Porphyre T, Muhanguzi D, Muwonge A, Boden L, Bronsvoort BM de C. A scoping review of foot-and-mouth disease risk, based on spatial and spatio-temporal analysis of outbreaks in endemic settings. Transbound Emerg Dis. 2022;69(6):3198–215. doi: 10.1111/tbed.14769
  • 55. Couronné R, Probst P, Boulesteix A-L. Random forest versus logistic regression: a large-scale benchmark experiment. BMC Bioinform. 2018;19(1):270. doi: 10.1186/s12859-018-2264-5
  • 56. Easterling DR, Horton B, Jones PD, Peterson TC, Karl TR, Parker DE, et al. Maximum and minimum temperature trends for the globe. Science. 1997;277(5324):364–7. doi: 10.1126/science.277.5324.364
  • 57. Braganza K, Karoly DJ, Arblaster JM. Diurnal temperature range as an index of global climate change during the twentieth century. Geophys Res Lett. 2004;31(13). doi: 10.1029/2004gl019998
  • 58. Hossen ML, Ahmed S, Khan MFR, Nazmul Hussain Nazir KHM, Saha S, Islam MA, et al. The emergence of foot-and-mouth disease virus serotype O PanAsia-02 sub-lineage of Middle East-South Asian topotype in Bangladesh. J Adv Vet Anim Res. 2020;7(2):360–6. doi: 10.5455/javar.2020.g429
  • 59. Godde CM, Mason-D’Croz D, Mayberry DE, Thornton PK, Herrero M. Impacts of climate change on the livestock food supply chain; a review of the evidence. Glob Food Security. 2021;28:100488.
  • 60. Meadows AJ, Mundt CC, Keeling MJ, Tildesley MJ. Disentangling the influence of livestock vs. farm density on livestock disease epidemics. Ecosphere. 2018;9(7):e02294. doi: 10.1002/ecs2.2294
  • 61. Cummings MP, Myers DS. Simple statistical models predict C-to-U edited sites in plant mitochondrial RNA. BMC Bioinform. 2004;5:132. doi: 10.1186/1471-2105-5-132
  • 62. Boulesteix AL, Porzelius C, Daumer M. Microarray-based classification and clinical predictors: on combined classifiers and additional predictive value. Bioinformatics. 2008;24(15):1698–706.
  • 63. Strobl C, Boulesteix A-L, Zeileis A, Hothorn T. Bias in random forest variable importance measures: illustrations, sources and a solution. BMC Bioinform. 2007;8:25. doi: 10.1186/1471-2105-8-25
  • 64. Shim M, Lee S-H, Hwang H-J. Inflated prediction accuracy of neuropsychiatric biomarkers caused by data leakage in feature selection. Sci Rep. 2021;11(1):7980. doi: 10.1038/s41598-021-87157-3
  • 65. Brown E, Nelson N, Gubbins S, Colenutt C. Airborne transmission of foot-and-mouth disease virus: a review of past and present perspectives. Viruses. 2022;14(5):1009. doi: 10.3390/v14051009
  • 66. Townsend J. Mapping disease transmission risk: enriching models using biogeography and ecology. Emerg Infect Dis. 2015;21(8):1489.

Decision Letter 0

Nussieba A Osman

19 Nov 2024

PONE-D-24-43580

Ecological niche modeling for surveillance of foot-and-mouth disease in South Asia

PLOS ONE

Dear Dr. Gunasekera,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

==============================

ACADEMIC EDITOR: Reviewers recommended major revisions to improve the manuscript. The reviewer's comments are enclosed below. Please carefully consider each point raised by the reviewers.

==============================

Please submit your revised manuscript by Jan 03 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org . When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols . Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols .

We look forward to receiving your revised manuscript.

Kind regards,

Nussieba A. Osman, Dr. Med. Vet.

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Thank you for stating the following financial disclosure:

“This project was funded in part by a grant from the USDA:ARS”

Please state what role the funders took in the study.  If the funders had no role, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript."

If this statement is not correct you must amend it as needed.

Please include this amended Role of Funder statement in your cover letter; we will change the online submission form on your behalf.

3. Thank you for uploading your study's underlying data set. Unfortunately, the repository you have noted in your Data Availability statement does not qualify as an acceptable data repository according to PLOS's standards.

At this time, please upload the minimal data set necessary to replicate your study's findings to a stable, public repository (such as figshare or Dryad) and provide us with the relevant URLs, DOIs, or accession numbers that may be used to access these data. For a list of recommended repositories and additional information on PLOS standards for data deposition, please see https://journals.plos.org/plosone/s/recommended-repositories .

4. We note that Figure 2 in your submission contain map images which may be copyrighted. All PLOS content is published under the Creative Commons Attribution License (CC BY 4.0), which means that the manuscript, images, and Supporting Information files will be freely available online, and any third party is permitted to access, download, copy, distribute, and use these materials in any way, even commercially, with proper attribution. For these reasons, we cannot publish previously copyrighted maps or satellite images created using proprietary data, such as Google software (Google Maps, Street View, and Earth). For more information, see our copyright guidelines: http://journals.plos.org/plosone/s/licenses-and-copyright  

We require you to either (1) present written permission from the copyright holder to publish these figures specifically under the CC BY 4.0 license, or (2) remove the figures from your submission:

     1. You may seek permission from the original copyright holder of Figure 2 to publish the content specifically under the CC BY 4.0 license. 

We recommend that you contact the original copyright holder with the Content Permission Form (http://journals.plos.org/plosone/s/file?id=7c09/content-permission-form.pdf) and the following text:

“I request permission for the open-access journal PLOS ONE to publish XXX under the Creative Commons Attribution License (CCAL) CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). Please be aware that this license allows unrestricted use and distribution, even commercially, by third parties. Please reply and provide explicit written permission to publish XXX under a CC BY license and complete the attached form.”

Please upload the completed Content Permission Form or other proof of granted permissions as an "Other" file with your submission.

In the figure caption of the copyrighted figure, please include the following text: “Reprinted from [ref] under a CC BY license, with permission from [name of publisher], original copyright [original copyright year].”

     2. If you are unable to obtain permission from the original copyright holder to publish these figures under the CC BY 4.0 license or if the copyright holder’s requirements are incompatible with the CC BY 4.0 license, please either i) remove the figure or ii) supply a replacement figure that complies with the CC BY 4.0 license. Please check copyright information on all replacement figures and update the figure caption with source information. If applicable, please specify in the figure caption text when a figure is similar but not identical to the original image and is therefore for illustrative purposes only.

The following resources for replacing copyrighted map figures may be helpful:

USGS National Map Viewer (public domain): http://viewer.nationalmap.gov/viewer/

The Gateway to Astronaut Photography of Earth (public domain): http://eol.jsc.nasa.gov/sseop/clickmap/

Maps at the CIA (public domain): https://www.cia.gov/library/publications/the-world-factbook/index.html and https://www.cia.gov/library/publications/cia-maps-publications/index.html

NASA Earth Observatory (public domain): http://earthobservatory.nasa.gov/  

Landsat: http://landsat.visibleearth.nasa.gov/

USGS EROS (Earth Resources Observatory and Science (EROS) Center) (public domain): http://eros.usgs.gov/#

Natural Earth (public domain): http://www.naturalearthdata.com/

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Preliminary observation:

The manuscript entitled “Ecological niche modeling for surveillance of foot-and-mouth disease in South Asia” provides interesting insight into risk factors associated with FMD in South Asian countries in general, and in India, Sri Lanka and Bangladesh in particular, based on the available data. Of these countries, India is in stage 4 of the Progressive Control Pathway (PCP) for FMD, whereas the other two countries are in PCP stage 1. FMD control in India is therefore at a very dynamic stage. The approach uses appropriate models to normalize the possibility of outbreaks in different parts of these countries, as under-reporting of outbreaks is common in the area for various reasons. While the authors acknowledge the challenge of underreporting outbreaks, they fail to provide a transparent account of how the model compensates for this issue. Clarifying the methodology used to address underreporting is essential for enhancing the study’s credibility. The study also applies a variety of machine learning algorithms to identify the best fit, using different (13) parameters responsible for disease/outbreak and the associated risks.

General observations

The English language used in the manuscript is good and easy to understand.

Comments

Minor:

The abbreviation of Ecological niche models (ENM) may be checked as they are written as EMN (lines 64 & 245). A concise explanation of how ENMs function and their relevance to FMD would significantly benefit the reader’s understanding. The figure S1 is missing in the supporting documents.

Major:

• The methodology used for analysis is very appropriate in terms of models (ecological niche model and machine learning tools and different parameters required for risk assessment of FMD). The manuscript mentions the use of random forest, support vector, and gradient boosting algorithms. However, it does not provide information on the selection criteria or performance metrics used for comparison. Also the methodology for handling duplicate outbreak occurrences is inadequately defined, lacking clear criteria for identification and removal. This ambiguity poses a risk of distorting the training dataset used for machine learning models, which could compromise the reliability of the results. A well-defined rationale and systematic approach to duplicate management are crucial to ensure data integrity and enhance model performance.

• The Indian FMD control programme has entered in to a very dynamic stage changing in every round of vaccination every six month. As shown in the manuscript that India contributes to the major niche in the region and remains a major source of viruses in the region, with a changed scenario this may not be true. The factors such as outbreak trend cannot be predicted, until the recent vaccination efforts, and trend of recent outbreaks are used to fit the model from India. This may also affect the prediction in neighbouring countries like Sri Lanka. Interventions during intensive vaccinations are very dynamic and difficult to predict due to changing herd immunity patterns every six months in different parts of the country. The implications of this data dependency should be thoroughly discussed to avoid misleading conclusions.

• Also degree of under reporting varies in different countries taken in the study and also different parts of a country like India needs to be predicted and differentiated (although manuscript discussed this at a point). Higher risk of predictions for India (0.75 as against 0.04 and 0.55 for Sri Lanka and Bangladesh respectively) compared to Sri Lanka and Bangladesh needs to be normalized in view of ongoing efforts, which have reduced the number of outbreaks significantly during the past two years on account of 3-4 rounds of mass vaccinations using a potent FMD vaccine and other measures.

• India is having a higher capability for diagnosis, virus identification and characterization, which sometimes gets reflected in scenario that most of the viruses originate from India (line 288-289) needs to be discussed in the manuscript.

• The analysis primarily considers environmental variables to predict FMD risk but may overlook other crucial epidemiological and socio-economic factors such as herd immunity levels, animal movement patterns, and vaccination compliance. These aspects are likely to be significant drivers of FMD outbreaks and could have strengthened the model’s predictive ability. Adding these non-environmental predictors could yield more comprehensive insights and potentially improve model accuracy. The absence of time-based analysis also limits the model's ability to predict outbreaks dynamically, making it less actionable for real-time decision-making.

Reviewer #2: This paper discusses how ecological niche modelling, coupled with machine learning, can be used to predict the risk of Foot and Mouth Disease (FMD) in the South Asian countries of India, Bangladesh, and Sri Lanka. The paper's main contribution is the development of a spatial risk map for FMD, highlighting areas of high risk in these countries. This information can then be used to inform and improve surveillance and control efforts for the disease in the region. I recommend that this paper be accepted after major revision. Please find full comments from the attached file.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes:  Rabindra Prasad Singh

Reviewer #2: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: Comments-Final to be submitted.docx

pone.0320921.s004.docx (16.9KB, docx)
Attachment

Submitted filename: Reviewer Comments for PONE-D-24-43580.pdf

pone.0320921.s005.pdf (100.3KB, pdf)
PLoS One. 2025 Apr 22;20(4):e0320921. doi: 10.1371/journal.pone.0320921.r003

Author response to Decision Letter 1


20 Jan 2025

Reviewer 1

Dear Editor and dear Reviewer,

We would like to thank you for your detailed review of the manuscript and for the useful comments. We carefully considered all of your comments and have addressed them with appropriate corrections in the text.

Replies and explanations to each comment are listed below.

While the authors acknowledge the challenge of underreporting outbreaks, they fail to provide a transparent account of how the model compensates for this issue. Clarifying the methodology used to address underreporting is essential for enhancing the study’s credibility.

We would like to thank the reviewer for this remark. For the different ML algorithms, reported outbreaks are one variable among the many that we use to determine FMD high-risk areas. One objective of this manuscript was to identify high-risk areas based on environmental data and reported outbreak numbers. The following sentences were added or modified.

Introduction;

Line 61-66

‘Compared to spatial methods such as kriging (5) and sat scan analysis, which uses only outbreak data location and time (6) to determine FMD risk areas, ecological niche models (ENM) can accommodate and explore many highly correlated and complex risk factors to predict the spatial risk of FMD robustly. ENM use variables, such as temperature and precipitation, along with reported incidences to detect species abundance and predict outbreak occurrence (7)’

Line 75-77

‘Different environmental risk factors such as the relative humidity, temperature are associated with FMD persistence in endemic countries (15). To predict the FMD risk, location of outbreak occurrence data are combined with different environmental predictor’s spatial distribution (16). ‘

Limitations:

Line 429-436

‘There are a number of limitations in our work. The differences in surveillance capacities of the three countries that lead to data inconsistencies are a limitation when comparing the countries for FMD risk. India is having higher diagnostic capabilities compared to both Sri Lanka and Bangladesh. Since we are predicting the suitability for all types of occurrences (reported outbreaks and laboratory confirmed outbreaks via antibody detection) using an ecological model, ecological fallacy may affect the results. To a degree, this is compensated as we regarded only the spatial aspect of data. Our study encourage authorities in all the three countries to improve the quality of FMD surveillance particularly Bangladesh where data was not available from a central source.’

Minor Comments

The abbreviation of Ecological niche models (ENM) may be checked as they are written as EMN (lines 64 & 245).

Thank you for your observation. This is corrected.

A concise explanation of how ENMs function and their relevance to FMD would significantly benefit the reader’s understanding.

Thank you for this suggestion. This is now explained in detail in the Introduction.

Line 75-79

‘For FMD and other infectious diseases, there is no straight linear pattern that differentiate infected from the non-infected (15). Machine learning approaches are capable of exploring these nonlinear interactions (16). Different environmental risk factors such as the relative humidity, temperature are associated with FMD persistence in endemic countries (17). To predict the FMD risk, location of outbreak occurrence data are combined with different environmental predictor’s spatial distribution (16). For FMD, ML has been used in studies in Thailand (18), South Africa (19), and China (20). ‘

The figure S1 is missing in the supporting documents.

This figure is uploaded again to the system.

Major comments

The methodology used for analysis is very appropriate in terms of models (ecological niche model and machine learning tools and different parameters required for risk assessment of FMD). The manuscript mentions the use of random forest, support vector, and gradient boosting algorithms. However, it does not provide information on the selection criteria or performance metrics used for comparison.

We would like to thank the reviewer for this remark. In this study we followed the multi-algorithm machine learning ensemble pipeline introduced in ‘How to make more from exposure data? An integrated machine learning pipeline to predict pathogen exposure. J Anim Ecol. 2019 Oct;88(10):1447–61.’ Each ML approach predicts and identifies features differently, as explained in lines 203-214.

“Random Forest (RF) and gradient boosting (GB) methods are based on decision trees. Decision trees provide classification and separate paths based on selected variables. The way decision trees are made is different for each method. The RF method is suitable when the data is sparse. In the RF method, variables from the bootstrap data (training data) are randomly allocated in decision trees. Trees are then randomly selected to test data (out-of-bag data) that was not used in creating trees (33) (34). The accuracy of the model is determined by test data. When GB makes decision trees, new trees are scaled, and made based on the errors of previous trees, and the size of the trees is restricted (35). The support vector (SV) machine learning method uses a kernel function to classify data (outbreak vs no outbreak) at a higher dimension space based on a threshold value. The threshold accounts for the bias-variance tradeoff (36)(37). Ten-fold cross-validation was used to estimate model performance and compare different machine learning methods.”

The pipeline used in this manuscript performs better than traditional presence-only ecological niche models, as mentioned in lines 66-73.

‘One of the most commonly used ENM is maximum entropy species distribution. However, this model requires several assumptions such as representative sampling, and considers presence-only data. For the algorithm component, supervised machine learning methods based on decision trees and kernel function are recommended over this method using both presence and absent data (8). Interpretable multi algorithm machine learning models require fewer statistical assumptions, are less sensitive to highly correlated variables and, therefore, overfitting, and can explore non-linear complex relationships between variables (9).’

Performance metrics

Performance metrics for each model from k-fold cross-validation are reported in lines 215-235, and the comparison results are shown in Table 1.

Regarding the features, the feature importance plots show the importance of each feature to the model and flag any feature that is not performing well. Partial dependence plots show the impact of each feature on model predictions, and the interaction plots show the marginal impact (also calculated using Friedman’s H statistic).

Also the methodology for handling duplicate outbreak occurrences is inadequately defined, lacking clear criteria for identification and removal. This ambiguity poses a risk of distorting the training dataset used for machine learning models, which could compromise the reliability of the results. A well-defined rationale and systematic approach to duplicate management are crucial to ensure data integrity and enhance model performance.

We would like to thank the reviewer for this remark. Since the models are spatial, all duplicate outbreaks were removed, so each location is counted only once. This is explained in line 90.

‘to lower the training error of the model, duplicate occurrences of the same geographical location of outbreaks were removed during the considered period. ‘
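The de-duplication rule quoted above can be read as a keep-first filter on location. The following is a minimal sketch under that reading, not the authors' code: the record layout, the field names (`lat`, `lon`, `year`), the rounding precision and the sample coordinates are all assumptions made for illustration.

```python
def deduplicate_outbreaks(records, precision=4):
    """Keep the first record seen at each location and drop later repeats.

    `records` is a list of dicts with hypothetical keys 'lat' and 'lon'.
    Coordinates are rounded so that near-identical points compare equal.
    """
    seen = set()
    unique = []
    for rec in records:
        key = (round(rec["lat"], precision), round(rec["lon"], precision))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


# Made-up records for illustration: the same point reported in two years.
outbreaks = [
    {"lat": 7.2906, "lon": 80.6337, "year": 2014},
    {"lat": 7.2906, "lon": 80.6337, "year": 2019},
    {"lat": 23.8103, "lon": 90.4125, "year": 2016},
]
unique_outbreaks = deduplicate_outbreaks(outbreaks)
print(len(unique_outbreaks))
```

Because the time field is ignored when building the key, a location that reported outbreaks in several years contributes a single presence point, which matches the statement that the temporal aspect of the data was not considered.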

The granularity of the selected outbreak locations is now made clear in the added description below. The selected outbreak locations have been shared in a data repository.

Line 92-124.

‘In Bangladesh, a systematic FMD outbreak surveillance system is not available. Therefore, it is possible that not all outbreaks are reported during the period. The outbreak data used in this study are from a field study conducted during 2012 to 2021 covering 32 different districts. Outbreak information are collected based on farmer’s notification for serology testing. A total of 481 samples were collected from different outbreak. Sample collection was affected by the 2020-2021 COVID 19 outbreak (18). In this study, we used sample collection location confirmed by laboratory testing affiliated district as the outbreak location. We did not consider the temporal aspect of data. Compared to data coming from Sri Lanka and India, data from Bangladesh is not complete.

For Sri Lanka, officially reported outbreak data were available from the Department of Animal Health, Sri Lanka. The respective veterinarians from nearly 256 veterinary ranges report outbreaks as a part of passive surveillance activities. A district in Sri Lanka consist of multiple veterinary ranges. Outbreak reporting system is a paper based monthly report sent to the head office. Total 369 outbreaks were reported during the period of 2014 to 2022 from different areas of the country. A reported outbreak may have one to many infected animals in an identified farm location. If one or more outbreak was reported at a VS range, a point location of the VS range was considered as positive and recorded for this study. These outbreaks are clinically identified initially with later confirmatory serotype diagnosis.

In India, veterinary authorities conduct both active and passive surveillance for FMD mainly focusing on passive surveillance in nearly 65000 administrative levels. Confirmatory diagnosis is carried out for serotyping in 27 FMD network laboratories and 2 national laboratories. A reported outbreak may have one to many infected animals in an identified farm location. The reporting system is paper based and a monthly report is submitted to the Department of Animal Husbandry. The outbreak location was considered up to the district level (ie if there are one or more outbreaks reported at village level, for the purpose of this analysis, the district was considered positive and a point location of the district was recorded). FMD outbreak data for India (n=429) were available at the district level from a previous study for the years 2009 to 2020 (3).’

The Indian FMD control programme has entered in to a very dynamic stage changing in every round of vaccination every six month. As shown in the manuscript that India contributes to the major niche in the region and remains a major source of viruses in the region, with a changed scenario this may not be true. The factors such as outbreak trend cannot be predicted, until the recent vaccination efforts, and trend of recent outbreaks are used to fit the model from India.

This may also affect the prediction in neighbouring countries like Sri Lanka. Interventions during intensive vaccinations are very dynamic and difficult to predict due to changing herd immunity patterns every six months in different parts of the country. The implications of this data dependency should be thoroughly discussed to avoid misleading conclusions.

Thank you for your observation. We accept that the dynamic nature of FMD occurrence cannot be discussed in this manuscript, since we did not consider the temporal aspect of the data owing to the unavailability of outbreak data. However, we have tried to explain this with the following lines added to the Discussion (lines 334-336):

‘In this model, we considered only environment-related variables for predictions and outbreak data from India until 2020. Therefore, it may not reflect the recent advances the country has made with the FMD control with the vaccination program (49).’

The implications of data dependency are now described further in lines 92-124 and were added as a limitation in lines 429-436.

Also degree of under reporting varies in different countries taken in the study and also different parts of a country like India needs to be predicted and differentiated (although manuscript discussed this at a point). Higher risk of predictions for India (0.75 as against 0.04 and 0.55 for Sri Lanka and Bangladesh respectively) compared to Sri Lanka and Bangladesh needs to be normalized in view of ongoing efforts, which have reduced the number of outbreaks significantly during the past two years on account of 3-4 rounds of mass vaccinations using a potent FMD vaccine and other measures.

Thank you for your observation. The limitation of the outbreak data was added, and the differences in the surveillance programs of each country are discussed in further detail. It is evident that India has a better surveillance system than the other two countries. More details about the outbreak reporting systems were added (lines 93-117).

Since we are looking at data at a higher scale, the results may be subject to ecological fallacy. This is mentioned in lines 429-436.

‘The differences in surveillance capacities of the three countries that lead to data inconsistencies are a limitation when comparing the countries for FMD risk. India is having higher diagnostic capabilities compared to both Sri Lanka and Bangladesh. Since we are predicting the suitability for all types of occurrences (reported outbreaks and laboratory confirmed outbreaks via antibody detection) using an ecological model, ecological fallacy may affect the results. To a degree, this is compensated as we regarded only the spatial aspect of data. Our study encourage authorities in all the three countries to improve the quality of FMD surveillance particularly Bangladesh where data was not available from a central source.’

India is having a higher capability for diagnosis, virus identification and characterization, which sometimes gets reflected in scenario that most of the viruses originate from India (line 288-289) needs to be discussed in the manuscript.

This sentence was changed to refer to South Asia instead of India in line 342, and the following sentence was removed.

“Studies link current circulating strains in Sri Lanka (47) (48) and Bangladesh Ind2001BD1 (49) to Ind/2001 that was first identified in India (50). “

The analysis primarily considers environmental variables to predict FMD risk but may overlook other crucial epidemiological and socio-economic factors such as herd immunity levels, animal movement patterns, and vaccination compliance. These aspects are likely to be significant drivers of FMD outbreaks and could have strengthened the model’s predictive ability. Adding these non-environmental predictors could yield more comprehensive insights and potentially improve model accuracy.

This is a good suggestion; unfortunately, no reliable data sources on herd immunity, animal movement or vaccination are readily available from any of the countries. Even if such data were available, they would not be at the same geographic scale as this model, and repeatedly incorporating categorical data values into a continuous data model would compromise model performance.

The absence of time-based analysis also limits the model's ability to predict outbreaks dynamically, making it less actionable for real-time decision-making.

Thank you for your observation. This has been added as a limitation in lines 443-445.

“Here we did not consider the temporal aspect of FMD outbreaks but FMD has a strong temporal component that results in the specific epidemic curve, which indicates this identified high-risk areas are not high risk all the time.”

Reviewer 2

Dear Editor and Reviewer,

We would like to thank you for your detailed review of the manuscript and for the useful comments. We carefully considered all of your comments and have addressed them with appropriate corrections in the text.

Replies and explanations to each comment are listed below.

1. Lack of Transparency on Data Comparability: A significant concern due to the lack of detail regarding data comparability between the three countries. While the authors mention their data sources, they do not adequately address potential variations in data collection, reporting standards, and diagnostic confirmation practices. This lack of transparency could misinterpret the risk predictions and undermine the study's overall reliability.

The authors need to provide a more detailed account of the data used for each country, acknowledging potential limitations in comparability. This might involve describing the data collection methodologies used in each country, discussing any inconsistencies in data reporting formats or definitions, and addressing t

Attachment

Submitted filename: Reviwer1comment_MA_UG.docx

pone.0320921.s008.docx (20.4KB, docx)

Decision Letter 1

Nussieba A Osman

27 Feb 2025

Ecological niche modeling for surveillance of foot-and-mouth disease in South Asia

PONE-D-24-43580R1

Dear Dr. Gunasekera,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager® and clicking the ‘Update My Information' link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Nussieba A. Osman, Dr. Med. Vet.

Academic Editor

PLOS ONE

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Authors have now addressed all the comments as per observations made. The manuscript may now be accepted for publication.

Reviewer #2: Reviewer Comments and Recommendation for Acceptance

We appreciate the authors’ efforts in revising the manuscript "Ecological Niche Modeling for Surveillance of Foot-and-Mouth Disease in South Asia" (PONE-D-24-43580_R1). The authors have successfully addressed all major concerns, providing clear explanations, methodological improvements, and additional details that enhance the transparency and rigor of the study. Below is a summary of the key reviewer concerns and how the authors have responded effectively.

1. Transparency on Data Comparability

The initial concern was the lack of detailed information regarding how data were collected and compared across India, Bangladesh, and Sri Lanka. The authors have now included comprehensive explanations about data sources, reporting standards, and diagnostic confirmation practices, ensuring greater clarity and reducing potential misinterpretations.

2. Comparability of Predicted Risk Across Countries

The reviewer questioned whether disparities in data availability influenced differences in predicted risk levels rather than actual disease risk. The authors have acknowledged this limitation and further discussed how surveillance capabilities vary across the three countries. Additionally, they have suggested future research to improve data consistency, particularly in Bangladesh, where centralized surveillance is lacking.

3. Potential Data Leakage

Concerns about possible data leakage were raised, as feature selection was performed before splitting the data into training and testing sets. The authors have addressed this by explaining that cross-validation helps mitigate this risk, and they have now included this as a potential limitation to ensure transparency.

4. Class Distribution in Model Evaluation

The absence of an explicit mention of class distribution in model evaluation was noted. The authors have now clarified that a 1:1 case-control ratio was used, ensuring balanced representation of presence and absence cases. Additionally, ROC curves confirmed that no class imbalance was present.
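To illustrate what a 1:1 case-control ratio means in practice, the sketch below draws as many absence points as there are presence points. It is only an assumption about how such balancing could be done; the review history does not state how absence locations were generated, and the function name `sample_balanced` and the toy coordinates are hypothetical.

```python
import random


def sample_balanced(presence, candidate_absence, seed=0):
    """Return labelled points with equal numbers of presences (1) and absences (0)."""
    if len(candidate_absence) < len(presence):
        raise ValueError("not enough candidate absence points for a 1:1 ratio")
    rng = random.Random(seed)
    # Draw, without replacement, one absence point per presence point.
    absence = rng.sample(candidate_absence, len(presence))
    labelled = [(pt, 1) for pt in presence] + [(pt, 0) for pt in absence]
    rng.shuffle(labelled)
    return labelled


# Toy coordinates for illustration only.
presence_pts = [(i, i) for i in range(5)]
background_pts = [(i, -i) for i in range(1, 51)]
dataset = sample_balanced(presence_pts, background_pts)
n_pos = sum(label for _, label in dataset)
n_neg = len(dataset) - n_pos
print(n_pos, n_neg)
```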

5. Use of the Caret Package for Hyperparameter Tuning

The reviewer suggested that the authors explicitly mention their use of the caret package for hyperparameter tuning. The authors have acknowledged this and have now provided a brief explanation for transparency and reproducibility.

6. Clarification on Variables and Feature Selection

The reviewer requested details on the total number of variables considered and the number removed after correlation filtering. The authors have now included a supplementary table listing all 24 variables, clarified the 9 variables removed due to correlation (>0.9), and confirmed that 15 variables were ultimately used in the analysis.
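The correlation filtering summarised above (24 candidate variables, 9 removed at a correlation above 0.9, 15 retained) can be sketched as a greedy pairwise filter. This is an illustrative assumption, not the authors' procedure: the review history does not say which member of a correlated pair was dropped, and the variable names and values below are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column is treated as uncorrelated
    return cov / (sx * sy)


def drop_correlated(columns, threshold=0.9):
    """Keep a variable only if |r| <= threshold with every variable already kept."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Made-up predictor columns for illustration.
columns = {
    "temp_mean": [20.0, 22.0, 25.0, 27.0, 30.0],
    "temp_max": [25.1, 27.0, 30.2, 32.0, 35.1],  # nearly a shifted copy of temp_mean
    "cattle_density": [5.0, 40.0, 12.0, 33.0, 8.0],
}
selected = drop_correlated(columns)
print(selected)
```

With this greedy rule the result depends on the order in which variables are visited, so the retained set should be reported explicitly, as the supplementary table described above does.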

7. Model Runs and Robustness Evaluation

The reviewer inquired about the number of times the models were run and the robustness of the evaluation process. The authors have now specified that each model underwent 10 runs of 10-fold cross-validation, ensuring reliability and reducing overfitting. This additional detail strengthens the credibility of the model performance results.
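The evaluation scheme described above, 10 runs of 10-fold cross-validation, yields 100 train/test splits per model. The sketch below generates such splits with the Python standard library; it is a hedged illustration only (the review history indicates the authors tuned their models with the caret package), and the function name `repeated_kfold` and the sample size of 100 are assumptions.

```python
import random


def repeated_kfold(n_samples, n_splits=10, n_repeats=10, seed=0):
    """Yield (train_idx, test_idx) for every fold of every repeat."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh random partition for each repeat
        folds = [indices[i::n_splits] for i in range(n_splits)]
        for i, test_idx in enumerate(folds):
            train_idx = [j for k, fold in enumerate(folds) if k != i for j in fold]
            yield train_idx, test_idx


splits = list(repeated_kfold(n_samples=100))
print(len(splits))  # one entry per fold per repeat
```

To avoid the data leakage concern raised under point 3, any data-dependent step such as feature selection would be fitted on `train_idx` only within each split, then applied to `test_idx`.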

Final Recommendation

With these revisions, the manuscript now provides a more transparent, methodologically sound, and well-documented study on ecological niche modeling for foot-and-mouth disease surveillance. The authors have successfully addressed all concerns with thorough explanations, added details, and clear acknowledgments of limitations.

Given the improvements, I recommend accepting this manuscript for publication. This study offers valuable insights into regional FMD risk assessment using machine learning, and the refinements made in response to the reviews significantly enhance its impact and reliability.

Best wishes,

Reviewer 2

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Rabindra Prasad Singh

Reviewer #2: No

**********

Attachment

Submitted filename: Reviewer Comments and Recommendation for Acceptance.pdf

pone.0320921.s007.pdf (83.2KB, pdf)

Acceptance letter

Nussieba A Osman

PONE-D-24-43580R1

PLOS ONE

Dear Dr. Gunasekera,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission

* There are no issues that prevent the paper from being properly typeset

If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Nussieba A. Osman

Academic Editor

PLOS ONE

Associated Data

    Supplementary Materials

    S1 Fig. Outbreak numbers in different years from Bangladesh, India and Sri Lanka used in the analysis.

    (TIF)

    pone.0320921.s001.tif (175KB, tif)
    S1 Table. Source of the different predictors, spatial resolution, and the data collected period considered in this study.

    (DOCX)

    pone.0320921.s002.docx (14.8KB, docx)
    S1 File. Outbreak location data repository.

    (DOCX)

    pone.0320921.s003.docx (14.8KB, docx)
    Attachment

    Submitted filename: Comments-Final to be submitted.docx

    pone.0320921.s004.docx (16.9KB, docx)
    Attachment

    Submitted filename: Reviewer Comments for PONE-D-24-43580.pdf

    pone.0320921.s005.pdf (100.3KB, pdf)
    Attachment

    Submitted filename: Reviwer1comment_MA_UG.docx

    pone.0320921.s008.docx (20.4KB, docx)
    Attachment

    Submitted filename: Reviewer Comments and Recommendation for Acceptance.pdf

    pone.0320921.s007.pdf (83.2KB, pdf)

    Data Availability Statement

    Outbreak data used in this study come from three countries and were obtained in collaboration with the listed coauthors in each country. The links below point to the Sri Lanka and India government websites where the data are available. Sri Lanka: https://daph.gov.lk/downloads/daph-publications (accessed on 3/13/2025). India: http://www.pdfmd.ernet.in/ (accessed on 3/13/2025). For Bangladesh, no central FMD outbreak data repository is available; the data used here are published in the article linked below, which is cited in the manuscript, and further information should be obtained by contacting the authors of that article. Bangladesh: ‘Epidemiological Surveillance and Mutational Pattern Analysis of Foot-and-Mouth Disease Outbreaks in Bangladesh during 2012–2021’, https://onlinelibrary-wiley-com.ezp3.lib.umn.edu/doi/10.1155/2023/8896572 (accessed on 3/13/2025). Links to the other data sources, including the outbreak locations used in this study, are provided in the supplementary files. Interested researchers can replicate the study findings by obtaining the data from the third-party sources and following the information outlined in the Methods section.


    Articles from PLOS One are provided here courtesy of PLOS
