Machine Learning‐Based Geospatial Risk Modeling of Global Avian Influenza Outbreaks

Mehak Jindal; Samsung Lim; C Raina MacIntyre

doi:10.1155/tbed/6615342

2026 Apr 20;2026:6615342. doi: 10.1155/tbed/6615342



Editor: Zhi-Jie Zhang

PMCID: PMC13095849 PMID: 42022449

Abstract

The rapid spread of H5N1 avian influenza poses a global threat, highlighting the need for robust spatiotemporal risk assessment. In this study, we developed a global modeling framework integrating machine learning (ML) models and geospatial analysis to characterize H5N1 outbreak risk under varying environmental, ecological, and anthropogenic conditions. Confirmed H5N1 presence locations were extracted from the World Animal Health Information System (WAHIS) (2012–2023), and pseudo‐absence locations were generated using a target‐group background (TGB) approach to account for heterogeneous surveillance effort. Five ML algorithms, namely logistic regression (LR), support vector machines (SVMs), random forest (RF), light gradient boosting machine (LGBM), and extreme gradient boosting (XGB), were evaluated using spatial block cross‐validation on data from 2012 to 2021 and an independent temporal holdout dataset from 2022 to 2023. Tree‐based ensemble techniques (RF, LGBM, and XGB) achieved stronger and more stable performance across both spatial and temporal validation. Seasonal Maximum Entropy (MaxEnt) models were applied to visualize broad‐scale outbreak risk patterns across the annual cycle. Seasonal maps revealed higher risk during autumn and winter, intermediate risk during spring migration, and reduced suitability during summer, consistent with large‐scale migratory connectivity, poultry production intensity, and seasonal environmental gradients. Predictor analysis indicated that livestock density and anthropogenic variables were the strongest correlates of outbreak occurrence in multivariate models, while wild bird abundance and climatic variables contributed heterogeneously and in a season‐dependent manner.

Keywords: avian influenza, disease prediction, geospatial analysis, H5N1, machine learning, MaxEnt, risk mapping

1. Introduction

Avian influenza is an infectious disease that was first identified in poultry in the early 1900s [1]. Over time, it has evolved into a zoonotic threat, with the first human infections being documented in 1997 [2]. Since 2020, there has been an unprecedented spread of the avian influenza virus (AIV) in both birds and mammals, driven largely by the H5N1 subtype of clade 2.3.4.4b [3]. H5N1 has proven to be highly infectious and lethal, with an expansive host range that now includes numerous wild bird species, various mammalian species, and humans. The mortality rate among infected humans is ~52% [4]. In 2005, the virus spread from Asia into Russia, Western Europe, Africa, and the Middle East, causing high mortality in wild bird populations [5]. By 2021, H5N1 had reached North America, with further spread to Central and South America in 2022. In the United States, the virus rapidly disseminated from the East Coast across the country, reaching the West Coast within ~4–6 months [6]. Infection of dairy cattle was documented for the first time in March 2024, and H5N1 now affects 19 states in the United States, including 758 outbreaks on dairy farms in California, the largest dairy producer in the country. Our research suggests that the first introduction into dairy farms in the United States was likely due to contact with wild birds [7]. H5N1 took about 10 months to spread along the western coast of South America, decimating populations of marine mammals such as sea lions. This suggests that the virus was introduced into the Pacific region from both the Asian and Atlantic flyways, indicating multiple independent incursions [8].

Past studies have examined the relationship between AIV outbreaks and socioeconomic and environmental factors. Migratory birds, particularly waterfowl, are known as natural reservoirs of AIV and have been associated with its long‐distance spread along migration routes [9–12]. However, since 2020, H5N1 infections have been documented across a much broader range of wild bird taxa, suggesting that outbreak risk may no longer be adequately described by traditional reservoir‐focused frameworks alone. In parallel, anthropogenic processes, poultry production systems, and environmental factors have been increasingly recognized as important contributors to AIV transmission [13–17].

Despite this growing complexity, global‐scale predictive modeling of H5N1 risk has remained limited, and many existing studies rely on region‐specific analyses or validation approaches that do not adequately account for spatial and temporal dependence. The rapid and uncontained spread of H5N1 underscores the urgent need to understand its transmission mechanisms. Machine learning (ML) methods can integrate heterogeneous data sources, capture complex nonlinear relationships, and help identify patterns that are not apparent from individual predictors.

In this study, we develop a global modeling framework that integrates ML methods with geospatial analysis to characterize H5N1 outbreak risk under diverse environmental, ecological, and anthropogenic conditions. Confirmed outbreak locations were combined with pseudo‐absence data generated using a target‐group background (TGB) approach to better reflect heterogeneous surveillance effort. A diverse set of predictors was assembled, including climatic variables, livestock density, wild bird abundance [18], and anthropogenic proxies such as population density and nighttime light (NTL) intensity [19]. Exploratory feature screening was employed to characterize predictor behavior and redundancy, while final model configuration was determined through spatial block cross‐validation and independent temporal hold‐out evaluation. By explicitly validating model performance across both space and time and by complementing ML predictions with seasonal presence‐only risk mapping, this study aims to provide a realistic and transferable assessment of global H5N1 risk patterns. The resulting framework is intended to support interpretation of seasonal outbreak dynamics and to inform surveillance strategies under evolving ecological and epidemiological conditions.

2. Materials and Methods

This section outlines the data sources, preprocessing steps, and modeling techniques used to assess the spatial and temporal risk distribution of H5N1 avian influenza outbreaks.

2.1. Data

2.1.1. H5N1 Disease Records

The World Animal Health Information System (WAHIS) is an open‐source database managed by the World Organization for Animal Health (WOAH) [20] that provides detailed reports on confirmed disease outbreaks, including latitude, longitude, outbreak location, and affected species. The spatial precision of reported locations may vary across countries and time periods. Some records likely correspond to the outbreak site or its vicinity, while others may correspond to administrative centroids. WAHIS does not provide metadata indicating the spatial resolution of each record, preventing explicit stratification by locational accuracy. Confirmed H5N1 outbreak records spanning 2012–2023 were used as presence records in this study. These records include outbreaks reported in poultry, wild birds, and mammals. For wild bird outbreaks, species‐level information was compiled and aggregated into 11 avian taxonomic families to enable consistent ecological analysis at a global scale: Anseriformes (waterfowl such as ducks, geese, and swans), Accipitriformes (hawks and eagles), Charadriiformes (shorebirds and gulls), Ciconiiformes (herons and storks), Columbiformes (pigeons and doves), Galliformes (turkeys), Gruiformes (cranes and rails), Passeriformes (sparrows and finches), Pelecaniformes (pelicans and cormorants), Podicipediformes (grebes), and Strigiformes (owls).

2.1.2. Bioclimatic Data

Giovanni is an Earth data portal that provides tools for visualizing, analyzing, and accessing remote sensing data. This study uses data from the Global Land Data Assimilation System (GLDAS) Noah Land Surface Model L4 Monthly 1.0° × 1.0° Version 2.1 (GLDAS_NOAH10_M) [21]. The dataset spans from January 2000 to December 2024, with a spatial resolution of 1.0° and a temporal resolution of 1 month [22, 23]. Each data file is ~2 MB in size and covers global latitudinal bounds from −60° to 90° and longitudinal bounds from −180° to 180°. GLDAS‐2.1 is reliable and has broad temporal and spatial coverage, making it suitable for environmental modeling and geospatial analyses. The following environmental parameters were considered for modeling: air temperature, precipitation, soil moisture, specific humidity, surface air pressure, and wind speed.

2.1.2.1. Geographic Landscape

Vegetation indices were obtained from the Moderate Resolution Imaging Spectroradiometer (MODIS) dataset to capture vegetation dynamics. Additionally, elevation data were obtained from the ASTER Global Digital Elevation Model (GDEM) [24]. Each raster pixel covers 1 arc‐second, that is, an area of ~30 m × 30 m on the ground. The dataset provides global elevation coverage between 83°N and 83°S latitude. The elevation data for the year 2019 were used consistently across all years from 2005 to 2023 to ensure uniformity in the analysis.

2.1.3. Bird Abundance Data

Weekly estimated wild bird abundance data were retrieved from eBird Status and Trends [25] for every wild bird species reported as affected by H5N1 in the WAHIS database. Abundance estimates are produced at an approximate spatial resolution of 14 km × 14 km and represent modeled relative abundance rather than raw counts. Each species was assigned to 1 of 11 avian taxonomic families: Accipitriformes, Anseriformes, Charadriiformes, Ciconiiformes, Columbiformes, Galliformes, Gruiformes, Passeriformes, Pelecaniformes, Podicipediformes, and Strigiformes. Family‐level aggregation reduces sparsity associated with species‐level reporting and supports integration with other predictors that vary at broader spatial resolutions. Weekly abundance estimates were aggregated to monthly averages to match the temporal resolution of other environmental predictors and outbreak records used in the modeling framework.

2.1.4. Anthropogenic Data

2.1.4.1. Population Density

The Gridded Population of the World dataset [26] was used to estimate human population density, expressed as the number of people per square kilometer. The dataset uses a proportional allocation algorithm to assign population counts to 30 arc‐second grid cells; population density is then calculated by dividing the count by the land area. These data are available at 5‐year intervals (2000, 2005, 2010, 2015, and 2020). For this study, the most recent available dataset (2020) was used as a static proxy to represent human population distribution.

2.1.4.2. Livestock Density

The Gridded Livestock of the World dataset was used to represent livestock species densities, including cattle, buffaloes, horses, sheep, goats, pigs, chickens, and ducks. This dataset has a spatial resolution of 5 arc‐minutes (~0.083 decimal degrees) and provides data for 2015. Dasymetrically weighted layers were used, in which livestock numbers are disaggregated within census polygons according to weights established by statistical models using high‐resolution spatial covariates [27]. Livestock density variables were treated as static representations of production intensity and spatial distribution rather than as temporally dynamic predictors.

2.1.4.3. NTL

Anthropogenic activity and infrastructure were represented using NTL products derived from the Visible Infrared Imaging Radiometer Suite (VIIRS) onboard the Suomi National Polar‐orbiting Partnership (Suomi NPP) satellite [28]. The VIIRS instrument includes the day/night band (DNB), which is specifically designed to detect low‐light emissions such as city lights and gas flares, enabling consistent observation of nocturnal human activity on a global scale. These data products are made freely available by the Earth Observation Group (EOG) at the Payne Institute for Public Policy, Colorado School of Mines, in collaboration with the U.S. National Oceanic and Atmospheric Administration (NOAA). The VIIRS‐DNB NTL data are available from April 2012 onward [29] and have a spatial resolution of 15 arc‐seconds (~500 m) across a 3040 km swath, offering monthly global coverage. For this study, we used monthly averaged NTL composites, which were resampled to match the resolution and extent of other environmental and anthropogenic datasets used in the analysis [30].

2.2. Methods

A step‐by‐step approach was implemented to assess the environmental suitability and the outbreak risk of H5N1. First, a harmonized global dataset was assembled by integrating climatic, environmental, anthropogenic, and biological predictors from multiple sources. Second, a feature screening workflow was applied, combining univariate, multivariate, and model‐based selection techniques alongside multicollinearity diagnostics. Third, candidate feature sets were used as inputs to five ML algorithms: random forest (RF), support vector machines (SVMs), logistic regression (LR), light gradient boosting machine (LGBM), and extreme gradient boosting (XGB). Model performance was evaluated using the area under the receiver operating characteristic curve (AUC) and other performance metrics. Finally, seasonal Maximum Entropy (MaxEnt) models were used to generate spatial risk maps highlighting regions of high outbreak probability.

2.2.1. Spatial Harmonization of Predictors

To ensure spatial consistency across datasets, all predictors were harmonized to a global grid of 1.0° × 1.0° in geographic coordinates (EPSG:4326). The GLDAS NOAH land‐surface variables (air temperature, precipitation, soil moisture, specific humidity, surface air pressure, and wind speed) are natively provided at this spatial resolution.

Predictors available at finer resolutions, including MODIS NDVI (250–500 m), ASTER GDEM elevation (30 m), GPWv4 population density (1 km), Gridded Livestock of the World livestock densities (0.083°), VIIRS‐DNB NTLs (500 m), and eBird abundance (~14 km), were aggregated to the 1.0° grid using bilinear interpolation. This resolution was chosen because all core climatic variables are natively available at 1.0°, which ensures spatial alignment and avoids upscaling errors. The aim of this study was to model broad‐scale risk patterns rather than local farm‐scale variability. Aggregating predictors to 1.0° reduces noise from fine‐scale fluctuations while retaining global environmental gradients and ensuring computational feasibility.
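As a rough illustration of moving a finer raster onto a coarser grid, the sketch below averages square blocks of cells. It is not the authors' workflow: it uses plain block averaging rather than the bilinear interpolation named above, and the function name `block_mean`, the 0.25° input grid, and the synthetic values are assumptions made only for this example.

```python
import numpy as np

def block_mean(fine: np.ndarray, factor: int) -> np.ndarray:
    """Aggregate a 2-D raster by averaging factor x factor blocks, ignoring NaN cells."""
    rows, cols = fine.shape
    if rows % factor or cols % factor:
        raise ValueError("raster shape must be divisible by the aggregation factor")
    # Reshape so that axes 1 and 3 index the cells inside each coarse block.
    blocks = fine.reshape(rows // factor, factor, cols // factor, factor)
    return np.nanmean(blocks, axis=(1, 3))

# Hypothetical 0.25-degree raster covering 2 x 3 degrees (8 x 12 cells).
fine = np.arange(96, dtype=float).reshape(8, 12)
coarse = block_mean(fine, factor=4)  # four 0.25-degree cells per 1.0-degree side
print(coarse.shape)
```

Block averaging preserves the mean of the fine cells within each coarse cell, which is one reason it is often used for density‐like variables; other resampling rules may be preferable for categorical or highly skewed layers.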

2.2.2. Presence–Background Data Construction

WAHIS outbreak records provide presence‐only information. However, supervised ML algorithms require both presence (1) and absence (0) observations. To address this, pseudo‐absence points were generated using a TGB approach to account for spatial and reporting bias.

TGB sampling aims to approximate the spatial distribution of surveillance effort by drawing background points from locations where similar reporting processes occur, rather than from the entire geographic domain. This approach reduces bias arising from uneven reporting intensity and has been widely recommended for species distribution and disease risk modeling [31]. Background points were generated in Python by sampling locations from the same spatiotemporal domain as the observed outbreaks. This ensured that both presence and background points reflected comparable observation effort and data availability, avoiding unrealistic contrasts between well‐surveyed and poorly surveyed regions.
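One way to picture effort‐weighted background sampling is sketched below. The paper does not describe its sampling code, so everything here is an assumption for illustration: the function name `sample_tgb_background`, the 1° cell size, the use of report counts per cell and month as a proxy for surveillance effort, and the four made‐up target‐group records.

```python
import random
from collections import Counter

def sample_tgb_background(target_records, n_background, cell_size=1.0, seed=0):
    """Draw background (cell, month) pairs with probability proportional to
    the number of target-group reports in that cell and month."""
    def cell_of(lat, lon):
        # Floor division keeps negative coordinates in the correct cell.
        return (int(lat // cell_size), int(lon // cell_size))

    counts = Counter((cell_of(lat, lon), month) for lat, lon, month in target_records)
    keys = list(counts)
    weights = [counts[k] for k in keys]
    rng = random.Random(seed)
    return rng.choices(keys, weights=weights, k=n_background)

# Hypothetical target-group reports: (latitude, longitude, month).
reports = [(52.1, 5.3, 1), (52.4, 5.8, 1), (35.7, 139.7, 11), (-33.9, 18.4, 6)]
background = sample_tgb_background(reports, n_background=10)
print(background[:3])
```

Because sampling is restricted to cells and months that already contain reports, background points cannot fall in regions with no observed surveillance activity, which is the property the TGB idea relies on.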

For each presence and background point, environmental and anthropogenic predictors were extracted using the point sampling tool in QGIS [32] from temporally matched raster datasets based on the outbreak date or sampling month. This presence‐background dataset formed the basis for all subsequent ML and MaxEnt analyses.

2.2.3. Preprocessing and Exploratory Variable Screening

To prepare predictors for ML analysis, a structured preprocessing and exploratory screening pipeline was implemented to assess data quality, redundancy, and stability. This analysis was used to characterize predictor behavior and to inform the construction of alternative feature sets.

1. Data cleaning and outlier detection: Records containing missing values (NaN) were removed to ensure data consistency. Potential outliers were removed using the Isolation Forest algorithm [33]. This method is effective for high‐dimensional datasets since it does not assume any specific data distribution. The contamination rate, specified as a parameter during model training, defines the proportion of the most extreme data points to be flagged as anomalies. The anomaly score distribution was inspected, and the lower tail of the distribution (~5% of observations) was excluded to reduce the influence of extreme outliers [34].
2. Univariate analysis of variance (ANOVA): Initial feature relevance was assessed using ANOVA [35], which quantifies the degree to which each predictor individually discriminates between presence and absence classes. Features were ranked by the ANOVA F‐statistic, calculated using Equation (1):

\[
F = \frac{\text{Variance between groups}}{\text{Variance within groups}} \tag{1}
\]

where the groups are absence (0) and presence (1). A high F‐value indicates that the feature separates the two groups strongly and is a promising predictor. A low F‐value indicates that differences in the feature are mostly within‐group noise, so the feature may contribute little to predicting the outcome.

3. Multivariate screening using AIC: To further assess predictor relevance, we employed the Akaike information criterion (AIC) [36] within a generalized linear model framework. AIC balances goodness of fit, measured by the model log‐likelihood, against the number of parameters, so it indicates whether a given feature improves prediction of the response variable enough to justify the added complexity. Since we were working with a moderate dataset size, we opted not to use the Bayesian information criterion (BIC), which applies a stricter penalty for additional parameters. AIC values were used comparatively to assess how predictor groupings influenced explanatory power while accounting for model complexity.
4. Feature importance using RF: RF feature importance [37] was computed to explore nonlinear relationships and interactions among predictors. Feature importance scores show the relative contribution of each predictor to reducing classification impurity across the ensemble and hence help identify the dominant environmental, anthropogenic, and biological drivers.
5. Correlation and multicollinearity checks: Pairwise correlations and variance inflation factors (VIFs) were examined to identify groups of highly correlated predictors [38]. Correlated predictors were retained for the tree‐based models, and their effect on stability and performance was evaluated during cross‐validation.
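To make the univariate ranking in step 2 concrete, the following sketch computes a one‐way ANOVA F‐statistic for a single predictor, following the between‐group over within‐group variance ratio in Equation (1). The function name `anova_f` and the two toy predictors are assumptions for illustration and are not taken from the study data.

```python
from statistics import mean

def anova_f(values, labels):
    """One-way ANOVA F-statistic for one predictor across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    grand_mean = mean(values)
    n, k = len(values), len(groups)
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

labels = [0, 0, 0, 1, 1, 1]                      # 0 = pseudo-absence, 1 = presence
separating = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]      # toy predictor that splits the classes
overlapping = [1.0, 6.0, 2.0, 7.0, 3.0, 8.0]     # toy predictor with overlapping classes

f_separating = anova_f(separating, labels)
f_overlapping = anova_f(overlapping, labels)
print(f_separating, f_overlapping)
```

The first toy predictor yields a much larger F‐value than the second, which is the behavior the ranking relies on. The statistic is undefined when the within‐group variance is zero, so constant predictors would need to be dropped beforehand.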

Based on the exploratory assessments described above, multiple candidate feature sets were constructed, reflecting different assumptions about predictor relevance and redundancy. The candidate feature sets were evaluated using ML models. Final predictor retention was guided by spatially cross‐validated model performance.
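The multicollinearity diagnostic in step 5 can be sketched with the standard definition VIF_j = 1 / (1 − R_j²), where R_j² is obtained by regressing predictor j on all remaining predictors. The helper name `vif` and the synthetic three‐column matrix below are assumptions for illustration only; the study's own implementation is not described.

```python
import numpy as np

def vif(X: np.ndarray) -> np.ndarray:
    """Variance inflation factor for each column of a predictor matrix."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Regress column j on an intercept plus all other columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1.0 - resid.var() / y.var()
        out[j] = 1.0 / (1.0 - r2)
    return out

# Synthetic predictors: column 2 is nearly a copy of column 0, column 1 is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.1 * rng.normal(size=200)
scores = vif(np.column_stack([a, b, c]))
print(scores)
```

In this synthetic case the two near‐duplicate columns receive large VIF values while the independent column stays close to 1, mirroring how groups of linearly dependent predictors stand out in such a diagnostic.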

2.2.4. ML Models

To evaluate the risk of H5N1 outbreaks, five ML models were implemented: LR, SVM, RF, LGBM, and XGB [39–43]. These models were chosen to capture both linear and complex nonlinear relationships between predictors and outbreak occurrences. Model training, feature‐set comparison, and hyperparameter tuning were conducted using data from the training period (2012–2021). Hyperparameter tuning was performed using spatial block cross‐validation to account for spatial autocorrelation. Parameter search ranges and optimal configurations are summarized in Table S1.
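The core idea of spatial block cross‐validation is that nearby points are never split between training and validation folds. A minimal sketch of the fold assignment is given below; the 10° block size, the number of folds, the function name `spatial_block_folds`, and the example coordinates are all assumptions, since the block configuration used in the study is not stated in this section.

```python
import math
import random

def spatial_block_folds(coords, block_size_deg=10.0, n_folds=5, seed=0):
    """Assign each (lat, lon) point to a fold so that all points in the
    same spatial block share one fold."""
    blocks = [
        (math.floor(lat / block_size_deg), math.floor(lon / block_size_deg))
        for lat, lon in coords
    ]
    unique_blocks = sorted(set(blocks))
    random.Random(seed).shuffle(unique_blocks)
    # Deal shuffled blocks to folds in round-robin order.
    fold_of_block = {b: i % n_folds for i, b in enumerate(unique_blocks)}
    return [fold_of_block[b] for b in blocks]

# Hypothetical point locations (latitude, longitude).
coords = [(52.1, 5.3), (53.0, 6.1), (35.7, 139.7),
          (-33.9, 18.4), (40.7, -74.0), (41.2, -73.5)]
folds = spatial_block_folds(coords, block_size_deg=10.0, n_folds=2)
print(folds)
```

Each fold is then held out in turn while the model is fitted on the remaining blocks, so validation scores reflect performance in regions the model has not seen rather than interpolation between neighboring outbreak points.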

For spatial validation, model performance was quantified using median performance across spatially independent folds. AUC was used to assess the ability of a model to distinguish outbreak presences from pseudo‐absences across all probability thresholds. Given class imbalance, precision‐recall AUC (PR‐AUC) was additionally used to emphasize model performance in correctly identifying outbreak locations. Model calibration was evaluated using the Brier score, which quantifies the accuracy of predicted probabilities, while overall classification accuracy was reported as a summary measure.
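The three probability‐based metrics named above can be written out in a few lines, which may help clarify what each one measures. This is a plain‐Python sketch with made‐up labels and scores; PR‐AUC is approximated here by average precision, which is a common but not the only way to summarize a precision‐recall curve, and the function names are assumptions.

```python
def roc_auc(y_true, y_score):
    """Probability that a random presence outranks a random pseudo-absence (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(y_true, y_score):
    """Mean of the precision values at the rank of each presence (assumes no tied scores)."""
    ranked = sorted(zip(y_score, y_true), reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

# Made-up labels (1 = presence) and predicted probabilities.
y_true = [1, 1, 1, 0, 0]
y_prob = [0.9, 0.8, 0.4, 0.5, 0.1]
auc = roc_auc(y_true, y_prob)
ap = average_precision(y_true, y_prob)
brier = brier_score(y_true, y_prob)
print(auc, ap, brier)
```

AUC and average precision depend only on the ranking of the predictions, whereas the Brier score also penalizes probabilities that are poorly calibrated, which is why the metrics are reported side by side.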

To assess temporal generalization, final models were retrained on the full training dataset (2012–2021) and evaluated on an independent temporal hold‐out dataset from 2022 to 2023. The hold‐out evaluation incorporated both outbreak presences and target‐group pseudo‐absences. For the temporal hold‐out, discrimination, calibration, and classification performance were summarized using AUC, PR‐AUC, Brier score, accuracy, precision, recall, F1‐score, and confusion matrices for the final selected model.
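A temporal hold‐out of this kind amounts to splitting records by year and scoring thresholded predictions on the later period. The sketch below shows the split and the confusion‐matrix metrics; the record layout (a dictionary with a `year` key), the helper names, and the toy labels are assumptions for illustration.

```python
def temporal_split(records, train_years=(2012, 2021), test_years=(2022, 2023)):
    """Split records into training and hold-out sets by inclusive year ranges."""
    train = [r for r in records if train_years[0] <= r["year"] <= train_years[1]]
    test = [r for r in records if test_years[0] <= r["year"] <= test_years[1]]
    return train, test

def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 from binary labels and binary predictions."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy records and predictions, not study data.
records = [{"year": 2015}, {"year": 2021}, {"year": 2022}, {"year": 2023}]
train, test = temporal_split(records)
precision, recall, f1 = precision_recall_f1([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
print(len(train), len(test), precision, recall, f1)
```

Keeping the split strictly chronological means no information from 2022–2023 can leak into model fitting, which is what makes the hold‐out an estimate of forward‐in‐time performance.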

2.2.5. MaxEnt Analysis

MaxEnt modeling, a presence‐only framework [44], was employed to generate seasonal spatial risk maps of H5N1 outbreaks.

Confirmed H5N1 outbreak locations from the training period (2012–2021) were used as presence inputs. To account for changes in disease occurrence influenced by bird migration and climate, the dataset was divided into four seasonal periods of the northern hemisphere:

Season 1: December–February (winter).

Season 2: March–May (spring).

Season 3: June–August (summer).

Season 4: September–November (autumn).

For each season, the monthly raster layers of the environmental variables including temperature, precipitation, wind speed, and air pressure, were averaged to create seasonal composites. These seasonal predictors were used to fit MaxEnt models with linear and quadratic feature classes, allowing for nonlinear ecological responses while maintaining model interpretability. Season‐specific MaxEnt models were trained using data from 2012 to 2021 and subsequently projected onto environmental conditions from 2022 to 2023 to assess the temporal generalization of spatial risk patterns. Model performance was evaluated using AUC and omission rate (OR) metrics. MaxEnt outputs were interpreted as continuous relative suitability surfaces rather than binary classifications.
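The seasonal compositing described above can be sketched as a month‐to‐season lookup followed by a cell‐wise mean of the monthly layers. The layer representation (a flat list of cell values per month), the function name `seasonal_composites`, and the example numbers are assumptions for illustration; real rasters would normally be handled with an array or GIS library.

```python
from statistics import mean

# Northern-hemisphere seasons as defined in the text.
SEASON_OF_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}

def seasonal_composites(monthly_layers):
    """Average monthly layers (month -> list of cell values) into one layer per season."""
    by_season = {}
    for month, layer in monthly_layers.items():
        by_season.setdefault(SEASON_OF_MONTH[month], []).append(layer)
    # zip(*layers) groups the values of the same cell across months.
    return {
        season: [mean(cell_values) for cell_values in zip(*layers)]
        for season, layers in by_season.items()
    }

# Made-up two-cell temperature layers for four months.
monthly = {12: [0.0, 2.0], 1: [3.0, 4.0], 2: [6.0, 6.0], 6: [20.0, 22.0]}
composites = seasonal_composites(monthly)
print(composites)
```

Note that winter spans a year boundary (December to February), so when compositing a specific winter one would need to decide whether December is paired with the following or the preceding January; the sketch ignores the year entirely.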

2.2.6. Data Visualization and Analysis

A combination of spatial and statistical visualization techniques was employed to interpret model outputs and to examine H5N1 outbreak risk patterns. These visualizations helped in summarizing model performance, exploring predictor‐response relationships, and communicating spatiotemporal risk patterns.

1. Model performance comparison: ML models were evaluated quantitatively using spatial block cross‐validation and an independent temporal hold‐out period. Discrimination, calibration, and classification performance were assessed using AUC, PR‐AUC, Brier score, and overall accuracy. Performance summaries were reported using median values across spatial folds and final metrics for the temporal hold‐out period.
2. Variable influence and response patterns: MaxEnt partial response curves were generated to assess the influence of individual environmental variables on outbreak probability while holding other variables constant. These curves help clarify potential drivers of outbreak risk and the relationship between disease prevalence and ecological factors.
3. Geospatial risk mapping: Seasonal risk maps were generated using MaxEnt for winter (December–February), spring (March–May), summer (June–August), and autumn (September–November) to visualize high‐risk regions and to illustrate how relative risk patterns vary across the annual cycle.

3. Results and Discussion

In this section, we present the results of exploratory variable screening, model performance evaluation, and spatiotemporal risk mapping.

3.1. Exploratory Feature Screening

To evaluate the robustness and relevance of candidate predictors, we applied multiple complementary screening approaches, including anomaly detection, univariate ranking, multivariate diagnostics, ML‐based importance measures, and collinearity assessment. These analyses were used to characterize predictor behavior and redundancy.

Figure 1 illustrates the distribution of anomaly scores derived using the Isolation Forest algorithm. The red dashed line represents the model’s decision boundary at an anomaly score of zero, which separates typical observations from anomalous patterns in the multivariate predictor space. Approximately 5% of observations in the lower tail of the anomaly score distribution were identified as extreme anomalies and excluded. The right‐skewed distribution indicates that only a small subset of observations deviated substantially from the majority of the data.

Figure 1. Distribution of anomaly scores using the Isolation Forest algorithm.

Figure 2 displays the top features ranked by ANOVA F‐statistics, which quantify the degree of univariate separation between H5N1 outbreak presence and TGB pseudo‐absence locations. Bird abundance and livestock density features, including Passeriformes, Columbiformes, and Pelecaniformes abundance and chicken and cattle density, were ranked among the highest, indicating strong univariate separation. Climatic and environmental variables generally ranked lower than bird abundance and livestock predictors. Air temperature showed moderate univariate separation, whereas precipitation, soil moisture, surface pressure, wind speed, and specific humidity exhibited comparatively low F‐statistics. Latitude and longitude ranked toward the lower end of the distribution. The ANOVA rankings were used to construct candidate predictor subsets of varying sizes.

To further examine predictor relevance in a multivariate context, stepwise regression using the AIC was conducted (Figure S1). The initial model exhibited a high AIC value, which decreased as predictors with negligible contributions to model likelihood were removed. Some predictors, including elevation and selected bird families like Accipitriformes, contributed minimally within this linear modeling framework.
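For reference, the criterion used in this stepwise procedure has the standard form shown below, given alongside the BIC for comparison. Here k is the number of estimated parameters, n is the number of observations, and \hat{L} is the maximized likelihood of the fitted model; these symbols are introduced for this note and do not appear elsewhere in the paper.

```latex
\begin{align}
\mathrm{AIC} &= 2k - 2\ln\hat{L}, \\
\mathrm{BIC} &= k\ln n - 2\ln\hat{L}.
\end{align}
```

Lower values indicate a better trade‐off between fit and complexity. Removing a predictor lowers the penalty term by 2 under AIC, so AIC decreases whenever the resulting loss in log‐likelihood is smaller than 1. Under BIC each parameter costs \ln n instead of 2, which is the stricter penalty once n exceeds e^2 \approx 7.4.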

RF feature importance rankings are shown in Figure 3. The importance scores summarize how frequently and effectively each predictor contributed to reducing node impurity during tree construction and are used here to rank predictors according to their relative contribution in a nonlinear model. Livestock density variables and anthropogenic proxies, including cattle, chicken, and pig density and NTL, ranked highest, indicating that production intensity and human activity are dominant correlates of outbreak occurrence in the model. Wild bird abundance variables exhibited heterogeneous contributions. While Columbiformes and Passeriformes showed moderate importance, Anseriformes ranked lower relative to several other predictors, and no single bird group dominated the model. Climatic variables displayed comparatively lower importance scores in this ranking.

Figure 3. Top features using the random forest algorithm.

Figure 4 displays the pairwise Pearson correlation coefficients among environmental, anthropogenic, livestock, and wild bird abundance predictors. Overall, most predictor pairs exhibited weak to moderate correlations, indicating that no single variable was a near‐linear substitute for another across the global dataset.

Clusters of moderate correlation were observed among livestock density variables (cattle, chicken, pig, sheep, and buffalo densities), reflecting shared agricultural production systems across regions. Similarly, several wild bird family abundance metrics showed moderate positive correlations, consistent with overlapping ecological niches and co‐occurrence patterns at broad spatial scales. Anthropogenic proxies such as population density and NTL were also moderately correlated with livestock densities, reflecting underlying links between human activity and animal production systems.

Climatic variables, including air temperature, precipitation, soil moisture, wind speed, specific humidity, and surface pressure, showed low‐to‐moderate pairwise correlations, with no single climatic variable showing uniformly strong correlation with all others. Latitude and longitude showed weak correlations with most predictors, reflecting large‐scale geographic gradients in climate and land use. Importantly, no pattern of near‐perfect multicollinearity was observed across the predictor set. While some variables formed correlated groups, particularly within thematic categories (e.g., livestock densities or bird families), correlations were not extreme.

Figure 5 presents VIF values on a logarithmic scale to accommodate the wide range of multicollinearity levels observed across variables. Most climatic predictors exhibited high VIF values, reflecting a strong linear dependence among atmospheric variables. Moderate VIF values were observed for latitude and elevation, suggesting a partial association with climatic predictors. In contrast, most livestock density variables, wild bird family abundance metrics, vegetation index, and anthropogenic proxies (e.g., NTL and population density) exhibited relatively low VIF values, indicating limited linear redundancy.

Figure 5. Variance inflation factor scores of features.

Overall, exploratory feature screening revealed that no single predictor consistently dominated across univariate, multivariate, and nonlinear analyses and that outbreak risk is associated with a combination of livestock, anthropogenic, host‐related, and environmental factors. These analyses were therefore used to understand data structure, redundancy, and predictor behavior and to guide the construction and comparison of candidate feature sets. This was important for ensuring that subsequent ML models were evaluated within an informed framework, with final predictor configuration determined by spatially and temporally validated performance.

3.2. ML Model Performance

3.2.1. Spatial Cross‐Validation Performance

Model performance under spatial block cross‐validation on training data from 2012 to 2021 is summarized in Table 1. Across all algorithms, high discriminatory performance was observed when evaluated on spatially independent folds, indicating strong generalization beyond localized outbreak clusters. Tree‐based models consistently outperformed linear approaches.

Table 1.

Spatial block cross‐validation performance of machine‐learning models on training data (2012–2021).

Model	Median AUC	Median PR‐AUC	Median Brier score	Median accuracy
Logistic regression	0.948	0.987	0.088	0.905
Support vector machines	0.964	0.982	0.083	0.908
Random forest	0.984	0.996	0.065	0.929
Light gradient boosting machine	0.978	0.993	0.069	0.918
Extreme gradient boosting	0.983	0.996	0.079	0.903


RF achieved the highest median performance, with a median AUC of 0.984 (IQR: 0.957–0.990) and PR‐AUC of 0.996 (IQR: 0.965–0.998), accompanied by the lowest median Brier score (0.065). XGB and LGBM showed similarly strong discrimination, with median AUC values of 0.983 and 0.978, respectively. LR and SVM exhibited slightly lower but still high spatial performance, with median AUC values exceeding 0.94 across models. Overall, spatial cross‐validation results indicate that the models capture geographically transferable signal.

3.2.2. Feature‐Set Sensitivity Under Spatial Validation

To evaluate sensitivity to predictor selection, multiple candidate feature sets informed by exploratory screening were compared under spatial block cross‐validation. Table S2 reports median performance metrics and ΔAUC values relative to the best‐performing feature set for each model. For all tree‐based models (RF, XGB, and LGBM), the full predictor set consistently achieved the highest spatially validated performance. Reduced feature sets, including ecological and tree‐specific subsets, resulted in lower median AUC, with the decrease often exceeding the predefined tolerance (ΔAUC ≤ 0.01). Feature sets informed by univariate ANOVA or RF rankings retained competitive performance in some cases but did not consistently match the full predictor set across models. LR and SVM models showed greater sensitivity to feature reduction; however, their overall performance remained lower than that of tree‐based methods. Based on these results, the full predictor set was retained for final model training to maximize spatial generalization and stability.

3.2.3. Temporal Hold‐Out Evaluation

Final model performance was evaluated on an independent temporal hold‐out dataset from 2022 to 2023, incorporating both outbreak presences and pseudo‐absences (Table 2). Temporal performance closely mirrored spatial cross‐validation results, indicating robust generalization across time.

Table 2.

Model performance on an independent temporal hold‐out dataset (2022–2023).

Model	Accuracy	Brier	Log loss	AUC	PR AUC
Random forest	0.948	0.038	0.133	0.990	0.977
Extreme gradient boosting	0.935	0.048	0.185	0.985	0.966
Light gradient boosting machine	0.932	0.053	0.196	0.981	0.959
Support vector machines	0.933	0.053	0.208	0.968	0.944
Logistic regression	0.946	0.046	0.227	0.952	0.941


RF achieved the strongest performance (Table 2), with an AUC of 0.990, PR‐AUC of 0.977, and the lowest Brier score (0.038) and log loss (0.133). XGB and LGBM also performed strongly, with AUC values of 0.985 and 0.981, respectively. LR and SVM exhibited lower but still high temporal discrimination, with AUC values exceeding 0.950.
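For reference, the threshold-based and probabilistic metrics in Table 2 can be computed as in the sketch below. It assumes labels coded 1 for outbreak presence and 0 for pseudo-absence, a 0.5 classification threshold for accuracy (the threshold is not stated here), and a hypothetical record layout with a `year` field for the temporal split.

```python
import math


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def accuracy(y_true, y_prob, threshold=0.5):
    """Share of points classified correctly at the given (assumed) threshold."""
    hits = sum((p >= threshold) == (y == 1) for y, p in zip(y_true, y_prob))
    return hits / len(y_true)


def temporal_split(records, last_train_year=2021):
    """Train on records up to last_train_year, hold out everything after it."""
    train = [r for r in records if r["year"] <= last_train_year]
    holdout = [r for r in records if r["year"] > last_train_year]
    return train, holdout


# Toy labels and predicted probabilities (illustrative only).
y = [1, 0, 1, 0]
p = [0.9, 0.2, 0.6, 0.4]
print(accuracy(y, p), brier_score(y, p), log_loss(y, p))
```

Brier score and log loss both reward well-calibrated probabilities, which is why they can rank models differently from accuracy, as with LR and SVM in Table 2.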

Although the ML models showed strong predictive performance during benchmarking, they rely on the availability of presence‐absence data, whereas the WAHIS database provides presence‐only outbreak reports. The generated pseudo‐absence points were suitable for comparative model evaluation but are not appropriate for global risk mapping, as they may introduce spatial sampling bias. MaxEnt, in contrast, is a presence‐only method that incorporates background sampling, regularization, and environmental constraints. MaxEnt was therefore considered more robust for producing spatial risk maps for H5N1 and was adopted as the final modeling framework for generating risk projections, while the ensemble models were used to assess covariate importance and to benchmark predictive performance.

3.3. Seasonal MaxEnt‐Based Risk Mapping

Seasonal MaxEnt models were used as a presence‐only framework to visualize broad‐scale spatial patterns in H5N1 outbreak suitability across the annual cycle. Model performance metrics for each season are summarized in Table 3. Across all seasons, MaxEnt models achieved high discriminatory ability, with AUC values ranging from 0.90 to 0.93, indicating that outbreak locations were consistently differentiated from background locations in environmental space.

Table 3.

Metrics for MaxEnt algorithm on presence data for different seasons.

Season	Omission rate	AUC
1 (December, January, February)	0.1235	0.9047
2 (March, April, May)	0.1510	0.9329
3 (June, July, August)	0.3435	0.9301
4 (September, October, November)	0.2653	0.9197


Season 2 (March–May) showed the highest AUC (0.933), while Season 1 (December–February) had the lowest AUC (0.905) but also the lowest omission rate (OR; 0.123), indicating that a large proportion of outbreak locations were captured by the model during winter. Season 3 (June–August) had the highest OR (0.344), suggesting a reduced alignment between the selected environmental predictors and outbreak locations during the summer months. Seasonal differences in ORs highlight variation in how well presence‐only environmental suitability captures outbreak patterns across the year.
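The omission rate can be made concrete with a small sketch. It assumes the usual presence-only definition, the share of known outbreak locations whose predicted suitability falls below a chosen threshold. The threshold and the suitability scores are hypothetical, since the thresholding rule used for Table 3 is not stated here.

```python
def omission_rate(presence_scores, threshold):
    """Share of presence points predicted below the suitability threshold."""
    if not presence_scores:
        raise ValueError("no presence points supplied")
    omitted = sum(1 for s in presence_scores if s < threshold)
    return omitted / len(presence_scores)


def omission_curve(presence_scores, thresholds):
    """Omission rate at each threshold; it cannot decrease as the threshold rises."""
    return [(t, omission_rate(presence_scores, t)) for t in thresholds]


# Hypothetical suitability scores at eight known outbreak locations.
scores = [0.82, 0.10, 0.55, 0.31, 0.90, 0.05, 0.67, 0.44]

print(omission_rate(scores, threshold=0.3))
print(omission_curve(scores, [0.0, 0.3, 0.6, 1.0]))
```

Because sensitivity equals one minus the omission rate, a season with a higher omission rate at a given threshold is one in which more known outbreaks fall in areas the model rates as unsuitable.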

ROC curves for the four seasonal models (Figure S2) exhibit similar shapes, indicating stable discrimination across seasons. OR curves (Figure S3) show the expected trade‐off between sensitivity and specificity as suitability thresholds increase, with comparatively higher omission observed in summer, consistent with the seasonal omission rates in Table 3.

Partial response curves (Figure 6) illustrate how individual predictors influence predicted suitability within each seasonal model. Air temperature shows a positive association with predicted suitability across all seasons; however, this pattern likely reflects indirect associations with geographic, ecological, or anthropogenic gradients rather than a direct effect on viral survival. Livestock density variables (chicken, duck, and pig density) exhibit strong and generally monotonic associations with predicted suitability, highlighting the importance of poultry production systems in shaping outbreak risk patterns. Wild bird abundance variables show heterogeneous seasonal responses, with different bird families contributing variably across seasons. While Anseriformes, Columbiformes, and Podicipediformes display elevated suitability in some seasons, no single bird family consistently dominates across all seasonal models. Anthropogenic proxies, including population density and NTL, show positive associations with suitability in several seasons, indicating that human activity and production intensity are correlated with outbreak occurrence. Meteorological variables such as wind speed and surface pressure exhibit weaker and nonlinear responses, suggesting secondary or context‐dependent influence on suitability patterns.

Figure 6.

Partial response curves of each predictor across seasons using MaxEnt.

Overall, the seasonal MaxEnt results highlight broad spatiotemporal patterns of relative outbreak suitability and seasonal variation in predictor‐response relationships. These maps are intended to support qualitative interpretation of seasonal risk dynamics and complement the spatially and temporally validated ML results. The observed seasonal variability underscores the importance of considering temporal context when interpreting environmental suitability for H5N1 outbreaks.

3.4. Seasonal Risk Mapping of H5N1 Outbreaks Using MaxEnt

Seasonal MaxEnt models based on presence‐only outbreak data were used to map the relative probability of H5N1 occurrence across the annual cycle (Figures 7–10). Probabilities were grouped into four classes (0–0.25, 0.25–0.50, 0.50–0.75, and 0.75–1.0) to highlight areas of elevated risk.
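The four-class grouping can be sketched as a simple binning step. The class breaks are those given above; assigning a value that falls exactly on a break to the upper class is an assumption, and the sample probabilities are illustrative.

```python
from bisect import bisect_right
from collections import Counter

BREAKS = [0.25, 0.50, 0.75]
LABELS = ["0-0.25", "0.25-0.50", "0.50-0.75", "0.75-1.0"]


def risk_class(prob):
    """Return the class label for a relative probability in [0, 1].

    Values exactly on a break go to the upper class (an assumed convention).
    """
    if not 0.0 <= prob <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return LABELS[bisect_right(BREAKS, prob)]


# Hypothetical per-cell relative probabilities from one seasonal map.
cells = [0.03, 0.25, 0.49, 0.62, 0.80, 1.0]
classes = [risk_class(c) for c in cells]

print(classes)
print(Counter(classes))
```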

3.4.1. Season 1 (December–February)

During winter, extensive high‐probability zones are visible across Europe, the Eastern United States, and large parts of South and East Asia, particularly India and China (Figure 7). Elevated risk also appears in parts of South America (Argentina and Southern Brazil) and along coastal regions where migratory waterbirds overwinter and poultry production is dense. Northern and Eastern Australia show moderate suitability, but areas east of the Wallace Line remain largely at low risk. The Wallace Line is a biogeographical boundary that demarcates a stark difference in species composition between Southeast Asia and Australasia and may act as a natural barrier to the movement of certain bird species. This natural ecological barrier could help limit the southward spread of H5N1 to Australia by restricting the overlap of avian populations that usually carry the virus.

3.4.2. Season 2 (March–May)

In spring, high‐risk regions persist across Europe and Asia, again with strong signals over the Indo‐Gangetic plain and Eastern China (Figure 8). This period coincides with northward migration toward breeding grounds, which likely concentrates infection risk along major flyways and stop‐over wetlands. Much of Africa shows low to moderate suitability, whereas risk in South America decreases relative to winter, reflecting shifting environmental suitability and migration routes.

3.4.3. Season 3 (June–August)

Summer maps show a general reduction in predicted risk (Figure 9). Most regions display low to moderate probabilities, with only scattered hotspots remaining in parts of Northern Eurasia and localized areas of Asia. By this time, many migratory birds have reached breeding grounds, and long‐distance movement is reduced, which may lessen opportunities for long‐range spread. Residual risk during this period is likely driven more by local poultry production systems and resident bird populations than by large‐scale migration.

3.4.4. Season 4 (September–November)

Autumn shows a renewed expansion of high‐probability areas, particularly across Europe and temperate Asia (Figure 10). A broad belt of elevated risk stretches from Western Europe through Central Asia into East Asia, mirroring southward return migration and use of agricultural and wetland stop‐over sites. India, Eastern China, and parts of the Mediterranean again emerge as persistent hotspots, consistent with large poultry populations and suitable environmental conditions. The spatial pattern in Season 4 closely resembles that of Season 1, indicating a cyclical risk structure driven by migration and seasonal climate.

Overall, the seasonal MaxEnt maps highlight a strong annual cycle in predicted H5N1 suitability: risk is highest during autumn and winter (Seasons 4 and 1), intermediate during spring migration (Season 2), and lowest in summer, when birds are largely settled on breeding grounds (Season 3). These patterns support the role of migratory connectivity and seasonal environments in shaping global H5N1 risk and underscore the value of season‐specific surveillance and control strategies, particularly along major flyways and in regions with dense poultry populations.

4. Conclusion

This study presents a global, spatiotemporally validated assessment of H5N1 outbreak risk using an integrated ML framework that combines environmental conditions, livestock density, wild bird abundance, and anthropogenic proxies. Tree‐based ML models, particularly RF, consistently demonstrated the strongest and most stable performance under both spatial and temporal validation. Feature‐set sensitivity analyses showed that retaining the full predictor set yielded superior generalization compared with reduced subsets, indicating that outbreak risk is shaped by the combined influence of multiple correlated factors rather than a single dominant driver. Livestock density and anthropogenic variables emerged as the strongest correlates of outbreak occurrence in multivariate models, while wild bird abundance and climatic variables contributed heterogeneously and in a season‐dependent manner. Although certain wild bird families, including Columbiformes and Passeriformes, showed elevated associations with predicted risk in some analyses, no single avian group consistently dominated across models or seasons. Similarly, apparent positive associations with climatic variables such as temperature likely reflect indirect relationships with geographic, ecological, or anthropogenic gradients.

Presence‐only seasonal MaxEnt models were used to visualize broad‐scale patterns of relative outbreak suitability across the annual cycle. The resulting maps revealed a clear seasonal rhythm, with elevated relative risk during autumn and winter, intermediate risk during spring migration, and reduced suitability during summer months. These patterns are consistent with large‐scale migratory connectivity, poultry production intensity, and seasonal environmental gradients. The risk maps suggest that Australia remains comparatively low risk relative to other continents, with limited suitability east of the Wallace Line. This biogeographical boundary may reduce overlap between Asian and Australasian avifauna and thus constrain long‐distance viral spread. However, the recent detection of H5N1 in Antarctica highlights the need for continued vigilance, as changes in migratory connectivity or environmental conditions could alter regional risk profiles in the future.

Overall, this study demonstrates the value of integrating spatially and temporally validated ML models with presence‐only mapping to understand global H5N1 risk patterns. Future work could further refine predictions by incorporating dynamic poultry trade data, real‐time outbreak reporting, and explicitly spatiotemporal modeling approaches. Such advances would strengthen early warning systems and support more targeted surveillance and intervention strategies aimed at limiting the global spread of highly pathogenic avian influenza.

Funding

No funding was received for this manuscript. Open access publishing facilitated by University of New South Wales, as part of the Wiley ‐ University of New South Wales agreement via the Council of Australasian University Librarians.

Conflicts of Interest

The authors declare no conflicts of interest.

Supporting Information

Additional supporting information can be found online in the Supporting Information section.

Figure S1. AIC scores during stepwise regression.
Figure S2. Receiver operating characteristic (ROC) curve for each season.
Figure S3. Omission rate (OR) curve for each season.
Table S1. Parameter search ranges and optimal configurations using spatial cross‐validation on training data (2012–2021).
Table S2. Differences in median AUC (ΔAUC) across candidate feature sets for each machine‐learning model under spatial block cross‐validation (training period 2012–2021), relative to the best‐performing feature set for each model.

TBED-2026-6615342-s001.docx (259.6 KB, docx)

Jindal, Mehak, Lim, Samsung, MacIntyre, C. Raina, Machine Learning‐Based Geospatial Risk Modeling of Global Avian Influenza Outbreaks, Transboundary and Emerging Diseases, 2026, 6615342, 15 pages, 2026. 10.1155/tbed/6615342

Academic Editor: Zhi‐Jie Zhang

Contributor Information

Samsung Lim, Email: s.lim@unsw.edu.au.

Zhi-Jie Zhang, Email: epistat@gmail.com.

Data Availability Statement

Data available upon request from the authors.

References

1. Alexander D. J. and Brown I. H., History of Highly Pathogenic Avian Influenza, Revue Scientifique et Technique de l’OIE. (2009) 28, no. 1, 19–38, 10.20506/rst.28.1.1856, 2-s2.0-68149159094. [DOI] [PubMed] [Google Scholar]
2. Peiris J. S. M., de Jong M. D., and Guan Y., Avian Influenza Virus (H5N1): A Threat to Human Health, Clinical Microbiology Reviews. (2007) 20, no. 2, 243–267, 10.1128/CMR.00037-06, 2-s2.0-34248170234. [DOI] [PMC free article] [PubMed] [Google Scholar]
3. Peacock T. P., Moncla L., and Dudas G., et al.The Global H5N1 Influenza Panzootic in Mammals, Nature. (2025) 637, no. 8045, 304–313, 10.1038/s41586-024-08054-z. [DOI] [PubMed] [Google Scholar]
4. Merrifield R., Human Bird Flu Case in UK Confirmed after “Contact” at Farm—As Experts Warn Bug Is “One Mutation from Pandemic, The Scottish Sun, 2025, https://www.thescottishsun.co.uk/health/14239058/first-human-bird-flu-case-in-uk.
5. Jindal M., Stone H., Lim S., and MacIntyre C. R., A Geospatial Perspective Toward the Role of Wild Bird Migrations and Global Poultry Trade in the Spread of Highly Pathogenic Avian Influenza H5N1, GeoHealth. (2025) 9, no. 3, 10.1029/2024GH001296, 2024GH001296. [DOI] [PMC free article] [PubMed] [Google Scholar]
6. Caliendo V., Lewis N. S., and Pohlmann A., et al.Transatlantic Spread of Highly Pathogenic Avian Influenza H5N1 by Wild Birds From Europe to North America in 2021, Scientific Reports. (2022) 12, no. 1, 10.1038/s41598-022-13447-z, 11729. [DOI] [PMC free article] [PubMed] [Google Scholar]
7. Stone H., Jindal M., and Lim S., et al.Potential Pathways of Spread of Highly Pathogenic Avian Influenza A/H5N1 Clade 2.3.4.4b Across Dairy Farms in the United States, medRxiv. (2024) 10.1101/2024.05.02.24306785, 24306785. [DOI] [Google Scholar]
8. Alkie T. N., Lopes S., and Hisanaga T., et al.A Threat From Both Sides: Multiple Introductions of Genetically Distinct H5 HPAI Viruses into Canada via Both East Asia-Australasia/Pacific and Atlantic Flyways, Virus Evolution. (2022) 8, no. 2, 10.1093/ve/veac077, veac077. [DOI] [PMC free article] [PubMed] [Google Scholar]
9. Si Y., Skidmore A. K., and Wang T., et al.Spatio-Temporal Dynamics of Global H5N1 Outbreaks Match Bird Migration Patterns, Geospatial Health. (2009) 4, no. 1, 10.4081/gh.2009.211, 2-s2.0-70450159430, 65. [DOI] [PubMed] [Google Scholar]
10. Price K., Bird Flu Is Picking its Way Across the Animal Kingdom—and Climate Change Could Be Making It Worse, Inside Climate News, 2024.
11. Xiao X., Gilbert M., Slingenbergh J., Lei F., and Boles S., Remote Sensing, Ecological Variables, and Wild Bird Migration Related to Outbreaks of Highly Pathogenic H5N1 Avian Influenza, Journal of Wildlife Diseases. (2007) 43, no. 3 (suppl.), S40–S46. [PMC free article] [PubMed] [Google Scholar]
12. Tian H., Zhou S., and Dong L., et al.Avian Influenza H5N1 Viral and Bird Migration Networks in Asia, Proceedings of the National Academy of Sciences. (2015) 112, no. 1, 172–177, 10.1073/pnas.1405216112, 2-s2.0-84920375388. [DOI] [PMC free article] [PubMed] [Google Scholar]
13. Si Y., Wang T., Skidmore A. K., de Boer W. F., Li L., and Prins H. H. T., Environmental Factors Influencing the Spread of the Highly Pathogenic Avian Influenza H5N1 Virus in Wild Birds in Europe, Ecology and Society. (2010) 15, no. 3, https://www.jstor.org/stable/26268169, 10.5751/ES-03622-150326, 2-s2.0-77958483625, 26268169. [DOI] [Google Scholar]
14. Carrel M. A., Emch M., Nguyen T., Todd Jobe R., and Wan X.-F., Population-Environment Drivers of H5N1 Avian Influenza Molecular Change in Vietnam, Health & Place. (2012) 18, no. 5, 1122–1131, 10.1016/j.healthplace.2012.04.009, 2-s2.0-84864825935. [DOI] [PMC free article] [PubMed] [Google Scholar]
15. Sooryanarain H. and Elankumaran S., Environmental Role in Influenza Virus Outbreaks, Annual Review of Animal Biosciences. (2015) 3, no. 1, 347–373, 10.1146/annurev-animal-022114-111017, 2-s2.0-84923163169. [DOI] [PubMed] [Google Scholar]
16. Chen W., Zhang X., Zhao W., Yang L., Wang Z., and Bi H., Environmental Factors and Spatiotemporal Distribution Characteristics of the Global Outbreaks of the Highly Pathogenic Avian Influenza H5N1, Environmental Science and Pollution Research. (2022) 29, no. 29, 44175–44185, 10.1007/s11356-022-19016-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
17. Prosser D. J., Teitelbaum C. S., Yin S., Hill N. J., and Xiao X., Climate Change Impacts on Bird Migration and Highly Pathogenic Avian Influenza, Nature Microbiology. (2023) 8, no. 12, 2223–2225, 10.1038/s41564-023-01538-0. [DOI] [PubMed] [Google Scholar]
18. Yang Q., Wang B., and Lemey P., et al.Synchrony of Bird Migration with Global Dispersal of Avian Influenza Reveals Exposed Bird Orders, Nature Communications. (2024) 15, no. 1, 10.1038/s41467-024-45462-1, 1126. [DOI] [PMC free article] [PubMed] [Google Scholar]
19. Esposito M. M., Turku S., Lehrfield L., and Shoman A., The Impact of Human Activities on Zoonotic Infection Transmissions, Animals. (2023) 13, no. 10, 10.3390/ani13101646, 1646. [DOI] [PMC free article] [PubMed] [Google Scholar]
20. World Organisation for Animal Health (WOAH), World Animal Health Information System (WAHIS) [Dataset]. World Organisation for Animal Health (WOAH), 2025, https://wahis.woah.org/#/event-management.
21. Houser P. R. and Rodell M., GLDAS Noah Land Surface Model L4 monthly 1.0 × 1.0° V2.1, Earth Sciences Data and Information Services Center, 2020, GES DISC. [Google Scholar]
22. Beaudoing H., Rodell M., and NASA/GSFC/HSL, GLDAS Noah Land Surface Model L4 Monthly 0.25 x 0.25 Degree, Version 2.1 [Dataset], NASA Goddard Earth Sciences Data and Information Services Center, 2020. [Google Scholar]
23. Rodell M., Houser P. R., and Jambor U., et al.The Global Land Data Assimilation System, Bulletin of the American Meteorological Society. (2004) 85, no. 3, 381–394, 10.1175/BAMS-85-3-381, 2-s2.0-11144356588. [DOI] [Google Scholar]
24. NASA/METI/AIST/Japan Spacesystems and U.S./Japan ASTER Science Team ASTER Global Digital Elevation Model Version, 3 (Version 3), NASA EOSDIS Land Processes DAAC, 2019, 10.5067/ASTER/ASTGTM.003. [DOI] [Google Scholar]
25. Fink D., Auer T., and Johnston A., et al.eBird Status and Trends (Version 2022), 2023, Cornell Lab of Ornithology, 10.2173/ebirdst.2022. [DOI] [Google Scholar]
26. Center For International Earth Science Information Network-CIESIN-Columbia University, Gridded Population of the World, Version 4 (GPWv4): Population Count, 2018, NASA SEDAC, 10.7927/H49C6VHW. [DOI] [Google Scholar]
27. Gilbert M., Nicolas G., and Cinardi G., et al.Global Distribution Data for Cattle, Buffaloes, Horses, Sheep, Goats, Pigs, Chickens and Ducks in 2010, Scientific Data. (2018) 5, no. 1, 10.1038/sdata.2018.227, 2-s2.0-85055614271, 180227. [DOI] [PMC free article] [PubMed] [Google Scholar]
28. Elvidge C. D., Baugh K., Zhizhin M., Hsu F. C., and Ghosh T., VIIRS Night-Time Lights, International Journal of Remote Sensing. (2017) 38, no. 21, 5860–5879, 10.1080/01431161.2017.1342050, 2-s2.0-85021300522. [DOI] [Google Scholar]
29. Hillger D., Kopp T., and Lee T., et al.First-Light Imagery From Suomi NPP VIIRS, Bulletin of the American Meteorological Society. (2013) 94, no. 7, 1019–1029, 10.1175/BAMS-D-12-00097.1, 2-s2.0-84880893492. [DOI] [Google Scholar]
30. NOAA, NOAA/NGDC - Earth Observation Group—Defense Meteorological Satellite Progam, 1992, https://ngdc.noaa.gov/eog/sensors/ols.html. [Google Scholar]
31. Phillips S. J., Dudík M., and Elith J., et al.Sample Selection Bias and Presence-Only Distribution Models: Implications for Background and Pseudo-Absence Data, Ecological Applications. (2009) 19, no. 1, 181–197, 10.1890/07-2153.1, 2-s2.0-63849333773. [DOI] [PubMed] [Google Scholar]
32. Jurgiel B., Point Sampling Tool [QGIS Plugin] (Version 0.5.4), QGIS Python Plugins Repository, 2022, https://plugins.qgis.org/plugins/pointsamplingtool/. [Google Scholar]
33. Liu F. T., Ting K. M., and Zhou Z.-H., Isolation Forest, 2008, 2008 Eighth IEEE International Conference on Data Mining, 2008, 413–422, 10.1109/ICDM.2008.17, 2-s2.0-67049142378. [DOI] [Google Scholar]
34. Dataman C. K., 2024, Handbook of Anomaly Detection—(4) Isolation Forest. Dataman in AI.
35. Guyon I. and Elisseeff A., An Introduction to Variable and Feature Selection, 2003.
36. Cavanaugh J. E. and Neath A. A., The Akaike Information Criterion: Background, Derivation, Properties, Application, Interpretation, and Refinements, WIREs Computational Statistics. (2019) 11, no. 3, 10.1002/wics.1460, 2-s2.0-85062947290. [DOI] [Google Scholar]
37. Breiman L., Random Forests, Machine Learning. (2001) 45, no. 1, 5–32, 10.1023/A:1010933404324, 2-s2.0-0035478854. [DOI] [Google Scholar]
38. Dormann C. F., Elith J., and Bacher S., et al.Collinearity: A Review of Methods to Deal With it and a Simulation Study Evaluating Their Performance, Ecography. (2013) 36, no. 1, 27–46, 10.1111/j.1600-0587.2012.07348.x, 2-s2.0-84874725861. [DOI] [Google Scholar]
39. Cox D. R., The Regression Analysis of Binary Sequences, Journal of the Royal Statistical Society Series B: Statistical Methodology. (1958) 20, no. 2, 215–232, https://www.jstor.org/stable/2983890, 10.1111/j.2517-6161.1958.tb00292.x. [DOI] [Google Scholar]
40. Cortes C. and Vapnik V., Support-Vector Networks, Machine Learning. (1995) 20, no. 3, 273–297, 10.1023/A:1022627411411, 2-s2.0-34249753618. [DOI] [Google Scholar]
41. Quinlan J. R., Induction of Decision Trees, Machine Learning. (1986) 1, no. 1, 81–106, 10.1023/A:1022643204877, 2-s2.0-33744584654. [DOI] [Google Scholar]
42. Ke G., Meng Q., and Finley T., et al.LightGBM: A highly efficient gradient boosting decision tree, 17, Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017, 3149–3157, https://dl.acm.org/doi/10.5555/3294996.3295074. [Google Scholar]
43. Friedman J. H., Greedy Function Approximation: A Gradient Boosting Machine, The Annals of Statistics. (2001) 29, no. 5, 1189–1232, 10.1214/aos/1013203451. [DOI] [Google Scholar]
44. Phillips S. J., Anderson R. P., and Schapire R. E., Maximum Entropy Modeling of Species Geographic Distributions, Ecological Modelling. (2006) 190, no. 3-4, 231–259, 10.1016/j.ecolmodel.2005.03.026, 2-s2.0-33746218412. [DOI] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

TBED-2026-6615342-s001.docx^{(259.6KB, docx)}

Data Availability Statement

Data available upon request from the authors.

[bib-0001] 1. Alexander D. J. and Brown I. H., History of Highly Pathogenic Avian Influenza, Revue Scientifique et Technique de l’OIE. (2009) 28, no. 1, 19–38, 10.20506/rst.28.1.1856, 2-s2.0-68149159094. [DOI] [PubMed] [Google Scholar]

[bib-0002] 2. Peiris J. S. M., de Jong M. D., and Guan Y., Avian Influenza Virus (H5N1): A Threat to Human Health, Clinical Microbiology Reviews. (2007) 20, no. 2, 243–267, 10.1128/CMR.00037-06, 2-s2.0-34248170234. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib-0003] 3. Peacock T. P., Moncla L., and Dudas G., et al.The Global H5N1 Influenza Panzootic in Mammals, Nature. (2025) 637, no. 8045, 304–313, 10.1038/s41586-024-08054-z. [DOI] [PubMed] [Google Scholar]

[bib-0004] 4. Merrifield R., Human Bird Flu Case in UK Confirmed after “Contact” at Farm—As Experts Warn Bug Is “One Mutation from Pandemic, The Scottish Sun, 2025, https://www.thescottishsun.co.uk/health/14239058/first-human-bird-flu-case-in-uk.

[bib-0005] 5. Jindal M., Stone H., Lim S., and MacIntyre C. R., A Geospatial Perspective Toward the Role of Wild Bird Migrations and Global Poultry Trade in the Spread of Highly Pathogenic Avian Influenza H5N1, GeoHealth. (2025) 9, no. 3, 10.1029/2024GH001296, 2024GH001296. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib-0006] 6. Caliendo V., Lewis N. S., and Pohlmann A., et al.Transatlantic Spread of Highly Pathogenic Avian Influenza H5N1 by Wild Birds From Europe to North America in 2021, Scientific Reports. (2022) 12, no. 1, 10.1038/s41598-022-13447-z, 11729. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib-0007] 7. Stone H., Jindal M., and Lim S., et al.Potential Pathways of Spread of Highly Pathogenic Avian Influenza A/H5N1 Clade 2.3.4.4b Across Dairy Farms in the United States, medRxiv. (2024) 10.1101/2024.05.02.24306785, 24306785. [DOI] [Google Scholar]

[bib-0008] 8. Alkie T. N., Lopes S., and Hisanaga T., et al.A Threat From Both Sides: Multiple Introductions of Genetically Distinct H5 HPAI Viruses into Canada via Both East Asia-Australasia/Pacific and Atlantic Flyways, Virus Evolution. (2022) 8, no. 2, 10.1093/ve/veac077, veac077. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib-0009] 9. Si Y., Skidmore A. K., and Wang T., et al.Spatio-Temporal Dynamics of Global H5N1 Outbreaks Match Bird Migration Patterns, Geospatial Health. (2009) 4, no. 1, 10.4081/gh.2009.211, 2-s2.0-70450159430, 65. [DOI] [PubMed] [Google Scholar]

[bib-0010] 10. Price K., Bird Flu Is Picking its Way Across the Animal Kingdom—and Climate Change Could Be Making It Worse, Inside Climate News, 2024.

[bib-0011] 11. Xiao X., Gilbert M., Slingenbergh J., Lei F., and Boles S., Remote Sensing, Ecological Variables, and Wild Bird Migration Related to Outbreaks of Highly Pathogenic H5N1 Avian Influenza, Journal of Wildlife Diseases. (2007) 43, no. 3 (suppl.), S40–S46. [PMC free article] [PubMed] [Google Scholar]

[bib-0012] 12. Tian H., Zhou S., and Dong L., et al.Avian Influenza H5N1 Viral and Bird Migration Networks in Asia, Proceedings of the National Academy of Sciences. (2015) 112, no. 1, 172–177, 10.1073/pnas.1405216112, 2-s2.0-84920375388. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib-0013] 13. Si Y., Wang T., Skidmore A. K., de Boer W. F., Li L., and Prins H. H. T., Environmental Factors Influencing the Spread of the Highly Pathogenic Avian Influenza H5N1 Virus in Wild Birds in Europe, Ecology and Society. (2010) 15, no. 3, https://www.jstor.org/stable/26268169, 10.5751/ES-03622-150326, 2-s2.0-77958483625, 26268169. [DOI] [Google Scholar]

[bib-0014] 14. Carrel M. A., Emch M., Nguyen T., Todd Jobe R., and Wan X.-F., Population-Environment Drivers of H5N1 Avian Influenza Molecular Change in Vietnam, Health & Place. (2012) 18, no. 5, 1122–1131, 10.1016/j.healthplace.2012.04.009, 2-s2.0-84864825935. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib-0015] 15. Sooryanarain H. and Elankumaran S., Environmental Role in Influenza Virus Outbreaks, Annual Review of Animal Biosciences. (2015) 3, no. 1, 347–373, 10.1146/annurev-animal-022114-111017, 2-s2.0-84923163169. [DOI] [PubMed] [Google Scholar]

[bib-0016] 16. Chen W., Zhang X., Zhao W., Yang L., Wang Z., and Bi H., Environmental Factors and Spatiotemporal Distribution Characteristics of the Global Outbreaks of the Highly Pathogenic Avian Influenza H5N1, Environmental Science and Pollution Research. (2022) 29, no. 29, 44175–44185, 10.1007/s11356-022-19016-1. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib-0017] 17. Prosser D. J., Teitelbaum C. S., Yin S., Hill N. J., and Xiao X., Climate Change Impacts on Bird Migration and Highly Pathogenic Avian Influenza, Nature Microbiology. (2023) 8, no. 12, 2223–2225, 10.1038/s41564-023-01538-0. [DOI] [PubMed] [Google Scholar]

[bib-0018] 18. Yang Q., Wang B., and Lemey P., et al.Synchrony of Bird Migration with Global Dispersal of Avian Influenza Reveals Exposed Bird Orders, Nature Communications. (2024) 15, no. 1, 10.1038/s41467-024-45462-1, 1126. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib-0019] 19. Esposito M. M., Turku S., Lehrfield L., and Shoman A., The Impact of Human Activities on Zoonotic Infection Transmissions, Animals. (2023) 13, no. 10, 10.3390/ani13101646, 1646. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib-0020] 20. World Organisation for Animal Health (WOAH), World Animal Health Information System (WAHIS) [Dataset]. World Organisation for Animal Health (WOAH), 2025, https://wahis.woah.org/#/event-management.

[bib-0021] 21. Houser P. R. and Rodell M., GLDAS Noah Land Surface Model L4 monthly 1.0 × 1.0° V2.1, Earth Sciences Data and Information Services Center, 2020, GES DISC. [Google Scholar]

[bib-0022] 22. Beaudoing H., Rodell M., and NASA/GSFC/HSL, GLDAS Noah Land Surface Model L4 Monthly 0.25 x 0.25 Degree, Version 2.1 [Dataset], NASA Goddard Earth Sciences Data and Information Services Center, 2020. [Google Scholar]

[bib-0023] 23. Rodell M., Houser P. R., and Jambor U., et al.The Global Land Data Assimilation System, Bulletin of the American Meteorological Society. (2004) 85, no. 3, 381–394, 10.1175/BAMS-85-3-381, 2-s2.0-11144356588. [DOI] [Google Scholar]

[bib-0024] 24. NASA/METI/AIST/Japan Spacesystems and U.S./Japan ASTER Science Team ASTER Global Digital Elevation Model Version, 3 (Version 3), NASA EOSDIS Land Processes DAAC, 2019, 10.5067/ASTER/ASTGTM.003. [DOI] [Google Scholar]

[bib-0025] 25. Fink D., Auer T., and Johnston A., et al.eBird Status and Trends (Version 2022), 2023, Cornell Lab of Ornithology, 10.2173/ebirdst.2022. [DOI] [Google Scholar]

[bib-0026] 26. Center For International Earth Science Information Network-CIESIN-Columbia University, Gridded Population of the World, Version 4 (GPWv4): Population Count, 2018, NASA SEDAC, 10.7927/H49C6VHW. [DOI] [Google Scholar]

[bib-0027] 27. Gilbert M., Nicolas G., and Cinardi G., et al.Global Distribution Data for Cattle, Buffaloes, Horses, Sheep, Goats, Pigs, Chickens and Ducks in 2010, Scientific Data. (2018) 5, no. 1, 10.1038/sdata.2018.227, 2-s2.0-85055614271, 180227. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib-0028] 28. Elvidge C. D., Baugh K., Zhizhin M., Hsu F. C., and Ghosh T., VIIRS Night-Time Lights, International Journal of Remote Sensing. (2017) 38, no. 21, 5860–5879, 10.1080/01431161.2017.1342050, 2-s2.0-85021300522. [DOI] [Google Scholar]

[bib-0029] 29. Hillger D., Kopp T., and Lee T., et al.First-Light Imagery From Suomi NPP VIIRS, Bulletin of the American Meteorological Society. (2013) 94, no. 7, 1019–1029, 10.1175/BAMS-D-12-00097.1, 2-s2.0-84880893492. [DOI] [Google Scholar]

[bib-0030] 30. NOAA, NOAA/NGDC - Earth Observation Group—Defense Meteorological Satellite Progam, 1992, https://ngdc.noaa.gov/eog/sensors/ols.html. [Google Scholar]

[bib-0031] 31. Phillips S. J., Dudík M., and Elith J., et al.Sample Selection Bias and Presence-Only Distribution Models: Implications for Background and Pseudo-Absence Data, Ecological Applications. (2009) 19, no. 1, 181–197, 10.1890/07-2153.1, 2-s2.0-63849333773. [DOI] [PubMed] [Google Scholar]

[bib-0032] 32. Jurgiel B., Point Sampling Tool [QGIS Plugin] (Version 0.5.4), QGIS Python Plugins Repository, 2022, https://plugins.qgis.org/plugins/pointsamplingtool/. [Google Scholar]

33. Liu F. T., Ting K. M., and Zhou Z.-H., Isolation Forest, 2008 Eighth IEEE International Conference on Data Mining, 2008, 413–422, 10.1109/ICDM.2008.17, 2-s2.0-67049142378.

34. Dataman C. K., Handbook of Anomaly Detection: (4) Isolation Forest, Dataman in AI, 2024.

35. Guyon I. and Elisseeff A., An Introduction to Variable and Feature Selection, 2003.

36. Cavanaugh J. E. and Neath A. A., The Akaike Information Criterion: Background, Derivation, Properties, Application, Interpretation, and Refinements, WIREs Computational Statistics. (2019) 11, no. 3, 10.1002/wics.1460, 2-s2.0-85062947290.

37. Breiman L., Random Forests, Machine Learning. (2001) 45, no. 1, 5–32, 10.1023/A:1010933404324, 2-s2.0-0035478854.

38. Dormann C. F., Elith J., and Bacher S., et al., Collinearity: A Review of Methods to Deal With It and a Simulation Study Evaluating Their Performance, Ecography. (2013) 36, no. 1, 27–46, 10.1111/j.1600-0587.2012.07348.x, 2-s2.0-84874725861.

39. Cox D. R., The Regression Analysis of Binary Sequences, Journal of the Royal Statistical Society Series B: Statistical Methodology. (1958) 20, no. 2, 215–232, 10.1111/j.2517-6161.1958.tb00292.x, https://www.jstor.org/stable/2983890.

40. Cortes C. and Vapnik V., Support-Vector Networks, Machine Learning. (1995) 20, no. 3, 273–297, 10.1023/A:1022627411411, 2-s2.0-34249753618.

41. Quinlan J. R., Induction of Decision Trees, Machine Learning. (1986) 1, no. 1, 81–106, 10.1023/A:1022643204877, 2-s2.0-33744584654.

42. Ke G., Meng Q., and Finley T., et al., LightGBM: A Highly Efficient Gradient Boosting Decision Tree, Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017, 3149–3157, https://dl.acm.org/doi/10.5555/3294996.3295074.

43. Friedman J. H., Greedy Function Approximation: A Gradient Boosting Machine, The Annals of Statistics. (2001) 29, no. 5, 1189–1232, 10.1214/aos/1013203451.

44. Phillips S. J., Anderson R. P., and Schapire R. E., Maximum Entropy Modeling of Species Geographic Distributions, Ecological Modelling. (2006) 190, no. 3-4, 231–259, 10.1016/j.ecolmodel.2005.03.026, 2-s2.0-33746218412.
