Abstract
Early detection of forest fires is essential to limit ecological damage and economic loss. This study evaluates two lightweight convolutional models for binary fire recognition using a balanced dataset of 5121 annotated images spanning diverse environments and illumination conditions. The first model, Att-MobileNetV2, augments MobileNetV2 with a Convolutional Block Attention Module to prioritize informative spatial and channel responses. The second model, MobileNetV2-TL, adopts transfer learning by retaining pre-trained MobileNetV2 weights and training compact task-specific heads. On the held-out test set, Att-MobileNetV2 attains 99.61% accuracy with an F1-score of 99.70%, precision of 99.32%, and recall of 99.19%. MobileNetV2-TL achieves 98.42% accuracy, 98.43% F1-score, 98.42% precision, and 99.47% recall. Ablation results indicate that attention improves discriminability over the MobileNetV2 backbone, and attention heatmaps provide qualitative evidence of focus on flame regions. Comparisons with classical machine-learning pipelines (RFC, SVM) and CNN baselines (e.g., VGG16) under a unified preprocessing and training regimen show consistent improvements. Model size and computational load remain sufficiently low for real-time inference on resource-limited platforms, including UAVs and fixed cameras. The results indicate a favorable balance between accuracy and efficiency and point to practical deployment in continuous fire-monitoring settings.
Keywords: Wildfire monitoring, Early fire detection, Lightweight deep networks, MobileNetV2, Attention mechanisms (CBAM), Transfer learning, Edge and embedded inference, UAV/surveillance imagery
Subject terms: Computational biology and bioinformatics, Engineering, Environmental sciences, Mathematics and computing
Introduction
Wildfire incidence and severity have risen in recent years, with documented consequences for ecosystems, air quality, and regional economies1. Rapid ignition, wind-driven spread, and complex terrain hinder persistent surveillance across extensive forest landscapes, so delayed detection amplifies losses2. Notable events such as the 2016 Fort McMurray wildfire in Canada (approximately $3.7 billion in losses) and the 2019 to 2020 Australian bushfires (approximately $70 billion) highlight the need for earlier and more reliable alarms3,4. Risk is especially elevated at interfaces between urban areas and wildland vegetation and along critical infrastructure corridors, as illustrated by the 2025 Los Angeles fires5, reinforcing the case for deployable, low-latency monitoring solutions.
A central challenge is sustaining high detection accuracy under variations in illumination, atmospheric obscurants, background clutter, and viewpoint, while adhering to tight computational budgets. Human monitoring and rule-based triggers have limited scalability and are susceptible to false alarms2. Deep learning offers a viable alternative, with convolutional neural networks demonstrating strong discrimination of flame and smoke patterns in still images and video6,7. Practical field use, however, requires robustness under tight memory, power, and real-time constraints, and benefits from curated labeled datasets that capture variation in weather, scene composition, and flame appearance to support generalization8.
Existing literature offers several directions. Two-stage detectors with hybrid backbones improve joint flame–smoke localization but at considerable computational cost9. Transfer learning with regularization reduces data requirements and mitigates catastrophic forgetting under domain shift10. Cross-modal fusion of visible and infrared imagery enhances sensitivity in challenging conditions, yet alignment and throughput remain obstacles for field deployment11,12. Remote-sensing pipelines at satellite scale benefit from spectral priors and band selection, although spatial resolution and latency limit near-real-time response on the ground13,14. These advances point to a gap between accuracy-focused research and embedded, resource-aware operation.
The present study addresses this gap by formulating binary fire recognition (fire versus non-fire) on a balanced dataset of 5,121 annotated images spanning urban, wildland, and industrial scenes under day, night, smoke, and fog conditions. Two lightweight architectures are developed to target the accuracy–efficiency frontier. Att-MobileNetV2 extends MobileNetV2 with a Convolutional Block Attention Module to adaptively reweight channel and spatial responses, thereby strengthening sensitivity to fire-salient cues. MobileNetV2-TL adopts transfer learning with a frozen MobileNetV2 backbone and compact task-specific heads to limit latency and memory footprint on constrained hardware. The design objective is to sustain high precision and recall while keeping parameter count and floating-point operations low, enabling real-time inference on unmanned aerial vehicles, fixed surveillance cameras, and IoT nodes.
The methodology is motivated by two considerations. First, attention-guided refinement improves separation of flame texture and smoke boundaries from complex backgrounds without incurring substantial computational cost. Second, transfer learning accelerates adaptation to the target domain while controlling training time and reducing overfitting when labeled data are limited. Together, these choices support a deployment-oriented pipeline that combines compact backbones, targeted inductive biases, and standardized evaluation.
The remainder of the article is organized as follows. Section “Literature review” synthesizes related research on image-based fire detection and outlines persistent challenges in robustness, efficiency, and generalization. Section “Proposed methodology” details dataset construction, preprocessing, model design, and training protocols. Section “Experimentation” reports quantitative and qualitative results, including ablations, cross-model comparisons, and condition-specific analyses. Section “Conclusion” summarizes findings, notes limitations, and identifies directions for future work.
Literature review
Deep learning has substantially advanced image-based fire detection by enabling end-to-end feature learning from complex visual data. Early research focused on adapting pre-trained CNNs through transfer learning to improve classification accuracy on limited datasets. Sathishkumar et al.10 combined Learning-without-Forgetting with fine-tuned VGG-16, InceptionV3, and Xception; Xception reached 98.72% on the source set and, under domain shift to BoWFire, improved from 79.23% to 91.41% with LwF while preserving 96.89% on the original task. Two-stage detection with hybrid features has also shown benefits. Cheknane et al.9 fused VGG19 and Xception feature maps within Faster R-CNN and, on 2,722 images with 3,561 flame and smoke instances, reported 96.5% accuracy, outperforming single-backbone Faster R-CNN and a YOLOv8n baseline.
Task-specific datasets and tuned transfer pipelines further improve performance. Davis and Shekaramiz15 introduced Utah Desert Fire data from controlled UAV captures and, using frozen-backbone ResNet-50 and Xception heads, obtained 100% accuracy on Utah Desert Fire and 99.22% on DeepFire, without explicit attention modules. Cross-modal fusion has been explored for improved robustness. Ciprián-Sánchez et al.11 adapted UnmatchGAN to define FIRe-GAN for visible–infrared fusion, reporting gains in correlation and structural similarity after transfer learning on the Corsican Fire Database when compared with FusionGAN and VGG19-feature fusion. Guan et al.12 proposed IA-VFDnet, a registration-free visible–infrared detector that couples a RepVGG branch with a Swin Transformer branch through an adaptive matcher and wavelet-domain fusion, achieving strong detection scores on M3FD and 0.8635 on the new IA-VSW benchmark.
Remote-sensing studies have investigated large multispectral corpora and spectral priors. Pereira et al.13 released a Landsat-8 dataset of about 146,000 256 × 256 patches with rule-based and human annotations and showed that U-Net variants can approximate or surpass handcrafted detectors, reaching 87.2% precision and 92.4% recall. Zhao et al.14 introduced Input Amplification, a lightweight input-stage module that learns band-selective patterns via stacked convolutions with spatial and channel attention, which improved smoke detection on USTC_SmokeRS and Landsat_Smk when integrated with MobileNetV2, ResNet-50, and Inception-ResNet-V2.
Overall, the literature indicates three persistent needs: improved robustness to environmental variability such as low light, smoke, and clutter; architectures with low parameter count and low FLOPs for edge and UAV platforms; and broader validation across datasets and sensing modalities. The present study responds to these needs with two lightweight classifiers that combine attention or transfer learning with standardized training and evaluation on a balanced, heterogeneous corpus. Table 1 summarizes representative studies by dataset, architecture, and headline results.
Table 1.
Summary of existing literature on deep learning–based fire detection.
| Ref. | Dataset used | AI model used | Features | Accuracy | Contribution | Limitation |
|---|---|---|---|---|---|---|
| 10 | Custom dataset with fire and non-fire images | Pre-trained CNNs (VGG16, InceptionV3, Xception) with LwF | Automatic CNN features | 98.72% (Xception) | LwF maintains performance on original and new tasks | Limited to image-based detection |
| 9 | Fire and smoke images from diverse environments | Two-stage Faster R-CNN with hybrid features | Static + dynamic | 96.5% (accuracy) | Hybrid feature extraction for improved detection | High computational complexity |
| 15 | Utah Desert Fire; DeepFire | Modified ResNet-50; Xception | Visual features | 100% (Utah Desert); 99.22% (DeepFire) | DL methods tailored for desert/forest fire detection | Lacks attention mechanisms to focus on fire regions |
| 11 | Infrared and visible wildfire images | FIRe-GAN (fusion) | IR–visible fused features | Not specified | Enhanced detection via cross-modal fusion | No quantitative accuracy metrics |
| 12 | M3FD dataset | IA-VFDnet (CNN–Transformer hybrid) | Multimodal (IR–visible) | Superior to SOTA | Hybrid learning for high-quality fusion detection | Edge-device feasibility not evaluated |
| 13 | Landsat-8 imagery | Convolutional Neural Networks (CNNs) | Spectral bands | Precision: 87.2%, Recall: 92.4% | Satellite-based DL detection with good PR | Performance reliant on image quality |
| 14 | USTC_SmokeRS; Landsat_Smk | CNNs with Input Amplification (InAmp) | Spectral patterns | Improved over baseline | Enhanced class-specific spectral pattern learning | Limited cross-dataset generalization |
To address the limitations identified in prior work, two complementary lightweight classifiers are considered. Att-MobileNetV2 augments MobileNetV2 with a Convolutional Block Attention Module to emphasize discriminative spatial and channel cues, improving robustness under smoke, clutter, and low-light conditions. MobileNetV2-TL applies transfer learning with a frozen backbone and compact task-specific heads, reducing training cost and inference latency on constrained hardware. Taken together, these designs target three needs highlighted by the literature: improved resilience to environmental variability, low parameter and FLOP budgets suitable for edge and UAV platforms, and methodologically consistent evaluation that supports reproducible deployment-oriented research.
Proposed methodology
The methodology considers two complementary architectures for binary fire recognition. The first is an attention-enhanced variant of MobileNetV2, denoted Att-MobileNetV2; the second is a transfer-learning configuration, MobileNetV2-TL. Figure 1 summarizes the end-to-end pipeline. The task is formulated as image-level classification between fire and non-fire categories rather than detection or segmentation. To support generalization and to curb sampling bias, a structured augmentation policy is applied before training. Images are resized to 224 × 224 pixels to match the MobileNetV2 input, then subjected to random rotation, horizontal and vertical flips, small translations, and brightness adjustment. The resulting diversity in pose, viewpoint, and illumination provides broader coverage of environmental conditions and improves robustness of the learned representation.
Fig. 1.
Overall flowchart of the proposed fire classification methodology.
Att-MobileNetV2 is built on the MobileNetV2 backbone, whose inverted residual blocks and depthwise–separable convolutions lower parameter count and FLOPs while retaining discriminative capacity16. Inverted residual connections preserve fine spatial detail through narrow–wide–narrow transformations, and linear bottlenecks limit over-fitting during expansion and projection. To strengthen feature selection for fire imagery, the Convolutional Block Attention Module (CBAM)17 is inserted at selected stages of the backbone. CBAM applies channel attention followed by spatial attention, producing data-driven weights that emphasize responses associated with flame texture, color gradients, and smoke boundaries, and that suppress distractors in cluttered backgrounds. The integration adds minimal compute relative to the base network yet improves focus on salient regions, which is advantageous for embedded or edge deployments with tight latency and memory budgets.
Fig. 2.
Architecture of MobileNetV2 integrated with the CBAM attention module.
CBAM-based attention-enhanced MobileNetV2
Depthwise separable convolutions process spatial and channel information independently, minimizing computational overhead. The CBAM module, inserted at optimal locations in the backbone, refines both spatial and channel-wise attention, thereby prioritizing fire-specific features (see Fig. 2). Within CBAM, channel attention is computed as

$$M_c(F) = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big) \tag{1}$$

and spatial attention as

$$M_s(F) = \sigma\big(f^{7\times 7}([\mathrm{AvgPool}(F); \mathrm{MaxPool}(F)])\big) \tag{2}$$

where $\sigma$ is the sigmoid activation, $F$ denotes the input feature map, $f^{7\times 7}$ is a $7\times 7$ convolution, and MLP is a shared multi-layer perceptron. The refined feature is

$$F' = M_c(F) \otimes F, \qquad F'' = M_s(F') \otimes F' \tag{3}$$

with $\otimes$ denoting element-wise multiplication.
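The data flow of Eqs. (1)–(3) can be illustrated with a minimal NumPy sketch. This is not the trained model: the shared MLP uses random weights, and the 7×7 convolution of the spatial stage is simplified to an average of the pooled maps, so only the shapes and the reweighting logic are faithful.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cbam(F, W1, W2):
    """Simplified CBAM refinement of a feature map F with shape (C, H, W).

    Channel attention follows Eq. (1): a shared two-layer MLP (W1, W2)
    applied to average- and max-pooled channel descriptors.  Spatial
    attention follows Eq. (2), except that the 7x7 convolution is replaced
    by a plain average of the pooled maps (illustrative simplification).
    """
    # --- Channel attention, Eq. (1) ---
    avg_desc = F.mean(axis=(1, 2))               # (C,) average-pooled descriptor
    max_desc = F.max(axis=(1, 2))                # (C,) max-pooled descriptor
    mlp = lambda v: W2 @ np.maximum(W1 @ v, 0)   # shared MLP with ReLU bottleneck
    Mc = sigmoid(mlp(avg_desc) + mlp(max_desc))  # (C,) channel weights
    F1 = F * Mc[:, None, None]                   # first reweighting of Eq. (3)

    # --- Spatial attention, Eq. (2), conv replaced by mean of pooled maps ---
    avg_map = F1.mean(axis=0)                    # (H, W)
    max_map = F1.max(axis=0)                     # (H, W)
    Ms = sigmoid(0.5 * (avg_map + max_map))      # (H, W) spatial weights
    return F1 * Ms[None, :, :]                   # second reweighting of Eq. (3)

rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2                          # r is the MLP reduction ratio
F = rng.standard_normal((C, H, W))
W1 = rng.standard_normal((C // r, C)) * 0.1      # reduction layer
W2 = rng.standard_normal((C, C // r)) * 0.1      # expansion layer
out = cbam(F, W1, W2)
print(out.shape)  # (8, 4, 4): attention preserves the feature-map shape
```

Because both attention maps lie in (0, 1), the refinement only attenuates responses; it never amplifies them, which is why it adds discriminability without destabilizing the backbone.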
CBAM-enhanced features are subsequently flattened and passed to fully connected layers for binary classification using sigmoid activation. Transfer learning is performed by initializing MobileNetV2 with ImageNet pre-trained weights, then fine-tuning higher layers for the fire classification task.
MobileNetV2-TL: transfer learning for efficiency
For computationally constrained environments, MobileNetV2-TL omits the CBAM module, instead leveraging a frozen pre-trained MobileNetV2 backbone (Fig. 3). Additional dense layers are appended for task-specific learning, yielding a lightweight model well-suited for edge deployment. Optimization uses a sigmoid output and binary cross-entropy loss, with parameters updated via stochastic gradient descent (SGD). Comparable transfer-learning refinements have yielded robust gains in medical image classification under tight compute budgets18.
Fig. 3.
Architecture of the MobileNetV2-TL classifier used in this study, highlighting the frozen backbone and task-specific dense head for binary fire recognition.
Hyperparameter tuning and regularization
Hyperparameter optimization was performed through an extensive manual grid search to identify a configuration that balanced convergence speed, stability, and generalization. Att-MobileNetV2 was trained for 100 epochs with a batch size of 32; MobileNetV2-TL used 50 epochs and a batch size of 64, with a separate grid-selected learning rate for each model. Pilot trials indicated that the CBAM-augmented backbone benefited from a smaller step size to temper gradient variability introduced by attention blocks, whereas the transfer-learning configuration required a slightly larger step size to adapt the appended dense layers while keeping the frozen backbone stable.
Early stopping halted training once validation accuracy plateaued, which occurred around 90 epochs for Att-MobileNetV2 and 45 epochs for MobileNetV2-TL. Batch sizes were bounded by available GPU memory and chosen to maximize throughput without inducing excessive gradient noise. Progressive batch schedules were examined, but fixed configurations produced more consistent optimizer momentum and smoother loss trajectories, particularly for the attention-enhanced model. Regularization combined dropout in the dense layers (rate 0.4) with the augmentation policy described in section “Preprocessing and augmentation”. In tandem with early stopping, these measures yielded stable convergence across runs and delivered a hyperparameter setting that balances computational efficiency with predictive reliability.
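The plateau-based stopping rule described above can be sketched as follows; the patience value of 5 epochs and the accuracy curve are illustrative stand-ins, not the paper's settings.

```python
def train_with_early_stopping(val_acc_per_epoch, patience=5):
    """Stop when validation accuracy has not improved for `patience` epochs.

    `val_acc_per_epoch` stands in for a real training loop; it yields the
    validation accuracy observed after each epoch.  Returns the index and
    accuracy of the best epoch (the checkpoint that would be restored).
    """
    best_acc, best_epoch, wait = -1.0, -1, 0
    for epoch, acc in enumerate(val_acc_per_epoch):
        if acc > best_acc:
            best_acc, best_epoch, wait = acc, epoch, 0   # new best checkpoint
        else:
            wait += 1
            if wait >= patience:                          # plateau detected
                break
    return best_epoch, best_acc

# Accuracy plateaus after epoch 4, so training halts well before the curve ends.
curve = [0.90, 0.93, 0.95, 0.96, 0.965, 0.964, 0.963, 0.965, 0.964, 0.962]
print(train_with_early_stopping(curve, patience=5))  # (4, 0.965)
```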
Dataset acquisition and annotation
A curated corpus of 5,121 images was assembled for binary fire versus non-fire classification. Material was drawn from open Web sources and public media, including handheld photographs, community social-media collections, and frames extracted from online videos. Additionally, representative samples were incorporated from three publicly available benchmark datasets to enhance scene variability and environmental diversity: the Wildfire Dataset19, the BoWFire Dataset20, and the Forest Fire Image Classification Dataset21. The dataset covers urban, wildland, and industrial settings under daylight, low-light, night, and smoke-obscured conditions (Fig. 4). Class balance was maintained with 2,560 fire and 2,561 non-fire images, and a stratified split assigned 70% to training, 15% to validation, and 15% to testing. Sampling guidelines targeted geographic and scene diversity across arid, temperate, and forested regions with multiple vegetation types (coniferous, deciduous, shrubland, grassland). Selection considered flame extent, color intensity, background clutter, and smoke density. All images were converted to RGB and resized to 224 × 224 pixels to match the model input.
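The 70/15/15 stratified partition can be reproduced schematically. This is a sketch: integer percentages are used so the split sizes are deterministic, and the image identifiers below are placeholders rather than real filenames.

```python
import random

def stratified_split(items, labels, pct=(70, 15), seed=0):
    """Split items into train/val/test while preserving per-class ratios.

    `pct` gives the train and validation percentages; the remainder of
    each class goes to the test split.
    """
    rng = random.Random(seed)
    train, val, test = [], [], []
    for cls in sorted(set(labels)):
        members = [x for x, y in zip(items, labels) if y == cls]
        rng.shuffle(members)                      # shuffle within each class
        n_train = len(members) * pct[0] // 100
        n_val = len(members) * pct[1] // 100
        train += members[:n_train]
        val += members[n_train:n_train + n_val]
        test += members[n_train + n_val:]         # remainder -> test
    return train, val, test

# 2,560 fire and 2,561 non-fire identifiers, matching the curated corpus.
items = [f"img_{i}" for i in range(5121)]
labels = ["fire"] * 2560 + ["nofire"] * 2561
train, val, test = stratified_split(items, labels)
print(len(train), len(val), len(test))  # 3584 768 769
```

Stratifying per class keeps the near-perfect 50/50 balance intact inside every split, which is what makes accuracy a meaningful headline metric here.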
Fig. 4.
Representative samples: fire scenes (a) and (b) and non-fire scenes (c) and (d), captured under varied environments and lighting19–21.
Annotations were produced manually in LabelImg as bounding boxes with binary labels (fire/non-fire). Quality control used two independent review passes and achieved an inter-annotator agreement of 0.92 (Cohen's κ). Perceptual-hash screening removed near-duplicates and low-quality items to prevent split leakage. These steps preserved dataset integrity and provided reliable supervision under heterogeneous environmental conditions.
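Inter-annotator agreement of the kind reported (Cohen's κ) can be computed directly from two annotators' binary labels; the toy labels below are illustrative, not drawn from the dataset.

```python
def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators' binary labels (1 = fire)."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items labeled identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    p_a1 = sum(labels_a) / n
    p_b1 = sum(labels_b) / n
    expected = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)
    return (observed - expected) / (1 - expected)

# Toy example: the annotators disagree on 1 of 10 images.
a = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
b = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
print(round(cohens_kappa(a, b), 3))  # 0.8
```

Kappa discounts the agreement expected by chance, so a value of 0.92 on a balanced binary task indicates substantially stronger consistency than raw percent agreement alone would suggest.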
Preprocessing and augmentation
Prior to training, images were standardized through a uniform pipeline. Each sample was converted to RGB space and resized to 224 × 224 pixels in accordance with the backbone input. Intensities were normalized to the [0, 1] range to promote stable optimization across mini-batches. Light Gaussian smoothing was applied to attenuate sensor noise and minor illumination fluctuations, after which a manual quality screen removed corrupted files and evident label mismatches. Robustness to scene variability was encouraged with a controlled augmentation policy that combined geometric and photometric perturbations: bounded in-plane rotation, horizontal flipping with probability 0.5, moderate affine operations (random cropping and shear), and capped brightness and contrast adjustments. The resulting diversity in pose, viewpoint, and exposure was intended to reduce overfitting while preserving semantic consistency of the fire and non-fire classes. These operations approximate camera motion, viewpoint change, and exposure variability typical of smoke- and flame-rich imagery. Class-balanced mini-batching was enforced so that each update received equal numbers of fire and non-fire examples, limiting bias from scene frequency. After augmentation, the effective training set contained approximately 15,363 instances. The combination of normalization, denoising, balanced sampling, and targeted transforms yielded stable loss trajectories across runs and improved generalization to challenging lighting and background conditions.
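A minimal sketch of the normalization and perturbation steps is given below, with small random RGB arrays standing in for real frames; rotation, shear, and cropping are omitted for brevity, and the ±0.2 brightness bound is illustrative.

```python
import numpy as np

def augment(image, rng):
    """Photometric/geometric perturbations on an (H, W, 3) uint8 image.

    A minimal stand-in for the augmentation policy: horizontal flip with
    probability 0.5 and a bounded brightness shift.  Output is a float
    array normalized to [0, 1], as required for stable optimization.
    """
    img = image.astype(np.float32) / 255.0   # intensity normalization
    if rng.random() < 0.5:                   # horizontal flip, p = 0.5
        img = img[:, ::-1, :]
    shift = rng.uniform(-0.2, 0.2)           # bounded brightness adjustment
    return np.clip(img + shift, 0.0, 1.0)    # keep values in [0, 1]

rng = np.random.default_rng(42)
image = rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8)
out = augment(image, rng)
print(out.shape, float(out.min()), float(out.max()))
```

Applying independent random draws per sample per epoch is what roughly triples the effective training set (about 15,363 instances) without storing any extra images.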
Model training and evaluation
Optimization followed a standard probabilistic formulation for binary classification. Let $B$ denote the mini-batch size, $y_i \in \{0, 1\}$ the ground-truth label, and $\hat{y}_i$ the predicted probability of the fire class for sample $i$. The network parameters were learned by minimizing the binary cross-entropy

$$\mathcal{L} = -\frac{1}{B} \sum_{i=1}^{B} \left[ y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i) \right] \tag{4}$$

with parameters $W$ updated using stochastic gradient descent (SGD):

$$W \leftarrow W - \eta \, \nabla_W \mathcal{L} \tag{5}$$

where the learning rate $\eta$ was selected from the grid search reported in section “Hyperparameter tuning and regularization”.
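Equations (4) and (5) correspond to the following NumPy computation, shown here on a toy logistic neuron rather than the full network; the data, learning rate, and weights are illustrative.

```python
import numpy as np

def bce_loss(y_true, y_prob, eps=1e-7):
    """Binary cross-entropy of Eq. (4), averaged over the mini-batch."""
    p = np.clip(y_prob, eps, 1 - eps)        # guard against log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def sgd_step(W, grad, lr):
    """Plain SGD update of Eq. (5): W <- W - lr * grad."""
    return W - lr * grad

# One update of a logistic neuron on a toy batch of 4 samples, 3 features.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))
y = np.array([1.0, 0.0, 1.0, 0.0])
W = np.zeros(3)

p = 1.0 / (1.0 + np.exp(-X @ W))             # predicted fire probabilities
before = bce_loss(y, p)                      # ln(2) for uninformed weights
grad = X.T @ (p - y) / len(y)                # gradient of BCE w.r.t. W
W = sgd_step(W, grad, lr=0.5)
p = 1.0 / (1.0 + np.exp(-X @ W))
after = bce_loss(y, p)
print(before > after)  # True: loss decreases after one step
```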
Stability during training and comparability across runs were promoted through several controls. Identical random seeds were fixed for data shuffling and weight initialization. The checkpoint with the highest validation F1-score was retained as the final model to reduce late-epoch variance. Convergence was tracked using training and validation curves to verify that the chosen hyperparameters yielded monotonic improvement. Performance on the held-out test set was reported using accuracy, precision, recall, F1-score, and area under the ROC curve. Per-class behavior was analyzed with confusion matrices to characterize error modes, including the prevalence of false positives and false negatives. These steps were designed to ensure that the reported estimates represent stable and generalizable behavior rather than artifacts of a particular split.
To support reproducibility, the complete pipeline is summarized in Algorithm 1. The sequence covers preprocessing, augmentation, backbone initialization, optional attention insertion, optimization, and evaluation. All experiments used the same stratified data partitions and the same augmentation policy so that comparisons between Att-MobileNetV2 and MobileNetV2-TL reflect architectural differences rather than data handling.
Algorithm 1.
Attention-enhanced MobileNetV2 fire classification.
These controls, together with a consistent data pipeline and a transparent model-selection rule, produce reproducible estimates and enable a balanced comparison of accuracy and computational cost across the two proposed architectures.
Experimentation
The experimental assessment investigated robustness and generalization using a set of complementary studies on the custom benchmark. The first study assessed Att-MobileNetV2 on the held-out test set and obtained 99.61% accuracy. Precision, recall, and F1-score were consistent with this result, and attention visualizations indicated that the CBAM module reinforced spatial and channel selectivity for flame regions. An ablation analysis confirmed that inserting CBAM yielded consistent gains across all metrics relative to the MobileNetV2 backbone. The second study evaluated MobileNetV2-TL, which applies transfer learning to a compact backbone. With low-level ImageNet representations retained and only the classification head fine-tuned, performance reached 98.42% accuracy, 98.43% F1, and 99.47% recall, evidencing efficient adaptation and conservative control of false negatives at minimal latency.
Performance gains primarily reflect architectural efficiency rather than parameter inflation. In MobileNetV2, depthwise separable convolutions, inverted residual connections, and linear bottlenecks limit multiply-accumulate operations while preserving the fine spatial detail needed to delineate thin flame structures and smoke edges. With inputs at 224 × 224, the proposed configurations operate within 2.2 to 3.4 million parameters and 0.21 to 0.32 GFLOPs, yielding 10 to 12 ms per image on a single GPU. In Att-MobileNetV2, CBAM is inserted at empirically selected stages so that channel attention first reweights feature maps capturing reddish–orange chromaticity and high-frequency flame texture, followed by spatial attention that concentrates responses along contiguous high-intensity regions and smoke boundaries; the resulting attention maps align with visually salient fire cues and suppress clutter. In MobileNetV2-TL, transfer learning retains low-level ImageNet filters that encode edges and color gradients, freezes early backbone blocks to stabilize optimization, and fine-tunes compact dense heads, enabling rapid adaptation to heterogeneous imagery at modest training cost.
External validity was examined against recent compact baselines under identical preprocessing, stratified train/validation/test splits, and a unified optimization setting using stochastic gradient descent with binary cross-entropy. EfficientNet-B022, DenseNet-12123, and ViT-Tiny/1624 served as comparators. Across repeated runs, Att-MobileNetV2 yielded the highest accuracy and F1-score while using fewer FLOPs and parameters, indicating a favorable balance between accuracy and efficiency relative to these models, whereas MobileNetV2-TL achieved the highest recall, reflecting conservative control of false negatives for early-warning operation. The improvements remained statistically significant versus the strongest baseline (paired t-tests), supporting the conclusion that the gains arise from attention-guided feature selection and efficient backbone design rather than optimization variance or split-specific artifacts.
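The paired-test procedure can be sketched as follows; the per-run accuracies are hypothetical stand-ins, not the study's measurements, and only the t statistic (not the full p-value lookup) is shown.

```python
import math

def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic over per-run metric scores of two models."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    # Sample variance of the per-run differences (Bessel's correction).
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n)

# Hypothetical accuracies over five repeated runs of two models.
att = [0.9958, 0.9961, 0.9963, 0.9960, 0.9959]
base = [0.9880, 0.9888, 0.9885, 0.9882, 0.9886]
t = paired_t_statistic(att, base)
print(t > 2.776)  # True: exceeds the two-sided 5% critical value for df = 4
```

Pairing by run cancels shared sources of variance (identical seeds and splits), which is why even small accuracy gaps can be significant across repeated runs.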
Model training and optimization
Training settings were selected from the search grid in section “Hyperparameter tuning and regularization” and held fixed across repetitions to ensure like-for-like comparison. Att-MobileNetV2 was trained for 100 epochs using a batch size of 32 and its grid-selected initial learning rate. Transfer learning was realized by freezing the early MobileNetV2 blocks and fine-tuning a task-specific classification head. MobileNetV2-TL was trained for 50 epochs with a batch size of 64 and its own grid-selected learning rate while keeping the backbone fixed and updating compact dense layers. All models were optimized with stochastic gradient descent under a binary cross-entropy objective.
To reduce run-to-run variance and support reproducibility, data shuffling and weight initialization used identical random seeds across trials. Model selection relied on the checkpoint that maximized the validation F1-score. The configuration produced consistent optimization, with steadily decreasing loss and stable validation curves, permitting attribution of performance to architectural factors rather than optimization noise or partition bias (see Fig. 5).
Fig. 5.
Training and validation accuracy (a) and loss (b) across epochs for both models.
Performance evaluation
Model performance was evaluated using standard classification measures: accuracy, precision, recall, F1-score, and receiver operating characteristic (ROC) analysis. The metrics are defined as:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{6}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{7}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{8}$$

$$\text{F1} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{9}$$

where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively.
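Equations (6)–(9) reduce to a few lines of code; the confusion-matrix counts below are hypothetical and chosen only to illustrate the computation, not taken from the reported results.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from Eqs. (6)-(9)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1

# Hypothetical counts for a balanced 768-image test split.
acc, prec, rec, f1 = classification_metrics(tp=381, tn=384, fp=0, fn=3)
print(f"acc={acc:.4f} prec={prec:.4f} rec={rec:.4f} f1={f1:.4f}")
```

Because F1 is the harmonic mean of precision and recall, it penalizes a model that trades one for the other, which is why it accompanies accuracy throughout the tables.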
As reported in Table 2, Att-MobileNetV2 delivers the top results with 99.61% accuracy and an F1-score of 99.70%, reflecting well-balanced predictions and strong discriminative capability (see Fig. 6). MobileNetV2-TL yields 98.42% accuracy with the highest recall (99.47%), as depicted in Fig. 7, reflecting the conservative false-negative control desirable for early-warning use. Both proposed models compare favorably with compact baselines (e.g., EfficientNet-B0, DenseNet-121, ViT-Tiny/16) under identical preprocessing and training protocols, with Att-MobileNetV2 surpassing all of them, demonstrating a favorable accuracy–efficiency trade-off.
Table 2.
Test-set performance of proposed models versus baselines.
| Model | Accuracy (%) | F1-score (%) | Recall (%) | Precision (%) |
|---|---|---|---|---|
| VGG-16 | 97.00 | 97.10 | 97.00 | 96.90 |
| RFC | 95.00 | 95.00 | 96.00 | 94.00 |
| SVM | 85.00 | 84.00 | 87.00 | 82.00 |
| Att-MobileNetV2 (Proposed) | 99.61 | 99.70 | 99.19 | 99.32 |
| MobileNetV2-TL (Proposed) | 98.42 | 98.43 | 99.47 | 98.42 |
| NASNetMobile25 | 97.63 | 97.22 | 98.95 | 96.41 |
| Xception26 | 96.05 | 96.08 | 96.84 | 95.33 |
| ResNet152V227 | 96.32 | 96.42 | 99.47 | 93.56 |
| EfficientNet-B022 | 98.85 | 98.73 | 99.01 | 98.42 |
| DenseNet-12123 | 98.56 | 98.44 | 98.78 | 98.32 |
| ViT (Tiny/16)24 | 98.21 | 98.15 | 98.54 | 97.81 |
Fig. 6.

Confusion matrix for Att-MobileNetV2 on the test set.
Fig. 7.

Confusion matrix for MobileNetV2-TL on the test set.
Stacked ensemble evaluation
A stacked ensemble was constructed to assess whether combining complementary representations improves classification stability. The ensemble integrates Att-MobileNetV2 and MobileNetV2-TL by concatenating feature vectors from the penultimate fully connected layers and training a logistic regression meta-learner on the same training and validation split. As reported in Table 3, the ensemble attains 99.72% accuracy and 99.75% F1-score, which is a consistent improvement of approximately 0.1 percentage points over the best individual model. The result indicates that attention-focused features and transfer-learned features are complementary while adding minimal computational overhead from the linear meta-learner.
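The stacking scheme, concatenating penultimate features and fitting a logistic-regression meta-learner, can be sketched as below. This is a minimal sketch: the meta-learner is trained with plain batch gradient descent, and synthetic features stand in for the two backbones' penultimate outputs.

```python
import numpy as np

def train_meta_learner(feats_a, feats_b, y, lr=0.1, epochs=300):
    """Logistic-regression meta-learner over concatenated base-model features."""
    X = np.hstack([feats_a, feats_b])            # feature concatenation
    X = np.hstack([X, np.ones((len(X), 1))])     # bias column
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))         # meta-learner probabilities
        w -= lr * X.T @ (p - y) / len(y)         # batch gradient step on BCE
    return w

def predict(w, feats_a, feats_b):
    X = np.hstack([feats_a, feats_b])
    X = np.hstack([X, np.ones((len(X), 1))])
    return (1.0 / (1.0 + np.exp(-X @ w)) > 0.5).astype(int)

# Synthetic, well-separated stand-in features for 100 samples, 4 dims each.
rng = np.random.default_rng(1)
y = np.repeat([1, 0], 50)
feats_a = rng.standard_normal((100, 4)) + y[:, None] * 2.0        # "model A"
feats_b = rng.standard_normal((100, 4)) - (1 - y)[:, None] * 2.0  # "model B"
w = train_meta_learner(feats_a, feats_b, y.astype(float))
acc = (predict(w, feats_a, feats_b) == y).mean()
print(acc)
```

A linear meta-learner keeps the ensemble's extra cost negligible: inference adds only one dot product on top of the two base forward passes.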
Table 3.
Stacked ensemble versus individual models on the test set.
| Model configuration | Accuracy (%) | F1-score (%) | Recall (%) | Precision (%) |
|---|---|---|---|---|
| Att-MobileNetV2 (individual) | 99.61 | 99.70 | 99.19 | 99.32 |
| MobileNetV2-TL (individual) | 98.42 | 98.43 | 99.47 | 98.42 |
| Stacked ensemble (proposed) | 99.72 | 99.75 | 99.46 | 99.62 |
To further assess generalization, both models were additionally evaluated on the publicly available Kaggle Forest Fire dataset containing approximately 1,900 images with binary Fire and No-Fire labels (See Table 4). Without architectural or hyperparameter modifications, Att-MobileNetV2 achieved 98.93% accuracy and 98.88% F1-score, while MobileNetV2-TL reached 97.86% accuracy. A comparative summary of the custom dataset and the external Kaggle dataset is provided in Table 5. The small gap relative to the custom dataset indicates strong transferability across independently collected sources and environmental contexts.
Table 4.
Cross-dataset evaluation showing generalization of the proposed models on an external public dataset.
| Model | Dataset | Accuracy (%) | F1-Score (%) | Recall (%) |
|---|---|---|---|---|
| Att-MobileNetV2 | Custom (this study) | 99.61 | 99.70 | 99.19 |
| MobileNetV2-TL | Custom (this study) | 98.42 | 98.43 | 99.47 |
| Att-MobileNetV2 | Kaggle Forest Fire28 | 98.93 | 98.88 | 99.05 |
| MobileNetV2-TL | Kaggle Forest Fire28 | 97.86 | 97.74 | 98.21 |
| EfficientNet-B0 | Kaggle Forest Fire28 | 98.54 | 98.33 | 98.77 |
| DenseNet-121 | Kaggle Forest Fire28 | 98.25 | 98.08 | 98.46 |
| ViT (Tiny/16) | Kaggle Forest Fire28 | 97.91 | 97.62 | 98.19 |
Table 5.
Comparison between the custom dataset and an external public dataset used for cross evaluation.
| Dataset | Classes | Total images | Source type | Environment | Lighting conditions |
|---|---|---|---|---|---|
| Custom (this study) | Fire / No-Fire | 5,121 | Social media, YouTube, smartphones | Urban, wildland, industrial | Day, night, smoke, fog |
| Kaggle Forest Fire28 | Fire / No-Fire | 1,900 | Web-collected landscape photos | Forest, grassland | Daylight dominant |
Ablation studies confirmed the importance of CBAM and augmentation: removing the spatial or the channel attention branch individually reduced accuracy relative to the full model, and removing CBAM entirely dropped accuracy to the 98.42% level of the transfer-learning variant (Table 9). Training without augmentation further degraded Att-MobileNetV2’s accuracy. For MobileNetV2-TL, transfer learning boosted accuracy to 98.42% from the 96.13% baseline trained from scratch, as depicted in Fig. 8. These results confirm that Att-MobileNetV2 excels in feature prioritization, while MobileNetV2-TL is highly efficient for resource-constrained settings.
Fig. 8.

Training and validation loss across 50 epochs for MobileNetV2-TL.
The performance differences reported in Tables 2 and 9 are consistent with the architectural choices made in this study. Classical CNNs such as VGG16 employ deep convolutional stacks with large parameter budgets, which increase redundancy and computational cost. In contrast, the MobileNetV2 backbone attains compact representations through depthwise separable convolutions, inverted residual connections, and linear bottlenecks, thereby preserving discriminative capacity at lower compute. The CBAM module further provides channel and spatial reweighting, which concentrates responses on flame texture and boundary cues while suppressing background clutter. Representative attention heatmaps illustrating CBAM’s focus on flame regions are shown in Fig. 9. The qualitative heatmaps and the ablation study support this interpretation, and the combined effect is a reduction in false positives and false negatives without compromising throughput. Related work has reported comparable benefits from attention mechanisms and transfer learning in safety-critical visual classification29.
Table 9.
Impact of CBAM and transfer learning on model performance.
| Model variant | Accuracy (%) | Improvement (%) |
|---|---|---|
| MobileNetV2 (Baseline) | 96.13 | – |
| MobileNetV2 + TL | 98.42 | 2.29 |
| MobileNetV2 + TL + CBAM (Att-MobileNetV2) | 99.61 | 3.48 |
Integrating CBAM and transfer learning yields a cumulative performance gain of more than three percent, establishing their complementary contributions to accuracy and stability.
Fig. 9.
Attention heatmaps for Att-MobileNetV2.
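As an illustration of the reweighting CBAM performs, the following NumPy sketch applies channel attention (a shared MLP over average- and max-pooled descriptors) followed by spatial attention. The random weights and the fixed equal-weight spatial combination are stand-ins for learned parameters, not the paper's implementation, which follows Woo et al.17.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """Reweight channels via a shared MLP over avg- and max-pooled
    descriptors. feat: (C, H, W); w1: (C//r, C); w2: (C, C//r)."""
    avg = feat.mean(axis=(1, 2))
    mx = feat.max(axis=(1, 2))
    scores = w2 @ np.maximum(w1 @ avg, 0) + w2 @ np.maximum(w1 @ mx, 0)
    return feat * sigmoid(scores)[:, None, None]

def spatial_attention(feat):
    """Reweight spatial positions from channel-wise avg and max maps.
    CBAM learns a 7x7 conv over the stacked maps; a fixed equal-weight
    sum stands in for that kernel here."""
    att = sigmoid(0.5 * (feat.mean(axis=0) + feat.max(axis=0)))
    return feat * att[None, :, :]

def cbam_block(feat, w1, w2):
    """Channel attention first, then spatial attention, as in CBAM."""
    return spatial_attention(channel_attention(feat, w1, w2))

rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 4, 4))   # (channels, height, width)
w1 = rng.normal(size=(2, 8)) * 0.1  # reduction ratio r = 4
w2 = rng.normal(size=(8, 2)) * 0.1
out = cbam_block(feat, w1, w2)
```

Because both attention maps lie in (0, 1), the block can only attenuate responses, which is what concentrates activation on flame regions in the heatmaps of Fig. 9.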
Comparative dataset analysis
Table 6 situates the proposed dataset among representative collections. Prior sets are either limited to a single environment (urban only) or a single phenomenon (wildfires only), or they aggregate web images without balanced coverage. The proposed dataset provides comparable scale while offering high variability across urban, wildland, and industrial scenes from heterogeneous sources. This breadth reduces domain bias and supports evaluation under conditions that resemble operational monitoring.
Table 6.
Dataset characteristics across related studies and this work.
| Reference | Dataset | Total images | Sources | Fire variability | Environment |
|---|---|---|---|---|---|
| Li & Zhao30 | Custom | 3,500 | Surveillance cameras | Low | Urban only |
| Mahmoud31 | Custom | 4,000 | Social media, drones | Medium | Mixed (urban, forest) |
| Guede-Fernández32 | FireNet | 5,500 | Internet sources | High | Diverse (urban, forest, industrial) |
| Zhang33 | Custom | 6,000 | Satellite imagery | High | Wildfires only |
| Proposed study | Custom | 5,121 | Social media, YouTube | High | Urban, wildland, industrial |
Model complexity and computational efficiency
Table 7 reports parameter counts, floating-point operation estimates, and single-image latency. The proposed models operate in the 2.2–3.4 M parameter range with 0.21–0.32 GFLOPs and 10–12 ms per frame, which is substantially lower than conventional backbones and region-proposal detectors. The efficiency gains indicate suitability for real-time inference on embedded and IoT platforms where memory and power are constrained.
Table 7.
Model complexity and efficiency metrics.
| Model | Parameters (M) | FLOPs (G) | Inference (ms/frame) | Lightweight |
|---|---|---|---|---|
| VGG16 | 138 | 15.5 | 25 | No |
| InceptionV3 | 24 | 5.7 | 18 | Yes |
| Xception | 22 | 8.4 | 22 | Yes |
| Faster R-CNN | 52 | 28.3 | 40 | No |
| Att-MobileNetV2 | 3.4 | 0.32 | 12 | Yes |
| MobileNetV2-TL | 2.2 | 0.21 | 10 | Yes |
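The efficiency gap between standard and depthwise separable convolutions follows directly from their operation counts; a back-of-the-envelope comparison, using illustrative layer sizes rather than figures taken from the models above:

```python
def conv_macs(h, w, c_in, c_out, k):
    """Multiply-accumulates for a standard k x k convolution."""
    return h * w * c_in * c_out * k * k

def ds_conv_macs(h, w, c_in, c_out, k):
    """Depthwise separable: depthwise k x k plus 1 x 1 pointwise conv."""
    return h * w * c_in * k * k + h * w * c_in * c_out

# Illustrative layer: 56x56 feature map, 64 -> 128 channels, 3x3 kernel.
std = conv_macs(56, 56, 64, 128, 3)
ds = ds_conv_macs(56, 56, 64, 128, 3)
print(f"standard: {std/1e6:.1f} M MACs, separable: {ds/1e6:.1f} M MACs, "
      f"ratio: {std/ds:.1f}x")
# → standard: 231.2 M MACs, separable: 27.5 M MACs, ratio: 8.4x
```

Stacking many such layers is what keeps the MobileNetV2-based models in the sub-0.5 GFLOPs range of Table 7.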
Performance across environmental conditions
Condition-specific results were computed on clear day, night, smoke, and fog subsets of the held-out test set. As summarized in Table 8, Att-MobileNetV2 attains the highest accuracy in every condition (99.4% clear, 95.7% night, 96.2% smoke, 88.9% fog). MobileNetV2-TL performs comparably, with small reductions relative to the attention-enhanced model. All methods exhibit their lowest accuracy in fog, indicating that reduced contrast and scattering remain the most challenging setting. These results show that the proposed lightweight models retain strong performance under adverse illumination and visibility while preserving overall test accuracy.
Table 8.
Comparative performance across environmental conditions.
| Model | Clear Day | Night | Smoke | Foggy | Overall Acc. |
|---|---|---|---|---|---|
| VGG16 | 97.5 | 89.2 | 88.5 | 75.4 | 94.38 |
| InceptionV3 | 98.1 | 90.3 | 90.7 | 77.1 | 97.01 |
| Xception | 98.7 | 91.5 | 91.8 | 80.2 | 98.72 |
| Att-MobileNetV2 | 99.4 | 95.7 | 96.2 | 88.9 | 99.61 |
| MobileNetV2-TL | 98.9 | 94.3 | 94.8 | 87.4 | 98.42 |
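Condition-specific scores such as those in Table 8 reduce to grouping test predictions by their condition tag; a minimal sketch, assuming a hypothetical record format of (condition, label, prediction) tuples:

```python
from collections import defaultdict

def accuracy_by_condition(records):
    """Per-condition accuracy over (condition, y_true, y_pred) tuples."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for cond, y_true, y_pred in records:
        totals[cond] += 1
        hits[cond] += int(y_true == y_pred)
    return {cond: hits[cond] / totals[cond] for cond in totals}

# Toy example: two clear-day samples, two foggy samples.
preds = [("clear", 1, 1), ("clear", 0, 0), ("fog", 1, 0), ("fog", 1, 1)]
print(accuracy_by_condition(preds))  # {'clear': 1.0, 'fog': 0.5}
```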
Ablation study: CBAM and transfer learning
Table 9 details the ablation study, confirming the additive effect of CBAM and transfer learning.
Overfitting control and generalization assessment
Overfitting risk was addressed through complementary regularization and validation procedures. Dropout was applied to the dense layers with a rate of 0.4, and training was halted by early stopping once the validation loss ceased to improve. The effective sample size was increased with geometric and photometric augmentation, including random rotation, horizontal flipping with probability 0.5, scaled affine transforms, and brightness and contrast adjustments. These transformations emulate viewpoint changes and illumination variability observed in operational settings.
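A minimal NumPy sketch of such photometric and geometric augmentation follows; the jitter ranges here are illustrative placeholders, not the bounds used in the experiments:

```python
import numpy as np

def augment(img, rng):
    """Random horizontal flip (p = 0.5) plus brightness/contrast jitter
    on a uint8 HxWx3 image. Ranges below are illustrative only."""
    if rng.random() < 0.5:
        img = img[:, ::-1, :]                # horizontal flip
    brightness = rng.uniform(-0.2, 0.2)      # additive shift (assumed range)
    contrast = rng.uniform(0.8, 1.2)         # multiplicative gain (assumed)
    x = img.astype(np.float32) / 255.0
    x = (x - 0.5) * contrast + 0.5 + brightness  # jitter around mid-gray
    return (np.clip(x, 0.0, 1.0) * 255).astype(np.uint8)

rng = np.random.default_rng(42)
sample = np.full((8, 8, 3), 128, dtype=np.uint8)
out = augment(sample, rng)
```

Applying a fresh random draw per epoch means the network rarely sees the exact same pixels twice, which is the intended regularization effect.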
Training and validation traces in Fig. 5 exhibit closely aligned accuracy and loss curves across epochs, indicating stable optimization under the selected hyperparameters. External validity was examined on the Kaggle Forest Fire dataset (approximately 1,900 images) without additional tuning. Att-MobileNetV2 achieved 98.93% accuracy, and MobileNetV2-TL achieved 97.86%. The combination of regularization, data augmentation, and cross-dataset testing supports that the reported results arise from robust feature learning rather than memorization34.
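The early-stopping rule described above can be sketched as a patience counter on the validation loss; the patience and tolerance values here are illustrative:

```python
class EarlyStopping:
    """Stop training when validation loss has not improved by more than
    min_delta for `patience` consecutive epochs."""

    def __init__(self, patience=3, min_delta=1e-4):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True to stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

stopper = EarlyStopping(patience=3)
losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.60, 0.63]
stop_epoch = next(i for i, loss in enumerate(losses) if stopper.step(loss))
```

On this toy trace the loss plateaus after epoch 2, so training halts once three non-improving epochs have accumulated.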
Statistical significance and confidence intervals
Statistical analysis was performed to verify that observed differences are unlikely to be due to random variation. Each model was trained for five independent runs on identical stratified splits. For every metric, the mean and the 95% confidence interval (CI) were estimated using the Student t distribution. Pairwise comparisons against Att-MobileNetV2 used two-tailed paired t-tests at a significance level of 0.05.
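Under this protocol (five runs, Student t with df = 4), the interval and the paired test statistic can be computed as follows; the run accuracies below are illustrative values, not the paper's raw numbers:

```python
import math
from statistics import mean, stdev

T_CRIT_DF4 = 2.776  # two-tailed 95% Student t critical value, df = 4

def ci95(samples):
    """Mean and 95% CI half-width for a small sample (Student t)."""
    m = mean(samples)
    half = T_CRIT_DF4 * stdev(samples) / math.sqrt(len(samples))
    return m, half

def paired_t(a, b):
    """Paired t statistic for two models scored on the same runs."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

# Illustrative per-run accuracies for two models over five runs.
att = [99.58, 99.63, 99.61, 99.60, 99.64]
tl = [98.40, 98.45, 98.41, 98.44, 98.40]
m, half = ci95(att)
t_stat = paired_t(att, tl)  # |t| > 2.776 implies p < 0.05 at df = 4
```

Because the paired test works on per-run differences, it removes run-to-run variance shared by both models, which is why even small accuracy gaps can reach significance.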
Figure 10 reports CIs for accuracy and F1-score across runs. Att-MobileNetV2 attained the highest mean accuracy and F1-score, with MobileNetV2-TL close behind (see Table 10). Limited overlap between the intervals indicates a stable advantage for the attention-enhanced configuration. Paired tests against the strongest baseline (EfficientNet-B0) yielded p < 0.05 for both accuracy and F1-score, confirming statistical significance. The stacked ensemble exhibited the narrowest CI for accuracy, which reflects low run-to-run variability.
Fig. 10.
Accuracy and F1-score with 95% confidence intervals across five runs.
Table 10.
Test accuracy and F1-score reported as mean ± 95% CI, with paired t-test p-values for model comparisons.
| Model | Accuracy (%) | F1-score (%) | p-value |
|---|---|---|---|
| Att-MobileNetV2 | 99.61 | 99.70 | – |
| MobileNetV2-TL | 98.42 | 98.43 | 0.015 |
| EfficientNet-B0 | | | 0.008 |
| DenseNet-121 | | | 0.010 |
| ViT (Tiny/16) | | | 0.012 |
Taken together, the narrow confidence intervals and the low p-values indicate that the proposed architectures deliver statistically reliable gains in accuracy and F1-score relative to compact CNN and Transformer baselines, and that the improvements are consistent across repeated runs.
Conclusion
The critical need for prompt and reliable forest fire detection is underscored by the immense environmental and economic ramifications of recent large-scale wildfire incidents. As demonstrated by events such as the 2016 Fort McMurray wildfire in Canada and the 2019–2020 Australian bushfires, where losses reached billions of dollars, traditional monitoring systems often suffer from delayed responses and high computational demands. This paper introduced and evaluated two lightweight deep learning architectures tailored for binary fire classification. The first, Att-MobileNetV2, incorporates a Convolutional Block Attention Module (CBAM) to enhance spatial and channel feature discrimination, achieving 99.61% accuracy on a carefully curated dataset. The second, MobileNetV2-TL, employs transfer learning within a compact structure, yielding 98.42% accuracy, 98.43% F1-score, and 99.47% recall, and is thus well-suited for use on resource-constrained devices.
The experimental results were based on a balanced set of 5,121 annotated images spanning diverse environments and lighting conditions, thereby ensuring robust generalization. Unlike prior studies, this work systematically integrates CBAM into a lightweight MobileNetV2 framework and fine-tunes attention parameters specifically for fire imagery. This approach achieves accuracy improvements without excessive computational overhead. The breadth of the dataset enhances operational relevance, and the measured latency and parameter efficiency suggest a feasible path toward real-time deployment on constrained hardware. These findings align with recent literature indicating that combinations of MobileNetV2, CBAM, and transfer learning can achieve competitive accuracy with modest computational requirements across a range of visual tasks, including safety-critical detection scenarios35–40.
Certain limitations remain. The approach depends on labeled data and may produce false positives in dynamic or industrial scenes. The use of publicly available social media and YouTube imagery may introduce contextual bias, with possible underrepresentation of rural settings or low-visibility conditions. Planned work includes expanding the range of fire types, adopting more advanced feature extractors, and incorporating real-time processing on embedded hardware, together with profiling of computational efficiency and energy consumption to assess suitability for resource-constrained deployment. Future evaluations will consider diverse climatic regions (arid, temperate, boreal) and multiple sensing modalities (RGB, thermal infrared, multispectral) across platforms such as UAVs, fixed surveillance, and edge devices to assess cross-region and cross-sensor generalizability41,42. The methodology will also be examined on other high-impact classification tasks, including oral squamous cell carcinoma (OSCC)7,43, breast cancer18, brain tumors29,44, respiratory diseases34, and heart disease prediction45, to further evaluate transferability.
Ethical and environmental considerations
AI-enabled fire monitoring can reduce ecological loss, air pollution, and carbon emissions by enabling earlier intervention. At the same time, deployments should comply with data protection regulations and restrict observation to public or consented spaces to protect privacy. The datasets used comprise publicly accessible imagery without human-identifiable content, consistent with ethical research practice. Future work will consider privacy-preserving methods and energy-efficient inference to limit computational and environmental footprints.
Acknowledgements
This work was supported by the Deanship of Scientific Research, Vice Presidency for Graduate Studies and Scientific Research, King Faisal University, Saudi Arabia Grant No. KFU253607.
Author contributions
I.U.H. designed the methodology. I.U.H. implemented the models and conducted experiments; G.H. performed data curation, augmentation, and visualization. I.U.H. and G.H. carried out formal analysis and validation. I.U.H. drafted the manuscript; A.I. and A.S.A. provided critical review and editing. A.I. and A.S.A. supervised the project and provided resources. All authors read and approved the final manuscript.
Funding
This work was supported by the Deanship of Scientific Research, Vice Presidency for Graduate Studies and Scientific Research, King Faisal University, Saudi Arabia Grant No. KFU253607.
Data availability
The curated dataset used in this study is composed of publicly available samples obtained from established open-source wildfire datasets, i.e., the Wildfire Dataset19, the BoWFire Dataset20, and the Forest Fire Image Classification Dataset21. The complete train/validation/test structure, annotation files, metadata, and deterministic MATLAB scripts for dataset splitting and annotation generation are available to researchers upon request to the corresponding author via a restricted-access Kaggle repository: https://kaggle.com/datasets/152606ab9388a5e7cae6a575fcf087de1e51da27c5f05e74bc80467837ece400.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Ihtisham Ul Haq, Email: ihtisham1022@gmail.com.
Abid Iqbal, Email: aaiqbal@kfu.edu.sa.
References
- 1.Zhang, Q., Zhou, Z. & Lin, S. To ensure the safety of storage: enhancing accuracy of fire detection in warehouses with deep learning models. Process Saf. Environ. Prot.190, 729–743 (2024). [Google Scholar]
- 2.Parekh, R., Jadhav, V. S., Ghosh, S. & Padole, R. V. Applications of artificial intelligence in enhancing building fire safety. Int. J. Sci. Res. Archive13(1), 1117–1132 (2024). [Google Scholar]
- 3.Statistics Canada. Fort McMurray 2016 Wildfire. The Daily (2017). https://www150.statcan.gc.ca/n1/pub/11-627-m/11-627-m2017007-eng.pdf.
- 4.FXCM: The Financial Impact of the 2019–20 Australian Bushfires. FXCM Insights (2020). https://www.fxcm.com/au/insights/financial-impact-of-2019-20-australian-bushfires/.
- 5.The Guardian. Los Angeles wildfires: live updates (2025). https://www.theguardian.com/us-news/live/2025/jan/10/la-fires-live-updates-california-los-angeles-wildfires-fire-map-latest-news.
- 6.Suneetha, G. & Haripriya, D. An enhanced deep learning integrated blockchain framework for securing industrial iot. Peer-to-Peer Netw. Appl.18(1), 1–20 (2025). [Google Scholar]
- 7.Haq, I. U., Ahmed, M., Assam, M., Ghadi, Y. Y. & Algarni, A. Unveiling the future of oral squamous cell carcinoma diagnosis: an innovative hybrid ai approach for accurate histopathological image analysis. IEEe Access11, 118281–118290 (2023). [Google Scholar]
- 8.Tariq, S., Mehmood, F., Iqbal, A., Memon, M. N. & Usama, M. Deep-ai soft sensor for sustainable health risk monitoring and control of fine particulate matter at sensor devoid underground spaces: a zero-shot transfer learning approach. Tunn. Undergr. Space Technol.131, 104843 (2023). [Google Scholar]
- 9.Cheknane, M., Bendouma, T. & Boudouh, S. S. Advancing fire detection: two-stage deep learning with hybrid feature extraction using faster r-cnn approach. SIViP18(6), 5503–5510 (2024). [Google Scholar]
- 10.Sathishkumar, V. E., Cho, J., Subramanian, M. & Naren, O. S. Forest fire and smoke detection using deep learning-based learning without forgetting. Fire Ecol.19(1), 9 (2023). [Google Scholar]
- 11.Ciprián-Sánchez, J. F., Ochoa-Ruiz, G., Gonzalez-Mendoza, M. & Rossi, L. Fire-gan: a novel deep learning-based infrared-visible fusion method for wildfire imagery. Neural Comput. Appl.35(25), 18201–18213 (2023). [Google Scholar]
- 12.Guan, Y., Dai, H., Yu, Z., Wang, S., Gu, Y. Registration-free hybrid learning empowers simple multimodal imaging system for high-quality fusion detection. arXiv preprint arXiv:2307.03425 (2023).
- 13.Almeida Pereira, G. H., Fusioka, A. M., Nassu, B. T. & Minetto, R. Active fire detection in landsat-8 imagery: a large-scale dataset and a deep-learning study. ISPRS J. Photogramm. Remote. Sens.178, 171–186 (2021). [Google Scholar]
- 14.Zhao, L. et al. Learning class-specific spectral patterns to improve deep learning-based scene-level fire smoke detection from multi-spectral satellite imagery. Remote Sens. Appl. Soc. Env.34, 101152 (2024). [Google Scholar]
- 15.Davis, M. & Shekaramiz, M. Desert/forest fire detection using machine/deep learning techniques. Fire6(11), 418 (2023). [Google Scholar]
- 16.Sandler, M., Howard, A., Zhu, M., Zhmoginov, A. & Chen, L.-C. Mobilenetv2: inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 4510–4520 (2018). 10.1109/CVPR.2018.00474.
- 17.Woo, S., Park, J., Lee, J.-Y. & Kweon, I. S. Cbam: convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV) 3–19 (2018). 10.1007/978-3-030-01234-2_1.
- 18.Anas, M., Haq, I. U., Husnain, G. & Jaffery, S. A. F. Advancing breast cancer detection: enhancing yolov5 network for accurate classification in mammogram images. IEEE Access12, 16474–16488 (2024). [Google Scholar]
- 19.El-Madafri, I., Peña, M. & Olmedo-Torre, N. The wildfire dataset: enhancing deep learning-based forest fire detection with a diverse evolving open-source dataset focused on data representativeness and a novel multi-task learning approach. Forests14(9), 1697. 10.3390/f14091697 (2023). [Google Scholar]
- 20.Chino, D. Y. et al. BoWFire dataset. https://bitbucket.org/gbdi/bowfire-dataset/downloads/ (2015).
- 21.Naren, O. S. Forest Fire Image Classification Dataset. Kaggle (2022). 10.34740/KAGGLE/DSV/3135325.
- 22.Tan, M. & Le, Q. V. Efficientnet: rethinking model scaling for convolutional neural networks. In Proceedings of the 36th International Conference on Machine Learning (ICML) 6105–6114 (PMLR, 2019). https://proceedings.mlr.press/v97/tan19a.html.
- 23.Huang, G., Liu, Z., Van Der Maaten, L. & Weinberger, K. Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 4700–4708 (2017). 10.1109/CVPR.2017.243.
- 24.Dosovitskiy, A. et al. An image is worth 16x16 words: transformers for image recognition at scale. In International Conference on Learning Representations (ICLR) (2021). https://openreview.net/forum?id=YicbFdNTTy.
- 25.Zoph, B., Vasudevan, V., Shlens, J. & Le, Q. V. Learning transferable architectures for scalable image recognition. In Proc. IEEE Conf. Computer Vision and Pattern Recognition 8697–8710 (2018).
- 26.Srivastava, H. & Sarawadekar, K. A depthwise separable convolution architecture for cnn accelerator. In Proc. IEEE Applied Signal Processing Conference (ASPCON) 1–5 (2020).
- 27.Yu, X., Yu, Z. & Ramalingam, S. Learning strict identity mappings in deep residual networks. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 4432–4440 (2018).
- 28.Alik05: Forest Fire Dataset. Kaggle. Balanced Fire/No-Fire image dataset (1,900 images) (2021). https://www.kaggle.com/datasets/alik05/forest-fire-dataset.
- 29.Haq, I. U., Anas, M., Khan, H. A. & Khan, Z. A. Harnessing advanced ai techniques: an in-depth analysis of machine learning models for improved diabetes prediction. In International Conference on Science, Engineering Management and Information Technology 141–158 (Springer, 2023).
- 30.Sathishkumar, V. E., Subramaniyaswamy, N. & Vijayakumar, V. Forest fire and smoke detection using deep learning-based learning without forgetting. Fire Ecol.19(1), 9 (2023). [Google Scholar]
- 31.Mahmoud, M. A. I. & Ren, H. Forest fire detection and identification using image processing and svm. J. Inf. Process. Syst.15(1), 159–168 (2019). [Google Scholar]
- 32.Guede-Fernández, F., Gómez, P. C. & Aguilar, A. J. A deep learning based object identification system for forest fire detection. Fire4(4), 75 (2021). [Google Scholar]
- 33.Zhang, L., Zhang, X. & Li, J. A forest fire recognition method using uav images based on transfer learning. Forests13(7), 975 (2022). [Google Scholar]
- 34.Haq, I. U., Ahmad, M. & Khan, H. A. Enhanced respiratory tract auscultation audio signal classification technique employing lstm and rnn. In 2023 7th International Multi-Topic ICT Conference (IMTIC) 1–6 (IEEE, 2023).
- 35.Rokhva, S., Teimourpour, B. & Soltani, A. H. Computer vision in the food industry: accurate, real-time, and automatic food recognition with pretrained mobilenetv2. Food Human.3, 100378 (2024). [Google Scholar]
- 36.Rokhva, S. & Teimourpour, B. Accurate & real-time food classification through the synergistic integration of efficientnetb7, cbam, transfer learning, and data augmentation. Food Human.4, 10037100492 (2025). [Google Scholar]
- 37.Raeisi, Z., Rokhva, S., Roshanzamir, A. & Lashaki, R. An accurate attention based method for multi-tasking x-ray classification. Multimedia Tools Appl.2025, 1–25 (2025). [Google Scholar]
- 38.Hassan, E., Saber, A., Abd El-Hafeez, T., Medhat, T. & Shams, M. Y. Enhanced dysarthria detection in cerebral palsy and als patients using wavenet and cnn-bilstm models: a comparative study with model interpretability. Biomed. Signal Process. Control110, 108128 (2025). [Google Scholar]
- 39.Hassan, E., Saber, A., El-kenawy, E.-S. M., Bhatnagar, R. & Shams, M. Y. Early detection of black fungus using deep learning models for efficient medical diagnosis. In 2024 International Conference on Emerging Techniques in Computational Intelligence (ICETCI) 426–431 (IEEE, 2024).
- 40.Hassan, E., Saber, A., El-Sappagh, S. & El-Rashidy, N. Optimized ensemble deep learning approach for accurate breast cancer diagnosis using transfer learning and grey wolf optimization. Evol. Syst.16(2), 59 (2025). [Google Scholar]
- 41.Nabavi, B. et al. High-temperature strong nonreciprocal thermal radiation from semiconductors. ACS Photonics12(5), 2767–2774 (2025). [Google Scholar]
- 42.Zare, S., Raeisi, Z., Lashaki, R. A., Makki, M. & Ghasemi, N. Evaluation of a thermoacoustic stirling oscillator using a describing function and a genetic algorithm. Appl. Therm. Eng.2025, 127125 (2025). [Google Scholar]
- 43.Ahmad, M. et al. Multi-method analysis of histopathological image for early diagnosis of oral squamous cell carcinoma using deep learning and hybrid techniques. Cancers15(21), 5247 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.Haq, I., Anwar, S. & Hasnain, G. A combined approach for multiclass brain tumor detection and classification. PakJET5(1), 83–88 (2022). [Google Scholar]
- 45.Alkahtani, H. K. et al. Precision diagnosis: an automated method for detecting congenital heart diseases in children from phonocardiogram signals employing deep neural network. IEEE Access12, 76053–76064 (2024). [Google Scholar]