Abstract
Forest fires pose a significant threat to public safety and the environment, and harmful pollutants spread rapidly across vegetated areas. Early detection is essential to prevent forest fires from developing into catastrophic events. Traditional prediction methods have relatively low accuracy and can only identify fires clearly after they break out, which falls short of the requirements of precise real-time detection. The YOLOv5-PSG model proposed in this paper improves on the YOLOv5 model. After 300 rounds of training, the mean average precision (mAP) reaches 93.1%, the precision reaches approximately 0.802, and the recognition confidence reaches about 0.965. These improvements make fire early warning and prediction more comprehensive and effective, ultimately protecting human life and the environment by mitigating the impact of wildfires.
1 Introduction
Fires have a significant impact on the daily lives of people. In 2023, global wildfires burned 384 million hectares (Mha), which is the highest recorded since 2017. These fires emitted an estimated 2524 × 1012 grams of carbon (Tg C) [1]. It is evident that fire not only endangers humans and animals and results in considerable economic losses to society but also affects the development of the ecological environment and social stability. This issue is of great significance. To protect people’s safety and property, it is important to identify fire sources in the early stages of a fire [2] and to prevent the spread of those sources.
In the past, fire source detection was generally implemented using detectors that sense smoke, temperature, and light sources [3]. Such methods can only detect the degree of fire development [4] and cannot identify or provide feedback during the early stages of a fire. Moreover, they have a certain false alarm rate and are easily influenced by environmental factors such as dust, smoke, water fog, high-temperature weather, and other forms of interference. This interference can lead to false alarms, necessitating regular checks by a certain amount of manpower [5].
With the continuous maturation and promotion of artificial intelligence technology, mature AI algorithms have been applied across industries. Kemal Akyol et al. developed an innovative artificial intelligence model for detecting forest fires from landscape images [6]. That study proposes a hybrid approach that fuses Transformer-based deep features extracted from architectures such as BEiT, ViTHybrid [7], and SwinV2 with deep neural networks (DNNs) for classification [8]. The research aims to meet the demand for accurate and rapid fire detection in order to minimize the damage caused by forest fires.
Mounia Aarich et al. focused on forest fire detection and prediction using deep learning methods with the aid of satellite images [9]. The paper first introduces a variety of deep learning methods commonly used for forest fire detection, such as Convolutional Neural Networks (CNNs), U-Net, and InceptionV3 [10–12]. It then surveys the satellite image datasets commonly used in this field, such as Sentinel-2, Landsat-8, and MODIS, which have played an important role in forest fire detection. Finally, based on existing research results, it compares the performance of different deep learning methods in forest fire and smoke detection and prediction. By comprehensively analyzing the advantages and disadvantages of various methods and datasets, the paper provides a reference for subsequent research in selecting appropriate satellite images, constructing robust datasets, and improving deep learning methods.
Azlan Saleh et al. studied deep learning-based forest fire monitoring systems [13]. They analyzed 37 papers from 2018–2023, covering data types, augmentation methods, and model architectures. The research divided deep learning applications into five subfields and evaluated models using multiple indicators. Most models showed over 90% accuracy. The paper pointed out issues like scarce training data and dataset imbalance, and suggested optimizing hyperparameters, integrating multimodal satellite data, using generative augmentation, and refining model architectures to improve detection systems.
Ting Yun, Jian Li et al. systematically reviewed the current status, applications and prospects of deep learning in forest research, covering deep learning methods for various types of forest remote sensing data (such as images, point clouds, and fused data). The methods were classified by data processing approach and operating principle, and their diverse applications in forestry were categorized with examples [14]. The study also summarized the main forest-related datasets and analyzed the global geographical distribution of the relevant literature over the past five years. Meanwhile, the advantages and limitations of existing deep learning technologies in forest research were comprehensively reviewed, and future directions for applying deep learning in fields such as forest phenotype analysis, carbon storage estimation, and digital twin creation were discussed, providing a comprehensive reference for scholars and industry experts in related fields.
Among algorithms based on “You Only Look Once” (YOLO), Shikuan Wang et al. proposed the innovative FireSmoke-YOLO algorithm for the detection of complex fire scenarios and small, scattered targets [15]. Built on the YOLO framework, the study optimizes feature extraction and fusion by introducing a Funnel Spatial Pyramid Pooling-Fast layer (FSPPF), a small object detection layer and dynamic serpentine convolution (DSC), and constructs a comprehensive dataset of 11,300 multi-scene images to improve the generalization of the model. Experiments show that the model reaches 81.4% mAP50 and 59% mAP50-95 [16]. It combines high precision with a lightweight footprint (56.1 MB), shows strong generalization on satellite remote sensing data, provides an efficient solution for fire monitoring, and has wide application potential in fire protection and environmental protection.
The structure of this article is as follows: Section 2 introduces our YOLOv5-PSG algorithm, explaining the improvements to its network structure as well as the design of the loss function and the training strategy. Sections 2.2 to 2.4 cover the construction, annotation and preprocessing of the dataset, as well as the evaluation metrics. Section 3 examines the influence of the number of training rounds on the results; Sections 3.2 to 3.4 analyze the experimental results, present ablation experiments on our model, and compare it with classic YOLO-series algorithms, analyzing the key differences between them. Finally, Section 4 presents the conclusion, summarizes the key points of this paper, and gives a summative assessment of the potential and limitations of YOLO in fire detection.
2 Materials and methods
This paper first optimizes the sampling branch for small targets. Small targets usually occupy a very small pixel area in an image and are easily overwhelmed by the large-scale features of high-level feature maps. By adding an additional detection head on a shallow feature map, more local information can be utilized, improving the detection accuracy of small targets. This detection head operates on a finer (higher-resolution) feature map in order to better capture small targets. The framework of the YOLOv5-PSG model proposed in this paper is shown in Fig 1. In the part highlighted by the yellow box, within the Neck of the basic YOLOv5, a new Conv-Upsample-Concat-C3 sequence is added after the original 18th-layer head output for small-target branch sampling. A Conv-Concat-C3 structure then performs the convolution of the small-target branch, and an additional detection head output is added.
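As an illustration, the added branch can be sketched in PyTorch roughly as follows. The `ConvBlock`, the stand-in for the C3 module, and all channel sizes are simplifications for exposition, not the exact YOLOv5-PSG configuration:

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Conv + BatchNorm + SiLU, the basic YOLOv5 building block."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()
    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class SmallTargetBranch(nn.Module):
    """Conv -> Upsample -> Concat -> fuse: lift a deep Neck feature map to a
    shallower, higher-resolution map and fuse the two for small targets.
    The fuse step stands in for the C3 block."""
    def __init__(self, c_deep, c_shallow, c_out):
        super().__init__()
        self.reduce = ConvBlock(c_deep, c_out)                  # Conv
        self.up = nn.Upsample(scale_factor=2, mode="nearest")   # Upsample
        self.fuse = ConvBlock(c_out + c_shallow, c_out, k=3)    # C3 stand-in

    def forward(self, deep, shallow):
        x = self.up(self.reduce(deep))        # match the shallow resolution
        x = torch.cat([x, shallow], dim=1)    # Concat with the shallow map
        return self.fuse(x)                   # fused high-resolution feature

# Example: fuse a 40x40 Neck map with an 80x80 backbone map.
branch = SmallTargetBranch(c_deep=256, c_shallow=128, c_out=128)
out = branch(torch.randn(1, 256, 40, 40), torch.randn(1, 128, 80, 80))
print(out.shape)  # torch.Size([1, 128, 80, 80])
```

The extra detection head would then be attached to this 80x80 output, where small fire sources occupy proportionally more cells.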
Fig 1. YOLOv5-PSG network structure diagram.
In the task of fire target detection, the loss function is used to measure the difference between the model prediction results and the real labels. The loss function is generally composed of classification loss, regression loss and confidence loss. Suppose the model has N prediction boxes and C categories. The loss function of the improved YOLO model can be expressed as Formula (1):
\[ L = \lambda_{1} L_{\mathrm{cls}} + \lambda_{2} L_{\mathrm{reg}} + \lambda_{3} L_{\mathrm{conf}} \tag{1} \]
Among them, $\lambda_{1}$, $\lambda_{2}$ and $\lambda_{3}$ are weight coefficients, used respectively to balance the contributions of the classification loss, regression loss, and confidence loss.
Classification loss usually adopts the cross-entropy loss. For each prediction box, the cross-entropy between the predicted category probability distribution and the true category label is calculated. The classification loss function is shown in Formula (2).
\[ L_{\mathrm{cls}} = -\sum_{i=1}^{N}\sum_{c=1}^{C} y_{i,c} \log\left(p_{i,c}\right) \tag{2} \]
Among them, $y_{i,c}$ is the true label (0 or 1) of category $c$ in the $i$-th prediction box, and $p_{i,c}$ is the predicted probability of category $c$ in the $i$-th prediction box.
The regression loss $L_{\mathrm{reg}}$ measures the degree of overlap between the predicted box and the true box. This model adopts the CIoU loss function, as shown in Formula (3):
\[ L_{\mathrm{reg}} = \sum_{i=1}^{N} \left( 1 - \mathrm{CIoU}(b_i, \hat{b}_i) \right) \tag{3} \]
Among them, $b_i$ denotes the coordinates of the $i$-th true bounding box (usually the center coordinates $(x, y)$ and the width and height $(w, h)$), $\hat{b}_i$ denotes the coordinates of the $i$-th predicted bounding box, and CIoU is the complete intersection-over-union ratio. The calculation formula of CIoU is Formula (4).
\[ \mathrm{CIoU} = \mathrm{IoU} - \frac{\rho^{2}(b, \hat{b})}{c^{2}} - \alpha v \tag{4} \]
Among them, IoU is the intersection-over-union ratio, $\rho^{2}(b, \hat{b})$ is the squared distance between the center points of the predicted box and the true box, $c$ is the diagonal length of the smallest enclosing rectangle containing the predicted box and the true box, $\alpha$ is the weight coefficient, and $v$ is the parameter measuring the consistency of the aspect ratio.
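For concreteness, Formula (4) can be computed as follows. This is an illustrative implementation for axis-aligned boxes in `(x1, y1, x2, y2)` form, not the exact code used inside YOLOv5:

```python
import math

def ciou(box_p, box_t):
    """Complete IoU between two boxes, per Formula (4):
    CIoU = IoU - rho^2 / c^2 - alpha * v."""
    # Intersection area and IoU
    ix1, iy1 = max(box_p[0], box_t[0]), max(box_p[1], box_t[1])
    ix2, iy2 = min(box_p[2], box_t[2]), min(box_p[3], box_t[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_t = (box_t[2] - box_t[0]) * (box_t[3] - box_t[1])
    iou = inter / (area_p + area_t - inter + 1e-9)

    # rho^2: squared distance between the two box centres
    cxp, cyp = (box_p[0] + box_p[2]) / 2, (box_p[1] + box_p[3]) / 2
    cxt, cyt = (box_t[0] + box_t[2]) / 2, (box_t[1] + box_t[3]) / 2
    rho2 = (cxp - cxt) ** 2 + (cyp - cyt) ** 2

    # c^2: squared diagonal of the smallest enclosing rectangle
    ex1, ey1 = min(box_p[0], box_t[0]), min(box_p[1], box_t[1])
    ex2, ey2 = max(box_p[2], box_t[2]), max(box_p[3], box_t[3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + 1e-9

    # v: aspect-ratio consistency term; alpha: its trade-off weight
    wp, hp = box_p[2] - box_p[0], box_p[3] - box_p[1]
    wt, ht = box_t[2] - box_t[0], box_t[3] - box_t[1]
    v = (4 / math.pi ** 2) * (math.atan(wt / ht) - math.atan(wp / hp)) ** 2
    alpha = v / (1 - iou + v + 1e-9)
    return iou - rho2 / c2 - alpha * v

# Identical boxes -> CIoU ≈ 1, so the regression loss 1 - CIoU -> 0
print(ciou((0, 0, 10, 10), (0, 0, 10, 10)))  # ≈ 1.0
```

Unlike plain IoU, CIoU stays informative for non-overlapping boxes: the centre-distance term keeps pulling the predicted box toward the target.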
In the confidence loss $L_{\mathrm{conf}}$, the confidence represents the probability that a target exists in the prediction box. The binary cross-entropy loss is also adopted, as in Formula (5):
\[ L_{\mathrm{conf}} = -\sum_{i=1}^{N} \left[ o_i \log(\hat{o}_i) + (1 - o_i)\log(1 - \hat{o}_i) \right] \tag{5} \]
Among them, $o_i$ is the true label (0 or 1) indicating whether a target exists in the $i$-th prediction box, and $\hat{o}_i$ is the predicted probability that a target exists in the $i$-th prediction box.
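Putting Formulas (1)–(5) together, a minimal sketch of the weighted total loss might look like this. The weight values are illustrative placeholders, not the coefficients used in training:

```python
import math

def bce(y, p, eps=1e-9):
    """Binary cross-entropy for a single prediction (used by Lcls and Lconf)."""
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))

def total_loss(cls_terms, ciou_values, conf_terms, l1=0.5, l2=0.05, l3=1.0):
    """Formula (1): L = l1*Lcls + l2*Lreg + l3*Lconf.

    cls_terms:  list of (true_label, predicted_prob) pairs, Formula (2)
    ciou_values: list of CIoU scores for matched boxes, Formula (3)
    conf_terms: list of (objectness_label, predicted_conf) pairs, Formula (5)
    """
    L_cls = sum(bce(y, p) for y, p in cls_terms)
    L_reg = sum(1 - c for c in ciou_values)   # CIoU regression loss
    L_conf = sum(bce(o, q) for o, q in conf_terms)
    return l1 * L_cls + l2 * L_reg + l3 * L_conf

# Perfect predictions drive every component (and the total) to zero.
print(total_loss([(1, 1.0)], [1.0], [(1, 1.0)]))  # ≈ 0.0
```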
When deep models are used to recognize images, local information is usually extracted through convolution kernels. However, different pieces of local information contribute differently to whether an image is recognized correctly. To let the model learn the importance of different local information in the image, we introduce an attention mechanism. In the YOLOv5-PSG model proposed in this paper, as marked by the red ellipses in Fig 1, a GAM_Attention module is added before each detection head.
The output of GAM_Attention is the product of the original feature and the attention weight, as shown in Formula (6):
\[ F_{\mathrm{out}} = F \odot M_c \odot M_s \tag{6} \]
Here, $\odot$ represents element-wise multiplication (the Hadamard product, i.e., the product of elements at corresponding positions); $F$ represents the original feature map input into the GAM_Attention module, i.e., the feature information extracted by the model before this module; $M_c$ is the channel attention weight matrix output by the channel attention module, obtained through operations such as global average pooling, linear transformation, and activation on the input feature map $F$, reflecting the importance of different channels; $M_s$ is the spatial attention weight matrix output by the spatial attention module, obtained through operations such as convolution and activation on the input feature map $F$, reflecting the importance of different spatial positions of the feature map; and $F_{\mathrm{out}}$ is the output feature map of the GAM_Attention module, i.e., the original feature map $F$ weighted by the channel attention weight $M_c$ and the spatial attention weight $M_s$.
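A simplified GAM-style module implementing Formula (6) could be sketched as below. The exact layer choices inside $M_c$ and $M_s$ (pooling, reduction ratio, kernel sizes) are assumptions and differ from the published GAM implementation:

```python
import torch
import torch.nn as nn

class GAMAttention(nn.Module):
    """Sketch of Formula (6): F_out = F * Mc(F) * Ms(F)."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        # Channel attention Mc: global average pooling + 2-layer bottleneck MLP
        self.mc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        # Spatial attention Ms: convolutions ending in a single attention map
        self.ms = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 7, padding=3),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, 1, 7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # Hadamard products; Mc broadcasts over H,W and Ms over channels
        return x * self.mc(x) * self.ms(x)

gam = GAMAttention(64)
y = gam(torch.randn(2, 64, 32, 32))
print(y.shape)  # torch.Size([2, 64, 32, 32])
```

Because the output shape matches the input, the module can be dropped in front of any detection head without touching the rest of the network.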
During the training process, the total loss function of the model is the sum of the cross-entropy loss and the regularization term, as shown in Formula (7):
\[ L_{\mathrm{total}} = L_{\mathrm{task}}\left(f_{\mathrm{GAM}}(x), y\right) + \lambda R(\theta) \tag{7} \]
Among them, $L_{\mathrm{task}}$ is the task loss function, here the binary cross-entropy loss; $f_{\mathrm{GAM}}$ is the model with GAM_Attention; $y$ is the true label; $R(\theta)$ is the regularization term; and $\lambda$ is the hyperparameter controlling the regularization strength.
2.1 Experimental plan
Compared with YOLOv3, YOLOv5 [17] features a more optimised network structure, fewer parameters, a more lightweight model, faster detection speed and a more flexible recognition mode, allowing it to identify many images in a short time. In addition, YOLOv5 uses the PyTorch framework [18] combined with artificial intelligence vision technology [19], making it very convenient for users to train their own datasets. Therefore, YOLOv5 has been chosen as the basic framework for detecting and identifying high-altitude forest fire sources in this project. The specific system design is shown in Fig 2.
Fig 2. System design drawing.
2.2 Description of the experimental dataset
2.2.1 Overall overview of the experiment.
In the early stage of the experiment, 2,400 images of smoke from high-altitude forest fire sources and 600 images of candlelight were collected for comparison. The preliminary results show a recognized mAP50 of 0.675. The prediction effect for fire source smoke was poor: most smoke could not be accurately identified, the environment in most fire source images is relatively complex and blurred, and the model trained on the small dataset performed unsatisfactorily. To improve the experimental effect, we expanded the dataset with 18,585 additional images of high-altitude forest smoke, bringing the total number of images to 21,585 [20]. We used a segmentation script to randomly extract one-tenth of the images from the 21,585-image dataset and conducted comparisons of 50, 100, 200, 300, 400 and 500 rounds of training. This work selects 300 as the optimal number of training rounds, optimizes the training model, and compares and analyzes the training graph results under the run directory. We studied and optimized the number of training rounds to determine whether the quality of the weight files could be improved, thereby enhancing the confidence level of image recognition [21].
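The random one-tenth hold-out described above can be sketched as follows; the paper's actual segmentation script is not reproduced, so this is a hypothetical equivalent:

```python
import random

def split_validation(image_paths, frac=0.1, seed=0):
    """Randomly hold out a fraction of the dataset for validation,
    as done here with one-tenth of the 21,585 images."""
    rng = random.Random(seed)        # fixed seed for a reproducible split
    paths = list(image_paths)
    rng.shuffle(paths)
    n_val = int(len(paths) * frac)
    return paths[n_val:], paths[:n_val]   # (train, validation)

train, val = split_validation([f"img_{i}.jpg" for i in range(21585)])
print(len(train), len(val))  # 19427 2158
```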
2.2.2 Collection of datasets.
Finally, 21,585 images of high-altitude forest fire source smoke and horizontal fire source smoke, collected from Baidu Images and the Kaggle data platform and numbered in ascending order, serve as the dataset for this experiment, as shown in Fig 3.
Fig 3. Fire source smoke dataset.
Additionally, 600 candle images were collected to analyze the firelight, as shown in Fig 4. To improve the effectiveness of model training, the first 600 images in the dataset are candlelight images, whereas the remaining 20,985 images comprise high-altitude fire source smoke images and horizontal fire source smoke images.
Fig 4. Candlelight dataset.
2.2.3 Dataset annotation.
Open-source software was used to draw and name the rectangular bounding boxes on the pictures, labelling each one individually, as shown in Fig 5.
Fig 5. Target annotation page.
2.3 Training environment
The hardware platform runs Windows 11 Professional Edition with an Intel(R) Core(TM) i9-10900K CPU @ 3.70 GHz, 64 GB of RAM, and a GeForce RTX 3090 with 24 GB of video memory. The coding platform is PyCharm with Python 3.11. The main deep learning frameworks are PyTorch and CUDA 12.4.
The total number of iterations for network training is 300 rounds. The initial learning rate is set at 0.01, and the final learning-rate factor is set at 0.01. The first three rounds of training adopt Warmup preheating, and subsequent training uses the cosine annealing algorithm to decay the rate smoothly.
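A sketch of this schedule, assuming the YOLOv5 convention in which the stated final value of 0.01 is a factor (`lrf`) applied to the initial rate rather than an absolute rate:

```python
import math

def lr_at(epoch, total=300, warmup=3, lr0=0.01, lrf=0.01):
    """Linear warmup for the first `warmup` epochs, then cosine annealing
    from lr0 down toward lr0 * lrf over the remaining epochs."""
    if epoch < warmup:
        return lr0 * (epoch + 1) / warmup          # warmup ramp
    t = (epoch - warmup) / max(1, total - warmup)  # progress in [0, 1]
    cos = 0.5 * (1 + math.cos(math.pi * t))        # 1 -> 0
    return lr0 * (lrf + (1 - lrf) * cos)           # lr0 -> lr0 * lrf

# End of warmup hits the full rate; the last epoch reaches lr0 * lrf.
print(lr_at(3), lr_at(300))  # 0.01 0.0001
```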
2.4 Evaluation index
In this paper, the performance of the algorithm is evaluated using Precision (P), Recall (R), F1 score, AP (Average Precision), mAP (mean Average Precision), and Frames Per Second (FPS). Precision measures the correctness of the algorithm over all samples predicted as fires. Recall represents the ability of the algorithm to correctly identify fire samples. The F1 score is the harmonic mean of precision and recall. FPS represents the number of images detected per second and is used to evaluate the detection speed of the algorithm. AP is the average precision of a single category, reflecting the robustness and generalization ability of the model, and mAP is the mean of the APs over all categories. The calculation formulas are shown in Equations (8)–(13).
\[ P = \frac{TP}{TP + FP} \tag{8} \]
\[ R = \frac{TP}{TP + FN} \tag{9} \]
\[ F1 = \frac{2 \times P \times R}{P + R} \tag{10} \]
\[ AP = \int_{0}^{1} P(R)\, dR \tag{11} \]
\[ mAP = \frac{1}{K} \sum_{k=1}^{K} AP_{k} \tag{12} \]
\[ FPS = \frac{N_{\mathrm{frames}}}{T} \tag{13} \]
TP represents the number of samples that are actually positive and correctly predicted as positive; FN represents samples that are actually positive but predicted as negative; FP represents samples where the negative class is wrongly predicted as positive; TN represents samples that are actually negative and correctly predicted as negative; and K represents the total number of classes evaluated by the model.
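The metric formulas above follow directly from the confusion-matrix counts; a minimal sketch with toy numbers (not the paper's results):

```python
def precision(tp, fp):
    """Formula (8): fraction of predicted fires that are real fires."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Formula (9): fraction of real fires that are detected."""
    return tp / (tp + fn)

def f1(p, r):
    """Formula (10): harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

def mean_ap(ap_per_class):
    """Formula (12): mean of the per-class average precisions."""
    return sum(ap_per_class) / len(ap_per_class)

# Toy counts for one class: 80 correct fire detections,
# 20 false alarms, 10 missed fires.
p, r = precision(80, 20), recall(80, 10)
print(round(p, 3), round(r, 3), round(f1(p, r), 3))  # 0.8 0.889 0.842
```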
3 Experiment and result analysis
3.1 Image visualisation renderings
The separated validation set and dataset were trained with the number of training rounds set at 50, then 100, 200, 300, 400 and 500 rounds successively. A comparison of the visualisation effects for smoke and candlelight from high-altitude forest fire sources is shown in Figs 6–8.
Fig 6. Visual comparison of candlelight.
Fig 7. Visual comparison of high-altitude forest fire sources in complex scenes.
Fig 8. Comparison of high-altitude smoke visualisations.
The comparison of the effects of different training rounds shown above reveals that the confidence of smoke detection [22] for high-altitude forest fire sources is highest when the number of training rounds reaches 300; with fewer or more rounds, the confidence of fire source smoke recognition decreases. For simple scenes such as candlelight, however, target recognition [23] already achieves high confidence with 50 or fewer training rounds. The following is an analysis of the training results for high-altitude forest fire source detection under the run directory. The category recognition of the images in the test set is generally accurate, as shown in the comparison between Figs 9–11, with confidence at or above 0.9 for clear images of the fire source.
Fig 9. Diagram of data enhancement using arbitrary scaling cuts [20].
Fig 10. Diagram of positioning picture categories [20].
Fig 11. Confidence graph for test pictures [20].
3.2 Analysis chart of training results
According to Fig 12, the figure consists of four parts. The first diagram shows the actual data amount for each category label, the second displays the labelled bounding boxes [24], the third indicates the actual coordinate values of the labelled centre points, and the fourth presents the widths and heights of the labelled boxes. We find that the total amount of data across the three target types in the figure is close to 50,000.
Fig 12. Labels diagram.
As shown in Fig 13, when the number of training rounds reaches 300, the confidence of all types of recognition is approximately 0.803, and higher confidence correlates with higher precision.
Fig 13. Precision_curve diagram.
Fig 14 is a two-dimensional colour matrix graph that illustrates the correlation between the centre point coordinates (x,y) in the predicted label and the width and height of the box. The last graph in each row represents the overall distribution.
Fig 14. Labels.correlogram.
3.3 Ablation experiment
In order to evaluate the influence of the proposed model strategy on detection performance, module ablation experiments were conducted to determine the optimal design choice. The second group in Table 1 introduces the small object detection layer. The time complexity of the algorithm increases slightly, which leads to a slight decrease in the model's P, but the modification improves small object detection, raising the mean Average Precision (mAP) by 0.9 percentage points. Therefore, although the increased computational load may raise time complexity, this impact can largely be ignored. In the task of detecting forest fires with small targets, the GAM attention mechanism is of crucial importance. Although its introduction causes a further decrease in P, it improves the detection accuracy of the model: by learning global importance, GAM fuses global information and enhances the feature expression of small target areas, making them more prominent in the feature map and thereby improving small-target detection. Overall, with the addition of the small object detection layer and GAM, the P of the model decreased by 13.9 percentage points, while the mAP increased by 1.4 percentage points.
Table 1. Results of the model ablation experiment.
| Experimental group (YOLOv5l) | Small target detection layer | GAM | mAP50 (%) | P (%) | R (%) | F1-Score (%) |
|---|---|---|---|---|---|---|
| 1 | | | 91.7 | 94.2 | 93.0 | 89.0 |
| 2 | √ | | 92.6 | 91.5 | 92.0 | 89.0 |
| 3 | √ | √ | 93.1 | 80.3 | 92.0 | 89.0 |
Table 1 presents the ablation experiment data of the algorithm in this paper.
3.4 Performance comparison of different models
By comparing the model against several object detection algorithms, as shown in Table 2, its performance in small-target forest fire detection was evaluated. mAP50 is a comprehensive indicator of detection accuracy across confidence thresholds, reflecting the overall detection ability of the model for multiple target types. YOLOv5-PSG led with 93.1% mAP50, an increase of 3.7 percentage points over the highest value of the YOLOv3 series (89.4% for YOLOv3-SPP), indicating a significant advantage in detection accuracy for complex targets. YOLOv5s/m/l all reached 91.7%; although slightly lower than PSG, this is significantly higher than YOLOv3 (88.1%) and YOLOv3-Tiny (80.1%), reflecting the optimization of the YOLOv5 infrastructure. The mAP50 of YOLOv3-Tiny is only 80.1%, mainly due to the insufficient feature extraction capacity of the lightweight model and its poor detection of small or fuzzy targets. YOLOv3-SPP enhances multi-scale feature fusion through Spatial Pyramid Pooling (SPP), reaching a mAP50 of 89.4%, but still lags behind the basic YOLOv5.
Table 2. Results of model comparison tests.
| Algorithm | Backbone network | mAP50 (%) | P (%) | R (%) | F1-Score (%) |
|---|---|---|---|---|---|
| YOLOv3 | DarkNet | 88.1 | 96.8 | 89.0 | 86.0 |
| YOLOv3-spp | DarkNet | 89.4 | 99.6 | 90.0 | 86.0 |
| YOLOv3-tiny | DarkNet | 80.1 | 91.0 | 89.0 | 76.0 |
| YOLOv5s | CSPDarkNet | 91.7 | 94.2 | 93.0 | 89.0 |
| YOLOv5m | CSPDarkNet | 91.7 | 94.2 | 93.0 | 89.0 |
| YOLOv5l | CSPDarkNet | 91.7 | 94.2 | 93.0 | 89.0 |
| Ours | CSPDarkNet | 93.1 | 80.3 | 92.0 | 89.0 |
Table 2 presents the comparative data of the algorithms used in this paper.
The trade-off between precision (P) and recall (R) shows that YOLOv3-SPP (P = 99.6%) has an extremely low false detection rate under strict confidence screening, but its recall is only 90.0%, posing a risk of missed detections. The recall of YOLOv5s/m/l reaches 93.0%, and that of YOLOv5-PSG is 92.0%, indicating a low missed-detection rate; even if a few false detections occur, they can be filtered by subsequent algorithms. The precision (96.8%) and recall (89.0%) of the traditional YOLOv3 are relatively balanced, but its overall performance is comprehensively surpassed by YOLOv5. The F1-Score is the harmonic mean of precision and recall, reflecting the overall balancing ability of the model. The YOLOv5 series (s/m/l/PSG) all reach 89.0%, indicating an optimal balance between precision and recall, particularly suitable for general detection tasks with high requirements for comprehensive performance. Although both YOLOv3-SPP and YOLOv3 have an F1-Score of 86.0%, the former relies on extremely high precision while sacrificing recall, whereas the latter is limited by insufficient recall. YOLOv3-Tiny reaches only 76.0%, a significant cost of model lightweighting.
In the YOLOv5-PSG model of this paper, the newly added small target detection layer pays more attention to small-scale features and can detect more small targets that were previously missed, improving recall and raising the recall-sensitive mAP50. However, it may conflict with the original detection layers in feature utilization, interfering with the detection of large and medium targets and thereby lowering overall precision. Similarly, although GAM_Attention can adaptively focus on image regions, assist in extracting small-target features, improve recall, and raise mAP50, it may over-focus on small-target features and neglect important detail features of large and medium targets, causing recognition errors and reduced precision. The main task of object detection in this paper is the preliminary prediction of forest and mountain fires by high-altitude unmanned aerial vehicles (UAVs), where effective detection of early fire images matters most. Although the reduced precision means many non-fire images are flagged as fires, the cost of these redundant judgments is far smaller than the potential losses caused by mountain fires.
4 Conclusions
This project conducted optimisation research on high-altitude forest fire source detection based on image recognition. The test results indicate that image recognition for high-altitude forest fire source detection can be optimised by improving the image quality of the datasets and gradually increasing the number of training rounds, ultimately selecting 300 as the optimal number of training rounds. This selection is necessary because of the limited data and resources available for high-altitude forest fires on the internet, which necessitates filtering the dataset; therefore, only relevant parts of fire source images are included. In real-life scenarios, high-altitude forest fire sources present challenges in identifying security risks. In the future, large, high-quality smoke datasets from high-altitude fire sources should be assembled, covering a variety of complex scenes with small high-altitude fire sources and diverse smoke conditions under different weather. Varying image sizes within the datasets will facilitate multiscale training, enhancing the testing capability of the modules and increasing recognition confidence through optimisation of the number of training rounds. Additionally, the robustness [25] and generalisation ability of the model [26] can be improved by rotating, translating and scaling the training datasets [27] while increasing the number of samples. High-altitude research based on image recognition is very important for advancing the field of fire protection, and the potential impact of breakthroughs in this area should not be underestimated.
Data Availability
The data used in this study are publicly available on the Roboflow platform. fire-and-smoke data: https://universe.roboflow.com/aamir-naseer-widhl/fire-and-smoke-2-gvl1q. The datasets comprise the image sets used for model training, validation, and evaluation. All relevant data are available without restrictions, ensuring reproducibility of the study.
Funding Statement
This research was supported by the Youth Science and Technology Talents Growth Project from Guizhou Provincial Department of Education(Grant number: QJJ[2022]302 to Y.L.); High-level Talent Introduction Project Guiyang University Talent Introduction Start-up Fund Research Project (Grant number: GYU-KY- [2026] to Y.L. and H.T.); National Training Program for Innovation and Entrepreneurship of Chinese College Students (Grant number: S2024109760196 to Y.L); Supported by Guizhou Provincial Basic Research Program (Natural Science) (Grant number: ZK[2023]011 to H.T.); Supported by Guizhou Provincial Basic Research Program (Natural Science) (Grant Number: ZK[2024]520 to H.T.); Youth Science and Technology Talents Growth Project from Guizhou Provincial Department of Education (Grant Number: QJJ[2024]194 to H.T.); National Training Program for Innovation and Entrepreneurship of Chinese College Students (Grant Number: S2024109760194 to H.T.).
References
- 1. Liu Y. Climate Change Science Dynamic Monitoring Bulletin. Lanzhou Literature and Information Center, Chinese Academy of Sciences. 2024.
- 2. Li C. Application analysis of target detection technology in "intelligent fire protection". China Fire Protection. 2023;S1:66–7, 70.
- 3. Zhang J, Peng D, Zhang C, et al. Forest fire prediction modeling based on deep learning in the Greater Hinggan Mountains of Inner Mongolia. Research in Forestry Sciences. 2024;37(01):31–40.
- 4. Deepa KR, Chaitra AS, Jhansi K, Anitha Kumari RD, Mallikarjun MK. Development of fire detection surveillance using machine learning & IoT. In: 2022 IEEE 2nd Mysore Sub Section International Conference (MysuruCon), 2022. 1–6.
- 5. Ganesan V, Ramasamy V, Manoj C, Tejaswi T. Contextual Emotional Classifier: An Advanced AI-Powered Emotional Health Ecosystem for Women Utilizing Edge Devices. TS. 2023;40(6):2481–94. doi: 10.18280/ts.400613
- 6. Akyol K. An innovative hybrid method utilizing fused transformer-based deep features and deep neural networks for detecting forest fires. Advances in Space Research. 2025;75(12):8583–98. doi: 10.1016/j.asr.2025.04.020
- 7. Han K, Wang Y, Chen H, Chen X, Guo J, Liu Z, et al. A Survey on Vision Transformer. IEEE Trans Pattern Anal Mach Intell. 2023;45(1):87–110. doi: 10.1109/TPAMI.2022.3152247
- 8. Cichy RM, Kaiser D. Deep Neural Networks as Scientific Models. Trends Cogn Sci. 2019;23(4):305–17. doi: 10.1016/j.tics.2019.01.009
- 9. Aarich M, Rouijel A, Amine A. Deep Learning Approaches for Forest Fires Detection and Prediction using Satellite Images. Procedia Computer Science. 2024;251:758–63. doi: 10.1016/j.procs.2024.11.181
- 10. Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. Commun ACM. 2017;60(6):84–90. doi: 10.1145/3065386
- 11. Ronneberger O, Fischer P, Brox T. U-Net: Convolutional networks for biomedical image segmentation. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015: 18th International Conference, Munich, Germany, October 5–9, 2015, Proceedings, Part III, 2015. 234–41.
- 12. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z. Rethinking the Inception Architecture for Computer Vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. 2818–26. doi: 10.1109/cvpr.2016.308
- 13. Saleh A, Zulkifley MA, Harun HH, Gaudreault F, Davison I, Spraggon M. Forest fire surveillance systems: A review of deep learning methods. Heliyon. 2023;10(1):e23127. doi: 10.1016/j.heliyon.2023.e23127
- 14. Yun T, Li J, Ma L, Zhou J, Wang R, Eichhorn MP, et al. Status, advancements and prospects of deep learning methods applied in forest studies. International Journal of Applied Earth Observation and Geoinformation. 2024;131:103938. doi: 10.1016/j.jag.2024.103938
- 15. Wang S, Wu M, Wei X, Song X, Wang Q, Jiang Y, et al. An advanced multi-source data fusion method utilizing deep learning techniques for fire detection. Engineering Applications of Artificial Intelligence. 2025;142:109902. doi: 10.1016/j.engappai.2024.109902
- 16. Ramos LT, Casas E, Romero C, Rivas-Echeverría F, Bendek E. A study of YOLO architectures for wildfire and smoke detection in ground and aerial imagery. Results in Engineering. 2025;26:104869. doi: 10.1016/j.rineng.2025.104869
- 17. Jocher G, Chaurasia A, Stoken A. YOLOv5 release v7.0. 2022. https://github.com/ultralytics/yolov5
- 18. Meng K. Hazard source identification technology for transmission lines based on improved YOLOv5. Shandong University. 2023.
- 19. Kutlugün E, Eyüpoğlu C. Artificial Intelligence Methods Used in Computer Vision. In: 2020 5th International Conference on Computer Science and Engineering (UBMK), Diyarbakir, Turkey, 2020. 214–8.
- 20. Naseer A. Fire and Smoke 2 Dataset. Roboflow Universe. 2022. https://universe.roboflow.com/aamir-naseer-widhl/fire-and-smoke-2-gvl1q
- 21. Li C, Li X, Chen M, Sun X. Deep Learning and Image Recognition. In: 2023 IEEE 6th International Conference on Electronic Information and Communication Technology (ICEICT), 2023. 557–62. doi: 10.1109/iceict57916.2023.10245041
- 22. Li Y. Forest fire smoke recognition and detection based on EfficientNet. In: 2022 IEEE Conference on Telecommunications, Optics and Computer Science (TOCS), 2022. 712–9. doi: 10.1109/tocs56154.2022.10016028
- 23. Hu L. An Improved YOLOv5 Algorithm of Target Recognition. In: 2023 IEEE 2nd International Conference on Electrical Engineering, Big Data and Algorithms (EEBDA), 2023. 1373–7. doi: 10.1109/eebda56825.2023.10090620
- 24. Zhao K, Liu Y, Hao S, Lu S, Liu H, Zhou L. Bounding Boxes Are All We Need: Street View Image Classification via Context Encoding of Detected Buildings. IEEE Trans Geosci Remote Sensing. 2022;60:1–17. doi: 10.1109/tgrs.2021.3064316
- 25. IEEE Standard for Robustness Testing and Evaluation of Artificial Intelligence (AI)-based Image Recognition Service. IEEE Std 3129-2023. 2023. 1–34.
- 26. Edinç EB, Bayazıt U. Assessing the generalization ability of a global model for rapid building damage assessment in real-world disaster scenarios. In: 2024 International Conference on INnovations in Intelligent SysTems and Applications (INISTA), Craiova, Romania, 2024. 1–6.
- 27. Xie Z. On Data Scaling in Masked Image Modeling. In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 2023. 10365–74.
Associated Data
Data Availability Statement
The data used in this study are publicly available on the Roboflow platform. Fire-and-smoke data: https://universe.roboflow.com/aamir-naseer-widhl/fire-and-smoke-2-gvl1q. The dataset comprises the image sets used for model training, validation, and evaluation, covering diverse fire and smoke scenes. All relevant data are available without restrictions, ensuring reproducibility of the study.