Scientific Reports. 2024 Dec 28;14:31517. doi: 10.1038/s41598-024-83358-8

Enhance fashion classification of mosquito vector species via self-supervised vision transformer

Veerayuth Kittichai 1, Morakot Kaewthamasorn 2, Tanawat Chaiphongpachara 3, Sedthapong Laojun 3, Tawee Saiwichai 4, Kaung Myat Naing 6, Teerawat Tongloy 6, Siridech Boonsang 5, Santhad Chuwongin 6
PMCID: PMC11682170  PMID: 39733133

Abstract

Vector-borne diseases pose a major worldwide health concern, impacting more than 1 billion people globally. Among blood-feeding arthropods, mosquitoes stand out as the primary carriers of diseases significant in both medical and veterinary fields. Hence, understanding the distinct roles fulfilled by different mosquito types is crucial for efficiently addressing and enhancing control measures against mosquito-transmitted diseases. The conventional method for identifying mosquito species is laborious and requires significant effort to learn. Classification is carried out by skilled laboratory personnel, rendering the process inherently time-intensive and restricting the task to entomology specialists. Therefore, integrating artificial intelligence with standard taxonomy, such as molecular techniques, is essential for accurate mosquito species identification. Advances in artificial intelligence tools have opened the way to developing automated systems for sample collection and identification. This study introduces a self-supervised Vision Transformer supporting an automatic model for classifying mosquitoes found across various regions of Thailand. The objective is to utilize self-distillation with unlabeled data (DINOv2) to develop models on a mobile phone-captured dataset containing 16 species of female mosquitoes, including those known for transmitting malaria and dengue. The DINOv2 model surpassed the ViT baseline model in precision and recall for all mosquito species. When compared at a species-specific level, the DINOv2 model reduced false negatives and false positives and improved precision and recall relative to the baseline model across all mosquito species. Notably, at least 10 classes exhibited outstanding performance, achieving precision and recall rates exceeding 90%. Remarkably, applying cropping techniques to the dataset instead of utilizing the original photographs significantly improved performance across all DINOv2 models studied, as demonstrated by an increase in recall to 87.86%, precision to 91.71%, F1 score to 88.71%, and accuracy to 98.45%. Malaria mosquito species could be easily distinguished from other genera such as Aedes, Mansonia, Armigeres, and Culex. While classifying malaria vector species presented challenges for the DINOv2 model, utilizing the cropped images enhanced precision to 96% for identifying one of the top three malaria vectors in Thailand, Anopheles minimus. A proficiently trained DINOv2 model, coupled with effective data management, can contribute to the development of a mobile phone application. Furthermore, this method shows promise in supporting field professionals who are not entomology experts in effectively addressing pathogens responsible for diseases transmitted by female mosquitoes.

Keywords: Mosquito vector species, Artificial intelligence, Self-distillation with unlabeled data, Mobile phone application

Subject terms: Entomology, Malaria, Biodiversity

Introduction

According to an annual report issued by the World Health Organization (WHO), vector-borne diseases pose a significant global health challenge, affecting over 1 billion individuals worldwide1. Among blood-sucking arthropods, mosquitoes are the most important vectors transmitting pathogens of medical and veterinary importance. These diseases, caused by several agents including parasites, bacteria, and viruses, are primarily transmitted by certain mosquito species. Thailand, home to a diverse array of mosquito species, has five predominant mosquito genera, each playing distinct roles in disease transmission2. Aedes aegypti (Linnaeus, 1762) and Aedes albopictus (Skuse, 1895) mosquitoes, for example, carry pathogens like chikungunya, dengue, Zika, and yellow fever viruses, while Anopheles mosquitoes are mainly responsible for spreading malaria. Armigeres subalbatus (Coquillett, 1898) mosquitoes transmit filariasis, Culex quinquefasciatus Say, 1823 mosquitoes play a role in diseases such as Japanese encephalitis and lymphatic filariasis3, and Mansonia uniformis (Theobald, 1901) mosquitoes are involved in the transmission of filarial parasites4. According to Harbach5, there are over 3,700 distinct species of mosquitoes worldwide4,6. Moreover, ongoing global warming as a result of climate change is altering the habitats and distribution of many blood-sucking insect vectors7,8. Understanding the distinct roles played by different mosquito genera is therefore essential for effectively combating and better controlling mosquito-borne diseases.

Traditional mosquito monitoring is labor-intensive, involving tools like traps, resting boxes, and aspirators, followed by expert classification in the lab. This process is time-consuming and requires specialized entomologists. However, advancements in tools and machine learning, particularly deep learning for vision-based classification, are enabling automated mosquito collection and identification. These technologies can identify mosquito sex, genera, and species, even for non-entomologists, overcoming challenges posed by the small size of mosquitoes. Effective mosquito population management is essential for controlling disease spread, influenced by mosquito density and behavior. Implementing an active surveillance system, such as a mobile app, is recommended to enhance control strategies, allowing real-time data collection and monitoring in remote areas without extensive re-training.

Existing methodologies have leveraged the life cycle of mosquitoes, encompassing egg, larval, and adult stages. Direct detection of mosquito eggs and larvae is constrained because both stages remain submerged in water. Current approaches focus on the automatic counting of mosquito eggs using software such as EggCountAI9,10, MECVision11, and ICount12. In a recent development, EggCountAI, utilizing Mask RCNN (Region-based Convolutional Neural Network) to count Ae. aegypti mosquito eggs on strips, outperformed both MECVision and ICount. It also offers several advantages over other egg-counting methods, including higher accuracy, faster speed, and ease of use, and it can distinguish mosquito eggs from other small objects. However, EggCountAI is limited to counting eggs on flat surfaces, may not reliably distinguish eggs from debris, and struggles with overlapping eggs.

Numerous studies have addressed the detection and classification of the adult stage through diverse modalities such as wing-beat waveforms, audio signals, and body structures. Previous research has explored various methods for classifying three mosquito species and detecting the absence of mosquitoes based on the distinct acoustic properties of their wingbeats. These methods include models based on median and interquartile analysis13, as well as Bayesian neural networks14,15 designed to automatically detect mosquito wingbeats from recordings. In addition, Yin et al.16 introduced a deep learning-based pipeline for mosquito detection and classification based on wingbeat sounds unique to each species. Fernandes et al.17 presented an audio classification approach for detecting Ae. aegypti mosquitoes using convolutional neural networks. Additionally, numerous machine learning and deep learning approaches have been introduced for the detection and classification of whole mosquito body images. Okayasu et al.18 employed a vision-based classification technique, utilizing both feature-based conventional methods and Convolutional Neural Network (CNN)-based deep learning methods, to classify three mosquito species. Motta et al.19 proposed the use of three neural networks, namely LeNet, AlexNet, and GoogleNet, for the classification of six mosquito species. Couret et al.20 extended the classification scope to 16 mosquito species, employing four additional neural networks, including ResNet, Wide ResNet, ResNext, and DenseNet. Goodwin et al.21 employed the Xception model to classify various mosquito species. Rustam et al.22 implemented a novel feature selection technique using CNN models for the classification of two genera, Aedes and Culex. Pise et al.23 adopted two transfer learning approaches, training three pretrained models (VGG-16, ResNet-50, and GoogLeNet) to classify three vector mosquito species: Ae. aegypti, Anopheles stephensi Liston, 1901, and Cu. quinquefasciatus. However, a common challenge in these classification models is accurately locating objects within an image. Consequently, some researchers have proposed object detection methods to specifically detect mosquito species. Kittichai et al.24 applied a two-stage YOLO-based object detection model to identify 11 mosquito species. Zhao et al.25 demonstrated that a Swin Transformer-based mosquito species identification model exhibited the highest performance among various models in detecting 17 mosquito species. In summary, supervised learning models leveraging classification and object detection algorithms have been applied successfully to the automatic classification and detection of mosquitoes.

In parallel, researchers have suggested a mosquito species classification method that utilizes cropped images to focus on individual mosquitoes, given their small size. This technique has the potential to improve classification and detection models. Azam et al.26 conducted experiments using four different AI model architectures (ResNet50, MobileNetV2, EfficientNet-B0, and ConvNeXtTiny) to classify three mosquito species. Additionally, Lee et al.27 introduced a deep learning detection model that integrates Faster R-CNN and Swin Transformer to detect 11 mosquito species and one non-mosquito species simultaneously.

Furthermore, researchers have emphasized the significance of the female mosquito in the transmission of vector-borne diseases, as only female mosquitoes seek blood meals for reproduction. Consequently, certain classification tasks focus specifically on identifying female mosquitoes due to their blood-feeding behavior and significant role in epidemiology. Park et al.28 employed Deep Convolutional Neural Networks (DCNN), including VGG-16, ResNet-50, and SqueezeNet, for the classification of eight female mosquito species. Similarly, Adhane et al.29 utilized DCNN for the classification of Ae. albopictus species.

Despite the considerable efforts invested in developing various mosquito detection and classification models, the majority have relied on labeled data and focused on supervised learning. Recently, there has been a shift towards exploring Transformers, which have achieved significant success in natural language processing, for direct application to image processing. The Vision Transformer (ViT) architecture30 has demonstrated efficiency, competing commendably with Convolutional Neural Networks (CNNs). To evaluate performance, a comparative analysis is conducted between supervised ViT and the DINOv2 model, designed for self-supervised learning (SSL) across multiple vision tasks, as pioneered by Oquab et al.31. Notable studies by Alaziz et al.32 have proposed ViT for fashion classification and DINOv2 for developing fashion recommendation systems. Additionally, Zhang et al.33 have employed the ViT model for fine-grained classification, while Cui et al.34 have introduced the DINOv2 model for depth estimation in endoscopic surgery. In the medical field, DINOv2 SSL pretraining on large-scale non-medical images has been explored for chest radiographs; SSL outperformed ImageNet-based pretraining, enhancing AI diagnostic accuracy35 and demonstrating a transformative shift towards efficient AI models in medical imaging. These endeavors provide inspiration for applying both ViT and DINOv2 in the field of mosquito classification.

This study aims to propose an automated species monitoring system for mosquitoes collected in Thailand using DINOv2 models. The goal is to train these models on a dataset comprising 16 classes of female mosquito species and evaluate their classification performances. Firstly, the study contributes a DINOv2-ViT-Large model utilizing a self-supervised learning (SSL) neural network algorithm, tailored for classifying 16 mosquito species, notably eight of which are malaria vectors predominantly found across various provinces in Thailand. Secondly, when cropping techniques were applied to the dataset instead of the original data, there was a notable enhancement in performance across all DINOv2 models listed in Table 6. This enhancement is evidenced by an increase in recall from 86.73 to 87.86%, precision from 90.68 to 91.71%, F1 score from 86.28 to 88.71%, and accuracy from 98.18 to 98.45%. Lastly, comparison on a species-by-species basis demonstrated that employing the SSL model led to reductions in false-negative and false-positive rates, as well as improvements in precision and recall values, when contrasted with the baseline model for all mosquito species.

Table 6.

Comparison between original images and cropped images using the DINOv2 model with various ViT architectures. “OI” stands for original images and “CI” for cropped images. Bold numbers represent the highest metric values among the models trained, within the OI and CI ranges, respectively.

Method Model Recall Precision Accuracy F1-Score
OI CI OI CI OI CI OI CI
DINOv2 ViT-S/14 0.7842 0.8738 0.8185 0.9165 0.9715 0.9845 0.7847 0.8871
ViT-B/14 0.8096 0.8673 0.8721 0.9123 0.9762 0.9818 0.8135 0.8628
ViT-L/14 0.8398 0.8786 0.8987 0.9068 0.9807 0.9830 0.8480 0.8807
ViT-g/14 0.8394 0.8734 0.8754 0.9171 0.9783 0.9826 0.8299 0.8695

The study further intends to explore the development of automated mosquito species monitoring by training on cropped images containing a single mosquito. This approach holds potential for assisting non-entomologists in the field to work effectively with vector-borne diseases transmitted by female mosquitoes.

Materials and methods

Geographical distribution and data collection

The collection of mosquitoes and the related research procedures for this study were reviewed and approved by the Institutional Review Board for Animal Ethics of Suan Sunandha Rajabhat University, Bangkok, Thailand (Approval No. IACUC 66-001/2023).

Mosquito species of medical and veterinary importance were collected from several regions of Thailand, identified for their high prevalence of malaria case reports, using BG-Pro CDC-style traps (BioGents, Regensburg, Germany) baited with industrial CO2 and BG-lure cartridges. Geographical ranges of the sample collection sites are shown in a modified map of Thailand (Fig. 1).

Fig. 1.

The geographical locations of sample collection sites and ranges of mosquito distribution are shown in a modified Thailand map (Left). The labels indicate the province and the corresponding distribution of mosquito species. The 16 mosquito classes were separated at the genus level, including eight Anopheles species that are the dominant malaria vectors in Thailand (Right).

All sample collections were conducted in 2023, following the receipt of animal ethics approval (Approval No. IACUC 66-001/2023). This study targeted malarial mosquitoes, which prefer to seek blood as their protein source at night.

Table 1 outlines the specifics of data collection for each mosquito species. These 16 mosquito classes were sorted into separate folders based on their respective genera. They included eight Anopheles species, prominent as malaria vectors. Among these, the study covered all three major malaria vectors in Thailand, namely An. dirus, An. minimus, and An. maculatus, with the aim of developing a robust neural network model for future applications within the country. Additionally, samples from four other genera were included as outgroups: two Aedes species, one Armigeres species, three Culex species, and two Mansonia species.

Table 1.

Dataset analysis for the mosquito species studied. All 16 classes are shown below.

No. Classes Training Validation Testing Total
1 Aedes_aegypti_f_TH2023 142 16 17 175
2 Aedes_albopictus_f_TH2023 201 22 28 251
3 Anopheles_barbirostris_Group_f_TH2023 258 28 22 308
4 Anopheles_dirus_Complex_f_TH2023 252 28 28 308
5 Anopheles_epiroticus_f_TH2023 213 23 28 264
6 Anopheles_hyrcanus_Group_f_TH2023 170 18 19 207
7 Anopheles_maculatus_f_TH2023 202 22 30 254
8 Anopheles_minimus_f_TH2023 161 17 30 208
9 Anopheles_sawadwongporni_f_TH2023 144 16 40 200
10 Anopheles_vagus_f_TH2023 153 17 38 208
11 Armigeres_subalbatus_f_TH2023 225 25 30 280
12 Culex_gelidus_f_TH2023 144 16 22 182
13 Culex_quinquefasciatus_f_TH2023 162 18 20 200
14 Culex_vishnui_Subgroup_f_TH2023 125 13 20 158
15 Mansonia_annulifera_f_TH2023 198 22 20 240
16 Mansonia_indiana_f_TH2023 216 24 20 260
Total 2966 325 412 3703

After morphological identification of each mosquito species under a stereomicroscope (Nikon Corp., Tokyo, Japan), we captured digital images of each mosquito using the camera of a mobile phone. All samples were placed against a uniform background for consistency during photography. Both right- and left-side views of the samples were used as standard views for training the neural network. Sample plates with controlled light intensity were employed during photo capture to minimize selection bias and enhance model training and future applications. Each image was saved in JPG format at a resolution of 3472 × 4624 pixels. In total, 3703 images were collected and subsequently categorized by experts into training, validation, and testing sets at a ratio of 8:1:1, as sketched below. The mosquito species were verified by experienced, professional entomologists based on morphological taxonomic keys36,37.
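For readers who want to script this data management step, the sketch below shows one way a per-class 8:1:1 split could be produced. It is a minimal sketch only: the folder layout, file pattern, and random seed are assumptions, and the study's actual split was curated by experts rather than generated by a published script.

```python
# A minimal sketch of an 8:1:1 per-class split; paths and seed are assumed.
import random
import shutil
from pathlib import Path

random.seed(0)  # assumed seed, only for reproducibility of this illustration
src_root = Path("mosquito/images")  # hypothetical folder of per-class subfolders
for class_dir in sorted(p for p in src_root.iterdir() if p.is_dir()):
    files = sorted(class_dir.glob("*.jpg"))
    random.shuffle(files)
    n_train = int(0.8 * len(files))       # 80% training
    n_val = int(0.1 * len(files))         # 10% validation, remainder testing
    splits = {
        "training": files[:n_train],
        "validation": files[n_train:n_train + n_val],
        "testing": files[n_train + n_val:],
    }
    for split_name, items in splits.items():
        out_dir = Path("mosquito") / split_name / class_dir.name
        out_dir.mkdir(parents=True, exist_ok=True)
        for f in items:
            shutil.copy2(f, out_dir / f.name)
```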

Supervised ViT network structure

In this study, the supervised Vision Transformer (ViT) network devised by Dosovitskiy et al.30 was employed to classify 16 classes of mosquito images, leveraging its exceptional performance in image classification. The proposed ViT model consists of three primary sections: Patching and Embedding, Transformer Encoder, and MLP Head. During Patching and Embedding, the input image is divided into fixed-size patches, which are linearly embedded and combined with position embeddings before being passed to the Transformer Encoder. This encoder, central to feature extraction, replaces the need for a CNN backbone and comprises multiple encoder blocks, each featuring a multi-head attention layer and a feed-forward layer, both followed by a normalization layer. The final section, the MLP Head, incorporates two dense hidden layers with 2048 and 1024 units and performs the prediction; its output layer generates 16 units corresponding to the number of classes in the mosquito dataset. An illustrative overview of the ViT model architecture is provided in Fig. 2, and a brief code sketch of the head design follows the figure.

Fig. 2.

ViT model overview. The illustration was inspired by Dosovitskiy et al.30.
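To make the head design concrete, the following is a minimal PyTorch sketch of the architecture described above. The timm model name and the use of publicly pretrained weights are assumptions, since the paper does not publish its implementation.

```python
# A hypothetical sketch of the supervised ViT classifier described above.
import torch
import torch.nn as nn
import timm  # assumed source of the ViT backbone

# ViT-B/16 backbone; num_classes=0 returns pooled features instead of logits.
backbone = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=0)

# MLP Head: two dense hidden layers (2048 and 1024 units) and a
# 16-unit output layer matching the 16 mosquito classes.
head = nn.Sequential(
    nn.Linear(backbone.num_features, 2048),
    nn.ReLU(),
    nn.Linear(2048, 1024),
    nn.ReLU(),
    nn.Linear(1024, 16),
)

model = nn.Sequential(backbone, head)
logits = model(torch.randn(1, 3, 224, 224))  # one 224 x 224 RGB image -> (1, 16)
```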

DINOv2 network structure

For the identification of the 16-class mosquito dataset, we employed the DINOv2 network, which comprises two primary sections: the pretrained DINOv2 model and the classification layer. Within this architecture, the pretrained DINOv2 model integrates self-distillation and serves as a backbone for feature extraction. The backbone incorporates a transformer network designed to generate a high-dimensional feature vector representation from input images. Notably, the features extracted by DINOv2 have exhibited superior performance on downstream tasks compared to task-specific models and existing state-of-the-art models. The classification layer is adapted to the specific task and connected to the end of the DINOv2 backbone, utilizing the feature vector for image classification. It consists of two linear layers: the first adjusts the data size from the previous layer to 256 embedding dimensions, and the second transforms it to the number of classes (16 in this study), with a ReLU activation applied after the first layer. Figure 3 offers a comprehensive visual overview of the model's structure, components, and interactions, and a brief code sketch follows the figure.

Fig. 3.

DINOv2 model overview.
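The classifier layer lends itself to a similarly short sketch. The snippet below is a minimal illustration, assuming the publicly released DINOv2 weights from torch.hub; the authors' exact training code is not shown in the paper.

```python
# A minimal sketch of the DINOv2 backbone plus the two-layer classifier head.
import torch
import torch.nn as nn

# Pretrained DINOv2 backbone (ViT-L/14 here; -S, -B, and -g variants also exist).
backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vitl14")

class DinoClassifier(nn.Module):
    def __init__(self, backbone: nn.Module, num_classes: int = 16):
        super().__init__()
        self.backbone = backbone
        # First linear layer maps the feature vector to 256 dimensions,
        # ReLU follows, then a second layer maps to the 16 classes.
        self.head = nn.Sequential(
            nn.Linear(backbone.embed_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(x)  # (batch, embed_dim) CLS-token features
        return self.head(feats)

model = DinoClassifier(backbone)
logits = model(torch.randn(1, 3, 224, 224))  # (1, 16)
```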

Experimental setup

The mosquito dataset, consisting of 16 mosquito species from five different genera, encompassed variations in mosquito positions, lighting conditions, backgrounds, and camera angles within each species, so each image presented a unique combination of these factors. As mentioned earlier, the dataset was split into an 80% training set, 10% validation set, and 10% testing set. All experiments were conducted using the PyTorch framework on a server-grade workstation running Ubuntu 20.04 LTS. The workstation featured an AMD EPYC 7742 64-core processor, llvmpipe graphics (LLVM 6.0, 128 bits), and 1007.7 GB of memory. Table 2 outlines the hyperparameters utilized in both the ViT and DINOv2 models; an illustrative training sketch follows the table.

Table 2.

Hyperparameters configurations.

Parameter ViT model DINOv2 model
Input image size 224 × 224 224 × 224
Batch size 64 64
Learning rate 0.000001 0.000001
Maximum Epoch 1000 500
Loss function CrossEntropyLoss CrossEntropyLoss
Activation function Linear, ReLU Linear, ReLU
Optimizer Adam Adam
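As a concrete illustration of Table 2, a training loop under these hyperparameters might look like the sketch below. The dataset path is hypothetical, and `model` is assumed to be one of the networks sketched in the preceding sections.

```python
# A hedged sketch of the Table 2 training configuration (not the authors' script).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),  # input image size from Table 2
    transforms.ToTensor(),
])
train_set = datasets.ImageFolder("mosquito/training", transform=transform)
loader = DataLoader(train_set, batch_size=64, shuffle=True)  # batch size 64

criterion = nn.CrossEntropyLoss()
# `model` comes from one of the sketches above (e.g., DinoClassifier).
optimizer = torch.optim.Adam(model.parameters(), lr=1e-6)

for epoch in range(500):  # 500 epochs for DINOv2; ViT used up to 1000
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    # The paper keeps the weights from the epoch with the highest training
    # accuracy, so a checkpoint would be saved here when accuracy improves.
```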

Evaluation metrics

The evaluation parameters, including accuracy, recall, precision, specificity, and the F1-score, are used to assess classification performance using the prediction counts (true positive (TP), false positive (FP), true negative (TN), and false negative (FN)) from the confusion matrix. Based on these counts, the evaluation parameters are calculated as follows (a code sketch follows the equations):

  • Accuracy: The ratio of correctly classified instances (both true positive and true negative) to the entire testing set.

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1} \]
  • Recall: The ratio of true positive instances to the total instances of the true class in the testing set.

\[ \text{Recall} = \frac{TP}{TP + FN} \tag{2} \]
  • Precision: The ratio of true positive instances to the total instances predicted as positive.

\[ \text{Precision} = \frac{TP}{TP + FP} \tag{3} \]
  • Specificity: The ratio of instances correctly predicted as negative to the total instances of the false class in the test set.

\[ \text{Specificity} = \frac{TN}{TN + FP} \tag{4} \]
  • F1-score: A measure of a model’s accuracy that considers both the precision and recall of the test values. It is the harmonic mean of precision and recall, providing a balance between the two metrics.

\[ \text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{5} \]
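A compact way to obtain these one-versus-all scores from a multi-class confusion matrix is sketched below; this is an assumed utility written for illustration, not code taken from the study.

```python
# Per-class metrics from a confusion matrix (rows = true, columns = predicted).
import numpy as np

def per_class_metrics(cm: np.ndarray):
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp            # predicted as the class, but wrong
    fn = cm.sum(axis=1) - tp            # instances of the class that were missed
    tn = cm.sum() - (tp + fp + fn)
    accuracy = (tp + tn) / cm.sum()     # Eq. (1), one-vs-all per class
    recall = tp / (tp + fn)             # Eq. (2)
    precision = tp / (tp + fp)          # Eq. (3)
    specificity = tn / (tn + fp)        # Eq. (4)
    f1 = 2 * precision * recall / (precision + recall)  # Eq. (5)
    return accuracy, recall, precision, specificity, f1
```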

Results

This section presents a comprehensive summary of the training, testing, and evaluation results for the automated monitoring of malaria vector species in Thailand, utilizing self-supervised learning (SSL) techniques. The results are organized into three main categories: (1) evaluating the training outcomes of the Vision Transformer (ViT) and DINOv2 models to find the most optimized model, (2) comparing these models for mosquito species classification using original datasets, and (3) specifically assessing the performance of DINOv2 with cropped image data. The following section elaborates on the detailed experimental results.

The training results for ViT and DINOv2

This section reports the training results for both models, illustrating the accuracy curve and loss curve against the number of epochs. The models achieved optimization when both the training loss and accuracy curves stabilized, as depicted in the subsequent figures. This stability indicates that the models are likely to provide higher-quality predictions.

The training loss and accuracy of ViT model

The performance metrics, including accuracy and cross-entropy loss, of the ViT model across various architectures are presented in Fig. 4a-b. In the following, simplified notation is used to denote the model size and input patch size: for example, ViT-S/16 refers to the “Small” model with a 16 × 16 patch size, ViT-B/16 to the “Base” model, and ViT-L/16 to the “Large” model30. Initially trained for 500 epochs, the ViT model exhibited notable instability in training accuracy and loss, with fluctuating minimum and maximum scores in consecutive epochs. Subsequently, an extended training period of 1,000 epochs was implemented to further investigate training loss and accuracy. Despite continued fluctuations in the score range, the model’s performance improved compared to the 500-epoch training. The ViT-S architecture achieved its highest accuracy at epoch 915 and its lowest loss at epoch 930. Similarly, the ViT-B architecture attained peak accuracy at epoch 954 and its best loss at epoch 743, while the ViT-L architecture demonstrated its highest accuracy at epoch 987 and lowest loss at epoch 872. Notably, the ViT-L architecture outperformed the other architectures in model training, exhibiting higher accuracy and lower loss. Throughout the training process, the trained parameters, including the weight file for model testing, were taken from the epoch with the highest accuracy achieved during training (Fig. 4a).

Fig. 4.

The model training results of ViT model based on different ViT architectures. (a) Epoch versus Accuracy and (b) Epoch versus Loss.

The training loss and accuracy of DINOv2 model

In parallel to the ViT model, Fig. 5 depicts the accuracy and cross-entropy loss of the DINOv2 model across various ViT architectures. Unlike the ViT model, the training accuracy and loss of the DINOv2 model exhibited notable stability throughout the 500-epoch training period, making an extension to 1000 epochs unnecessary. Upon analysis of Fig. 5, it is evident that the ViT-S architecture achieved its highest accuracy at epoch 281 and its best loss at epoch 413. Similarly, the ViT-B architecture attained peak accuracy at epoch 360 and its best loss at epoch 498. The ViT-L architecture demonstrated the best accuracy at epoch 470 and the best loss at epoch 872. Furthermore, ViT-g, which denotes the “giant” model, achieved the best accuracy at epoch 195 and the best loss at epoch 353. The plots additionally indicate that, while discerning the superior model between ViT-L and ViT-g is challenging, both architectures exhibit better accuracy than the other two models during training. Therefore, the subsequent section presents a performance comparison between these two models to determine the most suitable architecture for automated species monitoring of geographical malaria vectors.

Fig. 5.

The model training results of DINOv2 model based on different ViT architectures. (a) Epoch versus Accuracy and (b) Epoch versus Loss.

Scenario 1 (classification with fine-tuned models based on supervised ViT and DINOv2 using original images)

To evaluate the classification performance among the 16 classes, the two approaches based on ViT and DINOv2 were utilized and compared. Each approach was evaluated with a variety of model sizes, which led to distinct model names and varying performances. The evaluation was conducted using the unseen testing dataset outlined in Table 1 for all model classifications. This section describes the performances of these fine-tuned models.

Supervised ViT model

The ViT model’s effectiveness was assessed through confusion matrices for the three ViT architectures (Table 3). After examining these confusion matrices, it became apparent that ViT-S has the largest number of unclassified images, whereas ViT-L shows the smallest number of images that could not be classified, as illustrated in Fig. 6. Notably, species classification for Ae. albopictus, An. maculatus, An. minimus, and An. sawadwongporni posed significant challenges during model evaluation. Further details regarding model-wise and class-wise evaluations are elaborated below. Table 3 provides a model-wise performance analysis for comparison. The ViT-B architecture exhibits the highest accuracy, precision, and F1 score, while ViT-L achieves the highest recall score and ViT-S the highest specificity score. It is noteworthy that training performance may not directly correlate with evaluation results on unseen data. Consequently, the ViT-B architecture is selected for further comparisons of model performance.

Table 3.

Performance comparison of the ViT model across various ViT architectures. All evaluation metrics have been thoroughly tested and assessed using data that was not seen during the model training process. This ensures that the metrics accurately reflect the model’s performance on new, unseen data. Bold numbers represent the highest metric values.

Method Model Accuracy Recall Precision Specificity F1 score
ViT ViT-S/16 0.9458 0.4135 0.5990 0.9840 0.4893
ViT-B/16 0.9495 0.5765 0.6785 0.9774 0.6234
ViT-L/16 0.9493 0.6040 0.6029 0.9755 0.6034
Fig. 6.

Confusion matrix for ViT-based various ViT architecture model comparison. (a) ViT-S/16, (b) ViT-B/16, (c) ViT-L/16. Increased color intensity corresponds to higher True Positive (TP) values, while the last row and column of the confusion matrix show the number of images that could not be classified.

DINOv2 model

During the evaluation of the DINOv2 model (Fig. 7), four confusion matrices were presented for the four ViT-based architectures. In contrast to the ViT model, the number of unclassified images in the confusion matrices for DINOv2 is notably reduced.

Fig. 7.

Confusion matrix for DINOv2-based various ViT architecture model comparison. (a) ViT-S/14, (b) ViT-B/14, (c) ViT-L/14, (d) ViT-g/14. Increased color intensity corresponds to higher True Positive (TP) values.

However, challenges persist in accurately classifying An. maculatus species, as the model tends to be confused with An. sawadwongporni.

Table 4 also provides a model-wise performance analysis for the DINOv2 model. The results indicate that the ViT-L architecture outperformed the other three architectures, namely ViT-S, ViT-B, and ViT-g. ViT-L achieved an accuracy of 0.9807, recall of 0.8398, precision of 0.8987, and an F1 score of 0.8480. Consequently, the ViT-L architecture is selected for further comparisons of model performance.

Table 4.

Performance comparison of the DINOv2 model across various ViT architectures. Bold numbers represent the highest metric values.

Method Model Accuracy Recall Precision Specificity F1 score
DINOv2 ViT-S/14 0.9715 0.7842 0.8185 1.0000 0.7847
ViT-B/14 0.9762 0.8096 0.8721 1.0000 0.8135
ViT-L/14 0.9807 0.8398 0.8987 1.0000 0.8480
ViT-g/14 0.9783 0.8394 0.8754 1.0000 0.8299

Additionally, Table 5 provides a class-wise evaluation of the two models, focusing on the better-performing ViT-based architectures. The quality of class-wise predictions is assessed using the confusion matrices shown in Fig. 7a–d. In this evaluation, the study considers class-wise precision and sensitivity, omitting accuracy and specificity scores due to potential bias arising from unbalanced classes in the dataset and the high number of true negatives in one-versus-all evaluation. The results based on the original images are denoted as ViT (Original) and DINOv2 (Original) in the precision and recall evaluation. According to the findings, the DINOv2 model outperforms the ViT model in both precision and recall for all classes of mosquito species. Certain mosquito species, such as Ae. albopictus, An. maculatus, and An. minimus, posed challenges for the DINOv2 model, with recall scores below 60%. However, the majority of mosquito species (10 classes) demonstrate excellent performance, achieving above 90% in both precision and recall.

Table 5.

Performance comparison between DINOv2 based on the ViT-L architecture and the ViT model based on the ViT-B architecture using original images, and DINOv2 based on the ViT-L architecture using cropped images. Bold numbers represent the highest metric values.

Class name Precision Recall
ViT (Original) DINOv2
(Original)
DINOv2 (Cropped) ViT (Original) DINOv2 (Original) DINOv2 (Cropped)
Aedes_aegypti_f_TH2023 1.0000 1.0000 1.0000 0.9412 0.9412 1.0000
Aedes_albopictus_f_TH2023 0.5000 1.0000 1.0000 0.0714 0.3571 0.8571
Anopheles_barbirostris_Group_f_TH2023 0.5000 0.7097 0.9167 0.9545 1.0000 1.0000
Anopheles_dirus_Complex_f_TH2023 0.4375 0.6923 0.7500 0.5000 0.6429 0.6429
Anopheles_epiroticus_f_TH2023 0.8621 0.8750 0.8485 0.8929 1.0000 1.0000
Anopheles_hyrcanus_Group_f_TH2023 0.3529 0.7368 0.8824 0.6316 0.7368 0.7895
Anopheles_maculatus_f_TH2023 0.0000 0.9375 0.7692 0.0000 0.5000 0.3333
Anopheles_minimus_f_TH2023 0.2857 0.8947 0.9600 0.1333 0.5667 0.8000
Anopheles_sawadwongporni_f_TH2023 0.2955 0.6290 0.5441 0.3250 0.9750 0.9250
Anopheles_vagus_f_TH2023 0.8824 1.0000 1.0000 0.3947 0.7632 0.7105
Armigeres_subalbatus_f_TH2023 0.9655 1.0000 0.9677 0.9333 1.0000 1.0000
Culex_gelidus_f_TH2023 0.7368 1.0000 1.0000 0.6364 0.9545 1.0000
Culex_quinquefasciatus_f_TH2023 0.5385 0.9524 1.0000 0.7000 1.0000 1.0000
Culex_vishnui_Subgroup_f_TH2023 0.6818 0.9524 0.8696 0.7500 1.0000 1.0000
Mansonia_annulifera_f_TH2023 0.7500 1.0000 1.0000 0.9000 1.0000 1.0000
Mansonia_indiana_f_TH2023 0.8571 1.0000 1.0000 0.9000 1.0000 1.0000

Scenario 2 (classification with a fine-tuned model based on DINOv2 using cropped images)

As mentioned above, although the DINOv2 model with the ViT-L architecture performed best in the model comparison, its species classification still needed improvement. Since the original images were collected with a mobile phone camera without zoom, they contain large background regions. When an image is split into fixed-size patches before being fed to the linear projection of flattened patches, some patches may contain only background, which can affect the training process and model classification. To avoid this problem, each original image was cropped to 608 × 608 pixels using the in-house CiRA CORE platform. Since each collected image contains a single mosquito, the cropped dataset has the same number of images as the original dataset. DINOv2 with the four ViT-based architectures was then trained on the cropped training images and evaluated on the cropped test images. Consequently, the DINOv2-ViT-g model trained on cropped images exhibited significantly higher AUC values, reaching up to 0.992, compared to the model trained on the original images (Fig. 8).
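CiRA CORE is an in-house platform, so its cropping routine is not public. As a rough stand-in, and under the simplifying assumption that the mosquito sits near the image center, a Pillow-based 608 × 608 crop could look like the following; in practice the platform presumably localizes the specimen before cropping.

```python
# An assumed stand-in for the CiRA CORE crop: center-crop photos to 608 x 608.
from pathlib import Path
from PIL import Image

def center_crop(src: Path, dst: Path, size: int = 608) -> None:
    img = Image.open(src)                     # originals are 3472 x 4624 pixels
    w, h = img.size
    left, top = (w - size) // 2, (h - size) // 2
    img.crop((left, top, left + size, top + size)).save(dst)

out_root = Path("mosquito/cropped")           # hypothetical output folder
out_root.mkdir(parents=True, exist_ok=True)
for path in Path("mosquito/original").glob("*.jpg"):  # hypothetical input folder
    center_crop(path, out_root / path.name)
```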

Fig. 8.

Area under the ROC curve. The DINOv2 models trained on original images, including ViT-S, ViT-B, ViT-L, and ViT-g, yield the AUC values of 0.9703, 0.9719, 0.9829, and 0.9783, respectively. In contrast, the same models trained on cropped images produce the AUC values of 0.9879, 0.9842, 0.9844, and 0.9920, respectively. Here, “OI” represents the original image set, while “CI” represents the cropped image set.
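The macro-averaged, one-versus-rest AUC values of the kind reported in Fig. 8 can be computed with scikit-learn as sketched below; the label and score arrays here are synthetic placeholders, not the study's data.

```python
# An assumed evaluation snippet; y_true and y_score are synthetic stand-ins.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = np.repeat(np.arange(16), 2)             # toy labels: 2 samples per class
y_score = rng.dirichlet(np.ones(16), size=32)    # toy softmax-like scores

auc = roc_auc_score(y_true, y_score, multi_class="ovr", average="macro")
print(f"macro one-vs-rest AUC: {auc:.4f}")
```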

The confusion matrices for the four DINOv2 models are depicted in Fig. 9. The matrices indicate an improvement in the classification of mosquito species, apart from the An. maculatus species.

Fig. 9.

Confusion matrix for DINOv2-based various ViT architecture model comparison using the cropped images. (a) ViT-S/14, (b) ViT-B/14, (c) ViT-L/14, (d) ViT-g/14. Increased color intensity corresponds to higher True Positive (TP) values.

The detailed performance analysis and model comparison are presented in Tables 5 and 6. The comparison of the two DINOv2 models with the ViT-L architecture, using the original images and the cropped images, is shown in Table 5. In Table 5, DINOv2 using original images has a higher precision score in 5 species and a higher recall score in 3 species, whereas DINOv2 using cropped images has a higher precision score in 5 species and a higher recall score in 5 species. In the class-wise precision scores of the model trained on cropped images, most mosquito species reach at least 75%, except An. sawadwongporni, which has a high false-positive count. This effect is driven by the high false-negative count for An. maculatus, which is frequently misclassified as An. sawadwongporni; consequently, the recall score of An. maculatus drops to 33.33%.

Finally, Table 6 presents the comparison of model-wise prediction quality between the original images and the cropped images. The model trained on cropped images shows significantly improved performance across all four architectures: ViT-S improved by 9% in recall and 10% in precision; ViT-B by 6% in recall and 4% in precision; ViT-L by 4% in recall and 1% in precision; and ViT-g by 4% in recall and 4% in precision. Among the ViT-based architectures, ViT-L has the highest recall score of 87.86%, and ViT-g has the highest precision score of 91.71%. Therefore, it can be concluded that the DINOv2 model with a ViT-based architecture trained on cropped images can be effectively applied to the automated species monitoring of female mosquitoes transmitting vector-borne diseases in Thailand.

Discussion and conclusion

To our knowledge, this study represents the first attempt to utilize ViT and DINOv2 for the classification of images of 16 species (five genera) of female mosquitoes commonly observed in Thailand. The ViT architecture comprises a multi-head self-attention layer and a multilayer perceptron (MLP) layer, with hyperparameters tailored to optimize model performance. A comparative analysis was conducted between ViT models and DINOv2 models. DINOv2 integrates a pretrained model with self-distillation as its backbone, utilizing a transformer network for high-dimensional feature extraction and a classification layer for image categorization. Evaluation metrics such as accuracy, recall, precision, specificity, and F1-score were employed to assess the efficacy of the trained models.

Other automatic classification and detection methods have been suggested in the past. For adult mosquito species classification, Okayasu et al.18 constructed classification algorithms for three genera using a handcrafted feature-based conventional method and a convolutional neural network-based deep learning method; among the models tested, ResNet achieved the highest discrimination accuracy of 95.5%. Motta et al.19 applied dengue mosquito classification models in the field using CNN models such as LeNet, AlexNet, and GoogleNet, with GoogleNet achieving the best classification accuracy of 83.9%. Couret et al.20 delimited variations in cryptic morphological characteristics of 16 species (three genera) using CNN models including ResNet, Wide ResNet, ResNext, and DenseNet, achieving classification accuracies of 96.96% for species identification and 98.48% for gender identification. Rustam et al.22 constructed a novel feature selection technique using CNN models for classifying two adult mosquito genera and achieved 98.6% accuracy. Lastly, Pise and Patil23 trained three pretrained models (VGG-16, ResNet-50, and GoogLeNet) with two transfer learning approaches, feature extraction and fine-tuning, for three mosquito species; in the comparison of the three models, GoogLeNet achieved 92.5% accuracy with feature extraction and 96% with fine-tuning. In comparison, supervised CNNs were used in most papers, and our proposed optimized model (DINOv2) achieved an accuracy of 98.07% on original images (Table 4), which exceeds the accuracies reported in these studies except that of Rustam et al. In addition, we also compared our proposed model with previous object detection models. Kittichai et al.24 used a two-stage YOLO-based model to classify 11 species (five genera) of adult mosquitoes, achieving a mean average precision of 99% and a sensitivity of 92.4%. Zhao et al.25 constructed mosquito classification algorithms for 17 species using a Swin Transformer; through comparison of several models and image sizes, the optimal model achieved 99.04% accuracy and a 99.16% F1-score. While the performance of the proposed DINOv2 model does not significantly surpass that of an object detection model, it offers additional advantages. Unlike object detection models, which necessitate labeling objects in images before training, DINOv2 does not require explicit object labeling during training. This characteristic enhances its versatility and efficiency in utilizing unlabeled data for representation learning.

Furthermore, some research works emphasize the female mosquito because of its significant role in vector-borne disease transmission, and use cropped images, given the small size of a mosquito, so that only one appears within each image. Azam et al.26 applied four CNNs, namely ResNet50, MobileNetV2, EfficientNet-B0, and ConvNeXtTiny, to classify cropped images of three mosquito species across all four gonotrophic stages of the cycle; in the comparison of their overall classification accuracies, ResNet50 and EfficientNet-B0 achieved 97.44% and 93.59%, respectively. Lee et al.27 created a deep learning detection model combining Faster R-CNN and a Swin Transformer to classify 11 mosquito species and one non-mosquito class (Chironomus), and the model achieved a 91.7% F1-score. Since the proposed DINOv2 model achieved 98.30% accuracy and 88.07% F1-score on cropped images, the optimized model has higher accuracy than these classification models, although direct comparison with detection models on the basis of F1-score remains challenging. In addition, Park et al.28 built mosquito classification algorithms such as VGG-16, ResNet-50, and SqueezeNet to classify eight species and achieved a classification accuracy of 97.19%. Adhane et al.29 also applied DCNNs (VGG-16 and ResNet-50) to classify the Ae. albopictus species; the performance of VGG (98% precision, 97% recall, and 98% F1-score) was higher than that of ResNet (97% precision, 96% recall, and 96% F1-score). Based on these studies, it is evident that the proposed model encompasses a greater number of female mosquito species, 16 in total, than previously suggested classification models, in both original and cropped images.

In conclusion, this study presents a comprehensive approach to automated species monitoring of vector-borne disease mosquitoes in Thailand, utilizing Vision Transformer (ViT) and DINOv2 models. The dataset encompasses 16 female mosquito species distributed across five main genera, totaling 3703 images captured at 40-times magnification. Both ViT and DINOv2 models were employed, with three ViT trainings (ViT-S/16, ViT-B/16, ViT-L/16) and four DINOv2 trainings (ViT-S/14, ViT-B/14, ViT-L/14, ViT-g/14), constituting the first scenario. Performance comparisons reveal that ViT with ViT-B/16 and DINOv2 with ViT-L/14 achieve superior results. Following this, the study advances to the second scenario, involving the classification of cropped mosquito images using DINOv2 with the four ViT architectures. The results demonstrate that training with cropped images significantly enhances performance. This development, focusing on training with single-mosquito cropped images, exhibits substantial potential for improving the effectiveness of mosquito identification, thereby aiding preventive measures against vector-borne diseases transmitted by female mosquitoes.

However, the present study utilized limited dataset sizes, with certain mosquito species lacking sufficient representation, making the models less robust to feature variation. The evaluation of the model, particularly in the case of An. maculatus, remains challenging in both scenarios, and further comparisons with object detection models are necessary for DINOv2. Future studies should focus on collecting more mosquito images for each species and exploring the combination of SSL with object detection models to achieve further improvements. To improve the generalization of the best proposed model, it is essential to compare this technique with other AI methods used in public health for mosquito identification and the identification of other vectors globally. The proposed model would benefit from validation with updated data from countries currently experiencing epidemics related to Zika, dengue, and yellow fever. If successful, this technique could demonstrate robustness and suitability for public health services, aligning with the World Health Organization’s goals of eradicating emerging diseases38. Additionally, it could be valuable in emerging countries with arboviral disease cases by monitoring global pathogen risks39. This would aid in predicting and modeling potential scenarios for pandemics and epidemics caused by arthropod-borne diseases.

Acknowledgements

This work (Research Grant for New Scholar, Grant No. RGNS 65–212) was financially supported by the Office of the Permanent Secretary, Ministry of Higher Education, Science, Research and Innovation (OPS MHESI), and Thailand Science Research and Innovation (TSRI). The research, carried out by King Mongkut’s Institute of Technology Ladkrabang under Grant No. RE-KRIS/FF68/40, was supported by the National Science Research and Innovation Fund (NSRF). M.K. was funded by the Thailand Science Research and Innovation Fund, Chulalongkorn University. We also thank the College of Advanced Manufacturing Innovation, King Mongkut’s Institute of Technology Ladkrabang, which provided the deep learning platform and software to support the research project.

Author contributions

V.K., T.C., S.L., T.S., and K.M.N. collected the samples and acquired the image data. V.K., K.M.N., and T.T. constructed the mosquito dataset and performed the training, validation, and testing of the DCNN models. V.K., K.M.N., and S.C. wrote most of the manuscript. V.K., S.B., and S.C. designed the study. T.T., S.B., and S.C. contributed source code and a dataset. V.K., M.K., S.B., and S.C. reviewed the manuscript.

Data availability

The data supporting the findings of this study are available from the corresponding author upon reasonable request.

Declarations

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1.World Health Organization. Vector-borne diseases (2020). https://www.who.int/news-room/fact-sheets/detail/vector-borne-diseases
  • 2.Sukkanon, C. et al. Distribution of mosquitoes (Diptera: Culicidae) in Thailand: A dataset. GigaByte (2023). [DOI] [PMC free article] [PubMed]
  • 3.Lupenza, E., Gasarasi, D. B. & Minzi, O. M. Lymphatic filariasis, infection status in Culex quinquefasciatus and Anopheles species after six rounds of mass drug administration in Masasi District, Tanzania. Infect. Dis. Poverty. 10, 1–11 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Ridha, M. R., Rahayu, N., Hairani, B., Perwitasari, D. & Kusumaningtyas, H. Biodiversity of mosquitoes and Mansonia uniformis as a potential vector of Wuchereria bancrofti in Hulu Sungai Utara District, South Kalimantan, Indonesia. Veterinary World. 13, 2815 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Harbach, R. E. Recognition of Lasioconops Theobald, 1903 and Oculeomyia Theobald, 1907 as separate subgenera of the genus Culex Linnaeus, 1758 (Diptera: Culicidae). Zootaxa5319, 595–599 (2023). [DOI] [PubMed] [Google Scholar]
  • 6.Becker, N. et al. Mosquitoes and their control (Springer Science & Business Media, 2010).
  • 7.Laojun, S., Changbunjong, T., Abdulloh, A. & Chaiphongpachara, T. Geometric morphometrics to differentiate species and explore seasonal variation in three Mansonia species (Diptera: Culicidae) in central Thailand and their association with meteorological factors. Med. Vet. Entomol. (2024). [DOI] [PubMed]
  • 8.Jeffries, C. L. et al. Novel Wolbachia strains in Anopheles malaria vectors from sub-Saharan Africa. Wellcome open. Res. 3 (2018). [DOI] [PMC free article] [PubMed]
  • 9.Javed, N., López-Denman, A. J., Paradkar, P. N. & Bhatti, A. EggCountAI: A Convolutional Neural Network Based Software for Counting of Aedes Aegypti Mosquito Eggs. (2023). [DOI] [PMC free article] [PubMed]
  • 10.Javed, N., López-Denman, A. J., Paradkar, P. N. & Bhatti, A. EggCountAI: A convolutional neural network-based software for counting of Aedes aegypti mosquito eggs. Parasites Vectors. 16, 341 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Krieshok, G., Torres Gutierrez, C. MECVision Using computer vision to identify and count mosquito eggs (2022). https://github.com/abtassociates/mecvision
  • 12.Gaburro, J., Duchemin, J. B., Paradkar, P. N., Nahavandi, S. & Bhatti, A. Assessment of ICount software, a precise and fast egg counting tool for the mosquito vector Aedes aegypti. Parasites vectors. 9, 1–9 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Alar, H. S. & Fernandez, P. L. Classifying mosquito presence and genera using median and interquartile values from 26-filter wingbeat acoustic properties. Procedia Comput. Sci.193, 453–463. 10.1016/j.procs.2021.10.047 (2021). [Google Scholar]
  • 14.Kiskin, I., Cobb, A. D., Sinka, M., Willis, K. & Roberts, S. J. 351–366 (Springer International Publishing).
  • 15.Kiskin, I., Cobb, A. D., Sinka, M., Willis, K. & Roberts, S. J. in Joint European Conference on Machine Learning and Knowledge Discovery in Databases. 351–366 (Springer).
  • 16.Yin, M. S. et al. A deep learning-based pipeline for mosquito detection and classification from wingbeat sounds. Multimedia Tools Appl.82, 5189–5205. 10.1007/s11042-022-13367-0 (2023). [Google Scholar]
  • 17.Fernandes, M. S., Cordeiro, W. & Recamonde-Mendoza, M. Detecting Aedes aegypti mosquitoes through audio classification with convolutional neural networks. Comput. Biol. Med.129, 104152. 10.1016/j.compbiomed.2020.104152 (2021). [DOI] [PubMed] [Google Scholar]
  • 18.Okayasu, K., Yoshida, K., Fuchida, M. & Nakamura, A. Vision-based classification of mosquito species: Comparison of conventional and deep learning methods. Appl. Sci.9, 3935 (2019). [Google Scholar]
  • 19.Motta, D. et al. Application of convolutional neural networks for classification of adult mosquitoes in the field. PLOS ONE. 14, e0210829. 10.1371/journal.pone.0210829 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Couret, J. et al. Delimiting cryptic morphological variation among human malaria vector species using convolutional neural networks. PLoS Negl. Trop. Dis.14, e0008904 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Goodwin, A. et al. Mosquito species identification using convolutional neural networks with a multitiered ensemble model for novel species detection. Sci. Rep.11, 13656. 10.1038/s41598-021-92891-9 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Rustam, F. et al. Vector mosquito image classification using novel RIFS feature selection and machine learning models for disease epidemiology. Saudi J. Biol. Sci.29, 583–594. 10.1016/j.sjbs.2021.09.021 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Pise, R. & Patil, K. A. Deep transfer learning framework for the multi-class classification of vector mosquito species. J. Ecol. Eng.24 (2023).
  • 24.Kittichai, V. et al. Deep learning approaches for challenging species and gender identification of mosquito vectors. Sci. Rep.11, 4838. 10.1038/s41598-021-84219-4 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Zhao, D. et al. A Swin Transformer-based model for mosquito species identification. Sci. Rep.12, 18664. 10.1038/s41598-022-21017-6 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Azam, F. B. et al. Classifying stages in the gonotrophic cycle of mosquitoes from images using computer vision techniques. Sci. Rep.13, 22130 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Lee, S., Kim, H. & Cho, B. K. Deep Learning-Based Image Classification for Major Mosquito Species Inhabiting Korea. Insects14, 526 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Park, J., Kim, D. I., Choi, B., Kang, W. & Kwon, H. W. Classification and morphological analysis of vector mosquitoes using deep convolutional neural networks. Sci. Rep.10, 1012. 10.1038/s41598-020-57875-1 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Adhane, G., Dehshibi, M. M. & Masip, D. A deep convolutional neural network for classification of aedes albopictus mosquitoes. IEEE Access.9, 72681–72690 (2021). [Google Scholar]
  • 30.Dosovitskiy, A. et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020).
  • 31.Oquab, M. et al. Dinov2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193 (2023).
  • 32.Abd Alaziz, H. M. et al. Enhancing fashion classification with vision transformer (ViT) and developing recommendation fashion systems using DINOVA2. Electronics12, 4263 (2023).
  • 33.Zhang, Z. C., Chen, Z. D., Wang, Y., Luo, X. & Xu, X. S. A vision transformer for fine-grained classification by reducing noise and enhancing discriminative information. Pattern Recogn.145, 109979. 10.1016/j.patcog.2023.109979 (2024). [Google Scholar]
  • 34.Cui, B., Islam, M., Bai, L. & Ren, H. Surgical-DINO: Adapter learning of foundation model for depth estimation in endoscopic surgery. arXiv preprint arXiv:2401.06013 (2024). [DOI] [PMC free article] [PubMed]
  • 35.Tayebi Arasteh, S., Misera, L., Kather, J. N., Truhn, D. & Nebelung, S. Enhancing diagnostic deep learning via self-supervised pretraining on large-scale, unlabeled non-medical images. Eur. Radiol. Exp.8, 10. 10.1186/s41747-023-00411-3 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.World Health Organization. Pictorial identification key of important disease vectors in the WHO South-East Asia Region (2020).
  • 37.Rattanarithikul, R., Harrison, B. A., Panthusiri, P., Peyton, E. & Coleman, R. E. Illustrated keys to the mosquitoes of Thailand III. Genera aedeomyia, ficalbia, mimomyia, hodgesia, coquillettidia, mansonia, and uranotaenia. Southeast Asian J. Trop. Med. Public Health. 37, 1 (2006). [PubMed] [Google Scholar]
  • 38.World Health Organization. World malaria report 2023 (World Health Organization, 2023).
  • 39.World Health Organization. Global arbovirus initiative: Preparing for the next pandemic by tackling mosquito-borne viruses with epidemic and pandemic potential (2024).


