PLOS ONE. 2024 Apr 3;19(4):e0295474. doi: 10.1371/journal.pone.0295474

Insect detect: An open-source DIY camera trap for automated insect monitoring

Maximilian Sittinger*, Johannes Uhler, Maximilian Pink, Annette Herz
Editor: Ramzi Mansour
PMCID: PMC10990185  PMID: 38568922

Abstract

Insect monitoring is essential to design effective conservation strategies, which are indispensable to mitigate worldwide declines and biodiversity loss. For this purpose, traditional monitoring methods are widely established and can provide data with a high taxonomic resolution. However, processing of captured insect samples is often time-consuming and expensive, which limits the number of potential replicates. Automated monitoring methods can facilitate data collection at a higher spatiotemporal resolution with a comparatively lower effort and cost. Here, we present the Insect Detect DIY (do-it-yourself) camera trap for non-invasive automated monitoring of flower-visiting insects, which is based on low-cost off-the-shelf hardware components combined with open-source software. Custom trained deep learning models detect and track insects landing on an artificial flower platform in real time on-device and subsequently classify the cropped detections on a local computer. Field deployment of the solar-powered camera trap confirmed its resistance to high temperatures and humidity, which enables autonomous deployment during a whole season. On-device detection and tracking can estimate insect activity/abundance after metadata post-processing. Our insect classification model achieved a high top-1 accuracy on the test dataset and generalized well on a real-world dataset with captured insect images. The camera trap design and open-source software are highly customizable and can be adapted to different use cases. With custom trained detection and classification models, as well as accessible software programming, many possible applications surpassing our proposed deployment method can be realized.

Introduction

The worldwide decline in insect biomass, abundance and diversity has been reported in numerous studies in recent years [1–3]. To identify potential drivers of insect decline, quantify their impact and design effective conservation strategies, more long-term monitoring data across ecological gradients is needed, preferably at high spatiotemporal resolutions [4, 5]. Financial restrictions are often an obstacle to the implementation of large-scale monitoring schemes. The estimated costs for a national pollinator monitoring scheme range from ~525 € per site and year for volunteer floral observations with crowd-sourced identification to group level, up to ~8130 € for a professional monitoring with pan traps, transect walks, floral observations and expert identification to species level [6]. Due to these high costs, as well as the time-consuming processing and identification of insect specimens, traditional monitoring methods are usually deployed with a low number of spatial and/or temporal replicates. This often generates snapshot data with restricted potential for analysis and interpretation.

Non-invasive automated monitoring methods, which can acquire data at a high spatiotemporal resolution, can complement traditional monitoring methods that often generate data with a comparatively higher taxonomic resolution. When integrated into frameworks based on artificial intelligence (AI) approaches for rapid automated data processing and information extraction, large amounts of high-quality data can be collected with comparatively lower time and effort required [7]. Standardizing these automated monitoring methods and providing easy accessibility and reproducibility could furthermore decentralize monitoring efforts and strengthen the integration of independent biodiversity observations, also by non-professionals (Citizen Science) [8].

A range of different sensors can be used for automated insect monitoring [9]. These include acoustic [10] and opto-electronic sensors [11–13], as well as cameras [14–20]. Several low-cost DIY camera trap systems use scheduled video or image recordings, which are analyzed in subsequent processing steps [14, 15]. Other systems utilize motion detection software as trigger for the video or image capture [16, 18]. Similar to traditional camera traps used for the monitoring of mammals, these systems often generate large amounts of video/image data, which is most efficiently processed and analyzed by making use of machine learning (ML) and especially deep learning (DL) algorithms [21–23] to automatically extract information such as species identity, abundance or behavior [24]. While automated insect classification with specifically trained DL models does not yet reach the accuracy of taxonomic experts, results are still of sufficient quality to perform routine identification tasks in a fraction of the time, with the potential to significantly reduce human workload [25].

Small DL models with few parameters and relatively low computational costs can be run on suitable devices directly in the field (“AI at the edge”) to enable real-time detection and/or classification of objects the model was trained on. An existing camera trap for automated insect monitoring combines scheduled time-lapse image recordings at 0.33 fps (frames per second) with subsequent on-device insect detection and classification running in parallel, while tracking is implemented during post-processing on a local computer [17]. As an alternative approach, frames produced by the camera can also be used as direct input for an insect detection model running with a high frame rate in real time on the camera trap hardware. In this way, the appearance and detection of an insect can automatically trigger the image capture. If the model is optimized to detect a wide range of possibly occurring insects with a high accuracy, this is usually more robust to false triggers (e.g. moving background) compared to motion detection and can drastically reduce the amount of data that has to be stored. Smaller volumes of data subsequently enable faster and more efficient post-processing even on standard computer hardware. In contrast to systems utilizing cloud computing, there are no networking costs, no dependence on wireless broadband coverage, and lower power requirements. Furthermore, new possibilities arise with having more information immediately available on the device, especially regarding autonomous decision making (e.g. automatically adjusting recording times to capture more insects) or sending small-sized metadata with pre-processed information at the end of a recording interval.

Following this approach, we present the design and proof of concept of a novel DIY camera trap system for automated visual monitoring of flower-visiting insects. To the best of our knowledge, there is currently no system available that combines the real-time processing capabilities described in the following, with a completely open software environment, including simple no-code training of state-of-the-art detection and classification models that can be deployed and modified also by non-professionals. The system is based on low-cost off-the-shelf hardware components and open-source software, capable of on-device AI inference at the edge with custom trained models. Our goal was to develop a camera trap that could be easily utilized in monitoring projects involving citizen scientists to achieve a broader application potential. As such, it should be inexpensive, easy to assemble and set up, and provide reliable results without the requirement of expert taxonomic knowledge during data collection. Detailed step-by-step instructions to enable a simple reproduction and optional customization of the camera trap can be found at the corresponding documentation website [26].

The camera trap is resistant to high temperatures and humidity and is fully solar-powered, which makes it possible to deploy it autonomously during a whole season. Insects landing on a platform with colored flower shapes are detected and tracked in real time at ~12.5 fps, while an image of each detected insect is cropped from synchronized high-resolution frames (1920x1080 pixels) and saved together with relevant metadata to a microSD card every second. The insect images can then be automatically classified with a custom trained model in a subsequent step on a local computer. The on-device tracking capabilities were tested with a fast-flying hoverfly species (Episyrphus balteatus) in a lab experiment, and different metadata post-processing settings for activity/abundance estimation were compared. Five camera traps were continuously deployed over four months in 2023, with a total recording time of 1,919 hours. Captured images and metadata were classified, post-processed and analyzed with a focus on six different hoverfly species(-groups). The generalization capability of the classification model was validated on a real-world dataset of images captured during field deployment.

Materials and methods

Hardware

The camera trap system is based on three main hardware components: (1) OpenCV AI Kit (OAK-1) (Luxonis, Littleton, USA), including a 12MP image sensor and a specific chip for on-device AI inference with custom models; (2) Raspberry Pi Zero 2 W (RPi) (Raspberry Pi Foundation, Cambridge, UK), a single-board computer that is used as host for the OAK-1 and stores the captured data on a microSD card; (3) PiJuice Zero pHAT (Pi Supply, East Sussex, UK), an expansion board for the RPi with integrated RTC (real-time clock) for power management and scheduling of recording times.

Two rechargeable batteries (~91 Wh combined capacity) connected to a 9W 6V solar panel (Voltaic Systems, New York, USA) are used as power supply for the system. All electronic components are installed in a weatherproof enclosure, which can be mounted to a standard wooden or steel post (Fig 1B). A simple hardware schematic with more details about the connections between the individual components can be found in the supporting information (S1 Fig). Below the camera trap enclosure, a sheet (e.g. acrylic glass or lightweight foam board) with colored flower shapes printed on top is attached to the same post, which acts as a visual attractant and landing platform for flower-visiting insects (Fig 1A). This standardizable platform (e.g. 50x28 or 35x20 cm) provides a homogeneous plane background and leads to a uniform posture of insects sitting on it, which increases detection, tracking and classification accuracy with less data required for model training. The cost for the full camera trap setup including all components is ~700 €; a minimal setup can be built for ~530 €. A complete list of all required components and step-by-step assembly instructions can be found at the documentation website [26].

Fig 1. Camera trap design.


(A) Field deployment of the camera trap and flower platform (35x20 cm) on wooden post. (B) Weatherproof camera trap enclosure with integrated hardware and connected solar panel.

Software

All of the camera trap software and associated insect detection models are available at GitHub [27]. The software includes Python scripts for livestreaming of camera frames together with the detection model and/or object tracker output, and for capturing high-resolution frames (1080p/4K/12MP) or videos (1080p/4K). All scripts use auto focus, auto exposure and auto white balance for the OAK-1 camera by default. Optionally, the auto focus range can be set by providing the minimum and maximum distance to which the auto focus should be restricted (in centimeters measured from the camera). To set the auto exposure region, the area (bounding box) of a detected insect can be used optionally. Both settings were only implemented recently and were therefore not used during collection of the data presented in the following.

On-device insect detection

Together with the software, custom trained YOLOv5n [28], YOLOv6n [29], YOLOv7-tiny [30] and YOLOv8n [31] object detection models are provided, which can be run on the OAK-1 chip to enable real-time insect detection and tracking. The respective model weights, pre-trained on the MS COCO dataset [32], were fine-tuned on an image dataset, collected with a camera trap prototype in 2022 [33]. The dataset is composed of 1,335 images, of which 1,225 contain at least one annotation (110 background images). A total of 2,132 objects were annotated, including 664 wasps, 454 flies, 297 honey bees, 297 other arthropods, 233 shadows of insects and 187 hoverflies. For detection model training, all originally annotated classes were merged into one generic class (“insect”), the original 4K resolution (3840x2160 pixel) of the images was downscaled and stretched to 320x320 pixel, and the dataset was randomly split into train (1,069 images), validation (133 images) and test (133 images) subsets with a ratio of 0.8/0.1/0.1.
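As an illustration of the dataset preparation described above, the following minimal Python sketch performs a random 0.8/0.1/0.1 split into train, validation and test image lists. The directory layout and file names are assumptions for illustration only and do not reflect the authors' actual dataset structure or the provided Colab notebooks.

```python
import random
from pathlib import Path

# Minimal sketch of a random 0.8/0.1/0.1 train/val/test split
# (paths and file names are placeholders, not the actual dataset layout)
random.seed(42)
images = sorted(Path("dataset/images").glob("*.jpg"))
random.shuffle(images)

n = len(images)
train = images[: int(0.8 * n)]
val = images[int(0.8 * n): int(0.9 * n)]
test = images[int(0.9 * n):]

# YOLOv5 accepts plain text files listing image paths per split in the dataset .yaml
for name, files in {"train": train, "val": val, "test": test}.items():
    Path(f"dataset/{name}.txt").write_text("\n".join(str(f) for f in files) + "\n")

# Fine-tuning can then be run with the standard YOLOv5 CLI, e.g.:
# python train.py --img 320 --batch 32 --epochs 300 --data insect.yaml --weights yolov5n.pt
```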

The provided YOLO models are general insect detectors and can also detect insects not included in the dataset and/or on other homogeneous backgrounds (e.g. variations of the artificial flower platform design). Due to the downscaled input image size of 320x320 pixel, the models can achieve a high on-device inference speed, while still keeping good precision and recall values (Table 1). The YOLOv5n model achieved the highest mAP (mean average precision) and recall on the dataset validation split and is used as default model by the camera trap software, with an IoU (intersection over union) and confidence threshold of 0.5 respectively. Google Colab notebooks are provided at GitHub [34] to reproduce the model training and validation, or train detection models on custom datasets.

Table 1. Metrics of the YOLO insect detection models.

Model | Image size [pixel] | mAP (val, @0.5 IoU) | Precision (val) | Recall (val) | Speed on OAK-1 [fps] | Parameters (million)
YOLOv5n | 320 | 0.969 | 0.955 | 0.961 | 49 | 1.76
YOLOv6n | 320 | 0.951 | 0.969 | 0.898 | 60 | 4.63
YOLOv7-tiny | 320 | 0.957 | 0.947 | 0.942 | 52 | 6.01
YOLOv8n | 320 | 0.944 | 0.922 | 0.899 | 39 | 3.01

All models were trained on a custom dataset with 1,335 images (1,069 in train split) for 300 epochs with batch size 32 and default hyperparameters. Metrics (mAP, precision, recall) are shown on the dataset validation split (133 images) for the original PyTorch (.pt) models before conversion to .blob format. Speed (fps) is shown for the converted .blob models running on the OAK-1 connected to the RPi Zero 2 W.

On-device processing pipeline

The main Python script for continuous automated insect monitoring uses a specific processing pipeline to run insect detection on downscaled LQ (low-quality) frames (320x320 pixel). This increases the inference speed of the model and thereby the accuracy of the object tracker, which uses the coordinates from the model output as input for tracking insects. The object tracker is based on the Kalman Filter and Hungarian algorithm [35, 36], which can keep track of a moving object by comparing the bounding box coordinates of the current frame with the object’s trajectory on previous frames. By assigning a unique ID to each insect landing on or flying above the flower platform, multiple counting of the same individual can be avoided as long as it is present in the frame. As the lack of visual features on LQ frames would significantly decrease classification accuracy, all tracked detections are synchronized with HQ (high-quality) frames (1920x1080 or 3840x2160 pixel) on-device in real time (Fig 2). The detected and tracked insects are then cropped from the synchronized HQ frames by utilizing the bounding box coordinates (Fig 3). The cropped detections are saved as individual .jpg images to the microSD card of the RPi, together with relevant metadata including timestamp, confidence score and tracking ID, at an image capture frequency of ~1 s. The image capture frequency can be adjusted optionally, without decreasing the overall pipeline and inference speed. Due to the synchronization of the tracker/model output with the HQ frames, the whole pipeline runs at ~12.5 fps for 1080p HQ frame resolution, or ~3.4 fps for 4K HQ frame resolution. Due to the higher possible inference speed of the detection model at 1080p HQ frame resolution, a higher tracking accuracy can be expected for fast-moving insects, potentially resulting in a more precise activity/abundance estimation. By using 4K HQ frame resolution, more details can be preserved in the cropped insect images, which can result in a higher classification accuracy, especially for small species. This tradeoff should be carefully considered when selecting the HQ frame resolution.
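The following sketch outlines how such a pipeline could be wired with the depthai Python API: detection runs on the downscaled preview (LQ) stream, an on-device object tracker consumes the detections, and the full-resolution video (HQ) stream is sent to the host for cropping. The blob path, frame rate, tracker type and queue handling are assumptions for illustration; the actual camera trap script additionally synchronizes HQ frames with the tracker output and handles image saving, metadata logging and scheduled shutdown.

```python
import depthai as dai

pipeline = dai.Pipeline()

# Color camera: 1080p HQ video stream plus a downscaled 320x320 preview (LQ) stream
cam = pipeline.create(dai.node.ColorCamera)
cam.setResolution(dai.ColorCameraProperties.SensorResolution.THE_1080_P)
cam.setPreviewSize(320, 320)                 # LQ frames used as detection model input
cam.setPreviewKeepAspectRatio(False)         # stretch instead of crop, as for training images
cam.setInterleaved(False)
cam.setFps(25)                               # assumed frame rate, not the authors' setting

# YOLO detection network running on-device with the converted .blob model
nn = pipeline.create(dai.node.YoloDetectionNetwork)
nn.setBlobPath("yolov5n_insect_320.blob")    # placeholder path to a converted model
nn.setConfidenceThreshold(0.5)
nn.setIouThreshold(0.5)
nn.setNumClasses(1)                          # single generic "insect" class
nn.setCoordinateSize(4)
# anchors/masks and other YOLO-version-specific settings omitted for brevity
cam.preview.link(nn.input)

# On-device object tracker fed with the detections (tracker type is an assumption)
tracker = pipeline.create(dai.node.ObjectTracker)
tracker.setTrackerType(dai.TrackerType.ZERO_TERM_IMAGELESS)
tracker.setTrackerIdAssignmentPolicy(dai.TrackerIdAssignmentPolicy.UNIQUE_ID)
nn.passthrough.link(tracker.inputTrackerFrame)
nn.passthrough.link(tracker.inputDetectionFrame)
nn.out.link(tracker.inputDetections)

# Outputs to the host: tracklets and HQ video frames for cropping
xout_track = pipeline.create(dai.node.XLinkOut)
xout_track.setStreamName("track")
tracker.out.link(xout_track.input)

xout_video = pipeline.create(dai.node.XLinkOut)
xout_video.setStreamName("video")
cam.video.link(xout_video.input)

with dai.Device(pipeline) as device:
    q_track = device.getOutputQueue("track", maxSize=4, blocking=False)
    q_video = device.getOutputQueue("video", maxSize=4, blocking=False)
    # ... match tracklets to HQ frames, crop detections and save .jpg images + metadata
```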

Fig 2. Diagram of the processing pipeline.


HQ frames (1080p or 4K) are downscaled to LQ frames (320x320 pixel), while both HQ and LQ frames run in two parallel streams. The LQ frames are used as input for the insect detection model. The object tracker uses the coordinates from the model output to track detected insects and assign unique tracking IDs. Detected insects are cropped from synchronized HQ frames in real time and saved to the microSD card of the RPi together with relevant metadata (including timestamp, tracking ID, coordinates).

Fig 3. LQ frame and synced HQ frame with cropped detection.


Downscaled LQ frames (320x320 pixel) are used as model input. Detected insects are cropped from synchronized HQ frames (1080p or 4K) on device.

The power consumption of the system was measured at room temperature with a USB power measuring device (JT-UM25C, SIMAC Electronics GmbH, Neukirchen-Vluyn, Germany) connected between the Voltaic 12,800 mAh battery and the PiJuice Zero pHAT. The solar panel and PiJuice 12,000 mAh battery were not connected to the system for this test, to prevent both components from influencing the measurement (e.g. additional charging of the PiJuice battery). The mean peak power consumption of the camera trap system is ~4.4 W while constantly detecting and tracking five insects simultaneously and saving the detections cropped from 1080p HQ frames to the RPi microSD card every second (Fig 4). With a combined battery capacity of ~91 Wh, recordings can be run for ~20 hours, even if no sunlight is available for the solar panel to recharge the batteries. The PiJuice Zero pHAT is used for efficient power management. If the PiJuice battery charge level drops below a specified threshold, the RPi will immediately shut down before starting a recording. The respective duration of each recording interval is conditional on the current charge level, which can prevent gaps in monitoring data during periods with less sunlight. The scheduled recording times, charge level thresholds and recording durations can be easily adjusted for each use case.
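As a rough plausibility check, the stated ~20 h runtime corresponds to the combined battery capacity divided by the mean peak power consumption:

$$t \approx \frac{91\ \text{Wh}}{4.4\ \text{W}} \approx 20.7\ \text{h}$$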

Fig 4. Power consumption of the camera trap system.


The two power spikes at ~5W represent the start and end of the recording. Power consumption was measured during a 5 min recording interval while constantly detecting and tracking five insects, and saving the cropped detections together with metadata to the RPi microSD card every second.

The insect images are classified in a subsequent step, followed by post-processing of the metadata. This two-stage approach makes it possible to achieve high detection and tracking accuracy in real time on-device (OAK-1), combined with a high classification accuracy of the insect images cropped from HQ frames on a local computer.

Insect classification

The data output from the automated monitoring pipeline includes insect images together with associated .csv files containing metadata for further analysis. Classification of the insect images is done in a subsequent step on a local computer, after the camera trap data has been collected from the microSD card. A modified YOLOv5 [28] script that supports several classification model architectures is used to classify all insect images captured with the camera trap and append the classification results (top-3 classes and probabilities) as new columns to the merged metadata. The script is available at GitHub [37] together with other modified YOLOv5 scripts, including classification model training and validation.

For training of the insect classification model, we used model weights, pre-trained on the ImageNet dataset [38]. The pre-trained weights were fine-tuned on a custom image dataset, mainly containing cropped detections captured by six camera traps [39]. The dataset is composed of 21,000 images, of which 18,597 images contain various arthropods (mainly insects). We sorted all images to 27 different classes, including four classes without insects (“none_*”). These additional classes are used to filter out images of background, shadows, dirt (e.g. leaves and bird droppings) and birds that can be captured by the camera trap. Most of the images were detections cropped from 1080p HQ frames, which were automatically collected with six different camera traps deployed in the field in 2022 and 2023. Additional images of some classes with insufficient occurrences in the field data (including “ant”, “bee_bombus”, “beetle_cocci”, “bug”, “bug_grapho”, “hfly_eristal”, “hfly_myathr”, “hfly_syrphus”) were automatically captured (cropped from 1080p HQ frames) with a lab setup of the camera trap hardware to increase accuracy and generalization capability of the model. Detailed descriptions of all classes, which were not only chosen by taxonomical but also visual distinctions, can be found in S1 Table and example images are shown in Fig 5.

Fig 5. Example images of the 27 classes from the dataset for classification model training.


All images were captured automatically by the camera trap and are shown unedited, but resized to the same dimension. Original pixel values are plotted on the x- and y-axis for each image. Four example images are shown for the class “other”, to account for its high intraclass heterogeneity.

For model training, the dataset was randomly split into train (14,686 images), validation (4,189 images) and test (2,125 images) subsets with a ratio of 0.7/0.2/0.1. We compared the metrics of three different model architectures (YOLOv5-cls [28], ResNet-50 [40], EfficientNet-B0 [41]) supported by YOLOv5 classification model training and different hyperparameter settings to find the combination with the highest accuracy on our dataset (S2 Table). We selected the EfficientNet-B0 model trained to 20 epochs with batch size 64 and images scaled to 128x128 pixel, as it achieved an overall high accuracy on the dataset validation and test split, while generalizing better to a real-world dataset compared to the same model trained to only ten epochs. With this model, high top-1 accuracies on the test dataset for all insect classes could be achieved (Fig 6, Table 2). Only three classes (“beetle”, “bug”, “other”) did not reach a top-1 accuracy > 0.9, probably because of a high visual intraclass heterogeneity and not enough images in the dataset.

Fig 6. Normalized confusion matrix for the EfficientNet-B0 insect classification model, validated on the dataset test split.


The cell values show the proportion of images classified to each predicted class (x-axis) relative to the total number of images per true class (y-axis). The model was trained on a custom dataset with 21,000 images (14,686 in train split). Metrics are shown on the dataset test split (2,125 images) for the converted model in ONNX format.

Table 2. Metrics of the EfficientNet-B0 insect classification model, validated on the dataset test split.

Class | Images (test) | Top-1 accuracy (test) | Precision (test) | Recall (test) | F1 score (test)
all | 2125 | 0.972 | 0.971 | 0.967 | 0.969
ant | 111 | 1.0 | 0.991 | 1.0 | 0.996
bee | 107 | 0.963 | 0.972 | 0.963 | 0.967
bee_apis | 31 | 1.0 | 0.969 | 1.0 | 0.984
bee_bombus | 127 | 1.0 | 0.992 | 1.0 | 0.996
beetle | 52 | 0.885 | 0.92 | 0.885 | 0.902
beetle_cocci | 78 | 0.987 | 1.0 | 0.987 | 0.994
beetle_oedem | 21 | 0.905 | 0.905 | 0.905 | 0.905
bug | 39 | 0.846 | 1.0 | 0.846 | 0.917
bug_grapho | 19 | 1.0 | 1.0 | 1.0 | 1.0
fly | 173 | 0.971 | 0.944 | 0.971 | 0.957
fly_empi | 19 | 1.0 | 1.0 | 1.0 | 1.0
fly_sarco | 33 | 0.909 | 0.938 | 0.909 | 0.923
fly_small | 167 | 0.958 | 0.952 | 0.958 | 0.955
hfly_episyr | 253 | 0.996 | 0.996 | 0.996 | 0.996
hfly_eristal | 197 | 0.99 | 0.995 | 0.99 | 0.992
hfly_eupeo | 137 | 0.985 | 0.993 | 0.985 | 0.989
hfly_myathr | 60 | 1.0 | 1.0 | 1.0 | 1.0
hfly_sphaero | 39 | 0.974 | 1.0 | 0.974 | 0.987
hfly_syrphus | 50 | 0.98 | 1.0 | 0.98 | 0.99
lepi | 24 | 1.0 | 0.96 | 1.0 | 0.98
none_bg | 86 | 0.988 | 0.966 | 0.988 | 0.977
none_bird | 8 | 1.0 | 1.0 | 1.0 | 1.0
none_dirt | 85 | 0.976 | 0.902 | 0.976 | 0.938
none_shadow | 66 | 0.924 | 0.953 | 0.924 | 0.938
other | 79 | 0.861 | 0.883 | 0.861 | 0.872
scorpionfly | 12 | 1.0 | 1.0 | 1.0 | 1.0
wasp | 52 | 1.0 | 1.0 | 1.0 | 1.0

The model was trained on a custom dataset with 21,000 images (14,686 in train split). Metrics are shown on the dataset test split (2,125 images) for the converted model in ONNX format.

The insect classification model was exported to ONNX format, which enables faster inference speed on standard CPUs (~10–20 ms per image), and is available at GitHub [34]. To reproduce the model training and validation, or train classification models on custom datasets, Google Colab notebooks are provided in the same repository [34].
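As a sketch of how the exported model could be applied to a single cropped insect image with ONNX Runtime on the CPU: the file names, input size (128x128 pixel) and normalization are assumptions for illustration and may differ from the preprocessing used by the modified YOLOv5 classification script.

```python
import numpy as np
import onnxruntime as ort
from PIL import Image

# Load the exported classification model on the CPU (placeholder file name)
session = ort.InferenceSession("efficientnet-b0_insect.onnx",
                               providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name

# Load and preprocess one cropped insect image (assumed simple 0-1 scaling)
img = Image.open("cropped_insect.jpg").convert("RGB").resize((128, 128))
x = np.asarray(img, dtype=np.float32) / 255.0   # scale pixel values to [0, 1]
x = x.transpose(2, 0, 1)[np.newaxis, ...]        # HWC -> NCHW with batch dimension

# Run inference and convert class scores to probabilities (softmax)
logits = session.run(None, {input_name: x})[0][0]
exp = np.exp(logits - logits.max())
probs = exp / exp.sum()

# Top-3 classes and probabilities, as appended to the metadata
top3 = probs.argsort()[::-1][:3]
print(top3, probs[top3])
```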

Metadata post-processing

When the insect classification is run, all available metadata files from the camera trap output are merged and the top-3 classification results for each insect image are appended as new columns. In most cases, the merged metadata still contains multiple rows for each insect with a unique tracking ID, with the number of rows depending on the image capture frequency (default: ~1 s) and on the duration the insect was present in the frame. With multiple captured images for each individual insect, classification of the images can result in different top-1 classes for the same tracking ID. The Python script for post-processing and analysis [34] is used to generate the final metadata file that can then be used for further data analysis. The weighted mean classification probability is calculated to determine the top-1 class with the overall highest probability for each tracking ID, by calculating the mean probability of each top-1 class per tracking ID and multiplying it by the proportion of images classified to the respective top-1 class relative to the total number of images per tracking ID:

$$\text{weighted mean probability}_{\text{top-1 class}} = \text{mean}\!\left(\text{probability}_{\text{top-1 class}}\right) \times \frac{\#\,\text{images}_{\text{top-1 class}}}{\#\,\text{images}_{\text{tracking ID}}}$$

Only the top-1 class with the highest weighted mean classification probability per tracking ID is kept in the final metadata file, in which each row corresponds to an individual tracked insect.

To make the activity/abundance estimations more reliable, all individual insects (= unique tracking IDs) with less than three or more than 1,800 images are excluded by default. These values can be adjusted optionally, e.g. if a different image capture frequency is used and/or to make the final data more robust to inaccurate tracking of fast-moving insects. Such inaccurate tracking can result in “jumping” tracking IDs, as the limited inference speed of the detection model provides the object tracker with bounding box coordinates at a frequency that is too low to keep track of the fast-moving insect (S2B Fig). In this case, the track of the insect is lost and a new tracking ID is assigned to the same individual. With the default image capture frequency of ~1 s, an insect had to be tracked for at least slightly over two seconds to be kept in the final dataset. The default upper limit of 1,800 images per tracking ID removes all IDs that were tracked for more than ~30–40 min at the default image capture frequency. This maximum duration depends on the number of simultaneous detections, which can slightly decrease the capture frequency over time. From our experience, objects tracked for > 30 min are most often non-insect detections, e.g. leaves fallen on the flower platform.
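A minimal pandas sketch of this post-processing logic is shown below, assuming a merged metadata table with columns "track_ID", "top1" (predicted class) and "top1_prob" (classification probability); the file and column names are assumptions and may differ from the actual camera trap output.

```python
import pandas as pd

# Merged metadata with one row per captured image (placeholder file name)
df = pd.read_csv("metadata_classified_merged.csv")

# Exclude tracking IDs with fewer than 3 or more than 1,800 images (default thresholds)
n_per_track = df.groupby("track_ID")["top1"].transform("size")
df = df[(n_per_track >= 3) & (n_per_track <= 1800)]

# Weighted mean classification probability per tracking ID and top-1 class:
# mean probability of that class multiplied by the proportion of the track's
# images that were assigned to that class
per_class = (df.groupby(["track_ID", "top1"])["top1_prob"]
               .agg(["mean", "size"])
               .reset_index())
total_images = per_class.groupby("track_ID")["size"].transform("sum")
per_class["weighted_prob"] = per_class["mean"] * per_class["size"] / total_images

# Keep only the top-1 class with the highest weighted mean probability per tracking ID
final = (per_class.sort_values("weighted_prob", ascending=False)
                  .drop_duplicates("track_ID")
                  .loc[:, ["track_ID", "top1", "weighted_prob"]])
print(final.head())
```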

For estimation of the respective insect size, the absolute bounding box sizes in millimeters are calculated by supplying the true frame width and height (e.g. flower platform dimensions).
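A hypothetical helper for this conversion, assuming normalized bounding box coordinates and a flower platform that fills the entire frame (the 35x20 cm platform corresponds to a true frame size of 350 x 200 mm):

```python
def bbox_size_mm(x_min, y_min, x_max, y_max, frame_width_mm=350, frame_height_mm=200):
    """Convert normalized bounding box coordinates to an approximate insect size in mm."""
    return (x_max - x_min) * frame_width_mm, (y_max - y_min) * frame_height_mm

# Example: a detection spanning 6% of the frame width and 9% of its height
print(bbox_size_mm(0.40, 0.55, 0.46, 0.64))  # approximately (21.0 mm, 18.0 mm)
```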

Insect tracking evaluation

The accuracy of the object tracker was tested in a lab experiment. The camera trap hardware was installed 40 cm above the big flower platform (50x28 cm), so that the platform filled the whole camera frame, and both were placed in the middle of a 200x180x155 cm insect cage. The script for continuous automated insect monitoring was run for 15 min for each recording interval. We used the slower synchronization with 4K HQ frame resolution (~3.4 fps) to test the more difficult use case for the object tracker, compared to the faster 1080p resolution (~12.5 fps). Fifteen Episyrphus balteatus hoverflies (reared under lab conditions) were released inside the cage and recorded for 15 min in each of the ten replications. The flower platform was simultaneously filmed with a smartphone camera (1080p, 30 fps) during each recording interval. The videos were afterwards played back at reduced speed (25–50%) and all frame/platform visits of the hoverflies were counted manually. The captured metadata was post-processed with ten different settings for the minimum number of images required per tracking ID to be kept in the final data output. The true platform visits from the video count were then compared with the number of unique tracking IDs for each setting.

Insect classification validation

While the metrics of the custom trained EfficientNet-B0 classification model show a high accuracy on the dataset test split, the underlying dataset was curated and insect images were selected in such a way as to include images where the respective class was clearly identifiable, as well as some images with more difficult cases, e.g. with only a part of the insect visible. For a more realistic measure of the classification accuracy and an estimation of the generalization capability of the model, we compiled a dataset with images captured during field deployment of five camera traps between August 11 and September 18, 2023. All images were automatically cropped detections from 1080p HQ frames. No images from this recording period were included in the dataset for classification model training. All 93,215 images captured by camera trap 1 were classified and subsequently manually verified, to ensure that all images were sorted to the correct class. As some classes were only present with a small number of images, more images from camera traps 2–5 were added to achieve a better class balance, resulting in a total dataset size of 97,671 images. All images from each class were added to the dataset, including false negative classifications (image of target class classified to wrong class) and false positive classifications (image of other class classified wrongly to target class). No images were removed, in order to reflect a real-world use case, which means that many nearly identical images are present in the dataset (e.g. insects moving only slightly or non-insect detections such as leaves) and the overall class balance is biased towards classes with more captured images. For two classes (“bug_grapho”, “fly_empi”), no images were captured during the selected recording period. To run the model validation, a dummy image for each of these classes was added to the dataset and thus results for both classes must be ignored.

Field deployment

Starting in mid-May 2023, five camera traps were deployed at five different sites in southwestern Germany. All sites were located within a range of 50 km and separated by at least 5 km, with an elevation ranging from 90–170 m.a.s.l. As study sites, extensively managed orchard meadows were chosen, where an insect monitoring with focus on hoverflies using traditional methods (Malaise traps, yellow pan traps) was also being conducted. While the area, age and tree density differed between the sites, all meadows were cut at least once per year.

All camera traps were positioned in full sunlight with the solar panel and flower platform facing to the south. Scheduled recording intervals with a respective duration of 40 min (if the current battery charge level was > 70%) were run four to seven times per day, resulting in a recording duration of 160 to 280 min for each camera trap per day. Recordings started at 6, 7, 8, 10, 11, 14, 16, 18 and 19 o’clock, with the start time differing per month and in some cases also per camera trap. Due to differences in the date of first deployment (camera trap 3, camera trap 5) and initial hardware problems (camera trap 1), the total recording times differed between all camera traps (Table 3). To increase power efficiency and avoid potential recording gaps during times of low sunlight and decreased charging of the batteries, the respective duration of each recording interval is conditional on the current battery charge level. This additionally influenced the total recording time per camera trap, depending on the amount of sunlight available to charge the batteries. During field deployment, the artificial flower platform was changed at least once for each camera trap (except camera trap 5) to test different sizes, materials and designs (flower shapes and colors). Therefore, the results presented below should be interpreted with a certain degree of caution in relation to the insect monitoring data, and should be considered primarily as proof of concept for the camera trap system.

Table 3. Recording times of the five deployed camera traps and site details of the orchard meadows.

Camera trap ID | Total recording time [h] | Site ID | Coordinates | Area [ha]
Camtrap 1 | 362 | Gross-Rohrheim | 49.718741, 8.509769 | 1.4
Camtrap 2 | 456 | Boehl-Iggelheim | 49.379616, 8.330233 | 0.7
Camtrap 3 | 349 | Malsch | 49.236468, 8.670059 | 0.9
Camtrap 4 | 421 | Bensheim | 49.703971, 8.590160 | 1.7
Camtrap 5 | 330 | Dossenheim | 49.446449, 8.649950 | 5.4

Data visualization

Data analysis and creation of plots presented in the following was conducted with R version 4.3.1 [42] and the R packages tidyverse version 2.0.0 [43], patchwork version 1.1.3 [44] and viridis version 0.6.4 [45]. All required R scripts and associated data to reproduce the plots are available at Zenodo [46].

Results

Insect tracking evaluation

With 10 replications of 15 min recording intervals, the object tracker accuracy was tested together with cropping detected insects from synchronized 4K HQ frames (~3.4 fps) every second. High activity with many frame/platform visits of the hoverflies during the 15 min recording interval resulted in a decreased tracking accuracy, mainly due to hoverflies flying fast and erratically and/or coming close to each other (S2 Fig). This behavior can lead to “jumping” tracking IDs and consequently multiple counting of the same individual, which is reflected in the number of unique tracking IDs if none are removed during post-processing with the setting “-min_tracks 1” (Fig 7). By excluding tracking IDs with less than a specified number of images during post-processing, the activity/abundance estimation can become more robust. Even with a decreased pipeline speed of ~3.4 fps during synchronization with 4K HQ frames, an exclusion of tracking IDs with less than six images (“-min_tracks 6”) led to a relatively precise estimation of hoverfly activity/abundance in the data from the lab experiment (Fig 7).

Fig 7. Evaluation of the insect tracking accuracy in a lab experiment.


Data from 10 replications of 15 min recording intervals with 15 E. balteatus hoverflies placed in a cage with the camera trap and flower platform is shown. Linear regression lines illustrate the effect of 10 different tracking ID filter settings for post-processing of the captured metadata of each recording interval. With “-min_tracks 1” no tracking IDs are excluded. The dashed line indicates the optimal result. True frame visits were manually counted from video recordings of the flower platform.

Insect classification validation

To estimate the generalization capability of the custom trained EfficientNet-B0 insect classification model, a dataset with images captured between August 11 and September 18, 2023 was compiled. For some classes (“beetle”, “bug”, “other”, “wasp”), classification accuracy was very low (Fig 8, S3 Table). For the classes “beetle_cocci” and “beetle_oedem”, some of the images were classified as “beetle”, which is the correct taxonomic order and could therefore be analyzed and interpreted without drawing overly wrong conclusions. In cases of uncertainty and wrong model predictions, images seem to have a tendency to be classified to the class “other”, which has a high intraclass heterogeneity in the training dataset. For our focus group of hoverflies (“hfly_*”), a high classification accuracy was achieved for all species(-group) classes. Some images of Syrphus sp. hoverflies (“hfly_syrphus”) were wrongly predicted as Eupeodes sp. (“hfly_eupeo”), which could be explained by the high visual similarity of the two genera. Overall, classes with a high visual intraclass heterogeneity, such as “beetle”, “bug” and “other”, could not be classified with sufficient accuracy in the real-world dataset. Classes with less intraclass heterogeneity achieved a high classification accuracy in most cases.

Fig 8. Normalized confusion matrix for the EfficientNet-B0 insect classification model, validated on a real-world image dataset.


The cell values show the proportion of images classified to each predicted class (x-axis) relative to the total number of images per true class (y-axis). Metrics are shown on a real-world dataset (97,671 images) for the converted model in ONNX format. All images were classified and subsequently verified and sorted to the correct class in the case of a wrong classification by the model. A dummy image was added for each of the classes “bug_grapho” and “fly_empi”, as no images of both were captured. Results for both classes must be ignored.

Field deployment

Five camera traps were deployed at five different sites in southwestern Germany, starting in mid-May 2023. The capability of the camera traps to withstand high temperatures and humidity was tested, as well as the performance of the solar panel and two connected batteries used as power supply for the system. The captured insect images were classified with the custom trained EfficientNet-B0 classification model, and metadata was post-processed with the default settings to exclude all tracking IDs with less than three or more than 1,800 images.

Weather resistance

The maximum air temperature measured at the nearest weather station during deployment of the camera traps was 37.5°C in July. The maximum temperature measured during the camera trap recordings was 81°C for the OAK-1 CPU and 66°C for the RPi CPU (Fig 9). Both measurements still lie in the safe operating temperature range for these devices and a normal functionality of the camera trap can be expected if the air temperature does not exceed ~38°C.

Fig 9. Maximum air and OAK-1/RPi CPU temperatures per day.


Weather data was taken from the nearest weather station (source: Deutscher Wetterdienst).

For camera trap 5, a humidity/temperature USB logger was placed inside the enclosure from July to mid-August. Even during days with a high mean air humidity, measured at the nearest weather station, the mean humidity inside the enclosure only increased slightly to a maximum of ~15% (Fig 10). For all five camera traps, no buildup of moisture during the recording time until mid-September could be noticed, even without exchanging the originally installed 50g Silica gel pack.

Fig 10. Mean humidity per day measured inside the enclosure and at nearest weather station.


Weather data was taken from the nearest weather station (source: Deutscher Wetterdienst).

Two rechargeable batteries connected to a 9W solar panel are used as power supply for the camera trap system. During sunny days with a sunshine duration of ~8–12 h per day, the charge level of the PiJuice battery stayed relatively constant at > 80% for all five camera traps (Fig 11). A drop in the battery charge level can be noticed during end of July and end of August, when the sunshine duration per day decreased to ~0–4 h for several days. Since the duration of each recording interval is conditional on the current charge level, decreased recording durations preserved battery charge even when less sunlight was available to recharge the batteries. A fast recovery to charge levels > 80% restored the normal recording durations within a few days, after the sunshine duration increased again.

Fig 11. Mean PiJuice battery charge level and sum of the sunshine duration per day.


The PiJuice battery was charged by a second battery, connected to a 9W solar panel. Weather data was taken from the nearest weather station (source: Deutscher Wetterdienst).

Insect monitoring data

During a recording time of ~1919 h of all five camera traps combined, a total of ~2.34 million images with 49,583 unique tracking IDs were captured between mid-May and mid-September 2023. After post-processing of the metadata with default settings, 23,900 tracking IDs with less than three images (= tracked less than ~2 s) and 85 tracking IDs with more than 1,800 images (= tracked longer than ~30–40 min) were removed from the final dataset, resulting in ~2.03 million images with 25,598 unique tracking IDs.

Out of the 25,598 unique tracking IDs, 8,677 were classified to one of the non-insect classes (“none_bg”, “none_bird”, “none_dirt”, “none_shadow”) (S3 Fig). Flies (“fly_small”, “fly”) were the most frequently captured insects, followed by wild bees excluding Bombus sp. (“bee”) and other arthropods (“other”) (Fig 12F). For hoverflies, 1,090 unique tracking IDs of E. balteatus (“hfly_episyr”), followed by 672 unique tracking IDs of Eupeodes corollae or Scaeva pyrastri (“hfly_eupeo”) and 220 unique tracking IDs of the other four hoverfly classes were recorded by all camera traps. Differences in the composition of the captured insects between the camera traps can be observed, which could have been influenced by site conditions and/or by different flower platforms used for testing purposes during field deployment.

Fig 12. Total number of unique tracking IDs for each predicted insect class.


(A–E) Data from camera traps 1–5. (F) Merged data from all camera traps. Data of non-insect classes not shown. All tracking IDs with less than three or more than 1,800 images were removed.

When an insect leaves the frame/platform and re-enters it again, a new tracking ID is assigned, at the risk of counting the same individual multiple times, which could influence the activity/abundance estimations. To assess this risk, the minimum time difference to the previous five tracking IDs that were classified as the same insect class was calculated for each unique tracking ID. A total of 3,150 tracking IDs showed a time difference of less than five seconds to the previous tracking ID that was classified as the same class (Fig 13). This equals ~18.6% of the 16,921 total tracking IDs classified as insects in our present dataset, some of which might have been the same individuals re-entering the frame.

Fig 13. Time difference to previous tracking ID classified as the same class.


Only minimum time differences of < 30 s to the previous five tracking IDs classified as the same class are shown. Data of non-insect classes not shown. All tracking IDs with less than three or more than 1,800 images were removed.

Data from the images classified as one of the hoverfly classes is presented in more detail in the following plots, as an example of a functional group of special interest. Overall, the number of captured hoverfly tracking IDs varied throughout May and June, but also the total recording time per day was comparatively low during this period (Fig 14F). A peak in late June/early July is followed by steady numbers throughout July. Fewer hoverflies were recorded in August, though the recording time per day was also decreased during the first half of the month, due to lower battery charge levels caused by less available sunshine. Significantly lower numbers of hoverflies were captured by camera trap 5 compared to the other four camera traps (Fig 14E). While the maximum recording time per day was normally 280 min for each camera trap, an overlap of changed recording start times resulted in 320 min total recording time per day for camera trap 3 on June 28 (Fig 14C).

Fig 14. Total number of unique tracking IDs classified as hoverfly and recording time per day.


(A–E) Data from camera traps 1–5. (F) Merged data from all camera traps. Grey lines/areas indicate the recording time per day. Dashed lines indicate a change of the flower platform. All tracking IDs with less than three or more than 1,800 images were removed.

Given the differences in the total recording time per day between days and camera traps, the activity, calculated as the number of unique tracking IDs per hour of active recording, can be a more adequate estimate of hoverfly activity/abundance. While the range of hoverfly activity was mostly similar between all camera traps, a peak in activity on the first day of deployment (May 16) can be observed for camera trap 1 (Fig 15A). In total, 41 unique tracking IDs of hoverflies were captured during a single 40 min recording interval on that day, which extrapolates to 62 tracking IDs per hour. This initially high activity could have been caused by a smaller group of hoverflies visiting the platform multiple times due to a higher attractiveness induced by novelty. The highest hoverfly activity for a prolonged period was recorded by camera trap 3 in early July (Fig 15C). A decrease in hoverfly activity for all camera traps in late July/early August could have been caused by more rainfall during this time (Fig 15F).

Fig 15. Number of unique tracking IDs classified as hoverfly per hour and precipitation sum per day.


(A–E) Data from camera traps 1–5. (F) Merged data from all camera traps. Shaded areas indicate days without recordings. All tracking IDs with less than three or more than 1,800 images were removed. Weather data was taken from the nearest weather station (source: Deutscher Wetterdienst).

The start of the recording intervals was scheduled at 6, 7, 8, 10, 11, 14, 16, 18 and 19 o’clock, with the respective start times differing per month and in some cases also per camera trap. This resulted in differences in the total recording time per hour, with most recordings available from 8, 10, 16 and 18 o’clock, and only a few recordings available from 14 o’clock (S4 Fig). Overall, the highest hoverfly activity was measured at 8 and 10 o’clock, with approximately two unique hoverfly tracking IDs captured per hour on average (Fig 16F). About one hoverfly was captured per hour at 7 and 16 o’clock. A similar pattern of higher activity during the hours before noon can also be seen for bees (S5 Fig) and flies (S6 Fig).

Fig 16. Estimated hoverfly activity (number of unique tracking IDs per hour) per time of day.


(A–E) Data from camera traps 1–5. (F) Merged data from all camera traps. Shaded areas indicate hours without recordings. All tracking IDs with less than three or more than 1,800 images were removed.

Discussion

The Insect Detect DIY camera trap system and associated software can be considered an open-source development platform for anyone interested in automated insect monitoring, including non-professionals who are still hesitant to delve into this rather complex topic. Due to its resistance to high temperatures and humidity, as well as a low power consumption of ~4.4 W combined with energy supplied by a solar panel, the camera trap system can be autonomously deployed during a whole season. We specifically target non-professionals as a potential user group for deployment of the camera trap, such as citizen scientists involved in large-scale monitoring projects. Detailed step-by-step instructions for hardware assembly and software setup are provided at the corresponding documentation website [26] and allow a full reproduction of the system, with additional tips on software programming. The insect detection and classification models can be easily retrained on custom datasets without special hardware requirements by using the provided Google Colab notebooks [34].

The artificial flower platform, which is used as attractant for flower-visiting insects and background for the captured images, increases detection and tracking accuracy and can be standardized similar to yellow pan traps to compare insect populations at different sites. As for traditional traps that use visual features to attract insects, several characteristics, including the specific material, size, shape, color and orientation, affect the number and species assemblage of captured insects. Ongoing efforts are made to test and compare alternative materials, shapes and colors for the artificial flower platform. An updated platform design could further increase the visual attraction for a wider range of insect taxa or for specific target groups. By omitting the flower platform and training new detection models on appropriate datasets, the camera trap system could be adapted to different use cases, e.g. to monitor insects visiting real flowers, ground-dwelling arthropods (platform on ground) or nocturnal insects (vertical setup with light source).

Processing pipeline

We implemented a two-stage approach that combines on-device insect detection and tracking in real time on low-resolution frames in the first stage (~12.5 fps for 1080p HQ frame resolution), with classification of the insect images cropped from synchronized high-resolution frames in the second stage. By using our proposed processing pipeline running on the camera trap hardware, it is sufficient to only store the cropped insect detections (~5–10 KB per image at 1080p resolution). This can save a significant amount of space on the microSD card, compared to storing the full frame (~0.5–1 MB per image at 1080p resolution). The separation of both steps also simplifies dataset management and encourages retraining of the classification model, as sorting the cropped insect images into class folders is much easier and faster compared to bounding box annotations required to train a new detection model. Cases with a low detection confidence and images that were classified to the wrong class or with a low probability (e.g. new insect species) should be identified, relabeled/resorted and added to a new dataset version to retrain the detection and/or classification model incrementally. Through this iterative active learning loop, the accuracy of the deployed models can be increased over time and will adapt to specific use cases and/or environments [47].

Insect detection and tracking

While the provided insect detection models will generalize well for different homogeneous backgrounds and can also detect insect taxa not included in the training dataset, accuracy will drop significantly when using them with complex and dynamic backgrounds, such as natural vegetation. By utilizing datasets with annotated images that include these complex backgrounds, models can also be trained to detect and classify insects under more challenging conditions [4851]. Additionally, other techniques can be used to increase the detection and potentially also tracking accuracy of small insects in complex scenes, e.g. by using motion-informed enhancement of the images prior to insect detection [49, 51]. In general, the dataset size required to train models that can detect insects with a high accuracy increases with a higher visual variety of the insects and the background. If deployed in new environments, species that the model was not trained on might not be reliably detected (false negatives), which can lead to an underestimation of insect activity/abundance. Larger datasets with a high diversity of annotated insects on various backgrounds will increase overall model performance in the future.

The accuracy of the implemented object tracker depends on the frequency of received bounding box coordinates to calculate the object’s trajectory and thereby the inference speed of the detection model. Fast and/or erratically moving insects, as well as insects coming very close to each other, can result in “jumping” tracking IDs and multiple counting of the same individuals. We show that the selection of different post-processing settings regarding the exclusion of tracking IDs with less than a specified number of images can result in more precise activity and abundance estimations. However, it is important to note that we only compared the final number of unique tracking IDs to the true frame visits in the presented experiment, without analyzing the number of false positive and false negative tracklets. As these could cancel each other out, our presented results can only estimate the tracking accuracy. Also, the synchronization with 1080p HQ frames could lead to different results, due to a faster pipeline speed (~12.5 fps) compared to the 4K HQ frame synchronization used in the experiment (~3.4 fps). Furthermore, the software of the OAK-1 device includes three additional object tracker types that can additionally influence the overall tracking accuracy. Further experiments with other insect species under field conditions and more detailed analyses comparing different settings during on-device processing and post-processing are necessary to validate the activity/abundance estimation by using the number of unique tracking IDs. Implementation of additional post-processing options, e.g. by incorporating temporal (timestamp) and spatial (insect coordinates) information from the captured metadata could furthermore increase the accuracy of the tracking results. More sophisticated tracking methods are available in other systems, e.g. a video-based processing pipeline that is able to detect, track and classify insects in heterogeneous environments of natural vegetation and enables the automated analysis of pollination events [52].

Insect classification

When using our provided insect classification model for images of insect species that were not included in the training dataset, wrong classification results must be expected. New approaches incorporate the taxonomic hierarchy into the classification process to identify the lowest taxonomic rank for which a reliable classification result is achieved [53, 54]. Thereby, also completely new insect species that were not included in the training dataset can be correctly classified to higher taxonomic ranks (e.g. family or order), if they are morphologically similar to other species in the respective rank, which were included in the dataset.

When only images are used for automated classification, it is currently not possible to identify many insects to species level, as often microscopic structures have to be examined to distinguish closely related species. New approaches could fuse data generated by different sensors, such as images produced by an image sensor and wing beat frequency data generated by an opto-electronic sensor, to be able to identify insect species that could not be distinguished from similar species with images only [55].

Insect monitoring data

In ecological insect surveys, traditional monitoring methods (e.g. Malaise traps or yellow pan traps) are usually deployed to lethally capture insect specimens that are then identified by trained experts to species level. Due to the high effort and cost required, these methods are often deployed in very limited timeframes and interpretation of the resulting data is restricted by its low temporal resolution. In contrast, data captured with automated methods is available at a significantly higher temporal resolution of up to several hours for each day and covers the whole season. Although the taxonomic resolution is currently still low for the automated classification of many insect taxa, combining data from both automated and traditional methods would benefit analyses significantly and widen the scope of possible interpretations [7]. At the same time, data on estimated insect abundance extracted from the camera trap metadata could be systematically compared with abundances acquired from traditional methods to calculate a conversion factor to allow for a more direct comparison of insect data captured with both methods.

The insect monitoring data presented in this paper can be seen as a proof of concept for automated monitoring with our proposed camera trap system. In this example, we mainly focus on hoverflies as target species, as this group provides important ecosystem services, such as pollination and natural pest control [56]. With camera trap data on the estimated daily activity of different hoverfly species, phenological information on different temporal scales can be extracted. This information could be used to e.g. study the impact of climate change on the phenology of specific species [57]. Activity changes during the day can also be investigated, e.g. to determine appropriate times for pesticide application with reduced impact on beneficial insects [58].

By changing the trap design and/or adding olfactory attractants, such as pheromone dispensers, the camera trap system could be adapted to a more targeted monitoring of specific pest or invasive insect species [59, 60]. This would also open up the opportunity to monitor beneficial and pest insect species simultaneously, with the potential to reduce pesticide use substantially by facilitating data-informed decision making.

Future perspectives

With rapid improvements in AI-based software and hardware capabilities, a broad application of these technologies in the next generation of biodiversity monitoring tools can be expected in the near future [7, 9, 19]. However, this also means that it is currently difficult to maintain methodological continuity over longer periods, which is crucial for standardized long-term monitoring of insects. Storing the raw data generated by automated monitoring devices together with detailed metadata [61], including in-depth descriptions of the deployed trap design, enables reproduction and reprocessing of this data and will allow comparisons with updated technologies in the future [62].

In contrast to traditional methods, automated camera traps generate a permanent visual record of insects at a given location and time that can be reanalyzed with enhanced software capabilities in the future. As it is crucial to collect as many observations as possible to identify long-term trends and causes of insect population changes, future researchers will be grateful for every available record from the past that is our present now [63].

Supporting information

S1 Table. Description of the 27 classes from the image dataset that was used to train the insect classification model.

The images were sorted into the respective class by considering taxonomic and visual distinctions. In some cases, a clear taxonomic separation is difficult from images alone, and the decision to sort an image into a class was then based more on visual distinction.

(DOCX)

S2 Table. Comparison of different classification model architectures and hyperparameter settings supported by YOLOv5 classification model training.

All models were trained on a custom dataset with 21,000 images (14,686 in train split) and default hyperparameters. Metrics are shown on the dataset validation split (4,189 images) and dataset test split (2,125 images) for the converted models in ONNX format.

(DOCX)

S3 Table. Metrics of the EfficientNet-B0 insect classification model, validated on a real-world dataset.

The model was trained on a custom dataset with 21,000 images (14,686 in train split), scaled to 128x128 pixels, for 20 epochs with batch size 64 and default hyperparameters. Metrics are shown on a real-world dataset (97,671 images) for the converted model in ONNX format. All images were classified by the model and subsequently verified manually; in the case of a wrong classification, they were sorted into the correct class. A dummy image was added for each of the classes “bug_grapho” and “fly_empi”, as no images of these classes were captured during the deployment period. Results for these two classes must therefore be ignored.

(DOCX)

S1 Fig. Hardware schematic of the electronic camera trap components.

Nominal voltage is shown for the PiJuice 12,000 mAh battery.

(TIFF)

S2 Fig. Examples of limitations in the detection and tracking accuracy.

(A) The same tracking ID is assigned to insects coming close to each other. (B) A fast-moving insect is not correctly tracked, with the risk of a new tracking ID being assigned to the same individual.

(TIFF)

S3 Fig. Total number of unique tracking IDs for each predicted class.

Merged data from all five camera traps deployed from mid-May to mid-September 2023. All tracking IDs with fewer than three or more than 1,800 images were removed.

(TIFF)

S4 Fig. Total recording time per time of day.

Merged data from all five camera traps deployed from mid-May to mid-September 2023.

(TIFF)

S5 Fig. Estimated bee activity (number of unique tracking IDs per hour) per time of day.

Merged data from all five camera traps deployed from mid-May to mid-September 2023. Shaded areas indicate hours without recordings. All tracking IDs with fewer than three or more than 1,800 images were removed.

(TIFF)

S6 Fig. Estimated fly activity (number of unique tracking IDs per hour) per time of day.

Merged data from all five camera traps deployed from mid-May to mid-September 2023. Shaded areas indicate hours without recordings. All tracking IDs with fewer than three or more than 1,800 images were removed.

(TIFF)


Acknowledgments

We would like to thank all caretakers and owners of the orchard meadows for letting us deploy the camera traps during the whole season. Thanks a lot to everybody in the automated insect monitoring community for constant feedback and support. This study is part of the joint project “National Monitoring of Biodiversity in Agricultural Landscapes” (MonViA) of the German Federal Ministry of Food and Agriculture.

Data Availability

The camera trap software and insect detection models are available at GitHub (https://github.com/maxsitt/insect-detect). The insect classification model, the Python script for metadata post-processing and the model training notebooks are available at GitHub (https://github.com/maxsitt/insect-detect-ml). The source files for the documentation website are available at GitHub (https://github.com/maxsitt/insect-detect-docs). The modified YOLOv5 scripts, including scripts for the classification of insect images, are available at GitHub (https://github.com/maxsitt/yolov5). The datasets for insect detection model training (https://doi.org/10.5281/zenodo.7725941) and insect classification model training (https://doi.org/10.5281/zenodo.8325384) are available at Zenodo. The R scripts and associated data for creation of the plots shown in this paper are available at Zenodo (https://doi.org/10.5281/zenodo.10171524).

Funding Statement

The author(s) received no specific funding for this work.

References

1. Hallmann CA, Sorg M, Jongejans E, Siepel H, Hofland N, Schwan H, et al. More than 75 percent decline over 27 years in total flying insect biomass in protected areas. PLoS ONE. 2017; 12(10): e0185809. doi: 10.1371/journal.pone.0185809
2. Seibold S, Gossner MM, Simons NK, Blüthgen N, Müller J, Ambarlı D, et al. Arthropod decline in grasslands and forests is associated with landscape-level drivers. Nature. 2019; 574: 671–674. doi: 10.1038/s41586-019-1684-3
3. Wagner DL. Insect Declines in the Anthropocene. Annu Rev Entomol. 2020; 65: 457–480. doi: 10.1146/annurev-ento-011019-025151
4. Samways MJ, Barton PS, Birkhofer K, Chichorro F, Deacon C, Fartmann T, et al. Solutions for humanity on how to conserve insects. Biological Conservation. 2020; 242: 108427. doi: 10.1016/j.biocon.2020.108427
5. van Klink R, Bowler DE, Gongalsky KB, Swengel AB, Gentile A, Chase JM. Meta-analysis reveals declines in terrestrial but increases in freshwater insect abundances. Science. 2020; 368: 417–420. doi: 10.1126/science.aax9931
6. Breeze TD, Bailey AP, Balcombe KG, Brereton T, Comont R, Edwards M, et al. Pollinator monitoring more than pays for itself. J Appl Ecol. 2021; 58: 44–57. doi: 10.1111/1365-2664.13755
7. Besson M, Alison J, Bjerge K, Gorochowski TE, Høye TT, Jucker T, et al. Towards the fully automated monitoring of ecological communities. Ecology Letters. 2022; 25: 2753–2775. doi: 10.1111/ele.14123
8. Kühl HS, Bowler DE, Bösch L, Bruelheide H, Dauber J, Eichenberg D, et al. Effective Biodiversity Monitoring Needs a Culture of Integration. One Earth. 2020; 3: 462–474. doi: 10.1016/j.oneear.2020.09.010
9. van Klink R, August T, Bas Y, Bodesheim P, Bonn A, Fossøy F, et al. Emerging technologies revolutionise insect ecology and monitoring. Trends in Ecology & Evolution. 2022; 37(10): 872–885. doi: 10.1016/j.tree.2022.06.001
10. Kawakita S, Ichikawa K. Automated classification of bees and hornet using acoustic analysis of their flight sounds. Apidologie. 2019; 50: 71–79. doi: 10.1007/s13592-018-0619-6
11. Potamitis I, Rigakis I, Fysarakis K. Insect Biometrics: Optoacoustic Signal Processing and Its Applications to Remote Monitoring of McPhail Type Traps. PLoS ONE. 2015; 10(11): e0140474. doi: 10.1371/journal.pone.0140474
12. Rydhmer K, Bick E, Still L, Strand A, Luciano R, Helmreich S, et al. Automating insect monitoring using unsupervised near-infrared sensors. Sci Rep. 2022; 12: 2603. doi: 10.1038/s41598-022-06439-6
13. Parmezan ARS, Souza VMA, Seth A, Žliobaitė I, Batista GEAPA. Hierarchical classification of pollinating flying insects under changing environments. Ecological Informatics. 2022; 70: 101751. doi: 10.1016/j.ecoinf.2022.101751
14. Droissart V, Azandi L, Onguene ER, Savignac M, Smith TB, Deblauwe V. PICT: A low-cost, modular, open-source camera trap system to study plant-insect interactions. Methods Ecol Evol. 2021; 12: 1389–1396. doi: 10.1111/2041-210X.13618
15. Geissmann Q, Abram PK, Wu D, Haney CH, Carrillo J. Sticky Pi is a high-frequency smart trap that enables the study of insect circadian activity under natural conditions. PLoS Biol. 2022; 20(7): e3001689. doi: 10.1371/journal.pbio.3001689
16. Bjerge K, Nielsen JB, Sepstrup MV, Helsing-Nielsen F, Høye TT. An Automated Light Trap to Monitor Moths (Lepidoptera) Using Computer Vision-Based Tracking and Deep Learning. Sensors. 2021; 21: 343. doi: 10.3390/s21020343
17. Bjerge K, Mann HMR, Høye TT. Real-time insect tracking and monitoring with computer vision and deep learning. Remote Sens Ecol Conserv. 2021; rse2.245. doi: 10.1002/rse2.245
18. Pegoraro L, Hidalgo O, Leitch IJ, Pellicer J, Barlow SE. Automated video monitoring of insect pollinators in the field. Emerging Topics in Life Sciences. 2020; 4: 87–97. doi: 10.1042/ETLS20190074
19. Høye TT, Ärje J, Bjerge K, Hansen OLP, Iosifidis A, Leese F, et al. Deep learning and computer vision will transform entomology. Proc Natl Acad Sci USA. 2021; 118: e2002545117. doi: 10.1073/pnas.2002545117
20. Darras KFA, Balle M, Xu W, Yan Y, Zakka VG, Toledo-Hernández M, et al. Eyes on nature: Embedded vision cameras for multidisciplinary biodiversity monitoring. BioRxiv [Preprint]. 2023. bioRxiv 2023.07.26.550656 [posted 2023 Jul 29; cited 2023 Nov 21]. Available from: https://www.biorxiv.org/content/10.1101/2023.07.26.550656v1
21. Pichler M, Hartig F. Machine learning and deep learning—A review for ecologists. Methods Ecol Evol. 2023; 14: 994–1016. doi: 10.1111/2041-210X.14061
22. Wäldchen J, Mäder P. Machine learning for image based species identification. Methods Ecol Evol. 2018; 9: 2216–2225. doi: 10.1111/2041-210X.13075
23. Borowiec ML, Dikow RB, Frandsen PB, McKeeken A, Valentini G, White AE. Deep learning as a tool for ecology and evolution. Methods Ecol Evol. 2022; 13: 1640–1660. doi: 10.1111/2041-210X.13901
24. Tuia D, Kellenberger B, Beery S, Costelloe BR, Zuffi S, Risse B, et al. Perspectives in machine learning for wildlife conservation. Nat Commun. 2022; 13: 792. doi: 10.1038/s41467-022-27980-y
25. Ärje J, Raitoharju J, Iosifidis A, Tirronen V, Meissner K, Gabbouj M, et al. Human experts vs. machines in taxa recognition. Signal Processing: Image Communication. 2020; 87: 115917. doi: 10.1016/j.image.2020.115917
26. Sittinger M. Insect Detect Docs—Documentation website for the Insect Detect DIY camera trap system. 2023. [cited 2023 Nov 21]. Available from: https://maxsitt.github.io/insect-detect-docs/
27. Sittinger M. Insect Detect—Software for automated insect monitoring with a DIY camera trap system. 2023. [cited 2023 Nov 21]. Available from: https://github.com/maxsitt/insect-detect
28. Jocher G. YOLOv5 by Ultralytics (Version 7.0). 2022. [cited 2023 Nov 21]. Available from: https://github.com/ultralytics/yolov5
29. Li C, Li L, Jiang H, Weng K, Geng Y, Li L, et al. YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications. arXiv:2209.02976v1. 2022. [cited 2023 Nov 21]. Available from: http://arxiv.org/abs/2209.02976 doi: 10.48550/arXiv.2209.02976
30. Wang C-Y, Bochkovskiy A, Liao H-YM. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv:2207.02696v1. 2022. [cited 2023 Nov 21]. Available from: http://arxiv.org/abs/2207.02696 doi: 10.48550/arXiv.2207.02696
31. Jocher G, Chaurasia A, Qiu J. YOLO by Ultralytics (Version 8.0.0). 2023. [cited 2023 Nov 21]. Available from: https://github.com/ultralytics/ultralytics
32. Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, et al. Microsoft COCO: Common Objects in Context. In: Fleet D, Pajdla T, Schiele B, Tuytelaars T, editors. Computer Vision–ECCV 2014. Lecture Notes in Computer Science, vol 8693. Springer, Cham. 2014. pp. 740–755. doi: 10.1007/978-3-319-10602-1_48
33. Sittinger M. Image dataset for training of an insect detection model for the Insect Detect DIY camera trap. 2023. [cited 2023 Nov 21]. Database: Zenodo [Internet]. Available from: https://zenodo.org/records/7725941 doi: 10.5281/zenodo.7725941
34. Sittinger M. Insect Detect ML—Software for classification of images and analysis of metadata from a DIY camera trap system. 2023. [cited 2023 Nov 21]. Available from: https://github.com/maxsitt/insect-detect-ml
35. Sahbani B, Adiprawita W. Kalman filter and Iterative-Hungarian Algorithm implementation for low complexity point tracking as part of fast multiple object tracking system. 2016 6th International Conference on System Engineering and Technology (ICSET). Bandung, Indonesia: IEEE; 2016. pp. 109–115. doi: 10.1109/ICSEngT.2016.7849633
36. Arun Kumar NP, Laxmanan R, Ram Kumar S, Srinidh V, Ramanathan R. Performance Study of Multi-target Tracking Using Kalman Filter and Hungarian Algorithm. In: Thampi SM, Wang G, Rawat DB, Ko R, Fan C-I, editors. Security in Computing and Communications. SSCC 2020. Communications in Computer and Information Science, vol 1364. Springer, Singapore. 2021. pp. 213–227. doi: 10.1007/978-981-16-0422-5_15
37. Sittinger M. Custom YOLOv5 fork for the Insect Detect DIY camera trap. 2023. [cited 2023 Nov 21]. Available from: https://github.com/maxsitt/yolov5
38. Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L. ImageNet: A large-scale hierarchical image database. 2009 IEEE Conference on Computer Vision and Pattern Recognition. Miami, FL: IEEE; 2009. pp. 248–255. doi: 10.1109/CVPR.2009.5206848
39. Sittinger M, Uhler J, Pink M. Insect Detect—insect classification dataset v2. 2023. [cited 2023 Nov 21]. Database: Zenodo [Internet]. Available from: https://zenodo.org/records/8325384 doi: 10.5281/zenodo.8325384
40. He K, Zhang X, Ren S, Sun J. Deep Residual Learning for Image Recognition. arXiv:1512.03385v1. 2015. [cited 2024 Jan 23]. Available from: http://arxiv.org/abs/1512.03385 doi: 10.48550/arXiv.1512.03385
41. Tan M, Le QV. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. arXiv:1905.11946v5. 2020. [cited 2023 Nov 21]. Available from: http://arxiv.org/abs/1905.11946 doi: 10.48550/arXiv.1905.11946
42. R Core Team. R: A Language and Environment for Statistical Computing. 2023. [cited 2023 Nov 21]. R Foundation for Statistical Computing, Vienna, Austria. Available from: https://www.R-project.org/
43. Wickham H, Averick M, Bryan J, Chang W, McGowan L, François R, et al. Welcome to the Tidyverse. JOSS. 2019; 4(43): 1686. doi: 10.21105/joss.01686
44. Pedersen T. patchwork: The Composer of Plots. 2023. [cited 2023 Nov 21]. Available from: https://CRAN.R-project.org/package=patchwork
45. Garnier S, Ross N, Rudis R, Camargo AP, Sciaini M, Scherer C. viridis(Lite)—Colorblind-Friendly Color Maps for R. 2023. [cited 2023 Nov 21]. Available from: https://CRAN.R-project.org/package=viridis
46. Sittinger M. R scripts and data for the paper "Insect Detect: An open-source DIY camera trap for automated insect monitoring". 2023. [cited 2023 Nov 21]. Database: Zenodo [Internet]. Available from: https://zenodo.org/records/10171524 doi: 10.5281/zenodo.10171524
47. Bodesheim P, Blunk J, Körschens M, Brust C-A, Käding C, Denzler J. Pre-trained models are not enough: active and lifelong learning is important for long-term visual monitoring of mammals in biodiversity research—Individual identification and attribute prediction with image features from deep neural networks and decoupled decision models applied to elephants and great apes. Mamm Biol. 2022; 102: 875–897. doi: 10.1007/s42991-022-00224-8
48. Bjerge K, Alison J, Dyrmann M, Frigaard CE, Mann HMR, Høye TT. Accurate detection and identification of insects from camera trap images with deep learning. Fang W-T, editor. PLOS Sustain Transform. 2023; 2: e0000051. doi: 10.1371/journal.pstr.0000051
49. Bjerge K, Frigaard CE, Karstoft H. Object Detection of Small Insects in Time-Lapse Camera Recordings. Sensors. 2023; 23: 7242. doi: 10.3390/s23167242
50. Stark T, Ştefan V, Wurm M, Spanier R, Taubenböck H, Knight TM. YOLO object detection models can locate and classify broad groups of flower-visiting arthropods in images. Sci Rep. 2023; 13: 16364. doi: 10.1038/s41598-023-43482-3
51. Ratnayake MN, Dyer AG, Dorin A. Tracking individual honeybees among wildflower clusters with computer vision-facilitated pollinator monitoring. PLoS ONE. 2021; 16(2): e0239504. doi: 10.1371/journal.pone.0239504
52. Ratnayake MN, Amarathunga DC, Zaman A, Dyer AG, Dorin A. Spatial Monitoring and Insect Behavioural Analysis Using Computer Vision for Precision Pollination. Int J Comput Vis. 2022; 131: 591–606. doi: 10.1007/s11263-022-01715-4
53. Bjerge K, Geissmann Q, Alison J, Mann HMR, Høye TT, Dyrmann M, et al. Hierarchical classification of insects with multitask learning and anomaly detection. Ecological Informatics. 2023; 77: 102278. doi: 10.1016/j.ecoinf.2023.102278
54. Badirli S, Picard CJ, Mohler G, Richert F, Akata Z, Dundar M. Classifying the unknown: Insect identification with deep hierarchical Bayesian learning. Methods Ecol Evol. 2023; 14: 1515–1530. doi: 10.1111/2041-210X.14104
55. Tschaikner M, Brandt D, Schmidt H, Bießmann F, Chiaburu T, Schrimpf I, et al. Multisensor data fusion for automatized insect monitoring (KInsecta). In: Neale CM, Maltese A, editors. Proc. SPIE 12727, Remote Sensing for Agriculture, Ecosystems, and Hydrology XXV, 1272702. 2023. doi: 10.1117/12.2679927
56. Rodríguez-Gasol N, Alins G, Veronesi ER, Wratten S. The ecology of predatory hoverflies as ecosystem-service providers in agricultural systems. Biological Control. 2020; 151: 104405. doi: 10.1016/j.biocontrol.2020.104405
57. Forrest JR. Complex responses of insect phenology to climate change. Current Opinion in Insect Science. 2016; 17: 49–54. doi: 10.1016/j.cois.2016.07.002
58. Karbassioon A, Stanley DA. Exploring relationships between time of day and pollinator activity in the context of pesticide use. Basic and Applied Ecology. 2023; 72: 74–81. doi: 10.1016/j.baae.2023.06.001
59. Preti M, Verheggen F, Angeli S. Insect pest monitoring with camera-equipped traps: strengths and limitations. J Pest Sci. 2021; 94: 203–217. doi: 10.1007/s10340-020-01309-4
60. Teixeira AC, Ribeiro J, Morais R, Sousa JJ, Cunha A. A Systematic Review on Automatic Insect Detection Using Deep Learning. Agriculture. 2023; 13: 713. doi: 10.3390/agriculture13030713
61. Reyserhove L, Norton B, Desmet P. Best Practices for Managing and Publishing Camera Trap Data. Community review draft. 2023. [cited 2023 Nov 21]. Available from: https://docs.gbif-uat.org/camera-trap-guide/en/ doi: 10.35035/doc-0qzp-2x37
62. van Klink R. Delivering on a promise: Futureproofing automated insect monitoring methods. EcoEvoRxiv [Preprint]. 2023. [posted 2023 Sep 01; revised 2023 Oct 26; cited 2023 Nov 21]. Available from: https://ecoevorxiv.org/repository/view/5888/
63. Kitzes J, Schricker L. The Necessity, Promise and Challenge of Automated Biodiversity Surveys. Envir Conserv. 2019; 46: 247–250. doi: 10.1017/S0376892919000146

Decision Letter 0

Ramzi Mansour

26 Dec 2023

PONE-D-23-38812
Insect Detect: An open-source DIY camera trap for automated insect monitoring
PLOS ONE

Dear Dr. Sittinger,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Feb 09 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Ramzi Mansour

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Did you know that depositing data in a repository is associated with up to a 25% citation advantage (https://doi.org/10.1371/journal.pone.0230416)? If you’ve not already done so, consider depositing your raw data in a repository to ensure your work is read, appreciated and cited by the largest possible audience. You’ll also earn an Accessible Data icon on your published paper if you deposit your data in any participating repository (https://plos.org/open-science/open-data/#accessible-data).

3. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, all author-generated code must be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.

4. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: N/A

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: 

Comments to the Authors:

The aim of this paper is to present the Insect Detect DIY camera trap, a low-cost and customizable automated monitoring system for flower-visiting insects, utilizing off-the-shelf hardware and open-source software, allowing for large data collection and reliable insect activity estimation. The paper provides substantial value to the scientific community, as it addresses the current hot topic of obtaining reliable abundance counts of insects, which holds crucial importance for biodiversity conservation. However, the readability could be enhanced, particularly concerning the main goal of the study, which revolves around the detection, tracking, classification algorithms, and their corresponding results.

For better understanding of the study, it would be helpful to clarify the points below.

1. Consider using full names Bombus, Coccinellidae, Coleoptera, Graphosoma etc. for Figures, Tables and within the text, as it improves the overall readability of the study instead of “bee_bombus”, “beetle_cocci”, “bug”, “bug_grapho”, etc.

2. The frequent mention of Python scripts, R-scripts, or output files using their specific output format (.csv) in the text can be reduced to enhance readability. Instead, it is recommended to use a simpler phrase such as "The data output for automated insect monitoring" rather than specifying "The data output from the Python script for automated insect monitoring."

3. Similar to the statement in line 442-444, "with the default settings (“-min_tracks 3” and “-max_tracks 1800”) to exclude all tracking IDs with less than three or more than 1,800 images," there are instances where the text reads like camera trap documentation, which is somewhat understandable. However, a paper presenting impressive results like this should strive for good readability.

4. The paper includes a valuable GitHub repository, which is highly commendable. Therefore, it is suggested to remove all comments regarding code or scripts from the manuscript, including the entire section 2.4. Data Visualization, to enhance readability.

5. I hope I didn't miss it, but I was curious why the focus was solely on hoverflies in the results and particularly in the discussion, considering that the camera and model classified other intriguing insects as well. It may be beneficial to address this in the introduction, as it was rather surprising.

6. Starting at Fig. 6 (Fig 8, S2 Table, line 274), the text “The model was trained on a custom dataset with 21,000 images” is repeated four times throughout the manuscript. I think it is enough to mention it once in detail and the other times in a shortened version.

7. Fig. 2 is highly impressive. I would suggest that, since it falls under the processing pipeline section, some essential details about the YOLO detection model and its hyperparameters could be included. Additionally, it might be beneficial to incorporate information about the post-processing classification stage, the Efficientnet, and a few minor but significant hyperparameters.

8. If I understood correctly, the YOLO models for insect detection were trained on 1335 images for 300 epochs, while the Efficientnet model for insect classification was trained on 21000 images for 20 epochs. It would be helpful if you could provide a clearer explanation for this choice of experimental setup, particularly addressing and discussing any concerns regarding potential overfitting when training with just 1335 images for 300 epochs.

9. Additionally, it could be beneficial to mention that Resnet50, YOLO standard backbone, and Efficientnet were employed in the study, with the final decision to use Efficientnet based on its superior performance.

10. Caption for Fig. 8 is too lengthy; consider providing a more detailed explanation in Section 3.2 on Insect Classification Validation.

Reviewer #2: 

This paper presents an automated, do-it-yourself camera trap system for monitoring flower-visiting insects. The system consists of two components: a real-time camera with a deep learning-based object detector that identifies and captures insect images on an artificial platform, and an insect classification model that identifies species from captured images.

The camera trap's accuracy was tested in a controlled laboratory experiment using hoverflies as a test species. The classification model was validated using images captured by deploying the camera trap at test sites. Additionally, the authors conducted a brief case study demonstrating the system's capabilities by analyzing data related to hoverfly behavior.

The authors employed appropriate methods and provided detailed documentation and guidance to ensure the camera trap's reproducibility for non-expert users, encouraging citizen science engagement in insect monitoring. This is a significant contribution of the paper.

However, I believe some issues related to the camera trap's evaluation should be addressed before accepting the paper for publication. If these concerns can be satisfactorily addressed, I recommend accepting the paper for publication in PLoS ONE.

--------------------

Major comments

--------------------

(1) A key strength of this paper, compared to the existing literature discussed in the introduction, lies in its development of a camera trap system built with open-source software. This system, accessible to non-experts in computer science or engineering, addresses a significant gap in the field. However, the introduction currently lacks a clear explanation of this gap and its significance. To provide readers with better understanding of this paper's contributions, I recommend the authors include a concise description of this research gap before introducing current research at Line 93.

(2) While the introduction mentions real-time processing, it doesn't fully justify its preference over offline processing. Providing more detail on this aspect would benefit the reader's understanding of the proposed system's necessity.

(3) The abstract (Line 34) and Discussion section (Line 587) state that "on-device detection and tracking reliably estimated insect activity/abundance...". I am a bit confused about this statement, as under Insect track evaluation results (Line 390) I did not find any quantitative metrics that were calculated to measure the reliability or accuracy of insect tracking to back up the above-mentioned statement. While there are standard methods to evaluate the accuracy of a multi-object tracking problem (e.g. Luiten et al. "HOTA: A Higher Order Metric for Evaluating Multi-Object Tracking", IJCV 2020), I do not think a complete evaluation of tracking is necessary, as the final measurement of the system is the number of insects that landed on the platform. However, as the reliability and accuracy of insect counting is integral for the system, some sort of quantitative metric is necessary.

Authors have attempted to address this by comparing the total number of automatically detected insect tracks to those observed visually. I do not agree this is an appropriate way to measure the reliability because the metric itself is susceptible to misleading results due to potential cancellation of false positives and negatives.

Alternatively, I recommend presenting separate counts for True Positives, False Positives, and False Negatives generated by the on-device tracking. These counts can then be used to calculate Precision, Recall, and F-score for the camera trap's detection of hoverflies (https://en.wikipedia.org/wiki/Precision_and_recall). Similar metrics have been used by other research cited in this paper (e.g: 17, 48). Presenting these final average metrics (Precision, Recall, F-score) in the abstract and Discussion/Conclusion sections would provide a more quantitative and reliable measure of the system's detection accuracy.

(4) In the Introduction (Line 102), Methods (Line 191) and Discussion (Line 588), the authors state that "...the whole pipeline runs at ~12.5 fps (1080p HQ frame resolution), which is sufficient to reliably track most insects…". However, the evaluations presented in the insect tracking section (Line 320) utilize 4K HQ frames. This inconsistency raises concerns about which resolution was used for the actual classification task, as the authors mention both 1080p and 4K image synchronization throughout the manuscript. Since image resolution can significantly impact classification accuracy, it is crucial to clarify this point. If the camera trap system and its classification model were indeed tested on 4K captured (and cropped) images, reporting the processing speed for 4K frames would be more accurate and reflect the true operational speed of the system. Additionally, a brief discussion on the trade-offs between speed and resolution, and how these choices affect the monitoring of different insect species, would be valuable for readers.

(5) I appreciate the detailed hardware assembly instructions provided in the documentation. However, it's currently unclear how the individual components connect to each other. To address this, I suggest incorporating a simple hardware schematic (complimenting Figure 1) to visually illustrate the connections between each component.

(6) Using artificial flower visits as a proxy for insect counts may not be as accurate as direct flower observations. This is because insect visitation to artificial flowers can be influenced by various characteristics of the flowers themselves. Please include a brief discussion on how the choice of platform materials and colors could affect capture rates, thereby improving the reliability of this method.

(7) Line 87-89: The detection rate and accuracy of a deep learning model depends on various factors including its architecture and quality of the training dataset. Hence, the use of deep learning-based models for insect detection can result in false negative detections leading to underestimating insect counts. Please briefly mention this in the introduction and further discuss how this drawback could be overcome in the Discussion section.

--------------------

Minor Comments

--------------------

Presentation Structure: The authors have presented this study in easy-to-understand, clear language. However, the structure of the manuscript was confusing to me. I would like to propose that the authors consider restructuring the Materials and methods and Results sections. Here, the Materials and methods section would contain two subsections on the camera trap and the insect classification model and only contain the methodology associated with them. The Results section (or Experiments and results) would contain all the experimental evaluations, including the results of the YOLO models, the classification model, the field deployment etc., divided among two subsections on the camera trap and the insect classification model.

Line 24: Not all traditional monitoring methods (e.g. focal flower observations or quadrat observations, transect walks) may provide data with high taxonomic resolution.

Line 68 and 71: Other research that uses motion detection for insect monitoring include “van der Voort, Genevieve E., et al. "Continuous video capture, and pollinia tracking, in Platanthera (Orchidaceae) reveal new insect visitors and potential pollinators." PeerJ 10 (2022): e13191.”, “Steen, Ronny. "Diel activity, frequency and visit duration of pollinators in focal plants: in situ automatic camera monitoring and data processing." Methods in Ecology and Evolution 8.2 (2017): 203-213.”

Line 106: Please provide an appropriate reference and data on the speed of the hoverfly species.

Line 123: Is 91 Wh the combined capacity of the two batteries?

Line 127: What are the dimensions of the platform?

Line 131: I suggest including the component list and associated cost values also in the supporting materials. This is because the cost provided in the manuscript may change over time.

Line 158: It is unclear what other types of homogeneous backgrounds the YOLO model was tested with. Could you please clarify?

Line 161: Why were the metrics not calculated for the test split?

Line 178: Please provide references for Kalman Filter and Hungarian Algorithm.

Line 191: [See Major comment 3 and 4]

Line 199: Please include the type of metadata recorded in the figure caption.

Line 205: Please include more information on how the power consumption was measured. What device did you use to measure the energy consumption? Under what ambient conditions (temperature, humidity) was the test conducted? Was the solar panel connected to the system during this test? Were 5 insects tracked simultaneously or sequentially during the test? Also, what was the camera resolution?

Line 208: Was the estimate of 20 hours calculated considering the threshold value mentioned in line 210?

Line 229: Could you please explain why YOLOv5 was first used on the captured images for classification, instead of directly using EfficientNet-B0 on the captured images?

Line 234: Please provide a reference to the EfficientNet-B0 model.

Line 259: Could you please explain why high inference speed is critical for this step. As the images are classified offline and as the main aim is to achieve the best classification accuracy, shouldn’t the accuracy be prioritized over inference speed?

Line 313: Please change “mm” to millimeters.

Line 324: Please provide the video settings used by the smartphone camera including its resolution and framerate. Also, could you please provide more detail on the speed the smartphone camera video was played at? (e.g 50% of original speed).

Line 364: Please explain why a threshold of 70% was used? Why not set a lower value allowing us to record more data?

Line 416: I suggest authors include the S3 table in the manuscript as it reflects the performance of the classification model on the real world data. Also Table 2 can be moved to Supplementary materials.

Line 516: I suggest authors present the analysis of hoverfly behavior under a subsection “Example data analysis" or " Case Study”.

Line 579: Could you please provide any references to support the statement that the artificial flower platform can be standardized similarly to yellow pan traps. [Also see Major comment 6].

Line 582: Please change the “camera trap system” to “camera trap hardware” as the software system was not evaluated for monitoring insects visiting real flowers or in outdoor settings.

Line 583: I agree with the authors that the presented hardware system can be used to monitor insects visiting real flowers. However, it is unclear how the software solution will translate to monitoring insect visits to real flowers. Basic monitoring of insect visits to real flowers requires detecting insects in an image, obtaining their coordinates, tracking their position and movements with changes in the environment and their posture, and comparing insect coordinates with flower coordinates to identify flower visits. Could you please mention how the presented methods can achieve these requirements, or alternatively present these requirements as future work. This could be an expansion of the discussion presented under the Insect detection and tracking subsection in the Discussion section.

Line 603: Under insect detection and tracking section please include a discussion on how the methods presented in this study can be implemented with different IoT platforms and how development of more efficient computational platforms can leverage the results of this study.

Line 609, 611, 625: Studies cited in [45,46,48] can also be discussed in the introduction section.

Line 611: Other research that used motion enhancement includes “Ratnayake, Malika Nisal, Adrian G. Dyer, and Alan Dorin. "Tracking individual honeybees among wildflower clusters with computer vision-facilitated pollinator monitoring." Plos one 16.2 (2021): e0239504.”

Line 620 - 622: Currently there is research being conducted on re-identification of insects (see Borlinghaus, Parzival, Frederic Tausch, and Luca Rettenberger. "A Purely Visual Re-ID Approach for Bumblebees (Bombus terrestris)." Smart Agricultural Technology 3 (2023): 100135). Please discuss the possibility of using a similar mechanism for the proposed study to improve its sampling accuracy.

Fig 2.: Please label the purple line from Script node to cropped detections.

Fig 4: Please indicate the start and end of the recording period.

Fig 6 and Fig 8: Please label the color bar.

Fig. 7: I suggest removing the “15 min recording” from axis titles to simplify the plot. Also rename y axis to “Ground truth” or “Manual video observations” and the x axis to “Camera trap recordings”.

Fig 11: I suggest moving this figure to the supplementary materials. Results presented in 11F are a bit confusing, as not all camera traps were deployed for the same period of time. I recommend removing plot 11F. Could you please provide more information on how sunshine was measured? Please adjust the secondary y-axis scale to match that of the primary y-axis. Also include a legend describing what each line in the plot represents.

Fig 12: Please make all y-axis scales in 12A-12E the same to enable easy comparison of data across camera traps.

Fig 14: As the recording time per day varies between camera traps, I suggest normalizing the number of unique tracking IDs by the recording time. Please make all y-axis scales in 14A-14E the same to enable easy comparison of data across camera traps.

Fig 15: As the discussion does not extensively analyze the relationship between rainfall and hoverfly activity, this Figure can be removed.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2024 Apr 3;19(4):e0295474. doi: 10.1371/journal.pone.0295474.r002

Author response to Decision Letter 0


29 Jan 2024

Reviewer #1:

Comments to the Authors:

The aim of this paper is to present the Insect Detect DIY camera trap, a low-cost and customizable automated monitoring system for flower-visiting insects, utilizing off-the-shelf hardware and open-source software, allowing for large data collection and reliable insect activity estimation. The paper provides substantial value to the scientific community, as it addresses the current hot topic of obtaining reliable abundance counts of insects, which holds crucial importance for biodiversity conservation. However, the readability could be enhanced, particularly concerning the main goal of the study, which revolves around the detection, tracking, classification algorithms, and their corresponding results.

For better understanding of the study, it would be helpful to clarify the points below.

1. Consider using full names Bombus, Coccinellidae, Coleoptera, Graphosoma etc. for Figures, Tables and within the text, as it improves the overall readability of the study instead of “bee_bombus”, “beetle_cocci”, “bug”, “bug_grapho”, etc.

Authors response: Thank you for your suggestion. We agree that using the full taxonomic names could increase the readability compared to using the class labels with sometimes abbreviated taxonomic names. However, we also do not want readers to assume that our presented classification model is able to precisely separate insects into their respective taxonomic ranks. To avoid this assumption, we purposely used the class labels in the Materials and methods section when describing the classes, as well as in the figures and tables. In lines 280-281 we refer to S1 table, in which all class labels are described and associated to their respective taxa(-groups). Additionally, we added some more information in lines 280-281 to make it clear that not all classes correspond to strictly defined insect taxa. In the Results section we mostly used the full names together with the class labels to improve readability. In line 532, we added “[…] wild bees excluding Bombus sp. (“bee”) […]” to make the distinction to the two other classes containing bee species (Apis mellifera and Bombus sp.) clearer.

2. The frequent mention of Python scripts, R-scripts, or output files using their specific output format (.csv) in the text can be reduced to enhance readability. Instead, it is recommended to use a simpler phrase such as "The data output for automated insect monitoring" rather than specifying "The data output from the Python script for automated insect monitoring."

Authors response: Thank you for this reasonable suggestion. As recommended, we simplified the phrasing in lines 258, 264, 321, 323, 328 and 423-424.

3. Similar to the statement in line 442-444, "with the default settings (“-min_tracks 3” and “-max_tracks 1800”) to exclude all tracking IDs with less than three or more than 1,800 images," there are instances where the text reads like camera trap documentation, which is somewhat understandable. However, a paper presenting impressive results like this should strive for good readability.

Authors response: Thanks a lot for this nice comment. As recommended, we simplified the phrasing in lines 367 and 485. There are still more instances in the text that might read like camera trap documentation, but this is due to the scope of the manuscript describing a novel methodology/tool. We tried to simplify as much as possible, but in our opinion many aspects of the hardware/software must be described with some minimum amount of detail to be able to fully understand the whole system.

4. The paper includes a valuable GitHub repository, which is highly commendable. Therefore, it is suggested to remove all comments regarding code or scripts from the manuscript, including the entire section 2.4. Data Visualization, to enhance readability.

Authors response: Thank you for this suggestion. However, we think that the details we mention about our code are necessary to understand the respective context. Therefore, we would also like to keep the last section in Material and methods “Data visualization”, as it describes very briefly how the analyses and plots shown in the Results section were created. We also think that this short section is the best place to refer to the Zenodo repository where all R scripts and associated data are published to reproduce and check our analyses.

5. I hope I didn't miss it, but I was curious why the focus was solely on hoverflies in the results and particularly in the discussion, considering that the camera and model classified other intriguing insects as well. It may be beneficial to address this in the introduction, as it was rather surprising.

Authors response: Thank you for this question. We do address this briefly in the manuscript. In lines 558-559, we mention that “Data from the images classified as one of the hoverfly classes is presented in more detail in the following plots, as an example for a functional group of special interest.” In lines 722-724 we explain that “In this example, we mainly focus on hoverflies as target species, as this group provides important ecosystem services, such as pollination and natural pest control”. As many hoverfly species combine two important ecosystem functions in the agricultural landscape (adults: pollination; larvae: pest control) and we trained our classification model to differentiate six different hoverfly species(-groups), we chose to focus on this group for the proof-of-concept data analysis. The tracking experiment was carried out with the hoverfly species Episyrphus balteatus, as we could rear this species in our facilities. This species and relatives are known to fly rather fast and erratically, which made it suitable to test the tracking abilities under more difficult conditions. In our opinion, we made it clear that this species group was only shown as a proof of concept and other groups could be captured and analyzed in a similar way (e.g. with updated classification models that can differentiate more classes). As recommended, we added more information about our focus on hoverflies in the introduction in line 126: “[…] with a focus on six different hoverfly species(-groups).”

6. Starting at Fig. 6 (Fig 8, S2 Table, line 274), the text “The model was trained on a custom dataset with 21,000 images” is repeated four times throughout the manuscript. I think it is enough to mention it once in detail and the other times in a shortened version.

Authors response: As recommended, we shortened this information in lines 305-306, 311-312 and 471-472. For S2 Table we did not shorten the text, as it could also be viewed independently from the main text and already includes the minimum amount of detail that is necessary to accompany the table content.

7. Fig. 2 is highly impressive. I would suggest that, since it falls under the processing pipeline section, some essential details about the YOLO detection model and its hyperparameters could be included. Additionally, it might be beneficial to incorporate information about the post-processing classification stage, the Efficientnet, and a few minor but significant hyperparameters.

Authors response: Thank you very much for this nice comment and the suggestion. We understand that including more details in Fig 2 could give the reader more information about the whole processing pipeline. However, we also want to keep the diagram as simple and general as possible, to facilitate understanding for non-professionals as well. As the detection models and their respective hyperparameters can be freely customized by each user (e.g. YOLOv5/v6/v7/v8 but also other architectures are possible), while still running them in this same processing pipeline, including more detailed information could potentially give the false impression that only this one specific model can be used in the pipeline. The diagram in Fig 2 only shows the processing pipeline that is run in real time on the camera trap hardware. We clearly mention in the manuscript text, that the captured images are classified in a subsequent step and metadata should be post-processed prior to further analysis.

8. If I understood correctly, the YOLO models for insect detection were trained on 1335 images for 300 epochs, while the Efficientnet model for insect classification was trained on 21000 images for 20 epochs. It would be helpful if you could provide a clearer explanation for this choice of experimental setup, particularly addressing and discussing any concerns regarding potential overfitting when training with just 1335 images for 300 epochs.

Authors response: We purposely omitted specific details about the training runs leading up to the models that we published, as this would go beyond the scope of this paper. By publishing all datasets and Google Colab notebooks that were used to train the presented models, we encourage everyone interested to reproduce our training results and test e.g. different hyperparameters such as the number of epochs the model is trained to. While training the models we considered the best practices to avoid overfitting, e.g. by observing the loss on the validation set. Training to 300 epochs for the YOLO detection models and to 20 epochs for the EfficientNet-B0 classification model resulted in the highest possible mAP/top-1 accuracy respectively without an increase in the validation loss. Strictly speaking, the detection models probably kind of overfit to the flower platform, which always provides the same background. But this is expected and we make it clear that our provided models will only work well if the background for insect detection is constant and homogeneous (e.g. in lines 652-654).

9. Additionally, it could be beneficial to mention that Resnet50, YOLO standard backbone, and Efficientnet were employed in the study, with the final decision to use Efficientnet based on its superior performance.

Authors response: Thanks a lot for this suggestion. We added this information in line 291; details on the model comparison can be found in S2 Table.

10. Caption for Fig. 8 is too lengthy; consider providing a more detailed explanation in Section 3.2 on Insect Classification Validation.

Authors response: Thank you for this hint. We shortened the caption for Fig 8 in lines 471-472 to not include details about the classification model training, that were already provided previously in the text (see also response to comment 6).

Reviewer #2:

This paper presents an automated, do-it-yourself camera trap system for monitoring flower-visiting insects. The system consists of two components: a real-time camera with a deep learning-based object detector that identifies and captures insect images on an artificial platform, and an insect classification model that identifies species from captured images.

The camera trap's accuracy was tested in a controlled laboratory experiment using hoverflies as a test species. The classification model was validated using images captured by deploying the camera trap at test sites. Additionally, the authors conducted a brief case study demonstrating the system's capabilities by analyzing data related to hoverfly behavior.

The authors employed appropriate methods and provided detailed documentation and guidance to ensure the camera trap's reproducibility for non-expert users, encouraging citizen science engagement in insect monitoring. This is a significant contribution of the paper.

However, I believe some issues related to the camera trap's evaluation should be addressed before accepting the paper for publication. If these concerns can be satisfactorily addressed, I recommend accepting the paper for publication in PLoS ONE.

Major comments

(1) A key strength of this paper, compared to the existing literature discussed in the introduction, lies in its development of a camera trap system built with open-source software. This system, accessible to non-experts in computer science or engineering, addresses a significant gap in the field. However, the introduction currently lacks a clear explanation of this gap and its significance. To provide readers with a better understanding of this paper's contributions, I recommend that the authors include a concise description of this research gap before introducing current research at Line 93.

Authors response: Thank you very much for this nice comment and suggestion. As recommended, we added a short section in lines 102-106 to emphasize the mentioned research gap regarding the accessibility of these systems to non-professionals. From our point of view the significance of this contribution is explicitly addressed in lines 108-110 with the sentence “Our goal was to develop a camera trap that could be easily utilized in monitoring projects involving citizen scientists to achieve a broader application potential.” We now also emphasize the accessibility to non-professionals more at the beginning of the Discussion, by adding “[…] including non-professionals that are yet hesitant to delve into this rather complex topic.” in line 610.

(2) While the introduction mentions real-time processing, it doesn't fully justify its preference over offline processing. Providing more detail on this aspect would benefit the reader's understanding of the proposed system's necessity.

Authors response: Thanks for this very reasonable suggestion. We agree that we did not give enough detail about this aspect and added more information about potential benefits of real-time on-device processing in lines 92-99.

(3) The abstract (Line 34) and Discussion section (Line 587) state that "on-device detection and tracking reliably estimated insect activity/abundance...". I am a bit confused about this statement as under Insect track evaluation results (Line 390) I did not find any quantitative metrics that were calculated to measure the reliability or accuracy of insect tracking to back up the above-mentioned statement. While there are standard methods to evaluate the accuracy of a multi-object tracking problem (e.g. Luiten et al. "HOTA: A Higher Order Metric for Evaluating Multi-Object Tracking", IJCV 2020.), I do not think a complete evaluation of tracking is necessary as the final measurement of the system is the number of insects landed on the platform. However, as the reliability and accuracy of insect counting is integral for the system, some sort of quantitative metric is necessary.

Authors have attempted to address this by comparing the total number of automatically detected insect tracks to those observed visually. I do not agree this is an appropriate way to measure the reliability because the metric itself is susceptible to misleading results due to potential cancellation of false positives and negatives.

Alternatively, I recommend presenting separate counts for True Positives, False Positives, and False Negatives generated by the on-device tracking. These counts can then be used to calculate Precision, Recall, and F-score for the camera trap's detection of hoverflies (https://en.wikipedia.org/wiki/Precision_and_recall). Similar metrics have been used by other research cited in this paper (e.g. 17, 48). Presenting these final average metrics (Precision, Recall, F-score) in the abstract and Discussion/Conclusion sections would provide a more quantitative and reliable measure of the system's detection accuracy.

Authors response: Thank you very much for this detailed comment and very valuable suggestion. We fully agree that the presented metric of comparing the true hoverfly platform/frame visits, manually counted in smartphone video recordings of the platform, with the number of unique tracking IDs, generated by the on-device tracking and subsequent post-processing with different settings, is susceptible to misleading results due to potential cancellation of false positives and false negatives. Presenting more established metrics, such as Precision, Recall and F1-score, not only for the detection accuracy (calculated on the validation set and shown in Table 1) but also for the tracking accuracy, would indeed support our previous statement of "[…] reliably estimated insect activity/abundance […]" more profoundly. As the tracker algorithm directly uses the output of the detection model (bounding box coordinates), the presented detection accuracy can still be considered a first estimation of the tracking accuracy as well, e.g. a high Recall of the detection model corresponding to a presumably lower number of false negative tracklets (= tracking IDs) occurring during object tracking (but a potentially higher number of false positives). From our experience, false negative tracklets, meaning that an insect is not detected when entering the frame and no bounding box coordinates are given to the object tracker, are very rare. On the other hand, the assignment of false positive tracklets (e.g. multiple tracking IDs for the same individual because it is moving too fast or swaps the ID with an individual coming too close) occurs much more often, as can be seen in Fig 7 for the unprocessed data ("-min_tracks 1"). We are aware that this is only anecdotal evidence at this point, but still believe that the presented experiment can provide some interesting information about the performance of the on-device tracking.

Compared to the mentioned research of Bjerge et al. 2021 (Ref. 17) and Ratnayake et al. 2022 (Ref. 48), we had to deal with data that are more complicated to analyze regarding calculation of true/false positive and false negative tracklets. In contrast to both works, we did not have video/full frame recordings with associated metadata available to evaluate the number of true positive, false positive and false negative tracklets. With only cropped detections and metadata (including timestamp, tracking ID and bounding box coordinates) together with videos taken with a different camera while running the automated insect monitoring script, it is much more difficult to associate the automatically captured tracklets with the true frame visits, as no bounding boxes with tracking IDs can be drawn on the frames from the video recordings. This would have required a frame-by-frame analysis of the videos together with manually comparing the exact timestamps of every hoverfly tracklet (= tracking ID). Also, we would like to remark that Bjerge et al. 2021 (Ref. 17) did in fact not evaluate false negative tracklets in the sense of undetected insects, but only in the sense of wrongly classified species. Their presented metrics can therefore not be interpreted in a way the reviewer is indicating.

Furthermore, as we compare different post-processing settings regarding the minimum number of images (captured per second) that are required to include the respective tracking ID in the final data, the true/false positive and false negative tracklets would still not fully describe the reliability of the activity/abundance estimations. In the presented tracking experiment, we wanted to show an estimation of the tracking accuracy under difficult conditions. For this reason, we chose a fast and often erratically flying hoverfly species at a high activity level (due to containment in an insect cage at high numbers) together with 4K HQ frame synchronization for the on-device processing pipeline at a reduced speed of ~3.4 fps compared to ~12.5 fps for 1080p HQ frame synchronization. At a lower framerate, the object tracker receives bounding box coordinates at a lower frequency, which makes the calculation of the object trajectory less precise and can lead to "jumping" tracking IDs if the track of an individual is lost. This is another possible factor that will affect the overall tracking accuracy of the system, depending on the user settings. Also, there are three more object tracker types available in the software/API of the OAK-1 device (https://docs.luxonis.com/projects/api/en/latest/components/nodes/object_tracker/#supported-object-tracker-types and https://dlstreamer.github.io/dev_guide/object_tracking.html) that could lead to different results regarding the tracking accuracy of the system.
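As a purely illustrative aside (not the full camera trap pipeline), the tracker type mentioned above is selected with a single call in the DepthAI Python API; node and enum names follow the Luxonis documentation linked in the text and should be checked against the installed depthai version.

```python
# Illustrative sketch: selecting one of the supported object tracker types on the OAK device.
import depthai as dai

pipeline = dai.Pipeline()
tracker = pipeline.create(dai.node.ObjectTracker)
# Alternatives include ZERO_TERM_COLOR_HISTOGRAM, SHORT_TERM_IMAGELESS and SHORT_TERM_KCF
tracker.setTrackerType(dai.TrackerType.ZERO_TERM_IMAGELESS)
tracker.setTrackerIdAssignmentPolicy(dai.TrackerIdAssignmentPolicy.UNIQUE_ID)
```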

We hope that it is acceptable for the reviewer that we could not show a more precise evaluation of the tracking accuracy of our system at this stage, as many factors influence the respective tracking accuracy depending on the settings during image capture and post-processing, together with the difficulty of calculating true/false positive tracklets and false negative tracklets with the data that we have available at the moment. As the camera trap software is still under continuous development, we plan to run more targeted experiments to assess the tracking accuracy at different settings more rigorously, also with various insect species and under field conditions.

To account for the missing calculation of the appropriate metrics regarding tracking accuracy in the presented experiment, we changed the phrasing in lines 35, 123, 211, 634 and 668-669. Additionally, we now explicitly mention this shortcoming in the Discussion section in lines 673-683 and give more information about the object tracker in lines 666-668.
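For reference, the metrics discussed in this response reduce to simple arithmetic on counts of true positive (TP), false positive (FP) and false negative (FN) tracklets; the counts in this minimal sketch are hypothetical and not measured values from our experiment.

```python
# Minimal sketch: Precision, Recall and F1-score from tracklet counts.
def precision_recall_f1(tp: int, fp: int, fn: int):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical example counts
print(precision_recall_f1(tp=90, fp=25, fn=5))
```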

(4) In the Introduction (Line 102), Methods (Line 191) and Discussion (Line 588) authors state that the “...the whole pipeline runs at ~12.5 fps (1080p HQ frame resolution), which is sufficient to reliably track most insects…”. However, the evaluations presented in the insect tracking section (Line 320) utilize 4K HQ frames. This inconsistency raises concerns about which resolution was used for the actual classification task, as the authors mention both 1080p and 4K image synchronization throughout the manuscript. Since image resolution can significantly impact classification accuracy, it's crucial to clarify this point. If the camera trap system and its classification model were indeed tested on 4K captured (and cropped) images, reporting the processing speed for 4K frames would be more accurate and reflect the true operational speed of the system. Additionally, a brief discussion on the trade-offs between speed and resolution, and how these choices affect the monitoring of different insect species, would be valuable for readers.

Authors response: Thank you for making us aware of the possibly confusing inconsistency of both mentioning 1080p and 4K HQ frame synchronization resulting in different pipeline speeds, which in turn can influence the object tracker accuracy. Most of the images in the dataset for classification model training were cropped from 1080p frames. For the field deployment of the five camera traps, we also used 1080p HQ frame synchronization, including the real-world dataset that was used to evaluate the generalization capability of the classification model. The reason why we used 4K HQ frame synchronization for the object tracking experiment was to test the tracking accuracy under difficult conditions (see also previous response to comment 3). We tried to clarify these points by adding more information in lines 119, 211-217, 274-275, 278, 358-360, 377-378 and 635-636.
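As an illustrative aside only (not our exact camera configuration), the HQ frame resolution is a one-line choice in the DepthAI Python API and directly embodies the trade-off described above: 4K yields larger crops for classification but slows the pipeline, while 1080p keeps it at the higher framerate used for field deployment. Enum names follow the Luxonis documentation and should be checked against the installed depthai version.

```python
# Illustrative sketch: choosing the sensor resolution used for the HQ frames.
import depthai as dai

pipeline = dai.Pipeline()
cam = pipeline.create(dai.node.ColorCamera)
cam.setResolution(dai.ColorCameraProperties.SensorResolution.THE_4_K)       # slower pipeline, larger crops
# cam.setResolution(dai.ColorCameraProperties.SensorResolution.THE_1080_P)  # faster pipeline (~12.5 fps here)
```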

(5) I appreciate the detailed hardware assembly instructions provided in the documentation. However, it's currently unclear how the individual components connect to each other. To address this, I suggest incorporating a simple hardware schematic (complementing Figure 1) to visually illustrate the connections between each component.

Authors response: Thanks a lot for this nice comment and useful suggestion. We agree that it makes sense to add a simple hardware schematic that gives more information about the connections between the individual components. We included this figure as “S1_Fig” in the supporting information (figure caption in lines 977-978) and refer to it in the text in lines 142-144. To account for the new numbering of the figures in supporting information, we changed lines 342, 435, 531, 596, 600, 980, 984, 988, 991 and 996 accordingly.

(6) Using artificial flower visits as a proxy for insect counts may not be as accurate as direct flower observations. This is because insect visitation to artificial flowers can be influenced by various characteristics of the flowers themselves. Please include a brief discussion on how the choice of platform materials and colors could affect capture rates, thereby improving the reliability of this method.

Authors response: Thanks for this suggestion. We added some more information about this aspect in lines 622-625. In this paragraph we are already discussing in lines 626-627 how “An updated platform design could further increase the visual attraction for a wider range of insect taxa or for specific target groups.”. As this is ongoing research, it is currently still difficult to provide more specific details.

(7) Line 87-89: The detection rate and accuracy of a deep learning model depends on various factors including its architecture and quality of the training dataset. Hence, the use of deep learning-based models for insect detection can result in false negative detections leading to underestimating insect counts. Please briefly mention this in the introduction and further discuss how this drawback could be overcome in the Discussion section.

Authors response: We agree that the accuracy (especially Recall) of the deployed insect detection model significantly affects the reliability of the estimated insect activity/abundance. We now mention this more clearly in the Introduction in lines 89-90. In the Discussion section, we now call attention to this aspect in lines 659-665.

Minor Comments

Presentation Structure: The authors have presented this study in clear, easy-to-understand language. However, the structure of the manuscript was confusing to me. I would like to propose that the authors consider restructuring the manuscript's Methods and Materials and Results sections. Here, the Methods and Materials section would contain two subsections on the Camera Trap and the Insect Classification model and only contain the methodology associated with them. The Results section (or Experiments and Results) would contain all the experimental evaluations, including results of the YOLO models, the classification model, field deployment etc., divided among two subsections on the Camera Trap and the Insect Classification model.

Authors response: While we understand that the evaluation metrics of the YOLO detection models and the EfficientNet-B0 classification model could have also been shown in the Results section, we think that our presented work differs from similar research articles that focus more strongly on the performance of the presented models. We see our presented models more as a baseline or example for interested users to train their own models on custom datasets. We tried to make this fact clear e.g. in lines 104-105 and 617-619. For this reason, the presentation of the model metrics on the validation/test datasets fits better in the Materials and methods section under the "Software" subsection. For inexperienced readers this also significantly enhances understanding of the whole pipeline, which includes several steps, from on-device detection and tracking to metadata post-processing, that all build on top of each other.

Line 24: Not all traditional monitoring methods (e.g. focal flower observations or quadrat observations, transect walks) may provide data with high taxonomic resolution.

Authors response: This is correct and the reason why we purposely wrote “[…] traditional monitoring methods are widely established and can provide data with a high taxonomic resolution.” in lines 23-25. We hope that this indicates enough that the taxonomic resolution also depends on the specific method that is used.

Line 68 and 71: Other research that uses motion detection for insect monitoring include “van der Voort, Genevieve E., et al. "Continuous video capture, and pollinia tracking, in Platanthera (Orchidaceae) reveal new insect visitors and potential pollinators." PeerJ 10 (2022): e13191.”, “Steen, Ronny. "Diel activity, frequency and visit duration of pollinators in focal plants: in situ automatic camera monitoring and data processing." Methods in Ecology and Evolution 8.2 (2017): 203-213.”

Authors response: Thank you for mentioning these interesting research articles that are also using motion detection for camera-based insect monitoring. While there are a number of studies that use similar approaches, we wanted to keep the references as concise as possible and therefore decided to mainly cite papers that also include some form of AI-based post-processing of the resulting image/video data. Pegoraro et al. 2020 (Ref. 18) gives a broad overview of existing camera systems for pollinator monitoring, including Steen 2017.

Line 106: Please provide an appropriate reference and data on the speed of the hoverfly species.

Authors response: Unfortunately, we could not find any research articles that explicitly measured the flight speed of hoverflies. The only information on hoverfly speed we could find is in the following excerpts from Rotheray & Gilbert 2011: The Natural History of Hoverflies (ISBN 978-0-9564692-1-2):

Page 32-33: “[…] their wings have been recorded beating at between 120 and 150 beats per second, resulting in a forward speed of between 3 to 4 metres per second. To put this into context, hawkmoths (Sphingidae, Lepidoptera) beat their wings at much slower speeds of between 50 and 90 beats per second, but they fly faster at more than 5 metres per second. On the other hand, midges (Ceratopogonidae, Diptera) can beat their wings at more than 1000 times per second, but only move forward at less than half-a-metre per second.”

Page 36: "Analysis of film of hovering E. balteatus has revealed how rapidly hoverflies can accelerate. From a hovering start, acceleration begins in the first couple of wingbeats. At eight wingbeats or 40 milliseconds later, the hoverfly has moved forward about 4 cm and in another 40 milliseconds it has moved between 1-1.5 metres."

We believe that, without making a more specific statement in our manuscript, a reference is not necessary in this context. Flight speed can be judged relative to, e.g., other flower-visiting species, and in this regard most hoverflies (including Episyrphus balteatus) often fly faster than other species such as beetles, butterflies and many bees.

Line 123: Is 91 Wh the combined capacity of the two batteries?

Authors response: We added “[…] combined capacity” in line 139.

Line 127: What are the dimensions of the platform?

Authors response: We added the dimensions of the big and small platform that we were using in line 147. We also added the dimensions for the small platform shown in Fig 1 A to the figure caption in line 155.

Line 131: I suggest including the component list and associated cost values also in the supporting materials. This is because the cost provided in the manuscript may change over time.

Authors response: As the costs for the proposed components are changing frequently, we only added the approximate total costs in the manuscript text in line 150: “[…] ~700 € […]”. We will update the costs in the component list at the documentation website from time to time to show the current costs to interested readers. We do not think that it is necessary to add a component list to the manuscript as components will also change in new iterations of the camera trap and will be updated accordingly on the documentation website.

Line 158: It is unclear on what other types of homogeneous backgrounds the YOLO model was tested. Could you please clarify?

Authors response: We made this clearer by adding “[…] (e.g. variations of the artificial flower platform design).” in lines 177-178.

Line 161: Why were the metrics not calculated for the test split?

Authors response: We also calculated the metrics for the test split. These were very similar compared to the metrics for the validation split for all YOLO models (e.g. for YOLOv5n mAP@0.5: 0.964, Precision: 0.966, Recall: 0.954). We report the metrics on the validation split as this is the default output while training the model and in line with similar research articles, e.g. Bjerge et al. 2023 (Ref. 45). By open-sourcing our datasets and Google Colab notebooks that were used for model training, everyone interested can reproduce the presented metrics together with the metrics for the test split.

Line 178: Please provide references for Kalman Filter and Hungarian Algorithm.

Authors response: We added two references for object tracking with the Kalman Filter and Hungarian Algorithm in lines 198 and 861-870. We renumbered the subsequent references accordingly.
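For readers unfamiliar with the Hungarian algorithm mentioned here, the following is a purely illustrative sketch of detection-to-track assignment (not the on-device tracker implementation): predicted track positions, e.g. from a Kalman filter, are matched to new detection centroids by minimizing a distance cost matrix. All values are made up for demonstration.

```python
# Illustrative sketch: Hungarian-algorithm assignment of detections to tracks.
import numpy as np
from scipy.optimize import linear_sum_assignment

tracks = np.array([[0.30, 0.40], [0.70, 0.60]])                     # predicted track positions
detections = np.array([[0.72, 0.58], [0.28, 0.43], [0.10, 0.90]])   # new detection centroids

cost = np.linalg.norm(tracks[:, None, :] - detections[None, :, :], axis=2)  # Euclidean distances
track_idx, det_idx = linear_sum_assignment(cost)                    # optimal one-to-one assignment
for t, d in zip(track_idx, det_idx):
    print(f"track {t} -> detection {d} (cost {cost[t, d]:.3f})")
```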

Line 191: [See Major comment 3 and 4]

Authors response: Lines 211-212 were changed accordingly to major comments 3 and 4.

Line 199: Please include the type of metadata recorded in the figure caption.

Authors response: We included the most important metadata in the caption of Fig 2 in lines 224-225.

Line 205: Please include more information on how the power consumption was measured. What device did you use to measure the energy consumption? Under what ambient conditions (temperature, humidity) was the test conducted? Was the solar panel connected to the system during this test? Were 5 insects tracked simultaneously or sequentially during the test? Also, what was the camera resolution?

Authors response: Thanks for this suggestion. We included more details about how the power consumption was measured in lines 231-235 and 237-238.

Line 208: Was the estimate of 20 hours calculated considering the threshold value mentioned in line 210?

Authors response: We did not consider the threshold value for calculating the total estimate of 20 hours runtime, as this can differ depending on the user settings.

Line 229: Could you please explain why YOLOv5 was used for classification of the captured images instead of directly using EfficientNet-B0 on the captured images?

Authors response: We used a modified Python script from our YOLOv5 fork that supports several classification model architectures, including EfficientNet-B0 (https://github.com/maxsitt/yolov5/blob/master/classify/predict.py). We made this clearer by adding more information in line 262.

Line 234: Please provide a reference to the EfficientNet-B0 model.

Authors response: Thanks for finding this missing reference when we first mentioned the EfficientNet-B0 model. As we now explicitly mention the model architectures that we compared in line 291 (see also response to comment 9 of reviewer #1), we changed line 267 to be more general on the different model weights that were all pre-trained on ImageNet. We included a reference at the new first mention of EfficientNet-B0 in line 291 and deleted the previous citation in line 294. We also added a reference for ResNet-50 in line 291 and 879-881. We renumbered the subsequent references accordingly.

Line 259: Could you please explain why high inference speed is critical for this step. As the images are classified offline and as the main aim is to achieve the best classification accuracy, shouldn’t the accuracy be prioritized over inference speed?

Authors response: Thank you for this reasonable question. It is correct that the accuracy is prioritized over inference speed due to the classification model running offline. The only reason why inference speed should also be considered is that a higher inference speed usually lets models run faster on suboptimal hardware (e.g. CPU only, without GPU) that users might have available. Also, the classification of large numbers of images will take less time with faster models. However, even if only accuracy is accounted for, the EfficientNet-B0 models were still superior to the YOLOv5-cls and ResNet-50 models (see S2 Table). To avoid confusion regarding this aspect, we changed lines 293-296 accordingly.

Line 313: Please change “mm” to millimeters.

Authors response: We changed “mm” to “millimeters” in line 350.

Line 324: Please provide the video settings used by the smartphone camera including its resolution and framerate. Also, could you please provide more detail on the speed the smartphone camera video was played at? (e.g 50% of original speed).

Authors response: We included the resolution and framerate of the videos that were filmed with the smartphone camera in line 363. We also included details about the reduced playback speed in line 364.

Line 364: Please explain why a threshold of 70% was used? Why not set a lower value allowing us to record more data?

Authors response: We use this threshold as a default (can be changed by the user) to optimize recording efficiency if less sunlight is available to charge the batteries and to avoid recording gaps. We added this explanation in lines 408-409 and shortened the whole sentence in line 410.

Line 416: I suggest authors include the S3 table in the manuscript as it reflects the performance of the classification model on the real world data. Also Table 2 can be moved to Supplementary materials.

Authors response: Table 2 shows the metrics of our classification model on the dataset test split, which is in line with similar research works and can be regarded as a standard form of reporting model performance. For these reasons, we would like to keep it in the manuscript. We additionally tested the classification model on a real-world dataset to demonstrate that model performance will most probably differ in the case of non-curated image data. However, with the limited scope of this additional test, the generalization capability of the classification model can only be estimated. The shown metrics will change for other real-world datasets and a much larger dataset including more images with a higher variety of captured insect species is needed to evaluate the generalization capability of the model more rigorously. We tried to make readers aware of this fact in the Discussion section, e.g. in lines 644-649 and 692-699. Therefore, we would like to keep S3 Table in the supporting information and only show the more comprehensive Fig 8 as estimation of the generalization capability of the classification model.

Line 516: I suggest authors present the analysis of hoverfly behavior under a subsection “Example data analysis" or " Case Study”.

Authors response: Thanks for the suggestion, but we think that the presentation of data with a focus on captured hoverfly images is still a good fit for the subsection "Insect monitoring data", as we did not necessarily analyze hoverfly behavior, but only show the number of unique hoverfly tracking IDs (= estimated activity/abundance) over the course of the field deployment of the camera traps.

Line 579: Could you please provide any references to support the statement that the artificial flower platform can be standardized similarly to yellow pan traps. [Also see Major comment 6].

Authors response: We do not know of any reference that would support the statement that the artificial flower platform can be standardized similar to yellow pan traps. However, we do not think that a reference is needed in this context, as this is only a general statement without specific details. As response to major comment 6, we added more information regarding this aspect in lines 622-625.

Line 582: Please change the “camera trap system” to “camera trap hardware” as the software system was not evaluated for monitoring insects visiting real flowers or in outdoor settings.

Authors response: Even though we did not evaluate the provided detection models and software on the use cases mentioned in lines 629-631, we believe that it will be possible to use the same hardware and software system also for monitoring of other insects on different backgrounds, including real flowers. This will require training of a new detection model with a dataset that reflects the new environment. In lines 659-665 we now explain how the required training dataset size increases with complexity of the system that should be monitored. For this reason, we kept “camera trap system”, but added information about the required training of a new/adapted detection model and changed the phrasing of the sentence in lines 628-629.

Line 583: I agree with authors that the presented hardware system can be used to monitor insects visiting real flowers. However, it is unclear how the software solution will translate to monitoring insect visits to real flowers. Basic monitoring insect visit to real flowers requires detecting insects in an image, obtaining its coordinates, tracking its position and movements with changes in environment and its posture, and comparing insect coordinates with flower coordinates to identify flower visits. Could you please mention how the presented methods can achieve these requirements, or alternatively present these requirements as future work. This could be an expansion to the discussion presented under Insect detection and tracking subsection in the Discussion section.

Authors response: We are actively working on adapting the camera trap system to monitor insects visiting real flowers at the moment. This requires not only adaptation of the detection model (as indicated in e.g. lines 628, 652-657 and 659-665) but also of the hardware setup (e.g. fixing of the flower at the right distance to the camera to avoid wind movements). In our opinion we already made it clear that a deployment of the camera trap system to new environments requires specific adaptations to achieve monitoring results with a sufficient accuracy. More detailed explanations on the specific requirements are out of scope for this work and will be presented in upcoming papers about updates of the camera trap system.

Line 603: Under insect detection and tracking section please include a discussion on how the methods presented in this study can be implemented with different IoT platforms and how development of more efficient computational platforms can leverage the results of this study.

Authors response: A substantial part of the software that we use on the camera trap hardware uses the DepthAI Python API (https://docs.luxonis.com/projects/api/en/latest/) that is specific for the Luxonis OAK devices and therefore difficult to implement with other IoT platforms. Our provided datasets for insect detection and classification model training can of course be used also for other purposes, but this should be clear for interested readers and does not have to be mentioned explicitly in the manuscript from our point of view. In the subsection “Future perspectives”, we already indicate that improved hardware platforms will make the presented approaches more attractive for a wider audience in lines 737-739.

Line 609, 611, 625: Studies cited in [45,46,48] can also be discussed in the introduction section.

Authors response: We wanted to keep the introduction as brief and concise as possible to make it easier to read and more interesting also for readers that are new to the field of automated insect monitoring. In our opinion these studies are a better fit in the Discussion section after the reader already got a grasp of the potential difficulties of automated visual insect monitoring.

Line 611: Other research that used motion enhancement includes “Ratnayake, Malika Nisal, Adrian G. Dyer, and Alan Dorin. "Tracking individual honeybees among wildflower clusters with computer vision-facilitated pollinator monitoring." Plos one 16.2 (2021): e0239504.”

Authors response: Thanks for the suggestion. We added Ratnayake et al. 2021 to the references in lines 659 and 913-915. We renumbered the subsequent references accordingly.

Line 620 - 622: Currently there is research being conducted on re-identification of insects (see. Borlinghaus, Parzival, Frederic Tausch, and Luca Rettenberger. "A Purely Visual Re-ID Approach for Bumblebees (Bombus terrestris)." Smart Agricultural Technology 3 (2023): 100135.”). Please discuss the possibility of using a similar mechanism for the proposed study to improve its sampling accuracy.

Authors response: Thanks for making us aware of this very interesting work. However, this approach still seems to be in an early stage and we can see many potential difficulties when trying to implement something similar for our proposed system (e.g. this will probably only work for a specific set of insect species). We think that this approach is out of scope for our manuscript but will keep a close watch on further developments in this field that could be implemented in the future.

Fig 2.: Please label the purple line from Script node to cropped detections.

Authors response: To keep the labeling consistent, we labeled the purple line (HQ frame) going into the script node in the new version of Fig 2. It should be clear that the colored lines going out of the script node represent the synchronized inputs.

Fig 4: Please indicate the start and end of the recording period.

Authors response: We included information about the start and end of the recording at the two power spikes in the figure caption for Fig 4 in lines 247-248.

Fig 6 and Fig 8: Please label the color bar.

Authors response: We do not think that it is necessary to label the color bar representing the proportion of images that were classified to a predicted class to the total number of images per true class for the normalized confusion matrix plots shown in Fig 6 and Fig 8. We have not seen a normalized confusion matrix with labeled color bar in any similar research works. To make the content of the cell labels clearer, we added an explanation in the figure caption for Fig 6 in lines 303-304 and for Fig 8 in lines 469-471.

Fig. 7: I suggest removing the “15 min recording” from axis titles to simplify the plot. Also rename y axis to “Ground truth” or “Manual video observations” and the x axis to “Camera trap recordings”.

Authors response: We agree that the suggested changes could simplify the plot axis labels of Fig 7. However, we think that our current labels describe the underlying data more precisely and could avoid potential misinterpretation compared to the suggested labels.

Fig 11. I suggest moving this figure to supplementary materials. Results presented in 11F are a bit confusing as all the camera traps were not deployed for the same period of time. I recommend removing plot 11F. Could you please provide more information on how sunshine was measured? Please adjust the secondary y axis scale to match that of the primary y axis. Also include a legend describing what each line in the plot represents.

Authors response: In our opinion, Fig 11 shows an important characteristic of the camera trap, namely the charge level of the PiJuice battery depending on the duration of sunshine available per day to charge the batteries. While it is correct that the plot in Fig 11 F does not give an overview of all camera traps during the whole period, as some were deployed later in the season, we still think that it gives valuable information to the reader without misleading the interpretation of the merged charge level data and mean of the sunshine duration at the deployment sites. For these reasons, we would like to keep Fig 11 F. We got official data from the German Meteorological Service ("Deutscher Wetterdienst") on sunshine duration, measured at the nearest weather station for each deployment site. The only information that we could find on their website is that the sunshine duration is measured with "opto-electronic sensors" (German source: https://www.dwd.de/DE/fachnutzer/landwirtschaft/dokumentationen/allgemein/basis_sonnenscheindauer_doku.html). As we are using a credible official source for this data, we do not think that a detailed explanation of how the sunshine was measured is required in this context. We purposely decreased the scale for the secondary y-axis to enhance the readability of the plots, as the lines for charge level and sunshine duration would overlap if both axes were scaled to the same size, which would significantly impair readability. As suggested, we added a legend to the new version of Fig 11 in plot Fig 11 A and in plot Fig 11 F, describing what each line represents in the plots. To enhance readability, we also moved the y-axis label to Fig 11 A (previously Fig 11 C) and removed the y-axis label from Fig 11 E.

Fig. 12: Please make all y-axis scales in 12A-12E the same to enable easy comparison of data across camera traps.

Authors response: We agree that the same axis scale is important for many examples where the same kind of data should be compared between e.g. different sites/methods/times. However, we do not think that scaling all y-axes in Fig 12 A-E the same would enhance interpretability in this case. In lines 406-416, we make it clear that several factors influenced the image capture for each camera trap in our presented data. Especially the significant differences in the total recording time (shown in Table 3) make it impossible to directly compare the number of unique tracking IDs per class (= estimation of abundance/activity) between the camera traps. Especially for Fig 12, we believe that the distribution of the captured classes gives the most interesting information for the readers. And this distribution is better illustrated if the individual plots are not scaled to the same size, which would significantly impair readability regarding class distribution for all plots except Fig 12 B and Fig 12 F. For these reasons, we would like to keep the scaling of the y-axes in Fig 12.

Fig 14: As the recording time per day varies between camera traps, I suggest normalizing the number of unique tracking IDs by the recording time. Please make all y-axis scales in 14A-14E the same to enable easy comparison of data across camera traps.

Authors response: As the reviewer is suggesting, we already normalized the number of unique tracking IDs by the recording time for Fig 15 and explain this in lines 575-577. We still think it is interesting to see the recording time, shown on the secondary y-axis in Fig 14, in relation to the total number of unique tracking IDs for all camera traps. For similar reasons as already mentioned in the previous response to the comment on Fig 12, we think that scaling the y-axes the same is not necessarily required in this case. Enabling an easier readability for each individual camera trap regarding the peaks of captured hoverfly tracking IDs in Fig 14 is more important for the readers in our opinion. For this reason, we would like to keep the scaling of the y-axes in Fig 14.
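For illustration, the normalization by recording time used for Fig 15 amounts to a simple division per camera trap and day; the column names and values in this sketch are hypothetical and not taken from our metadata files.

```python
# Illustrative sketch: normalizing unique tracking ID counts by recording time.
import pandas as pd

df = pd.DataFrame({
    "date": ["2023-07-01", "2023-07-01", "2023-07-02"],
    "unique_track_ids": [42, 17, 30],
    "recording_hours": [8.0, 4.5, 6.0],
})
df["ids_per_hour"] = df["unique_track_ids"] / df["recording_hours"]
print(df)
```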

Fig 15: As the discussion does not extensively analyze the relationship between rainfall and hoverfly activity, this Figure can be removed.

Authors response: As mentioned in the previous response to the comment on Fig 14, we show the normalized number of unique tracking IDs (per hour of recording) in Fig 15. We added the amount of rainfall in the secondary y-axis to include a potential influencing factor on hoverfly activity, that could be easily shown in the plot. We indicate the potential influence of rainfall on hoverfly activity in lines 584-585. For the scope of this work, we do not think that it is necessary to extensively analyze this potential relationship in the Discussion section.

Attachment

Submitted filename: Response_to_reviewers.docx

pone.0295474.s010.docx (47.9KB, docx)

Decision Letter 1

Ramzi Mansour

13 Feb 2024

PONE-D-23-38812R1
Insect Detect: An open-source DIY camera trap for automated insect monitoring
PLOS ONE

Dear Dr. Sittinger,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Mar 29 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Ramzi Mansour

Academic Editor

PLOS ONE


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: N/A

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: 

The paper titled "Insect Detect: An Open-Source DIY Camera Trap for Automated Insect Monitoring" represents a commendable effort, offering a hardware solution for insect counting in natural environments and their classification into taxonomic orders, thereby providing significant value. This contribution to the scientific community, encompassing both the hardware solution and the open-source code, underscores its importance. However, regrettably, the text itself presents challenges in readability. It's not the language per se, but rather the overall structure and organization of section headings that can be perplexing. In my view, the authors should strongly consider revising some section headings and streamlining certain portions of the paper to enhance its accessibility and, consequently, its impact as a valuable scientific resource. Below, I outline specific questions regarding these aspects.

• Figure 5 could benefit from improvement. It's unclear why there are x and y dimensions labeled for all images, especially considering they are of different sizes. Shouldn't the images be resized to a consistent size? Removing the axes would free up space, allowing for slightly larger images and thus improving visibility.

• In the paragraph beginning at line 285, you mention that the dataset is divided into 80% for training, 10% for validation, and 10% for testing. Could you clarify how this division is achieved? Is it done randomly or as 80%, 10%, 10% for each class?

• In the paragraph starting at line 285, you mention that you are utilizing the YOLOv5 classifier. Could you provide insights into any differences between training a typical ResNet50 versus a YOLOv5 ResNet50? Your commentary on this topic would be greatly appreciated.

• In the paragraph starting at line 285, you mention using an image size of 128x128 for insect classification and 320x320 for initial insect detection. While the results are promising at these resolutions, could you comment on whether using a larger image size would further improve results? Additionally, considering hardware constraints, especially if this is not being done on the Pi-module, how does this impact your choice of image size?

• I find the section headings confusing, particularly regarding the processing pipeline. Starting with section 2.2 on software, why is insect detection (2.2.1) not within the processing pipeline (2.2.2)? In my opinion, the software section should be renamed "Processing Pipeline," or better yet, remove the hardware and software distinctions entirely and replace them with more thematically fitting headings, such as 2.1. Pi-module - Insect Detect and 2.2. Processing Pipeline.

• In line 183 (Table 1), there are results for insect detection in the methods section. Shouldn't these results be in the Results section instead?

Reviewer #2: 

This paper introduces an automated, do-it-yourself camera trap system for studying flower-visiting insects. The system comprises two key components: a real-time camera equipped with a deep learning-based object detector. This detector identifies and captures images of insects landing on an artificial platform. Additionally, an insect classification model analyzes the captured images to identify the species of each insect.

The authors have adequately addressed the comments and concerns raised in the previous review, incorporating the necessary changes into the manuscript. I encourage the authors to continue updating and maintaining the documentation and software associated with this research, as it provides a valuable tool for researchers and citizen scientists engaged in insect monitoring studies.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2024 Apr 3;19(4):e0295474. doi: 10.1371/journal.pone.0295474.r004

Author response to Decision Letter 1


26 Feb 2024

Reviewer #1:

Comments to the Authors:

The paper titled "Insect Detect: An Open-Source DIY Camera Trap for Automated Insect Monitoring" represents a commendable effort, offering a hardware solution for insect counting in natural environments and their classification into taxonomic orders, thereby providing significant value. This contribution to the scientific community, encompassing both the hardware solution and the open-source code, underscores its importance. However, regrettably, the text itself presents challenges in readability. It's not the language per se, but rather the overall structure and organization of section headings that can be perplexing. In my view, the authors should strongly consider revising some section headings and streamlining certain portions of the paper to enhance its accessibility and, consequently, its impact as a valuable scientific resource. Below, I outline specific questions regarding these aspects.

1. Figure 5 could benefit from improvement. It's unclear why there are x and y dimensions labeled for all images, especially considering they are of different sizes. Shouldn't the images be resized to a consistent size? Removing the axes would free up space, allowing for slightly larger images and thus improving visibility.

Authors response: Thank you for your suggestion. The individual images in Fig 5 show an example of the respective class from the dataset for classification model training. It is correct that all images are resized to the same size during classification model training and inference (in our model 128x128 pixel). However, all images shown in Fig 5 were automatically captured with the camera trap by cropping the detections (bounding box area) from synchronized HQ frames, which results in different image sizes, depending on the size of the bounding box (e.g. smaller for “ant”, bigger for “lepi”). To improve the clarity and comprehensibility of Fig 5, we decided to resize all example images to the same dimension. To still give the readers the important information about the original image sizes (as captured by the camera trap), we added x- and y-axis labels representing the original pixel values. We believe that the importance of this information outweighs the improved visibility of slightly larger images without the axes.
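To make the origin of the varying image sizes concrete, the following minimal sketch (illustrative only, with made-up coordinates) shows how a normalized bounding box from the detection model is scaled to the synchronized HQ frame before cropping, so the saved crop size depends on the bounding box size.

```python
# Illustrative sketch: cropping a detection (normalized bbox) from a 1080p HQ frame.
import numpy as np

frame = np.zeros((1080, 1920, 3), dtype=np.uint8)   # synchronized 1080p HQ frame
xmin, ymin, xmax, ymax = 0.45, 0.40, 0.55, 0.52     # normalized bbox from the detector
h, w = frame.shape[:2]
crop = frame[int(ymin * h):int(ymax * h), int(xmin * w):int(xmax * w)]
print(crop.shape)  # (129, 192, 3) -> the "original" size shown on the Fig 5 axes
```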

2. In the paragraph beginning at line 285, you mention that the dataset is divided into 80% for training, 10% for validation, and 10% for testing. Could you clarify how this division is achieved? Is it done randomly or as 80%, 10%, 10% for each class?

Authors response: Thank you for finding this important missing information. We added “[…] randomly […]” in line 296 to make it clear that the dataset was split randomly. This still means that all images of each class are split into the train (70%), validation (20%) and test (10%) subsets with the same ratio. However, as we are dealing with an imbalanced dataset, this results in different numbers of images per class in each subset (see also Table 2 for the test dataset). As this information was also missing for the dataset for detection model training, we added “[…] randomly […]” also in line 182.
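As a purely illustrative sketch (not our actual dataset script), a random split applied per class keeps the same train/validation/test ratio for every class while still producing different image counts per subset in an imbalanced dataset; the class names, file names and the 80/10/10 ratio below are placeholders.

```python
# Illustrative sketch: random per-class (stratified) train/val/test split.
import random

random.seed(42)
dataset = ([("hoverfly", f"img_{i}.jpg") for i in range(120)]
           + [("ant", f"img_{i}.jpg") for i in range(30)])

by_class = {}
for label, path in dataset:
    by_class.setdefault(label, []).append(path)

splits = {"train": [], "val": [], "test": []}
for label, paths in by_class.items():
    random.shuffle(paths)
    n = len(paths)
    n_train, n_val = int(0.8 * n), int(0.1 * n)  # adjust to the ratios used for the respective dataset
    splits["train"] += [(label, p) for p in paths[:n_train]]
    splits["val"] += [(label, p) for p in paths[n_train:n_train + n_val]]
    splits["test"] += [(label, p) for p in paths[n_train + n_val:]]

print({k: len(v) for k, v in splits.items()})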

3. In the paragraph starting at line 285, you mention that you are utilizing the YOLOv5 classifier. Could you provide insights into any differences between training a typical ResNet50 versus a YOLOv5 ResNet50? Your commentary on this topic would be greatly appreciated.

Authors response: This is indeed an interesting point. In fact, we only use a slightly modified version of the Python script for classification model training from the YOLOv5 repository (https://github.com/maxsitt/yolov5/blob/master/classify/train.py) and not a specific “YOLOv5 classifier”. The ResNet50 and EfficientNet-B0 models are taken from the torchvision.models subpackage and were pre-trained on the ImageNet dataset (more info here: https://github.com/ultralytics/yolov5/tree/master#classification under “Classification Checkpoints”). In theory, the YOLOv5 script for classification model training supports all available PyTorch implementations of the (optionally pre-trained) classification model architectures from torchvision.models (see here: https://pytorch.org/vision/stable/models.html#classification). As a more thorough comparison of many different classification model architectures was out of scope for this manuscript, we only focused on some of the pre-trained weights published together with the YOLOv5 v6.2 release (https://github.com/ultralytics/yolov5/releases/v6.2). We are currently working on optimizing this classification model training and inference workflow by comparing more architectures. Results from these tests will be published in an upcoming paper. To come back to the reviewer’s question: the ResNet50 and EfficientNet-B0 are the official PyTorch implementations of both model architectures, available in the torchvision package and pre-trained on the ImageNet dataset. To avoid overcomplicating this topic, we are not including this additional information in the manuscript as it would probably only confuse most of the (non-professional) readers and would not add important information to our already described workflow.
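For illustration, the pre-trained torchvision implementations referred to above can be loaded as follows; this is not our training script, the number of classes is a hypothetical example value, and the weights API shown is that of recent torchvision releases (older versions use pretrained=True instead).

```python
# Illustrative sketch: ImageNet-pretrained torchvision models with a replaced classification head.
import torch.nn as nn
from torchvision import models

num_classes = 27  # hypothetical number of classes in a custom insect dataset

effnet = models.efficientnet_b0(weights=models.EfficientNet_B0_Weights.IMAGENET1K_V1)
effnet.classifier[1] = nn.Linear(effnet.classifier[1].in_features, num_classes)

resnet = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
resnet.fc = nn.Linear(resnet.fc.in_features, num_classes)
```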

4. In the paragraph starting at line 285, you mention using an image size of 128x128 for insect classification and 320x320 for initial insect detection. While the results are promising at these resolutions, could you comment on whether using a larger image size would further improve results? Additionally, considering hardware constraints, especially if this is not being done on the Pi-module, how does this impact your choice of image size?

Authors response: Thank you for this question. First, regarding the resolution of 320x320 pixels for insect detection (and tracking) running on the device in real time: we explain in lines 203-205 that this downscaled resolution (LQ frames) increases the inference speed of the detection model and thereby the accuracy of the object tracker. We tested different resolutions for our use case and 320x320 pixels resulted in the best balance between accuracy and speed. Depending on the specific use case (background, distance from camera to object, object size, object speed etc.), the optimal resolution could be lower or higher than the one we are using. We mention multiple times in the Introduction and Discussion that users can train models on custom datasets that are better adapted to their specific use case. By providing detailed documentation and Google Colab notebooks, we also enable non-professional users to train their custom models, optionally at a different image resolution that is better suited for their use case (e.g. an even lower resolution to increase the model’s inference speed).

Second, regarding the resolution of 128x128 pixels for insect classification: we tested different image resolutions (see also S2 Table) for classification model training/inference. Resizing the insect images (= cropped detections) to a resolution of 128x128 pixels means that most of the images in our training dataset were upscaled, which led to better results than downscaling a larger share of the images. In the Google Colab notebook for classification model training, we give users the option to calculate the metrics of their custom image dataset (https://colab.research.google.com/github/maxsitt/insect-detect-ml/blob/main/notebooks/YOLOv5_classification_training.ipynb#scrollTo=aVZgGZtc8skH). We recommend using the 90th percentile of the image sizes (divisible by 32) as a reference point for setting the image size for model training (and later inference). However, this is only an estimated reference point and it is still necessary to test different resolutions to find the optimal model when training on a new custom dataset. A different aspect is the HQ frame resolution that is used for cropping the detections. We explain in lines 217-225 that a higher resolution can increase classification accuracy, but can also decrease tracking accuracy, as the inference speed of the detection model is reduced.
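For illustration, this 90th-percentile reference point can be estimated with a few lines of Python, as in the following minimal sketch (this is not the exact notebook code; a folder "dataset" containing the cropped detections as .jpg images is an assumption):

    # Minimal sketch: estimate a reference training image size from the 90th
    # percentile of the image dimensions, rounded up to the next multiple of 32.
    from pathlib import Path
    import numpy as np
    from PIL import Image

    sizes = []
    for img_path in Path("dataset").rglob("*.jpg"):
        with Image.open(img_path) as img:
            sizes.extend(img.size)                 # collect both width and height

    p90 = np.percentile(sizes, 90)
    img_size = int(np.ceil(p90 / 32) * 32)         # divisible by 32, e.g. 128
    print(f"Suggested image size for model training: {img_size}")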

5. I find the section headings confusing, particularly regarding the processing pipeline. Starting with section 2.2 on software, why is insect detection (2.2.1) not within the processing pipeline (2.2.2)? In my opinion, the software section should be renamed "Processing Pipeline," or better yet, remove the hardware and software distinctions entirely and replace them with more thematically fitting headings, such as 2.1. Pimodule - Insect Detect and 2.2. Processing Pipeline.

Authors response: Thank you for your suggestion. We built the entire workflow (from on-device detection to metadata post-processing) in a modular way, so that users can change their specific settings or deployed models at different stages of this workflow without affecting previous/subsequent steps. For example, custom detection models can be used without the need to change the on-device processing pipeline. Likewise, different classification models/workflows can be used for insect images captured with the camera trap, and the classification model/workflow can also be applied to images that were not captured with our proposed camera trap system. To indicate these different “workflow modules”, we used a subsection header for each step. We added “On-device […]” to the subsection headings in lines 167 and 201 to clarify the distinction from the subsequent classification and metadata post-processing, which do not happen on the camera trap.

6. In line 183, Table 1 there are results for insect detection in the methods section. Shouldn't these results be in the results section instead?

Authors response: Thank you for your suggestion. While the metrics of the detection models (and the classification model) are more often shown in the Results section of similar research articles, we focus on non-professional readers and the best possible clarity and accessibility. By already showing the model metrics in the respective subsections of the Materials and methods section, potential users are guided through the whole workflow in a more logical and cohesive way, without having to jump between sections. From our point of view, it is also easier to understand the subsequent processing steps if information about the model performance is given before the on-device processing pipeline is explained in more detail. In contrast to similar research works, we do not focus as strongly on the performance of the presented models and see them more as a baseline or example for interested users to train their own models on custom datasets. While we agree that this is probably a more unconventional way to present model metrics, we would like to keep them in the Materials and methods section for the mentioned reasons of increased accessibility.

Reviewer #2:

This paper introduces an automated, do-it-yourself camera trap system for studying flower-visiting insects. The system comprises two key components: a real-time camera equipped with a deep learning-based object detector. This detector identifies and captures images of insects landing on an artificial platform. Additionally, an insect classification model analyzes the captured images to identify the species of each insect.

The authors have adequately addressed the comments and concerns raised in the previous review, incorporating the necessary changes into the manuscript. I encourage the authors to continue updating and maintaining the documentation and software associated with this research, as it provides a valuable tool for researchers and citizen scientists engaged in insect monitoring studies.

Authors response: Thank you for your comment! We will continuously update and maintain the software and documentation to implement new features, further improve the detection and classification models and keep all requirements up to date.

Attachment

Submitted filename: Response_to_reviewers.docx

pone.0295474.s011.docx (23.7KB, docx)

Decision Letter 2

Ramzi Mansour

29 Feb 2024

Insect Detect: An open-source DIY camera trap for automated insect monitoring

PONE-D-23-38812R2

Dear Dr. Sittinger,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Ramzi Mansour

Academic Editor

PLOS ONE

Acceptance letter

Ramzi Mansour

12 Mar 2024

PONE-D-23-38812R2

PLOS ONE

Dear Dr. Sittinger,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission

* There are no issues that prevent the paper from being properly typeset

If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Ramzi Mansour

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Table. Description of the 27 classes from the image dataset that was used to train the insect classification model.

    The images were sorted into the respective class by considering taxonomic and visual distinctions. In some cases, a clear taxonomic separation is difficult from images alone, and the decision to sort an image into the respective class was then based more on visual distinction.

    (DOCX)

    pone.0295474.s001.docx (15.2KB, docx)
    S2 Table. Comparison of different classification model architectures and hyperparameter settings supported by YOLOv5 classification model training.

    All models were trained on a custom dataset with 21,000 images (14,686 in train split) and default hyperparameters. Metrics are shown on the dataset validation split (4,189 images) and dataset test split (2,125 images) for the converted models in ONNX format.

    (DOCX)

    pone.0295474.s002.docx (19KB, docx)
    S3 Table. Metrics of the EfficientNet-B0 insect classification model, validated on a real-world dataset.

    The model was trained on a custom dataset with 21,000 images (14,686 in train split), scaled to 128x128 pixels, for 20 epochs with batch size 64 and default hyperparameters. Metrics are shown on a real-world dataset (97,671 images) for the converted model in ONNX format. All images were classified and subsequently verified and sorted into the correct class in the case of a wrong classification by the model. A dummy image was added for each of the classes “bug_grapho” and “fly_empi”, as no images of either class were captured during the deployment period. Results for both classes must therefore be ignored.

    (DOCX)

    pone.0295474.s003.docx (17KB, docx)
    S1 Fig. Hardware schematic of the electronic camera trap components.

    Nominal voltage is shown for the PiJuice 12,000 mAh battery.

    (TIFF)

    pone.0295474.s004.tiff (498.3KB, tiff)
    S2 Fig. Examples for limitations in the detection and tracking accuracy.

    (A) The same tracking ID is assigned to insects coming close to each other. (B) A fast-moving insect is not correctly tracked, with the risk of a new tracking ID being assigned to the same individual.

    (TIFF)

    pone.0295474.s005.tiff (5.5MB, tiff)
    S3 Fig. Total number of unique tracking IDs for each predicted class.

    Merged data from all five camera traps deployed from mid-May to mid-September 2023. All tracking IDs with fewer than three or more than 1,800 images were removed.

    (TIFF)

    pone.0295474.s006.tiff (436.8KB, tiff)
    S4 Fig. Total recording time per time of day.

    Merged data from all five camera traps deployed from mid-May to mid-September 2023.

    (TIFF)

    pone.0295474.s007.tiff (304KB, tiff)
    S5 Fig. Estimated bee activity (number of unique tracking IDs per hour) per time of day.

    Merged data from all five camera traps deployed from mid-May to mid-September 2023. Shaded areas indicate hours without recordings. All tracking IDs with fewer than three or more than 1,800 images were removed.

    (TIFF)

    pone.0295474.s008.tiff (271.1KB, tiff)
    S6 Fig. Estimated fly activity (number of unique tracking IDs per hour) per time of day.

    Merged data from all five camera traps deployed from mid-May to mid-September 2023. Shaded areas indicate hours without recordings. All tracking IDs with fewer than three or more than 1,800 images were removed.

    (TIFF)

    pone.0295474.s009.tiff (293.4KB, tiff)
    Attachment

    Submitted filename: Response_to_reviewers.docx

    pone.0295474.s010.docx (47.9KB, docx)
    Attachment

    Submitted filename: Response_to_reviewers.docx

    pone.0295474.s011.docx (23.7KB, docx)

    Data Availability Statement

    The camera trap software and insect detection models are available at GitHub (https://github.com/maxsitt/insect-detect). The insect classification model, the Python script for metadata post-processing and model training notebooks are available at GitHub (https://github.com/maxsitt/insect-detect-ml). The source files for the documentation website are available at GitHub (https://github.com/maxsitt/insect-detect-docs). The modified YOLOv5 scripts, including for classification of insect images, are available at GitHub (https://github.com/maxsitt/yolov5). The datasets for insect detection model training (https://doi.org/10.5281/zenodo.7725941) and insect classification model training (https://doi.org/10.5281/zenodo.8325384) are available at Zenodo. The R scripts and associated data for creation of plots shown in this paper are available at Zenodo (https://doi.org/10.5281/zenodo.10171524).

