Abstract
Identifying flying insects is a significant challenge for biologists. Entomological lidar offers a unique solution, enabling rapid identification and classification in field settings. No other method can match its speed and efficiency in identifying insects in flight. This non-intrusive tool is invaluable for assessing insect biodiversity, informing conservation planning, and evaluating efforts to address declining insect populations. Although the species richness of co-existing insects can reach tens of thousands, current photonic sensors and lidars can differentiate roughly one hundred signal types. While the number of retrieved clusters correlates with Malaise trap diversity estimates, the taxonomic specificity, i.e., the number of discernible signal types, is currently limited by instrumentation and algorithm sophistication. In this study, we report 32,533 observations of wild flying insects along a 500-meter transect. We evaluate the benefits of lidar polarization bands for differentiating species and compare the performance of two unsupervised clustering algorithms, namely Hierarchical Cluster Analysis and Gaussian Mixture Model. Our analysis shows that polarimetric properties can be partially predicted even from unpolarized data; thus, polarimetric lidar bands provide only a minor improvement in specificity. Finally, we use the physical properties of the clustered observations, such as wingbeat frequency, daily activity patterns, and spatial distribution, to establish a lower bound for the number of species represented by the differentiated signal types.
Introduction
Both the abundance and diversity of insects are declining [1–4], especially in regions with highly industrialized agriculture [5]. This decline may threaten ecosystem food chains [6] and the pollination of our crops [7]. Identifying conservation priorities to reverse this decline, however, requires efficient diagnostic tools to assess insect abundance and diversity. Photonic approaches [8] such as photonic sensors [9, 10] and entomological lidars [11, 12] have the potential to count and classify free-flying insects in situ, continuously, and at close to no running cost. To date, entomological lidar can detect more than 10⁵ insects daily [13] and differentiate more than a dozen groups [11, 12]. While the count rate is superior to that of sweep netting [14], traps [15], and robotic analysis [16], the taxonomic specificity is inferior to classification by, e.g., machine vision [17] and genetic approaches [14]. Advantages of photonic classification approaches include their non-intrusive nature and the fact that no manual post-capture classification of specimens is needed. Moreover, photonic in situ observations of insects provide complementary information that cannot be obtained with other currently available approaches. For example, the data can be used to retrieve daily activity patterns [12], preferences for topographic features [18], or the spatial distribution of species abundance [19].
The number of insect species that can be identified by lidar or photonic sensors is constrained by a) the performance of the data clustering approach, b) the number of spectral [20, 21] or polarization [22–24] bands of the instrument, or, in the ideal case, c) the number of species present in the habitat. Tens of thousands of insect species may co-exist in the same habitat [25], and the overall number of groups is even higher when the sexes and age groups of the specimens are taken into account.
Most proposed methods for photonic clustering of insects are based on wingbeat frequencies (WBFs) [9, 26]. Insect WBFs range from approximately 10 Hz to 1000 Hz; however, the relative spread for a single species and sex under constant environmental conditions is generally 25%, which leaves room for only 18 distinct WBFs within this range. Wingbeat harmonics can provide additional information on wing dynamics [27] and the specularity of the wings [28, 29], thus improving specificity. Multiple studies have exploited wingbeat harmonics to differentiate insect groups [30], showing that even the two sexes of a single species can produce distinct harmonic content depending on the observation aspect [22, 24, 31], with females generally being larger and having lower WBFs [32]. WBFs are also influenced by temperature [32–34]. However, closely related species may produce similar signals that are indistinguishable for a given instrument and setup. Nevertheless, species-rich insect ensembles will generally produce a more diverse ensemble of signals [9].
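The figure of 18 distinct WBFs can be checked with a back-of-the-envelope calculation. The sketch below assumes (our interpretation, not stated explicitly above) that a 25% relative spread means each species occupies a band of ±12.5% around its centre frequency, so adjacent non-overlapping bands differ by a constant frequency ratio of 1.125/0.875; counting how many such bands fit between 10 Hz and 1000 Hz gives 18. Reading the spread as a plain 1.25 ratio between bands would instead give about 20.

```python
import math


def count_wbf_bands(f_min_hz, f_max_hz, relative_spread):
    """Number of non-overlapping, log-spaced WBF bands between f_min and f_max.

    Assumption: each band spans +/- relative_spread/2 around its centre, so
    consecutive band centres differ by the ratio (1 + s/2) / (1 - s/2).
    """
    half = relative_spread / 2
    band_ratio = (1 + half) / (1 - half)
    return math.floor(math.log(f_max_hz / f_min_hz) / math.log(band_ratio))


n_bands = count_wbf_bands(10.0, 1000.0, 0.25)
print(n_bands)  # 18
```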
Several studies have highlighted how multiple wavelengths could aid the differentiation of closely related species [22, 28, 35, 36]. In particular, specular flashes are highly sensitive to the ratio of laser wavelength to wing membrane thickness, and wing membrane thicknesses are frequently highly species-specific [28].
To what extent polarimetric information can improve specificity is less well characterized. Generally, light loses its original polarization through multiple scattering in biological tissue [37]. Consequently, near-infrared (NIR) light is depolarized when it probes larger, millimeter-scale volumes of insect bodies [22, 31, 38], whereas polarization is maintained when light probes thin insect wings on the order of a micron thick [28, 39]. Factors increasing the degree of linear polarization (DoLP) include absorption by melanin and water, which primarily attenuates photons with longer interaction path lengths, i.e., those most prone to depolarization. Factors reducing DoLP include the wing scales of moths and butterflies [29] and even eggs inside the abdomen [40], which increase multiple scattering. However, it remains unknown to what extent polarimetric information can improve species differentiation.
Here, we investigate the benefits of polarimetric information for clustering free-flying wild ensembles of insects. We report 32,533 polarimetric lidar observations of insects along a 500 m transect over a pond. We employ two unsupervised clustering methods to estimate signal diversity with and without polarimetric information. We assess to what extent diverse signals derive from a single species by analyzing the similarity of daily activity patterns and spatial distributions.
Data collection
Field site
Field work was conducted on June 14th, 2020, at Stensoffa ecological field station, Sweden (55°41′44′′N 13°26′50′′E). The field site includes a forest, grazing land, a pond, and a swamp [41], and has low levels of light pollution and high species richness. Within this site, we positioned the experimental setup so that the lidar's field of view extended over a 500-meter long, homogeneous, artificially created peat pond. A Scheimpflug lidar was positioned on one shore, with the laser's termination point on the opposite shore. The lidar and the termination point were placed so that the beam kept a constant height above the pond along the entire transect and approximately the same distance to the shores on both sides.
By selecting a rectangular pond, we aimed to minimize the influence of topographic differences on insects flying across the laser beam, for example, differences in vegetation or in flight distance between shores. However, insects visited some parts of the beam more frequently due to the presence of patches of reeds and floating water plants.
Instrument
The design of the Scheimpflug lidar system is described in Zhu et al., 2017 [42]. The system is based on kHz time-multiplexing of two TE-polarized 3 W, 980 nm laser diodes (MLD-980-3000, CNI lasers, China). The laser apertures are 95 μm, and fast-axis collimators (FACs) glued to the diodes reduce their divergence to 8° in both axes. A NIR wavelength was chosen to avoid disturbing the insects, which are insensitive to this light. Furthermore, backscattering is stronger at this wavelength because insect melanin absorbs less NIR light.
To retrieve polarimetric lidar data, we illuminate the targets with laser beams of alternating orthogonal linear polarization. To achieve this, we rotate the polarization state of one of the laser sources by 90° using a half-wave plate (WPQ10E-980, Thorlabs, USA), then co-align the two beams using a polarizing cube beam splitter (PBS203+B4CRP/M, Thorlabs, USA). The radiation is collimated by a Ø75 mm, f = 300 mm achromatic doublet (#88–597, Edmund Optics, UK) in a focus mechanism (Monorail, Teleskop-Service, Germany). The lidar overlap is controlled by a tangential mount (Stronghold, Baader Planetarium, Germany). The receiving telescope is a Ø200 mm, f = 800 mm Newton reflector (Quattro, SkyWatcher, China). The received light passes a 10 nm FWHM filter at 980 nm (#65–247, Edmund Optics, UK) and a NIR linear polarizer (LPNIRE200-B, Thorlabs, USA) before it is imaged onto a linear CMOS detector, which is tilted 45° according to the Scheimpflug condition and hinge rule. The linear array detector (OctoPlus, Teledyne e2v, USA) has 2048 pixels of 10 × 200 μm each. It can read out 80 kLines/s at 12 bits, but in this experiment it was operated at 6 kHz.
Our system achieves kHz-rate separation of co-polarized and de-polarized light components by multiplexing two orthogonal laser sources [43, 44]. We sequentially illuminate the target with a three-timeslot cycle: timeslot 1, laser I is ON; timeslot 2, laser II is ON; timeslot 3, both lasers are OFF (used for real time subtraction of the background from the first two exposures). This effectively provides a 2 kHz sample rate with a maximum observable modulation frequency of 1 kHz due to the Nyquist criterion [45]. The lowest achievable frequency and resolution depend on the insect’s transit time through the laser beam.
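The three-timeslot cycle can be illustrated with a minimal demultiplexing sketch. It assumes that exposures arrive in acquisition order starting at timeslot 1, and that laser I is the one whose polarization matches the receiver polarizer (co-polarized channel) while laser II gives the de-polarized channel; both are assumptions for illustration. It also shows how the 6 kHz line rate yields 2 kHz per channel and a 1 kHz Nyquist limit.

```python
import numpy as np

LINE_RATE_HZ = 6000   # detector line rate used in the experiment
SLOTS_PER_CYCLE = 3   # laser I ON, laser II ON, both OFF


def demultiplex(frames):
    """Split time-multiplexed exposures into background-subtracted channels.

    frames: array (n_exposures, n_pixels) in acquisition order, assumed to
    start at timeslot 1 (laser I ON). Returns (co, de), each (n_cycles, n_pixels).
    """
    frames = np.asarray(frames, dtype=float)
    n_cycles = frames.shape[0] // SLOTS_PER_CYCLE
    cycles = frames[: n_cycles * SLOTS_PER_CYCLE].reshape(n_cycles, SLOTS_PER_CYCLE, -1)
    background = cycles[:, 2, :]        # timeslot 3: both lasers OFF
    co = cycles[:, 0, :] - background   # timeslot 1: assumed co-polarized laser
    de = cycles[:, 1, :] - background   # timeslot 2: assumed de-polarized laser
    return co, de


sample_rate = LINE_RATE_HZ / SLOTS_PER_CYCLE  # 2000.0 Hz per channel
nyquist = sample_rate / 2                     # 1000.0 Hz
```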
Lidar observations
We conducted continuous lidar recordings throughout June 14, 2020, accumulating ~2.5 terabytes of raw data. To isolate insect observations, we implemented a thresholding technique, selecting data exceeding the median intensity of backscattered light plus five times the interquartile range (IQR) within each 5-second data file (~30,000 exposures), see [13, 46, 47] for detailed accounts of the preprocessing. We further refined the dataset to include only observations exceeding 40 ms transit time, corresponding to a minimum detectable WBF of 25 Hz. This criterion yielded a total of 32,533 observations. A typical insect observation manifests as a modulation of backscattered light intensity over both time (exposure number) and space (pixel number), as illustrated in Fig 1A.
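The event selection can be sketched in one dimension: flag samples above the median plus five times the IQR of the file, group consecutive flagged samples into events, and keep events lasting at least 40 ms (80 samples at the 2 kHz channel rate). This is a simplified illustration on a single intensity time series; the actual preprocessing in [13, 46, 47] operates on the full time × pixel data, and the function name is hypothetical.

```python
import numpy as np


def detect_events(signal, fs=2000.0, k=5.0, min_transit_s=0.040):
    """Return (start, end) sample indices of above-threshold runs (end exclusive)."""
    signal = np.asarray(signal, dtype=float)
    q1, med, q3 = np.percentile(signal, [25, 50, 75])
    threshold = med + k * (q3 - q1)          # median + k * IQR
    above = signal > threshold
    # Rising/falling edges of the boolean mask mark run starts and ends
    padded = np.concatenate(([False], above, [False])).astype(int)
    edges = np.flatnonzero(np.diff(padded))
    starts, ends = edges[0::2], edges[1::2]
    min_len = int(round(min_transit_s * fs))  # 80 samples at 2 kHz
    return [(int(s), int(e)) for s, e in zip(starts, ends) if e - s >= min_len]
```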
Fig 1. Lidar insect observations.
(a) Modulation of backscattered light intensity from a single insect across exposures (time domain) and pixels (space domain). Co-polarized (cyan) and de-polarized (magenta) components shown. (b) Instantaneous echo in the range domain (@ exposure #148), with range and insect size deduced from absolute and differential pixels, respectively. (c) Signal waveform showing intensity modulation over time. (d) Power spectra. (e) Distribution of observations by solar time (15-minute bins with bin centers from 00:07 to 23:53) and range (20 logarithmically spaced bins between 48 m and 427 m). Time is reported in true solar time. (f) Range distribution of insect observations. (g) Time distribution of insect observations. (h) Distribution of insects' transit times >40 ms. In (b–d), co-polarized components are in red, de-polarized in blue, see legend in (b).
We analyzed the lidar signal using multiple approaches. First, projecting the signal into the spatial domain provides lidar echo intensity across pixels. This information can be used in two ways: 1) by transforming absolute pixel numbers to determine the distance to a target (left y-axis in Fig 1B), and 2) by transforming differential pixel numbers to estimate the apparent insect size (right y-axis in Fig 1B).
Second, analyzing the signals from the co-polarized and de-polarized channels in the time domain yields two distinct waveforms (Fig 1C). Comparing these waveforms, we observe that co-polarized backscatter from glossy wings appears as a series of brief specular flashes. In contrast, the de-polarized backscatter lacks these distinct flashes and instead presents a weaker, smoother waveform with the same periodicity, caused by the broader scattering lobes of de-polarizing wing features such as veins and scales. The relative intensities of co-polarized and de-polarized light provide additional information about the surface properties of the insect's wings. For example, nearly equal intensities in the co-polarized and de-polarized waveforms suggest that most of the backscattered light has a randomized polarization state (and thus an equal chance of being detected in either channel). In contrast, a dominant co-polarized signal indicates glossier wings.
We further analyzed the temporal and spatial distributions of the observations. Fig 1E shows a 2D histogram of the observation counts, while Fig 1F and 1G show the probability of an observation as a function of range and solar time, respectively. Notably, activity was reduced around noon, as reflected in fewer observations, and insects were more likely to be observed closer to the detector. We also present a histogram (Fig 1H) of the transit times of all observations exceeding the 40 ms threshold.
By combining spatial, temporal, and polarimetric information, we can enhance the classification of insect observations and identify behaviors that may distinguish species or other taxonomic groups, going beyond individual morphology. For example, in the waveforms, periodic bright reflections correspond to the insect's WBF, and the duration of these flashes can indicate wing specularity. By comparing the intensities of co-polarized and de-polarized backscatter, we can quantify the DoLP. This combined analysis enables us to differentiate between insects that share similar WBFs but have distinct polarization signatures. Additionally, we can determine the detection range and time of day for each observation, or analyze these distributions for a group, revealing the temporal activity patterns and spatial preferences of groups of insects.
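A common way to compute a degree of linear polarization from a co-polarized and a de-polarized (cross-polarized) channel is DoLP = (I_co − I_de)/(I_co + I_de): it is 1 when polarization is fully preserved and 0 when both channels are equal, matching the "randomized polarization" case described above. The sketch uses this definition as an assumption; the exact formula applied to the waveforms is not stated here.

```python
import numpy as np


def dolp(i_co, i_de):
    """Degree of linear polarization from co- and de-polarized intensities.

    Assumed definition: (I_co - I_de) / (I_co + I_de); works elementwise.
    """
    i_co = np.asarray(i_co, dtype=float)
    i_de = np.asarray(i_de, dtype=float)
    return (i_co - i_de) / (i_co + i_de)


# A glossy (specular) echo vs. a diffuse echo with nearly equal channels
print(dolp(3.0, 1.0))  # 0.5
print(dolp(1.0, 1.0))  # 0.0
```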
Estimation of oscillatory power spectra
Although waveforms provide valuable information on insect size, wingbeat frequency, and wing specularity, comparing them directly for the purpose of insect clustering is challenging. Waveform shape varies with external factors, such as the transit time through the lidar beam, and with the mismatch in timing between an insect's wingbeats and the lidar's sampling intervals. To address this, we calculate the oscillatory power spectrum of each observation (Fig 1D), which represents the signal in the frequency domain as a distribution of power across normalized frequency bins. The resulting power spectra reveal the insect's fundamental WBF and its harmonic overtones, providing a more robust basis for clustering and comparison.
To estimate the power spectral density, we use Welch's method, which averages modified periodograms, as implemented in the MATLAB Signal Processing Toolbox. We define the observable frequency range as spanning from 25 Hz (the reciprocal of the minimal transit time) to 1000 Hz (the Nyquist frequency), and set the number of linearly spaced frequency bins to 80 (the number of time samples in a 40 ms observation at a 2000 Hz sampling frequency). We also apply a Gaussian time window with a FWHM of half the number of time samples to taper each segment. We set the number of overlapping samples in the sliding Welch estimate to 79, the maximum possible overlap and also the most computationally demanding setting.
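A rough Python equivalent of the Welch settings above can be written with `scipy.signal.welch` as a stand-in for MATLAB's implementation. Segment length (80 samples), overlap (79) and the Gaussian window with a 40-sample FWHM follow the text; zero-padding to `nfft=160`, which yields 81 bins from 0 to 1000 Hz at 12.5 Hz spacing, is our assumption and not necessarily the authors' frequency grid.

```python
import numpy as np
from scipy.signal import welch

FS = 2000.0                 # per-channel sample rate after demultiplexing (Hz)
NPERSEG = 80                # samples in the shortest accepted observation (40 ms)
NOVERLAP = NPERSEG - 1      # maximum possible overlap
FWHM = NPERSEG / 2          # Gaussian window FWHM in samples
STD = FWHM / (2 * np.sqrt(2 * np.log(2)))  # convert FWHM to standard deviation


def power_spectrum(waveform, nfft=160):
    """Welch PSD of one observation; nfft=160 is an assumed zero-padding choice."""
    f, pxx = welch(waveform, fs=FS, window=("gaussian", STD),
                   nperseg=NPERSEG, noverlap=NOVERLAP, nfft=nfft)
    return f, pxx


# Usage: a 200 ms, 100 Hz test tone peaks in the 100 Hz bin
t = np.arange(400) / FS
f, pxx = power_spectrum(np.sin(2 * np.pi * 100 * t))
```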
Power spectra preprocessing
While power spectra capture an insect's wingbeats in a fundamental peak and wing glossiness in the number of harmonic overtones, we hypothesize that incorporating polarimetric data may reveal additional distinctions based on the wings' DoLP. To test this hypothesis, we generated three datasets representing different data acquisition scenarios: unpolarized detection, co-polarized detection, and polarimetric detection retaining the DoLP.
Non-polarimetric data acquisition (unpolarized dataset)
This dataset simulates a scenario in which the signal is acquired without polarimetry. We achieve this by summing the co- and de-polarized power spectra and then normalizing the area under the merged curve to unity (Eq 1).
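$$P_{\text{unpol}}(f) = \frac{P_{\text{co}}(f) + P_{\text{de}}(f)}{\int \left[ P_{\text{co}}(f') + P_{\text{de}}(f') \right] \mathrm{d}f'} \qquad (1)$$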
Here, $P_{\text{unpol}}(f)$ is the unpolarized power spectrum, and $P_{\text{co}}(f)$ and $P_{\text{de}}(f)$ are the co-polarized and de-polarized power spectra, respectively.
We present the resulting power spectra in Fig 2A (specular case) and 2D (diffuse case). By color-coding the proportion of the de-polarized signal, we illustrate how similar the unpolarized (total) signal is to the de-polarized signal. In the specular case, the de-polarized light increases the certainty of the peak at ~250 Hz but has little influence on the other frequency peaks, whereas in the diffuse case, the de-polarized light is the main contribution to the power.
Fig 2.
Three datasets with varying polarimetric information for a specular observation (top row) and a diffuse observation (bottom row). (a, d) Unpolarized data, shown as a black solid line; the blue shading shows the contribution from the co-polarized channel and the orange shading the contribution from the de-polarized channel. (b, e) Co-polarized dataset. (c, f) DoLP dataset.
Coherently backscattered light acquisition (co-pol. dataset)
To obtain the co-polarized dataset, we take only the co-polarized component and normalize its area to unity (Eq 2). This represents an acquisition scenario in which targets are illuminated with linearly polarized light and measured in the same polarization state (Fig 2B and 2E).
$$\hat{P}_{\text{co}}(f) = \frac{P_{\text{co}}(f)}{\int P_{\text{co}}(f')\, \mathrm{d}f'} \qquad (2)$$
Polarimetric data acquisition with degree of linear polarization (DoLP dataset)
The DoLP dataset (Fig 2C and 2F) is a scaled version of the co-polarized dataset. In this dataset, the area under the co-polarized power spectrum represents the DoLP information for the oscillatory part of the signal, excluding the 0–25 Hz range (Eq 3).
| (3) |
Importantly, by normalizing the area under every power spectrum, we ensured that the relative strength of the frequency components within each spectrum is independent of the distance at which the insect was observed. This addresses a potential source of bias in our analysis, namely the attenuation of signal intensity with distance.
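The unpolarized (Eq 1) and co-polarized (Eq 2) datasets can be sketched as below, approximating the area as the sum of bin powers times the bin width. The base-10 logarithm and the small `eps` guarding against empty bins are our assumptions; the DoLP dataset is not included in this sketch. The usage check illustrates the point made above: scaling both channels by the same factor, as distance attenuation would, leaves the normalized spectra unchanged.

```python
import numpy as np


def area_normalize(p, df):
    """Scale a spectrum so that its area (sum of bins times bin width) is 1."""
    p = np.asarray(p, dtype=float)
    return p / (p.sum() * df)


def make_datasets(p_co, p_de, df, eps=1e-12):
    """Log-transformed unpolarized (Eq 1) and co-polarized (Eq 2) spectra."""
    p_co = np.asarray(p_co, dtype=float)
    p_de = np.asarray(p_de, dtype=float)
    unpol = area_normalize(p_co + p_de, df)
    copol = area_normalize(p_co, df)
    # eps (assumed) avoids log10(0) in empty frequency bins
    return np.log10(unpol + eps), np.log10(copol + eps)


# Usage: a distant (2x weaker) echo yields the same normalized spectra
near = make_datasets([2, 4, 2], [2, 0, 2], df=1.0)
far = make_datasets([1, 2, 1], [1, 0, 1], df=1.0)
```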
Methods
HCA
To cluster insect spectra, we conducted Hierarchical Cluster Analysis (HCA) on area-normalized, log-transformed power spectra using MATLAB's linkage function, with 'ward' as the method and 'euclidean' as the metric. The Euclidean distance groups power spectra by similarity while tolerating minor variations in WBF, which are frequently observed within the same species [9]. Furthermore, this metric is sensitive to changes in DoLP, including variations in the number of harmonic overtones and how power spectra scale with DoLP. We selected Ward's linkage criterion [48] to minimize the variance within newly formed clusters, thereby ensuring that observations within each cluster closely resemble the cluster's centroid.
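In Python, the same kind of clustering can be sketched with `scipy.cluster.hierarchy.linkage` as a stand-in for MATLAB's `linkage`, using Ward's criterion and the Euclidean metric; `fcluster` then cuts the tree into a chosen number of clusters. The function name and the toy data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def hca_cluster(log_spectra, n_clusters):
    """Ward-linkage HCA on log-transformed spectra (rows = observations)."""
    Z = linkage(log_spectra, method="ward", metric="euclidean")
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")  # labels 1..n_clusters
    return Z, labels


# Usage: two synthetic groups of 81-bin spectra
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (10, 81)), rng.normal(5, 0.1, (10, 81))])
Z, labels = hca_cluster(X, n_clusters=2)
```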
To determine the optimal number of clusters, we analyze the linkage rates and identify linkages that deviate significantly from those expected from random variation in the power spectra. Fig 3 illustrates our method. Panel (a) presents the linkage values in reverse order (from largest to smallest). Displaying these values on a logarithmic scale linearizes the decrease in linkage values. From this plot, we calculate the linkage rate (slope) at each step of the HCA and determine the median slope (γ = −0.357), which is depicted in Fig 3A as a solid line. This slope represents the expected decrease in linkage under random spectral variation.
Fig 3. Identifying optimal cluster numbers in hierarchical cluster analysis.
(a) Reverse-ordered linkage values on a logarithmic scale. The median slope (γ, solid line) represents the expected linkage decrease. (b) Compensated linkage values. Shaded area highlights the expected linkage. (c) Distribution of compensated linkage values with median (red line) and outlier boundaries (Q1−1.5⋅IQR, Q3+1.5⋅IQR, blue shaded area).
Next, to identify significant linkages, we calculate compensated linkage values using the formula , where represents the compensated linkage, Li is the reversed linkage (from largest to smallest), γ is the median slope, and i ranges from 1 to the total number of steps, N. This transformation effectively modifies the linkage plot from Fig 3A to 3B. Subsequently, we analyze the distribution of these compensated linkage values (Fig 3C) and identify significant linkages (outliers) using the 1.5·IQR rule [49]. Specifically, we select those linkage values that exceed Q3+1.5⋅IQR. The optimal number of clusters is then determined by the count of these outliers, as illustrated above the shaded area in Fig 3B.
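The outlier-counting step can be sketched as follows. Because the compensation formula itself is not reproduced in this text, the sketch assumes a power-law trend, i.e., a straight line in log10(linkage) versus log10(step) whose slope γ is the median of the local slopes, and subtracts γ·log10(i); treat this compensation as hypothetical. The 1.5·IQR fence is then applied as described.

```python
import numpy as np


def count_significant_linkages(Z):
    """Count linkages above the Q3 + 1.5*IQR fence after trend compensation.

    Z: linkage matrix (e.g., from scipy.cluster.hierarchy.linkage).
    Hypothetical compensation: log10(L_i) - gamma * log10(i), with gamma the
    median local slope in log-log space.
    """
    heights = np.sort(Z[:, 2])[::-1]   # reverse order: largest first
    heights = heights[heights > 0]     # logarithms need positive values
    i = np.arange(1, heights.size + 1)
    log_h, log_i = np.log10(heights), np.log10(i)
    gamma = np.median(np.diff(log_h) / np.diff(log_i))
    compensated = log_h - gamma * log_i  # remove the expected decrease
    q1, q3 = np.percentile(compensated, [25, 75])
    upper = q3 + 1.5 * (q3 - q1)
    return int(np.sum(compensated > upper)), float(gamma)
```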
GMM clustering
Prior to clustering lidar observations with GMM, we reduced the dimensionality of the area-normalized, log-transformed power spectra using Uniform Manifold Approximation and Projection (UMAP) [50], MATLAB implementation [51]. UMAP parameters were n_components = 3, dmin = 0.01, n_neighbours = 199, and metric = 'euclidean'. The dmin parameter was chosen to achieve tighter grouping of similar observations, while n_neighbours balances the algorithm's focus between the local and global structure of the data; we chose the maximal n_neighbours value allowed by the UMAP library. Reducing the data from 81 features (frequencies) to three (UMAP coordinates) increased the density of data points, which helps the GMM, a density-model-based method, to identify clusters.
Next, we fit a Gaussian mixture distribution [52] to the UMAP-embedded data using MATLAB’s fitgmdist function (Statistics and Machine Learning Toolbox). To determine the optimal number of clusters, we scanned the n_components parameter (range: 55–555) and selected the solution minimizing the Bayesian Information Criterion (BIC). BIC is calculated as BIC = ln(n)k−2ln(L), where n is the number of observations, k is the number of estimated parameters, and L is the maximum value of the likelihood function for the model. Other fitgmdist parameters were: RegularizationValue = 1e-6, CovarianceType = ’full’, SharedCovariance = ’false’, Replicates = 1, and Options = statset (MaxIter = 100, TolFun = 1e-3).
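A minimal Python sketch of the BIC scan, using scikit-learn's `GaussianMixture` as a stand-in for `fitgmdist`. `reg_covar`, `max_iter`, `tol` and `n_init` are only approximate analogues of RegularizationValue, MaxIter, TolFun and Replicates. The input is assumed to be a UMAP embedding already; with the Python umap-learn package that step could be `umap.UMAP(n_components=3, n_neighbors=199, min_dist=0.01, metric="euclidean").fit_transform(X)` (not run here). The scanned range in the usage example is a small toy range, not the 55–555 range used in the study.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def select_gmm_by_bic(X, k_values, random_state=0, n_init=1):
    """Fit full-covariance GMMs for each k and keep the one with the lowest BIC."""
    best_model, best_bic = None, np.inf
    for k in k_values:
        gmm = GaussianMixture(n_components=k, covariance_type="full",
                              reg_covar=1e-6, max_iter=100, tol=1e-3,
                              n_init=n_init, random_state=random_state)
        gmm.fit(X)
        bic = gmm.bic(X)  # ln(n)*k_params - 2 ln(L)
        if bic < best_bic:
            best_model, best_bic = gmm, bic
    return best_model, best_bic


# Usage: three well-separated blobs standing in for a 3-D UMAP embedding
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 1.0, (200, 3)) for c in (0.0, 50.0, 100.0)])
model, bic = select_gmm_by_bic(X, range(1, 7))
labels = model.predict(X)
```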
Another approach to finding the optimal number of clusters would be to use the Akaike Information Criterion (AIC), calculated as AIC = 2k − 2ln(L). Both AIC and BIC favor models that fit the data well (large L) and have few parameters (small k); however, BIC penalizes each parameter by ln(n) instead of 2, which is the stronger penalty whenever n ≥ 8 (ln(32,533) ≈ 10.4 for our dataset), so BIC favors simpler models than AIC.
Evaluating clustering agreement
Next, we evaluate how well the clustering algorithms agree on the partitioning of the data. To compare solutions, we use two metrics from the scikit-learn library in Python [53]: the Adjusted Mutual Information (AMI) score and the Homogeneity score. AMI [54] is a variant of Mutual Information (MI) that corrects for agreement expected by chance, which matters especially when comparing solutions with clusters of different sizes or different numbers of clusters. AMI equals 1 for identical partitions and is close to 0 (it can be slightly negative) when the agreement is no better than random chance.
We also employ the Homogeneity score [55], which reflects how consistently one solution nests within another, for example, when larger clusters in one solution are split into many in the other. A score of 1 indicates perfect homogeneity: each cluster in one solution contains only observations from a single cluster of the other solution. A score of 0 indicates that the clusters of one solution are composed of observations drawn at random with respect to the other solution. Unlike AMI, the score is asymmetric, so it depends on which solution is taken as the reference.
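Both metrics are available in `sklearn.metrics`. The sketch below compares a coarse and a fine partition of the same eight observations; `homogeneity_score(labels_true, labels_pred)` is 1 when every cluster in `labels_pred` contains members of only one cluster in `labels_true`, so computing it in both directions shows the asymmetry. The label vectors are toy examples.

```python
from sklearn.metrics import adjusted_mutual_info_score, homogeneity_score


def compare_solutions(labels_a, labels_b):
    """AMI plus homogeneity in both directions for two clusterings."""
    ami = adjusted_mutual_info_score(labels_a, labels_b)
    h_b_in_a = homogeneity_score(labels_a, labels_b)  # are b's clusters subsets of a's?
    h_a_in_b = homogeneity_score(labels_b, labels_a)  # are a's clusters subsets of b's?
    return ami, h_b_in_a, h_a_in_b


coarse = [0, 0, 0, 0, 1, 1, 1, 1]  # e.g., a GMM-like solution
fine = [0, 0, 1, 1, 2, 2, 3, 3]    # e.g., an HCA-like split of it
ami, h_fine, h_coarse = compare_solutions(coarse, fine)
```

Here `h_fine` is 1 (the fine clusters nest perfectly in the coarse ones), while `h_coarse` is 0.5 and the AMI stays below 1.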
Time and range communities
We analyze the time and range profiles of clusters by extracting the time and range stamps of their assigned observations. For each cluster, we calculate the probability of observing a member in each time and range bin (as in Fig 1F and 1G). To compare the time and range distributions of insect clusters, we employ the two-sample Kolmogorov-Smirnov (K-S) test [56], as implemented in the MATLAB Statistics and Machine Learning Toolbox. The K-S test assesses whether two empirical distributions originate from the same parent distribution and provides a distance statistic and a p-value. Using the K-S p-values between all cluster pairs, we construct two similarity matrices: one for time and one for range.
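A pairwise K-S similarity matrix can be sketched with `scipy.stats.ks_2samp` in place of the MATLAB function. The sketch applies the test to the raw time (or range) stamps of each cluster's observations, which is our assumption about the input; the diagonal is set to 1, the p-value of a sample compared with itself.

```python
import numpy as np
from scipy.stats import ks_2samp


def ks_similarity_matrix(samples):
    """Symmetric matrix of two-sample K-S p-values between clusters.

    samples: list of 1-D arrays, e.g., solar times (or ranges) of the
    observations assigned to each cluster.
    """
    n = len(samples)
    S = np.ones((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            S[i, j] = S[j, i] = ks_2samp(samples[i], samples[j]).pvalue
    return S


# Usage: two clusters active at the same hours, one shifted
rng = np.random.default_rng(0)
a, b, c = rng.normal(0, 1, 300), rng.normal(0, 1, 300), rng.normal(3, 1, 300)
S = ks_similarity_matrix([a, b, c])
```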
We construct similarity matrices to understand how clusters naturally group into communities. These communities are characterized by greater internal similarity compared to their similarity with clusters outside the group. To identify these communities, we first calculate a modularity matrix [57] using modularity_f(A, gamma), implemented in an external MATLAB library [58], where:
A: The similarity matrix (K-S p-values) that contains K-S p-values between all cluster pairs.
gamma (γ): The resolution parameter controlling the granularity of the community structure. Lower values (γ < 1) tend to produce fewer communities, while higher values (γ > 1) result in more communities. In our analysis, we use the default value of γ = 1.
The GenLouvain algorithm [58], with deterministic output and default parameters, is then applied to the modularity matrix. This yields a community assignment for each cluster, partitioning the clusters into time and range communities.
Additionally, based on this cluster-to-community mapping, we calculate a modularity score (M), which quantifies the strength of the identified community structure. Modularity is close to 0 for a random structure, approaches 1 for a well-defined community structure, and becomes negative for a partition that is worse than random.
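For reference, the standard (Newman) modularity matrix with a resolution parameter is B = A − γ·k·kᵀ/(2m), where k holds the weighted node degrees and 2m is the total weight, and the modularity of a partition is the sum of B over pairs in the same community divided by 2m. The NumPy sketch below implements these textbook definitions; it does not reproduce the GenLouvain search, and the MATLAB library's handling of details such as self-similarities on the diagonal may differ.

```python
import numpy as np


def modularity_matrix(A, gamma=1.0):
    """B = A - gamma * k k^T / (2m) for a symmetric weighted similarity matrix."""
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)   # weighted degree of each cluster (node)
    two_m = k.sum()     # total weight, each edge counted twice
    return A - gamma * np.outer(k, k) / two_m


def modularity_score(A, communities, gamma=1.0):
    """Q = (1/2m) * sum of B_ij over pairs (i, j) in the same community."""
    A = np.asarray(A, dtype=float)
    B = modularity_matrix(A, gamma)
    c = np.asarray(communities)
    same = c[:, None] == c[None, :]
    return B[same].sum() / A.sum()


# Usage: two perfectly separated groups of three clusters
A = np.kron(np.eye(2), np.ones((3, 3)))
q_split = modularity_score(A, [0, 0, 0, 1, 1, 1])  # 0.5
q_single = modularity_score(A, [0] * 6)            # 0.0
```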
Lidar based diversity indices
To quantify and compare clustering results across experiments, we employed Hill numbers [59], a family of diversity metrics that allows us to emphasize different aspects of diversity by adjusting a single parameter, α [60]. Hill numbers are expressed by the following equation (Eq 4):
$$H_\alpha = \left( \sum_{j \in S} p_j^{\alpha} \right)^{\frac{1}{1-\alpha}} \qquad (4)$$
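where $S$ is the set of all clusters, $p_j$ is the relative size of cluster $j$ ($j \in S$), calculated as the number of observations in cluster $j$ divided by the total number of observations, and $\alpha$ is the order of the diversity index, a non-negative real number (the case $\alpha = 1$ is defined as the limit $\alpha \to 1$).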
Setting α to 0, 1, and 2 yields three diversity indices:
Total number of clusters
The H0 metric (α = 0) is the total number of clusters (species) S; because every cluster counts equally regardless of its size, H0 gives high importance to rare clusters (Eq 5):
$$H_0 = \sum_{j \in S} p_j^{0} = |S| \qquad (5)$$
Effective number of clusters
H1 (α = 1), the exponential of the Shannon entropy (also known as Shannon diversity), weights clusters by their relative size, balancing rare and abundant clusters [61]. It estimates how many equally sized clusters would yield the same Shannon entropy (Eqs 6, 7). This is analogous to the number of effective choices in a prediction model.
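$$H_1 = \lim_{\alpha \to 1} H_\alpha = \exp\left( H_{\mathrm{Sh}} \right) \qquad (6)$$
$$H_{\mathrm{Sh}} = -\sum_{j \in S} p_j \ln p_j \qquad (7)$$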
The number of dominant clusters
The H2 metric (α = 2), the inverse Simpson index, emphasizes dominant clusters; for a given number of clusters, higher values indicate a more even spread of observations across clusters (Eq 8).
$$H_2 = \frac{1}{\sum_{j \in S} p_j^{2}} \qquad (8)$$
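Hill numbers follow directly from the cluster sizes. The sketch below computes the general formula for any order α and switches to the exponential of the Shannon entropy at α = 1, where the general formula is undefined. The cluster sizes in the usage example are toy values.

```python
import numpy as np


def hill_number(counts, alpha):
    """Hill number of order alpha from cluster sizes (observations per cluster)."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()  # relative cluster sizes p_j
    if np.isclose(alpha, 1.0):
        # alpha -> 1 limit: exponential of the Shannon entropy
        return float(np.exp(-np.sum(p * np.log(p))))
    return float(np.sum(p ** alpha) ** (1.0 / (1.0 - alpha)))


sizes = [2, 1, 1]  # three clusters with 2, 1 and 1 observations
for a in (0, 1, 2):
    print(a, round(hill_number(sizes, a), 3))
# 0 3.0
# 1 2.828
# 2 2.667
```

For perfectly even clusters, all three indices coincide with the number of clusters; the more uneven the sizes, the further H1 and H2 fall below H0.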
Detrending of power spectra
For visualization purposes, we detrended the power spectra by fitting a line (trend) to area-normalized and log-transformed power spectra and then subtracting it. The resulting positive and negative values indicate power above and below the trend. This approach, applied for heatmap visualization with a diverging colormap, allows us to highlight even subtle oscillations. However, it is important to note that we do not use the detrended power spectra in any analysis.
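The detrending step amounts to a first-order polynomial fit and subtraction. The sketch assumes the line is fitted against a linear frequency axis, which the text does not specify, and, as above, is meant for visualization only.

```python
import numpy as np


def detrend_log_spectrum(freqs, log_power):
    """Subtract a straight-line fit from a log-transformed power spectrum."""
    freqs = np.asarray(freqs, dtype=float)
    log_power = np.asarray(log_power, dtype=float)
    slope, intercept = np.polyfit(freqs, log_power, deg=1)
    trend = slope * freqs + intercept
    return log_power - trend  # > 0: above trend, < 0: below trend


# Usage: a falling spectrum with a small bump at one frequency bin
freqs = np.linspace(25, 1000, 80)
spectrum = -0.002 * freqs + 1.0
spectrum[10] += 0.5
detrended = detrend_log_spectrum(freqs, spectrum)
```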
Bootstrapping to evaluate confidence intervals
To assess the variability of our metrics, we employed bootstrapping [62], a resampling-based method well suited to scenarios in which day-to-day replicates are limited. This approach involves generating N = 1000 synthetic datasets by randomly sampling observations with replacement from the original dataset. Each original observation has an equal probability of being included in a synthetic dataset, and some may be included multiple times. The original dataset can be the observations of a single cluster, a community, or any other relevant subset.
For each synthetic sample, we calculate the metric of interest, resulting in N = 1000 variants depending on the drawn observations. From this distribution, we empirically estimate the mean of the metric and its 95% confidence intervals (CIs) using the 2.5th and 97.5th percentiles. This provides a range within which we are 95% confident that the true value of the metric lies, accounting for sampling variability.
In our study, we applied bootstrapping to estimate confidence intervals (CIs) for several key metrics:
Clusters’ mean DoLP: To assess the DoLP for both found and random clusters, we generated N = 1000 synthetic samples for each cluster by randomly drawing observations with replacement from the evaluated cluster. For each synthetic sample, we calculated the mean DoLP. By retrieving N values of mean DoLP, we then evaluated this distribution to obtain the mean and CIs for the cluster’s DoLP.
Time and Range Profiles: For each time/range community or range-DoLP subset, we generated N = 1000 synthetic samples by randomly drawing observations with replacement. For each synthetic sample, we determined the probability of an observation in time (or range). To quantify the variability of these probabilities, we analyzed the N values obtained at each time (or range) bin, reporting the mean probability and its 95% CIs.
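The procedure for a cluster's mean DoLP can be sketched as a percentile bootstrap, together with the comparison against a random cluster of the same size drawn from all observations, where non-overlapping 95% CIs count as a significant difference. Function names, the seed and the synthetic DoLP values are illustrative assumptions.

```python
import numpy as np


def bootstrap_mean_ci(values, n_boot=1000, ci=95.0, rng=None):
    """Bootstrap mean and percentile confidence interval of a 1-D sample."""
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values, dtype=float)
    # Each row is one synthetic dataset drawn with replacement
    idx = rng.integers(0, values.size, size=(n_boot, values.size))
    boot_means = values[idx].mean(axis=1)
    tail = (100.0 - ci) / 2.0
    lo, hi = np.percentile(boot_means, [tail, 100.0 - tail])
    return boot_means.mean(), lo, hi


def differs_from_random(cluster_dolp, all_dolp, n_boot=1000, seed=0):
    """True if the cluster's DoLP CI does not overlap that of a random cluster."""
    rng = np.random.default_rng(seed)
    all_dolp = np.asarray(all_dolp, dtype=float)
    random_cluster = rng.choice(all_dolp, size=len(cluster_dolp), replace=False)
    _, lo_c, hi_c = bootstrap_mean_ci(cluster_dolp, n_boot, rng=rng)
    _, lo_r, hi_r = bootstrap_mean_ci(random_cluster, n_boot, rng=rng)
    return hi_c < lo_r or hi_r < lo_c


# Usage: a glossy cluster (high DoLP) against the whole synthetic population
rng = np.random.default_rng(1)
all_dolp = rng.uniform(0.0, 1.0, 5000)
glossy = rng.uniform(0.9, 1.0, 200)
```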
Results and discussion
Cluster count and agreement analysis: HCA vs. GMM
Unsupervised clustering is a valuable tool for rapidly assessing insect diversity from lidar observations. Unlike classification, which requires labeled data that is often scarce and costly to obtain, clustering groups insect observations based on inherent similarities in their characteristics. This study focuses on characteristics embedded into power spectra, specifically the frequency content (WBF and harmonic overtones) and the DoLP (when using the DoLP dataset).
However, these features may not suffice to distinguish insects at the species level. WBF can be shared across multiple species or vary considerably within the same species, causing multiple species to merge into one cluster or a single species to split into multiple clusters [9, 63, 64]. These complications may affect insect diversity estimates. Additionally, diversity estimates could be biased because different clustering algorithms produce solutions that vary in the number and size of the identified clusters.
In this section, we explore the differences between clustering solutions obtained with two contrasting methods: Hierarchical Cluster Analysis (HCA), a deterministic approach previously employed to group observations from photonic sensors and lidar [9, 11, 12, 19] (see Methods: HCA), and the Gaussian Mixture Model (GMM), a stochastic approach (see Methods: GMM clustering). Comparing the HCA and GMM results, we observed that these methods clustered lidar observations with different granularity. HCA yielded 803 (unpolarized), 245 (co-polarized), and 256 (DoLP) clusters, while GMM produced fewer: 80 (unpolarized), 86 (co-polarized), and 89 (DoLP).
To determine whether these methods produce consistent results despite their different granularity, with HCA offering a more fine-grained view, we assessed the agreement between their clusterings using two metrics: Adjusted Mutual Information (AMI) and the Homogeneity score. AMI approaches 1 when the same observations are grouped together by both methods, after adjusting for chance. The Homogeneity score, ranging from 0 to 1, evaluates whether each cluster from one method contains observations primarily from a single cluster of the other method. A high Homogeneity score indicates that one method's clusters are subsets of the other's. Both metrics are explained in detail in Methods: Evaluating clustering agreement.
We observed moderate agreement between the methods, with AMI scores ranging from 0.47 to 0.55 (S1A Fig). The Homogeneity score was higher, ranging from 0.66 to 0.74 (S1B Fig, upper triangle), but still well below 1. This result suggests that the underlying composition of the clusters differs, and that the methods did not merely identify the same clusters at different resolutions. Despite these differences in the number and composition of clusters, most clusters in both solutions exhibited discernible frequency content (see median power spectra for clusters in S2 and S3 Figs). In the absence of ground truth for optimal partitioning, we then evaluated the clustering results based on DoLP homogeneity, distinctness of activity time patterns, and spatial distribution.
Degree of linear polarization for clusters
In this section, we investigate whether the polarimetric characteristics of wings (from glossy to diffuse) can be predicted using unpolarized data alone, and how much this prediction improves when polarimetric data are included. To quantify the differences between datasets, we measure the DoLP homogeneity of clusters as detailed in Methods: Bootstrapping to evaluate confidence intervals. We report cluster homogeneity as the mean DoLP and its 95% confidence interval (CI) (2.5th and 97.5th percentiles). To determine the significance of the observed results, we compared the CIs of the mean DoLP for the found clusters against those derived from randomly assembled clusters of the same size. We also divided clusters into four groups based on DoLP quartiles (from Q1, most glossy, to Q4, most diffuse).
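The comparison against random clusters can be sketched as follows. This is a minimal percentile-bootstrap illustration, assuming one random cluster of equal size is drawn from all observations and that CIs are taken as the 2.5th and 97.5th percentiles of bootstrapped means; the function names, the number of resamples (`n_boot`) and the synthetic DoLP values are hypothetical, and the procedure in Methods may differ in detail.

```python
import random
from statistics import fmean, quantiles


def bootstrap_mean_ci(values, n_boot=2000, seed=0):
    """Mean of `values` and its bootstrapped 95% CI (2.5th and 97.5th percentiles)."""
    rng = random.Random(seed)
    boot_means = [fmean(rng.choices(values, k=len(values))) for _ in range(n_boot)]
    cuts = quantiles(boot_means, n=40)  # 39 cut points in steps of 2.5%
    return fmean(values), (cuts[0], cuts[-1])


def random_cluster_ci(all_dolp, size, n_boot=2000, seed=1):
    """CI for a random 'cluster' of the same size drawn from all observations."""
    rng = random.Random(seed)
    return bootstrap_mean_ci(rng.sample(all_dolp, size), n_boot=n_boot, seed=seed)


def cis_overlap(ci_a, ci_b):
    """True if two (low, high) intervals share any value."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


# Hypothetical data: DoLP of all observations and of one glossy cluster.
rng = random.Random(42)
all_dolp = [rng.uniform(0.0, 1.0) for _ in range(5000)]
glossy_cluster = [rng.gauss(0.8, 0.03) for _ in range(50)]

mean_obs, ci_obs = bootstrap_mean_ci(glossy_cluster)
_, ci_rand = random_cluster_ci(all_dolp, len(glossy_cluster))
print(mean_obs, ci_obs, ci_rand, "significant:", not cis_overlap(ci_obs, ci_rand))
```

Because the CI width shrinks roughly with the square root of cluster size, small clusters yield wide CIs, which is why a fine-grained clustering produces larger DoLP uncertainty per cluster.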
We find that most clusters from the glossy group (Q1) and some from the diffuse group (Q4) differ significantly from random ones (the CIs of observed and random clusters do not overlap); see Fig 4 (DoLP dataset) and S4 and S5 Figs (unpolarized and co-polarized datasets). The DoLP uncertainty of clusters is largest for HCA applied to the unpolarized dataset (S4A Fig), which is expected because this dataset yields smaller clusters and therefore wider bootstrapped CIs. The major difference among the three datasets is that including polarimetric information improves the isolation of low-DoLP observations into distinct clusters. Notably, HCA is more sensitive than GMM in finding low-DoLP clusters. Intriguingly, both methods identified clusters with anomalously low DoLP (~1–2%), suggesting a less than random polarization state of the backscattered light. Potential explanations include scattering from extremely small, fluffy insects, where polarized light escapes on the backside before having the chance to scatter 180°. These clusters could also be measurement outliers due to imperfect beam overlap.
Fig 4. DoLP characterization of clustering results.
(a) HCA and (b) GMM show comparisons of cluster DoLP distributions for found clusters (black error bars) and randomly generated clusters of the same size (gray error bars). Error bars represent the bootstrapped mean DoLP and its 95% CI for each cluster. Found clusters are ranked by decreasing mean DoLP (x-axis). Vertical lines denote DoLP quartile boundaries (Q1-Q4).
To further examine the impact of polarimetric information on clustering results, we visualized how observations are rearranged across DoLP quartiles (Fig 5). We assigned each observation the DoLP quartile of its cluster and represented the rearrangements using flow lines. Our analysis shows that Q1 produces the most consistent results, with 26% of Q1 observations shared across the three datasets in HCA and 37% in GMM. Rearrangements between the unpolarized and DoLP datasets occur predominantly between adjacent quartiles, although 9% (HCA) and 12% (GMM) of observations are reassigned across non-adjacent quartiles (e.g., from Q1 to Q4). We conclude that even without polarimetric information, clustering algorithms can identify highly glossy wings. However, polarimetric data are particularly beneficial for co-clustering low-DoLP observations.
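The flows in such an alluvial diagram, and the fraction of observations that jump across non-adjacent quartiles, can be counted as sketched below. This is a hypothetical illustration, assuming quartile boundaries are computed over the clusters' mean DoLP values (with Python's default "exclusive" quantile method) and that Q1 holds the highest DoLP; the cluster ids, DoLP values, and function names are invented for the example.

```python
from collections import Counter
from statistics import quantiles


def cluster_quartiles(cluster_dolp):
    """Map cluster id -> DoLP quartile (1 = glossiest/highest mean DoLP, 4 = most diffuse)."""
    q25, q50, q75 = quantiles(list(cluster_dolp.values()), n=4)

    def quartile(d):
        if d >= q75:
            return 1
        if d >= q50:
            return 2
        if d >= q25:
            return 3
        return 4

    return {cid: quartile(d) for cid, d in cluster_dolp.items()}


def quartile_flows(labels_a, labels_b, dolp_a, dolp_b):
    """Count observations moving between quartiles of two clusterings of the same data."""
    qa, qb = cluster_quartiles(dolp_a), cluster_quartiles(dolp_b)
    flows = Counter((qa[a], qb[b]) for a, b in zip(labels_a, labels_b))
    non_adjacent = sum(c for (x, y), c in flows.items() if abs(x - y) >= 2)
    return flows, non_adjacent / len(labels_a)


# Hypothetical example: five observations clustered on two datasets.
dolp_unpol = {"u0": 0.9, "u1": 0.6, "u2": 0.3, "u3": 0.1}
dolp_dolp = {"d0": 0.9, "d1": 0.6, "d2": 0.3, "d3": 0.1}
labels_unpol = ["u0", "u0", "u1", "u2", "u3"]
labels_dolp = ["d0", "d3", "d1", "d2", "d3"]

flows, frac = quartile_flows(labels_unpol, labels_dolp, dolp_unpol, dolp_dolp)
print(dict(flows), frac)
```

Each key of `flows` is one flow line (source quartile, target quartile) and its count sets the line width; here one of five observations moves from Q1 to Q4.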
Fig 5. Rearrangement of observations between cluster’s DoLP quartiles.
(a) HCA clusters. (b) GMM clusters. Left panels show rearrangements between the unpolarized and DoLP datasets; right panels show differences between the co-polarized and DoLP datasets. Each quartile (Q1–Q4) is labeled with its number of observations. The flows (lines) between quartiles indicate the fraction of observations, with line width proportional to the number of observations. The alluvial diagrams were plotted with RAWGraphs [65].
To evaluate whether HCA and GMM agree on the content of the top five glossy clusters, we next compared their median power spectra (DoLP dataset, S6 and S7 Figs). Although both methods return similar power spectra for the rank 1 and 2 clusters, GMM aggregates more observations per cluster (e.g., rank 1: 123 observations in GMM vs. 23 in HCA). This indicates that GMM generalizes power spectral patterns more broadly, leading to larger clusters, whereas HCA maintains a stricter similarity criterion. The comparison of the top two glossy clusters thus leads to the same conclusion as the Homogeneity score: HCA clusters tend to be subsets of GMM clusters.
Time and range differences among clusters
Distinct species are likely to exploit distinct niches in time and space. For example, crepuscular species may be adapted to a certain ambient light level, and bumble bees are adapted to forage earlier and at lower temperatures than other pollinators [66, 67]. Range differences among clusters could arise both from detection biases and from distinct preferences of insect clusters for topographic features, such as vegetation or reeds along the transect. Instrument-dependent detection biases could arise because larger, brighter, or glossier species can be detected at longer ranges.
To assess the biological relevance of the clustering, we investigated whether distinct daily activity patterns and range profiles could define communities, i.e., groups of clusters that are more similar within a group than between groups (see Section Methods: Time and range communities). Comparing the two clustering approaches, we find that GMM recovers community structure most clearly, whereas HCA performs worse. We quantify this using a modularity metric (M), which is 0 for a partition no better than random, approaches 1 for well-defined community structure, and is negative when the partition is less optimal than random. In HCA, modularity increased with the addition of polarimetric information (unpol. < co-pol. < DoLP). This trend was evident in both time communities (M = 0.07, 0.15, and 0.16 for the unpol., co-pol., and DoLP datasets, respectively) and range communities (M = 0.08, 0.13, and 0.14). In contrast, clusters identified by GMM show relatively strong community structure across all datasets, with modularity remaining consistent for both time communities (M = 0.26, 0.27, and 0.25) and range communities (M = 0.12, 0.09, and 0.11). The presence of community structure indicates that the time and range profiles of the clusters diverge from the average pattern, suggesting ecologically distinct groups. However, the moderate modularity scores imply that these patterns are not discrete but overlapping, with some clusters resembling multiple communities. This is visualized in Fig 6, a heatmap of cluster-to-cluster similarity, where communities appear as bright squares along the diagonal, but some clusters show high similarity across communities.
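For reference, the standard weighted (Newman) modularity of a partition is M = (1/2m) Σ_ij [A_ij − k_i k_j / 2m] δ(c_i, c_j), where A_ij is the similarity between clusters i and j, k_i is the summed similarity of cluster i, 2m = Σ_i k_i, and δ is 1 when both clusters belong to the same community. The sketch below only scores a given partition of a similarity matrix; it assumes non-negative similarities and does not reproduce how the communities were found or how profile similarity was defined in Methods. The matrix `W` is a hypothetical toy example.

```python
def modularity(weights, communities):
    """Weighted Newman modularity of a partition of a similarity graph.

    weights     -- symmetric matrix (list of lists) of non-negative similarities
    communities -- community label for each node (here: each cluster)
    """
    strength = [sum(row) for row in weights]  # weighted degree k_i
    two_m = sum(strength)                     # total edge weight, counted twice
    q = 0.0
    for i, row in enumerate(weights):
        for j, w_ij in enumerate(row):
            if communities[i] == communities[j]:
                q += w_ij - strength[i] * strength[j] / two_m
    return q / two_m


# Hypothetical similarity between four clusters' time profiles:
# clusters 0-1 and 2-3 are similar to each other, unrelated across pairs.
W = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
print(modularity(W, [0, 0, 1, 1]))  # partition matches the structure
print(modularity(W, [0, 0, 0, 0]))  # one community: no structure
print(modularity(W, [0, 1, 0, 1]))  # worse than random
```

The three toy partitions score 0.5, 0 and −0.5, illustrating why values around 0.1 to 0.3 indicate real but overlapping community structure.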
Fig 6. Community structure analysis for HCA and GMM clustering results based on time and range profiles.
Each symmetric matrix displays similarity of time (top panels) and range (bottom panels) profiles across cluster pairs, with darker pixels indicating greater dissimilarity. The heatmaps are organized to place rows/columns adjacently if clusters are from the same community, thus making communities appear as bright squares along the matrix diagonal, manifesting greater similarity within a community than between them (see the bottom-right schematic).
Next, we characterized both time and range communities by plotting their probability distributions across time and range bins (Fig 6 and S8 Fig). To illustrate the variability among clusters, we employed bootstrapping (see Methods: Bootstrapping to evaluate confidence intervals). We observe that time communities differ primarily in their evening activity patterns (Fig 7 I-III), whereas range communities are characterized by a probability of observation that decays with range but has different detectability cut-offs: some clusters are detected at mid-ranges (<160 m, Fig 7A) and others primarily at long ranges (<255 m, Fig 7B).
Fig 7. Characterization of time and range communities.
CIs of observation probability for (A, B) range and (I-II-III) time communities (see legend). The size of each community is shown at the top of each panel. Heatmaps at the A, B and I-III intersections display median power spectra for the corresponding time-range community. The y-axis segments the heatmaps into stripes, one for each cluster. Variation of color within a stripe indicates power magnitude at the corresponding frequencies (x-axis). Powers are shown after normalization, logarithmic transformation, and detrending. The color bar spans the 5th to 95th percentiles of all power values.
The variation in spatiotemporal profiles may be related to the frequency content of the lidar signal. To visualize this, we plot the clusters' median power spectra after detrending (see Methods: Detrending of power spectra) as heatmaps at the intersections of the (A, B) and (I-II-III) probability plots (Fig 7B). Here, we observe that insects detected at long ranges (group B) tend to have the first peak of their frequency spectrum below 250 Hz. This peak could correspond to the fundamental wingbeat frequency, suggesting that larger insects, which have lower WBFs (e.g., dragonflies), are more likely to be detected at greater distances.
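A simple way to locate the first spectral peak after normalization, log transformation and detrending is sketched below. The detrending step here is an assumption (subtracting a least-squares line from the log10 power across frequency bins); the actual procedure is described in Methods: Detrending of power spectra and may differ. The 81 bins at 25 Hz spacing, the 1/f-like background, and the peak positions are hypothetical values chosen for illustration.

```python
from math import log10


def detrend_log_spectrum(power):
    """Normalize to unit sum, take log10, and subtract a least-squares line (assumed detrending)."""
    total = sum(power)
    logp = [log10(p / total) for p in power]
    n = len(logp)
    x_mean = (n - 1) / 2
    y_mean = sum(logp) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(logp))
    slope = sxy / sxx
    return [y - (y_mean + slope * (x - x_mean)) for x, y in enumerate(logp)]


def first_peak_hz(freqs, detrended, min_height=0.0):
    """Frequency of the first local maximum above `min_height`, or None."""
    for i in range(1, len(detrended) - 1):
        if (detrended[i] > detrended[i - 1]
                and detrended[i] >= detrended[i + 1]
                and detrended[i] > min_height):
            return freqs[i]
    return None


# Hypothetical spectrum: 81 bins at 25 Hz spacing, 1/f-like background,
# fundamental at 200 Hz and a first overtone at 400 Hz.
freqs = [25 * i for i in range(81)]
power = [1.0 / (1 + i) for i in range(81)]
power[8] *= 10   # 200 Hz
power[16] *= 5   # 400 Hz

peak = first_peak_hz(freqs, detrend_log_spectrum(power))
print(peak, "Hz; below 250 Hz:", peak is not None and peak < 250)
```

Treating the first peak as the fundamental is itself an assumption: if the fundamental is weak relative to an overtone, a `min_height` threshold on the detrended spectrum may be needed to avoid misreading the WBF.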
Range dependence of co-polarized backscatter
To further explore the factors influencing long-range detectability, we investigated the impact of wing glossiness. We hypothesize that insects with glossy and clear wings scatter laser light coherently, with a narrow lobe and rapid angular speeds, resulting in improved transmission over distances. To test this hypothesis, we subdivided insects from the range communities (A: mid; B: far) into four quartiles based on their DoLP (Q1-Q4, representing decreasing glossiness, see Fig 4). Creating these subsets of clusters allows us to compare range profiles of, for example, highly glossy insects detectable at far ranges (Q1-B subset of clusters) with diffusive insects detectable at the mid-range (Q4-A). Next, for each subset, we calculated the mean probability of detection at each range bin, along with the 2.5th and 97.5th percentiles (CIs), as described in section Methods: Bootstrapping to evaluate confidence intervals.
Comparing range profiles for different DoLP groups, we observe a striking feature in the far-range community: a peak at ~120 m in a probability of observation that otherwise decays with distance (Fig 8 and S9 Fig). This peak is most prominent for glossy insects (Q2). Visualizing the laser beam path over the pond (Fig 8, bottom), we note that this peak coincides with the proximity of a landmass, marked with a red dot. This suggests that insect communities differ with proximity to land. Acknowledging the noise introduced by assuming that observations from all DoLP groups (Q1-Q4) are equally likely to be present at this landmass, we hypothesize that the absence of a peak at 120 m in the low-DoLP distributions (particularly Q4) implies that glossiness strongly affects detectability at this distance. These findings indicate that the clusters reflect the spatial preferences of insects and thus could be seen as a meaningful coarse-grained representation of lidar observations. This representation can be further employed to describe insects' activity patterns and spatial preferences, for example, in response to seasonal changes in vegetation, or to evaluate the attraction range of conventional insect traps.
Fig 8. Range dependence of co-polarized backscatter.
CIs of probability distributions show the likelihood of observing an insect within a DoLP quartile (Q1-Q4: glossy to diffuse) and range community (A: mid-range, B: far-range). In the top-right corner of probability distributions, we show the number of observations. In B-plots, we show with the red dot the spike in the probability of observing an insect, potentially linked to a nearby landmass (see bottom panel). Heatmaps depict median power spectra for clusters within corresponding DoLP-range subsets (as in Fig 7).
Our findings also highlight some limitations of the current lidar setup for assessing biodiversity. Specifically, there are biases in determining the abundance and richness of insects. For example, some morphological features make certain insects easier to detect, leading to overestimation of their presence. Such features include size, brightness, and glossiness; detection probability also depends on how wing thickness resonates with the lidar wavelength. This observation suggests a direction for improving lidar technology: using longer wavelengths to enhance specularity and detection range. Longer (infrared) wavelengths have proven efficient for clustering moths [29, 68].
Lidar based diversity indices
We hypothesized that integrating polarimetric information into lidar signals would enhance discrimination between insect taxa by adding the similarity of the polarimetric properties of insect wings and bodies to the frequency content of the power spectra. Moreover, cluster count and composition depend not only on the instrument but also on the choice of clustering algorithm, which influences conclusions about the diversity at the monitored site. To evaluate the impact of clustering approaches on diversity estimates, we compared the results of HCA and GMM clustering, focusing on the number and relative size of the identified clusters.
To illustrate cluster count and their relative size, we plotted the Ranked Abundance Distribution (RAD), depicting cluster sizes in descending order (Fig 9). We further characterized clustering results using Hill numbers, a family of diversity metrics (see Methods: Lidar based diversity indices). Specifically, H0 represents the total cluster count, providing an overall estimate of diversity; H1 represents the effective number of clusters, accounting for relative abundance; and H2 represents the dominant number of clusters, highlighting the most prevalent clusters.
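Treating each cluster as a "species" and its share of observations as relative abundance p_i, the Hill numbers of order q are qD = (Σ p_i^q)^(1/(1−q)), with H0 = cluster count, H1 = exp(H′) as the limit at q = 1 (H′ being the Shannon index), and H2 = 1/Σ p_i². The sketch below assumes the natural logarithm for H′, which is consistent with Table 1 (e.g., exp(4.14) ≈ 63 for GMM on the unpolarized dataset); the cluster sizes are hypothetical.

```python
from math import exp, log


def hill_numbers(cluster_sizes):
    """Return H0, Shannon index H', H1 = exp(H') and H2 = 1 / sum(p_i^2)."""
    total = sum(cluster_sizes)
    p = [s / total for s in cluster_sizes if s > 0]
    h0 = len(p)                                # cluster richness
    shannon = -sum(pi * log(pi) for pi in p)   # natural logarithm
    h1 = exp(shannon)                          # effective number of clusters
    h2 = 1.0 / sum(pi * pi for pi in p)        # inverse Simpson concentration
    return h0, shannon, h1, h2


# Hypothetical cluster sizes (observations per cluster)
even = [25, 25, 25, 25]
skewed = [70, 10, 10, 10]
print(hill_numbers(even))    # all Hill numbers equal the cluster count
print(hill_numbers(skewed))  # H2 < H1 < H0 when one cluster dominates
```

The ratios H1/H0 and H2/H0 therefore measure how evenly observations are spread over clusters, independently of how many clusters an algorithm returns.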
Fig 9. Clusters’ size distribution.
(a–c) HCA clustering on three datasets; (d–f) GMM clustering on three datasets. The solid line shows the number of observations per cluster for clusters sorted from largest to smallest. Vertical lines mark Hill numbers.
Our analysis revealed a consistent trend of HCA producing more clusters than GMM (~250 vs. ~85), particularly evident in the unpolarized dataset (~800 vs. ~80), as illustrated in Fig 9 and Table 1. This suggests that HCA yields higher diversity estimates than GMM. However, the high Homogeneity score (~0.7, S1B Fig) between the two clustering solutions indicates that larger GMM clusters are often fragmented into smaller HCA clusters. Thus, the higher diversity estimates obtained with HCA likely reflect a finer resolution at which the data are partitioned.
Table 1. Characterization of clustering results with Hill numbers.
NoC, number of clusters.
| Method | Dataset | H0 (NoC) | H′ (Shannon index) | H1 (effective NoC) | H2 (dominant NoC) |
|---|---|---|---|---|---|
| HCA | unpol. | 803 | 6.58 | 724 | 662 |
| HCA | co-pol. | 245 | 5.40 | 222 | 204 |
| HCA | DoLP | 256 | 5.45 | 232 | 213 |
| GMM | unpol. | 80 | 4.14 | 63 | 54 |
| GMM | co-pol. | 86 | 4.23 | 68 | 59 |
| GMM | DoLP | 89 | 4.24 | 69 | 58 |
To address the potentially disproportionate influence of rare clusters on cluster richness (H0), we further evaluated the cluster size distribution using the effective number of clusters (H1). Relative to the total number of clusters, HCA consistently yielded a larger effective number of clusters than GMM, suggesting a more balanced distribution of cluster sizes. Moreover, dominant clusters (H2) made up a substantially larger proportion of the total cluster count in HCA than in GMM (~83% vs. ~67%, Table 1 and Fig 9), indicating that our diversity estimates were not substantially inflated by rare clusters.
Hill numbers reveal that each method can lead to distinct conclusions, particularly regarding the proportion of dominant clusters within the total cluster count. These discrepancies are largely due to HCA and GMM exhibiting different levels of tolerance for variation within clusters. HCA favors similarly sized, spherical clusters because of the Ward linkage criterion, which defines a "good" cluster as one where all observations are relatively close to the cluster centroid. In contrast, GMM identifies clusters based on the probability of an observation belonging to a specific Gaussian distribution, allowing for the identification of elliptical clusters. Consequently, these differences impact the number and size distribution of clusters, and subsequently, the estimated diversity indices. Therefore, when interpreting insect diversity estimates derived from lidar data, it is crucial to carefully consider the inherent biases and assumptions of different clustering algorithms.
To move beyond the limitations of single clustering solutions and ensure more robust lidar-based diversity assessments, future research would benefit from evaluating the robustness of these indices through a more comprehensive approach. One promising avenue involves using stochastic algorithms to analyze an ensemble of clustering solutions, rather than relying on a single outcome [69, 70]. This would allow us to report a range of values for each Hill number, gaining valuable insights into the sensitivity of these metrics in detecting changes within the monitored site (see additional analysis for GMM results in S1 Text). Additionally, focusing on observations that consistently co-cluster together across multiple solutions could provide a more reliable basis for diversity estimates, as these observations represent a stronger signal compared to those that are grouped inconsistently and may introduce unpredictable variability.
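The ensemble idea above can be sketched with a co-association matrix: for every pair of observations, the fraction of clustering runs in which the pair shares a cluster. This is a hypothetical illustration of the proposed future analysis, not something performed in this study; the ensemble of label lists and the function names are invented. Storing all pairs scales quadratically (roughly 5 × 10^8 pairs for 32,533 observations), so in practice it would be applied to subsets or stored sparsely.

```python
from itertools import combinations


def co_association(solutions):
    """Fraction of clustering solutions in which each pair of observations shares a cluster."""
    n = len(solutions[0])
    together = {pair: 0 for pair in combinations(range(n), 2)}
    for labels in solutions:
        for i, j in together:
            if labels[i] == labels[j]:
                together[(i, j)] += 1
    return {pair: count / len(solutions) for pair, count in together.items()}


def stable_pairs(solutions, threshold=1.0):
    """Pairs co-clustered in at least `threshold` of the solutions."""
    return sorted(p for p, f in co_association(solutions).items() if f >= threshold)


def cluster_count_range(solutions):
    """Min and max H0 (number of clusters) across an ensemble."""
    counts = [len(set(labels)) for labels in solutions]
    return min(counts), max(counts)


# Hypothetical ensemble: three GMM runs with different random seeds on 4 observations.
ensemble = [
    [0, 0, 1, 1],
    [0, 0, 1, 2],
    [5, 5, 5, 6],
]
print(stable_pairs(ensemble))         # pairs that always co-cluster
print(cluster_count_range(ensemble))  # spread of H0 across runs
```

Only the pair (0, 1) co-clusters in every run, while H0 ranges from 2 to 3; the same loop over an ensemble could report a range for any Hill number.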
Conclusions
Estimating insect diversity has traditionally been labor-intensive, relying on manual capture and classification [71]. To address this challenge, researchers have sought to automate the process [72] using technologies such as radar [73] and lidar. In this study, we used polarimetric lidar to detect free-flying insects and investigated whether polarimetry improves diversity estimates. We anticipated that diversity estimates would be more accurate if polarimetric information were added.
We also explored how the choice of clustering algorithm affected the outcome. We initially focused on the total cluster count produced by each of the two clustering methods. We observed a distinct difference in resolution, with GMM yielding ~85 clusters and HCA ~250. However, when interpreting this value as an estimate of insect diversity, it is important to recognize that neither algorithm intrinsically determines the optimal number of clusters. In HCA, increasing the significance threshold for compensated linkage would lower the cluster count, while in GMM, minimizing the Akaike information criterion (AIC) instead of the Bayesian information criterion (BIC) would increase it, yielding ~300 clusters per dataset. Therefore, this value should be seen as a lidar-based diversity index rather than a direct measure of insect diversity.
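Why BIC selects fewer components than AIC follows from their penalty terms: AIC = 2p − 2 ln L and BIC = p ln n − 2 ln L, where p is the number of free parameters and n the number of observations. The sketch below computes how much each penalty grows when one component is added, assuming full covariance matrices in the three-dimensional UMAP space and that all 32,533 observations are clustered; the covariance type actually used is not stated here and is an assumption. scikit-learn's `GaussianMixture` exposes the same criteria through its `aic()` and `bic()` methods.

```python
from math import log


def gmm_free_parameters(k, d):
    """Free parameters of a k-component GMM with full covariance matrices in d dimensions."""
    means = k * d
    covariances = k * d * (d + 1) // 2
    mixing_weights = k - 1
    return means + covariances + mixing_weights


def penalty_per_extra_component(n_obs, d):
    """Increase in the AIC and BIC penalty terms when one component is added."""
    extra = gmm_free_parameters(2, d) - gmm_free_parameters(1, d)
    return {"AIC": 2 * extra, "BIC": extra * log(n_obs)}


# Assumed setting: 3-D UMAP embedding, all 32,533 observations clustered.
print(gmm_free_parameters(1, 3))  # 3 means + 6 covariance terms
print(penalty_per_extra_component(32_533, 3))
```

Under these assumptions, each extra component must raise 2 ln L by about 104 to be accepted by BIC but only by 20 under AIC, so AIC tolerates many more components on a dataset of this size.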
Regardless of clustering resolution, our goal was to determine which lidar signal (unpolarized, co-polarized, or DoLP) yields more robust diversity estimates when comparing results within the same clustering approach. This analysis yielded conflicting results: GMM yielded fewer clusters for the unpolarized dataset than for DoLP (80 vs. 89), whereas HCA produced a substantially higher number for unpolarized data (803 vs. 256).
To investigate whether HCA's higher cluster count in the unpolarized dataset truly indicates greater insect diversity, we analyzed the time and range dependence of the retrieved clusters to evaluate the overall composition and distribution of insect communities. Our hypothesis was that higher species specificity would manifest as a stronger tendency of the identified clusters to group together spatially or temporally. However, the HCA-derived clusters showed weaker community structure, primarily in time (Fig 6). This suggests that HCA's additional clusters may not correspond to distinct insect species but rather reflect over-sensitivity to variation in power spectra.
Over-sensitivity to power spectra likely arises from the inherent differences in how HCA and GMM generalize power spectra patterns. HCA, being sensitive to variations in the relative powers of frequency peaks [74], may focus on differences between the powers of the fundamental frequency and its overtones. These differences can result from aspect angles of observations and could be accentuated for the fundamental peaks and a few harmonic overtones after averaging co-polarized and de-polarized signals.
In contrast to the other methods applied in this study, the GMM approach was applied not to the power spectra directly but to their UMAP-reduced representations. This transformation reduced the 81-dimensional power spectra into a three-dimensional representation, aimed at preserving the global structure and relationships between observations rather than focusing on specific frequencies and powers. Consequently, this transformation makes GMM clustering less prone to overfitting and reduces sensitivity to individual spectral components.
Despite the different granularity at which the datasets are partitioned by the two clustering approaches, we argue that the total cluster count remains a valid proxy for diversity, provided that the same approach is used consistently in comparative studies and that the cluster count reliably scales with actual insect diversity. While automation is key to streamlining biodiversity estimates, ensuring that these estimates reflect true biodiversity remains crucial. Rydhmer et al. [9] demonstrated with caged taxa that clustering can effectively differentiate taxa, reporting that specificity stabilizes beyond 30 species. Rydhmer et al. [9] also found a 70% correlation between the number of retrieved clusters and Malaise trap diversity, estimated from trap catches classified to family. This study [9] demonstrates the potential of estimating insect diversity from photonic data, further supporting our approach. However, replicating their extensive ground-truthing efforts requires sacrificing large numbers of insects, substantial financial resources, and extensive classification work by taxonomic experts; this is not feasible for smaller research groups and was outside the scope of this study. Thus, while the approach employed here has been shown to correlate with species diversity [9], the diversity estimates should be interpreted cautiously, recognizing that factors such as sex, temperature, and observation angle may lead to subclusters even within species.
To further improve the precision and understanding of clustering, we evaluated whether polarimetric lidar can improve clustering and whether analyzing the polarimetric properties of clusters reveals additional information. Our comparison of clusters retrieved from the unpolarized and DoLP datasets reveals that the unpolarized approach struggles to co-cluster observations with low DoLP values. However, its clusters differ significantly in DoLP from random ones within the glossiest (Q1) and most diffuse (Q4) DoLP quartiles (compare Fig 4 and S4 Fig). Moreover, incorporating polarimetric information rearranges only a small fraction of observations (~10%) across non-adjacent DoLP quartiles (Fig 5). This suggests that unpolarized backscatter retains sufficient information on wing glossiness to co-cluster the majority of DoLP-similar observations.
Furthermore, our comparison of results from co-polarized and DoLP datasets indicates that they yield similar diversity estimates. Also, both HCA and GMM produce DoLP-homogeneous clusters (Fig 4 and S4 and S5 Figs), with the strongest agreement observed within the top glossy clusters (Q1 group) (Fig 5). This suggests that most information on wing glossiness is derived from the harmonic content of co-polarized power spectra, while DoLP quantification remains valuable for identifying rare low-DoLP cases. Our findings highlight the potential for improved specificity when utilizing a polarimetric lidar signal, suggesting that information related to wing specularity can be extracted from harmonic modulation content, even without polarization. Future investigations might explore this further by evaluating the performance of dual-wavelength lidar systems, which could potentially offer more substantial gains in species differentiation.
While our study focused on a controlled setting, this lidar method is highly adaptable and can be applied in various habitats, including meadows, agricultural fields, forest verges, and open forests, provided there is a clear line of sight. This flexibility holds significant promise for radically improving the efficiency of estimating insect diversity. Transitioning to more heterogeneous environments will, however, introduce complexities. For instance, disentangling species-specific microhabitat preferences from potential instrumentation biases may prove more challenging. However, the robustness of the lidar method makes any such biases small relative to those inherent to traditional trapping approaches. Future research should focus on refining sensor configurations, replicating these methods across diverse sites and seasons, and potentially integrating ground-truthing with national and international programs that estimate biodiversity using traditional approaches.
Our findings underscore the interplay between instrument sensitivity to insect morphology and the chosen clustering methodology. We find that while polarimetric lidar provides additional information, much of the relevant information is also present in unpolarized data, suggesting a need to balance instrument complexity with research goals. Furthermore, our findings highlight the importance of understanding the biases inherent to different clustering algorithms, as these can significantly influence diversity estimates.
Supporting information
(TIF)
(TIF)
(TIF)
Comparison of HCA clustering results (black) with random clustering (gray).
(TIF)
Comparison of DoLP clustering results (black) with random clustering (gray).
(TIF)
(TIF)
(TIF)
Probability distributions for range (ABC) and time (I-II-III) communities. Heatmaps at the ABC and I-II-III intersection display median power spectra for each time-range community.
(TIF)
Probability distributions show the likelihood of observations within range communities (A, B, C) and DoLP quartiles (Q1-Q4), with heatmaps of the corresponding power spectra. Note the probability spike in the C-plots (red dot), which coincides with the landmass to the left of the laser beam over the pond.
(TIF)
(DOCX)
Acknowledgments
The lidar instrumentation was in part kindly provided by Norsk Elektro Optikk A/S, Norway. We thank Ebba von Wachenfeldt, Zachary Nolen, Emma Kärrnäs, Magne Friberg, Jadranka Rota, and in particular Jens Rydell, may he rest in peace, for assistance in fieldwork. We thank Rachel Muheim for receiving us at the Stensoffa ecological field station. We thank Zhicheng Xu and Jacobo Salvador for discussion and initial data analysis.
Data Availability
All relevant data can be found at https://github.com/Amelet/bernenko2024polarimetric.
Funding Statement
This research work was sponsored by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program (grant #850463, 'Bug-Flash'). Additional funding was provided by FORMAS, Swedish Research Council (2018-01061). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
- 1.van Klink R, Bowler D, Gongalsky K, Shen M, Swengel S, Chase J. Disproportionate declines of formerly abundant species underlie insect loss. Nature. 2023. Dec 20;628:1–6. doi: 10.1038/s41586-023-06861-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Hallmann CA, Sorg M, Jongejans E, Siepel H, Hofland N, Schwan H, et al. More than 75 percent decline over 27 years in total flying insect biomass in protected areas. PLOS ONE. 2017. Oct 18;12(10):e0185809. doi: 10.1371/journal.pone.0185809 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Goulson D. The insect apocalypse, and why it matters. Curr Biol. 2019. Oct 7;29(19):R967–71. doi: 10.1016/j.cub.2019.06.069 [DOI] [PubMed] [Google Scholar]
- 4.Møller AP. Parallel declines in abundance of insects and insectivorous birds in Denmark over 22 years. Ecol Evol. 2019;9(11):6581–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.van Klink R, Bowler DE, Gongalsky KB, Swengel AB, Gentile A, Chase JM. Meta-analysis reveals declines in terrestrial but increases in freshwater insect abundances. Science. 2020. Apr 24;368(6489):417–20. doi: 10.1126/science.aax9931 [DOI] [PubMed] [Google Scholar]
- 6.Moreno-Mateos D, Power ME, Comín FA, Yockteng R. Structural and Functional Loss in Restored Wetland Ecosystems. PLOS Biol. 2012. Jan 24;10(1):e1001247. doi: 10.1371/journal.pbio.1001247 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Potts SG, Biesmeijer JC, Kremen C, Neumann P, Schweiger O, Kunin WE. Global pollinator declines: trends, impacts and drivers. Trends Ecol Evol. 2010. Jun 1;25(6):345–53. doi: 10.1016/j.tree.2010.01.007 [DOI] [PubMed] [Google Scholar]
- 8. Brydegaard M, Svanberg S. Photonic Monitoring of Atmospheric and Aquatic Fauna. Laser Photonics Rev. 2018;12(12):1800135.
- 9. Rydhmer K, Jansson S, Still L, Beck BD, Chatzaki V, Olsen K, et al. Photonic sensors reflect variation in insect abundance and diversity across habitats. Ecol Indic. 2024;158:111483.
- 10. Saha T, Genoud AP, Williams GM, Thomas BP. Monitoring the abundance of flying insects and atmospheric conditions during a 9-month campaign using an entomological optical sensor. Sci Rep. 2023 Sep 20;13(1):15606. doi: 10.1038/s41598-023-42884-7
- 11. Kouakou BK, Jansson S, Brydegaard M, Zoueu JT. Entomological Scheimpflug lidar for estimating unique insect classes in-situ field test from Ivory Coast. OSA Contin. 2020;3(9):2362–71.
- 12. Brydegaard M, Jansson S, Malmqvist E, Mlacha YP, Gebru A, Okumu F, et al. Lidar reveals activity anomaly of malaria vectors during pan-African eclipse. Sci Adv. 2020 May 13;6(20):eaay5487. doi: 10.1126/sciadv.aay5487
- 13. Jansson S, Malmqvist E, Mlacha Y, Ignell R, Okumu F, Killeen G, et al. Real-time dispersal of malaria vectors in rural Africa monitored with lidar. PLOS ONE. 2021 Mar 4;16(3):e0247803. doi: 10.1371/journal.pone.0247803
- 14. Svenningsen CS, Frøslev TG, Bladt J, Pedersen LB, Larsen JC, Ejrnæs R, et al. Detecting flying insects using car nets and DNA metabarcoding. Biol Lett [Internet]. 2021 Mar [cited 2024 May 8];17(3). Available from: http://www.scopus.com/inward/record.url?scp=85103683350&partnerID=8YFLogxK doi: 10.1098/rsbl.2020.0833
- 15. Diserud OH, Stur E, Aagaard K. How reliable are Malaise traps for biomonitoring?–A bivariate species abundance model evaluation using alpine Chironomidae (Diptera). Insect Conserv Divers. 2013;6(5):561–71.
- 16. Wührl L, Pylatiuk C, Giersch M, Lapp F, von Rintelen T, Balke M, et al. DiversityScanner: Robotic handling of small invertebrates with machine learning methods. Mol Ecol Resour. 2022;22(4):1626–38. doi: 10.1111/1755-0998.13567
- 17. Bjerge K, Geissmann Q, Alison J, Mann HMR, Høye TT, Dyrmann M, et al. Hierarchical classification of insects with multitask learning and anomaly detection. Ecol Inform. 2023 Nov 1;77:102278.
- 18. Jansson S, Malmqvist E, Brydegaard M, Åkesson S, Rydell J. A Scheimpflug lidar used to observe insect swarming at a wind turbine. Ecol Indic. 2020 Oct 1;117:106578.
- 19. Assoumou SDY, Kouakou BK, Gbogbo AY, Runemark A, van Klink R, Zoueu JT, et al. Comparative lidar assessment of insect diversity at four Ivorian habitats. Unpublished; 2024 May.
- 20. Santos V, Costa-Vera C, Rivera-Parra P, Burneo S, Molina J, Encalada D, et al. Dual-Band Infrared Scheimpflug Lidar Reveals Insect Activity in a Tropical Cloud Forest. Appl Spectrosc. 2023 Jun 1;77(6):593–602. doi: 10.1177/00037028231169302
- 21. Müller L, Li M, Månefjord H, Salvador J, Reistad N, Hernandez J, et al. Remote Nanoscopy with Infrared Elastic Hyperspectral Lidar. Adv Sci. 2023;10(15):2207110. doi: 10.1002/advs.202207110
- 22. Gebru A, Jansson S, Ignell R, Kirkeby C, Prangsma JC, Brydegaard M. Multiband modulation spectroscopy for the determination of sex and species of mosquitoes in flight. J Biophotonics. 2018;11(8):e201800014. doi: 10.1002/jbio.201800014
- 23. Genoud AP, Basistyy R, Williams GM, Thomas BP. Optical remote sensing for monitoring flying mosquitoes, gender identification and discussion on species identification. Appl Phys B. 2018 Feb 17;124(3):46. doi: 10.1007/s00340-018-6917-x
- 24. Li Y, Han Z, Nessler R, Yi Z, Hemmer P, Brick R, et al. Optical multiband polarimetric modulation sensing for gender and species identification of flying native solitary pollinators. iScience. 2023 Nov 17;26(11):108265. doi: 10.1016/j.isci.2023.108265
- 25. Basset Y, Cizek L, Cuénoud P, Didham RK, Guilhaumon F, Missa O, et al. Arthropod Diversity in a Tropical Forest. Science. 2012 Dec 14;338(6113):1481–4. doi: 10.1126/science.1226727
- 26. Potamitis I, Rigakis I. Measuring the fundamental frequency and the harmonic properties of the wingbeat of a large number of mosquitoes in flight using 2D optoacoustic sensors. Appl Acoust. 2016 Aug 1;109:54–60.
- 27. Bomphrey RJ, Nakata T, Phillips N, Walker SM. Smart wing rotation and trailing-edge vortices enable high frequency mosquito flight. Nature. 2017 Apr;544(7648):92–5. doi: 10.1038/nature21727
- 28. Li M, Runemark A, Hernandez J, Rota J, Bygebjerg R, Brydegaard M. Discrimination of Hover Fly Species and Sexes by Wing Interference Signals. Adv Sci. 2023;10(34):2304657. doi: 10.1002/advs.202304657
- 29. Li M, Seinsche C, Jansson S, Hernandez J, Rota J, Warrant E, et al. Potential for identification of wild night-flying moths by remote infrared microscopy. J R Soc Interface. 2022 Jun 22;19(191):20220256. doi: 10.1098/rsif.2022.0256
- 30. Moore A, Miller RH. Automated Identification of Optically Sensed Aphid (Homoptera: Aphidae) Wingbeat Waveforms. Ann Entomol Soc Am. 2002 Jan 1;95(1):1–8.
- 31. Li M, Jansson S, Runemark A, Peterson J, Kirkeby CT, Jönsson AM, et al. Bark beetles as lidar targets and prospects of photonic surveillance. J Biophotonics. 2021;14(4):e202000420. doi: 10.1002/jbio.202000420
- 32. Wang M, Wang J, Liang P, Wu K. Nutritional Status, Sex, and Ambient Temperature Modulate the Wingbeat Frequency of the Diamondback Moth Plutella xylostella. Insects. 2024 Feb;15(2):138. doi: 10.3390/insects15020138
- 33. Unwin DM, Corbet SA. Wingbeat frequency, temperature and body size in bees and flies. Physiol Entomol. 1984;9(1):115–21.
- 34. Saha T, Genoud AP, Park JH, Thomas BP. Temperature Dependency of Insect’s Wingbeat Frequencies: An Empirical Approach to Temperature Correction. Insects. 2024 May;15(5):342. doi: 10.3390/insects15050342
- 35. Shevtsova E, Hansson C. Species recognition through wing interference patterns (WIPs) in Achrysocharoides Girault (Hymenoptera, Eulophidae) including two new species. ZooKeys. 2011 Dec 12;154:9–30. doi: 10.3897/zookeys.154.2158
- 36. Shevtsova E, Hansson C, Janzen DH, Kjærandsen J. Stable structural color patterns displayed on transparent insect wings. Proc Natl Acad Sci. 2011 Jan 11;108(2):668–73. doi: 10.1073/pnas.1017393108
- 37. Jacques SL, Ramella-Roman JC, Lee K. Imaging skin pathology with polarized light. J Biomed Opt. 2002 Jul;7(3):329–40. doi: 10.1117/1.1484498
- 38. Jansson S, Atkinson P, Ignell R, Brydegaard M. First Polarimetric Investigation of Malaria Mosquitoes as Lidar Targets. IEEE J Sel Top Quantum Electron. 2019 Jan;25(1):1–8.
- 39. Li M, Runemark A, Guilcher N, Hernandez J, Rota J, Brydegaard M. Feasibility of Insect Identification Based on Spectral Fringes Produced by Clear Wings. IEEE J Sel Top Quantum Electron. 2023 Jul;29(4: Biophotonics):1–8.
- 40. Genoud AP, Gao Y, Williams GM, Thomas BP. Identification of gravid mosquitoes from changes in spectral and polarimetric backscatter cross sections. J Biophotonics. 2019 Oct;12(10):e201900123. doi: 10.1002/jbio.201900123
- 41. Tuva [Internet]. [cited 2024 May 30]. Available from: https://etjanst.sjv.se/tuvaut/
- 42. Zhu S, Malmqvist E, Li W, Jansson S, Li Y, Duan Z, et al. Insect abundance over Chinese rice fields in relation to environmental parameters, studied with a polarization-sensitive CW near-IR lidar system. Appl Phys B. 2017 Jul 10;123(7):211.
- 43. Zhao G, Malmqvist E, Török S, Bengtsson PE, Svanberg S, Bood J, et al. Particle profiling and classification by a dual-band continuous-wave lidar system. Appl Opt. 2018 Dec 10;57(35):10164–71. doi: 10.1364/AO.57.010164
- 44. Mei L, Guan P. Development of an atmospheric polarization Scheimpflug lidar system based on a time-division multiplexing scheme. Opt Lett. 2017 Sep 15;42(18):3562. doi: 10.1364/OL.42.003562
- 45. Nyquist H. Certain Topics in Telegraph Transmission Theory. Trans Am Inst Electr Eng. 1928 Apr;47(2):617–44.
- 46. Chen H, Li M, Månefjord H, Travers P, Salvador J, Müller L, et al. Lidar as a potential tool for monitoring migratory insects. iScience [Internet]. 2024 May 17 [cited 2024 Apr 27];27(5). Available from: https://www.cell.com/iscience/abstract/S2589-0042(24)00810-1 doi: 10.1016/j.isci.2024.109588
- 47. Brydegaard M, Kouakou B, Jansson S, Rydell J, Zoueu J. High Dynamic Range in Entomological Scheimpflug Lidars. IEEE J Sel Top Quantum Electron. 2021 Jul;27(4):1–11.
- 48. Ward JH Jr. Hierarchical Grouping to Optimize an Objective Function. J Am Stat Assoc. 1963 Mar 1;58(301):236–44.
- 49. Páez A, Boisjoly G. Exploratory Data Analysis. In: Discrete Choice Analysis with R [Internet]. Cham: Springer International Publishing; 2022 [cited 2024 May 6]. p. 25–64. (Use R!). Available from: https://link.springer.com/10.1007/978-3-031-20719-8_2
- 50. McInnes L, Healy J, Melville J. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction [Internet]. arXiv; 2020 [cited 2024 Apr 19]. Available from: http://arxiv.org/abs/1802.03426
- 51. Meehan C, Ebrahimian J, Moore W, Meehan S. Uniform Manifold Approximation and Projection (UMAP) – File Exchange – MATLAB Central [Internet]. 2022 [cited 2024 Apr 19]. Available from: https://se.mathworks.com/matlabcentral/fileexchange/71902-uniform-manifold-approximation-and-projection-umap
- 52. Reynolds DA. Gaussian mixture models. Encycl Biom. 2009:659–63.
- 53. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine Learning in Python. J Mach Learn Res. 2011;12:2825–30.
- 54. Vinh NX, Epps J, Bailey J. Information Theoretic Measures for Clusterings Comparison: Variants, Properties, Normalization and Correction for Chance.
- 55. Rosenberg A, Hirschberg J. V-Measure: A Conditional Entropy-Based External Cluster Evaluation Measure. In: Eisner J, editor. Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL) [Internet]. Prague, Czech Republic: Association for Computational Linguistics; 2007 [cited 2024 May 14]. p. 410–20. Available from: https://aclanthology.org/D07-1043
- 56. Massey FJ Jr. The Kolmogorov-Smirnov Test for Goodness of Fit. J Am Stat Assoc. 1951 Mar;46(253):68–78.
- 57. Newman MEJ. Modularity and community structure in networks. Proc Natl Acad Sci. 2006 Jun 6;103(23):8577–82. doi: 10.1073/pnas.0601602103
- 58. Jeub LGS, Bazzi M, Jutla IS, Mucha PJ. GenLouvain/GenLouvain [Internet]. GenLouvain; 2024 [cited 2024 Apr 22]. Available from: https://github.com/GenLouvain/GenLouvain
- 59. Hill MO. Diversity and Evenness: A Unifying Notation and Its Consequences. Ecology. 1973;54(2):427–32.
- 60. Cazzolla Gatti R, Amoroso N, Monaco A. Estimating and comparing biodiversity with a single universal metric. Ecol Model. 2020 May 15;424:109020.
- 61. Morris EK, Caruso T, Buscot F, Fischer M, Hancock C, Maier TS, et al. Choosing and using diversity indices: insights for ecological applications from the German Biodiversity Exploratories. Ecol Evol. 2014;4(18):3514–24. doi: 10.1002/ece3.1155
- 62. Efron B. Bootstrap Methods: Another Look at the Jackknife. Ann Stat. 1979 Jan;7(1):1–26.
- 63. Byrne DN, Buchmann SL, Spangler HG. Relationship Between Wing Loading, Wingbeat Frequency and Body Mass in Homopterous Insects. J Exp Biol. 1988 Mar 1;135(1):9–23.
- 64. Li M. Coherent Backscattering from Free-Flying Insects: Implications for Remote Species Identification [Doctoral Thesis (compilation)]. Lund; 2024.
- 65. Mauri M, Elli T, Caviglia G, Uboldi G, Azzi M. RAWGraphs: A Visualisation Platform to Create Open Outputs. In: Proceedings of the 12th Biannual Conference on Italian SIGCHI Chapter [Internet]. New York, NY, USA: Association for Computing Machinery; 2017 [cited 2022 Jun 1]. p. 1–5. (CHItaly ’17). doi: 10.1145/3125571.3125585
- 66. Vaudo AD, Patch HM, Mortensen DA, Grozinger CM, Tooker JF. Bumble bees exhibit daily behavioral patterns in pollen foraging. Arthropod-Plant Interact. 2014 Aug 1;8(4):273–83.
- 67. Aizen MA, Garibaldi LA, Cunningham SA, Klein AM. How much does agriculture depend on pollinators? Lessons from long-term trends in crop production. Ann Bot. 2009 Jun 1;103(9):1579–88. doi: 10.1093/aob/mcp076
- 68. Basistyy R, Genoud A, Thomas B. Backscattering properties of topographic targets in the visible, shortwave infrared, and mid-infrared spectral ranges for hard-target lidars. Appl Opt. 2018 Aug 20;57(24):6990–7. doi: 10.1364/AO.57.006990
- 69. Lee D, Lee SH, Kim BJ, Kim H. Consistency landscape of network communities. Phys Rev E. 2021 May 11;103(5):052306. doi: 10.1103/PhysRevE.103.052306
- 70. Calatayud J, Bernardo-Madrid R, Neuman M, Rojas A, Rosvall M. Exploring the solution landscape enables more reliable network community detection. Phys Rev E. 2019 Nov 21;100(5):052308. doi: 10.1103/PhysRevE.100.052308
- 71. Montgomery GA, Belitz MW, Guralnick RP, Tingley MW. Standards and Best Practices for Monitoring and Benchmarking Insects. Front Ecol Evol [Internet]. 2021 Jan 15 [cited 2024 May 21];8. Available from: https://www.frontiersin.org/articles/10.3389/fevo.2020.579193
- 72. van Klink R, Sheard JK, Høye TT, Roslin T, Do Nascimento LA, Bauer S. Towards a toolkit for global insect biodiversity monitoring. Philos Trans R Soc B Biol Sci. 2024 May 6;379(1904):20230101. doi: 10.1098/rstb.2023.0101
- 73. Drake VA. Distinguishing target classes in observations from vertically pointing entomological radars. Int J Remote Sens. 2016 Aug 17;37(16):3811–35.
- 74. Xu Z. Insect Diversity Estimation in Entomological Lidar. 2022 [cited 2024 May 23]. Available from: http://lup.lub.lu.se/student-papers/record/9102879
Supplementary Materials
- (TIF)
- (TIF)
- (TIF)
- Comparison of HCA clustering results (black) with random clustering (gray). (TIF)
- Comparison of DoLP clustering results (black) with random clustering (gray). (TIF)
- (TIF)
- (TIF)
- Probability distributions for range (A, B, C) and time (I, II, III) communities. Heatmaps at each intersection of a range community (A, B, C) and a time community (I, II, III) display the median power spectrum of that time-range community. (TIF)
- Probability distributions of observations within range communities (A, B, C) and DoLP quartiles (Q1–Q4), with heatmaps of the corresponding power spectra. Note that the probability spike in the C plots (red dot) coincides with the piece of land to the left of the laser beam over the pond. (TIF)
- (DOCX)
Data Availability Statement
All relevant data can be found at https://github.com/Amelet/bernenko2024polarimetric.