PLOS One. 2025 Sep 3;20(9):e0329482. doi: 10.1371/journal.pone.0329482

Leveraging synthetic data produced from museum specimens to train adaptable species classification models

Jarrett D Blair 1,2,*, Kamal Khidas 3,4, Katie E Marshall 1
Editor: Gianniantonio Domina
PMCID: PMC12407421  PMID: 40901908

Abstract

Computer vision has increasingly shown potential to improve data processing efficiency in ecological research. However, training computer vision models requires large amounts of high-quality, annotated training data. This poses a significant challenge for researchers looking to create bespoke computer vision models, as substantial human resources and biological replicates are often needed to adequately train these models. Synthetic images have been proposed as a potential solution for generating large training datasets, but models trained with synthetic images often have poor generalization to real photographs. Here we present a modular pipeline for training generalizable classification models using synthetic images. Our pipeline includes 3D asset creation with the use of 3D scanners, synthetic image generation with open-source computer graphic software, and domain adaptive classification model training. We demonstrate our pipeline by applying it to skulls of 16 mammal species in the order Carnivora. We explore several domain adaptation techniques, including maximum mean discrepancy (MMD) loss, fine-tuning, and data supplementation. Using our pipeline, we were able to improve classification accuracy on real photographs from 55.4% to a maximum of 95.1%. We also conducted qualitative analysis with t-distributed stochastic neighbor embedding (t-SNE) and gradient-weighted class activation mapping (Grad-CAM) to compare different domain adaptation techniques. Our results demonstrate the feasibility of using synthetic images for ecological computer vision and highlight the potential of museum specimens and 3D assets for scalable, generalizable model training.

Introduction

The field of ecology is transitioning into a ‘big data’ science [1–3]. This means that the volume and velocity of data required for modern ecological analytics are surpassing individual researchers’ capacity to collect and process them [1, 4]. As such, manual data collection methods such as visual taxonomic classification have become a bottleneck in the ecological data acquisition pipeline [5]. To alleviate this bottleneck, significant effort has gone into automating the classification process through tools such as computer vision [6–8]. However, building effective computer vision tools for ecological research presents its own unique challenges. Among the most pervasive of these challenges is collecting high quality, annotated data to train the classification algorithms used by these tools [9, 10]. This challenge stems from two main underlying issues: a lack of human resources to collect and process data, and, in some cases, few biological replicates. Applications such as iNaturalist and Wildlife Insights have addressed the human resource issue through massive crowd-sourcing and volunteer campaigns to collect their images and annotations, but such campaigns present their own logistical challenges [11, 12]. Additionally, crowdsourced data often has a taxonomic bias towards charismatic and conspicuous fauna, and thus the issue of a lack of biological replicates for the majority of species persists [13].

Oversampling via image augmentation is a potential solution to both the human resource and biological replicate challenges of collecting high quality image datasets. Generally, oversampling refers to the process of generating additional training images from existing data, and image augmentation refers to the application of transformations (cropping, rotations, colour adjustments, etc.) to images [9, 14]. This can greatly increase the size of a training dataset without any additional specimen collection or annotation effort. While many forms of image augmentation are applied through post-processing raw images, other forms of image augmentation work by changing how the images are originally captured. One example is multiview augmentation, which works by capturing several images of a single specimen from multiple angles or postures [15–17]. While 2D images are limited to a single perspective, multiview augmentation works in three dimensions, which improves the model’s feature learning and generalization more than standard augmentations applied to raw 2D images. This is because the same specimen, when viewed from multiple perspectives (dorsal, ventral, side, anterior, etc.), can appear very different on a two-dimensional plane. If a model never sees a given perspective during training, it might not be able to generalize its knowledge to that perspective when the model is deployed. However, multiview augmentation often has the disadvantage of requiring greater manual sampling effort than other oversampling techniques, as it requires multiple views of the same specimen [18, 19].

Another oversampling technique is the use of “synthetic images” such as rendered images of 3D assets. 3D assets can be generated from real specimens by using methods such as photogrammetry, or completely synthetically through artistic creation [20, 21]. Similar to multiview augmentation, 3D assets can be rendered from different perspectives, in different postures, or under different imaging conditions [19, 22]. However, unlike with manual multiview augmentation, this can all be done automatically and repeatedly once the 3D asset is created. Additionally, 3D assets allow for otherwise-destructive augmentations to be made (e.g., removing legs from a beetle mesh) without damaging the original specimen. Given that these techniques (destructive, multiview, and other image augmentations) can all be applied in combination automatically, 3D assets have considerable potential to generate large amounts of synthetic images from a relatively small number of specimens.

The motivation for creating 3D assets from real specimens goes beyond their applicability in computer vision. To improve the accessibility of their specimens, museums have gone to great effort to digitise their natural history collections, primarily through the use of 2D imaging [23–26]. However, by definition, 2D images contain far less information than 3D assets [27–29]. Even when they are used for 1-dimensional morphometric measurements, 2D images are less robust than 3D assets due to parallax errors (i.e., errors due to changes in viewing angles) [21]. Therefore, by capturing anatomically-correct 3D representations of museum specimens, we can simultaneously improve museum collection digitization while also creating a valuable resource for computer vision model training.

Although synthetic images can be used to build large image datasets, most computer vision models in ecology are used to classify real photographs. This presents a significant challenge to using this type of workflow as a source of training data because synthetic images belong to a different image domain (i.e., image type) than images taken using standard photography [30–32]. While synthetic images produced from renders of 3D assets often appear photorealistic, computer vision models may still pick up on subtle differences, thus making model generalization across domains more difficult [33–36]. To date, most research on synthetic animal images in computer vision has focussed on pose estimation [22, 31, 37–40]. When used for classification, challenges in domain generalization have led to synthetic images being used as supplementation in training datasets rather than as a primary source [20]. Given that much of the purpose of exploring synthetic images for computer vision is to use them for model training, problems with domain generalization appear to be limiting their potential.

Domain adaptation techniques are one way to address the challenge of domain generalization [35, 41]. The goal of domain adaptation techniques is to enable a model trained on a source domain (where abundant labelled data is available) to perform well on a target domain (where labelled data is scarce or absent), despite differences in the data distribution between these domains. Domain adaptation techniques fall into two broad categories: data solutions, which aim to minimise the distribution gap between the source and target domains at the data level (e.g., make the domains appear more alike), and algorithmic solutions, which modify the learning algorithm to learn features that are generalizable across the domains. A relatively simple example of an algorithmic solution is maximum mean discrepancy loss (MMD loss). MMD is a statistical measure of the difference between two data distributions [42, 43]. When used in a loss function, the model learns the primary task of classification while also trying to minimise the MMD distance between the source and target domain distributions [44, 45]. When applied to models trained on synthetic images, domain adaptation techniques could help these models generalize to real photographs, thus improving their practicality [41].
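To make the idea concrete, the sketch below shows a minimal RBF-kernel MMD estimate between two batches of feature vectors in PyTorch. It is an illustration of the general technique only, with an assumed single-bandwidth kernel; the implementation used in this study follows [44] via the PyTorch-Adapt library.

```python
import torch

def rbf_kernel(x, y, bandwidth=1.0):
    # Pairwise RBF (Gaussian) kernel values between rows of x and y
    sq_dists = torch.cdist(x, y) ** 2
    return torch.exp(-sq_dists / (2 * bandwidth ** 2))

def mmd_loss(source_feats, target_feats, bandwidth=1.0):
    # Squared MMD estimate: mean within-domain similarity minus cross-domain similarity.
    # Minimising this term pushes the two feature distributions closer together.
    k_ss = rbf_kernel(source_feats, source_feats, bandwidth).mean()
    k_tt = rbf_kernel(target_feats, target_feats, bandwidth).mean()
    k_st = rbf_kernel(source_feats, target_feats, bandwidth).mean()
    return k_ss + k_tt - 2 * k_st
```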

Here we demonstrate a classification model training pipeline that predominantly uses synthetic images rendered from 3D assets to achieve predictive performance on photographs similar to a model trained entirely with real photographs, thus providing a potential solution to domain adaptation problems in synthetic biological data. Our pipeline includes three modules: 1) 3D asset creation using white-light 3D scanners, 2) rendering synthetic images in the computer graphics software “Blender”, and 3) classification model training using domain adaptation techniques. The pipeline is modular and can be used in combination with other synthetic image pipelines. As a case study for our pipeline, we used the skulls from 16 terrestrial Canadian carnivore species (Order: Carnivora). As a scalable, generalizable approach that addresses the challenge of domain adaptation when using synthetic images, the pipeline we describe here demonstrates the feasibility and advantages of employing 3D assets for computer vision model training.

Methods

In this study, we used skulls from 16 terrestrial/semi-aquatic Canadian carnivore species (Order: Carnivora), currently preserved at the Canadian Museum of Nature (Gatineau, Quebec; S1 Table, S1 File). Skulls served as an ideal case study for three reasons: (1) all species included in our study can be distinguished by external skull morphology, (2) the smooth surfaces of the skulls make them suitable for white-light 3D scanning, and (3) skulls are underrepresented among computer vision classification models [46]. Individual skulls were selected based on their development stage and completeness. Only adult skulls with >50% of their teeth intact and no excessive damage were included. Mandibles were not included, primarily due to their frequent damage or omission from the remainder of the specimen, as well as the fact that they would occlude other diagnostic features on the ventral side of the skull.

All data and code are available on GitHub [47]. No permits were required for the described study, which complied with all relevant regulations.

Data collection

Creating 3D assets.

We scanned the skulls in batches with a Creaform GoSCAN 20 handheld scanner at an original resolution of 0.7 mm. Batches were species-specific, and batch sizes were determined by the number of skulls that could fit on a 40 cm turntable with a minimum separation of ½ skull width (S1 Fig, S1 File). We scanned each batch twice (once dorsally and once ventrally) with the relative position of each skull remaining the same between scans. To keep the skulls in position for the ventral scans, we affixed them to the turntable using adhesive putty. For the dorsal scans, we simply rested the skulls in their natural position. In total, we scanned 30 skulls from each species. However, one scan of Vulpes vulpes (red fox, Linnaeus 1758) became corrupted and was excluded from the study. Across all the scanned specimens, there was a slight, unintentional male bias (193 males, 162 females, 124 no sex recorded). This is consistent with sex biases in mammalian natural history collections broadly [48].

After each scan was completed, we processed the 3D images using VXelements 9 [49]. First, we refined the resolution of each completed scan to 0.5 mm with the “Smart Resolution” feature of VXelements. Then we made a first pass cleaning of the image, which included removing objects such as the turntable and adhesive putty from the scan. After a pair of corresponding images (i.e., dorsal and ventral scan) had been cleaned, we used the VXelements “Merge scans” feature to merge them. We then completed a final round of cleaning in the VXmodel application of VXelements. This included filling holes in the skull (e.g., the foramen magnum) and removing artifacts of the scanning and merging process. Finally, we saved each skull image individually as a Wavefront OBJ mesh, a .mtl material file, and a bitmap texture file.

Image collection.

To create a synthetic image dataset, we rendered 2D images of our skull assets using the open-source 3D modelling software Blender [50]. For specific Blender environment settings, refer to [47]. In Blender, we created a standard environment in which each skull was rendered one at a time (S2 Fig, S1 File). This ensured the rendering conditions (such as lighting and camera perspective) were consistent across specimens and species, and more closely matched the real photograph imaging conditions. After a skull asset was loaded into the Blender environment, we scaled it to a standardised length (measured anterior to posterior) and decimated it to a standardised polygon count. We did this to increase rendering efficiency and ensure higher uniformity across specimens and species. Images of each skull were rendered with the ‘Cycles’ rendering engine from 92 angles: 18 angles (one every 20° around the yaw axis) at camera pitches of 0°, ±30°, and ±60°, plus one render each at +90° and −90° (S3 Fig, S1 File). We automated skull loading and rendering with a Blender Python API script. Each render had a resolution of 720×720 pixels. In total, 479 3D skull assets were rendered to produce 44,068 synthetic images (Fig 1).
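The sketch below illustrates the multiview rendering loop with Blender’s Python API (bpy). It is a simplified stand-in for the actual script in [47]: the object name "skull", the output paths, and the choice to rotate the object rather than orbit the camera are assumptions for illustration, and the environment, lighting, and Cycles settings are omitted.

```python
import math
import bpy  # Blender Python API; run inside Blender

skull = bpy.data.objects["skull"]       # placeholder name for the loaded skull mesh
scene = bpy.context.scene
scene.render.resolution_x = 720
scene.render.resolution_y = 720

# 18 yaw steps (every 20°) at five pitches, plus single top and bottom views = 92 renders
for pitch in (0, 30, -30, 60, -60):
    for step in range(18):
        yaw = step * 20
        skull.rotation_euler = (math.radians(pitch), 0.0, math.radians(yaw))
        scene.render.filepath = f"//renders/skull_p{pitch}_y{yaw}.png"
        bpy.ops.render.render(write_still=True)

for pitch in (90, -90):
    skull.rotation_euler = (math.radians(pitch), 0.0, 0.0)
    scene.render.filepath = f"//renders/skull_p{pitch}.png"
    bpy.ops.render.render(write_still=True)
```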

Fig 1. Example of a rendered 3D skull image and specimen photograph.


(a) A rendered 3D skull image of a Canis lupus specimen. (b) A photograph of a different C. lupus specimen.

To create an image dataset of real photographs, we photographed an additional 10 specimens from each of the 16 species. As in the synthetic image dataset, there was an unintentional male bias among the specimens photographed (65 males, 48 females, 47 no sex recorded). None of these specimens had been included in the 3D asset dataset. As in the Blender environment, we created a standardised photography setup to minimise variation in the imaging conditions between specimens and species. However, due to the large size variation of the skulls (∼8 cm to ∼44 cm in length), the distance from the camera to the specimen varied depending on the species being photographed. We photographed the specimens one at a time on a black turntable in front of a black backdrop with a Laowa FFii 90mm f2.8 CA-Dreamer Macro 2X lens attached to a Nikon Z6 mirrorless camera. Each specimen was photographed from the same 92 angles as the synthetic images. Markers placed around the turntable and a protractor attached to the camera tripod were used to assist with angle precision. However, due to human error in the photography process, an average of only 90.4 photos was taken per specimen. Most notably, Neogale vison (American mink, Schreber 1777) had no photographs taken from a camera pitch of 0°.

Using the image processing software Fiji [51], we cropped the photos to a square that would allow the skull to make a full rotation while remaining in the frame. We also removed any duplicate photos. This resulted in a total of 14,467 images for 160 specimens.

Model training

Data split and processing.

To create training and testing datasets, we split the sets of synthetic and photograph images separately, each at a ratio of 80:20. Splits were made at the specimen level so that all images of a single individual were only in one dataset. Although photographs were the target domain, we created a photograph training dataset for two reasons. The first was because the MMD model (see "Model architecture and training procedure") required unlabelled target domain images during training. The second was to create a baseline for comparison against a model trained only with photographs. Other than in the supplemented model, the synthetic images and photographs were kept separate. This resulted in a synthetic image dataset split of 24:6 individuals and 35,328:8,740 images, and a photograph dataset split of 8:2 individuals and 11,561:2,906 images. The supplemented model, fine-tuned model, and subset photograph model used a 25% subset of the photograph training data, resulting in a specimen split of 2:2:6 (training:testing:unused).
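A specimen-level split can be illustrated with scikit-learn’s GroupShuffleSplit, which keeps all images of one individual on the same side of the split. This is a minimal sketch with toy stand-in data, not the study’s own splitting code.

```python
from sklearn.model_selection import GroupShuffleSplit

# Toy stand-ins: one entry per image; specimen_ids records which individual each image came from
image_paths = [f"img_{i}.png" for i in range(10)]
labels = [i % 2 for i in range(10)]
specimen_ids = [i // 2 for i in range(10)]   # two images per specimen

splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(image_paths, labels, groups=specimen_ids))

train_images = [image_paths[i] for i in train_idx]   # all images of a specimen stay together
test_images = [image_paths[i] for i in test_idx]
```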

We scaled all images to 224 × 224 pixels for training and testing, as this is the resolution required by the VGG19 architecture [52]. We applied the following augmentations to datasets that were used for supervised training: vertical and horizontal translation, horizontal flips, and colour jitter to brightness, contrast, saturation, and hue. Testing dataset images had no augmentations.
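As an illustration, the listed augmentations map onto standard torchvision transforms roughly as follows; the magnitudes shown are assumptions, not the values used in the study.

```python
from torchvision import transforms

# Training-time augmentations named above; parameter values are illustrative only
train_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),   # vertical and horizontal translation
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.05),
    transforms.ToTensor(),
])

# Testing images are only resized, with no augmentation
test_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
```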

Model architecture and training procedure.

We built all classification models with the VGG19 feature extractor architecture with preloaded ImageNet weights [52]. To this, we added a max pooling layer, flattened layer, two dense layers, and a softmax classification layer. All models used the Adam optimizer and were trained for 100 epochs with early stopping. For all models except the supplemented model and fine-tuned model, training stopped if there were 15 consecutive epochs with no improvement in the training-domain testing loss. For example, if the model was trained with photographs, the early stopping criterion was based on the photograph testing dataset loss. The supplemented and fine-tuned models had access to labelled synthetic images and photographs, so the final epoch for these models was selected manually to optimise performance across both domains. All models were trained with the PyTorch and PyTorch-Adapt libraries in Python [47, 53–55].
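A minimal PyTorch sketch of this architecture is shown below. The pooling size and the widths of the two dense layers are assumptions, and the softmax is applied implicitly through the cross-entropy loss during training.

```python
import torch
import torch.nn as nn
from torchvision import models

class SkullClassifier(nn.Module):
    def __init__(self, n_classes=16):
        super().__init__()
        # VGG19 convolutional feature extractor with ImageNet weights
        self.features = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features
        self.pool = nn.AdaptiveMaxPool2d((7, 7))
        self.flatten = nn.Flatten()
        # Two dense layers ahead of the classification layer; widths are illustrative
        self.fc = nn.Sequential(
            nn.Linear(512 * 7 * 7, 1024), nn.ReLU(),
            nn.Linear(1024, 256), nn.ReLU(),
        )
        self.classifier = nn.Linear(256, n_classes)   # softmax folded into the cross-entropy loss

    def forward(self, x):
        feats = self.flatten(self.pool(self.features(x)))
        return self.classifier(self.fc(feats))

model = SkullClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```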

To set baseline comparisons for how classification models performed in the absence of domain adaptation techniques, we trained three models. The first baseline model was trained with only the synthetic image training dataset (hereon simply referred to as ‘the baseline model’). This model set a lower limit of classification performance when models were tested on photographs. The ‘photograph baseline model’ was trained exclusively with the photograph training dataset to set an upper-limit comparison for model performance when tested on photographs. Finally, the ‘subset photograph model’ was used as a point of comparison to see how a model would perform when trained with the same number of labelled photographs as the fine-tuned and supplemented models. All three baseline models were optimised by cross-entropy loss.

To improve generalization from the synthetic images to the photographs, we tested three models that each used a different domain adaptation technique. The first domain adaptation model used MMD loss to align the feature spaces of the synthetic images and photographs [43]. During training, the model was fed labelled synthetic images and unlabelled photographs from each domain’s respective training dataset. This implementation of MMD loss was based on [44]. The second domain adaptation model used a technique called "fine-tuning". This model was trained on the subset photograph dataset, but rather than beginning training with the ImageNet weights as in the other models, it began training from the final weights of the MMD model. Finally, the ‘supplemented model’ was trained with an image dataset that combined the synthetic image training dataset with the subset photograph training dataset (35,328 synthetic images and 2,907 photographs). Both the supplemented model and fine-tuned model were optimised using cross-entropy loss.
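The sketch below shows what a single MMD-model update might look like, reusing the SkullClassifier and mmd_loss sketches above: cross-entropy on a labelled synthetic batch plus an MMD penalty between synthetic and unlabelled photograph features. The loss weight lambda_mmd is an assumption; the study’s implementation follows [44] via PyTorch-Adapt.

```python
import torch.nn.functional as F

def training_step(model, optimizer, synth_images, synth_labels, photo_images, lambda_mmd=1.0):
    """One MMD-model update: cross-entropy on labelled synthetic images plus an
    MMD penalty aligning synthetic and unlabelled photograph features."""
    optimizer.zero_grad()

    # Shared feature extractor applied to both domains (mmd_loss as sketched in the Introduction)
    synth_feats = model.flatten(model.pool(model.features(synth_images)))
    photo_feats = model.flatten(model.pool(model.features(photo_images)))

    logits = model.classifier(model.fc(synth_feats))
    loss = F.cross_entropy(logits, synth_labels) + lambda_mmd * mmd_loss(synth_feats, photo_feats)

    loss.backward()
    optimizer.step()
    return loss.item()
```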

Feature visualization.

To visualise each model’s representation of the testing data’s feature space, we used t-SNE on the activations of the model’s post-convolution flattened layer [56]. To quantify the feature extractor’s clustering ability, we measured silhouette scores from each set of t-SNE embeddings. Silhouette scores measure how well-separated clusters are in a dataset, considering both the distance within clusters (i.e., cluster tightness) and the distance between clusters (i.e., cluster separation). Clusters were defined by the ground truth labels of the images. From these clusters, we calculated three silhouette scores: one using only synthetic images, one using only photographs, and one using both datasets combined. Domain confusion for each model was assessed visually from the t-SNE embeddings.
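The embedding and scoring step can be sketched with scikit-learn as below; the t-SNE parameters shown are library defaults and are assumptions rather than the study’s settings.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.metrics import silhouette_score

def embed_and_score(activations, species_labels, random_state=0):
    """activations: (n_images, n_features) array of flattened-layer outputs;
    species_labels: ground-truth label of each image."""
    embeddings = TSNE(n_components=2, random_state=random_state).fit_transform(activations)
    score = silhouette_score(embeddings, species_labels)   # cluster tightness vs. separation
    return embeddings, score

# Example with random stand-in activations:
# emb, s = embed_and_score(np.random.rand(200, 512), np.random.randint(0, 16, 200))
```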

To visualise which features of the images the model was using to make classifications, we created gradient-weighted class activation maps (Grad-CAMs) with 100 randomly-selected images from the photograph testing dataset [57]. The Grad-CAMs were generated by activation of the predicted class in the final convolutional layer of each model. To quantify the types of features used by the model, we assigned each Grad-CAM a score based on the criteria in (S2 Table, S1 File). Grad-CAM scores were assigned manually and averaged for each model.
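A minimal, hook-based Grad-CAM sketch for a single image is shown below; it assumes the SkullClassifier sketch above and targets the last convolutional layer of the VGG19 feature extractor. The layer index and normalisation details are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_layer):
    """Grad-CAM for one image tensor of shape (1, 3, 224, 224): weight the target
    layer's activation maps by the pooled gradients of the predicted class score."""
    activations, gradients = {}, {}

    def fwd_hook(module, inputs, output):
        activations["value"] = output.detach()

    def bwd_hook(module, grad_input, grad_output):
        gradients["value"] = grad_output[0].detach()

    h1 = target_layer.register_forward_hook(fwd_hook)
    h2 = target_layer.register_full_backward_hook(bwd_hook)

    logits = model(image)
    model.zero_grad()
    logits[0, logits[0].argmax()].backward()   # gradient of the predicted class score
    h1.remove(); h2.remove()

    weights = gradients["value"].mean(dim=(2, 3), keepdim=True)              # pooled gradients
    cam = F.relu((weights * activations["value"]).sum(dim=1, keepdim=True))  # weighted activation maps
    cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear", align_corners=False)
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)                # normalise to [0, 1]

# Example: heatmap = grad_cam(model, photo_tensor, model.features[34])  # last VGG19 conv layer
```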

Results

Accuracy

When trained on exclusively synthetic images and tested on photographs, the baseline model recorded an accuracy of 55.4% (Table 1). All methods of domain adaptation produced improvements in photograph classification accuracy over the baseline model, with the supplemented model resulting in the highest classification accuracy at 95.1% (Fig 2). This was the only model to measure >90% classification accuracy on both the synthetic and photograph testing datasets (95.6% and 95.1%, respectively).

Table 1. Skull species classification accuracy for each model, as measured on the synthetic image and photograph testing datasets. The synthetic image dataset was composed of renders of 3D skull assets and the photograph dataset was composed of photographs taken from skulls directly. The highest accuracy score for each dataset is underlined and italicised. The “Epochs” column represents the number of epochs each model was trained for. The MMD + Fine-tuning model combines the number of epochs used for the MMD model with the number of subsequent fine-tuning epochs.

Model | Training dataset | Synthetic image accuracy | Photograph accuracy | Epochs
Baseline | Synthetic only | 0.952 | 0.554 | 13
MMD | Synthetic + unlabelled photographs | 0.949 | 0.654 | 50
MMD + Fine-tuning | MMD step: Synthetic + unlabelled photographs; Fine-tuning step: Photograph subset | 0.929 | 0.896 | 50 + 2
Supplemented | Synthetic + photograph subset | 0.956 | 0.951 | 24
Photograph baseline | Photographs | 0.283 | 0.992 | 40
Photograph subset | Photograph subset | 0.213 | 0.853 | 25

Fig 2. Confusion matrices for the (a) baseline (synthetic only) and (b) supplemented (synthetic and photograph subset) models’ classifications of the Carnivora skulls housed at the Canadian Museum of Nature.


The rows are the species’ true classifications, while the columns represent the species predicted by the model. Cells are shaded according to the proportion of the true labels classified as each species (i.e., shaded by row).

To measure how well the model would perform if it were trained exclusively on images from its own domain, we also measured accuracy on two photograph-trained models. Across all models, the photograph baseline model recorded the highest photograph classification accuracy at 99.2% (Table 1). When the photograph model was trained with the same number of labelled images as the domain-adapted models, its accuracy dropped to 85.3%, lower than both domain-adapted models that also used labelled photographs during training. When tested for domain generalization on the synthetic image test dataset, both the photograph baseline model and photograph subset model measured <30% classification accuracy.

Qualitative analysis

t-SNE visualisation.

The t-SNE clustering and silhouette scores showed that the feature extractors of models trained with synthetic images were better at clustering species into single, distinguishable clusters than the feature extractors of models trained only with photographs (Fig 3, Table 2). The supplemented model recorded the highest species cluster silhouette scores, regardless of whether the synthetic and photograph clusters were measured separately or together. The photograph baseline and subset models both produced poor silhouette scores for the photograph species clusters, despite their relatively high accuracy measurements when classifying photographs.

Fig 3. Visualisation of the feature space of six skull classification models using t-SNE.


Given that the absolute axis values of t-SNE plots do not contain meaningful information, they are not shown. Each t-SNE plot was generated using the activations of the model’s post-convolution flattened layer. Blue ‘x’ points represent synthetic images, and red dots represent photographs. All images were from the test dataset.

Table 2. t-SNE silhouette scores based on clusters formed from the t-SNE embeddings of each model, and labelled using the ground truth labels from each dataset. The t-SNE embeddings were calculated from each model’s activations of the final convolutional layer. The combined score measures silhouette score when the synthetic image and photograph testing datasets were combined. The highest silhouette score for each dataset is underlined and italicised.
Model | Synthetic images | Photographs | Combined
Baseline | 0.447 | 0.113 | 0.297
MMD | 0.451 | 0.116 | 0.330
MMD + Fine-tuning | 0.353 | 0.235 | 0.325
Supplemented | 0.462 | 0.286 | 0.378
Photograph baseline | –0.092 | 0.016 | –0.130
Photograph subset | –0.157 | –0.032 | –0.152

Qualitatively, overlap between domains in the t-SNE plots appeared highest in the fine-tuned model (Fig 3c). In the fine-tuned model, all individuals from a given species occupied the same general feature space, regardless of the image’s domain. This indicated that the model was extracting features generalizable across domains, as opposed to domain-specific features. The only exceptions to this were Lontra canadensis (North American river otter, Schreber 1777) and Canis lupus (grey wolf, Linnaeus 1758), which had slight shifts in feature space between domains. The MMD model had similarly high domain confusion (Fig 3b), with the key exception being the Lo. canadensis photographs, which clustered with Gulo gulo (wolverine, Linnaeus 1758) images rather than with Lo. canadensis synthetic images (S4 Fig, S1 File). In the supplemented model, images of the same species, but from different image domains, often formed adjacent clusters (S4 Fig, S1 File). The photograph baseline and photograph subset models showed relatively little domain confusion (Fig 3e,f).

Grad-CAMs.

The fine-tuned model achieved the highest average Grad-CAM score, with 96 images scoring a value of 3 and four images scoring a value of 2 (Fig 4). This indicated that, compared with the other models, it more frequently used features on the skull rather than the background to make classifications. The photograph baseline and photograph subset models had the lowest average Grad-CAM scores, as both models had 50 or more images with a score of 0.

Fig 4. Grad-CAM scores for each model.


The average scores for each model are printed at the top of the model’s bar. High Grad-CAM scores (i.e., closer to 3) indicate that the model frequently focussed on skull morphology, as opposed to background features, to make classifications. Exact Grad-CAM scoring criteria can be found in (S2 Table, S1 File).

Discussion

The pipeline

Domain generalization is a challenge for using 3D assets as a source of training images in classification models for biodiversity monitoring [20, 41]. In this study, we present a simple, effective, and modular pipeline to train domain-generalizable classification models primarily using synthetic images generated from 3D assets. Through its three components (3D scanning, image rendering, and model training), the pipeline explicitly addresses classification generalizability on two fronts. First, for synthetic image generation we created a Blender environment that automatically produced 92 multiview images of each skull. The Blender environment also allowed for a high degree of precision in image angles, thus ensuring maximum diversity in the angles each skull was imaged from. The 92-image limit per skull was also entirely self-imposed, and the Blender environment can be modified to produce an effectively unlimited number of unique images per skull. Second, by using domain adaptation training procedures such as MMD loss, we were able to produce models that were highly accurate when tested on both source domain and target domain images (synthetic images and photographs, respectively).

The simplicity and modularity of this pipeline allow it to potentially be applied to a wide range of taxa and objectives. The pipeline’s three primary components all function independently of each other, and thus can be substituted with alternative methods without affecting the other components. For example, the 3D assets produced via scanning could be subjected to more complicated rendering methods that allow the assets to be posed and rendered more realistically, such as in replicAnt [22], which uses a video game graphics engine to render 3D assets in realistic environments for use in computer vision models. Scanning can also be swapped out with other 3D asset creation methods such as photogrammetry, which have the advantage of potentially being more affordable than 3D scanners [21, 28, 29]. While the subjects of the classification models can obviously be substituted with other taxa, the models themselves are not rigid either and can be customized to fit various classification problems (changing the model architecture, using alternative domain adaptation techniques, etc.).

Domain adaptation

Here we show that domain adaptation methods such as MMD loss and fine-tuning can significantly improve classification performance on target domain images (Table 1). On its own, MMD loss improved classification performance over the baseline model, but still underperformed both models trained exclusively with real photographs. Accuracy of the supplemented and fine-tuned models surpassed the photograph subset model (+9.8% and +4.1% respectively), even though all three models were exposed to the same number of labelled photographs during training. This shows that in the absence of a large, labelled target domain dataset, training datasets primarily composed of synthetic images can yield high classification accuracy.

Even in cases where large, labelled target domain datasets are available, the inclusion of synthetic images during training might still warrant consideration. A known challenge in ecological applications of computer vision is that models can learn contextual clues in images (e.g., background features) to inform classification decisions [58]. In moderation this might be acceptable, as the inclusion of relevant contextual metadata (e.g., spatiotemporal data) has been shown to improve model performance [59, 60]. However, if the contextual clues are not generalizable to new situations, such as the markings on the turntable in our photograph dataset, or if the contextual data becomes more important to the model than the morphological features of the specimen, this becomes problematic. As we have shown by testing our photograph models on synthetic images (Table 1), models that rely heavily on contextual features generalize poorly to new situations. An advantage of synthetic images is that all features of the images can be tightly controlled by the dataset’s creator. In the synthetic images used for this study, we ensured that all contextual features of the images were uniform across species, thus forcing the model to exclusively learn from the morphology of the specimens. This yielded a more generalizable baseline model (55.4% cross-domain accuracy vs. 28.3%), which was further enhanced using domain adaptation techniques such as MMD loss and fine-tuning. Here we have shown that by first training a model on a dataset with generalizable features (such as our synthetic image dataset), and then adapting that model to the real-life domain in which it will be deployed, a model can be encouraged to learn relevant, generalizable features while still maintaining high classification accuracy (Fig 4).

Conclusion

Collecting labelled images is an obstacle to building computer vision models for ecological applications due to the high quantity of images required and the human resources needed to collect and label said images. Leveraging the wealth of readily available biological samples in natural history collections to create 3D models for synthetic image generation is a promising solution to this problem, but so far has faced its own challenge of domain adaptation. In this study, we report a simple approach to producing generalizable classification models trained primarily with 3D assets generated from museum specimens. Our work is a step towards unlocking the full potential of synthetic images and museum collections in the context of computer vision for ecology.

Supporting information

S1 File. Supporting figures and tables.

(PDF)

pone.0329482.s001.pdf (1.1MB, pdf)

Acknowledgments

We thank the Canadian Museum of Nature for allowing access to their specimens, equipment, and facilities. We thank Greg Rand, Marie-Helene Hubert, and Alan McDonald for their assistance with the specimen collections and scanning equipment. We also thank Drs Leonid Sigal, Michelle Tseng, and Rachel Germain for their insightful discussions.

Data Availability

All data files are available on Zenodo (https://doi.org/10.5281/zenodo.15038538). All code is available in a public GitHub repository (https://doi.org/10.5281/zenodo.15536489).

Funding Statement

JDB received a Mitacs Accelerate grant to fund this work. The grant does not have a specific grant number. Mitacs’ website can be found at https://www.mitacs.ca/. The funders did not play any role in the study design, data collection, decision to publish, or preparation of the manuscript.

References

  • 1.Farley SS, Dawson A, Goring SJ, Williams JW. Situating ecology as a big-data science: Current advances, challenges, and solutions. BioScience. 2018;68(8):563–76. doi: 10.1093/biosci/biy068 [DOI] [Google Scholar]
  • 2.Wüest RO, Zimmermann NE, Zurell D, Alexander JM, Fritz SA, Hof C, et al. Macroecology in the age of Big Data – Where to go from here?. J Biogeogr. 2019;47(1):1–12. doi: 10.1111/jbi.13633 [DOI] [Google Scholar]
  • 3.Nathan R, Monk CT, Arlinghaus R, Adam T, Alós J, Assaf M, et al. Big-data approaches lead to an increased understanding of the ecology of animal movement. Science. 2022;375(6582):eabg1780. doi: 10.1126/science.abg1780 [DOI] [PubMed] [Google Scholar]
  • 4.Yang C, Huang Q. Spatial cloud computing: A practical approach; CRC Press.
  • 5.Karlsson D, Hartop E, Forshage M, Jaschhof M, Ronquist F. The Swedish Malaise Trap Project: A 15 year retrospective on a countrywide insect inventory. Biodivers Data J. 2020;8:e47255. doi: 10.3897/BDJ.8.e47255 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Weinstein BG. A computer vision for animal ecology. J Anim Ecol. 2018;87(3):533–45. doi: 10.1111/1365-2656.12780 [DOI] [PubMed] [Google Scholar]
  • 7.Høye TT, Ärje J, Bjerge K, Hansen OLP, Iosifidis A, Leese F, et al. Deep learning and computer vision will transform entomology. Proc Natl Acad Sci U S A. 2021;118(2):e2002545117. doi: 10.1073/pnas.2002545117 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Tuia D, Kellenberger B, Beery S, Costelloe BR, Zuffi S, Risse B, et al. Perspectives in machine learning for wildlife conservation. Nat Commun. 2022;13(1):792. doi: 10.1038/s41467-022-27980-y [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Schneider S, Greenberg S, Taylor GW, Kremer SC. Three critical factors affecting automated image species recognition performance for camera traps. Ecol Evol. 2020;10(7):3503–17. doi: 10.1002/ece3.6147 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Blair JD, Gaynor KM, Palmer MS, Marshall KE. A gentle introduction to computer vision-based specimen classification in ecological datasets. J Anim Ecol. 2024;93(2):147–58. doi: 10.1111/1365-2656.14042 [DOI] [PubMed] [Google Scholar]
  • 11.Van Horn G, Mac Aodha O, Song Y, Cui Y, Sun C, Shepard A. The iNaturalist species classification and detection dataset. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition. 8769–78.
  • 12.Ahumada JA, Fegraus E, Birch T, Flores N, Kays R, O’Brien TG, et al. Wildlife insights: A platform to maximize the potential of camera trap and other passive sensor wildlife data for the planet. Environ Conserv. 2019;47(1):1–6. doi: 10.1017/s0376892919000298 [DOI] [Google Scholar]
  • 13.Blair J, Weiser MD, Kaspari M, Miller M, Siler C, Marshall KE. Robust and simplified machine learning identification of pitfall trap-collected ground beetles at the continental scale. Ecol Evol. 2020;10(23):13143–53. doi: 10.1002/ece3.6905 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Shorten C, Khoshgoftaar TM. A survey on image data augmentation for deep learning. J Big Data. 2019;6(1). doi: 10.1186/s40537-019-0197-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Raitoharju J, Meissner K. On confidences and their use in (semi-)automatic multi-image taxa identification. In: 2019 IEEE symposium series on computational intelligence (SSCI); 2019. p. 1338–43. 10.1109/ssci44817.2019.9002975 [DOI]
  • 16.Ärje J, Melvad C, Jeppesen MR, Madsen SA, Raitoharju J, Rasmussen MS, et al. Automatic image-based identification and biomass estimation of invertebrates. Methods Ecol Evol. 2020;11(8):922–31. doi: 10.1111/2041-210x.13428 [DOI] [Google Scholar]
  • 17.Niri R, Gutierrez E, Douzi H, Lucas Y, Treuillet S, Castaneda B, et al. Multi-view data augmentation to improve wound segmentation on 3D surface model by deep learning. IEEE Access. 2021;9:157628–38. doi: 10.1109/access.2021.3130784 [DOI] [Google Scholar]
  • 18.3D/VR in the academic library: Emerging practices and trends.
  • 19.Irschick DJ, Christiansen F, Hammerschlag N, Martin J, Madsen PT, Wyneken J, et al. 3D visualization processes for recreating and studying organismal form. iScience. 2022;25(9):104867. doi: 10.1016/j.isci.2022.104867 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Beery S, Liu Y, Morris D, Piavis J, Kapoor A, Meister M, et al. Synthetic examples improve generalization for rare classes. In: Proceedings – 2020 IEEE winter conference on applications of computer vision, WACV; 2020.
  • 21.Plum F, Labonte D. scAnt—An open-source platform for the creation of 3D models of arthropods (and other small objects). PeerJ. 2021;9:e11155. doi: 10.7717/peerj.11155 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Plum F, Bulla R, Beck HK, Imirzian N, Labonte D. replicAnt: A pipeline for generating annotated images of animals in complex environments using Unreal Engine. Nat Commun. 2023;14(1):7195. doi: 10.1038/s41467-023-42898-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Soltis PS. Digitization of herbaria enables novel research. Am J Bot. 2017;104(9):1281–4. doi: 10.3732/ajb.1700281 [DOI] [PubMed] [Google Scholar]
  • 24.Tegelberg R, Kahanpaa J, Karppinen J, Mononen T, Wu Z, Saarenmaa H. Mass digitization of individual pinned insects using conveyor-driven imaging. In: Proceedings – 13th IEEE international conference on eScience; 2017.
  • 25.Nelson G, Ellis S. The history and impact of digitization and digital data mobilization on biodiversity research. Philos Trans R Soc Lond B Biol Sci. 2018;374(1763):20170391. doi: 10.1098/rstb.2017.0391 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Nelson G, Paul DL. DiSSCo, iDigBio and the future of global collaboration. BISS. 2019;3. doi: 10.3897/biss.3.37896 [DOI] [Google Scholar]
  • 27.Wheeler Q, Bourgoin T, Coddington J, Gostony T, Hamilton A, Larimer R, et al. Nomenclatural benchmarking: The roles of digital typification and telemicroscopy. Zookeys. 2012;(209):193–202. doi: 10.3897/zookeys.209.3486 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Nguyen CV, Lovell DR, Adcock M, La Salle J. Capturing natural-colour 3D models of insects for species discovery and diagnostics. PLoS One. 2014;9(4):e94346. doi: 10.1371/journal.pone.0094346 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Ströbel B, Schmelzle S, Blüthgen N, Heethoff M. An automated device for the digitization and 3D modelling of insects, combining extended-depth-of-field and all-side multi-view imaging. Zookeys. 2018;(759):1–27. doi: 10.3897/zookeys.759.24584 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Prakash A, Boochoon S, Brophy M, Acuna D, Cameracci E, State G, et al. Structured domain randomization: Bridging the reality gap by context-aware synthetic data. In: 2019 international conference on robotics and automation (ICRA); 2019. 10.1109/icra.2019.8794443 [DOI]
  • 31.Jiang L, Liu S, Bai X, Ostadabbas S. Prior-aware synthetic data to the rescue: Animal pose estimation with very limited real data. doi: 10.48550/ARXIV.2208.13944 [DOI] [Google Scholar]
  • 32.Jiang L, Ostadabbas S. SPAC-Net: Synthetic pose-aware animal controlnet for enhanced pose estimation. doi: 10.48550/ARXIV.2305.17845 [DOI] [Google Scholar]
  • 33.Wang M, Deng W. Deep visual domain adaptation: A survey. Neurocomputing. 2018;312:135–53. doi: 10.1016/j.neucom.2018.05.083 [DOI] [Google Scholar]
  • 34.Larrazabal AJ, Nieto N, Peterson V, Milone DH, Ferrante E. Gender imbalance in medical imaging datasets produces biased classifiers for computer-aided diagnosis. Proc Natl Acad Sci U S A. 2020;117(23):12592–4. doi: 10.1073/pnas.1919012117 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Farahani A, Voghoei S, Rasheed K, Arabnia HR. A brief review of domain adaptation. Transactions on Computational Science and Computational Intelligence. Springer International Publishing; 2021. p. 877–94. 10.1007/978-3-030-71704-9_65 [DOI]
  • 36.Roberts M, Driggs D, Thorpe M, Gilbey J, Yeung M, Ursprung S, et al. Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans. Nat Mach Intell. 2021;3(3):199–217. doi: 10.1038/s42256-021-00307-0 [DOI] [Google Scholar]
  • 37.Cao J, Tang H, Fang HS, Shen X, Lu C, Tai YW. Cross-domain adaptation for animal pose estimation; 2019. p. 9498–507. https://openaccess.thecvf.com/content_ICCV_2019/html/Cao_Cross-Domain_Adaptation_for_Animal_Pose_Estimation_ICCV_2019_paper.html
  • 38.Zuffi S, Kanazawa A, Berger-Wolf T, Black MJ. Three-D safari: Learning to estimate zebra pose, shape, and texture from images “In the Wild”; 2019. p. 5359–68. https://openaccess.thecvf.com/content_ICCV_2019/html/Zuffi_Three-D_Safari_Learning_to_Estimate_Zebra_Pose_Shape_and_Texture_ICCV_2019_paper.html
  • 39.Mu J, Qiu W, Hager GD, Yuille AL. Learning from synthetic animals; 2020. p. 12386–95.
  • 40.Li C, Lee GH. From synthetic to real: Unsupervised domain adaptation for animal pose estimation. p. 1482–91. Available from: https://openaccess.thecvf.com/content/CVPR2021/html/Li_From_Synthetic_to_Real_Unsupervised_Domain_Adaptation_for_Animal_Pose_CVPR_2021_paper.html
  • 41.Peng X, Usman B, Saito K, Kaushik N, Hoffman J, Saenko K. Syn2Real: A new benchmark for synthetic-to-real visual domain adaptation. doi: 10.48550/ARXIV.1806.09755 [DOI] [Google Scholar]
  • 42.Borgwardt KM, Gretton A, Rasch MJ, Kriegel H-P, Schölkopf B, Smola AJ. Integrating structured biological data by kernel maximum mean discrepancy. Bioinformatics. 2006;22(14):e49-57. doi: 10.1093/bioinformatics/btl242 [DOI] [PubMed] [Google Scholar]
  • 43.Gretton A, Sejdinovic D, Strathmann H, Balakrishnan S, Pontil M, Fukumizu K. Optimal kernel choice for large-scale two-sample tests. 25.
  • 44.Long M, Cao Y, Wang J, Jordan MI. Learning transferable features with deep adaptation networks. doi: 10.48550/ARXIV.1502.02791 [DOI] [PubMed] [Google Scholar]
  • 45.Yan H, Ding Y, Li P, Wang Q, Xu Y, Zuo W. Mind the class weight bias: Weighted maximum mean discrepancy for unsupervised domain adaptation. Proceedings of the IEEE conference on computer vision and pattern recognition; 2017. p. 2272–81.
  • 46.Henzler P, Mitra NJ, Ritschel T. Escaping Plato’s Cave: 3D shape from adversarial rendering. p. 9984–93. https://openaccess.thecvf.com/content_ICCV_2019/html/Henzler_Escaping_Platos_Cave_3D_Shape_From_Adversarial_Rendering_ICCV_2019_paper.html
  • 47.Blair J. Skull-adapt. https://zenodo.org/doi/10.5281/zenodo.15536489
  • 48.Cooper N, Bond AL, Davis JL, Portela Miguez R, Tomsett L, Helgen KM. Sex biases in bird and mammal natural history collections. Proc Biol Sci. 2019;286(1913):20192025. doi: 10.1098/rspb.2019.2025 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Creaform. VXelements.
  • 50.Blender F. Blender. www.blender.org
  • 51.Schindelin J, Arganda-Carreras I, Frise E, Kaynig V, Longair M, Pietzsch T, et al. Fiji: An open-source platform for biological-image analysis. Nat Methods. 2012;9(7):676–82. doi: 10.1038/nmeth.2019 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. doi: 10.48550/ARXIV.1409.1556 [DOI] [Google Scholar]
  • 53.Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, et al. PyTorch: An imperative style, high-performance deep learning library. In: Wallach H, Larochelle H, Beygelzimer A, Alché-Buc Fd, Fox E, Garnett R, editors. Advances in neural information processing systems. vol. 32. Curran Associates, Inc. Available from: https://proceedings.neurips.cc/paper_files/paper/2019/file/bdbca288fee7f92f2bfa9f7012727740-Paper.pdf
  • 54.Musgrave K, Belongie S, Lim SN. PyTorch adapt. doi: 10.48550/ARXIV.2211.15673 [DOI] [Google Scholar]
  • 55.Foundation PS. Python language reference. www.python.org
  • 56.Van der Maaten L, Hinton G. Visualizing data using t-SNE. J Mach Learn Res. 2008;9(11):2579–605. [Google Scholar]
  • 57.Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. Int J Comput Vis. 2019;128(2):336–59. doi: 10.1007/s11263-019-01228-7 [DOI] [Google Scholar]
  • 58.Beery S, Van Horn G, Perona P. Recognition in Terra incognita. Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics). Berlin, Heidelberg: Springer; 2018.
  • 59.Terry JCD, Roy HE, August TA. Thinking like a naturalist: Enhancing computer vision of citizen science images by harnessing contextual data. Methods Ecol Evol. 2020;11(2):303–15. doi: 10.1111/2041-210x.13335 [DOI] [Google Scholar]
  • 60.Blair J, Weiser MD, de Beurs K, Kaspari M, Siler C, Marshall KE. Embracing imperfection: Machine-assisted invertebrate classification in real-world datasets. Ecol Inform. 2022;72:101896. doi: 10.1016/j.ecoinf.2022.101896 [DOI] [Google Scholar]

Decision Letter 0

Gianniantonio Domina

2 May 2025

PONE-D-25-15313Leveraging synthetic data produced from museum specimens to train adaptable species classification modelsPLOS ONE

Dear Dr. Blair,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Jun 16 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Gianniantonio Domina, Ph.D.

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please update your submission to use the PLOS LaTeX template. The template and more information on our requirements for LaTeX submissions can be found at http://journals.plos.org/plosone/s/latex.

3. In your manuscript, please provide additional information regarding the specimens used in your study. Ensure that you have reported human remain specimen numbers and complete repository information, including museum name and geographic location.

If permits were required, please ensure that you have provided details for all permits that were obtained, including the full name of the issuing authority, and add the following statement:

'All necessary permits were obtained for the described study, which complied with all relevant regulations.'

If no permits were required, please include the following statement:

'No permits were required for the described study, which complied with all relevant regulations.'

For more information on PLOS ONE's requirements for paleontology and archeology research, see https://journals.plos.org/plosone/s/submission-guidelines#loc-paleontology-and-archaeology-research.

4. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, we expect all author-generated code to be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.

5. Thank you for stating the following in the Acknowledgments Section of your manuscript:

“This work was supported by a Mitacs Elevate grant to J.D.B, as well as contributions from the Canadian Museum of Nature Foundation. We thank the Canadian Museum of Nature for allowing access to their specimens, equipment, and facilities. We thank Greg Rand, Marie Helene Hubert, and Alan McDonald for their assistance with the specimen collections and scanning equipment. We also thank Drs Leonid Sigal, Michelle Tseng, and Rachel Germain for their insightful discussions.”

We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form.

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows:

“JDB received a Mitacs Accelerate grant to fund this work. The grant does not have a specific grant number. Mitacs' website can be found at https://www.mitacs.ca/. The funders did not play any role in the study design, data collection, decision to publish, or preparation of the manuscript.”

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

6. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information.

7. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Additional Editor Comments:

This MS deserves to be published in PLOS ONE. It is well written in a scientific form. The authors have to follow the suggestions of the reviewer to prepare the improved MS. In addition, do not forget to add the authors of species names the first time they are cited.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Dear authors, I find the manuscript well written. It seems to me that the strategy you propose is solid and could be applied to several domains other than the example you provide.

I have one major concern only. In line 201, you state that you scaled the images to 224x224 pixels for both training and testing. It seems to me that this low resolution could bias the performance of the model, since it could lead to loss of details in the images. Can you please explain why you decided to operate at this resolution, and why? Have you compared other resolutions, in order to decide the one with the best cost/effectiveness ratio? Please, provide some metrics.

Best regards

SM

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Stefano Martellos

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Decision Letter 1

Gianniantonio Domina

17 Jul 2025

Leveraging synthetic data produced from museum specimens to train adaptable species classification models

PONE-D-25-15313R1

Dear Dr. Blair,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager at Editorial Manager® and clicking the ‘Update My Information' link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Gianniantonio Domina, Ph.D.

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Acceptance letter

Gianniantonio Domina

PONE-D-25-15313R1

PLOS ONE

Dear Dr. Blair,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission,

* There are no issues that prevent the paper from being properly typeset

You will receive further instructions from the production team, including instructions on how to review your proof when it is ready. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few days to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

You will receive an invoice from PLOS for your publication fee after your manuscript has reached the completed accept phase. If you receive an email requesting payment before acceptance or for any other service, this may be a phishing scheme. Learn how to identify phishing emails and protect your accounts at https://explore.plos.org/phishing.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Prof. Gianniantonio Domina

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 File. Supporting figures and tables.

    (PDF)

    pone.0329482.s001.pdf (1.1MB, pdf)
    Attachment

    Submitted filename: Plos Response.pdf

    pone.0329482.s002.pdf (60.9KB, pdf)

    Data Availability Statement

    All data files are available on Zenodo (https://doi.org/10.5281/zenodo.15038538). All code is available in a public GitHub repository (https://doi.org/10.5281/zenodo.15536489).

