PeerJ. 2023 Oct 9;11:e16216. doi: 10.7717/peerj.16216

Machine learning algorithms accurately identify free-living marine nematode species

Simone Brito de Jesus 1, Danilo Vieira 1, Paula Gheller 2, Beatriz P Cunha 3, Fabiane Gallucci 1, Gustavo Fonseca 1
Editor: Khor Waiho
PMCID: PMC10569207  PMID: 37842061

Abstract

Background

Identifying species, particularly small metazoans, remains a daunting challenge and the phylum Nematoda is no exception. Typically, nematode species are differentiated based on morphometry and the presence or absence of certain characters. However, recent advances in artificial intelligence, particularly machine learning (ML) algorithms, offer promising solutions for automating species identification, mostly in taxonomically complex groups. By training ML models with extensive datasets of accurately identified specimens, the models can learn to recognize patterns in nematodes’ morphological and morphometric features. This enables them to make precise identifications of newly encountered individuals. Implementing ML algorithms can improve the speed and accuracy of species identification and allow researchers to efficiently process vast amounts of data. Furthermore, it empowers non-taxonomists to make reliable identifications. The objective of this study is to evaluate the performance of ML algorithms in identifying species of free-living marine nematodes, focusing on two well-known genera: Acantholaimus Allgén, 1933 and Sabatieria Rouville, 1903.

Methods

A total of 40 species of Acantholaimus and 60 species of Sabatieria were considered. The measurements and identifications were obtained from the original species descriptions for both genera. This compilation included information on the presence or absence of specific characters, as well as morphometric data. To assess species identification performance, four ML algorithms were employed: Random Forest (RF), Stochastic Gradient Boosting (SGBoost), Support Vector Machine (SVM) with both linear and radial kernels, and K-nearest neighbor (KNN).

Results

For both genera, the random forest (RF) algorithm demonstrated the highest accuracy in correctly classifying specimens into their respective species, achieving an accuracy of 93% for Acantholaimus and 100% for Sabatieria. Only a single Acantholaimus individual in the test data was misclassified.

Conclusion

These results highlight the overall effectiveness of ML algorithms in species identification. Moreover, they demonstrate that the identification of marine nematodes can be automated, optimizing biodiversity and ecological studies and making species identification more accessible, efficient, and scalable. Ultimately, this will contribute to our understanding and conservation of biodiversity.

Keywords: Nematoda, Identification-key, Acantholaimus, Sabatieria, Random Forest, Support vector machine, Stochastic gradient boosting, K-nearest neighbor

Introduction

The correct taxonomic identification of species forms the foundation for biodiversity, ecology, phylogeny, and conservation studies. Traditionally, species identification has relied on the use of dichotomous keys based on morphological characters (Griffing, 2001; De & Dey, 2019). Despite the advent of DNA barcoding, morphological identification remains prevalent, primarily due to the limitations of DNA reference databases (Blaxter, 2004; Valentini, Pompanon & Taberlet, 2009; Guo et al., 2022). However, dichotomous keys are often limited to a specific geographic area, a small number of species, and a restricted set of morphological characters (Osborne, 1963; Walter & Winterton, 2007). Alternative tools such as polytomous keys (Weiss, 1995), pictorial keys (Schmidt-Rhaesa, 2014), and tabular keys (Fonseca, Vanreusel & Decraemer, 2006) have been proposed but also show similar limitations. To address these challenges, studies have explored the use of multivariate statistical techniques to analyze various morphological characteristics and morphometric measures simultaneously (Bailey & Byrnes, 1990; Stock & Kaya, 1996; Shokoohi & Moyo, 2022). While these approaches have been useful in grouping similar specimens and providing a more objective basis for species delimitation, their effectiveness in identifying new individuals, as expected from an identification key, has not been adequately evaluated. Thus, the challenge of evaluating newly collected specimens and assigning appropriate species names remains, hindering progress in research reliant on species identification.

In recent years, machine learning (ML) algorithms have emerged as a powerful tool to enhance data processing and facilitate species identification across taxa, including birds, insects, and plants (Wäldchen & Mäder, 2018; Islam et al., 2019; Kasinathan, Singaraju & Uyyala, 2021; Bojamma & Shastry, 2021). The fundamental principle behind ML-based species identification involves leveraging existing taxonomic knowledge, where each new observation is assigned a probability of belonging to a previously described species. Notably, a common aspect of these ML studies is that the identification was done on images or, in the case of birds, their songs and calls as well (Jadhav, Patil & Parasar, 2020; Mehyadin et al., 2021). Nonetheless, the application of ML approaches is not limited to images or audio data but can be extended to virtually any data type. This is particularly relevant in cases where obtaining high-quality images is challenging or not always possible. In such instances, species identification often relies on numerical data matrices that combine morphometric measurements and the presence/absence of morphological characters (Larrazabal-Filho, Neres & Esteves, 2018; Maria et al., 2009; Surmacz, Morek & Michalczyk, 2020; Tumanov, 2020; Mitra et al., 2019). Machine learning techniques could also be applied effectively to such data, with supervised algorithms automating the identification process. These algorithms use the species labels as the supervised variable (Y) and the morphological characteristics as the predictors (X). By training the algorithm on these data, it can learn the patterns and relationships between the morphological features and the corresponding species.

The aim of this study is to evaluate the performance of multiple machine learning algorithms on the identification of free-living marine nematode species. Free-living marine nematodes are small invertebrates that belong to the meiofauna. They are highly abundant and species-rich (Vanreusel, Fonseca & Danovaro, 2010; Hauquier et al., 2019; Zeppilli et al., 2019). These organisms are known as good ecological indicators due to their ubiquitous presence in diverse ecosystems and sensitivity to environmental changes (Moreno et al., 2011; Bianchelli et al., 2018). Moreover, they play a crucial role in various ecosystem functions, such as mineralization, oxygenation of the sediment, and secondary productivity (Schratzberger & Ingels, 2018).

Despite their ecological importance, the lack of reliable identification tools at lower taxonomical levels hampers ecological, molecular, and conservation studies (Macheriotou et al., 2019; Ridall & Ingels, 2021; Pantó et al., 2021). As a result, nematodes are often identified at the genus level rather than the species level (Miljutin et al., 2010; Sandulli, Semprucci & Balsamo, 2014; Brannock et al., 2017; Spedicato et al., 2020). The use of ML techniques in nematode identification is still limited. It has been successfully applied in the identification of species through image analysis (Thevenoux et al., 2021) and in the processes of detecting morphological and phenotypic features (Hakim et al., 2018). Although incipient, the initiatives demonstrated the versatility and potential of using machine learning in nematodes. The methodology proposed in this study will be tested using individuals from the genera Acantholaimus and Sabatieria. Acantholaimus (Allgén, 1933) is typically found in the deep sea (Miljutin & Miljutina, 2016a). Sabatieria (Rouville, 1903) is one of the most abundant and dominant genera along continental shelves and slopes, serving as an indicator of ecosystem wealth (Vanreusel, Fonseca & Danovaro, 2010; Kotwicki, Grzelak & Bełdowski, 2016; Mincks et al., 2021). Both genera are characterized by a large morphological variation, the presence of many described species, and have recent taxonomic reviews (Miljutin & Miljutina, 2016a; Venekey et al., 2019; Fonseca & Bezerra, 2014; Yang et al., 2019) making them highly suitable for testing ML tools for species identification.

Material and Methods

Literature review

The first step towards testing ML algorithms in the identification of Acantholaimus and Sabatieria species was to list all valid species described for each genus. All taxonomic descriptions and reviews considering these two genera were used in this study (Tables S1 and S2). Species whose publications provided measurements of only a single individual, or whose descriptions lacked significant taxonomic information, were not included in the analysis. Under these criteria, 40 of the 46 valid species of Acantholaimus were considered (Table S1), while for Sabatieria 60 of the 107 species were included (24 species were excluded for lacking information on characters and 23 because the description was limited to a single specimen; Table S2). Below we present a brief description of the genera and the morphological characters used for species identification in this study. To describe each species, body regions are abbreviated using the De Man (1880) and Cobb (1917) system of indices.
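
As an illustration of the De Man indices mentioned above (listed later in Table 1 as a, b, c), the following minimal Python sketch computes the standard ratios a (body length/maximum body diameter), b (body length/pharynx length) and c (body length/tail length), together with b′ as defined in the Results (body length without tail/pharynx length). The function name and the measurement values are hypothetical and chosen only for illustration; they are not taken from any species description.

```python
def de_man_ratios(body_length, max_body_diameter, pharynx_length, tail_length):
    """Return De Man's a, b, c ratios plus L' and b' (all lengths in micrometres)."""
    body_without_tail = body_length - tail_length  # L' in Table 1
    return {
        "a": body_length / max_body_diameter,
        "b": body_length / pharynx_length,
        "c": body_length / tail_length,
        "L_prime": body_without_tail,
        "b_prime": body_without_tail / pharynx_length,
    }


# Hypothetical specimen, values invented for the example
example = de_man_ratios(
    body_length=1000.0,
    max_body_diameter=40.0,
    pharynx_length=200.0,
    tail_length=100.0,
)
print(example)
```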

Morphological and morphometric data

Acantholaimus

The genus Acantholaimus Allgén, 1933 belongs to the family Chromadoridae Filipjev, 1917, subfamily Spilipherinae, and includes 46 valid species (Venekey et al., 2019; Holovachov, 2020; Manoel, Esteves & Neres, 2022). Venekey et al. (2019) provided the latest diagnosis of the genus.

The selection of characters to be included in the model was based on De Mesel et al. (2006) and Miljutin & Miljutina (2016b): 14 morphometric measurements (in µm); eight quantitative ratios; and seven categorical morphological characters, namely the amphid position (AP), amphid size (AS), cervical setae position (CSP), head shape (HS), pharynx shape (Pha.S), cuticular ornamentation (CO) and tail shape (TS). All morphometric characters for both genera are depicted in Table 1. For each morphological character, categories were assigned as detailed below:

  1. Head shape (HS): Acantholaimus species may have one out of four different head shapes: (a) truncated; (b) round; (c) tapered; and (d) narrow (Fig. 1A).

  2. Cervical setae position (CSP): In general, a pair of cervical setae is located posterior to the base of the amphid in each sublateral line, but the distance from the posterior border of the amphid varies between species. Three categories were established (Fig. 1B): (a): anterior or at the level of the posterior border of the amphid; (b): moderate distance in relation to the posterior border of the amphid (<1.0 AH); and (c): distant from the posterior border of the amphid (≥1.0 AH).

  3. Amphid size (AS): The amphid size was estimated considering the ratio between its height (AH) and the corresponding body diameter (CD). Four categories were established (Fig. 1C): (a) AH/CD ≈ 1 (very large); (b): 1 > AH/CD > 0.5 (large); (c): AH/CD ≈ 0.5 (medium); and (d): AH/CD < 0.5 (small).

  4. Amphid position (AP): The amphid position was assessed considering the ratio between the distance from the anterior end to the amphid anterior borderline (AAE) and the corresponding body diameter at the mid-amphideal level (CD). Three categories were separated (Fig. 1D): (a): <1.0 (close to head end); (b): ≈1.0 (mid-amphideal); and (c): >1.0 (behind anterior end).

  5. Pharynx shape (Pha.S): Often, the pharynx is thick and muscular with numerous plasmatic interruptions. Three categories were assigned: (a): cylindrical; (b): round bulb; and (c): elongated bulb (Fig. 1E).

  6. Cuticular ornamentation (CO): The cuticle is ornamented with transverse rows of numerous punctations. The lateral field of the cuticle may be distinguished by a wide lateral differentiation comprising larger, more sparsely, and sometimes more irregularly distributed punctations, or by several longitudinal rows of bigger dots. Three categories were assigned: (a): cuticle without lateral differentiation; (b): lateral differentiation of larger dots arranged irregularly; and (c): lateral differentiation of larger dots arranged in longitudinal rows (Fig. 1F).

  7. Tail shape (TS): Usually, the tail of Acantholaimus species is conical-cylindrical and long. The change from conical to cylindrical can be abrupt, with a proximal conical section distinct from a distal filiform cylindrical section, or gradual, tapering to the tip. Two categories were established: (a): tail conical-cylindrical with a distinct filiform distal section; and (b): tail with the proximal conical section gradually tapered and elongated, smoothly transitioning to the filiform distal section (Fig. 1G).

Table 1. List of selected morphometric characters used for the identification of Acantholaimus and Sabatieria species.

| Code | Measurement |
|---|---|
| L | Total body length (µm) |
| L′ | Body length without tail |
| Amphid D | Amphid diameter |
| OLSL | Length of outer labial setae |
| CSL | Length of cephalic setae |
| Cerv. LS | Length of cervical setae |
| SSL | Length of somatic setae |
| Spic.arc | Length of spicule in the arc |
| D.A.E. A | Distance from anterior end to amphid |
| D.L.C. S | Diameter at the level of cephalic setae |
| D.L.M. A | Diameter at the level of the middle of the amphid |
| D.L.C | Diameter at the level of cardia |
| D.L. A | Diameter at the level of anus |
| MBD | Maximum body diameter |
| HD | Head diameter |
| B.C. W | Buccal cavity width |
| Amphid. H | Amphid height |
| Amphid. AE | Amphid from the anterior end |
| Nerv.ring | Nerve ring from the anterior end |
| Pha.L | Pharynx length |
| Pha.BBD | Pharyngeal bulb body diameter |
| Gub.apoph. L | Gubernacular apophyses length |
| Suppl. N° | Number of supplements |
| abd | Anal body diameter |
| TL | Tail length |
| TL/abd | Tail length/abd |
| a, b, c | De Man’s ratios |
| a′, b′, c′ | De Man’s ratios |
| V | Distance from anterior end to vulva/total body length % |
| V′ | Distance from anterior end to vulva/body length without tail % |

Figure 1. Morphological characters and diagnostic categories considered for Acantholaimus species.

Sabatieria

Sabatieria (Rouville, 1903) belongs to the family Comesomatidae (Filipjev, 1918), within the subfamily Sabatieriinae (Filipjev, 1934). This genus is relatively speciose, with 107 valid species (Fu, Leduc & Zhao, 2019; Yang et al., 2019; Zhai, Wang & Huang, 2020; Leduc & Zhao, 2023). The latest diagnosis was presented by Fonseca & Bezerra (2014).

According to the literature survey, 16 measurements (in µm), six quantitative ratios (Table 1), and eight categorical morphological characters were selected to characterize the species of this genus. The categorical variables were buccal cavity (BC), number of amphideal turns (Amphid. Turn), cuticular ornamentation (CO), spicules (Spic), apophyses shape (Apoph), supplements aspect (Suppl. A), supplements position (Suppl. P), and tail shape (TS). The categories of each morphological character are as follows:

  1. Buccal cavity (BC): Within the genus Sabatieria, the degree of cuticularization of the buccal cavity is an important feature to distinguish the species (Jensen, 1979). Three categories were assigned: (a): without cuticularization, where the small buccal cavity is cup-shaped and narrow in the posterior portion; (b): little cuticularization, where the cup-shaped buccal cavity is slightly cuticularized at the base; and (c): with cuticularization, where the cup-shaped buccal cavity has a tooth-like cuticularized structure (Fig. 2A).

  2. Number of amphideal turns (Amphid. Turn): The genus Sabatieria has a spiral amphid fovea with usually 2 to 3 turns. The number of spiral turns and the percentage of the amphid fovea diameter (compared to the corresponding body diameter) have intraspecific variations (Platt, 1985). For the amphideal fovea number of turns, three categories were chosen: (a): 2–2.5 spiral turns; (b): 3–3.5 spiral turns, and (c): 4–4.5 spiral turns (Fig. 2B).

  3. Cuticular ornamentation (CO): This genus has a punctated cuticle with or without lateral differentiation of larger punctations regularly or irregularly arranged. For the ornamentation of the cuticle, three categories were chosen: (a) without lateral differentiation; (b) lateral differentiation of larger and irregularly arranged punctations; and (c) lateral differentiation of larger and regularly arranged punctations (Fig. 2C).

  4. Supplements aspect (Suppl. A): The precloacal supplement aspect is also relevant for species delimitation within Sabatieria. The character was classified into three categories: (a) pore-like or tubular; (b) papillae; and (c) not visible when there is no display of that character (Fig. 2D).

  5. Supplements position (Suppl. P): For the distribution pattern of the precloacal supplements, three categories were designated: (a) uniform, when the spacing between the supplements is equal; (b) anterior closer, when the spacing between supplements increases toward the posterior part of the body; and (c) posterior closer, when the spacing between supplements decreases toward the posterior part of the body (Fig. 2E).

  6. Spicules (Spic): Spicule size is an essential character for differentiating Sabatieria species. The character was classified into three categories based on the ratio of spicule length (SL) to anal body diameter (ABD): (a) short, with SL/ABD < 1.0–1.3; (b) medium, with SL/ABD ≈ 1.3–1.6; and (c) long, with SL/ABD > 1.6 (Fig. 2F).

  7. Tail shape (TS): Most species of Sabatieria have a conical-cylindrical tail, consisting of an anterior conical portion and a posterior cylindrical portion with a drop-shaped tail tip and three short terminal setae. However, there are species with a conical (blunt) tail, and the lengths between the conical and the cylindrical portion are different, being an important characteristic to differentiate the species. Four categories were assigned: (a) conical, short tail with a rounded or blunt distal portion; (b) short conical-cylindrical, cylindrical distal portion with a length less than a conical anterior portion and slightly clavate tip; (c) medium conical-cylindrical, distal cylindrical portion similar in length to the conical anterior portion; and (d) long conical-cylindrical, cylindrical distal portion longer than the conical anterior portion (Fig. 2G).

  8. Apophyses shape (Apoph): Males of Sabatieria species usually have a gubernaculum with apophyses that may take three different shapes: (a) straight; (b) curved; and (c) complex (with loops or more than one curve) (Fig. 2H).
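
Ratio-based categories such as the spicule classes in item 6 can be assigned with a simple rule. The sketch below is only an illustration: the function name is hypothetical, and the treatment of values falling exactly on the 1.3 and 1.6 boundaries is an assumption, since the text gives the limits only approximately.

```python
def spicule_category(spicule_length, anal_body_diameter):
    """Assign the spicule category (a, b or c) from the SL/ABD ratio.

    Boundary handling (ratio exactly 1.3 or 1.6 counted as medium) is an
    assumption made for this sketch.
    """
    ratio = spicule_length / anal_body_diameter
    if ratio < 1.3:
        return "a"  # short
    if ratio <= 1.6:
        return "b"  # medium
    return "c"      # long


# Hypothetical measurements in micrometres
print(spicule_category(30.0, 30.0))  # ratio 1.0
print(spicule_category(45.0, 30.0))  # ratio 1.5
print(spicule_category(60.0, 30.0))  # ratio 2.0
```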

Figure 2. Morphological characters and diagnostic categories considered for Sabatieria species.

Data analysis

Pre-process data

Encoding categorical data

Prior to the analysis, categorical morphological characters were transformed into numeric variables using two techniques: Integer Encoding and One-Hot Encoding (Dahouda & Joe, 2021; Fig. 3). The encoding technique for each categorical variable was chosen based on domain knowledge and on the characteristics of the variable itself, which involved distinguishing nominal features, which were treated as binary, from ordinal features. By using the most suitable encoding method for each type of categorical data, we aimed to optimize the representation of the information and enhance the model’s ability to learn and make accurate predictions. Integer Encoding assigns a unique integer value to each category, with a fixed reference level. It is used for categorical variables with ordinal relationships, where the categories have a specific order or hierarchy. For Acantholaimus, amphid position (AP), amphid size (AS), and cervical setae position (CSP) were encoded as integers. One-Hot Encoding transforms each variable with n observations and d distinct values into d binary variables with n observations each, where every value indicates the presence (1) or absence (0) of the corresponding category. For Acantholaimus, head shape (HS), pharynx shape (Pha.S), cuticular ornamentation (CO), and tail shape (TS) were treated as binary. For Sabatieria, the number of amphideal turns (Amphid. Turn), spicules (Spic), and apophyses shape (Apoph) were encoded as integers, whereas buccal cavity (BC), supplements aspect (Suppl. A), supplements position (Suppl. P), cuticular ornamentation (CO), and tail shape (TS) were treated as binary variables.
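
The two encodings can be sketched in a few lines of Python. This is an illustrative sketch only, not the iMESc implementation: the function names, the mapping of category letters to integers, and the example values are assumptions introduced here.

```python
# Assumed mapping of ordered category letters to integers (illustrative only)
ORDINAL_LEVELS = {"a": 1, "b": 2, "c": 3, "d": 4}


def integer_encode(values):
    """Integer Encoding: one integer per ordered category."""
    return [ORDINAL_LEVELS[v] for v in values]


def one_hot_encode(values, name):
    """One-Hot Encoding: one 0/1 column per distinct observed category."""
    levels = sorted(set(values))
    return {
        f"{name}_{level}": [1 if v == level else 0 for v in values]
        for level in levels
    }


# Hypothetical amphid size (ordinal) and head shape (nominal) for four individuals
amphid_size = ["a", "c", "d", "b"]
head_shape = ["a", "b", "d", "b"]

print(integer_encode(amphid_size))
print(one_hot_encode(head_shape, "HS"))
```

In this sketch a variable with d distinct observed values yields d binary columns, matching the description above; categories that never occur in the data produce no column.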

Figure 3. The workflow for applying machine learning algorithms.

Dataset acquisition, data analysis and output. The dataset, sourced from the descriptive literature on Acantholaimus and Sabatieria species, was organized into labeled matrices representing individuals and their corresponding morphological and morphometric characteristics. This organized data served as the input for the subsequent machine-learning stages. The selection and classification algorithms employed were Random Forest, Stochastic Gradient Boosting, Support Vector Machine, and K-nearest neighbor. These algorithms were used to identify the optimal set of features for species recognition and to construct predictive models for accurately identifying individuals based on the presence/absence of morphological characters and on morphometric characteristics.

Handling the missing data and feature scaling

Data imputation was performed to address missing values in some morphometric characters of both genera. To ensure a conservative analysis and avoid potential bias, imputation was done by replacing missing values with the mean value of the respective character across the genus. Additionally, the data was scaled before applying the algorithms. Scaling was necessary to ensure fair comparisons, accurate distance calculations, and reliable predictions (Sukumar, 2014). It also helps to eliminate biases introduced by varying scales and enhances the algorithm’s performance.
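
A minimal sketch of these two steps for a single morphometric character is given below. The text does not state which scaling method was used, so standardization to zero mean and unit (population) standard deviation is an assumption here, as are the function names and the example values.

```python
from statistics import mean, pstdev


def impute_mean(column):
    """Replace missing values (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def standardize(column):
    """Z-score scaling (assumed method): subtract the mean, divide by the SD."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]  # constant column carries no information
    return [(v - mu) / sigma for v in column]


# Hypothetical tail lengths (µm) within one genus, one value missing
tail_length = [10.0, None, 14.0, 12.0]
imputed = impute_mean(tail_length)
scaled = standardize(imputed)
print(imputed)
print(scaled)
```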

Splitting the dataset

To validate the identification models constructed for the two genera, the input data for all the algorithms were split into training and testing sets. The minimum number of individuals required to perform the split is four (one for testing and three to perform the cross-validation in the training data). Therefore, only species of Acantholaimus and Sabatieria that were described based on four or more individuals were included in the testing set (Supplemental Material Tables S3 and S4). For descriptions based on 4–9 individuals, one was randomly left out for validation, whereas for descriptions based on more than 10 individuals, two individuals were randomly left out. For the Acantholaimus model, the training set had 131 individuals from the 40 species, and the testing set had 14, resulting in a total of 145 individuals. In the case of the Sabatieria model, out of the 60 species, 227 individuals were used for training and 33 individuals were used for testing, totaling 260 individuals.
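
The per-species hold-out rule can be sketched as follows. The function names, species labels and individual identifiers are hypothetical, and descriptions based on exactly 10 individuals are assumed here to contribute two test individuals, a case the text does not specify.

```python
import random


def n_test_individuals(n_described):
    """Number of individuals held out for testing, given the description size."""
    if n_described < 4:
        return 0  # too few individuals: species is used for training only
    if n_described < 10:
        return 1
    return 2      # assumption: exactly 10 is treated like "more than 10"


def split_species(individual_ids, rng):
    """Randomly hold out test individuals for one species."""
    k = n_test_individuals(len(individual_ids))
    test = set(rng.sample(individual_ids, k))
    train = [i for i in individual_ids if i not in test]
    return train, sorted(test)


# Hypothetical species with 3, 5 and 12 described individuals
species_individuals = {
    "sp_A": ["Id.1", "Id.2", "Id.3"],
    "sp_B": ["Id.4", "Id.5", "Id.6", "Id.7", "Id.8"],
    "sp_C": [f"Id.{i}" for i in range(9, 21)],
}

rng = random.Random(42)  # fixed seed so the split is reproducible
train_ids, test_ids = [], []
for ids in species_individuals.values():
    train, test = split_species(ids, rng)
    train_ids.extend(train)
    test_ids.extend(test)

print(len(train_ids), len(test_ids))
```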

Machine-learning analysis

Algorithms

Four algorithms were selected to generate the identification models for Acantholaimus and Sabatieria species: Random Forest (RF), Stochastic Gradient Boosting (SGBoost), Support Vector Machine (SVM; linear and radial), and K-nearest neighbor (KNN). The RF algorithm builds an ensemble of decision trees, each trained on a bootstrap sample of the data (bagging), and assigns each observation to the class that receives the most votes across the trees (Knauer et al., 2019; Shaik & Srinivasan, 2019). SGBoost combines simple decision trees, known as weak models (Hastie, Tibshirani & Friedman, 2001), to create a strong classifier (Natekin & Knoll, 2013). The SVM method is a popular classification algorithm that represents each sample as a point in an n-dimensional space, where n is the number of features, and then finds the hyperplane that maximizes the margin to the nearest points (support vectors) of each class (Yan & Zhu, 2022). In the KNN model, each data point is likewise represented in an n-dimensional space; when an unknown sample is introduced, the Euclidean distance between it and each data point is calculated, and the sample is assigned to the most common class among its k nearest neighbors (Alimjan et al., 2018).
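
As a concrete illustration of the simplest of these algorithms, the following sketch implements KNN classification with Euclidean distance in plain Python. It is not the implementation used in iMESc; the function names, the choice of k = 3 and the toy feature vectors are assumptions made for the example.

```python
import math
from collections import Counter


def euclidean(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbors."""
    ranked = sorted(
        zip(train_X, train_y),
        key=lambda pair: euclidean(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy scaled measurements for two hypothetical species
train_X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [2.0, 2.0], [2.1, 1.9], [1.9, 2.2]]
train_y = ["sp_A", "sp_A", "sp_A", "sp_B", "sp_B", "sp_B"]

print(knn_predict(train_X, train_y, [0.15, 0.1]))
print(knn_predict(train_X, train_y, [2.0, 2.1]))
```

Because the prediction depends directly on distances, features measured on larger scales would dominate unless the data are scaled first, which is one reason the scaling step described earlier matters for KNN.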

Training the model

The models were parametrized following the guidelines provided by Fonseca & Vieira (2023). All algorithms were run using repeated five-fold cross-validation with 10 repetitions. The hyperparameter mtry, which determines the number of variables used as candidates at each split point, was fine-tuned using a random search with a tune length of 10. The RF was performed with 500 trees, while the SGBoost was done with 250 and 500 trees. The models were evaluated using the total accuracy and kappa metrics (Vieira & Fonseca, 2022). Accuracy represents the ratio of correct responses to the total number of observations. The kappa statistic quantifies the level of agreement between observed and expected values, taking into account the agreement that could occur by chance alone. Additionally, kappa can be interpreted as the average reliability of categories or as an indicator of intraclass correlation (Warrens, 2015).
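
The two evaluation metrics can be computed directly from the true and predicted labels. The sketch below uses the standard definition of Cohen's kappa, (p_o − p_e)/(1 − p_e), where p_o is the observed agreement (the accuracy) and p_e the agreement expected by chance; the function names and the example labels are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of observations whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement corrected for chance agreement."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement from the marginal label frequencies
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Hypothetical test labels for three species
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "A", "B", "C", "C", "C"]

print(accuracy(y_true, y_pred))
print(cohen_kappa(y_true, y_pred))
```

In this example five of six predictions are correct (accuracy 5/6), chance agreement is 1/3, and kappa is therefore 0.75.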

All the analyses were conducted in iMESc, an Interactive Machine Learning App for Environmental Science, which is an open-source application built on the R language (Vieira & Fonseca, 2022). Comprehensive details and step-by-step guidelines to extract the raw data are available at https://danilocvieira.github.io/iMESc_help/. The data can be accessed through “savepoint_acantholaimus” and “savepoint_sabatieria” in iMESc or in R following the same reference. The iMESc software can be downloaded at https://zenodo.org/record/7278042. The savepoints include both the datasets and the models’ results and outputs, which can be accessed by others for further analysis and validation. The savepoints ensure transparency and reproducibility of the study.

Results

Identification of Acantholaimus species

The accuracy of the algorithms in identifying Acantholaimus species varied considerably (Table 2). On the training data, the RF algorithm achieved the highest accuracy (94%), followed by SVM_L and SVM_R, both with 92% (Table 2).

Table 2. Accuracies and Kappa index for the training and test part of the data from each algorithm used to construct the identification key: Random Forest (RF), Stochastic Gradient Boosting (SGboost), Support Vector Machine (SVM; linear (L) and radial (R)), and K-nearest neighbor (KNN). SD, standard deviations.

| Genus | Model | Training accuracy | Training kappa | Training accuracy SD | Training kappa SD | Testing accuracy | Testing kappa |
|---|---|---|---|---|---|---|---|
| Acantholaimus | RF | 0.94 | 0.94 | 0.04 | 0.04 | 0.93 | 0.92 |
| Acantholaimus | SVM_L | 0.92 | 0.91 | 0.05 | 0.05 | 0.92 | 0.92 |
| Acantholaimus | SVM_R | 0.92 | 0.92 | 0.04 | 0.04 | 0.92 | 0.92 |
| Acantholaimus | SGboost | 0.76 | 0.75 | 0.07 | 0.07 | 0.85 | 0.84 |
| Acantholaimus | KNN | 0.51 | 0.49 | 0.06 | 0.06 | 0.78 | 0.76 |
| Sabatieria | RF | 0.97 | 0.97 | 0.02 | 0.02 | 1 | 1 |
| Sabatieria | SVM_L | 0.95 | 0.95 | 0.02 | 0.02 | 1 | 1 |
| Sabatieria | SVM_R | 0.93 | 0.92 | 0.03 | 0.03 | 0.97 | 0.96 |
| Sabatieria | SGboost | 0.74 | 0.73 | 0.04 | 0.04 | 0.90 | 0.90 |
| Sabatieria | KNN | 0.61 | 0.60 | 0.04 | 0.04 | 0.93 | 0.93 |

Note:

Bold values indicate the highest accuracy and kappa index.

On the testing dataset, the RF algorithm classified all specimens correctly except for one individual of the species A. veitkoehlerae (Id.47), which was misidentified as A. robustus (Table 3). The RF algorithm yielded an overall test accuracy of 93%, and SVM with linear and radial kernels achieved 92%, each with a kappa coefficient of 0.92, whereas SGboost and KNN were less accurate (85% and 78%, respectively; Table 2).

Table 3. Percentages of accuracies (correct classifications), and errors (misclassifications) for each individual used to test the prediction of the Random Forest for Acantholaimus after calculating 500 trees.

Id, identification label of each individual; Species, species described in the original description; Predicted species, species predicted by the model.

| Id | Accuracy (%) | Error (%) | Species | Predicted species |
|---|---|---|---|---|
| Id.9 | 76 | 23 | A. angustus | A. angustus |
| Id.10 | 90 | 10 | A. angustus | A. angustus |
| Id.18 | 82 | 18 | A. arthrochaeta | A. arthrochaeta |
| Id.21 | 88 | 11 | A. barbatus | A. barbatus |
| Id.31 | 58 | 41 | A. cornutus | A. cornutus |
| Id.47 | 39 | 60 | A. veitkoehlerae | A. robustus |
| Id.52 | 81 | 18 | A. sieglerae | A. sieglerae |
| Id.64 | 96 | 4 | A. veitkoehlerae | A. veitkoehlerae |
| Id.65 | 98 | 2 | A. veitkoehlerae | A. veitkoehlerae |
| Id.74 | 70 | 29 | A. quintus | A. quintus |
| Id.81 | 78 | 22 | A. verscheldi | A. verscheldi |
| Id.89 | 66 | 33 | A. microdontus | A. microdontus |
| Id.99 | 30 | 69 | A. septimus | A. septimus |
| Id.108 | 41 | 58 | A. megamphis | A. megamphis |

Note:

Bold value indicates the misclassified Acantholaimus species.
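
One way to read the per-individual percentages in Table 3 is as the share of the 500 trees voting for the true species; this interpretation is an assumption, since the text does not state it explicitly. Under that reading, the sketch below shows how a forest's prediction is the species with the most votes, which with many candidate species can be well below 50% of the trees (compare Id.99, correctly predicted with 30%). The function name and the vote counts are hypothetical.

```python
from collections import Counter


def forest_vote(tree_predictions):
    """Return the most voted species and its share of the votes (in %)."""
    counts = Counter(tree_predictions)
    winner, n_votes = counts.most_common(1)[0]
    return winner, 100.0 * n_votes / len(tree_predictions)


# Hypothetical votes from 500 trees spread over four species
votes = ["sp_A"] * 150 + ["sp_B"] * 120 + ["sp_C"] * 110 + ["sp_D"] * 120
winner, share = forest_vote(votes)
print(winner, share)
```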

Out of the 29 characters analyzed, a subset of 17 stood out, comprising eight morphometric measurements, seven quantitative ratios, and two categorical morphological characters (Fig. 4). Several characters emerged as highly important across all models, including the diameter at the level of cephalic setae (DLCS), diameter at the level of cardia (DLC), body length without tail/length of the pharynx (b′), body length without tail (L′) and diameter at the level of the middle of the amphid (DLMA).

Figure 4. The Random Forest feature importance analysis of the significant characters used in the identification of the Acantholaimus species.

The variables were ranked based on their average positions among the nodes of the 500 generated trees. The color gradient represents the position of the nodes (minimal depth) in the trees. The closer a variable's nodes are to the root of the trees (i.e., the smaller its minimal depth), the greater the variable importance. Abbreviations are listed in Table 1.
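
The idea of ranking variables by minimal depth can be sketched as follows. The sketch is not the iMESc implementation: trees are represented as hypothetical nested dictionaries, the root is given depth 0, and a variable's mean is taken only over the trees in which it appears, all of which are assumptions made for illustration.

```python
def minimal_depths(tree, depth=0, found=None):
    """Map each variable to the depth of its shallowest split in one tree."""
    if found is None:
        found = {}
    if not isinstance(tree, dict):  # leaf: a species label
        return found
    var = tree["var"]
    if var not in found or depth < found[var]:
        found[var] = depth
    minimal_depths(tree["left"], depth + 1, found)
    minimal_depths(tree["right"], depth + 1, found)
    return found


def mean_minimal_depth(forest):
    """Average minimal depth per variable over the trees that use it."""
    totals, counts = {}, {}
    for tree in forest:
        for var, d in minimal_depths(tree).items():
            totals[var] = totals.get(var, 0) + d
            counts[var] = counts.get(var, 0) + 1
    return {var: totals[var] / counts[var] for var in totals}


# Two hypothetical trees; leaves are species labels
tree1 = {
    "var": "DLCS",
    "left": {"var": "b_prime", "left": "sp1", "right": "sp2"},
    "right": {"var": "L_prime", "left": "sp3", "right": "sp4"},
}
tree2 = {
    "var": "b_prime",
    "left": "sp1",
    "right": {"var": "DLCS", "left": "sp2", "right": "sp3"},
}

ranking = mean_minimal_depth([tree1, tree2])
print(sorted(ranking.items(), key=lambda item: item[1]))
```

Variables with a smaller mean minimal depth split the data earlier in the trees and are therefore ranked as more important.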

Identification of Sabatieria species

As with the Acantholaimus model, the performance of the algorithms on the Sabatieria data varied considerably. Based on the training and testing data, the RF algorithm was the most accurate, followed by SVM with linear and radial kernels (Table 2).

On the testing data, both the RF and SVM (linear) models achieved a perfect accuracy of 100% and a kappa coefficient of 1, whereas SVM (radial) achieved an accuracy of 97% and a kappa of 0.96. The RF model correctly identified every test individual of every species (Table 4).

Table 4. Percentages of accuracies (correct classifications), and errors (misclassifications) for each individual used to test the prediction of the Random Forest for Sabatieria after calculating 500 trees. Id, identification label of each individual; Species, species described in the original description; Predicted species, species predicted by the model.

| Id | Accuracy (%) | Error (%) | Species | Predicted species |
|---|---|---|---|---|
| Id.3 | 73 | 27 | S. alata | S. alata |
| Id.8 | 74 | 26 | S. armata | S. armata |
| Id.13 | 67 | 33 | S. balbutiens | S. balbutiens |
| Id.28 | 88 | 13 | S. celtica | S. celtica |
| Id.30 | 90 | 10 | S. celtica | S. celtica |
| Id.36 | 78 | 23 | S. conicauda | S. conicauda |
| Id.44 | 97 | 3 | S. conicoseta | S. conicoseta |
| Id.59 | 86 | 14 | S. elongata | S. elongata |
| Id.64 | 82 | 18 | S. execulta | S. execulta |
| Id.76 | 22 | 77 | S. fidelis | S. fidelis |
| Id.81 | 87 | 12 | S. granifer | S. granifer |
| Id.88 | 63 | 37 | S. granifer | S. granifer |
| Id.108 | 100 | 0 | S. lepida | S. lepida |
| Id.112 | 100 | 0 | S. lepida | S. lepida |
| Id.115 | 94 | 6 | S. longicaudata | S. longicaudata |
| Id.121 | 97 | 2 | S. longispinosa | S. longispinosa |
| Id.145 | 74 | 26 | S. multisupplementia | S. multisupplementia |
| Id.153 | 100 | 0 | S. ornata | S. ornata |
| Id.158 | 100 | 0 | S. ornata | S. ornata |
| Id.166 | 69 | 31 | S. parabyssalis | S. parabyssalis |
| Id.169 | 54 | 46 | S. parapraedatrix | S. parapraedatrix |
| Id.180 | 58 | 42 | S. pisinna | S. pisinna |
| Id.183 | 94 | 7 | S. pomarei | S. pomarei |
| Id.190 | 35 | 65 | S. praedatrix | S. praedatrix |
| Id.195 | 96 | 4 | S. propisinna | S. propisinna |
| Id.206 | 100 | 0 | S. pulchra | S. pulchra |
| Id.216 | 100 | 0 | S. pulchra | S. pulchra |
| Id.222 | 96 | 4 | S. punctata | S. punctata |
| Id.226 | 100 | 0 | S. punctata | S. punctata |
| Id.232 | 62 | 38 | S. sinica | S. sinica |
| Id.242 | 82 | 18 | S. stekhoveni | S. stekhoveni |
| Id.246 | 95 | 4 | S. stenocephalus | S. stenocephalus |
| Id.258 | 80 | 19 | S. vasicola | S. vasicola |

In the case of Sabatieria, the feature importance analysis selected a subset of 16 characters among the 30 used: nine morphometric measurements, four quantitative ratios, and three categorical morphological characters (Fig. 5). Notably, apophyses (Apoph), spicules (Spic), pharynx length (Pha. L), length of cephalic setae (CSL) and pharyngeal bulb body diameter (Pha. BBD) occupied the highest positions, indicating that they were the most important characters.

Figure 5. The Random Forest feature importance analysis of the significant characters used in the identification of the Sabatieria species.

The variables were ranked based on their average positions among the nodes of the 500 generated trees. The color gradient represents the position of the nodes (Minimal depth) in the trees. The higher the node position, the greater the variable importance. Abbreviations are listed in Table 1.
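To make the ranking criterion concrete, the sketch below computes the mean minimal depth of each variable over a small set of trees. It is a simplified illustration under stated assumptions: trees are represented as hypothetical nested dictionaries, the toy forest is invented, and a variable absent from a tree is simply skipped for that tree, whereas real implementations may assign a penalty depth instead.

```python
def minimal_depths(tree, depth=0, found=None):
    """Return {variable: shallowest depth at which it splits} for one tree.

    A tree is a nested dict {"var": name, "left": subtree, "right": subtree};
    a leaf is None (hypothetical representation, for illustration only).
    """
    if found is None:
        found = {}
    if tree is None:
        return found
    var = tree["var"]
    found[var] = min(depth, found.get(var, depth))
    minimal_depths(tree.get("left"), depth + 1, found)
    minimal_depths(tree.get("right"), depth + 1, found)
    return found


def rank_by_mean_minimal_depth(forest):
    """Average each variable's minimal depth over the trees that use it.

    Lower mean depth (closer to the root) means higher importance.
    """
    totals, counts = {}, {}
    for tree in forest:
        for var, d in minimal_depths(tree).items():
            totals[var] = totals.get(var, 0) + d
            counts[var] = counts.get(var, 0) + 1
    means = {v: totals[v] / counts[v] for v in totals}
    return sorted(means.items(), key=lambda item: item[1])


# Invented three-tree forest using abbreviations from the text
toy_forest = [
    {"var": "Spic",
     "left": {"var": "Apoph", "left": None, "right": None},
     "right": {"var": "CSL", "left": None, "right": None}},
    {"var": "Apoph",
     "left": {"var": "Spic", "left": None, "right": None},
     "right": None},
    {"var": "Spic",
     "left": None,
     "right": {"var": "CSL", "left": None, "right": None}},
]
ranking = rank_by_mean_minimal_depth(toy_forest)
print(ranking)
```

In this toy forest, Spic splits at the root in two of three trees and therefore ranks first, which mirrors the statement that a higher node position indicates greater importance.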

Discussion

Machine learning algorithms proved effective in identifying Acantholaimus and Sabatieria species. The findings that RF was the top-performing algorithm and KNN the least accurate agree with the literature (Liu et al., 2022). RF can handle large numbers of input variables and assign varying importance to each, thus effectively managing errors in datasets (Wäldchen & Mäder, 2018). RF also showed superior performance in the identification of wood species (Shugar, Drake & Kelley, 2021). KNN, in contrast, is known to be sensitive to outliers and becomes less efficient when dealing with large volumes of data (Cao et al., 2018). SVM also showed high accuracy values. This algorithm is normally applied to the classification of high-dimensional data with many features, offering a fast classification process (Kremic & Subasi, 2016). The fact that RF performed better here does not mean that it will always outperform the others. Therefore, we recommend comparing the results of different algorithms and, where appropriate, combining them in an ensemble.
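One simple way to combine several classifiers is a majority vote over their predictions for each individual, sketched below. This is an illustrative assumption rather than a procedure used in this study; the function name, the tie-breaking rule (the earliest model listed wins) and the example predictions are hypothetical.

```python
from collections import Counter


def majority_vote(predictions_by_model):
    """Combine per-model prediction lists into one list by majority vote.

    predictions_by_model maps a model name to its list of predicted labels,
    with all lists in the same individual order. Ties go to the label
    predicted by the earliest model in the dict (an arbitrary choice).
    """
    models = list(predictions_by_model)
    n = len(predictions_by_model[models[0]])
    combined = []
    for i in range(n):
        votes = [predictions_by_model[m][i] for m in models]
        counts = Counter(votes)
        best = max(counts.values())
        # First label, in model order, that reaches the top vote count.
        combined.append(next(v for v in votes if counts[v] == best))
    return combined


# Hypothetical predictions for three test individuals
example = {
    "RF": ["S. alata", "S. celtica", "S. lepida"],
    "SVM_linear": ["S. alata", "S. celtica", "S. ornata"],
    "KNN": ["S. armata", "S. celtica", "S. pulchra"],
}
ensemble = majority_vote(example)
print(ensemble)
```

Listing the most trusted model first makes the tie-breaking rule favor it, which is one reason to compare individual algorithms before building the ensemble.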

The construction of a comprehensive database of morphological characteristics is critical for implementing the proposed methodology across the phylum. In the case of the two genera studied here, the availability of outstanding systematic reviews (Jensen, 1979; Platt, 1985; Miljutin & Miljutina, 2016b) greatly facilitated the selection of relevant characteristics. While these reviews highlight several important characteristics for distinguishing species, not all of them were included in the analysis of this study. For instance, the complex structure of the copulatory apparatus (spicules and gubernaculum) and the shape of the buccal cavity in Acantholaimus were omitted from the analysis because they are difficult to categorize. The shape of the buccal cavity, in particular, is influenced by the degree of retraction/eversion of the stoma, which is a result of the fixation method (Miljutin & Miljutina, 2016b). Similarly, the degree of eversion may also influence the head shape, so this character must be used cautiously. In that case, however, we decided to keep the character since it was consistently present in individuals of each described species and was generally combined with other relevant morphological traits such as the length of cephalic setae and the position of the amphids.

In the scope of this study, from the initial selection of 29 characters for Acantholaimus and 30 for Sabatieria, the feature importance analysis retained 17 (Acantholaimus) and 16 (Sabatieria) key characters. For Acantholaimus, significant features included morphological aspects such as amphid size and cervical setae position alongside specific morphometric attributes like the De Man ratios. For Sabatieria, the analysis selected the characters related to the copulatory apparatus together with the tail length and the number of amphideal turns. In practice, if these sets of characters are observed during the identification process, the chances of the model making an accurate identification increase. On the other hand, 12 and 14 characters for Acantholaimus and Sabatieria, respectively, were less relevant for distinguishing the species. Yet, the reasons why one character is more informative than another are a matter for further investigation. It could be that the selected characters have gone through disruptive selection (Rueffler et al., 2006). In that way, the implementation of an ML identification key facilitates the selection of the main traits to be used during the species identification process (Bogale, Baniya & DiGennaro, 2020; Tan et al., 2021), as well as giving us elements to further explore potential evolutionary processes (Avila & Mullon, 2023).
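For readers less familiar with the De Man ratios mentioned above, they are quotients of standard body measurements: a is body length over maximum body diameter, b is body length over pharynx length, c is body length over tail length, and c' is tail length over anal body diameter. The sketch below computes them from hypothetical measurements; the function name and values are illustrative only, and which ratios entered the models is defined in Table 1 rather than assumed here.

```python
def de_man_ratios(body_length, max_body_diameter, pharynx_length,
                  tail_length, anal_body_diameter):
    """Return the De Man ratios a, b, c and c' (all inputs in the same unit)."""
    return {
        "a": body_length / max_body_diameter,
        "b": body_length / pharynx_length,
        "c": body_length / tail_length,
        "c_prime": tail_length / anal_body_diameter,
    }


# Hypothetical measurements in micrometres (not from any described species)
ratios = de_man_ratios(
    body_length=1000.0,
    max_body_diameter=40.0,
    pharynx_length=200.0,
    tail_length=125.0,
    anal_body_diameter=25.0,
)
print(ratios)
```

Because the ratios are dimensionless, they allow body proportions to be compared between individuals of different absolute size.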

The proposed approach does not eliminate the steps involved in the identification: observing the specimens, taking measurements, and categorizing the morphological characters. Instead, by applying ML algorithms in taxonomy, it ensures a unified database and identification procedure for all researchers. As such, it allows the results of the identification processes to be equivalent across studies. Having comprehensive and well-documented species descriptions that cover multiple individuals and morphological characters is crucial for the success of the ML identification key. The more observations and detailed descriptions available, the better the quality and accuracy of the key. This issue is particularly important for species with strong sexual dimorphism (Decraemer, Coomans & Baldwin, 2013). It is important to emphasize that the number of observations plays a central role in ML methods. Sufficient individuals are needed to train the models, and a separate set of individuals is required for testing and validation. Descriptions based on a single individual pose challenges and limit the effectiveness of such methods, as they do not capture variation within a species. To implement this approach effectively, it would be advisable to start with taxonomic groups that have recent and comprehensive systematic reviews, such as Chromadoridae (Venekey et al., 2019) and Cyatholaimidae (Cunha, Fonseca & Amaral, 2022). These groups can serve as the foundation for the morphological database and training of the ML models. As more comprehensive reviews become available for other taxonomic groups, the methodology can be extended to cover a wider range of marine nematode species.
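The dependence on multiple individuals per species can be illustrated with a stratified train/test split: a species described from a single individual cannot appear in both partitions. The sketch below is a minimal pure-Python illustration with hypothetical names, an assumed 70/30 split and invented identifiers; it is not the partitioning code used in this study.

```python
import random
from collections import defaultdict


def stratified_split(records, test_fraction=0.3, seed=42):
    """Split (individual_id, species) records so each species is in both sets.

    Species with fewer than two individuals cannot be split and are
    returned separately in `excluded`.
    """
    rng = random.Random(seed)
    by_species = defaultdict(list)
    for individual_id, species in records:
        by_species[species].append(individual_id)

    train, test, excluded = [], [], []
    for species, ids in sorted(by_species.items()):
        if len(ids) < 2:
            excluded.append(species)
            continue
        rng.shuffle(ids)
        # At least one individual for testing, at least one left for training.
        n_test = min(max(1, round(len(ids) * test_fraction)), len(ids) - 1)
        test += [(i, species) for i in ids[:n_test]]
        train += [(i, species) for i in ids[n_test:]]
    return train, test, excluded


# Invented records: sp_C is known from a single individual
records = [
    ("Id.1", "sp_A"), ("Id.2", "sp_A"), ("Id.3", "sp_A"), ("Id.4", "sp_A"),
    ("Id.5", "sp_B"), ("Id.6", "sp_B"),
    ("Id.7", "sp_C"),
]
train, test, excluded = stratified_split(records)
print(len(train), len(test), excluded)
```

Here sp_C ends up in neither partition, which is the situation faced by species described from one specimen.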

It is important to acknowledge that misclassification can occur in ML algorithms, as observed for A. veitkoehlerae. The limited number of observations for certain morphological characters in this study may have contributed to the errors. ML algorithms rely on informative features extracted from the observations, which in this study are the specimens, to make accurate classifications (Bartlett et al., 2022). If the chosen features lack sufficient information or fail to capture the essential characteristics of the specimens, the algorithm performance will be compromised. Incorporating additional data, either new morphological characters or more individuals that capture the relevant variation within and among species, will enhance the algorithms’ predictive power. Thus, accurate taxonomic descriptions are crucial to achieve a better identification key.

There are, however, some limitations in implementing the tool for identifying Acantholaimus and Sabatieria species. The genus Acantholaimus benefits from having a significant number of described species, each based on detailed observations of four to seven individuals, with many of these species having been recently described. Conversely, Sabatieria poses challenges because many descriptions are based on single or inadequately characterized individuals (Allgén, 1953; Wieser, 1954). Some descriptions focused only on females or males (Micoletzky, 1924; Sergeeva, 1973), and there are instances where only juveniles were included (Allgén, 1929). As a result, a considerable number of species (47 in total) could not be included in the analysis because their descriptions were incomplete or lacked sufficient information. Future taxonomic efforts should prioritize obtaining multiple individuals at different life stages and sexes to address these limitations. The species left out of the analysis could be recollected and better described. Such an effort would provide a more robust identification tool covering a greater number of species. The ML identification key can be continuously improved and refined as more data (i.e., morphological characters, individuals, and species) become available, ensuring its accuracy and reliability in future applications.

Finally, for the identification of nematodes, it is essential to clarify the morphological characteristics and establish standardized terminology for these features. This ensures that researchers consistently use the same names to refer to the same structures (Decraemer, Coomans & Baldwin, 2013). A prime example is the case of supplements found in Sabatieria, which can exhibit pore-like or tubular appearances, essentially representing the same structure but described with different terms (Leduc, 2013; Botelho et al., 2007; Botelho, Esteves & Fonsêca-Genevois, 2014). Such variations in terminology create confusion and hinder accurate identification. By promoting uniformity in character descriptions and adopting standardized terminology, we can greatly enhance the accuracy and clarity of nematode identification. This practice allows researchers to communicate effectively, compare findings across studies and build a comprehensive understanding of nematode anatomy (De Ley, 1995; Jenner, 2004).

Conclusion

This study showed that ML techniques can identify species of free-living marine nematodes. We suggest running multiple algorithms and choosing the most appropriate one. The results indicate that, based on the presence/absence of morphological characters and a morphometric table, the process of identifying marine nematodes can be performed by algorithms, replacing the process of running traditional identification keys. Implementing ML keys can improve the speed and accuracy of species identification and allow researchers to efficiently process vast amounts of data. This approach also empowers non-taxonomists to confidently perform reliable identifications. Ultimately, introducing ML algorithms in taxonomy will contribute to our understanding and conservation of biodiversity. The success of these keys depends on the quality of descriptions and systematic reviews.

Supplemental Information

Supplemental Information 1. List of valid Acantholaimus species according to WoRMS.

The green color indicates species that were excluded from the analysis because of poor taxonomic descriptions, which either lacked information on characters or were limited to a single specimen.

DOI: 10.7717/peerj.16216/supp-1
Supplemental Information 2. List of valid Sabatieria species according to WoRMS.

The green color indicates species that were excluded from the analysis because of poor taxonomic descriptions, which either lacked information on characters or were limited to a single specimen.

DOI: 10.7717/peerj.16216/supp-2
Supplemental Information 3. Number of individuals for Acantholaimus species classification.

The number of individuals required for carrying out the classification of Acantholaimus species.

DOI: 10.7717/peerj.16216/supp-3
Supplemental Information 4. Number of individuals for Sabatieria species classification.

The number of individuals required for carrying out the classification of Sabatieria species.

DOI: 10.7717/peerj.16216/supp-4
Supplemental Information 5. Savepoint_acantholaimus.

Analytical details and all data.

DOI: 10.7717/peerj.16216/supp-5
Supplemental Information 6. Savepoint_sabatieria.

Analytical details and all data.

DOI: 10.7717/peerj.16216/supp-6

Acknowledgments

The authors extend their appreciation to Luciana Yaginuma, Nilvea Ramalho, Mauricio Shimabukuro, and Maikon Di Domenico for their invaluable support and contributions throughout the project. The authors are also thankful to the dedicated members of the meiofauna team from UNIFESP and USP for their assistance and commitment in processing the samples. We would also like to express our gratitude to the reviewers, Dr. Jose Andrés Pérez-García and anonymous reviewers, for their comments, which significantly enhanced the quality of the manuscript.

Funding Statement

Financial support was provided by the Conselho Nacional de Desenvolvimento Científico e Tecnológico—CNPQ to Gustavo Fonseca (306780/2022-4). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Additional Information and Declarations

Competing Interests

The authors declare that they have no competing interests.

Author Contributions

Simone Brito de Jesus conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the article, and approved the final draft.

Danilo Vieira analyzed the data, authored or reviewed drafts of the article, and approved the final draft.

Paula Gheller performed the experiments, prepared figures and/or tables, authored or reviewed drafts of the article, and approved the final draft.

Beatriz P. Cunha conceived and designed the experiments, prepared figures and/or tables, authored or reviewed drafts of the article, and approved the final draft.

Fabiane Gallucci conceived and designed the experiments, authored or reviewed drafts of the article, and approved the final draft.

Gustavo Fonseca conceived and designed the experiments, analyzed the data, authored or reviewed drafts of the article, and approved the final draft.

Data Availability

The following information was supplied regarding data availability:

The datasets and the model’s results and outputs are available in the Supplemental Files.

References

  • Alimjan et al. (2018).Alimjan G, Sun T, Liang Y, Jumahun H, Guan Y. A new technique for remote sensing image classification based on combinatorial algorithm of SVM and KNN. International Journal of Pattern Recognition and Artificial Intelligence. 2018;32(7):1859012. doi: 10.1142/S0218001418590127. [DOI] [Google Scholar]
  • Allgén (1929).Allgén CA. About some Antarctic free-living marine nematodes [Über einige antarktische freilebende marine Nematoden] Zoologischer Anzeiger. 1929;84:126–140. (In German) [Google Scholar]
  • Allgén (1933).Allgén CA. Free-living nematodes from the Trondhjemsfjord [Freilebende Nematoden aus dem Trondhjemsfjord] Capita Zoologica. 1933;4(2):1–162. (In German) [Google Scholar]
  • Allgén (1953).Allgén CA. About a remarkable new South Sea species of the nematode genus Sabatieria De Rouville, S. heterospiculum from South Georgia [Über eine bemerkenswerte neue Südsee-Art der Nematodengattung Sabatieria De Rouville, S. heterospiculum von Süd-Georgien] Det Konglige Norske Videnskabers Selskabs Forhandlinger. 1953;26(2):4–6. (In German) [Google Scholar]
  • Avila & Mullon (2023).Avila P, Mullon C. Evolutionary game theory and the adaptive dynamics approach: adaptation where individuals interact. Philosophical Transactions of the Royal Society B: Biological Sciences. 2023;378(1876):20210502. doi: 10.1098/rstb.2021.0502. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Bailey & Byrnes (1990).Bailey RC, Byrnes JA. New, old method for assessing measurement error in both univariate and multivariate morphometric studies. Systematic Zoology. 1990;39(2):2124–2130. doi: 10.2307/2992450. [DOI] [Google Scholar]
  • Bartlett et al. (2022).Bartlett P, Eberhardt U, Schütz N, Beker HJ. Species determination using AI machine-learning algorithms: Hebeloma as a case study. IMA Fungus. 2022;13(1):13. doi: 10.1186/s43008-022-00099-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Bianchelli et al. (2018).Bianchelli S, Buschi E, Danovaro R, Pusceddu A. Nematode biodiversity and benthic trophic state are simple tools for the assessment of the environmental quality in coastal marine ecosystems. Ecological Indicators. 2018;95(6):270–287. doi: 10.1016/j.ecolind.2018.07.032. [DOI] [Google Scholar]
  • Blaxter (2004).Blaxter ML. The promise of a DNA taxonomy. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences. 2004;359(1444):669–679. doi: 10.1098/rstb.2003.1447. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Bogale, Baniya & DiGennaro (2020).Bogale M, Baniya A, DiGennaro P. Nematode identification techniques and recent advances. Plants. 2020;9(10):1260. doi: 10.3390/plants9101260. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Bojamma & Shastry (2021).Bojamma AM, Shastry CA. A study on the machine learning techniques for automated plant species identification: current trends and challenges. International Journal of Information Technology. 2021;13(3):989–995. doi: 10.1007/s41870-019-00379-7. [DOI] [Google Scholar]
  • Botelho, Esteves & Fonsêca-Genevois (2014).Botelho AP, Esteves AM, Fonsêca-Genevois V. Known and new species of Sabatieria Rouville, 1903 (Araeolaimida: Comesomatidae) from the southwest Atlantic (Campos Basin, Brazil) Marine Biology Research. 2014;10(9):871–891. doi: 10.1080/17451000.2013.866249. [DOI] [Google Scholar]
  • Botelho et al. (2007).Botelho AP, Silva MD, Esteves AM, Fonsêca-Genevois V. Four new species of Sabatieria Rouville, 1903 (Nematoda, Comesomatidae) from the continental slope of Atlantic Southeast. Zootaxa. 2007;1402(1):39–57. doi: 10.11646/zootaxa.1402.1.3. [DOI] [Google Scholar]
  • Brannock et al. (2017).Brannock PM, Sharma J, Bik HM, Kelley Thomas W, Halanych KM. Spatial and temporal variation of intertidal nematodes in the northern Gulf of Mexico after the Deepwater Horizon oil spill. Marine Environmental Research. 2017;130:200–212.484. doi: 10.1016/j.marenvres.2017.07.008. [DOI] [PubMed] [Google Scholar]
  • Cao et al. (2018).Cao J, Leng W, Liu K, Liu L, He Z, Zhu Y. Object-based mangrove species classification using unmanned aerial vehicle hyperspectral images and digital surface models. Remote Sensing. 2018;10(1):89. doi: 10.3390/rs10010089. [DOI] [Google Scholar]
  • Cobb (1917).Cobb NA. Notes on Nemas. Contributions to a Science of Nematology. 1917;5:117–128. [Google Scholar]
  • Cunha, Fonseca & Amaral (2022).Cunha BP, Fonseca G, Amaral ACZ. Diversity and distribution of cyatholaimidae (Chromadorida: Nematoda): a taxonomic and systematic review of the world records. Frontiers in Marine Science. 2022;9:836670. doi: 10.3389/fmars.2022.836670. [DOI] [Google Scholar]
  • Dahouda & Joe (2021).Dahouda MK, Joe I. A deep-learned embedding technique for categorical features encoding. IEEE Access. 2021;9 doi: 10.1109/ACCESS.2021.3104357. 114381–114391. [DOI] [Google Scholar]
  • De & Dey (2019).De M, Dey SR. An overview on taxonomic keys and automated species identification (ASI) International Journal of Experimental Research and Review. 2019;20:40–47. doi: 10.52756/ijerr.2019.v20.004. [DOI] [Google Scholar]
  • De Ley (1995).De Ley P. Ultrastructure of the stoma in Cephalobidae, Panagrolaimidae and Rhabditidae, with a proposal for a revised stoma terminology in Rhabditida (Nematoda) Nematologica. 1995;41(1–4):153–182. doi: 10.1163/003925995X00143. [DOI] [Google Scholar]
  • De Man (1880).De Man JG. The native ones, living freely in the pure earth and sweet water Nematodes. Preliminary report and descriptive-systematic part [Die einheimischen, frei in der reinen Erde und im süssen Wasser lebende Nematoden. Vorläufiger Bericht und deskriptiv-systematischer Teil] Tijdschrift Nederlandsche Dierkundig Vereeiging. 1880;5(1):104. (In German) [Google Scholar]
  • De Mesel et al. (2006).De Mesel I, Lee HJ, Vanhove S, Vincx M, Vanreusel A. Species diversity and distribution within the deep-sea nematode genus Acantholaimus on the continental shelf and slope in Antarctica. Polar Biology. 2006;29(10):860–871. doi: 10.1007/s00300-006-0124-7. [DOI] [Google Scholar]
  • Decraemer, Coomans & Baldwin (2013).Decraemer W, Coomans A, Baldwin J. Morphology of nematoda. In: Schmidt-Rhaesa A, editor. Handbook of Zoology: Gastrotricha, Cycloneuralia and Gnathifera. Vol. 2. Nematoda, Berlin: De Gruyter; 2013. p. 159. [DOI] [Google Scholar]
  • Filipev (1934).Filipev IN. The classification of the free-living nematodes and their relation to the parasitic nematodes. Smithsonian Miscellaneous Collections. 1934;89(6):1–63. [Google Scholar]
  • Filipjev (1917).Filipjev IN. A new free-living nematode from the Caspian Sea, Chromadorissa gen. nov. (Chromadoridae, Chromadorini) [Un nématode libre nouveau de la mer Caspienne, Chromadorissa gen. nov.(Chromadoridae, Chromadorini)] Zoologichesky Zhurnal. 1917;2:24–30. (In French) [Google Scholar]
  • Filipjev (1918).Filipjev IN. Free-living marine nematodes of the Sevastopol area. Transactions of the zoological laboratory and the Sevastopol biological station of Russian academy of sciences. Petrograd Series II. 1918;2(4) [Google Scholar]
  • Fonseca & Bezerra (2014).Fonseca G, Bezerra TN. Order Monhysterida Filipjev, 1929. In: Schmidt-Rhaesa A, editor. Handbook of Zoology: Gastrotricha, Cycloneuralia and Gnathifera. Vol. 2. Nematoda Berlin: De Gruyter; 2014. pp. 435–465. [Google Scholar]
  • Fonseca, Vanreusel & Decraemer (2006).Fonseca G, Vanreusel A, Decraemer W. Taxonomy and biogeography of Molgolaimus Ditlevsen, 1921 (Nematoda: Chromadoria) with reference to the origins of deep-sea nematodes. Antarctic Science. 2006;18(1):23–50. doi: 10.1017/S0954102006000034. [DOI] [Google Scholar]
  • Fonseca & Vieira (2023).Fonseca G, Vieira DC. Overcoming the challenges of data integration in ecosystem studies with machine learning workflows: an example from the Santos project. Ocean and Coastal Research. 2023;71:e23021. doi: 10.1590/2675-2824071.22044gf. [DOI] [Google Scholar]
  • Fu, Leduc & Zhao (2019).Fu S, Leduc D, Zhao ZQ. Two new and one known deep-sea Comesomatidae Filipjev, 1918 species (Nematoda: Araeolaimida) from New Zealand’s continental margin. Marine Biodiversity. 2019;49(4):1931–1949. doi: 10.1007/s12526-019-00955-x. [DOI] [Google Scholar]
  • Griffing (2001).Griffing LR. Who invented the dichotomous key? Richard Waller’s watercolors of the herbs of Britain. American Journal of Botany. 2001;98(12):1911–1923. doi: 10.3732/ajb.1100188. [DOI] [PubMed] [Google Scholar]
  • Guo et al. (2022).Guo M, Yuan C, Tao L, Cai Y, Zhang W. Life barcoded by DNA barcodes. Conservation Genetics Resources. 2022;14(4):351–365. doi: 10.1007/s12686-022-01291-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Hakim et al. (2018).Hakim A, Mor Y, Toker IA, Levine A, Neuhof M, Markovitz Y, Rechavi O. WorMachine: machine learning-based phenotypic analysis tool for worms. BMC Biology. 2018;16(1):1–11. doi: 10.1186/s12915-017-0477-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Hastie, Tibshirani & Friedman (2001).Hastie T, Tibshirani R, Friedman J. The elements of statistical learning; data mining, inference and prediction. Vol. 2. New York: Springer; 2001. p. 758. [Google Scholar]
  • Hauquier et al. (2019).Hauquier F, Macheriotou L, Bezerra TN, Egho G, Martínez AP, Vanreusel A. Distribution of free-living marine nematodes in the Clarion-Clipperton Zone: implications for future deep-sea mining scenarios. Biogeosciences. 2019;16(18):3475–3489. doi: 10.5194/bg-16-3475-2019. [DOI] [Google Scholar]
  • Holovachov (2020).Holovachov O. The nomenclatural status of new nematode nomina proposed in 1993 in the doctoral thesis of Christian Bussau, entitled Taxonomische und ökologische Untersuchungen an Nematoden des Peru-Beckens (Nematoda) Bionomina. 2020;19(1):86–99. doi: 10.11646/bionomina.19.1.5. [DOI] [Google Scholar]
  • Islam et al. (2019).Islam S, Khan SIA, Abedin MM, Habibullah KM, Das AK. Bird species classification from an image using VGG-16 network. Proceedings of the 2019 7th International Conference on Computer and Communications Management; 2019. pp. 38–42. [Google Scholar]
  • Jadhav, Patil & Parasar (2020).Jadhav Y, Patil V, Parasar D. Machine learning approach to classify birds on the basis of their sound. 2020 International Conference on Inventive Computation Technologies (ICICT); Piscataway: IEEE; 2020. pp. 69–73. [Google Scholar]
  • Jenner (2004).Jenner RA. The scientific status of metazoan cladistics: why current research practice must change. Zoologica Scripta. 2004;33(4):293–310. doi: 10.1111/j.0300-3256.2004.00153.x. [DOI] [Google Scholar]
  • Jensen (1979).Jensen P. Nematodes from the brackish waters of the southern archipelago of Finland. Benthic species. Annales Zoology Fennici. 1979;16:151–168. [Google Scholar]
  • Kasinathan, Singaraju & Uyyala (2021).Kasinathan T, Singaraju D, Uyyala SR. Insect classification and detection in field crops using modern machine learning techniques. Information Processing in Agriculture. 2021;8(3):446–457. doi: 10.1016/j.inpa.2020.09.006. [DOI] [Google Scholar]
  • Knauer et al. (2019).Knauer U, von Rekowski CS, Stecklina M, Krokotsch T, Pham Minh T, Hauffe V, Seiffert U. Tree species classification based on hybrid ensembles of a convolutional neural network (CNN) and random forest classifiers. Remote Sensing. 2019;11(23):2788. doi: 10.3390/rs11232788. [DOI] [Google Scholar]
  • Kotwicki, Grzelak & Bełdowski (2016).Kotwicki L, Grzelak K, Bełdowski J. Benthic communities in chemical munitions dumping site areas within the Baltic deeps with special focus on nematodes. Deep Sea Research Part II: Topical Studies in Oceanography. 2016;128:123–130. doi: 10.1016/j.dsr2.2015.12.012. [DOI] [Google Scholar]
  • Kremic & Subasi (2016).Kremic E, Subasi A. Performance of random forest and SVM in face recognition. The International Arab Journal of Information Technology. 2016;13(2):287–293. [Google Scholar]
  • Larrazabal-Filho, Neres & Esteves (2018).Larrazabal-Filho AL, Neres PF, Esteves AM. The genus Bolbonema Cobb, 1920 (Nematoda: Desmodoridae): emended diagnosis, key to males, and description of three new species from the continental shelf off northeastern Brazil. Zootaxa. 2018;4420(4):551–570. doi: 10.11646/ZOOTAXA.4420.4.6. [DOI] [PubMed] [Google Scholar]
  • Leduc (2013).Leduc D. Seven new species and one new species record of Sabatieria (Nematoda: Comesomatidae) from the continental slope of New Zealand. Zootaxa. 2013;3693(1):1–35. doi: 10.11646/zootaxa.3693.1.1. [DOI] [PubMed] [Google Scholar]
  • Leduc & Zhao (2023).Leduc D, Zhao ZQ. The Marine Biota of Aotearoa New Zealand. Ngā toke o Parumoana: common free-living Nematoda of Pāuatahanui Inlet, Te-Awarua-o-Porirua Harbour, Wellington. NIWA Biodiversity Memoir. 2023;135:212. [Google Scholar]
  • Liu et al. (2022).Liu Y, Yang M, Wang Y, Li Y, Xiong T, Li A. Applying machine learning algorithms to predict default probability in the online credit market: evidence from China. International Review of Financial Analysis. 2022;79(1):101971. doi: 10.1016/j.irfa.2021.101971. [DOI] [Google Scholar]
  • Macheriotou et al. (2019).Macheriotou L, Guilini K, Bezerra TN, Tytgat B, Nguyen DT, Phuong Nguyen TX, Derycke S. Metabarcoding free-living marine nematodes using curated 18S and CO1 reference sequence databases for species-level taxonomic assignments. Ecology and Evolution. 2019;9(3):1211–1226. doi: 10.1002/ece3.4814. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Manoel, Esteves & Neres (2022).Manoel A, Esteves AM, Neres PF. Two new species of Acantholaimus (Nematoda, Chromadoridae) from the deep southeastern Atlantic (Santos Basin) Zootaxa. 2022;5209(2):238–256. doi: 10.11646/zootaxa.5209.2.5. [DOI] [PubMed] [Google Scholar]
  • Maria et al. (2009).Maria TF, Esteves AM, Smol N, Vanreusel A, Decraemer W. Chromaspirina guanabarensis sp. n. (Nematoda: Desmodoridae) and a new illustrated dichotomous key to Chromaspirina species. Zootaxa. 2009;2092(1):21–36. doi: 10.11646/zootaxa.2092.1.2. [DOI] [Google Scholar]
  • Mehyadin et al. (2021).Mehyadin AE, Abdulazeez AM, Hasan DA, Saeed JN. Birds sound classification based on machine learning algorithms. Asian Journal of Research in Computer Science. 2021;9(4):1–11. doi: 10.9734/ajrcos/2021/v9i430227. [DOI] [Google Scholar]
  • Micoletzky (1924).Micoletzky H. Last report of free-living nematodes from Suez. Sber. Academic science Vienna Mathematics and natural sciences Class [Letzter Bericht über freilebende Nematoden aus Suez. Sber. Akad. Wiss. Wien Mathem.-naturw. Klasse. Abteilung I, Band 133 Heft] 4/6: 137–179. 1924. (In German)
  • Miljutin et al. (2010).Miljutin DM, Gad G, Miljutina MM, Mokievsky VO, Fonseca-Genevois V, Esteves AM. The state of knowledge on deep-sea nematode taxonomy: how many valid species are known down there? Marine Biodiversity. 2010;40(3):143–159. doi: 10.1007/s12526-010-0041-4. [DOI] [Google Scholar]
  • Miljutin & Miljutina (2016a).Miljutin DM, Miljutina MA. Review of Acantholaimus Allgén, 1933 (Nematoda: Chromadoridae), a genus of marine free-living nematodes, with a tabular key to species. Nematology. 2016a;18(5):537–558. doi: 10.1163/15685411-00002976. [DOI] [Google Scholar]
  • Miljutin & Miljutina (2016b).Miljutin DM, Miljutina MA. Intraspecific variability of morphological characters in the species-rich deep-sea genus Acantholaimus Allgén, 1933 (Nematoda: Chromadoridae) Nematology. 2016b;18(4):455–473. doi: 10.1163/15685411-00002970. [DOI] [Google Scholar]
  • Mincks et al. (2021).Mincks SL, Pereira TJ, Sharma J, Blanchard AL, Bik HM. Composition of marine nematode communities across broad longitudinal and bathymetric gradients in the Northeast Chukchi and Beaufort Seas. Polar Biology. 2021;44(1):85–103. doi: 10.1007/s00300-020-02777-1. [DOI] [Google Scholar]
  • Mitra et al. (2019).Mitra R, Marchitto TM, Ge Q, Zhong B, Kanakiya B, Cook MS, Lobaton E. Automated species-level identification of planktic foraminifera using convolutional neural networks, with comparison to human performance. Marine Micropaleontology. 2019;147(4):16–24. doi: 10.1016/j.marmicro.2019.01.005. [DOI] [Google Scholar]
  • Moreno et al. (2011).Moreno M, Semprucci F, Vezzulli L, Balsamo M, Fabiano M, Albertelli G. The use of nematodes in assessing ecological quality status in the Mediterranean coastal ecosystems. Ecological Indicators. 2011;11(2):328–336. doi: 10.1016/j.ecolind.2010.05.011. [DOI] [Google Scholar]
  • Natekin & Knoll (2013).Natekin A, Knoll A. Gradient boosting machines, a tutorial. Frontiers in Neurorobotics. 2013;7:21. doi: 10.3389/fnbot.2013.00021. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Osborne (1963).Osborne DV. Some aspects of the theory of dichotomous keys. New Phytologist. 1963;62(2):144–160. doi: 10.1111/j.1469-8137.1963.tb06322.x. [DOI] [Google Scholar]
  • Pantó et al. (2021).Pantó G, Pasotti F, Macheriotou L, Vanreusel A. Combining traditional taxonomy and metabarcoding: assemblage structure of nematodes in the shelf sediments of the Eastern Antarctic Peninsula. Frontiers in Marine Science. 2021;8:1175. doi: 10.3389/fmars.2021.629706. [DOI] [Google Scholar]
  • Platt (1985).Platt HM. The free-living marine nematode genus Sabatieria (Nematoda: Comesomatidae). Taxonomic revision and pictorial keys. Zoological Journal of the Linnean Society. 1985;83(1):27–78. doi: 10.1111/j.1096-3642.1985.tb00872.x. [DOI] [Google Scholar]
  • Ridall & Ingels (2021).Ridall A, Ingels J. Suitability of free-living marine nematodes as bioindicators: status and future considerations. Frontiers in Marine Science. 2021;8:685327. doi: 10.3389/fmars.2021.685327. [DOI] [Google Scholar]
  • Rouville (1903).Rouville E. Enumeration of free nematodes from the Bourdignes canal (Cette). [Enumeration des Nematodes libres du canal des Bourdignes (Cette)] Comptes rendus des seances de la Societe de biologie et de ses filiales. 1903;55:1527–1529. (In French) [Google Scholar]
  • Rueffler et al. (2006).Rueffler C, Van Dooren TJ, Leimar O, Abrams PA. Disruptive selection and then what? Trends in Ecology & Evolution. 2006;21(5):238–245. doi: 10.1016/j.tree.2006.03.003. [DOI] [PubMed] [Google Scholar]
  • Sandulli, Semprucci & Balsamo (2014).Sandulli R, Semprucci F, Balsamo M. Taxonomic and functional biodiversity variations of meiobenthic and nematode assemblages across an extreme environment: a study case in a Blue Hole cave. Italian Journal of Zoology. 2014;81(4):508–516. doi: 10.1080/11250003.2014.952356. [DOI] [Google Scholar]
  • Schmidt-Rhaesa (2014).Schmidt-Rhaesa A. Handbook of zoology: Gastrotricha, Cycloneuralia and Gnathifera. Nematoda. Vol. 2. Berlin, Germany: De Gruyter; 2014. [Google Scholar]
  • Schratzberger & Ingels (2018).Schratzberger M, Ingels J. Meiofauna matters: the roles of meiofauna in benthic ecosystems. Journal of Experimental Marine Biology and Ecology. 2018;502:12–25. doi: 10.1016/j.jembe.2017.01.007. [DOI] [Google Scholar]
  • Sergeeva (1973).Sergeeva NG. New species of free-living nematodes from the order Chromadorida in the Black Sea (Novye Vidy Svobodnozhivushchikh Nematod Chernogo Moria iz otriada Chromadorida) Zoologicheskii Zhurnal. 1973;52(8):1238–1241. [Google Scholar]
  • Shaik & Srinivasan (2019).Shaik AB, Srinivasan S. A brief survey on random forest ensembles in classification model. International Conference on Innovative Computing and Communications; Singapore: Springer; 2019. pp. 253–260. [Google Scholar]
  • Shokoohi & Moyo (2022).Shokoohi E, Moyo N. Molecular character of Mylonchulus hawaiiensis and Morphometric differentiation of six Mylonchulus (Nematoda; Order: Mononchida; Family: Mylonchulidae) species using multivariate analysis. Microbiology Research. 2022;13(3):655–666. doi: 10.3390/microbiolres13030047. [DOI] [Google Scholar]
  • Shugar, Drake & Kelley (2021).Shugar AN, Drake BL, Kelley G. Rapid identification of wood species using XRF and neural network machine learning. Scientific Reports. 2021;11(1):1–10. doi: 10.1038/s41598-021-96850-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Spedicato et al. (2020).Spedicato A, Sánchez N, Pastor L, Menot L, Zeppilli D. Meiofauna community in soft sediments at TAG and snake pit hydrothermal vent fields. Frontiers in Marine Science. 2020;7(200):10. doi: 10.3389/fmars.2020.00200. [DOI] [Google Scholar]
  • Stock & Kaya (1996).Stock SP, Kaya HK. A multivariate analysis of morphometric characters of Heterorhabditis species (Nemata: Heterorhabditidae) and the role of morphometrics in the taxonomy of species of the genus. The Journal of Parasitology. 1996;82(5):806–813. doi: 10.2307/3283895. [DOI] [PubMed] [Google Scholar]
  • Sukumar (2014).Sukumar SR. Machine learning in the big data era: are we there yet. Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Workshop on Data Science for Social Good (KDD); 2014. pp. 1–5. [Google Scholar]
  • Surmacz, Morek & Michalczyk (2020).Surmacz B, Morek W, Michalczyk Ł. What to do when ontogenetic tracking is unavailable: a morphometric method to classify instars in Milnesium (Tardigrada) Zoological Journal of the Linnean Society. 2020;188(3):797–808. doi: 10.1093/zoolinnean/zlz099. [DOI] [Google Scholar]
  • Tan et al. (2021).Tan HY, Goh ZY, Loh K, Then AY, Omar H, Chang S. Cephalopod species identification using integrated analysis of machine learning and deep learning approaches. PeerJ. 2021;9(7):e11825. doi: 10.7717/peerj.11825. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Thevenoux et al. (2021).Thevenoux R, Van Linh LE, Villessèche H, Buisson A, Beurton-Aimar M, Grenier E, Parisey N. Image based species identification of Globodera quarantine nematodes using computer vision and deep learning. Computers and Electronics in Agriculture. 2021;186:106058. doi: 10.1016/j.compag.2021.106058. [DOI] [Google Scholar]
  • Tumanov (2020).Tumanov DV. Analysis of non-morphometric morphological characters used in the taxonomy of the genus Pseudechiniscus (Tardigrada: Echiniscidae) Zoological Journal of the Linnean Society. 2020;188(3):753–775. doi: 10.1093/zoolinnean/zlz097. [DOI] [Google Scholar]
  • Valentini, Pompanon & Taberlet (2009).Valentini A, Pompanon F, Taberlet P. DNA barcoding for ecologists. Trends in Ecology & Evolution. 2009;24(2):110–117. doi: 10.1016/j.tree.2008.09.011. [DOI] [PubMed] [Google Scholar]
  • Vanreusel, Fonseca & Danovaro (2010).Vanreusel A, Fonseca G, Danovaro R. The contribution of deep-sea macrohabitat heterogeneity to global nematode diversity. Marine Ecology. 2010;31(1):6–20. doi: 10.1111/j.1439-0485.2009.00352.x. [DOI] [Google Scholar]
  • Venekey et al. (2019).Venekey V, Gheller P, Kandratavicius, Cunha BP, Vilas-Boas AC, Fonseca G, Maria TF. The state of the art of Chromadoridae (Nematoda, Chromadorida): a historical review, diagnoses and comments about valid and dubious genera and a list of valid species. Zootaxa. 2019;4578(1):1–67. doi: 10.11646/zootaxa.4578.1.1. [DOI] [PubMed] [Google Scholar]
  • Vieira & Fonseca (2022).Vieira DC, Fonseca G. iMESc: an interactive machine learning app for environmental science (imesc_v2.2) Zenodo. 2022 doi: 10.5281/zenodo.6484391. [DOI] [Google Scholar]
  • Wäldchen & Mäder (2018).Wäldchen J, Mäder P. Machine learning for image-based species identification. Methods in Ecology and Evolution. 2018;9(11):2216–2225. doi: 10.1111/2041-210X.13075. [DOI] [Google Scholar]
  • Walter & Winterton (2007).Walter DE, Winterton S. Keys and the crisis in taxonomy: extinction or reinvention? Annual Review of Entomology. 2007;52(1):1193–1208. doi: 10.1146/annurev.ento.51.110104.151054. [DOI] [PubMed] [Google Scholar]
  • Warrens (2015).Warrens MJ. Five ways to look at Cohen’s kappa. Journal of Psychology & Psychotherapy. 2015;5(4):1. doi: 10.4172/2161-0487.1000197. [DOI] [Google Scholar]
  • Weiss (1995).Weiss DJ. Polychotomous or polytomous? University of Minnesota. Applied Psychological Measurement. 1995;19:4. doi: 10.1177/014662169501900102. [DOI] [Google Scholar]
  • Wieser (1954).Wieser W. Free-living marine nematodes II. Chromadoroidea. Acta Universitatis Lundensis. 1954;50(16):1–148. [Google Scholar]
  • Yan & Zhu (2022).Yan X, Zhu H. A novel robust support vector machine classifier with feature mapping. Knowledge-Based Systems. 2022;257(3):109928. doi: 10.1016/j.knosys.2022.109928. [DOI] [Google Scholar]
  • Yang et al. (2019).Yang P, Guo Y, Chen Y, Lin R. Four new free-living marine nematode species (Sabatieria) from the Chukchi Sea. Zootaxa. 2019;4646(1):31–54. doi: 10.11646/zootaxa.4646.1.2. [DOI] [PubMed] [Google Scholar]
  • Zeppilli et al. (2019).Zeppilli D, Bellec L, Cambon-Bonavita MA, Decraemer W, Fontaneto D, Fuchs S, Sarrazin J. Ecology and trophic role of Oncholaimus dyvae sp. nov. (Nematoda: Oncholaimidae) from the lucky strike hydrothermal vent field (Mid-Atlantic Ridge) BMC Zoology. 2019;4(1):1–15. doi: 10.1186/s40850-019-0044-y. [DOI] [Google Scholar]
  • Zhai, Wang & Huang (2020).Zhai H, Wang C, Huang Y. Sabatieria sinica sp. nov. (Comesomatidae, Nematoda) from Jiaozhou Bay, China. Journal of Oceanology and Limnology. 2020;38(2):539–544. doi: 10.1007/s00343-019-9030-z. [DOI] [Google Scholar]

Associated Data


Supplementary Materials

Supplemental Information 1. List of valid Acantholaimus species according to WoRMS.

The green color indicates species that were excluded from the analysis because their taxonomic descriptions were poor: they either lacked information on characters or were based on a single specimen.

DOI: 10.7717/peerj.16216/supp-1
Supplemental Information 2. List of valid Sabatieria species according to WoRMS.

The green color indicates species that were excluded from the analysis because their taxonomic descriptions were poor: they either lacked information on characters or were based on a single specimen.

DOI: 10.7717/peerj.16216/supp-2
Supplemental Information 3. Number of individuals for Acantholaimus species classification.

The number of individuals required for carrying out the classification of Acantholaimus species.

DOI: 10.7717/peerj.16216/supp-3
Supplemental Information 4. Number of individuals for Sabatieria species classification.

The number of individuals required for carrying out the classification of Sabatieria species.

DOI: 10.7717/peerj.16216/supp-4
Supplemental Information 5. Savepoint_acantholaimus.

Analytical details and all data.

DOI: 10.7717/peerj.16216/supp-5
Supplemental Information 6. Savepoint_sabatieria.

Analytical details and all data.

DOI: 10.7717/peerj.16216/supp-6

Data Availability Statement

The following information was supplied regarding data availability:

The datasets and the model’s results and outputs are available in the Supplemental Files.


Articles from PeerJ are provided here courtesy of PeerJ, Inc
