Ultraconserved Elements and Machine Learning Classifiers Enable Robust Phylogenetics and Taxonomy in Model and Non‐Model Nematodes

Laura Villegas; Lucy Jimenez; Joëlle van der Sprong; Oleksandr Holovachov; Ann‐Marie Waldvogel; Philipp H Schiffer

doi:10.1111/1755-0998.70046

2025 Oct 8;25(8):e70046.

PMCID: PMC12550484 PMID: 41060247

ABSTRACT

Nematodes are among the most diverse animals, yet only around 28,000 of an estimated one million species have been morphologically described. Their small size, morphological simplicity, and cryptic diversity complicate phylogenetic analyses. Traditional morphological and single‐locus molecular approaches often lack resolution for both recent and ancient divergences. To address these limitations, we developed the first ultraconserved elements (UCEs) probe sets for two nematode families: Panagrolaimidae, a group of non‐model organisms with limited genomic resources when compared to model taxa, and Rhabditidae, which includes the model species Caenorhabditis elegans. Our probe sets targeted 1612 loci for Panagrolaimidae and 10,397 for Rhabditidae. In vitro testing recovered up to 1457 loci in Panagrolaimidae, supporting robust phylogenetic reconstruction. Results were largely consistent with previous analyses, except for one strain reclassified as Neocephalobus halophilus BSS8. Using machine learning, we determined the minimum number of loci needed for accurate genus‐level classification. For Rhabditidae, XGBoost achieved high accuracy with just 46 loci. For Panagrolaimidae, 39 loci were most informative. Our UCE‐based approach offers a scalable and cost‐effective framework for phylogenomics, enhancing taxonomic resolution and evolutionary inference in nematodes. It is well suited for biodiversity assessments and shallow, field‐based sequencing, expanding research possibilities across this ecologically important phylum.

Keywords: genus classification, machine learning, Nematoda, Phylogenomics, ultraconserved element

1. Introduction

Nematodes (Phylum Nematoda) are one of the most diverse and abundant animal taxa, occupying nearly all ecosystems on earth and playing crucial ecological roles, such as regulating microbial and fungal communities, facilitating nutrient cycling, and contributing to soil health and decomposition processes as well as being parasites of plants and animals, including humans (Blaxter 2011; van den Hoogen et al. 2020). Nematode taxonomic diversity remains poorly characterised, with recent estimates of the total number of species in the phylum ranging between half a million and up to 10 million species (Hodda 2022). As only a little over 28,000 species are formally described, an immense ‘taxonomic gap’ not only limits our understanding of biodiversity, but also constrains our ability to explore the evolutionary history and diversification of this group. Better resolution of nematode phylogeny is furthermore crucial for advancing the understanding of ecological dynamics, for example, in soils and sediments, where nematode diversity can be linked to patterns on the environmental scale (Villegas et al. 2024).

Resolving phylogenetic placement of species is the basis for reconstructing evolutionary relationships, shedding light on lineage diversification, biogeography, and adaptation. Current approaches, such as morphology‐based classification, are often limited by cryptic diversity, convergent evolution, and the need for specialised taxonomic expertise. Similarly, traditional molecular markers (e.g., 18S rRNA, 28S rRNA, COI and others) may lack sufficient phylogenetic resolution to reliably distinguish closely related species, while some nematode taxa exhibiting particularly high molecular evolution rates can, in turn, produce skewed or incongruent topologies in phylogenetic reconstructions (Blaxter et al. 1998; Van Megen et al. 2009). This issue highlights the importance of developing new methods for phylogenetically robust species placement, with a focus on the careful selection of informative loci. Several sequencing methods have emerged as a complement or alternative to amplicon sequencing, like whole genome sequencing (WGS), transcriptome sequencing, target sequence capture, and restriction‐site associated sequencing (RAD‐seq). Each of these methods offers distinct advantages. WGS enables the comprehensive investigation of complex genomic traits and adaptive processes (Lu et al. 2025), allowing the study of coding and non‐coding regions and the analysis of structural variation. However, this comes at increased costs, and data analysis can be computationally intensive. RAD‐seq provides a lower cost alternative for population and species‐level studies when compared to whole genome approaches by focusing on a reduced subset of the genome (Andrews et al. 2016); however, determining orthology relationships remains challenging. Missing data due to mutations at restriction sites and potential non‐independence of adjacent loci due to linkage disequilibrium are some of the potential drawbacks of this approach (Rubin et al. 2012).

For targeted sequence capture methods, specific genomic regions are enriched for sequencing. In this way, a large set of orthologous loci can be selected with probes, making the method particularly useful for phylogenetics and evolutionary studies (Jones and Good 2015). However, a major drawback is the requirement of prior genomic or transcriptomic data for probe design. Consequently, successful implementation depends on the availability of high‐quality genomes, which may be limited for non‐model organisms. One common family of bait sets targets highly conserved genomic regions, among them ultraconserved elements (UCEs), which recover sets of loci that are highly conserved and thus can be captured across divergent groups of organisms. The presence of highly conserved regions (cores) flanked by highly variable, informative sites (flanking regions) makes this a cost‐efficient approach that captures both deep phylogenetic relationships and recent divergences, enabling high‐resolution phylogenomic reconstructions (Faircloth et al. 2012). UCE approaches have been successfully implemented in a wide range of lineages (Blaimer et al. 2015; Erickson et al. 2020; Gilbert et al. 2015; Quattrini et al. 2017; van der Sprong et al. 2023; Winker et al. 2018). Beyond evolutionary studies, UCE‐based analyses can also provide critical insights into ecologically relevant patterns, such as population structure, connectivity, and demography, which are essential for conservation planning and bio‐monitoring efforts (Duckett et al. 2023).

Ultimately, the most suitable method depends on the focus taxonomic group and specific research objectives, as different approaches vary in their effectiveness depending on genome complexity, evolutionary scale, and data requirements. In this study, we designed two ultraconserved elements (UCEs) bait sets for nematodes, particularly focusing on the Rhabditidae (to which the model nematode C. elegans belongs) and Panagrolaimidae (non‐model species) families. Using Rhabditidae for method benchmarking, we obtained phylogenetic reconstructions based on UCE loci that were congruent with those previously proposed in the literature based on orthologs from genome‐wide analysis. For the Panagrolaimidae family, for which only a few genomic resources are available compared to Rhabditidae, we show that using UCEs and target capture provides a high‐resolution phylogenetic reconstruction. Additionally, we developed a machine learning framework to identify the most informative UCEs for the taxonomic assignment of genera, offering a scalable and efficient tool for future ecological and evolutionary studies.

The genomic era has generated high‐dimensional datasets that pose significant challenges to traditional taxonomy, particularly in groups with high diversity, like nematodes (Blaxter 2011; Hodda 2022). To effectively interpret these complex data, it is essential to adopt computational approaches that can uncover intricate patterns within genomic variation. The application of machine learning (ML) techniques offers a promising way to analyse UCE presence–absence and highlights the ability of ML algorithms to leverage and identify informative features and build robust predictive models (Libbrecht and Noble 2015). Notably, presence–absence data have also proven effective for phylogenetic inference, recovering meaningful evolutionary signal even in complex datasets (Natsidis et al. 2021). The necessity to select relevant features within these complex datasets requires efficient feature selection methods, which allow for improved model interpretability and reduced dimensionality without compromising accuracy (Guyon and Elisseeff 2003; Saeys et al. 2007). Therefore, the integration of ML and UCE analysis represents a promising strategy to address taxonomic challenges in nematodes, offering a scalable and objective framework for genus‐level classification (Lv et al. 2023). The use of presence–absence patterns of UCEs in this context is of particular interest, as these binary patterns may carry important phylogenetic signal that could improve genus‐level classification and contribute to a more accurate understanding of evolutionary relationships. This approach is especially valuable in groups where traditional taxonomic expertise is limited or insufficient, providing a new way to explore complex phylogenetic structures within nematodes.

2. Materials and Methods

2.1. Phylogenomic Analysis of the Family Panagrolaimidae: In Silico and in Vivo Testing

2.1.1. Bait Set Design and in Silico Testing

2.1.1.1. Bait Set Design

Phyluce 1.7.2 was used for the bait set design; the “Tutorial IV: Identifying UCE Loci and Designing Baits To Target Them” was followed (phyluce.readthedocs) (Faircloth 2015; Faircloth et al. 2012). Eight base genomes were tested for the bait set design: Panagrolaimus sp. PS1159 (GenBank accession number: GCA_901765195.1), Panagrolaimus sp. ES5, Panagrolaimus sp. PS1579 (GenBank accession number: GCA_901779485.1), Panagrolaimus kolymaensis (GenBank accession number: GCA_028622995.1), Panagrolaimus sp. LJ2400 (GenBank accession number: GCA_024447215.1), Panagrolaimus sp. LJ2406 (GenBank accession number: GCA_024447205.1), Panagrolaimus sp. LJ2414 (GenBank accession number: GCA_024447195.1), and Panagrolaimus superbus (GenBank accession number: GCA_901766145.1) (Figure 1).

FIGURE 1.

Overview of the workflow implemented in this study. Bait sets were designed and tested for both the Panagrolaimidae and Rhabditidae families. From UCEs harvested in the different families with various approaches, a presence/absence matrix was obtained to train and test machine learning classifiers to identify the most important UCEs for genus‐level classification. For Panagrolaimidae, the model was further tested using UCE data extracted from genome skims generated with Nanopore sequencing. Graph generated based on Canva template from elversa.

To test whether ultraconserved elements (UCEs) could be retrieved for nematodes, assessed for their ability to produce robust phylogenetic reconstructions, and evaluated for the existence of a minimal set suitable for taxon identification or assignment, two nematode groups were analyzed: the family Rhabditidae, which includes the model species C. elegans, and the family Panagrolaimidae (Figure 1).

Bait sets were individually designed using each of the base genomes and further assessed based on the quality of the base genome, the number of conserved loci consistently found between exemplar taxa, and the final probe count (for further synthesis).

Illumina reads for each of the strains used as base genomes were obtained through the Sequence Read Archive (SRA), with the exception of Panagrolaimus kolymaensis, which was de novo sequenced using a NovaSeq Illumina platform 2 × 150 bp by Illumina UK Company Limited. Publicly available re‐sequencing data for the other strains tested were obtained through the Sequence Read Archive (SRA) (Table S3).

Reads were mapped against the base genomes using stampy (v 1.0.32) (Lunter and Goodson 2010), allowing for sequence divergence ≤ 5%. Unmapped reads were removed using samtools (v 1.18) (Danecek et al. 2021). Filtered BAM files were converted to BED format, and overlapping intervals were merged using bedtools (v 2.31.1) (Quinlan and Hall 2010) to obtain putatively conserved regions. Repetitive intervals and ambiguous bases were removed using phyluce_probe_strip_masked_loci_from_set; the same tool was used to filter out intervals shorter than 80 bp and intervals in which more than 25% of the region was masked. Sequences with the identified UCEs were extracted using phyluce_probe_get_genome_sequences_from_bed, specifying a buffer region of 160 bp. Temporary bait sets targeting those sequences were obtained using phyluce_probe_get_tiled_probes with a tiling density of three and the --masking and --remove-gc flags, so that potentially problematic baits with over 25% repeat content or GC content outside of 30%–70% were removed. Duplicate baits were screened and removed using phyluce_probe_easy_lastz and phyluce_probe_remove_duplicate_hits_from_probes_using_lastz.
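The bait‐level filters described above (repeat content above 25%, GC content outside 30%–70%) can be illustrated with a minimal Python sketch. This is not the phyluce implementation: it assumes soft‐masked sequences in which lowercase letters and N mark repeats or ambiguous bases, and the function names and toy bait sequences are hypothetical.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G/C among unambiguous bases (A, C, G, T)."""
    s = seq.upper()
    acgt = sum(s.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (s.count("G") + s.count("C")) / acgt


def masked_fraction(seq: str) -> float:
    """Fraction of positions that are soft-masked (lowercase) or N.

    Assumption: repeats are soft-masked, which may differ from the
    masking convention of the genomes used in the study.
    """
    if not seq:
        return 1.0
    masked = sum(1 for ch in seq if ch.islower() or ch == "N")
    return masked / len(seq)


def keep_bait(seq: str, max_masked=0.25, gc_min=0.30, gc_max=0.70) -> bool:
    """Keep a bait only if it passes both the masking and the GC filter."""
    if masked_fraction(seq) > max_masked:
        return False
    return gc_min <= gc_fraction(seq) <= gc_max


# Hypothetical toy baits (real baits are much longer).
baits = {
    "bait_ok": "ACGTACGTACGTACGTACGT",       # 50% GC, unmasked
    "bait_at_rich": "ATATATATATATATATATAT",  # 0% GC
    "bait_repeat": "ACGTACGTacgtacgtacgt",   # 60% soft-masked
}
kept = [name for name, seq in baits.items() if keep_bait(seq)]
print(kept)
```

Only the first toy bait passes both thresholds; the other two fail the GC and masking filters, respectively.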

The baits were then aligned against each of the tested base genomes using phyluce_probe_run_multiple_lastzs_sqlite with an identity value of 50%. Baits that matched multiple contigs were removed. Using phyluce_probe_slice_sequence_from_genomes, fasta sequences from each exemplar taxon were extracted, buffering each locus to 180 bp. The final bait design was obtained using phyluce_probe_get_tiled_probe_from_multiple_inputs. The baits were designed with a tiling density of three, baits with 25% masking were removed, and two probes targeting each locus were designed. Duplicate baits were removed (with an identity value of 50%) using phyluce_probe_easy_lastz and phyluce_probe_remove_duplicate_hits_from_probes_using_lastz.

The selection of the preferred bait set was based on three criteria: the quality of the base genome, the total number of loci targeted, and the total number of probes (see “ProbeDesignPS1159_selection” in the Zenodo repository—10.5281/zenodo.15395838). The different bait lists were further examined by the bioinformatics team from the company BIOCAT to confirm the number of baits (sequences), GC content, and size of the probes. The selected bait set was synthesized using myBaits Custom (1–20,000) target capture kit by Daicel Arbor BioSciences (Ann Arbor, MI, USA).

2.1.2. Sampling, DNA Extraction and Sequencing

Nematodes were harvested from cultures in agar plates by washing off plates with 2 mL of nuclease‐free water and pelleted by centrifuging at 4°C for 5 min at 3000 g. Nematodes were kept as laboratory cultures at 15°C on low‐nutrient agar plates inoculated with OP50 (E. coli). Reproductive mode assessment has previously been performed by Lewis et al. (2009) and Villegas et al. (2024) on a subset of the strains tested here. DNA extractions for in vitro testing of the baits were performed using the Quick‐DNA Microprep Plus Kit (catalogue number: D4074), following the manufacturer's protocol for tissue samples. The protocol was modified by increasing the proteinase K volume to 20 μL per sample and incubating overnight for 20 h. One entire plate was used per strain for DNA extraction. After DNA extraction for 21 strains (Table S5), library preparation and sequencing were done at the Cologne Genome Center for Genomics (Table S6). DNA extractions, library preparation, and sequencing were performed following the Arbor + TruSeq Nano DNA protocol (version 02.2024). For DNA fragmentation, 600 ng of high‐quality DNA per sample was processed using a Bioruptor (Diagenode) with a target fragment size of 350 bp (range: 310–350 bp), and quality was assessed using a 4200 TapeStation (Agilent) with D1000 ScreenTape. Library preparation was performed using the TruSeq Nano DNA Library Prep Kit (Illumina) with 100 ng of fragmented DNA and eight PCR cycles. Library quality was checked with a 4200 TapeStation (Agilent) D1000 ScreenTape. Pooled libraries were prepared with a total of 2 μg DNA per pool, corresponding to either seven samples at approximately 286 ng each or nine samples at approximately 223 ng each, and volume was reduced to 7 μL by evaporation. Hybridization was performed for 24 h at 62°C, followed by capture wash steps at 62°C and elution in 25 μL of Buffer E.
Post‐capture PCR was performed using reagents from the TruSeq Nano DNA Kit, using 10 PCR cycles following the Arbor‐PCR program (initial denaturation at 95°C for 3 min, followed by 10 cycles of 98°C for 20 s, 60°C for 30 s, 72°C for 45 s, and a final extension at 75°C for 5 min). Final libraries were cleaned using a 1:1 ratio of AMPure XP beads and eluted in 30 μL of EB buffer, with quality control performed using a 4200 TapeStation (Agilent). Sequencing was carried out on a NovaSeq 6000 Illumina sequencer (S4 flowcell) with 2 × 151 bp paired‐end reads.

2.1.3. Analysis of Sequencing Data: In Vivo Testing and Phylogenomic Reconstruction

Sequencing data resulting from target capture were analysed following the “Tutorial I: UCE Phylogenomics” (phyluce.readthedocs.io) with Phyluce 1.7.3. Sequencing adapters were trimmed using Cutadapt (v 4.9) (Martin 2011). Quality of the trimmed reads was assessed using FastQC (v 0.11.2) (Andrews 2010). Assemblies for each strain were conducted using SPAdes (v 3.14.1) (Prjibelski et al. 2020) as incorporated in phyluce (phyluce_assembly_assemblo_spades). Quality of the assemblies was checked using phyluce_assembly_get_fasta_lengths. Genomes available through NCBI for the Panagrolaimidae family were used to obtain a more comprehensive phylogenetic reconstruction of this family and to test the robustness of using UCEs as a smaller subset of the genome for phylogenetic reconstructions (Table S4). UCE loci from genomes were harvested following the “Tutorial III: Harvesting UCE Loci From Genome” (phyluce.readthedocs). Genomes were converted into 2bit format using faToTwoBit (v 445 and 472) (Kuhn et al. 2012). The UCE probe set was aligned to the genomes, and the fasta sequences corresponding to the loci were extracted using phyluce_probe_run_multiple_lastzs_sqlite and phyluce_probe_slice_sequence_from_genomes. The extracted FASTA sequences were then processed along with the UCE loci obtained from the SPAdes assemblies for the sequenced target capture data.

For known triploid taxa, we used the program dedupe.sh from the BBMap suite (Bushnell 2014). This tool compares all sequences to each other and removes those that are highly similar, keeping only the longest version of each sequence. We set the minimum identity threshold to 90% (minidentity = 90), meaning that any sequences sharing 90% or more nucleotide identity were considered duplicates and collapsed into a single representative. This step was used to avoid including multiple copies of the same UCE locus that may result from allelic variation, assembly errors, or the presence of homologous copies in triploid genomes. We then matched the assembled contigs and available genomes of each strain to the UCE baits using phyluce_assembly_match_contigs_to_probes. The UCE loci were then extracted using phyluce_assembly_get_fastas_from_match_counts. In the case of Panagrolaimus kolymaensis, deduplication was not effective, because harvesting UCEs from this triploid genome yields a high number of duplicated UCEs. Instead, the contigs were fragmented into 2000 bp segments, and the assembly file was split into three different files using a round‐robin method.
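The fragment‐and‐split step for Panagrolaimus kolymaensis can be sketched as follows. This is a minimal Python illustration under stated assumptions: the text does not specify how fragments were named or in which order they were distributed, so the naming scheme and the "one fragment per file in turn" interpretation of round‐robin are assumptions, and the toy contigs are hypothetical.

```python
def fragment(seq: str, size: int = 2000) -> list:
    """Cut a sequence into consecutive chunks of at most `size` bases."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def round_robin(records: list, n_bins: int = 3) -> list:
    """Distribute records over n_bins lists, one record per bin in turn."""
    bins = [[] for _ in range(n_bins)]
    for i, record in enumerate(records):
        bins[i % n_bins].append(record)
    return bins


# Hypothetical toy assembly: two contigs of 4500 bp and 2000 bp.
contigs = {"contig1": "A" * 4500, "contig2": "C" * 2000}

pieces = []
for name, seq in contigs.items():
    for k, chunk in enumerate(fragment(seq)):
        # Hypothetical naming scheme for the fragments.
        pieces.append((f"{name}_part{k}", chunk))

bins = round_robin(pieces, n_bins=3)
for b, records in enumerate(bins):
    print(b, [(name, len(chunk)) for name, chunk in records])
```

Each of the three lists would then be written to its own FASTA file and processed separately, so that fragments originating from neighbouring positions tend to end up in different files.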

The UCE loci, both harvested from genomes and from target capture data, were aligned and edge‐trimmed using phyluce_align_seqcap_align. The alignments were then cleaned using phyluce_align_remove_locus_name_from_files. Subsequently, a data matrix of 60% occupancy was created using phyluce_align_get_only_loci_with_min_taxa (each UCE that was present in at least 60% of the taxa analysed). The alignments were concatenated using phyluce_align_concatenate_alignments. The resulting alignment was used as input for IQ‐TREE (v 2.3.6) (Minh et al. 2020); a maximum likelihood inference was made using ultrafast bootstrap (1000 bootstrap replicates) with the ModelFinder Plus feature for automatic selection of the best substitution model for the dataset. The resulting phylogenetic tree was visualised using TreeViewer (v 2.2.0) (Bianchini and Sánchez‐Baracaldo 2024) using the Further transformation module to root the tree and rename labels from accession number and sequencing code to strain or species name, and the Plot actions module to display the legend with bootstrap values and add the scale bar.
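The occupancy filter described above can be sketched in a few lines of Python. This is an illustration, not the phyluce code: the function name and toy data are hypothetical, and truncating the threshold to an integer is an assumption (it is consistent with the n = 30 reported in the Results for 51 data sets at a proportion of 0.6).

```python
def loci_with_min_taxa(locus_to_taxa: dict, n_taxa: int, proportion: float) -> list:
    """Return loci present in at least int(proportion * n_taxa) taxa.

    Assumption: the minimum count is obtained by truncation.
    """
    min_count = int(proportion * n_taxa)
    return sorted(
        locus for locus, taxa in locus_to_taxa.items() if len(taxa) >= min_count
    )


# Hypothetical toy data: which taxa carry which UCE locus.
locus_to_taxa = {
    "uce-1": {"A", "B", "C", "D", "E"},
    "uce-2": {"A", "B", "C", "D"},
    "uce-3": {"A"},
}
kept = loci_with_min_taxa(locus_to_taxa, n_taxa=5, proportion=0.6)
print(kept)

# Threshold implied by 51 data sets at 60% occupancy under the truncation assumption.
print(int(0.6 * 51))
```

Raising the proportion yields a smaller but more complete matrix, which is the trade‐off behind choosing 60% here and 75% for Rhabditidae.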

2.1.4. Species Reassessment Based on Morphological and Molecular Evidence

For the strain previously identified as Panagrolaimus detritophagus BSS8, morphometric measurements were taken and compared with those of closely related species to accurately determine its taxonomic identity. Measures included, among others, body length, vulva location, lip region width, stoma length, corpus length, pharyngeal region length, and tail length in both females and males.

For light microscopy, specimens were relaxed using heat, fixed in a cold 4% formaldehyde solution, gradually transferred to pure glycerine using a slow evaporation method, and mounted on permanent slides in glycerine, with paraffin wax used to support the coverslip. Nine male and nine female individuals isolated from current cultures were compared to closely related species: Neocephalobus halophilus Paetzold (1958), Neocephalobus aberrans Steiner (1929), and Panagrolaimus orthomici Korentchenko (1992), as well as to the description of the original wild population of the same “Panagrolaimus population II” (Boström 1988); see Supporting Information.

2.2. Ultraconserved Elements for Accurate Species Delimitation: Test Case Using Caenorhabditis

2.2.1. Bait‐Set Design and in Silico Testing

Phyluce 1.7.3 was used for the bait set design. The “Tutorial IV: Identifying UCE Loci and Designing Baits To Target Them” from the official documentation (phyluce.readthedocs) was followed. Only one base genome was tested in this case: Caenorhabditis elegans (Bristol N2) (GenBank accession number: GCA_000002985.3). The probe design was performed in the same way as for Panagrolaimidae, except that artificially generated reads, produced with art_illumina as specified in the phyluce tutorials, were used instead of real sequencing data. For the generation of this family's bait set, the following Caenorhabditis species were used to identify putatively conserved regions: Caenorhabditis brenneri (GenBank accession number: GCA_964036135.1), Caenorhabditis briggsae (GenBank accession number: GCA_021491975.1), Caenorhabditis elegans (Bristol N2) (GenBank accession number: GCA_000002985.3), Caenorhabditis nigoni (GenBank accession number: GCA_002742825.1), Caenorhabditis remanei (GenBank accession number: GCA_001643735.4), and Caenorhabditis tropicalis (GenBank accession number: GCA_016735795.1).

The master bait set list was then used to harvest UCEs from 199 genomes available for Rhabditidae through NCBI for the genera Oscheius, Diploscapter, Auanema, and Caenorhabditis (Table S12).

2.2.2. Phylogenetic Reconstruction of Caenorhabditis

To test the accuracy of phylogenetic reconstructions within Rhabditidae using ultraconserved elements compared to previously reported methods (e.g., orthologous genes), a phylogeny for Caenorhabditis species and Rhabditidae species was reconstructed. A data matrix of 75% occupancy was created using phyluce_align_get_only_loci_with_min_taxa (each UCE that was present in at least 75% of the taxa analysed). The alignments were concatenated using phyluce_align_concatenate_alignments. The resulting alignment was used as input for IQ‐TREE (v 2.3.6), and a maximum likelihood inference was obtained (1000 ultrafast bootstrap replicates) using the ModelFinder Plus feature for selecting the best substitution model for the dataset. The resulting phylogenetic tree was compared to the phylogenetic reconstruction of Stevens et al. (2019) through a tanglegram obtained using ape (v 5.8‐1) (Paradis and Schliep 2019) and phytools (v 2.4‐4) (Revell 2024).

2.3. Classification Model With Ultraconserved Elements (UCEs)

2.3.1. UCE Data Preparation and Preprocessing

To analyse the taxonomic signal captured by UCEs, a multi‐step workflow was implemented. First, UCE identifiers were extracted from species‐specific FASTA files for nematodes within the families Panagrolaimidae and Rhabditidae. These identifiers were then used to construct binary presence‐absence matrices, indicating whether each UCE was detected in each strain. It is essential to note that, although we refer to species throughout this section, the dataset comprises strains, with some species represented by multiple strains. Once the presence‐absence matrix was constructed, an exploratory data analysis (EDA) was conducted to examine the patterns of the UCE presence across the samples, including the number of UCEs shared among genera and the variability within genera.
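The construction of a binary presence‐absence matrix from per‐strain FASTA files can be sketched as below. This is a minimal Python illustration, not the authors' script (which is provided in their Supporting Information): the header layout ">uce-N_strain |uce-N" and the strain names are assumptions made for the example.

```python
import re

# Assumption: UCE identifiers look like "uce-<number>" in FASTA headers.
UCE_ID = re.compile(r"uce-\d+")


def uce_ids_from_fasta(text: str) -> set:
    """Collect the UCE identifiers found in the header lines of a FASTA string."""
    ids = set()
    for line in text.splitlines():
        if line.startswith(">"):
            match = UCE_ID.search(line)
            if match:
                ids.add(match.group(0))
    return ids


def presence_absence(strain_to_fasta: dict):
    """Return (sorted UCE columns, {strain: [0/1 per column]})."""
    per_strain = {s: uce_ids_from_fasta(t) for s, t in strain_to_fasta.items()}
    all_uces = sorted(
        set().union(*per_strain.values()), key=lambda u: int(u.split("-")[1])
    )
    matrix = {
        strain: [1 if uce in ids else 0 for uce in all_uces]
        for strain, ids in per_strain.items()
    }
    return all_uces, matrix


# Hypothetical toy input: FASTA content per strain.
fastas = {
    "strain_A": ">uce-1_strain_A |uce-1\nACGT\n>uce-7_strain_A |uce-7\nGGCC\n",
    "strain_B": ">uce-7_strain_B |uce-7\nGGCA\n>uce-12_strain_B |uce-12\nTTAA\n",
}
columns, matrix = presence_absence(fastas)
print(columns)
for strain, row in matrix.items():
    print(strain, row)
```

Each row is one strain and each column one UCE, with 1 marking detection; a genus label per strain is added in the next step.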

To facilitate subsequent machine learning analyses, strains were assigned to their respective genera. This taxonomic assignment was performed using accession numbers retrieved from the European Nucleotide Archive (ENA). The final output consists of genus‐level UCE presence–absence matrices, which are the foundation for predictive modelling. For a detailed description of the extraction steps, matrix construction, and taxonomic assignment, refer to Supporting Information A.3, where we provide full methodological details, dataset summaries, and the associated scripts.

The data sets used for training and evaluation were derived from genus‐level UCE presence‐absence matrices, where genera with insufficient representation (frequency ≤ 1) and rows with missing values were removed. After filtering, the Rhabditidae dataset consisted of 197 samples, 8336 features (UCEs), and four genera: Oscheius, Diploscapter, Auanema, and Caenorhabditis. For Panagrolaimidae, 1595 predictors (UCEs) were analysed. Following the removal of genera with insufficient representation and rows with missing values, the Panagrolaimidae dataset (including the outgroup from Cephalobidae—Acrobeloides) contained 49 samples and five genera: Acrobeloides, Halicephalobus, Panagrellus, Panagrolaimus, and Propanagrolaimus. In both datasets, UCE columns were used as predictors (features), and the genus column was converted to a factor for classification (see Table 1).
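The two filtering steps (dropping genera with a frequency of one or less, then dropping rows with missing values) can be sketched as follows. This is a minimal Python illustration rather than the R code used in the study; the order of the two steps, the use of None for missing values, and the genus "GenusX" are assumptions made for the example.

```python
from collections import Counter


def filter_dataset(rows: list) -> list:
    """rows: list of (genus, features), where features holds 0, 1 or None.

    Step 1 removes genera represented by a single row.
    Step 2 removes rows that contain a missing value (None).
    """
    counts = Counter(genus for genus, _ in rows)
    well_represented = [(g, f) for g, f in rows if counts[g] > 1]
    return [(g, f) for g, f in well_represented if None not in f]


# Hypothetical toy rows; "GenusX" stands for a genus with a single strain.
rows = [
    ("Panagrolaimus", [1, 0, 1]),
    ("Panagrolaimus", [1, 1, None]),
    ("Panagrolaimus", [0, 1, 1]),
    ("Panagrellus", [1, 1, 0]),
    ("Panagrellus", [0, 0, 1]),
    ("GenusX", [1, 0, 0]),
]
filtered = filter_dataset(rows)
print(Counter(genus for genus, _ in filtered))
```

Removing singleton genera is what makes a stratified train/test split possible, since a genus with one sample cannot appear in both subsets.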

TABLE 1.

Summary of UCE datasets for machine learning.

Family	Genus	Samples per genus	Features (UCEs)	Total samples
Rhabditidae	Oscheius	47	8336	197
	Diploscapter	3
	Auanema	8
	Caenorhabditis	139
Panagrolaimidae	Acrobeloides	6	1595	49
	Halicephalobus	3
	Panagrellus	3
	Panagrolaimus	35
	Propanagrolaimus	2


2.3.2. Model Training, Evaluation, and Feature Selection

The Rhabditidae dataset, which contained the largest number of features and entries, was used to evaluate four machine learning models in R: Random Forest (RF), Logistic Regression (LR), k‐Nearest Neighbours (k‐NN), and Extreme Gradient Boosting (XGBoost). A stratified partitioning strategy, implemented using the caret package (Kuhn 2008), was employed to divide the data into training (65%) and testing (35%) subsets, preserving the original distribution of genera classes in both sets. Class weights were calculated and incorporated into the Random Forest (RF) and XGBoost models to address class imbalance. Model training utilized five‐fold cross‐validation, with performance metrics calculated using the multiClassSummary function from the caret package. Random Forest was trained by optimizing the mtry parameter with class weights applied, using the randomForest package (Liaw and Wiener 2002). Logistic Regression with PCA involved dimensionality reduction through PCA, followed by feature scaling and regularization via the glmnet package (Friedman et al. 2010). k‐Nearest Neighbours was implemented using default parameters, using the class package (Venables and Ripley 2002). Extreme Gradient Boosting was trained with class weights incorporated into the learning process, using the xgboost package (Chen and Guestrin 2016). Performance metrics, including overall accuracy, kappa statistics, sensitivity, and specificity, were used to compare model performance. A confusion matrix provided detailed insight into classification accuracy for each genus. To identify the most relevant features for nematode genera classification, we employed a feature selection approach based on the importance scores generated by the XGBoost model using the xgboost package (Chen and Guestrin 2016). Specifically, we retained only those features that exhibited a non‐zero “Overall” importance score, as determined by the varImp function in the caret package (Kuhn 2008).
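The idea behind the stratified 65%/35% partition can be illustrated with a small standard‐library Python sketch. It is a stand‐in for, not a reproduction of, createDataPartition from caret: the seed, the rounding rule, and the function name are assumptions, so the exact subset sizes may differ from those obtained in R. The genus counts are taken from Table 1.

```python
import random
from collections import defaultdict


def stratified_split(labels: list, train_fraction: float = 0.65, seed: int = 42):
    """Split sample indices into train/test so each class keeps ~train_fraction."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, test = [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_train = round(train_fraction * len(idxs))
        # Keep at least one sample on each side (classes are assumed to have >= 2 samples).
        n_train = min(max(n_train, 1), len(idxs) - 1)
        train.extend(idxs[:n_train])
        test.extend(idxs[n_train:])
    return sorted(train), sorted(test)


# Genus counts for Rhabditidae as listed in Table 1.
counts = {"Oscheius": 47, "Diploscapter": 3, "Auanema": 8, "Caenorhabditis": 139}
labels = [genus for genus, n in counts.items() for _ in range(n)]

train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
print(sorted({labels[i] for i in test_idx}))
```

Splitting within each genus, rather than across the whole dataset, ensures that even the smallest class (Diploscapter, three samples) is represented in both the training and the testing subset.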

2.3.3. Optimised Model Application to Panagrolaimidae Data

To evaluate the performance of the feature‐selected XGBoost model in classifying nematode genera within the Panagrolaimidae family, we applied it to the corresponding UCE presence‐absence dataset. The “Genera” column was converted to a factor, and genera represented by single occurrences were removed. Missing values were handled using “na.omit()”. The resulting dataset, which included the following sample sizes per genus: Acrobeloides (6), Halicephalobus (3), Panagrellus (3), Panagrolaimus (35), and Propanagrolaimus (2), exhibited class imbalance (see Table 1). To address this, a stratified split of 65% training and 35% testing was performed using createDataPartition from the caret package (Kuhn 2008), with a fixed seed and manual adjustment to ensure proportional representation. Class weights, inversely proportional to genus frequency, were applied. The XGBoost model was trained using the train function from caret with the xgbTree method, employing five‐fold cross‐validation (cv, number = 5) and multiClassSummary for performance evaluation. The Area Under the Curve (AUC) was used as the primary performance metric (metric = “AUC”). Subsequently, the model was retrained using only the most important features, identified by an overall importance value greater than 0, to reduce dataset dimensionality. The test set AUC was then calculated using multiclass.roc from the pROC package (Robin et al. 2011). A confusion matrix was generated to visualize the model's classification performance.
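Class weights inversely proportional to genus frequency can be computed as in the sketch below. The exact weighting formula is not given in the text, so the normalisation used here (total samples divided by the number of classes times the class count) is an assumption; the counts are the full‐dataset values from Table 1, whereas in practice the weights would be derived from the training subset.

```python
def inverse_frequency_weights(counts: dict) -> dict:
    """Weight each class by total / (n_classes * class_count).

    Assumption: this 'balanced' normalisation; other inverse-frequency
    schemes differ only by a constant factor.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {genus: total / (n_classes * n) for genus, n in counts.items()}


# Panagrolaimidae sample sizes per genus as listed in Table 1.
counts = {
    "Acrobeloides": 6,
    "Halicephalobus": 3,
    "Panagrellus": 3,
    "Panagrolaimus": 35,
    "Propanagrolaimus": 2,
}
weights = inverse_frequency_weights(counts)
for genus, w in weights.items():
    print(f"{genus}: {w:.3f}")
```

Under this scheme every genus contributes the same total weight to the loss, so the 35 Panagrolaimus samples do not dominate the two Propanagrolaimus samples during training.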

3. Results

3.1. Bait Design

The bait set design for Panagrolaimidae was conducted by testing eight distinct base genomes and evaluating the number of conserved loci that were shared between one and nine taxa (the eight Panagrolaimus taxa from the base genomes and an outgroup of the Propanagrolaimus genus). The number of loci targeted across different bait sets varied from 0 to 12,740. To ensure a balance between taxonomic representation and the number of loci targeted, only bait sets that included more than 1000 loci shared among six or seven taxa were selected for further analysis. In these cases, the number of targeted loci ranged from 1096 to 2080, with the corresponding number of probes ranging from 14,018 to 24,217 prior to duplicate removal. After duplicates were removed, the final probe count ranged from 13,565 to 24,006. The preferred Panagrolaimidae bait set for synthesis ultimately targeted 1612 loci using 18,789 probes. The Panagrolaimidae bait set was also tested on the Rhabditidae family; however, no substantial number of loci could be retrieved. Therefore, a family‐specific bait set was designed for Rhabditidae.

The bait set for Rhabditidae was designed using the Caenorhabditis elegans Bristol N2 genome as a base. For the test bait set, six Caenorhabditis species were used. The preferred bait set in this case targets 10,397 loci shared among the six datasets used. The number of probes prior to duplicate removal was 124,612, while the final probe count was 121,966.

3.2. In Vitro and in Silico Testing Panagrolaimidae

SPAdes assemblies of target capture data ranged between 42,479 and 287,752 contigs (mean 88,628.05 ± 56,482.51) for Panagrolaimidae exemplars. The two assemblies from the outgroup specimens (Cephalobidae) were more fragmented than those of the Panagrolaimidae exemplars (1,121,821 and 1,237,255 contigs). In general, the mean length of the contigs ranged between 215.19 and 1639.88 base pairs (mean 531.86 ± 269.55).

In total, 51 data sets were analysed and used to reconstruct the phylogeny of the Panagrolaimidae family, including data sets from the outgroup family Cephalobidae (Acrobeloides). The analysis used 1572 loci with 694,995 informative sites (mean of 442.11 sites per locus, 95% confidence interval of 8.11, minimum of 0 sites, maximum of 1593 sites). Of these 1572 alignments, 396 contained more than a 0.6 proportion of taxa (n = 30) (alignments can be found in the Zenodo repository, 10.5281/zenodo.15395838).
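The taxon-occupancy filter behind the 396 retained alignments can be sketched in Python as follows. This illustrates the idea only and is not the pipeline used in the study: the locus and taxon names are hypothetical, and treating the threshold as a strict "more than" comparison is an assumption. The final line recomputes the mean number of informative sites per locus from the totals reported above.

```python
def filter_by_occupancy(alignments, n_taxa, min_proportion):
    """Keep loci whose alignment contains more than min_proportion of all taxa.

    alignments: dict mapping locus name -> set of taxon names present.
    """
    return {
        locus: taxa
        for locus, taxa in alignments.items()
        if len(taxa) / n_taxa > min_proportion
    }


# Toy data with hypothetical locus and taxon names.
taxa = [f"taxon_{i}" for i in range(10)]
alignments = {
    "locus_1": set(taxa),      # 10 of 10 taxa
    "locus_2": set(taxa[:7]),  # 7 of 10 taxa
    "locus_3": set(taxa[:6]),  # 6 of 10 taxa, not strictly above 0.6
    "locus_4": set(taxa[:2]),  # 2 of 10 taxa
}
kept = filter_by_occupancy(alignments, n_taxa=len(taxa), min_proportion=0.6)

# Mean informative sites per locus from the totals reported in the text.
mean_sites_per_locus = 694_995 / 1572
```

In the toy data only locus_1 and locus_2 pass the 0.6 threshold, and the recomputed mean agrees with the reported 442.11 sites per locus.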

In the Panagrolaimidae analysis, a total of 49–84 ultraconserved elements (UCEs) were retrieved from the outgroup and utilised for phylogenomic analysis, regardless of the data source (target capture or available genome assemblies). Within the genus Panagrolaimus, the number of UCEs retrieved and incorporated into the phylogenomic analysis ranged from 479 to 1457. Specifically, in the target capture dataset, 66–80 UCEs were captured in the outgroup, while 277–1200 UCEs were recovered for Panagrolaimus. There were 84 UCEs commonly found between all genera analysed, and 500 UCEs uniquely found in Panagrolaimus.

3.3. Morphological and Molecular Evidence for Species Reassessment

Observations of adult individuals, including both females and males, from a recent culture of the strain previously referred to as Panagrolaimus detritophagus BSS8 revealed that the nematode originally identified as Panagrolaimus “population II” in Boström (1988) and subsequently as Panagrolaimus detritophagus BSS8 and Panagrolaimus BSS8 is in fact Neocephalobus halophilus Paetzold 1958. The main diagnostic feature of Neocephalobus, compared to Panagrolaimus, is the presence of a distinct papilliform precloacal sensillum in males and its location some distance anterior to the cloacal opening (Bhat et al. 2025). Additional distinguishing morphometric features of Neocephalobus relative to Panagrolaimus include a longer tail in both males and females, a longer pharyngeal region, and a greater ratio of body length to body width at mid‐body (Supporting Information A.2). Furthermore, based on UCEs, the strain is placed as a close relative of Halicephalobus species and Propanagrolaimus rather than Panagrolaimus, with a bootstrap support value of 100 (Figure 2), in agreement with recently published single‐gene phylogenies based on several markers (Bhat et al. 2025).

Maximum likelihood phylogenetic reconstruction of Panagrolaimidae (*Acrobeloides* is an outgroup) including 51 taxa. Ultraconserved elements (UCEs) from the target capture approach are highlighted in aquamarine, while UCE data harvested from available genomes are highlighted in black. The phylogeny is based on 396 alignments. Only bootstrap values below 99 are shown, in grey and orange.

3.4. In Silico Testing Rhabditidae

The bait set designed with Caenorhabditis genomes was then tested on several genera within the Rhabditidae family (Oscheius, Diploscapter, Auanema, and Caenorhabditis). The number of UCE loci harvested from the genomes ranged from 1 (Oscheius myriophilus) to 5700 (Caenorhabditis elegans). Only 50 UCEs were shared between all genera of the family, with most UCEs being uniquely found in Caenorhabditis. A phylogenetic reconstruction with high support was obtained for 64 taxa of the Rhabditidae family (Figure S11).

In total, 27 data sets were analysed and used to reconstruct the phylogeny of the genus Caenorhabditis. Altogether, 945 loci with 437,091 informative sites were used (mean of 462.53 sites per locus, 95% confidence interval of 10.03, minimum of 33 sites, maximum of 829 sites), with all 945 alignments containing more than a 0.75 proportion of taxa (alignments can be found in the Zenodo repository, 10.5281/zenodo.15395838). A tanglegram was used to compare the placement of taxa in our UCE dataset with that of Stevens et al. (2019), which is an orthology‐based analysis. Both methods showed a congruent reconstruction (Figure 3).

Tanglegram for the genus *Caenorhabditis* comparing the UCE‐derived reconstruction (left) obtained in this work and the gene orthology‐based reconstruction (right) from Stevens et al. (2019). The alternative topology for *C. wallacei* and *C. brenneri* is also proposed in Stevens et al. (2019) depending on the method used.

3.5. Classification Model With Ultraconserved Elements (UCEs)

3.5.1. Presence and Characteristics of UCEs in Rhabditidae and Panagrolaimidae

We analysed the total number of UCEs per strain within the families Rhabditidae and Panagrolaimidae to assess their presence across strains. The results, visualised in the Supporting Information (Figures S12 and S13), indicate substantial variation in UCE counts between strains within each family. Although most strains exhibit comparable UCE numbers, some show significantly lower counts. Specifically, in Rhabditidae, UCE counts range from 26 to 11,400, with a median of 7762, while in Panagrolaimidae, they range from 15 to 1457, with a median of 767.5. This variability may reflect differences in genome characteristics, such as assembly completeness or lineage‐specific variation in the retention of UCEs. This variability may also stem from the bait set design, which was based only on a few taxa; for instance, for Rhabditidae, the bait set was obtained only by analysing the genus Caenorhabditis, yet later applied to a broader range of lineages within Rhabditidae, including genera such as Diploscapter that were not represented in the initial design.

To ensure data quality, we applied an outlier detection approach based on UCE counts. Instead of relying solely on interquartile range (IQR) methods, which can be sensitive to skewed count distributions of UCE occurrences per strain, we opted for a more robust approach by removing the bottom 1% of strains with the lowest UCE counts. This method is particularly useful for genomic datasets where sequencing depth and assembly completeness vary, ensuring that only strains with sufficiently high UCE representation are retained for subsequent analyses. After this procedure, four strains were identified as outliers and removed: two from Rhabditidae (GCA_002207785.1 and GCA_964036155.1) and two from Panagrolaimidae (GCA_028622995.1, Panagrolaimus kolymaensis prior to splitting contigs, and GCA_963969345.1, Turbatrix aceti).
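A minimal Python sketch of this kind of bottom-percentile filter is shown below. It is an illustration rather than the authors' implementation: the rounding rule (dropping ceil(n × percent / 100) strains), the handling of ties, and the strain identifiers and counts in the toy data are all assumptions.

```python
import math


def drop_lowest_percent(uce_counts, percent=1):
    """Remove the strains with the lowest UCE counts.

    uce_counts: dict mapping strain id -> number of UCEs recovered.
    The number of strains dropped is ceil(n * percent / 100), and ties are
    resolved by sort order; both choices are assumptions of this sketch.
    """
    n_drop = math.ceil(len(uce_counts) * percent / 100)
    ranked = sorted(uce_counts, key=uce_counts.get)
    removed = ranked[:n_drop]
    removed_set = set(removed)
    kept = {s: c for s, c in uce_counts.items() if s not in removed_set}
    return kept, removed


# Toy data: 148 hypothetical strains with typical counts plus two poor ones.
uce_counts = {f"strain_{i:03d}": 700 + i for i in range(148)}
uce_counts["low_a"] = 15
uce_counts["low_b"] = 26
kept, removed = drop_lowest_percent(uce_counts, percent=1)
```

With 150 strains in the toy data, a 1% cut removes the two strains with the fewest UCEs and leaves the other 148 untouched, without requiring any assumption about the shape of the count distribution.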

After filtering, we analysed how frequently each UCE appears across the remaining strains. Figure 4 presents the UCE occurrence frequency profile for both families, showing how many UCEs are shared across different numbers of strains. In Panagrolaimidae, the distribution is unimodal, with most UCEs appearing in an intermediate range of strains. In contrast, the Rhabditidae distribution is bimodal, indicating two groups of UCEs with different levels of conservation. The low‐frequency group (present in 0–50 strains) may represent UCEs that are present in some species but absent in others, reflecting potential lineage‐specific losses or variation in presence or absence. Conversely, the high‐frequency group (present in 100–150 strains) comprises UCEs that are widely shared across Rhabditidae, suggesting they may be under stronger evolutionary constraints, potentially due to their regulatory or structural functions.

Divergent patterns of UCE occurrence frequency across strains in Rhabditidae and Panagrolaimidae. The x‐axis represents the number of strains in which a given UCE is present, while the y‐axis indicates the number of UCEs observed at each frequency level. The unimodal pattern in Panagrolaimidae contrasts with the bimodal pattern in Rhabditidae, suggesting differences in evolutionary constraints and genomic conservation across the two families.
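An occurrence frequency profile of this kind can be derived from a presence-absence table in a few lines. The Python sketch below assumes the data are available as a mapping from strain to the set of UCEs recovered; the strain and UCE names are hypothetical.

```python
from collections import Counter


def occurrence_profile(presence):
    """Count how many UCEs are present in exactly k strains, for each k.

    presence: dict mapping strain id -> set of UCE names found in that strain.
    Returns a Counter mapping k (number of strains) -> number of UCEs.
    """
    strains_per_uce = Counter(uce for uces in presence.values() for uce in uces)
    return Counter(strains_per_uce.values())


# Toy presence-absence data with hypothetical names.
presence = {
    "strain_1": {"uce_a", "uce_b", "uce_c", "uce_d"},
    "strain_2": {"uce_a", "uce_b", "uce_d"},
    "strain_3": {"uce_a", "uce_d"},
}
profile = occurrence_profile(presence)
```

Plotting the keys of `profile` against its values gives a histogram of the type described in the caption; a bimodal pattern would show up as two separate peaks.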

We analysed the number of UCEs shared across genera within each family. In Rhabditidae, a total of 50 UCEs are shared across the genera Auanema, Caenorhabditis, Diploscapter, and Oscheius (Figure 5a). In Panagrolaimidae, 84 UCEs are shared across the genera Panagrolaimus, Halicephalobus, Panagrellus, Propanagrolaimus, and the outgroup genus Acrobeloides (Figure 5b). The corresponding Venn diagrams summarising these overlaps are presented in Figure 5, and the full list of shared UCEs is available in Table S13.

Venn diagrams for shared UCEs across genera in Rhabditidae and Panagrolaimidae. The diagrams illustrate the number of UCEs shared across the genera within each family.
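The overlap at the centre of such a Venn diagram can be computed as a set intersection. The Python sketch below assumes that a UCE counts as present in a genus when it is found in at least one strain of that genus; this definition, and all names in the toy data, are assumptions for illustration.

```python
def shared_across_genera(presence, genus_of):
    """Return UCEs found in at least one strain of every genus.

    presence: dict mapping strain id -> set of UCE names.
    genus_of: dict mapping strain id -> genus name.
    """
    by_genus = {}
    for strain, uces in presence.items():
        by_genus.setdefault(genus_of[strain], set()).update(uces)
    return set.intersection(*by_genus.values())


# Toy data with hypothetical strains, genera, and UCE names.
presence = {
    "strain_1": {"uce_a", "uce_b"},
    "strain_2": {"uce_a", "uce_c"},
    "strain_3": {"uce_a", "uce_b", "uce_d"},
}
genus_of = {"strain_1": "GenusX", "strain_2": "GenusX", "strain_3": "GenusY"}
shared = shared_across_genera(presence, genus_of)
```

A stricter definition, requiring a UCE to be present in every strain of every genus, would replace the per-genus union with an intersection and would generally yield fewer shared loci.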

3.5.2. Benchmark of Machine Learning Models on Rhabditidae Data

Table 2 summarises the performance of the four machine learning models on the Rhabditidae UCE dataset: Random Forest (RF), Logistic Regression (LR) on PCA‐transformed data, k‐Nearest Neighbours (k‐NN), and Extreme Gradient Boosting (XGBoost). Classification performance differed among the models. XGBoost achieved the highest AUC score of 0.9997, demonstrating superior predictive power. It correctly classified all Auanema and Diploscapter samples and misclassified only one Oscheius sample as Caenorhabditis. Its strong performance is attributed to its gradient boosting framework and the integration of class weights, which effectively handled class imbalance.

Random Forest achieved a high AUC of 0.9939 but struggled with Oscheius and Diploscapter. Specifically, it misclassified nine out of sixteen Oscheius samples as Diploscapter, which reduced accuracy in predicting minority classes. While class weighting helped mitigate imbalance, it was not as effective as in XGBoost. k‐Nearest Neighbours (k‐NN) obtained an AUC of 0.9858 and also showed difficulty in classifying Oscheius and Diploscapter. Two Caenorhabditis samples were misclassified as Oscheius, and one Oscheius sample was wrongly predicted as Diploscapter, indicating sensitivity to the choice of k and the distance metric in high‐dimensional data. Logistic Regression with PCA had the lowest performance, with an AUC of 0.8290. While PCA reduced dimensionality and improved computational efficiency, the model struggled with minority classes: it failed to predict Auanema and Diploscapter entirely and misclassified multiple Oscheius samples, demonstrating limitations despite regularisation techniques to prevent overfitting.

Overall, XGBoost emerged as the most effective model, demonstrating the highest classification accuracy and robustness across all classes. Its ability to handle imbalanced datasets, coupled with its efficiency and strong generalisation capacity, makes it particularly suited for classifying genera within Rhabditidae. The complete confusion matrices and class‐specific performance metrics for each model are provided in the Supporting Information A.4.

TABLE 2.

Performance metrics for machine learning models on the Rhabditidae family.

Model	AUC	Accuracy (%)	Kappa	Sensitivity (%)	Specificity (%)	Best hyperparameter
RF	0.9939	85.07	0.6819	85.42	96.21	mtry = 3
LR with PCA	0.8290	94.03	0.8599	74.46	98.55	PCA‐transformed
k‐NN	0.9858	94.03	0.8652	73.96	97.68	k = 5
XGBoost	0.9997	98.51	0.9657	99.48	99.52	Depth = 6, Learning rate = 0.3
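For readers unfamiliar with the Accuracy and Kappa columns, the Python sketch below shows how both are obtained from a multiclass confusion matrix. The matrix is hypothetical: it contains a single error (one Oscheius sample predicted as Caenorhabditis, as described for XGBoost), but the class sizes are invented for illustration, so the resulting kappa is not expected to reproduce the value in Table 2.

```python
def accuracy_and_kappa(matrix):
    """Compute overall accuracy and Cohen's kappa.

    matrix[i][j] is the number of samples of true class i predicted as class j.
    """
    n = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance, given the row and column totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n**2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical matrix; rows and columns ordered as
# Auanema, Caenorhabditis, Diploscapter, Oscheius.
confusion = [
    [3, 0, 0, 0],
    [0, 45, 0, 0],
    [0, 0, 3, 0],
    [0, 1, 0, 15],
]
accuracy, kappa = accuracy_and_kappa(confusion)
```

Kappa corrects accuracy for the agreement expected by chance, which is why a model can combine a high accuracy with a noticeably lower kappa when one genus dominates the data. The four accuracies in Table 2 are consistent with a test set of 67 samples (57, 63, 63, and 66 correct predictions, respectively).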


3.5.2.1. Identification of Important UCE Features

Applying the non‐zero importance score feature selection method to the XGBoost model identified 46 features as relevant for classifying nematode genera in the Rhabditidae family. These features, along with their corresponding “Overall” importance scores, are detailed in Table S14. For a concise overview, the top 20 features ranked by their “Overall” importance scores are presented in Figure S16. The feature uce.15118 exhibited the highest importance score (100.00), indicating its central contribution to the model's predictive accuracy. Notably, the top six features (uce.15118, uce.1361, uce.2297, uce.6206, uce.17604, uce.16976) had markedly higher importance scores than the remaining features, suggesting a critical role in discriminating between nematode genera in Rhabditidae. Furthermore, these six features were also found to be crucial in the cross‐validation with varying features, where the model's AUC improved significantly when they were included.
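The selection rule described above, keeping every feature whose importance exceeds a threshold, can be sketched in Python as follows. The raw importance values and feature names are made up, and rescaling so that the top feature scores 100 is an assumption intended to mirror the 0–100 “Overall” scale reported in the text.

```python
def scale_importance(raw):
    """Rescale raw importance values so that the top feature scores 100."""
    top = max(raw.values())
    return {feature: 100 * value / top for feature, value in raw.items()}


def select_features(importance, threshold=0.0):
    """Return features with importance above threshold, most important first."""
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    return [feature for feature, score in ranked if score > threshold]


# Made-up raw importances for hypothetical features.
raw_importance = {
    "uce_a": 0.40,
    "uce_b": 0.25,
    "uce_c": 0.002,
    "uce_d": 0.0,   # never used by the model
    "uce_e": 0.0,
}
overall = scale_importance(raw_importance)
nonzero = select_features(overall, threshold=0.0)
stricter = select_features(overall, threshold=1.0)
```

Raising the threshold trades a smaller locus set against the risk of discarding weakly informative features, so in practice the reduced model would be re-evaluated on held-out data.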

3.5.3. Classification Performance on Panagrolaimidae UCE Data

The optimised XGBoost model performed well on the Panagrolaimidae UCE dataset. To address the class imbalance, stratified sampling and weighted training were applied. The model was first trained on all UCE features and then retrained using only the 39 most important UCEs (i.e., those with an overall importance value > 1, see Table S15). Both the full model and the reduced‐feature model achieved the same overall accuracy of 94.12%, although the confusion matrices and other classification metrics differed slightly (Supporting Information A.5). These results suggest that the reduced set of features retained the model's predictive power while reducing dimensionality. Detailed statistics and metrics per class are provided in the Supporting Information A.5.

4. Discussion

4.1. Ultraconserved Elements Can Be Identified and Retrieved for Nematodes and Can Be Used for Robust Phylogenetic Reconstructions

We designed two bait sets to capture ultraconserved elements within the phylum Nematoda, with a focus on the families Rhabditidae and Panagrolaimidae. We then identified the most informative UCEs using machine learning classifiers to further aid in nematode classification, utilizing genetic information at high resolution. We demonstrate that, using the target‐captured loci, a well‐resolved phylogeny can be obtained for Panagrolaimidae that correctly places described taxa within the family and shows high support values for most nodes, providing insights into the phylogenetic relationships among taxa that previously had no genetic data available.

Our phylogenetic analysis is consistent with previous studies based on both molecular markers and whole‐genome data (Abolafia and Vecchi 2021; Qing et al. 2024; Shokoohi and Masoko 2024), with the exception of Panagrolaimus BSS8, which we now confirm, with both molecular and morphological evidence, to be Neocephalobus halophilus, closely related to the genus Halicephalobus in our phylogenetic reconstruction. The nematode genus Panagrolaimus appears to be more derived within the Panagrolaimidae family, and it is resolved as the closest relative to the genus Propanagrolaimus. Both genera form a monophyletic clade that is closely related to Halicephalobus, which in turn is sister to Panagrellus. All of these genera are distinct from the outgroup, Acrobeloides, which is placed outside of the Panagrolaimidae clade.

We also find that asexual Panagrolaimus strains cluster together (Panagrolaimus sp. PS1579 and Panagrolaimus sp. PS1159 (Lewis et al. 2009) and the newly isolated Panagrolaimus sp. ALT.22.08 (Villegas et al. 2024)), separately from sexually reproducing ones (Panagrolaimus einhardii and Panagrolaimus superbus; Lewis et al. 2009). Furthermore, the asexual nematode Panagrolaimus kolymaensis, previously placed as an outgroup to all other Panagrolaimus nematodes (Shatilovich et al. 2023), is here shown to cluster with Panagrolaimus JU1645 and does not sit on a branch of its own. We find with high support that sexually reproducing strains from the Atacama Desert (Villegas et al. 2024) cluster together and are more closely related to other basal Panagrolaimus strains such as Panagrolaimus sp. JU1367 and Panagrolaimus sp. JU1365.

In the Rhabditidae family, our phylogenetic reconstruction using UCEs is congruent with previous work (Stevens et al. 2019), showing a clear distinction between the ‘Elegans’ and ‘Japonica’ groups, both of which form the ‘Elegans’ supergroup, with Caenorhabditis monodelphis as an outgroup. While the placements of C. brenneri and C. wallacei do not show a direct one‐to‐one correspondence between the two reconstructions, this alternative hypothesis remains consistent with the phylogenetic trees reported in Stevens et al. (2019), which were generated using different methods.

While using UCEs in both families is a promising approach for phylogenetic reconstructions and further analysis, it is important to note that these “ultraconserved” elements often exhibit conservation only within relatively narrow taxonomic scales (e.g., within families rather than across orders). This pattern suggests a higher rate of sequence divergence across broader nematode lineages, consistent with the rapid molecular evolution observed in the phylum.

4.2. Panagrolaimus Detritophagus BSS8 Is in Fact Neocephalobus Halophilus BSS8 (Paetzold 1958)

The Panagrolaimus BSS8 strain is one of the nematode cultures established by Dr. Björn Sohlenius from samples from Surtsey Island (Lewis et al. 2009). Three different populations of Panagrolaimus were collected on Surtsey, established in cultures and described by Bostrom (1988). Of these, Panagrolaimus “population II” matches the BSS8 strain in both morphology and measurements, and especially in the presence of a distinct midventral precloacal papilliform sensillum located at a distance from the cloacal opening, at the level of the spicule manubrium. The population was never identified to species level by Bostrom (1988). Still, it was considered most similar to Panagrolaimus chalcographi using the key from Andrássy (1984), or to Panagrolaimus detritophagus using the Williams (1986) grouping. The conclusion of the manuscript was that population II from Surtsey “appears intermediate between Panagrolaimus detritophagus and Panagrolaimus chalcographi” (Bostrom 1988), but no conclusive decision was made. In subsequent publications, this culture was named (without any obvious justification) Panagrolaimus detritophagus BSS8 (Felix et al. 2000; Lewis et al. 2009; Schiffer et al. 2014; Shannon et al. 2005). Here we provide additional morphological evidence to confirm the taxonomic identity of the BSS8 strain.

Presence of distinct midventral precloacal papilliform sensillum located at a distance from the cloacal opening (at the level of spicule manubrium) is one distinct feature of this population, separating it from most known species currently classified in the genus Panagrolaimus and all currently accepted genera of Panagrolaimidae. Only the following three species in Panagrolaimidae are morphologically similar to the BSS8 strain:

Neocephalobus aberrans Steiner, 1929, originally described as Cephalobus (Neocephalobus) aberrans by Steiner (1929) and later transferred to the genus Panagrolaimus (= Panagrolaimus aberrans (Steiner, 1929) Goodey, 1963), was found in the faeces of a guinea pig kept at the University of California. No measurements were included in the original description, which is rather incomplete in general. However, the type population of Neocephalobus aberrans can be easily distinguished from BSS8 in having a long, narrow stoma with a minute tooth, whereas the stoma in BSS8 is broad and the tooth is distinct. Moreover, in N. aberrans the excretory pore is surrounded by a cuticularised ring (absent in BSS8) and the tail is shorter in both females and males. The newly discovered population of the same species (Bhat et al. 2025) can also be clearly separated from the BSS8 strain in having a longer and narrower stoma (just like in the original description), the presence of a median bulb, a shorter tail, and a different arrangement of male caudal sensilla.

Neocephalobus halophilus Paetzold 1958, also later transferred to the genus Panagrolaimus and renamed to Panagrolaimus paetzoldi Goodey, 1963 (homonym of Panagrolaimus halophilus Meyl, 1954), was collected in salt marshes in central Germany. Morphologically, it matches the BSS8 strain in all parameters, including the shape of the stoma and the number and arrangement of sensory structures on the male tail. Bhat et al. (2025) did not make any taxonomic decisions on the status of the BSS8 strain; however, they positively identified Dutch populations of Panagrolaimus paetzoldi as Neocephalobus halophilus, based solely on the molecular data. These populations form a well‐supported clade with BSS8 but differ by about 24 substitutions. Unfortunately, there is no morphological voucher data available for these sequences. Here we propose to consider them a distinct population of Neocephalobus halophilus until new specimens are collected and sequenced.

Panagrolaimus orthomici Korentchenko (1992) can be easily separated from BSS8 in having two subdorsal teeth in the stoma (single in BSS8), a more anterior position of the excretory pore, and a different arrangement of sensory structures on the male tail. This species is here transferred to the genus Neocephalobus, as Neocephalobus orthomici (Korentchenko 1992) comb. n.

In conclusion, currently available morphological evidence, along with phylogenetic placement based on UCEs, as well as recently published reinstatement of the genus Neocephalobus (Bhat et al. 2025), confirms that the nematode originally identified as Panagrolaimus “population II” in Bostrom (1988) and subsequently as Panagrolaimus detritophagus BSS8 and Panagrolaimus BSS8 is in fact Neocephalobus halophilus (Paetzold 1958).

4.3. A Minimum UCE Informative Set Can Accurately Classify Taxa to Genus Level

Applying machine learning to UCE presence‐absence data provided valuable insights into taxonomic classification within Nematoda. In Rhabditidae, the XGBoost model outperformed all other models (Random Forest, PCA‐Logistic Regression, and k‐Nearest Neighbours) with an AUC score of 0.9997. Feature selection using the non‐zero importance scores derived from XGBoost revealed that 46 UCEs are important for genus‐level classification, with uce.15118 being the most informative. The optimised XGBoost model strategy was applied to the Panagrolaimidae dataset, showing its versatility. Both the full model (with the complete set of features) and the model using only the top 39 most important UCEs (importance > 1) reached the same overall accuracy of 94.12%. This shows that an efficient classifier for taxonomic assignment can be constructed from a smaller set of informative UCEs.

Comparative analyses showed that certain UCEs are shared across genera. The most frequently observed UCEs were also important in the XGBoost model. In Panagrolaimidae, uce.9910, uce.194, uce.8674, uce.459, uce.8786, and uce.323 appeared in all three categories, indicating their potential role in genus differentiation. Although XGBoost predicts well, it is essential to recognize its limitations. The performance of any such model depends on the quality and completeness of the input data, and species sampling biases or gaps in the UCE dataset can affect results.

4.4. UCE‐Based Nematode Classification Potential for Biodiversity Assessments and Agricultural Monitoring

Nematodes serve as essential bioindicators of soil quality and ecosystem function, making their rapid and accurate classification essential for environmental monitoring and agricultural research. Their classification in coloniser‐persister categories (cp) as well as the characterisation of feeding types based on taxonomic classification (family and genera respectively) has been used for soil health characterisation (du Preez et al. 2022; Ferris et al. 2001; Martin et al. 2022). For instance, opportunistic nematodes (cp‐1 and cp‐2) reproduce fast and indicate disturbed, nutrient‐enriched, or polluted soils, whereas persisters (cp‐4 and cp‐5) reproduce slowly and indicate stable ecosystems with good soil health. Additionally, the types of feeding and the proportion of different feeding types in soils can be indicative of high organic input, organic matter decomposition, or potential nutrient imbalances. Their rapid identification can enable accurate, near‐real‐time biodiversity assessments, for instance, through the use of portable sequencing technologies (e.g., Nanopore) and the harvesting of UCEs, as well as model classification. Recent studies have demonstrated that whole genome amplification techniques (e.g., MDA) can generate sufficient DNA from single nematode individuals for successful Nanopore sequencing (Lee et al. 2023; Roberts et al. 2024). This opens up the possibility of combining UCE‐based approaches with single‐worm sequencing, allowing for accurate taxonomic identification and downstream analyses directly on‐site, even in cases where culturing is challenging or sample availability is limited.

Such methods can be extended beyond free‐living nematodes and applied to the analysis of parasitic nematodes, which are currently a critical target for molecular diagnostics due to their significant impact on global agricultural systems and animal health (Abad et al. 2008; Gang and Hallem 2016; Strydom et al. 2023). Particularly focusing on agricultural systems, genera such as Meloidogyne, Heterodera, and Pratylenchus are known for causing significant crop losses globally (Khan 2023). Their early detection is vital for timely management strategies (Jones et al. 2013; Žibrat et al. 2023). Integrating molecular tools such as ultra‐conserved elements (UCEs) with portable sequencing technologies (e.g., Nanopore) has the potential to substantially accelerate the detection of plant‐parasitic nematodes (PPNs) by soil testing facilities and quarantine labs. Notably, the aforementioned PPN genera are closely related to each other (Thapa et al. 2019), and belong to Clade IV of the Nematoda, alongside the families Panagrolaimidae and Cephalobidae, which were analysed in this study (Blaxter 2011). This phylogenetic proximity suggests that the bait set developed here could also be effective for targeting UCEs from PPN taxa. Consequently, this approach could serve as a valuable tool to advance both ecological research and applied agricultural diagnostics.

Future advancements in automation could further enhance the speed and scalability of taxonomic classification pipelines, making them even more suitable for large‐scale ecological and environmental monitoring. Currently, our approach focuses on taxa within Rhabditidae and Panagrolaimidae; however, we anticipate that extending this method to other ecologically relevant nematode taxa could provide valuable insights, particularly in soil health assessments.

Our study highlights the effectiveness of the newly designed bait sets for multiple loci (UCEs) and target enrichment as a valuable genomic resource for investigating evolutionary relationships among nematodes, specifically within the Rhabditidae and Panagrolaimidae families. This approach enables the retrieval of thousands of genetic markers with known homology, producing results comparable to those obtained using orthologous genes, while allowing for the study of a broader range of taxa at lower costs. While not explored here, the high variability of the flanking region in the targeted conserved loci can allow for fine‐scale analysis at the population level, enabling comparisons between the same species occurring at different geographical locations or under different environmental stressors.

Author Contributions

The study was designed by L.V., L.J., and P.H.S. J.v.S provided support with bioinformatic pipelines. Laboratory work was conducted by L.V. Bioinformatic and statistical work was conducted by L.V. and L.J. P.H.S., A.W., and O.H. provided supervision during the entire project. The manuscript draft was written by L.V. and L.J.; all authors further edited the manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

Supporting information

Data S1: Supporting Information.

MEN-25-e70046-s001.pdf (2.2 MB, pdf)

Acknowledgements

Laura I. Villegas's position and part of this research were supported by the DFG‐funded collaborative research centre CRC1211 (Earth ‐ Evolution at the Dry Limit) [grant number 268236062], conducted within the subproject B08 led by P.H. Schiffer and A‐M Waldvogel. Lucy Jimenez was supported through an ENP grant awarded to P.H. Schiffer [grant number: 434028868], and the Biodiversity Genomics Center Cologne (BioC2) project funded by the UoC forum under the Excellent Research Support Program of the University of Cologne. We would like to thank the Cologne Center for Genomics (CCG) for their support, as well as Marie‐Anne Félix for collecting and providing nematode cultures. Open Access funding enabled and organized by Projekt DEAL.

Funding: This work was supported by Deutsche Forschungsgemeinschaft (Collaborative Research Centre funding 268236062 and Emmy Noether Programme funding 434028868) and the UoC forum under the Excellent Research Support Program, BioC2.

Handling Editor: Isheng Jason Tsai

Contributor Information

Laura Villegas, Email: lvilleg1@uni-koeln.de.

Philipp H. Schiffer, Email: p.schiffer@uni-koeln.de.

Data Availability Statement

The sequencing data is available under the bioproject PRJNA1335917 and can also be found in the Zenodo repository (10.5281/zenodo.15395838). The bioinformatic pipeline developed for machine learning genera classification is available on GitHub (https://github.com/LucyJimenez/uce-ml-analysis). Slides for the following nematodes used in the target capture analysis have been deposited at the Swedish Museum of Natural History: Neocephalobus halophilus (SMNH‐226079–SMNH‐226083); Panagrolaimus sp. PS1579 (SMNH‐226085); Panagrolaimus sp. JU1371 (SMNH‐226086); Panagrolaimus sp. JU1387 (SMNH‐226087); Panagrolaimus sp. JU1645 (SMNH‐226088); Acrobeloides cf. guoghiensis ARO.22.05 (SMNH‐226113); Acrobeloides tricornis PAP.22.17 (SMNH‐226112); Panagrolaimus sp. ALT.22.04 (SMNH‐226089); Panagrolaimus sp. ALT.22.08 (SMNH‐226090); Panagrolaimus sp. PAP.22.29 (SMNH‐226091); Panagrolaimus sp. PAP.22.39 (SMNH‐226092–SMNH‐226083).

References

Abad, P. , Gouzy J., Aury J.‐M., et al. 2008. “Genome Sequence of the Metazoan Plant‐Parasitic Nematode Meloidogyne Incognita.” Nature Biotechnology 26, no. 8: 909–915. 10.1038/nbt.1482. [DOI] [PubMed] [Google Scholar]
Abolafia, J. , and Vecchi M.. 2021. “Redescription and Phylogenetic Analysis of the Type Species of the Genus Panagrellus Thorne, 1938 (Rhabditida, Panagrolaimidae), p. Pycnus Thorne, 1938, Including the First Sem Study.” Journal of Nematology 53, no. 1: 1–20. 10.21307/jofnem-2021-080. [DOI] [PMC free article] [PubMed] [Google Scholar]
Andrássy, I. 1984. Klasse Nematoda: (Ordnungen Monhysterida, Desmoscolecida, Araeolaimida, Chromadorida, Rhabditida). Gustav Fischer Verlag. [Google Scholar]
Andrews, K. R. , Good J. M., Miller M. R., Luikart G., and Hohenlohe P. A.. 2016. “Harnessing the Power of Radseq for Ecological and Evolutionary Genomics.” Nature Reviews Genetics 17, no. 2: 81–92. 10.1038/nrg.2015.28. [DOI] [PMC free article] [PubMed] [Google Scholar]
Andrews, S. 2010. “Fastqc: A Quality Control Tool for High Throughput Sequence Data.” https://www.bioinformatics.babraham.ac.uk/projects/fastqc/.
Bhat, A. H. , Yadav K., Abolafia J., Chaubey A. K., Fouad D., and Machado R. A.. 2025. “Redescription of Neocephalobus Aberrans Steiner, 1929 (Rhabditida, Panagrolaimidae) With Restoration of the Genus and Its Taxonomic Implications.” Nematology 27, no. 5: 561–581. 10.1163/15685411-bja10407. [DOI] [Google Scholar]
Bianchini, G. , and Sánchez‐Baracaldo P.. 2024. “Flexible, Modular Software to Visualise and Manipulate Phylogenetic Trees.” Ecology and Evolution 14, no. 2: e10873. 10.1002/ece3.10873. [DOI] [PMC free article] [PubMed] [Google Scholar]
Blaimer, B. B. , Brady S. G., Schultz T. R., Lloyd M. W., Fisher B. L., and Ward P. S.. 2015. “Phylogenomic Methods Outperform Traditional Multi‐Locus Approaches in Resolving Deep Evolutionary History: A Case Study of Formicine Ants.” BMC Evolutionary Biology 15, no. 1. 10.1186/s12862-015-0552-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
Blaxter, M. 2011. “Nematodes: The Worm and Its Relatives.” PLoS Biology 9, no. 4: e1001050. 10.1371/journal.pbio.1001050. [DOI] [PMC free article] [PubMed] [Google Scholar]
Blaxter, M. L. , De Ley P., Garey J. R., et al. 1998. “A Molecular Evolutionary Framework for the Phylum Nematoda.” Nature 392, no. 6671: 71–75. 10.1038/32160. [DOI] [PubMed] [Google Scholar]
Bostrom, S. 1988. “Descriptions and Morphological Variability of Three Populations of Panagrolaimus Fuchs, 1930 (Nematoda: Panagrolaimidae).” Nematologica 34, no. 2: 144–155. 10.1163/002825988x00233. [DOI] [Google Scholar]
Bushnell, B. 2014. “Bbmap: A Fast, Accurate, Splice‐Aware Aligner.” https://sourceforge.net/projects/bbmap/.
Chen, T. , and Guestrin C.. 2016. Xgboost: A Scalable Tree Boosting System. CoRR. http://arxiv.org/abs/1603.02754. [Google Scholar]
Danecek, P. , Bonfield J. K., Liddle J., et al. 2021. “Twelve Years of Samtools and Bcftools.” GigaScience 10, no. 2: giab008. 10.1093/gigascience/giab008. [DOI] [PMC free article] [PubMed] [Google Scholar]
du Preez, G. , Daneel M., de Goede R., et al. 2022. “Nematode‐Based Indices in Soil Ecology: Application, Utility, and Future Directions.” Soil Biology and Biochemistry 169: 108640. 10.1016/j.soilbio.2022.108640. [DOI] [Google Scholar]
Duckett, D. J. , Calder K., Sullivan J., Tank D. C., and Carstens B. C.. 2023. “Reduced Representation Approaches Produce Similar Results to Whole Genome Sequencing for Some Common Phylogeographic Analyses.” PLoS One 18, no. 11: e0291941. 10.1371/journal.pone.0291941. [DOI] [PMC free article] [PubMed] [Google Scholar]
Erickson, K. L. , Pentico A., Quattrini A. M., and McFadden C. S.. 2020. “New Approaches to Species Delimitation and Population Structure of Anthozoans: Two Case Studies of Octocorals Using Ultraconserved Elements and Exons.” Molecular Ecology Resources 21, no. 1: 78–92. 10.1111/1755-0998.13241. [DOI] [PubMed] [Google Scholar]
Faircloth, B. C. 2015. “Phyluce Is a Software Package for the Analysis of Conserved Genomic Loci.” Bioinformatics 32, no. 5: 786–788. 10.1093/bioinformatics/btv646. [DOI] [PubMed] [Google Scholar]
Faircloth, B. C. , McCormack J. E., Crawford N. G., Harvey M. G., Brumfield R. T., and Glenn T. C.. 2012. “Ultraconserved Elements Anchor Thousands of Genetic Markers Spanning Multiple Evolutionary Timescales.” Systematic Biology 61, no. 5: 717–726. 10.1093/sysbio/SYS004. [DOI] [PubMed] [Google Scholar]
Felix, M.‐A. , De Ley P., Sommer R. J., et al. 2000. “Evolution of Vulva Development in the Cephalobina (Nematoda).” Developmental Biology 221: 68–86. [DOI] [PubMed] [Google Scholar]
Ferris, H. , Bongers T., and de Goe R.. 2001. “A Framework for Soil Food Web Diagnostics: Extension of the Nematode Faunal Analysis Concept.” Applied Soil Ecology 18, no. 1: 13–29. 10.1016/s0929-1393(01)00152-4. [DOI] [Google Scholar]
Friedman, J. H. , Hastie T., and Tibshirani R.. 2010. “Regularization Paths for Generalized Linear Models via Coordinate Descent.” Journal of Statistical Software 33, no. 1: 1–22. 10.18637/jss.v033.i01. [DOI] [PMC free article] [PubMed] [Google Scholar]
Gang, S. S. , and Hallem E. A.. 2016. “Mechanisms of Host Seeking by Parasitic Nematodes.” Molecular and Biochemical Parasitology 208, no. 1: 23–32. 10.1016/j.molbiopara.2016.05.007. [DOI] [PMC free article] [PubMed] [Google Scholar]
Gilbert, P. S. , Chang J., Pan C., et al. 2015. “Genome‐Wide Ultraconserved Elements Exhibit Higher Phylogenetic Informativeness Than Traditional Gene Markers in Percomorph Fishes.” Molecular Phylogenetics and Evolution 92: 140–146. 10.1016/j.ympev.2015.05.027. [DOI] [PMC free article] [PubMed] [Google Scholar]
Guyon, I. , and Elisseeff A.. 2003. “An Introduction to Variable and Feature Selection.” Journal of Machine Learning Research 3: 1157–1182. [Google Scholar]
Hodda, M. 2022. “Phylum Nematoda: A Classification, Catalogue and Index of Valid Genera, With a Census of Valid Species.” Zootaxa 5114, no. 1: 1–289. [DOI] [PubMed] [Google Scholar]
Jones, J. T. , Haegeman A., Danchin E. G. J., et al. 2013. “Top 10 Plant‐Parasitic Nematodes in Molecular Plant Pathology.” Molecular Plant Pathology 14, no. 9: 946–961. 10.1111/mpp.12057. [DOI] [PMC free article] [PubMed] [Google Scholar]
Jones, M. R. , and Good J. M.. 2015. “Targeted Capture in Evolutionary and Ecological Genomics.” Molecular Ecology 25, no. 1: 185–202. 10.1111/mec.13304. [DOI] [PMC free article] [PubMed] [Google Scholar]
Khan, M. R. 2023. “Nematode Pests of Agricultural Crops, a Global Overview.” In Novel Biological and Biotechnological Applications in Plant Nematode Management, 3–45. Springer Nature Singapore. 10.1007/978-981-99-2893-4_1. [DOI] [Google Scholar]
Korentchenko, E. A. 1992. “Panagrolaimus Orthomici sp.n. (Cephalobina), a Nematode of Bark Beetles of the Genus Panagrolaimus in North‐Eastern Asia.” Parazitologia 6: 530–534. [Google Scholar]
Kuhn, M. 2008. “Building Predictive Models in r Using the Caret Package.” Journal of Statistical Software 28, no. 5: 1–26. 10.18637/jss.v028.i05.27774042 [DOI] [Google Scholar]
Kuhn, R. M. , Haussler D., and Kent W. J.. 2012. “The Ucsc Genome Browser and Associated Tools.” Briefings in Bioinformatics 14, no. 2: 144–161. 10.1093/bib/bbs038. [DOI] [PMC free article] [PubMed] [Google Scholar]
Lee, Y.‐C. , Ke H.‐M., Liu Y.‐C., et al. 2023. “Single‐Worm Long‐Read Sequencing Reveals Genome Diversity in Free‐Living Nematodes.” Nucleic Acids Research 51, no. 15: 8035–8047. 10.1093/nar/gkad647. [DOI] [PMC free article] [PubMed] [Google Scholar]
Lewis, S. C. , Dyal L. A., Hilburn C. F., et al. 2009. “Molecular Evolution in Panagrolaimus Nematodes: Origins of Parthenogenesis, Hermaphroditism and the Antarctic Species p. Davidi.” BMC Evolutionary Biology 9, no. 1. 10.1186/1471-2148-9-15. [DOI] [PMC free article] [PubMed] [Google Scholar]
Liaw, A. , and Wiener M.. 2002. “Classification and Regression by Random Forest.” R News 2, no. 3: 18–22. [Google Scholar]
Libbrecht, M. W. , and Noble W. S.. 2015. “Machine Learning Applications in Genetics and Genomics.” Nature Reviews. Genetics 16, no. 6: 321–332. [DOI] [PMC free article] [PubMed] [Google Scholar]
Lu, Y. , Li M., Gao Z., et al. 2025. “Advances in Whole Genome Sequencing: Methods, Tools, and Applications in Population Genomics.” International Journal of Molecular Sciences 26, no. 1: 372. 10.3390/ijms26010372. [DOI] [PMC free article] [PubMed] [Google Scholar]
Lunter, G. , and Goodson M.. 2010. “Stampy: A Statistical Algorithm for Sensitive and Fast Mapping of Illumina Sequence Reads.” Genome Research 21, no. 6: 936–939. 10.1101/gr.111120.110. [DOI] [PMC free article] [PubMed] [Google Scholar]
Lv, Z. , Li M., Wang Y., and Zou Q.. 2023. “Editorial: Machine Learning for Biological Sequence Analysis.” Frontiers in Genetics 14: 1150688. [DOI] [PMC free article] [PubMed] [Google Scholar]
Martin, M. 2011. “Cutadapt Removes Adapter Sequences From High‐Throughput Sequencing Reads.” EMBnet.Journal 17, no. 1: 10. 10.14806/ej.17.1.200. [DOI] [Google Scholar]
Martin, T. , Wade J., Singh P., and Sprunger C. D.. 2022. “The Integration of Nematode Communities Into the Soil Biological Health Framework by Factor Analysis.” Ecological Indicators 136: 108676. 10.1016/j.ecolind.2022.108676. [DOI] [Google Scholar]
Minh, B. Q. , Schmidt H. A., Chernomor O., et al. 2020. “Iq‐Tree 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era.” Molecular Biology and Evolution 37, no. 5: 1530–1534. 10.1093/molbev/msaa015. [DOI] [PMC free article] [PubMed] [Google Scholar]
Natsidis, P. , Kapli P., Schiffer P. H., and Telford M. J.. 2021. “Systematic Errors in Orthology Inference and Their Effects on Evolutionary Analyses.” iScience 24, no. 2: 102110. [DOI] [PMC free article] [PubMed] [Google Scholar]
Paetzold, R. 1958. “Neocephalobus Halophilus n. sp. (Nematoda: Cephalobidae) Aus Salzböden der Umgebung von Halle (Saale).” Archiv Für Naturgeschichte 22, no. 2–3: 157–167. [Google Scholar]
Paradis, E. , and Schliep K.. 2019. “Ape 5.0: An Environment for Modern Phylogenetics and Evolutionary Analyses in R.” Bioinformatics 35: 526–528. 10.1093/bioinformatics/bty633. [DOI] [PubMed] [Google Scholar]
Prjibelski, A. , Antipov D., Meleshko D., Lapidus A., and Korobeynikov A.. 2020. “Using Spades de Novo Assembler.” Current Protocols in Bioinformatics 70, no. 1: e102. 10.1002/cpbi.102. [DOI] [PubMed] [Google Scholar]
Qing, X. , Zhang Y. M., Sun S., et al. 2024. “Phylogenomic Insights Into the Evolution and Origin of Nematoda.” Systematic Biology 74, no. 3: 349–358. 10.1093/sysbio/syae073. [DOI] [PubMed] [Google Scholar]
Quattrini, A. M. , Faircloth B. C., Dueñas L. F., et al. 2017. “Universal Target‐Enrichment Baits for Anthozoan (Cnidaria) Phylogenomics: New Approaches to Long‐Standing Problems.” Molecular Ecology Resources 18, no. 2: 281–295. 10.1111/1755-0998.12736. [DOI] [PubMed] [Google Scholar]
Quinlan, A. R. , and Hall I. M.. 2010. “Bedtools: A Flexible Suite of Utilities for Comparing Genomic Features.” Bioinformatics 26, no. 6: 841–842. 10.1093/bioinformatics/btq033. [DOI] [PMC free article] [PubMed] [Google Scholar]
Revell, L. J. 2024. “Phytools 2.0: An Updated r Ecosystem for Phylogenetic Comparative Methods (And Other Things).” PeerJ 12: e16505. 10.7717/peerj.16505. [DOI] [PMC free article] [PubMed] [Google Scholar]
Roberts, N. G. , Gilmore M. J., Struck T. H., and Kocot K. M.. 2024. “Multiple Displacement Amplification Facilitates Smrt Sequencing of Microscopic Animals and the Genome of the Gastrotrich Lepidodermella squamata (Dujardin 1841).” Genome Biology and Evolution 16, no. 12: evae254. 10.1093/gbe/evae254. [DOI] [PMC free article] [PubMed] [Google Scholar]
Robin, X. , Turck N., Hainard A., et al. 2011. “pROC: An Open‐Source Package for R and s+ to Analyze and Compare ROC Curves.” BMC Bioinformatics 12, no. 1: 77. [DOI] [PMC free article] [PubMed] [Google Scholar]
Rubin, B. E. R. , Ree R. H., and Moreau C. S.. 2012. “Inferring Phylogenies From Rad Sequence Data.” PLoS One 7, no. 4: e33394. 10.1371/journal.pone.0033394. [DOI] [PMC free article] [PubMed] [Google Scholar]
Saeys, Y. , Inza I., and Larrañaga P.. 2007. “A Review of Feature Selection Techniques in Bioinformatics.” Bioinformatics 23, no. 19: 2507–2517. 10.1093/bioinformatics/btm344. [DOI] [PubMed] [Google Scholar]
Schiffer, P. H. , Nsah N. A., Grotehusmann H., Kroiher M., Loer C., and Schierenberg E.. 2014. “Developmental Variations Among Panagrolaimid Nematodes Indicate Developmental System Drift Within a Small Taxonomic Unit.” Development Genes and Evolution 224, no. 3: 183–188. 10.1007/s00427-014-0471-2. [DOI] [PubMed] [Google Scholar]
Shannon, A. J. , Browne J. A., Boyd J., Fitzpatrick D. A., and Burnell A. M.. 2005. “The Anhydrobiotic Potential and Molecular Phylogenetics of Species and Strains Ofpanagrolaimus(Nematoda, Panagrolaimidae).” Journal of Experimental Biology 208, no. 12: 2433–2445. 10.1242/jeb.01629. [DOI] [PubMed] [Google Scholar]
Shatilovich, A. , Gade V. R., Pippel M., et al. 2023. “A Novel Nematode Species From the Siberian Permafrost Shares Adaptive Mechanisms for Cryptobiotic Survival With c. Elegans Dauer Larva.” PLoS Genetics 19, no. 7: e1010798. 10.1371/journal.pgen.1010798. [DOI] [PMC free article] [PubMed] [Google Scholar]
Shokoohi, E. , and Masoko P.. 2024. “Morphological and Molecular Analysis of Propanagrolaimus Siweyae sp. n. (Nematoda: Panagrolaimidae) From Molepo Dam, Limpopo Province, South Africa, and Its Relationship With Water Parameters.” Biologia 79, no. 12: 3573–3587. 10.1007/s11756-024-01806-2. [DOI] [Google Scholar]
Steiner, G. 1929. “Cephalobus (Neocephalobus) Aberrans n. Sg. n. sp., (Rhabditidae, Nematodes) From the Feces of a Guinea‐Pig.” Journal of Parasitology 16, no. 2: 88–90. 10.2307/3271914. [DOI] [Google Scholar]
Stevens, L. , Félix M.‐A., Beltran T., et al. 2019. “Comparative Genomics of 10 Newcaenorhabditisspecies.” Evolution Letters 3, no. 2: 217–236. 10.1002/evl3.110. [DOI] [PMC free article] [PubMed] [Google Scholar]
Strydom, T. , Lavan R. P., Torres S., and Heaney K.. 2023. “The Economic Impact of Parasitism From Nematodes, Trematodes and Ticks on Beef Cattle Production.” Animals 13, no. 10: 1599. 10.3390/ani13101599. [DOI] [PMC free article] [PubMed] [Google Scholar]
Thapa, S. , Gates M. K., Reuter‐Carlson U., Androwski R. J., and Schroeder N. E.. 2019. “Convergent Evolution of Saccate Body Shapes in Nematodes Through Distinct Developmental Mechanisms.” EvoDevo 10, no. 1: 5. 10.1186/s13227-019-0118-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
van den Hoogen, J. , Geisen S., Wall D. H., et al. 2020. “A Global Database of Soil Nematode Abundance and Functional Group Composition.” Scientific Data 7, no. 1: 103. 10.1038/s41597-020-0437-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
van der Sprong, J. , de Voogd N. J., McCormack G. P., Sandoval K., Schätzle S., and Voigt O.. 2023. “A Novel Target‐Enriched Multilocus Assay for Sponges (Porifera): Red Sea Haplosclerida (Demospongiae) as a Test Case.” Molecular Ecology Resources 24, no. 2: e13891. 10.1111/1755-0998.13891. [DOI] [PubMed] [Google Scholar]
Van Megen, H. , van den Elsen S., Holterman M., et al. 2009. “A Phylogenetic Tree of Nematodes Based on About 1200 Full‐Length Small Subunit Ribosomal DNA Sequences.” Nematology 11, no. 6: 927–950. 10.1163/156854109X456862. [DOI] [Google Scholar]
Venables, W. N. , and Ripley B. D.. 2002. Modern Applied Statistics With s. Fourth ed. Springer. https://www.stats.ox.ac.uk/pub/MASS4/. [Google Scholar]
Villegas, L. , Pettrich L., Acevedo‐Trejos E., Suwanngam A., Wassey N., and Allende M. L.. 2024. “Hierarchical Patterns of Soil Biodiversity in the Atacama Desert: Insights Across Biological Scales.” bioRxiv. 10.1101/2024.09.30.615889. [DOI] [Google Scholar]
Williams, M. 1986. “The Use of Scanning Electron Microscopy in the Taxonomy of Panagrolaimus (Nematoda: Panagrolaimidae).” Nematologica 32, no. 1: 89–97. 10.1163/187529286X00057. [DOI] [Google Scholar]
Winker, K. , Glenn T. C., and Faircloth B. C.. 2018. “Ultraconserved Elements (Uces) Illuminate the Population Genomics of a Recent, High‐Latitude Avian Speciation Event.” PeerJ 6: e5735. 10.7717/peerj.5735. [DOI] [PMC free article] [PubMed] [Google Scholar]
Žibrat, U. , Viaene N., Širca S., van Beek J., Susič N., and Stare B. G.. 2023. “Nemdetect: Early Detection of Quarantine Nematodes in Potatoes Using Remote Sensing.” EFSA Supporting Publications 20, no. 12. 10.2903/sp.efsa.2023.EN-8143. [DOI] [Google Scholar]

Supplementary Materials

Data S1: Supporting Information.

MEN-25-e70046-s001.pdf (2.2 MB, PDF)

Data Availability Statement

[men70046-bib-0001] Abad, P. , Gouzy J., Aury J.‐M., et al. 2008. “Genome Sequence of the Metazoan Plant‐Parasitic Nematode Meloidogyne Incognita.” Nature Biotechnology 26, no. 8: 909–915. 10.1038/nbt.1482. [DOI] [PubMed] [Google Scholar]

[men70046-bib-0002] Abolafia, J. , and Vecchi M.. 2021. “Redescription and Phylogenetic Analysis of the Type Species of the Genus Panagrellus Thorne, 1938 (Rhabditida, Panagrolaimidae), p. Pycnus Thorne, 1938, Including the First Sem Study.” Journal of Nematology 53, no. 1: 1–20. 10.21307/jofnem-2021-080. [DOI] [PMC free article] [PubMed] [Google Scholar]

[men70046-bib-0003] Andrássy, I. 1984. Klasse Nematoda: (Ordnungen Monhysterida, Desmoscolecida, Araeolaimida, Chromadorida, Rhabditida). Gustav Fischer Verlag. [Google Scholar]

[men70046-bib-0004] Andrews, K. R. , Good J. M., Miller M. R., Luikart G., and Hohenlohe P. A.. 2016. “Harnessing the Power of Radseq for Ecological and Evolutionary Genomics.” Nature Reviews Genetics 17, no. 2: 81–92. 10.1038/nrg.2015.28. [DOI] [PMC free article] [PubMed] [Google Scholar]

[men70046-bib-0005] Andrews, S. 2010. “Fastqc: A Quality Control Tool for High Throughput Sequence Data.” https://www.bioinformatics.babraham.ac.uk/projects/fastqc/.

[men70046-bib-0006] Bhat, A. H. , Yadav K., Abolafia J., Chaubey A. K., Fouad D., and Machado R. A.. 2025. “Redescription of Neocephalobus Aberrans Steiner, 1929 (Rhabditida, Panagrolaimidae) With Restoration of the Genus and Its Taxonomic Implications.” Nematology 27, no. 5: 561–581. 10.1163/15685411-bja10407. [DOI] [Google Scholar]

[men70046-bib-0007] Bianchini, G. , and Sánchez‐Baracaldo P.. 2024. “Flexible, Modular Software to Visualise and Manipulate Phylogenetic Trees.” Ecology and Evolution 14, no. 2: e10873. 10.1002/ece3.10873. [DOI] [PMC free article] [PubMed] [Google Scholar]

[men70046-bib-0008] Blaimer, B. B. , Brady S. G., Schultz T. R., Lloyd M. W., Fisher B. L., and Ward P. S.. 2015. “Phylogenomic Methods Outperform Traditional Multi‐Locus Approaches in Resolving Deep Evolutionary History: A Case Study of Formicine Ants.” BMC Evolutionary Biology 15, no. 1. 10.1186/s12862-015-0552-5. [DOI] [PMC free article] [PubMed] [Google Scholar]

[men70046-bib-0009] Blaxter, M. 2011. “Nematodes: The Worm and Its Relatives.” PLoS Biology 9, no. 4: e1001050. 10.1371/journal.pbio.1001050. [DOI] [PMC free article] [PubMed] [Google Scholar]

[men70046-bib-0010] Blaxter, M. L. , De Ley P., Garey J. R., et al. 1998. “A Molecular Evolutionary Framework for the Phylum Nematoda.” Nature 392, no. 6671: 71–75. 10.1038/32160. [DOI] [PubMed] [Google Scholar]

[men70046-bib-0011] Bostrom, S. 1988. “Descriptions and Morphological Variability of Three Populations of Panagrolaimus Fuchs, 1930 (Nematoda: Panagrolaimidae).” Nematologica 34, no. 2: 144–155. 10.1163/002825988x00233. [DOI] [Google Scholar]

[men70046-bib-0012] Bushnell, B. 2014. “Bbmap: A Fast, Accurate, Splice‐Aware Aligner.” https://sourceforge.net/projects/bbmap/.

[men70046-bib-0013] Chen, T. , and Guestrin C.. 2016. Xgboost: A Scalable Tree Boosting System. CoRR. http://arxiv.org/abs/1603.02754. [Google Scholar]

[men70046-bib-0014] Danecek, P. , Bonfield J. K., Liddle J., et al. 2021. “Twelve Years of Samtools and Bcftools.” GigaScience 10, no. 2: giab008. 10.1093/gigascience/giab008. [DOI] [PMC free article] [PubMed] [Google Scholar]

[men70046-bib-0015] du Preez, G. , Daneel M., de Goede R., et al. 2022. “Nematode‐Based Indices in Soil Ecology: Application, Utility, and Future Directions.” Soil Biology and Biochemistry 169: 108640. 10.1016/j.soilbio.2022.108640. [DOI] [Google Scholar]

[men70046-bib-0016] Duckett, D. J. , Calder K., Sullivan J., Tank D. C., and Carstens B. C.. 2023. “Reduced Representation Approaches Produce Similar Results to Whole Genome Sequencing for Some Common Phylogeographic Analyses.” PLoS One 18, no. 11: e0291941. 10.1371/journal.pone.0291941. [DOI] [PMC free article] [PubMed] [Google Scholar]

[men70046-bib-0017] Erickson, K. L. , Pentico A., Quattrini A. M., and McFadden C. S.. 2020. “New Approaches to Species Delimitation and Population Structure of Anthozoans: Two Case Studies of Octocorals Using Ultraconserved Elements and Exons.” Molecular Ecology Resources 21, no. 1: 78–92. 10.1111/1755-0998.13241. [DOI] [PubMed] [Google Scholar]

[men70046-bib-0018] Faircloth, B. C. 2015. “Phyluce Is a Software Package for the Analysis of Conserved Genomic Loci.” Bioinformatics 32, no. 5: 786–788. 10.1093/bioinformatics/btv646. [DOI] [PubMed] [Google Scholar]

[men70046-bib-0019] Faircloth, B. C. , McCormack J. E., Crawford N. G., Harvey M. G., Brumfield R. T., and Glenn T. C.. 2012. “Ultraconserved Elements Anchor Thousands of Genetic Markers Spanning Multiple Evolutionary Timescales.” Systematic Biology 61, no. 5: 717–726. 10.1093/sysbio/SYS004. [DOI] [PubMed] [Google Scholar]

[men70046-bib-0020] Felix, M.‐A. , De Ley P., Sommer R. J., et al. 2000. “Evolution of Vulva Development in the Cephalobina (Nematoda).” Developmental Biology 221: 68–86. [DOI] [PubMed] [Google Scholar]

[men70046-bib-0021] Ferris, H. , Bongers T., and de Goe R.. 2001. “A Framework for Soil Food Web Diagnostics: Extension of the Nematode Faunal Analysis Concept.” Applied Soil Ecology 18, no. 1: 13–29. 10.1016/s0929-1393(01)00152-4. [DOI] [Google Scholar]

[men70046-bib-0022] Friedman, J. H. , Hastie T., and Tibshirani R.. 2010. “Regularization Paths for Generalized Linear Models via Coordinate Descent.” Journal of Statistical Software 33, no. 1: 1–22. 10.18637/jss.v033.i01. [DOI] [PMC free article] [PubMed] [Google Scholar]

[men70046-bib-0023] Gang, S. S. , and Hallem E. A.. 2016. “Mechanisms of Host Seeking by Parasitic Nematodes.” Molecular and Biochemical Parasitology 208, no. 1: 23–32. 10.1016/j.molbiopara.2016.05.007. [DOI] [PMC free article] [PubMed] [Google Scholar]

[men70046-bib-0024] Gilbert, P. S. , Chang J., Pan C., et al. 2015. “Genome‐Wide Ultraconserved Elements Exhibit Higher Phylogenetic Informativeness Than Traditional Gene Markers in Percomorph Fishes.” Molecular Phylogenetics and Evolution 92: 140–146. 10.1016/j.ympev.2015.05.027. [DOI] [PMC free article] [PubMed] [Google Scholar]

[men70046-bib-0025] Guyon, I. , and Elisseeff A.. 2003. “An Introduction to Variable and Feature Selection.” Journal of Machine Learning Research 3: 1157–1182. [Google Scholar]

[men70046-bib-0026] Hodda, M. 2022. “Phylum Nematoda: A Classification, Catalogue and Index of Valid Genera, With a Census of Valid Species.” Zootaxa 5114, no. 1: 1–289. [DOI] [PubMed] [Google Scholar]

[men70046-bib-0027] Jones, J. T. , Haegeman A., Danchin E. G. J., et al. 2013. “Top 10 Plant‐Parasitic Nematodes in Molecular Plant Pathology.” Molecular Plant Pathology 14, no. 9: 946–961. 10.1111/mpp.12057. [DOI] [PMC free article] [PubMed] [Google Scholar]

[men70046-bib-0028] Jones, M. R. , and Good J. M.. 2015. “Targeted Capture in Evolutionary and Ecological Genomics.” Molecular Ecology 25, no. 1: 185–202. 10.1111/mec.13304. [DOI] [PMC free article] [PubMed] [Google Scholar]

[men70046-bib-0029] Khan, M. R. 2023. “Nematode Pests of Agricultural Crops, a Global Overview.” In Novel Biological and Biotechnological Applications in Plant Nematode Management, 3–45. Springer Nature Singapore. 10.1007/978-981-99-2893-4_1. [DOI] [Google Scholar]

[men70046-bib-0030] Korentchenko, E. A. 1992. “Panagrolaimus Orthomici sp.n. (Cephalobina), a Nematode of Bark Beetles of the Genus Panagrolaimus in North‐Eastern Asia.” Parazitologia 6: 530–534. [Google Scholar]

[men70046-bib-0031] Kuhn, M. 2008. “Building Predictive Models in r Using the Caret Package.” Journal of Statistical Software 28, no. 5: 1–26. 10.18637/jss.v028.i05.27774042 [DOI] [Google Scholar]

[men70046-bib-0032] Kuhn, R. M. , Haussler D., and Kent W. J.. 2012. “The Ucsc Genome Browser and Associated Tools.” Briefings in Bioinformatics 14, no. 2: 144–161. 10.1093/bib/bbs038. [DOI] [PMC free article] [PubMed] [Google Scholar]

[men70046-bib-0033] Lee, Y.‐C. , Ke H.‐M., Liu Y.‐C., et al. 2023. “Single‐Worm Long‐Read Sequencing Reveals Genome Diversity in Free‐Living Nematodes.” Nucleic Acids Research 51, no. 15: 8035–8047. 10.1093/nar/gkad647. [DOI] [PMC free article] [PubMed] [Google Scholar]

[men70046-bib-0034] Lewis, S. C. , Dyal L. A., Hilburn C. F., et al. 2009. “Molecular Evolution in Panagrolaimus Nematodes: Origins of Parthenogenesis, Hermaphroditism and the Antarctic Species p. Davidi.” BMC Evolutionary Biology 9, no. 1. 10.1186/1471-2148-9-15. [DOI] [PMC free article] [PubMed] [Google Scholar]

[men70046-bib-0035] Liaw, A. , and Wiener M.. 2002. “Classification and Regression by Random Forest.” R News 2, no. 3: 18–22. [Google Scholar]

[men70046-bib-0036] Libbrecht, M. W. , and Noble W. S.. 2015. “Machine Learning Applications in Genetics and Genomics.” Nature Reviews. Genetics 16, no. 6: 321–332. [DOI] [PMC free article] [PubMed] [Google Scholar]

[men70046-bib-0037] Lu, Y. , Li M., Gao Z., et al. 2025. “Advances in Whole Genome Sequencing: Methods, Tools, and Applications in Population Genomics.” International Journal of Molecular Sciences 26, no. 1: 372. 10.3390/ijms26010372. [DOI] [PMC free article] [PubMed] [Google Scholar]

[men70046-bib-0038] Lunter, G. , and Goodson M.. 2010. “Stampy: A Statistical Algorithm for Sensitive and Fast Mapping of Illumina Sequence Reads.” Genome Research 21, no. 6: 936–939. 10.1101/gr.111120.110. [DOI] [PMC free article] [PubMed] [Google Scholar]

[men70046-bib-0039] Lv, Z. , Li M., Wang Y., and Zou Q.. 2023. “Editorial: Machine Learning for Biological Sequence Analysis.” Frontiers in Genetics 14: 1150688. [DOI] [PMC free article] [PubMed] [Google Scholar]

[men70046-bib-0040] Martin, M. 2011. “Cutadapt Removes Adapter Sequences From High‐Throughput Sequencing Reads.” EMBnet.Journal 17, no. 1: 10. 10.14806/ej.17.1.200. [DOI] [Google Scholar]

[men70046-bib-0041] Martin, T. , Wade J., Singh P., and Sprunger C. D.. 2022. “The Integration of Nematode Communities Into the Soil Biological Health Framework by Factor Analysis.” Ecological Indicators 136: 108676. 10.1016/j.ecolind.2022.108676. [DOI] [Google Scholar]

[men70046-bib-0042] Minh, B. Q. , Schmidt H. A., Chernomor O., et al. 2020. “Iq‐Tree 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era.” Molecular Biology and Evolution 37, no. 5: 1530–1534. 10.1093/molbev/msaa015. [DOI] [PMC free article] [PubMed] [Google Scholar]

[men70046-bib-0043] Natsidis, P. , Kapli P., Schiffer P. H., and Telford M. J.. 2021. “Systematic Errors in Orthology Inference and Their Effects on Evolutionary Analyses.” iScience 24, no. 2: 102110. [DOI] [PMC free article] [PubMed] [Google Scholar]

[men70046-bib-0044] Paetzold, R. 1958. “Neocephalobus Halophilus n. sp. (Nematoda: Cephalobidae) Aus Salzböden der Umgebung von Halle (Saale).” Archiv Für Naturgeschichte 22, no. 2–3: 157–167. [Google Scholar]

[men70046-bib-0045] Paradis, E. , and Schliep K.. 2019. “Ape 5.0: An Environment for Modern Phylogenetics and Evolutionary Analyses in R.” Bioinformatics 35: 526–528. 10.1093/bioinformatics/bty633. [DOI] [PubMed] [Google Scholar]

[men70046-bib-0046] Prjibelski, A. , Antipov D., Meleshko D., Lapidus A., and Korobeynikov A.. 2020. “Using Spades de Novo Assembler.” Current Protocols in Bioinformatics 70, no. 1: e102. 10.1002/cpbi.102. [DOI] [PubMed] [Google Scholar]

[men70046-bib-0047] Qing, X. , Zhang Y. M., Sun S., et al. 2024. “Phylogenomic Insights Into the Evolution and Origin of Nematoda.” Systematic Biology 74, no. 3: 349–358. 10.1093/sysbio/syae073. [DOI] [PubMed] [Google Scholar]

[men70046-bib-0048] Quattrini, A. M. , Faircloth B. C., Dueñas L. F., et al. 2017. “Universal Target‐Enrichment Baits for Anthozoan (Cnidaria) Phylogenomics: New Approaches to Long‐Standing Problems.” Molecular Ecology Resources 18, no. 2: 281–295. 10.1111/1755-0998.12736. [DOI] [PubMed] [Google Scholar]

[men70046-bib-0049] Quinlan, A. R. , and Hall I. M.. 2010. “Bedtools: A Flexible Suite of Utilities for Comparing Genomic Features.” Bioinformatics 26, no. 6: 841–842. 10.1093/bioinformatics/btq033. [DOI] [PMC free article] [PubMed] [Google Scholar]

[men70046-bib-0050] Revell, L. J. 2024. “Phytools 2.0: An Updated r Ecosystem for Phylogenetic Comparative Methods (And Other Things).” PeerJ 12: e16505. 10.7717/peerj.16505. [DOI] [PMC free article] [PubMed] [Google Scholar]

[men70046-bib-0051] Roberts, N. G. , Gilmore M. J., Struck T. H., and Kocot K. M.. 2024. “Multiple Displacement Amplification Facilitates Smrt Sequencing of Microscopic Animals and the Genome of the Gastrotrich Lepidodermella squamata (Dujardin 1841).” Genome Biology and Evolution 16, no. 12: evae254. 10.1093/gbe/evae254. [DOI] [PMC free article] [PubMed] [Google Scholar]

[men70046-bib-0052] Robin, X. , Turck N., Hainard A., et al. 2011. “pROC: An Open‐Source Package for R and s+ to Analyze and Compare ROC Curves.” BMC Bioinformatics 12, no. 1: 77. [DOI] [PMC free article] [PubMed] [Google Scholar]

[men70046-bib-0053] Rubin, B. E. R. , Ree R. H., and Moreau C. S.. 2012. “Inferring Phylogenies From Rad Sequence Data.” PLoS One 7, no. 4: e33394. 10.1371/journal.pone.0033394. [DOI] [PMC free article] [PubMed] [Google Scholar]

[men70046-bib-0054] Saeys, Y. , Inza I., and Larrañaga P.. 2007. “A Review of Feature Selection Techniques in Bioinformatics.” Bioinformatics 23, no. 19: 2507–2517. 10.1093/bioinformatics/btm344. [DOI] [PubMed] [Google Scholar]

[men70046-bib-0055] Schiffer, P. H. , Nsah N. A., Grotehusmann H., Kroiher M., Loer C., and Schierenberg E.. 2014. “Developmental Variations Among Panagrolaimid Nematodes Indicate Developmental System Drift Within a Small Taxonomic Unit.” Development Genes and Evolution 224, no. 3: 183–188. 10.1007/s00427-014-0471-2. [DOI] [PubMed] [Google Scholar]

[men70046-bib-0056] Shannon, A. J. , Browne J. A., Boyd J., Fitzpatrick D. A., and Burnell A. M.. 2005. “The Anhydrobiotic Potential and Molecular Phylogenetics of Species and Strains Ofpanagrolaimus(Nematoda, Panagrolaimidae).” Journal of Experimental Biology 208, no. 12: 2433–2445. 10.1242/jeb.01629. [DOI] [PubMed] [Google Scholar]

[men70046-bib-0057] Shatilovich, A. , Gade V. R., Pippel M., et al. 2023. “A Novel Nematode Species From the Siberian Permafrost Shares Adaptive Mechanisms for Cryptobiotic Survival With c. Elegans Dauer Larva.” PLoS Genetics 19, no. 7: e1010798. 10.1371/journal.pgen.1010798. [DOI] [PMC free article] [PubMed] [Google Scholar]

[men70046-bib-0058] Shokoohi, E. , and Masoko P.. 2024. “Morphological and Molecular Analysis of Propanagrolaimus Siweyae sp. n. (Nematoda: Panagrolaimidae) From Molepo Dam, Limpopo Province, South Africa, and Its Relationship With Water Parameters.” Biologia 79, no. 12: 3573–3587. 10.1007/s11756-024-01806-2. [DOI] [Google Scholar]

[men70046-bib-0059] Steiner, G. 1929. “Cephalobus (Neocephalobus) Aberrans n. Sg. n. sp., (Rhabditidae, Nematodes) From the Feces of a Guinea‐Pig.” Journal of Parasitology 16, no. 2: 88–90. 10.2307/3271914. [DOI] [Google Scholar]

[men70046-bib-0060] Stevens, L. , Félix M.‐A., Beltran T., et al. 2019. “Comparative Genomics of 10 Newcaenorhabditisspecies.” Evolution Letters 3, no. 2: 217–236. 10.1002/evl3.110. [DOI] [PMC free article] [PubMed] [Google Scholar]

[men70046-bib-0061] Strydom, T. , Lavan R. P., Torres S., and Heaney K.. 2023. “The Economic Impact of Parasitism From Nematodes, Trematodes and Ticks on Beef Cattle Production.” Animals 13, no. 10: 1599. 10.3390/ani13101599. [DOI] [PMC free article] [PubMed] [Google Scholar]

[men70046-bib-0062] Thapa, S. , Gates M. K., Reuter‐Carlson U., Androwski R. J., and Schroeder N. E.. 2019. “Convergent Evolution of Saccate Body Shapes in Nematodes Through Distinct Developmental Mechanisms.” EvoDevo 10, no. 1: 5. 10.1186/s13227-019-0118-5. [DOI] [PMC free article] [PubMed] [Google Scholar]

[men70046-bib-0063] van den Hoogen, J. , Geisen S., Wall D. H., et al. 2020. “A Global Database of Soil Nematode Abundance and Functional Group Composition.” Scientific Data 7, no. 1: 103. 10.1038/s41597-020-0437-3. [DOI] [PMC free article] [PubMed] [Google Scholar]

[men70046-bib-0064] van der Sprong, J. , de Voogd N. J., McCormack G. P., Sandoval K., Schätzle S., and Voigt O.. 2023. “A Novel Target‐Enriched Multilocus Assay for Sponges (Porifera): Red Sea Haplosclerida (Demospongiae) as a Test Case.” Molecular Ecology Resources 24, no. 2: e13891. 10.1111/1755-0998.13891. [DOI] [PubMed] [Google Scholar]

[men70046-bib-0065] Van Megen, H. , van den Elsen S., Holterman M., et al. 2009. “A Phylogenetic Tree of Nematodes Based on About 1200 Full‐Length Small Subunit Ribosomal DNA Sequences.” Nematology 11, no. 6: 927–950. 10.1163/156854109X456862. [DOI] [Google Scholar]

[men70046-bib-0066] Venables, W. N. , and Ripley B. D.. 2002. Modern Applied Statistics With s. Fourth ed. Springer. https://www.stats.ox.ac.uk/pub/MASS4/. [Google Scholar]

[men70046-bib-0067] Villegas, L., Pettrich L., Acevedo‐Trejos E., Suwanngam A., Wassey N., and Allende M. L. 2024. “Hierarchical Patterns of Soil Biodiversity in the Atacama Desert: Insights Across Biological Scales.” bioRxiv. 10.1101/2024.09.30.615889.

[men70046-bib-0068] Williams, M. 1986. “The Use of Scanning Electron Microscopy in the Taxonomy of Panagrolaimus (Nematoda: Panagrolaimidae).” Nematologica 32, no. 1: 89–97. 10.1163/187529286X00057.

[men70046-bib-0069] Winker, K., Glenn T. C., and Faircloth B. C. 2018. “Ultraconserved Elements (UCEs) Illuminate the Population Genomics of a Recent, High‐Latitude Avian Speciation Event.” PeerJ 6: e5735. 10.7717/peerj.5735.

[men70046-bib-0070] Žibrat, U., Viaene N., Širca S., van Beek J., Susič N., and Stare B. G. 2023. “Nemdetect: Early Detection of Quarantine Nematodes in Potatoes Using Remote Sensing.” EFSA Supporting Publications 20, no. 12. 10.2903/sp.efsa.2023.EN-8143.
